
public:perseus

Differences

Below are the differences between two revisions of the page.

public:perseus [2017/05/12 08:09]
benedicte.pincemin@ens-lyon.fr
public:perseus [2017/12/01 17:54]
benedicte.pincemin@ens-lyon.fr
Line 1: Line 1:
-This page is dedicated to project using TXM on texts taken from the Perseus Digital Library:
+This page is dedicated to projects using TXM on texts taken from the Perseus Digital Library:
   * [[http://www.perseus.tufts.edu/hopper|Perseus Digital Library]]
     * XML edition (GitHub)
   * [[https://perseusdl.github.io/treebank_data|The Ancient Greek and Latin Dependency Treebank]] (GitHub)
  
-Please take care that this is a public page.
+Please note that this is a public page.
  
 Anybody who has subscribed to the txm-users mailing list can edit this page.
  
-====== CICERO corpus: demonstration of Perseus Latin texts in TXM ======
+====== Projects ======
  
-===== Project presentation =====
-
-  * Context: Heidelberg, May 2017: [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis]]
-
+  * [[public:perseus_201707_plato|July 2017, 29 Greek texts from Plato.]] Context: paper submitted to [[https://chs.harvard.edu/CHS/article/display/1167?menuId=66|Classics@]].
+  * [[public:perseus_201705_cicero|May 2017, 29 Latin texts from Cicero.]] Context: conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], concluding conference of the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung – Dokumentation – Reflexion//, Heidelberg, May 11–13, 2017.
+  * [[public:perseus_agdt_201705_plato|May 2017, Greek annotated text from Plato (AGDT2).]] Context: conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], concluding conference of the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung – Dokumentation – Reflexion//, Heidelberg, May 11–13, 2017.
+  * [[public:perseus_201212_plautus|December 2012, 20 Latin plays from Plautus.]] Context: presentation at the [[http://www.dh.uni-leipzig.de/wo/e-humanities-seminar/|University of Leipzig eHumanities Seminar]] on December 5th, 2012.
-  * Objective:
-    * demonstrating that one can work in TXM on texts available from the Perseus project
-    * TEI-compliant import
-    * if possible, nice editions (could be shown through another corpus)
-
-  * Corpus:
-    * Cicero's texts, Latin edition: a copy is here: [[https://sharedocs.huma-num.fr/#/948/3789/Projets/Textom%C3%A9trie/Corpus/src/perseus/Cicero/170502latin]]
-      * we take all files ending in _lat, except cic.pet_lat.xml, because it is a text by Q. Tullius Cicero rather than M. Tullius Cicero.
-
-  * Available resources (approximate list):
-    * txm-filter-perseus-tei-xtz.xsl
-      * P4 to P5 conversion
-      * management of numbered divs (div1, div2)
-      * management of nested <text>: inside a <group>, nested texts become <subtext> instead of <text>
-        * teiheader-to-metadata.xsl (?): gets information from the teiHeader and adds it as attributes of the <text> element.
-    * a useful macro: text2metadata (to be checked): generates a metadata.csv file from the XML-TXM files of a corpus
-
-===== Specifications =====
-
-Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet).
-
-Metadata: from <teiHeader><fileDesc><titleStmt>, get
-  * the content of the first <title>,
-  * the content of the first <author>,
-  * the content of the first <editor>.
-
-Handle XML-TEI features that would not work with CQP:
-  * div1, div2 -> div
-  * <text><group><text> -> <text><group><textgroupitem> (or a better tag name)
-
-Distribute the information carried by <milestone> attributes onto word tokens (when available).
-
-Get the page number when available and put it in an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition.
-
-Render foreign words (tagged with the <foreign> element) and titles (content of <title> elements) in italics.
-
-===== Solution =====
-
-Create a directory (e.g. "cicero").
-
-This directory contains:
-  * a copy of every XML file of Cicero's Latin texts downloaded from Perseus DL;
-  * a directory named "xsl", which contains:
-    * a directory named "2-front", which contains:
-      * p4top5.xsl
-      * txm-front-teiperseus-xtz.xsl
-    * a directory named "3-posttok", which contains:
-      * txm-posttok-addRef-perseus.xsl
-
-Then run the TXM command File > Import > XML-XTZ + CSV with the following settings:
-
-1. Source directory is "cicero" (in our example).
-
-2. Import parameters:
-  * Main Language: la (to use TreeTagger with the Latin parameter file, if TreeTagger has been set up and associated with TXM)
-  * Lexical Segmentation: no change (default settings)
-  * Editions: Build edition, Words per page = 750, Page break tag = pb
-  * Display font: default setting (Font name = <default>)
-  * Commands: Concordance context structure limits = text
-  * Textual planes:
-    * Outside-text = teiHeader,front,back
-    * Outside-text to edit = bibl
-    * Note elements = note
-    * Milestone elements = [nothing, leave blank]
-    * Options: default (= remove temporary directories)
-
-3. Click on "Start corpus import" (at the top of the page).
-
-
-The import can be run again with an added metadata.csv file, to get more metadata than those automatically extracted from the teiHeader (title, first author, first editor).
-
-===== Feedback =====
-
-Some features of the XML-XTZ import have not been implemented yet; in particular, the @rend attribute does not seem to be used to interpret <emph> and <hi> elements. So, in the front XSL (import step #2), we changed some <hi> into <emph> for the cases in which we wanted italics in the HTML edition.
-
-<note> content loses all its markup; this is a real drawback, as tagged foreign words and italics are very often used in notes.
-
-===== XSL Perseus stylesheets used for this import =====
-
-==== txm-front-teiperseus-xtz.xsl ====
-
-<code>
-<?xml version="1.0"?>
-<xsl:stylesheet
-  xmlns:xd="http://www.pnp-software.com/XSLTdoc"
-  xmlns:edate="http://exslt.org/dates-and-times"
-  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0"
-  exclude-result-prefixes="tei edate xd" version="2.0">
-
-  <xd:doc type="stylesheet">
-    <xd:short>
-      A stylesheet to prepare PERSEUS XML-TEI texts for TXM import.
-    </xd:short>
-    <xd:detail>
-      This stylesheet is free software; you can redistribute it and/or
-      modify it under the terms of the GNU Lesser General Public
-      License as published by the Free Software Foundation; either
-      version 3 of the License, or (at your option) any later version.
-
-      This stylesheet is distributed in the hope that it will be useful,
-      but WITHOUT ANY WARRANTY; without even the implied warranty of
-      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-      Lesser General Public License for more details.
-
-      You should have received a copy of GNU Lesser Public License with
-      this stylesheet. If not, see http://www.gnu.org/licenses/lgpl.html
-    </xd:detail>
-    <xd:author>Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</xd:author>
-    <xd:copyright>2017, CNRS / IHRIM (Groupe CACTUS)</xd:copyright>
-  </xd:doc>
-
-
-  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>
-
-  <xsl:template match="node()|@*">
-    <!-- Copy the current node -->
-    <xsl:copy>
-      <!-- Including any attributes it has and any child nodes -->
-      <xsl:apply-templates select="@*|node()"/>
-    </xsl:copy>
-  </xsl:template>
-
-<!-- Comment out this template if a metadata file provides the same information: -->
-  <xsl:template match="/tei:TEI/tei:text">
-    <xsl:copy>
-      <xsl:copy-of select="@*"/>
-      <xsl:attribute name="author"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author[1]"/></xsl:attribute>
-      <xsl:attribute name="title"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1]"/></xsl:attribute>
-      <xsl:attribute name="editor"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:editor[1]"/></xsl:attribute>
-      <xsl:apply-templates/>
-    </xsl:copy>
-  </xsl:template>
-
-<xsl:template match="tei:group/tei:text">
-  <xsl:element name="subtext">
-    <xsl:apply-templates select="@*|node()"/>
-  </xsl:element>
-</xsl:template>
-
-  <xsl:template match="tei:pb">
-    <xsl:copy>
-      <xsl:attribute name="n">
-        <xsl:choose>
-          <xsl:when test="@n"><xsl:value-of select="@n"/></xsl:when>
-          <xsl:when test="@*:id">
-            <xsl:value-of select="replace(@*:id,'^p\.','')"/>
-          </xsl:when>
-          <xsl:otherwise><xsl:text>[s.n.]</xsl:text></xsl:otherwise>
-        </xsl:choose>
-      </xsl:attribute>
-    </xsl:copy>
-  </xsl:template>
-
-<xsl:template match="tei:div1|tei:div2|tei:div3|tei:div4|tei:div5|tei:div6|tei:div7">
-  <xsl:element name="div" namespace="http://www.tei-c.org/ns/1.0">
-    <xsl:apply-templates select="@*|node()"/>
-  </xsl:element>
-</xsl:template>
-
-<xsl:template match="tei:choice">
-  <xsl:apply-templates select="tei:expan|tei:corr|tei:reg"/>
-</xsl:template>
-
-<xsl:template match="tei:choice/tei:expan">
-  <w xmlns="http://www.tei-c.org/ns/1.0">
-    <xsl:attribute name="abbr"><xsl:value-of select="normalize-space(parent::tei:choice/tei:abbr)"/></xsl:attribute>
-    <xsl:apply-templates select="@*|node()"/>
-  </w>
-</xsl:template>
-
-  <xsl:template match="tei:choice/tei:corr">
-    <xsl:copy>
-      <xsl:attribute name="sic"><xsl:value-of select="normalize-space(parent::tei:choice/tei:sic)"/></xsl:attribute>
-      <xsl:apply-templates select="@*|node()"/>
-    </xsl:copy>
-  </xsl:template>
-
-  <xsl:template match="tei:choice/tei:reg">
-    <xsl:copy>
-      <xsl:attribute name="orig"><xsl:value-of select="normalize-space(parent::tei:choice/tei:orig)"/></xsl:attribute>
-      <xsl:apply-templates select="@*|node()"/>
-    </xsl:copy>
-  </xsl:template>
-
-<!-- Temporary patch: keeps TXM from indexing quote elements in notes -->
-
-  <xsl:template match="tei:note//tei:quote">
-    <quote-note>
-      <xsl:apply-templates select="@*|node()"/>
-    </quote-note>
-  </xsl:template>
-
-<!--
-(i) adding an <emph> element to highlight the content of some elements (e.g. foreign, title) in the TXM edition;
-(ii) adding a <w> element to prevent tokenisation from analysing some content (e.g. foreign)
--->
-
-<xsl:template match="tei:foreign[not(ancestor::tei:note)]">
-<emph rend="italic" xmlns="http://www.tei-c.org/ns/1.0">
-  <xsl:copy>
-    <w xmlns="http://www.tei-c.org/ns/1.0">
-    <xsl:apply-templates select="@*|node()"/>
-    </w>
-  </xsl:copy>
-</emph>
-</xsl:template>
-
-<xsl:template match="tei:title">
-<emph rend="italic" xmlns="http://www.tei-c.org/ns/1.0">
-  <xsl:copy>
-    <xsl:apply-templates select="@*|node()"/>
-  </xsl:copy>
-</emph>
-</xsl:template>
-
-<!-- Temporary patch to get the correct rendering of <hi rend="italic"> content in TXM editions: <emph> must be used instead of <hi> -->
-
-<xsl:template match="tei:hi[matches(@rend,'italic')]" priority="1">
-  <xsl:element name="emph" namespace="http://www.tei-c.org/ns/1.0">
-    <xsl:apply-templates select="@*|node()"/>
-  </xsl:element>
-</xsl:template>
-
-</xsl:stylesheet>
-</code>
-
-==== txm-posttok-addRef-perseus.xsl ====
-
-<code>
-<?xml version="1.0"?>
-<xsl:stylesheet xmlns:edate="http://exslt.org/dates-and-times"
-  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0"
-  xmlns:txm="http://textometrie.org/ns/1.0"
-  exclude-result-prefixes="tei edate" xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
-
-  <!--
-This software is dual-licensed:
-
-1. Distributed under a Creative Commons Attribution-ShareAlike 3.0
-Unported License http://creativecommons.org/licenses/by-sa/3.0/
-
-2. http://www.opensource.org/licenses/BSD-2-Clause
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
-* Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
-
-* Redistributions in binary form must reproduce the above copyright
-notice, this list of conditions and the following disclaimer in the
-documentation and/or other materials provided with the distribution.
-
-This software is provided by the copyright holders and contributors
-"as is" and any express or implied warranties, including, but not
-limited to, the implied warranties of merchantability and fitness for
-a particular purpose are disclaimed. In no event shall the copyright
-holder or contributors be liable for any direct, indirect, incidental,
-special, exemplary, or consequential damages (including, but not
-limited to, procurement of substitute goods or services; loss of use,
-data, or profits; or business interruption) however caused and on any
-theory of liability, whether in contract, strict liability, or tort
-(including negligence or otherwise) arising in any way out of the use
-of this software, even if advised of the possibility of such damage.
-
-
-This stylesheet adds a ref attribute to w elements that will be used for
-references in TXM concordances. It can be used with the TXM XTZ import module.
-
-Written by Alexei Lavrentiev, UMR 5317 IHRIM, 2017
-  -->
-
-
-  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>
-
-
-  <!-- General patterns: all elements, attributes, comments and processing instructions are copied -->
-
-  <xsl:template match="*">
-        <xsl:copy>
-          <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
-        </xsl:copy>
-  </xsl:template>
-
-  <xsl:template match="*" mode="position"><xsl:value-of select="count(preceding-sibling::*)"/></xsl:template>
-
-  <xsl:template match="@*|comment()|processing-instruction()">
-    <xsl:copy/>
-  </xsl:template>
-
-  <!-- Base name of the source file (without the .xml extension) -->
-  <xsl:variable name="filename">
-    <xsl:analyze-string select="document-uri(.)" regex="^(.*)/([^/]+)\.xml$">
-      <xsl:matching-substring>
-        <xsl:value-of select="regex-group(2)"/>
-      </xsl:matching-substring>
-    </xsl:analyze-string>
-  </xsl:variable>
-
-
-  <xsl:template match="tei:w">
-    <xsl:variable name="ref">
-      <xsl:choose>
-        <xsl:when test="ancestor::tei:text/@*:id">
-          <xsl:value-of select="ancestor::tei:text[1]/@*:id[1]"/>
-        </xsl:when>
-        <xsl:otherwise>
-          <xsl:value-of select="$filename"/>
-        </xsl:otherwise>
-      </xsl:choose>
-      <!-- Perseus addition -->
-      <!-- The comma separates the chapter number from the text identifier -->
-      <xsl:if test="preceding::tei:milestone[@unit='chapter'][1][@n]">
-        <xsl:text>, c</xsl:text>
-        <xsl:value-of select="preceding::tei:milestone[@unit='chapter'][1]/@n"/>
-      </xsl:if>
-      <xsl:if test="preceding::tei:milestone[@unit='section'][1][@n]">
-        <xsl:text>, s. </xsl:text>
-        <xsl:value-of select="preceding::tei:milestone[@unit='section'][1]/@n"/>
-      </xsl:if>
-      <!-- end of Perseus addition -->
-
-      <xsl:if test="preceding::tei:pb[1]/@n">
-        <xsl:text>, p</xsl:text>
-        <xsl:value-of select="preceding::tei:pb[1]/@n"/>
-      </xsl:if>
-      <xsl:if test="ancestor::tei:p[@n]">
-        <xsl:text>, § </xsl:text>
-        <xsl:value-of select="ancestor::tei:p/@n"/>
-      </xsl:if>
-      <!--<xsl:if test="preceding::tei:lb[1]/@n">
-        <xsl:text>, l</xsl:text>
-        <xsl:value-of select="preceding::tei:lb[1]/@n"/>
-      </xsl:if>-->
-    </xsl:variable>
-    <xsl:copy>
-      <xsl:apply-templates select="@*"/>
-      <xsl:attribute name="ref"><xsl:value-of select="$ref"/></xsl:attribute>
-      <xsl:apply-templates select="*|processing-instruction()|comment()|text()"/>
-    </xsl:copy>
-  </xsl:template>
-
-</xsl:stylesheet>
-</code>
-
-    * txm-filter-perseustreebank-xmlw.xsl
-
-====== PLAUTELAT & PLAUTEEN TXM demo ======
-
-===== Goal =====
-
-  * Context: 2012-12-05, University of Leipzig eHumanities Seminar
-  * The goal was to demonstrate TXM on Plautus's plays from Perseus, in Latin and in English translation
-
-===== Corpus =====
-
-Corpus of Plautus's plays in Latin and of their English translation, from Perseus.
-
-Import parameters (updated from XML/w to XTZ):
-  * 2-front:
-    * txm-filter-teiperseus-xmlw.xsl
-    * txm-filter-teip5-xmlw-preserve.xsl
-  * lat.par TreeTagger model
-
-  * PLAUTELAT: corpus of Plautus's Latin plays
-    * source: [[https://sharedocs.huma-num.fr/wl/?id=qftriVBBeFES4jmt2BIobq1IqtypXGnK|davs://sharedocs.huma-num.fr/dav.php/@Shares/(948)%20Cactus/(3792)%20Cactus/Projets/Textométrie/Corpus/src/plautelat-src.zip]]
-    * binary: [[https://sharedocs.huma-num.fr/wl/?id=eOLdijlvM50Qep1BQTz7UICvYHS3bPDq|davs://sharedocs.huma-num.fr/dav.php/@Shares/(948)%20Cactus/(3792)%20Cactus/Projets/Textométrie/Corpus/bin/PLAUTELAT.txm]]
-  * PLAUTEEN: corpus of the English translations of Plautus's plays
-    * to do
-----
-> [[:|Back to the list of projects]].
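The metadata rule listed under Specifications (the content of the first <title>, <author> and <editor> of <teiHeader><fileDesc><titleStmt>) can be checked outside TXM, for instance to prepare the metadata.csv file mentioned above. The following is a minimal Python sketch using only the standard library; it is not part of TXM or of the stylesheets. It assumes TEI P5 input in the http://www.tei-c.org/ns/1.0 namespace (i.e. after p4top5.xsl) and an "id" column holding each file's base name; the function names are hypothetical, and the expected column layout should be checked against the TXM documentation. Unlike xsl:value-of, it also normalizes whitespace.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace (input already converted from P4).
TEI = "{http://www.tei-c.org/ns/1.0}"


def first_text(title_stmt, tag):
    """Whitespace-normalized text of the first <tag> child of titleStmt, or ''."""
    if title_stmt is None:
        return ""
    el = title_stmt.find(TEI + tag)
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())


def header_metadata(root):
    """First title, author and editor from teiHeader/fileDesc/titleStmt."""
    title_stmt = root.find(f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt")
    return {tag: first_text(title_stmt, tag) for tag in ("title", "author", "editor")}


def metadata_csv(files):
    """files: iterable of (text_id, xml_string) pairs; returns CSV text.

    Hypothetical layout: an "id" column followed by title, author, editor.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["id", "title", "author", "editor"], lineterminator="\n"
    )
    writer.writeheader()
    for text_id, xml in files:
        writer.writerow({"id": text_id, **header_metadata(ET.fromstring(xml))})
    return out.getvalue()
```

A missing <editor> (or <author>) simply yields an empty cell, whereas the stylesheet would produce an empty attribute.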
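The page-numbering rule of the tei:pb template in txm-front-teiperseus-xtz.xsl has three cases: keep @n if present; otherwise take an attribute whose local name is "id" (such as xml:id) and strip a leading "p."; otherwise use "[s.n.]". The Python sketch below restates that rule for testing purposes only; pb_number and number_pages are hypothetical names. Note one deliberate difference: the stylesheet's <pb> template copies only @n, while this sketch leaves the other attributes of <pb> in place.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def pb_number(attrib):
    """Page number for a <pb>, following the three cases of the XSLT template."""
    if "n" in attrib:
        return attrib["n"]
    for name, value in attrib.items():
        # Mirrors @*:id: any attribute whose local name is "id".
        if name.rsplit("}", 1)[-1] == "id":
            return re.sub(r"^p\.", "", value)
    return "[s.n.]"


def number_pages(root):
    """Set @n on every tei:pb of the tree (other attributes are kept)."""
    for pb in root.iter(TEI + "pb"):
        pb.set("n", pb_number(pb.attrib))
```

For example, a <pb xml:id="p.34"/> gets n="34", and a <pb/> with no usable attribute gets n="[s.n.]".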
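The reference string that txm-posttok-addRef-perseus.xsl stores in the ref attribute of each <w> depends on the nearest preceding chapter and section <milestone>, the nearest preceding <pb> and the enclosing numbered <p>. ElementTree has no preceding:: axis, so this Python sketch (standard library only, illustrative, not part of TXM) gets the same information by walking the tree once in document order and remembering the last milestone and page break seen. Assumptions and simplifications: every part, including the chapter part, is separated by a comma (", c1"); only the nearest numbered <p> is used; the identifier comes from the nearest <text> that has one, else from the file name. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def local_id(el):
    """Value of the first attribute whose local name is 'id', or None."""
    for name, value in el.attrib.items():
        if name.rsplit("}", 1)[-1] == "id":
            return value
    return None


def add_refs(root, filename):
    """Set a ref attribute on every tei:w, walking the tree in document order."""
    # Most recent chapter/section milestone @n and pb @n seen so far.
    last = {"chapter": None, "section": None, "page": None}

    def walk(el, text_id, para_n):
        if el.tag == TEI + "text":
            text_id = local_id(el) or text_id
        elif el.tag == TEI + "p" and el.get("n") is not None:
            para_n = el.get("n")
        elif el.tag == TEI + "milestone" and el.get("unit") in ("chapter", "section"):
            last[el.get("unit")] = el.get("n")  # None if the milestone has no @n
        elif el.tag == TEI + "pb":
            last["page"] = el.get("n")
        elif el.tag == TEI + "w":
            ref = text_id
            if last["chapter"] is not None:
                ref += ", c" + last["chapter"]
            if last["section"] is not None:
                ref += ", s. " + last["section"]
            if last["page"] is not None:
                ref += ", p" + last["page"]
            if para_n is not None:
                ref += ", § " + para_n
            el.set("ref", ref)
        for child in el:
            walk(child, text_id, para_n)

    walk(root, filename, None)
```

For a word preceded by <pb n="5"/> and by chapter and section milestones numbered 1 and 2, inside <p n="3"> of a <text xml:id="cic.off"> (a hypothetical identifier), the sketch yields cic.off, c1, s. 2, p5, § 3.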
  
public/perseus.txt · Last modified: 2017/12/01 17:54 by benedicte.pincemin@ens-lyon.fr