public:perseus

Differences

Below are the differences between two revisions of the page.

public:perseus [2017/05/03 19:45]
slh@ens-lyon.fr
public:perseus [2017/12/01 17:53]
benedicte.pincemin@ens-lyon.fr
Line 1: Line 1:
-This page tracks projects that use TXM to analyse text corpora taken from Perseus:
 +This page is dedicated to projects using TXM on texts taken from the Perseus Digital Library:
   * [[http://​www.perseus.tufts.edu/​hopper|Perseus Digital Library]]   * [[http://​www.perseus.tufts.edu/​hopper|Perseus Digital Library]]
-    * XML version on Github
 +    * XML edition (Github)
   * [[ https://​perseusdl.github.io/​treebank_data|The Ancient Greek and Latin Dependency Treebank]] (Github)   * [[ https://​perseusdl.github.io/​treebank_data|The Ancient Greek and Latin Dependency Treebank]] (Github)
  
-For each of these sources, several useful XSL stylesheets are already available:
-  * txm-filter-perseus-tei-xtz.xsl
-    * P4 to P5 conversion
-    * div1, div2
-    * group -> subtext
-      * teiheader-to-metadata.xsl (injects text attributes from teiHeader data: first author, first title, first editor)
-  * stylesheet for references
-  * txm-filter-perseustreebank-xmlw.xsl
 +Please note that this is a public page.
  
-There are useful macros:
-  * text2metadata, to be checked: produces a metadata.csv from the XML-TXM files of a corpus (ask Matthieu)
 +Anybody who has subscribed to the txm-users mailing list can edit this page.
  
-They will be available in SF.
 +====== Projects ======
  
-Since this page is currently public, it is recommended to maintain a reasonable level of anonymity. (Please contact us if this page needs to become more confidential.)
 +  * [[public:perseus_201707_plato|July 2017, 29 Greek texts from Plato.]] Context: paper submitted to [[https://chs.harvard.edu/CHS/article/display/1167?menuId=66|Classics@]].
 +  * [[public:perseus_201705_cicero|May 2017, 29 Latin texts from Cicero.]] Context: conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], concluding conference of the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung - Dokumentation - Reflexion//, Heidelberg, May 11–13, 2017.
 +  * [[public:perseus_agdt_201705_plato|May 2017, 1 annotated Greek text from Plato (AGDT2).]] Context: conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], concluding conference of the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung - Dokumentation - Reflexion//, Heidelberg, May 11–13, 2017.
 +  * [[public:perseus_201212_plautus|December 2012, 20 Latin plays from Plautus.]] Context: presentation at the [[http://www.dh.uni-leipzig.de/wo/e-humanities-seminar/|University of Leipzig eHumanities Seminar]] on December 5th, 2012.
  
-To edit this page, you only need to be subscribed to the 'txm-users' mailing list.
 +====== CICERO corpus: demonstration of Perseus Latin texts in TXM ======
  
-====== Perseus Latin demo corpus project ======
 +**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**
  
-===== Project description =====
 +===== Project presentation =====
  
   * context : Heidelberg, May 2017 : [[http://​www.altphil.uni-freiburg.de/​texte-messen/​digital-classics-iii-2013-re-thinking-text-analysis]]   * context : Heidelberg, May 2017 : [[http://​www.altphil.uni-freiburg.de/​texte-messen/​digital-classics-iii-2013-re-thinking-text-analysis]]
  
-  * objective:
 +  * goal:
     * demonstrating that one can work on texts available from Perseus project in TXM     * demonstrating that one can work on texts available from Perseus project in TXM
     * TEI compliant import     * TEI compliant import
Line 37: Line 32:
       * we get all files ending with _lat, except cic.pet_lat.xml because it's a text from Q. Tullius Cicero instead of M. Tullius Cicero.       * we get all files ending with _lat, except cic.pet_lat.xml because it's a text from Q. Tullius Cicero instead of M. Tullius Cicero.
  
-===== Specifications =====
 +  * Available resources (approximate list)
 +    * p4top5.xsl
 +      * TEI P4 to P5 conversion
 +    * txm-filter-perseus-tei-xtz.xsl
 +      * handling of numbered divisions: div1, div2
 +      * handling of nested <text>: inside a <group>, each <text> is replaced by a <subtext>
 +    * teiheader-to-metadata.xsl: gets information from the teiHeader and adds it as attributes to the <text> element.
 +    * a useful macro, text2metadata: generates a metadata.csv from the XML-TXM files of a corpus. Must be run before starting the import process.
 + 
 +===== Specifications ​===== 
 + 
 +Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet).
  
 Metadata : from <​teiHeader><​fileDesc><​titleStmt>,​ get Metadata : from <​teiHeader><​fileDesc><​titleStmt>,​ get
Line 50: Line 56:
 Distribute <​milestone>​ attributes'​ information on word tokens (when available). Distribute <​milestone>​ attributes'​ information on word tokens (when available).
  
-===== Recipes =====
 +Get the page number when available and put it in an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition.
 + 
 +Render foreign words (tagged with <​foreign>​ element) and titles (<​title>​ elements content) as italics. 
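
The page-numbering rule stated in these specifications can be sketched outside XSLT. The Python below is only an illustration of the rule as it appears in the ''tei:pb'' template of txm-front-teiperseus-xtz.xsl (keep @n if present, otherwise strip a leading "p." from the id, otherwise use "[s.n.]"). The function name ''page_number'' and the dictionary representation of the attributes are assumptions made for this example; they are not part of TXM.

```python
import re


def page_number(attributes):
    """Return the value to store in pb/@n (hypothetical helper, not a TXM API).

    `attributes` is a plain dict of the <pb> element's attributes.
    """
    # 1. An existing @n attribute is kept unchanged.
    if "n" in attributes:
        return attributes["n"]
    # 2. Otherwise derive the number from the id by removing a leading "p.".
    if "id" in attributes:
        return re.sub(r"^p\.", "", attributes["id"])
    # 3. No usable information: same placeholder as in the stylesheet.
    return "[s.n.]"


if __name__ == "__main__":
    print(page_number({"id": "p.34"}))
```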
 + 
 +===== Solution ​===== 
 + 
 +Make a directory (e.g. "​cicero"​). 
 + 
 +This directory includes : 
 +  * a copy of every XML file for Latin texts of Cicero downloaded from Perseus DL.
 +  * a directory named "​xsl",​ which includes : 
 +    * a subdirectory named "​2-front",​ which includes : 
 +      * p4top5.xsl 
 +      * txm-front-teiperseus-xtz.xsl 
 +    * a subdirectory named "​3-posttok",​ which includes : 
 +      * txm-posttok-addRef-perseus.xsl 
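
The directory layout described above could be prepared with a short script. The following Python is a minimal sketch under stated assumptions: the function names and the ''download_dir'' / ''import_dir'' parameters are hypothetical, and the file selection follows the rule given earlier on this page (files ending with _lat, except cic.pet_lat.xml). The three stylesheets still have to be copied by hand into "2-front" and "3-posttok".

```python
import shutil
from pathlib import Path

# Excluded because it is a text by Q. Tullius Cicero, not M. Tullius Cicero.
EXCLUDED = {"cic.pet_lat.xml"}


def select_latin_files(names):
    """Keep file names ending in _lat.xml, minus the excluded ones."""
    return sorted(n for n in names if n.endswith("_lat.xml") and n not in EXCLUDED)


def prepare_import_dir(download_dir, import_dir):
    """Create the import directory with its xsl subdirectories and copy the texts.

    Both arguments are hypothetical paths chosen by the user.
    Returns the list of copied file names.
    """
    download_dir, import_dir = Path(download_dir), Path(import_dir)
    # Subdirectory names are the ones listed on this page.
    (import_dir / "xsl" / "2-front").mkdir(parents=True, exist_ok=True)
    (import_dir / "xsl" / "3-posttok").mkdir(parents=True, exist_ok=True)
    selected = select_latin_files(p.name for p in download_dir.glob("*.xml"))
    for name in selected:
        shutil.copy2(download_dir / name, import_dir / name)
    return selected
```

For instance, ''prepare_import_dir("perseus_download", "cicero")'' would fill the "cicero" directory used as the source directory in the import settings.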
 + 
 +Then run the TXM command File>​Import>​XML-XTZ + CSV with the following settings : 
 + 
 +1. Source directory is "​cicero"​ (in our example). 
 + 
 +2. Import parameters : 
 +  * Main Language : la (to use TreeTagger with the Latin parameter file, if TreeTagger has been set up and associated with TXM)
 +  * Lexical Segmentation : no change - Default settings 
 +  * Editions : Build edition, Words per page = 750, Page break tag = pb 
 +  * Display font : default setting (Font name = <​default>​) 
 +  * Commands : Concordance context structure limits = text 
 +  * Textual planes : 
 +    * Outside-text = teiHeader,​front,​back 
 +    * Outside-text to edit = bibl 
 +    * Note elements = note 
 +    * Milestone elements = [nothing, leave blank] 
 +    * Options : default (= remove temporary directories) 
 + 
 +3. Click on "Start corpus import" (at the top of the page)
 + 
 + 
 +A second import can be run with an added metadata.csv file, in order to get more metadata than the items automatically extracted from the teiHeader (title, first author, first editor).
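
As an illustration of the metadata involved, the sketch below reads the first title, author and editor from a TEI P5 header and writes them as one row of a metadata.csv. It is not the text2metadata macro. The function names are hypothetical, and the CSV layout (a first column named ''id'' holding the file name without extension, comma-separated) is an assumption that should be checked against the TXM manual. The sample text and its header values are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def header_metadata(tei_xml):
    """Return the first title, author and editor found in titleStmt."""
    root = ET.fromstring(tei_xml)
    stmt = root.find(f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt")

    def first(tag):
        element = stmt.find(f"{TEI}{tag}") if stmt is not None else None
        return "".join(element.itertext()).strip() if element is not None else ""

    return {"title": first("title"), "author": first("author"), "editor": first("editor")}


def metadata_csv(rows):
    """Serialise {text id: metadata dict} as CSV text (assumed layout)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "title", "author", "editor"])
    for text_id, meta in rows.items():
        writer.writerow([text_id, meta["title"], meta["author"], meta["editor"]])
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample document, for illustration only.
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>'
        "<title>Sample title</title><author>Sample author</author>"
        "<editor>Sample editor</editor>"
        "</titleStmt></fileDesc></teiHeader><text/></TEI>"
    )
    print(metadata_csv({"sample_lat": header_metadata(sample)}))
```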
 + 
 +===== Feedback ===== 
 + 
 +Some features of the XML-XTZ import have not been implemented yet; in particular, the @rend attribute does not seem to be used to interpret <emph> and <hi> elements. So, in the front XSL (import step #2), we changed some <hi> elements into <emph> where we wanted italics in the HTML edition.
 +
 +<note> content loses all its markup, which is a real drawback because tagged foreign words and italics are very often used in notes.
 + 
 +**[[public:​perseus|>>>​ Back to TXM Perseus Projects main page]]** 
 + 
 +===== XSL Perseus stylesheets used for this import ===== 
 + 
 +==== txm-front-teiperseus-xtz.xsl ==== 
 + 
 +<code XML> 
 +<?xml version="​1.0"?>​ 
 +<​xsl:​stylesheet 
 +  xmlns:​xd="​http://​www.pnp-software.com/​XSLTdoc"​ 
 +  xmlns:​edate="​http://​exslt.org/​dates-and-times"​ 
 +  xmlns:​xsl="​http://​www.w3.org/​1999/​XSL/​Transform"​ xmlns:​tei="​http://​www.tei-c.org/​ns/​1.0"​ 
 +  exclude-result-prefixes="​tei edate xd" version="​2.0">​ 
 +   
 +  <xd:doc type="​stylesheet">​ 
 +    <​xd:​short>​ 
 +      A stylesheet to prepare PERSEUS XML-TEI texts to TXM import. 
 +    </​xd:​short>​ 
 +    <​xd:​detail>​ 
 +      This stylesheet is free software; you can redistribute it and/or 
 +      modify it under the terms of the GNU Lesser General Public 
 +      License as published by the Free Software Foundation; either 
 +      version 3 of the License, or (at your option) any later version. 
 +       
 +      This stylesheet is distributed in the hope that it will be useful, 
 +      but WITHOUT ANY WARRANTY; without even the implied warranty of 
 +      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ​ See the GNU 
 +      Lesser General Public License for more details. 
 +       
 +      You should have received a copy of GNU Lesser Public License with 
 +      this stylesheet. If not, see http://​www.gnu.org/​licenses/​lgpl.html 
 +    </​xd:​detail>​ 
 +    <​xd:​author>​Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</​xd:​author>​ 
 +    <​xd:​copyright>​2017,​ CNRS / IHRIM (Groupe CACTUS)</​xd:​copyright>​ 
 +  </​xd:​doc>​ 
 +   
 + 
 +  <​xsl:​output method="​xml"​ encoding="​utf-8"​ omit-xml-declaration="​no"/>​ 
 +   
 +  <​xsl:​template match="​node()|@*">​ 
 +    <!-- Copy the current node --> 
 +    <​xsl:​copy>​ 
 +      <!-- Including any attributes it has and any child nodes --> 
 +      <​xsl:​apply-templates select="​@*|node()"/>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​ 
 +   
 +<!-- Comment out this template if a metadata file already provides the same information: -->
 +  <​xsl:​template match="/​tei:​TEI/​tei:​text">​ 
 +    <​xsl:​copy>​ 
 +      <​xsl:​copy-of select="​@*"/>​ 
 +      <​xsl:​attribute name="​author"><​xsl:​value-of select="//​tei:​teiHeader/​tei:​fileDesc/​tei:​titleStmt/​tei:​author[1]"/></​xsl:​attribute>​ 
 +      <​xsl:​attribute name="​title"><​xsl:​value-of select="//​tei:​teiHeader/​tei:​fileDesc/​tei:​titleStmt/​tei:​title[1]"/></​xsl:​attribute>​ 
 +      <​xsl:​attribute name="​editor"><​xsl:​value-of select="//​tei:​teiHeader/​tei:​fileDesc/​tei:​titleStmt/​tei:​editor[1]"/></​xsl:​attribute>​ 
 +      <​xsl:​apply-templates/>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​ 
 + 
 +<​xsl:​template match="​tei:​group/​tei:​text">​ 
 +  <​xsl:​element name="​subtext">​ 
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +  </​xsl:​element>​ 
 +</​xsl:​template>​ 
 +   
 +  <​xsl:​template match="​tei:​pb">​ 
 +    <​xsl:​copy>​ 
 +      <​xsl:​attribute name="​n">​ 
 +        <​xsl:​choose>​ 
 +          <​xsl:​when test="​@n"><​xsl:​value-of select="​@n"/></​xsl:​when>​ 
 +          <​xsl:​when test="​@*:​id">​ 
 +            <​xsl:​value-of select="​replace(@*:​id,'​^p\.',''​)"/>​ 
 +          </​xsl:​when>​ 
 +          <​xsl:​otherwise><​xsl:​text>​[s.n.]</​xsl:​text></​xsl:​otherwise>​ 
 +        </​xsl:​choose>​ 
 +      </​xsl:​attribute>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​ 
 + 
 +<​xsl:​template match="​tei:​div1|tei:​div2|tei:​div3|tei:​div4|tei:​div5|tei:​div6|tei:​div7">​ 
 +  <​xsl:​element name="​div"​ namespace="​http://​www.tei-c.org/​ns/​1.0">​ 
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +  </​xsl:​element>​ 
 +</​xsl:​template>​ 
 + 
 +<​xsl:​template match="​tei:​choice">​ 
 +  <​xsl:​apply-templates select="​tei:​expan|tei:​corr|tei:​reg"/>​ 
 +</​xsl:​template>​ 
 + 
 +<​xsl:​template match="​tei:​choice/​tei:​expan">​ 
 +  <w xmlns="​http://​www.tei-c.org/​ns/​1.0">​ 
 +    <​xsl:​attribute name="​abbr"><​xsl:​value-of select="​normalize-space(parent::​tei:​choice/​tei:​abbr)"/></​xsl:​attribute>​ 
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +  </​w>​ 
 +</​xsl:​template>​ 
 +   
 +  <​xsl:​template match="​tei:​choice/​tei:​corr">​ 
 +    <​xsl:​copy>​ 
 +      <​xsl:​attribute name="​sic"><​xsl:​value-of select="​normalize-space(parent::​tei:​choice/​tei:​sic)"/></​xsl:​attribute>​ 
 +      <​xsl:​apply-templates select="​@*|node()"/>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​ 
 +   
 +  <​xsl:​template match="​tei:​choice/​tei:​reg">​ 
 +    <​xsl:​copy>​ 
 +      <​xsl:​attribute name="​orig"><​xsl:​value-of select="​normalize-space(parent::​tei:​choice/​tei:​orig)"/></​xsl:​attribute>​ 
 +      <​xsl:​apply-templates select="​@*|node()"/>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​ 
 + 
 +<!-- Temporary patch for TXM indexing quote elements in notes --> 
 + 
 +  <​xsl:​template match="​tei:​note//​tei:​quote">​ 
 +    <​quote-note>​ 
 +      <​xsl:​apply-templates select="​@*|node()"/>​ 
 +    </​quote-note>​ 
 +  </​xsl:​template>​ 
 + 
 +<!--  
 +(i) adding an <​emph>​ element in order to point out some elements'​ content (e.g. foreign, title) in TXM edition ; 
 +(ii) adding a <w> element to prevent tokenisation from analysing some content (e.g. foreign)  
 +--> 
 + 
 +<​xsl:​template match="​tei:​foreign[not(ancestor::​tei:​note)]">​ 
 +<emph rend="​italic"​ xmlns="​http://​www.tei-c.org/​ns/​1.0">​ 
 +  <​xsl:​copy>​ 
 +    <w xmlns="​http://​www.tei-c.org/​ns/​1.0"> ​  
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +    </​w> ​  
 +  </​xsl:​copy>​ 
 +</​emph>​ 
 +</​xsl:​template>​ 
 + 
 +<​xsl:​template match="​tei:​title">​ 
 +<emph rend="​italic"​ xmlns="​http://​www.tei-c.org/​ns/​1.0">​ 
 +  <​xsl:​copy>​ 
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +  </​xsl:​copy>​ 
 +</​emph>​ 
 +</​xsl:​template>​ 
 + 
 +<!-- Temporary patch to get the correct rendering for <hi @rend="​italic">​ content in TXM editions : must use <​emph>​ instead of <hi> --> 
 + 
 +<​xsl:​template match="​tei:​hi[matches(@rend,'​italic'​)]"​ priority="​1">​ 
 +  <​xsl:​element name="​emph"​ namespace="​http://​www.tei-c.org/​ns/​1.0">​ 
 +    <​xsl:​apply-templates select="​@*|node()"/>​ 
 +  </​xsl:​element>​ 
 +</​xsl:​template>​ 
 + 
 +</​xsl:​stylesheet>​ 
 +</​code>​ 
 + 
 +==== txm-posttok-addRef-perseus.xsl ==== 
 + 
 +<code XML> 
 +<?xml version="​1.0"?>​ 
 +<​xsl:​stylesheet xmlns:​edate="​http://​exslt.org/​dates-and-times"​ 
 +  xmlns:​xsl="​http://​www.w3.org/​1999/​XSL/​Transform"​ xmlns:​tei="​http://​www.tei-c.org/​ns/​1.0"​ 
 +  xmlns:​txm="​http://​textometrie.org/​ns/​1.0"​ 
 +  exclude-result-prefixes="​tei edate" xpath-default-namespace="​http://​www.tei-c.org/​ns/​1.0"​ version="​2.0">​ 
 + 
 +  <!-- 
 +This software is dual-licensed:​ 
 + 
 +1. Distributed under a Creative Commons Attribution-ShareAlike 3.0 
 +Unported License http://​creativecommons.org/​licenses/​by-sa/​3.0/ ​
  
-To come.
 +2. http://www.opensource.org/licenses/BSD-2-Clause
 +  
 +All rights reserved.
  
-===== Planning =====
 +Redistribution and use in source and binary forms, with or without
 +modification,​ are permitted provided that the following conditions are 
 +met:
  
-==== Step 1 ====
 +* Redistributions of source code must retain the above copyright
 +notice, this list of conditions and the following disclaimer.
  
-==== Step 2 ====
 +* Redistributions in binary form must reproduce the above copyright
 +notice, this list of conditions and the following disclaimer in the 
 +documentation and/or other materials provided with the distribution.
  
-etc.
 +This software is provided by the copyright holders and contributors
 +"as is" and any express or implied warranties, including, but not 
 +limited to, the implied warranties of merchantability and fitness for 
 +a particular purpose are disclaimed. In no event shall the copyright 
 +holder or contributors be liable for any direct, indirect, incidental,​ 
 +special, exemplary, or consequential damages (including, but not 
 +limited to, procurement of substitute goods or services; loss of use, 
 +data, or profits; or business interruption) however caused and on any 
 +theory of liability, whether in contract, strict liability, or tort 
 +(including negligence or otherwise) arising in any way out of the use 
 +of this software, even if advised of the possibility of such damage.
  
-====== PLAUTELAT & PLAUTEEN TXM demo ======
 +
 +This stylesheet adds a ref attribute to w elements that will be used for 
 +references in TXM concordances. Can be used with TXM XTZ import module.
  
-===== Goal =====
 +Written by Alexei Lavrentiev, UMR 5317 IHRIM, 2017
 +  -->
  
-  * Context: Leipzig eHumanities Seminar, 2012-12-05
-  * Goal: demonstrate TXM to G. Crane on Latin texts and English translations from Perseus
  
-===== Corpus =====
 +  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>
 +   
 +   
 +  <!-- General patterns: all elements, attributes, comments and processing instructions are copied --> 
 +   
 +  <​xsl:​template match="​*"> ​      
 +        <​xsl:​copy>​ 
 +          <​xsl:​apply-templates select="​*|@*|processing-instruction()|comment()|text()"/>​ 
 +        </​xsl:​copy> ​    
 +  </​xsl:​template>​ 
 +   
 +  <​xsl:​template match="​*"​ mode="​position"><​xsl:​value-of select="​count(preceding-sibling::​*)"/></​xsl:​template>​
  
-  * PLAUTELAT: corpus of Plautus' Latin plays
-  * PLAUTEEN: corpus of English translations of Plautus' plays
-  * origin: [[http://www.perseus.tufts.edu/|Perseus]]
 +  <xsl:template match="@*|comment()|processing-instruction()">
 +    <xsl:copy/>
 +  </xsl:template>
 +   
 +  <​xsl:​variable name="​filename">​ 
 +    <​xsl:​analyze-string select="​document-uri(.)"​ regex="​^(.*)/​([^/​]+)\.xml$">​ 
 +      <xsl:matching-substring>​ 
 +        <​xsl:​value-of select="​regex-group(2)"/>​ 
 +      </​xsl:​matching-substring>​ 
 +    </​xsl:​analyze-string>​ 
 +  ​</​xsl:​variable>​ 
 +   
 +   
 +  <​xsl:​template match="​tei:​w">​ 
 +    <​xsl:​variable name="​ref">​ 
 +      <​xsl:​choose>​ 
 +        <​xsl:​when test="​ancestor::​tei:​text/​@*:id">​ 
 +          <​xsl:​value-of select="​ancestor::​tei:​text[1]/@*:id[1]"/>​ 
 +        </xsl:when> 
 +        <​xsl:​otherwise>​ 
 +          <xsl:value-of select="$filename"/>
 +        </xsl:​otherwise>​ 
 +      </​xsl:​choose>​ 
 +      <!-- Perseus addition -->
 +      <xsl:if test="​preceding::​tei:​milestone[@unit='​chapter'​][1][@n]">​ 
 +        <​xsl:​text>,​ c</​xsl:​text>​ 
 +        <​xsl:​value-of select="​preceding::​tei:​milestone[@unit='​chapter'​][1]/​@n"/>​ 
 +      </​xsl:​if>​ 
 +      <xsl:if test="​preceding::​tei:​milestone[@unit='​section'​][1][@n]">​ 
 +        <​xsl:​text>,​ s</xsl:​text>​ 
 +        <​xsl:​value-of select="​preceding::​tei:​milestone[@unit='​section'​][1]/​@n"/>​ 
 +      </​xsl:​if>​ 
 +      <!-- end of Perseus addition -->
 +       
 +      <xsl:if test="​preceding::​tei:​pb[1]/​@n">​ 
 +        <​xsl:​text>,​ p. </​xsl:​text>​ 
 +        <​xsl:​value-of select="​preceding::​tei:​pb[1]/​@n"/>​ 
 +      </​xsl:​if>​ 
 +      <xsl:if test="​ancestor::​tei:​p[@n]">​ 
 +        <​xsl:​text>,​ § </​xsl:​text>​ 
 +        <​xsl:​value-of select="​ancestor::​tei:​p/​@n"/>​ 
 +      </​xsl:​if>​ 
 +      <​!--<​xsl:​if test="​preceding::​tei:​lb[1]/​@n">​ 
 +        <​xsl:​text>,​ l. </​xsl:​text>​ 
 +        <​xsl:​value-of select="​preceding::​tei:​lb[1]/​@n"/>​ 
 +      </​xsl:​if>​-->​ 
 +    </​xsl:​variable>​ 
 +    <​xsl:​copy>​ 
 +      <​xsl:​apply-templates select="​@*"/>​ 
 +      <​xsl:​attribute name="​ref"><​xsl:​value-of select="​$ref"/></​xsl:​attribute>​ 
 +      <​xsl:​apply-templates select="​*|processing-instruction()|comment()|text()"/>​ 
 +    </​xsl:​copy>​ 
 +  </​xsl:​template>​
  
-----
--[[:|Back to the project list]].
 +</xsl:stylesheet>
 +</code>
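
To make the reference format produced by the stylesheet above easier to see, here is a minimal Python sketch of the same concatenation logic: the text id (or the file name when there is no id), then the chapter, section, page and paragraph numbers when they are available. The function ''build_ref'' and its parameters are hypothetical and only mirror the ''ref'' variable of txm-posttok-addRef-perseus.xsl; the sample values in the usage line are invented.

```python
def build_ref(filename, text_id=None, chapter=None, section=None,
              page=None, paragraph=None):
    """Build a concordance reference string like the stylesheet's $ref variable."""
    # The text id is preferred; the file name is the fallback.
    parts = [text_id if text_id is not None else filename]
    if chapter is not None:
        parts.append(f"c{chapter}")      # nearest preceding chapter milestone
    if section is not None:
        parts.append(f"s{section}")      # nearest preceding section milestone
    if page is not None:
        parts.append(f"p. {page}")       # nearest preceding <pb>
    if paragraph is not None:
        parts.append(f"§ {paragraph}")   # enclosing numbered <p>
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_ref("sample_lat", chapter="1", section="2", page="12"))
```

Keeping the reference as a single string attribute on each word is what lets TXM display it directly in concordance lines.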
  
 +**[[public:​perseus|>>>​ Back to TXM Perseus Projects main page]]**
public/perseus.txt · Last modified: 2017/12/01 17:54 by benedicte.pincemin@ens-lyon.fr