Page public:perseus: revision of 2017/12/01 17:53 by benedicte.pincemin@ens-lyon.fr; current version of 2017/12/01 17:54 by benedicte.pincemin@ens-lyon.fr.
  * [[public:perseus_201212_plautus|December 2012, 20 Latin plays by Plautus.]] Context: presentation at the [[http://www.dh.uni-leipzig.de/wo/e-humanities-seminar/|University of Leipzig eHumanities Seminar]] on December 5th, 2012.
====== CICERO corpus: demonstration of Perseus Latin texts in TXM ======

**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**

===== Project presentation =====

  * context: Heidelberg, May 2017, see [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis]]

  * goals:
    * demonstrate that texts available from the Perseus project can be worked on in TXM
    * TEI-compliant import
    * if possible, nice editions (this could be shown with another corpus)

  * corpus
    * Cicero's texts, Latin edition; a copy is available here: [[https://sharedocs.huma-num.fr/#/948/3789/Projets/Textom%C3%A9trie/Corpus/src/perseus/Cicero/170502latin]]
      * we take all files ending with _lat, except cic.pet_lat.xml, which is a text by Q. Tullius Cicero rather than M. Tullius Cicero.
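
The selection rule above can be made explicit with a small Python sketch. It is an illustration only, not part of TXM or of the original workflow: it assumes the downloaded files keep their .xml extension and sit in a single directory, and the function names are hypothetical.

```python
from pathlib import Path

# Hypothetical helper, not part of TXM: keep the Latin files (*_lat.xml)
# and drop cic.pet_lat.xml, a text by Q. Tullius Cicero.
EXCLUDED = {"cic.pet_lat.xml"}


def select_cicero_files(names):
    """Return the sorted names ending with _lat.xml, minus the excluded ones."""
    return sorted(n for n in names if n.endswith("_lat.xml") and n not in EXCLUDED)


def select_from_directory(directory):
    """Apply the same rule to the regular files found in a directory."""
    return select_cicero_files(p.name for p in Path(directory).iterdir() if p.is_file())


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    sample = ["cic.text1_lat.xml", "cic.text1_eng.xml", "cic.pet_lat.xml", "cic.text2_lat.xml"]
    print(select_cicero_files(sample))  # ['cic.text1_lat.xml', 'cic.text2_lat.xml']
```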

  * Available resources (approximate list)
    * p4top5.xsl
      * TEI P4 to P5 conversion
    * txm-filter-perseus-tei-xtz.xsl
      * handling of numbered divs: div1, div2
      * handling of nested <text>: inside a <group>, <subtext> is used instead of <text>
    * teiheader-to-metadata.xsl: gets information from the teiHeader and adds it as attributes of the <text> element.
    * a useful macro, text2metadata: generates a metadata.csv file from the XML-TXM files of a corpus. It must be run before starting the import process.

===== Specifications =====

Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet).

Metadata: from <teiHeader><fileDesc><titleStmt>, get
  * the content of the first <title>,
  * the content of the first <author>,
  * the content of the first <editor>.
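
This metadata rule can be sketched in Python with the standard xml.etree.ElementTree module. It is an illustration, not the TXM implementation (which relies on the XSL stylesheet txm-front-teiperseus-xtz.xsl on this page). It assumes the files are already TEI P5, i.e. in the http://www.tei-c.org/ns/1.0 namespace; header_metadata is a hypothetical name, and unlike xsl:value-of it normalizes whitespace.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def first_text(parent, tag):
    """Whitespace-normalized text of the first <tag> child of parent, or ''."""
    el = parent.find(TEI + tag) if parent is not None else None
    return " ".join("".join(el.itertext()).split()) if el is not None else ""


def header_metadata(root):
    """First title, author and editor of teiHeader/fileDesc/titleStmt."""
    stmt = root.find(f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt")
    return {tag: first_text(stmt, tag) for tag in ("title", "author", "editor")}


if __name__ == "__main__":
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>
      <title>Sample title</title><title>Second title</title>
      <author>Sample author</author>
    </titleStmt></fileDesc></teiHeader><text><body/></text></TEI>"""
    print(header_metadata(ET.fromstring(sample)))
    # {'title': 'Sample title', 'author': 'Sample author', 'editor': ''}
```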

Handle XML-TEI features that would not work with CQP:
  * div1, div2 -> div
  * <text><group><text> -> <text><group><textgroupitem> (or another, better tag name)
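
For readers who want to see these two renamings outside XSLT, here is a Python sketch on an ElementTree document. It is an illustration, not the code used by TXM: it follows the choices made in txm-front-teiperseus-xtz.xsl (div1 to div7 become TEI div; a text nested directly in a group becomes subtext, in no namespace), and flatten_for_cqp is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
NUMBERED_DIVS = {f"{TEI}div{i}" for i in range(1, 8)}  # div1 ... div7


def flatten_for_cqp(root):
    """Rename TEI div1..div7 to div, and <text> children of <group> to <subtext>.

    Renaming is done in place; attributes and children are kept.
    """
    for group in root.iter(TEI + "group"):
        for child in group:
            if child.tag == TEI + "text":
                # No namespace, as with xsl:element name="subtext" in the front XSL.
                child.tag = "subtext"
    for el in root.iter():
        if el.tag in NUMBERED_DIVS:
            el.tag = TEI + "div"
    return root
```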

Distribute the information carried by <milestone> attributes onto word tokens (when available).
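
Concretely, each word token can record the last chapter and section milestones, and the last page break, that precede it. The Python sketch below reproduces, in simplified form, the reference string built by txm-posttok-addRef-perseus.xsl (identifier, then ", c. " chapter, ", s. " section, ", p. " page). It is not TXM code: it assumes the tokens are already TEI <w> elements, takes the text identifier as a parameter instead of reading it from <text> or from the file name, and leaves out the paragraph number; add_refs is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def add_refs(root, text_id):
    """Give every <w> a ref attribute such as "id, c. 2, s. 5, p. 12".

    As in the XSL, only the nearest preceding chapter milestone, section
    milestone and <pb> are considered, and each counts only if it has @n.
    """
    last = {"chapter": None, "section": None, "pb": None}
    for el in root.iter():  # document order
        if el.tag == TEI + "milestone" and el.get("unit") in ("chapter", "section"):
            last[el.get("unit")] = el.get("n")
        elif el.tag == TEI + "pb":
            last["pb"] = el.get("n")
        elif el.tag == TEI + "w":
            ref = text_id
            for key, label in (("chapter", "c."), ("section", "s."), ("pb", "p.")):
                if last[key] is not None:
                    ref += f", {label} {last[key]}"
            el.set("ref", ref)
    return root
```

A page break without @n cancels the page part of the reference, because the XSL only looks at the nearest preceding <pb>.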

Get the page number when available and put it in an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition.
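
The fallback order used by the pb template of txm-front-teiperseus-xtz.xsl (keep @n if present, otherwise take the identifier without a leading "p.", otherwise "[s.n.]") can be sketched in Python as follows. page_number is a hypothetical name, and the sketch only looks at xml:id and a plain id attribute, whereas the XSL accepts an id attribute in any namespace.

```python
import re

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def page_number(attrib):
    """Value for pb/@n, given the attributes of a <pb> element as a dict."""
    if "n" in attrib:
        return attrib["n"]
    ident = attrib.get(XML_ID, attrib.get("id"))
    if ident is not None:
        return re.sub(r"^p\.", "", ident)  # "p.12" -> "12"
    return "[s.n.]"  # placeholder used by the XSL when no number is found
```

With ElementTree, pass pb.attrib: the xml:id attribute then appears under the key held in XML_ID.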

Render foreign words (tagged with the <foreign> element) and titles (content of <title> elements) in italics.
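
In the front XSL, this is done by wrapping each such element in <emph rend="italic">, and the content of a <foreign> outside notes is also put inside a single <w> so that tokenization does not split it. A simplified Python version of the same wrapping, as an illustration only (italicize is a hypothetical name): unlike the XSL, which moves the attributes of <foreign> onto the new <w>, this sketch leaves them on <foreign>.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def italicize(root):
    """Wrap <title> and <foreign> (outside notes) in <emph rend="italic">.

    The content of each wrapped <foreign> is also moved into one <w>.
    """
    parents = {child: parent for parent in root.iter() for child in parent}

    def in_note(el):
        while el in parents:
            el = parents[el]
            if el.tag == TEI + "note":
                return True
        return False

    targets = [el for el in root.iter()
               if el in parents and (el.tag == TEI + "title"
                                     or (el.tag == TEI + "foreign" and not in_note(el)))]
    # Reverse document order: descendants are rewritten before their
    # ancestors, so the parent map is still valid for each target.
    for el in reversed(targets):
        parent = parents[el]
        emph = ET.Element(TEI + "emph", {"rend": "italic"})
        emph.tail, el.tail = el.tail, None  # the text after el now follows emph
        parent.insert(list(parent).index(el), emph)
        parent.remove(el)
        emph.append(el)
        if el.tag == TEI + "foreign":
            w = ET.Element(TEI + "w")
            w.text, el.text = el.text, None
            for child in list(el):
                el.remove(child)
                w.append(child)
            el.append(w)
    return root
```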

===== Solution =====

Create a directory (e.g. "cicero").

This directory contains:
  * a copy of every XML file of Cicero's Latin texts downloaded from the Perseus Digital Library;
  * a directory named "xsl", which contains:
    * a subdirectory named "2-front", which contains:
      * p4top5.xsl
      * txm-front-teiperseus-xtz.xsl
    * a subdirectory named "3-posttok", which contains:
      * txm-posttok-addRef-perseus.xsl
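
A hypothetical helper, not part of TXM, can check this layout before the import is launched. The sketch below only tests for the three stylesheet paths listed above and for at least one XML file at the top of the directory; "cicero" is just the example directory name used here.

```python
from pathlib import Path

# Stylesheet paths expected by this project, relative to the corpus directory.
EXPECTED_XSL = [
    "xsl/2-front/p4top5.xsl",
    "xsl/2-front/txm-front-teiperseus-xtz.xsl",
    "xsl/3-posttok/txm-posttok-addRef-perseus.xsl",
]


def check_layout(corpus_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(corpus_dir)
    if not root.is_dir():
        return [f"{corpus_dir} is not a directory"]
    problems = [f"missing {rel}" for rel in EXPECTED_XSL if not (root / rel).is_file()]
    if not any(root.glob("*.xml")):
        problems.append("no XML source file at the top level")
    return problems


if __name__ == "__main__":
    import sys
    found = check_layout(sys.argv[1] if len(sys.argv) > 1 else "cicero")
    print("\n".join(found) or "layout OK")
```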

Then run the TXM command File > Import > XML-XTZ + CSV with the following settings:

1. Source directory: "cicero" (in our example).

2. Import parameters:
  * Main Language: la (to use TreeTagger with the Latin parameter file, if TreeTagger has been installed and associated with TXM)
  * Lexical Segmentation: no change (default settings)
  * Editions: Build edition, Words per page = 750, Page break tag = pb
  * Display font: default setting (Font name = <default>)
  * Commands: Concordance context structure limits = text
  * Textual planes:
    * Outside-text = teiHeader,front,back
    * Outside-text to edit = bibl
    * Note elements = note
    * Milestone elements = [nothing, leave blank]
  * Options: default (= remove temporary directories)

3. Click "Start corpus import" (at the top of the page).


A second import can be run with an added metadata.csv file, to get more metadata than just those automatically extracted from the teiHeader (title, first author, first editor).
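
A starting metadata.csv for that second import could be produced along these lines, then completed by hand with further columns. This is a hedged sketch, not the text2metadata macro: it assumes the usual TXM convention of a comma-separated UTF-8 file whose first column, id, holds each source file name without its .xml extension; the column names and function names are illustrative. If the file repeats title, author and editor, the comment in txm-front-teiperseus-xtz.xsl suggests commenting out its <text> template.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = "{http://www.tei-c.org/ns/1.0}"
FIELDS = ["title", "author", "editor"]  # illustrative columns; more can be added by hand


def metadata_row(file_id, root):
    """One row: the id, then the first title, author and editor of the titleStmt."""
    stmt = root.find(f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt")
    row = {"id": file_id}
    for tag in FIELDS:
        el = stmt.find(TEI + tag) if stmt is not None else None
        row[tag] = " ".join("".join(el.itertext()).split()) if el is not None else ""
    return row


def metadata_csv(rows):
    """Comma-separated text whose first column is 'id'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id"] + FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_metadata_file(corpus_dir):
    """Write metadata.csv in corpus_dir from its top-level XML files."""
    paths = sorted(Path(corpus_dir).glob("*.xml"))
    rows = [metadata_row(p.stem, ET.parse(p).getroot()) for p in paths]
    (Path(corpus_dir) / "metadata.csv").write_text(metadata_csv(rows), encoding="utf-8")
```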

===== Feedback =====

Some features of the XML-XTZ import have not been implemented yet; in particular, the @rend attribute does not seem to be used to interpret <emph> and <hi> elements. So, in the front XSL (import step #2), we changed some <hi> elements into <emph> wherever we wanted italics in the HTML edition.

<note> content loses all its markup. This is a real drawback, as tagged foreign words and italics are very often used in notes.


===== Perseus XSL stylesheets used for this import =====

==== txm-front-teiperseus-xtz.xsl ====

<code XML>
<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xd="http://www.pnp-software.com/XSLTdoc"
  xmlns:edate="http://exslt.org/dates-and-times"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="tei edate xd" version="2.0">

  <xd:doc type="stylesheet">
    <xd:short>
      A stylesheet to prepare PERSEUS XML-TEI texts for TXM import.
    </xd:short>
    <xd:detail>
      This stylesheet is free software; you can redistribute it and/or
      modify it under the terms of the GNU Lesser General Public
      License as published by the Free Software Foundation; either
      version 3 of the License, or (at your option) any later version.

      This stylesheet is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
      Lesser General Public License for more details.

      You should have received a copy of GNU Lesser Public License with
      this stylesheet. If not, see http://www.gnu.org/licenses/lgpl.html
    </xd:detail>
    <xd:author>Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</xd:author>
    <xd:copyright>2017, CNRS / IHRIM (Groupe CACTUS)</xd:copyright>
  </xd:doc>

  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>

  <!-- Identity template: used for every node without a more specific template -->
  <xsl:template match="node()|@*">
    <!-- Copy the current node -->
    <xsl:copy>
      <!-- Including any attributes it has and any child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copies the first author, title and editor of the titleStmt as attributes of <text>.
       Comment out this template if you use a metadata file that provides the same information. -->
  <xsl:template match="/tei:TEI/tei:text">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="author"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author[1]"/></xsl:attribute>
      <xsl:attribute name="title"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1]"/></xsl:attribute>
      <xsl:attribute name="editor"><xsl:value-of select="//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:editor[1]"/></xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- A <text> nested in a <group> becomes <subtext> (no namespace), since CQP cannot handle nested <text> -->
  <xsl:template match="tei:group/tei:text">
    <xsl:element name="subtext">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Page breaks: @n is set from @n, else from the id without its "p." prefix, else "[s.n.]".
       Other attributes of <pb> are not copied. -->
  <xsl:template match="tei:pb">
    <xsl:copy>
      <xsl:attribute name="n">
        <xsl:choose>
          <xsl:when test="@n"><xsl:value-of select="@n"/></xsl:when>
          <xsl:when test="@*:id">
            <xsl:value-of select="replace(@*:id,'^p\.','')"/>
          </xsl:when>
          <xsl:otherwise><xsl:text>[s.n.]</xsl:text></xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>

  <!-- Numbered divisions (div1 to div7) become plain TEI <div> elements -->
  <xsl:template match="tei:div1|tei:div2|tei:div3|tei:div4|tei:div5|tei:div6|tei:div7">
    <xsl:element name="div" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <!-- <choice>: keep only the expanded, corrected or regularized form -->
  <xsl:template match="tei:choice">
    <xsl:apply-templates select="tei:expan|tei:corr|tei:reg"/>
  </xsl:template>

  <!-- Expansion: one <w> token, with the abbreviated form kept in @abbr -->
  <xsl:template match="tei:choice/tei:expan">
    <w xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:attribute name="abbr"><xsl:value-of select="normalize-space(parent::tei:choice/tei:abbr)"/></xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </w>
  </xsl:template>

  <!-- Correction: the original reading is kept in @sic -->
  <xsl:template match="tei:choice/tei:corr">
    <xsl:copy>
      <xsl:attribute name="sic"><xsl:value-of select="normalize-space(parent::tei:choice/tei:sic)"/></xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Regularization: the original form is kept in @orig -->
  <xsl:template match="tei:choice/tei:reg">
    <xsl:copy>
      <xsl:attribute name="orig"><xsl:value-of select="normalize-space(parent::tei:choice/tei:orig)"/></xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Temporary patch for TXM indexing <quote> elements inside notes: they are renamed <quote-note> -->
  <xsl:template match="tei:note//tei:quote">
    <quote-note>
      <xsl:apply-templates select="@*|node()"/>
    </quote-note>
  </xsl:template>

  <!--
  (i) add an <emph> element to highlight the content of some elements (e.g. foreign, title) in the TXM edition;
  (ii) add a <w> element to prevent tokenization from splitting some content (e.g. foreign).
  Note: the attributes of <foreign> end up on the new <w> element.
  -->
  <xsl:template match="tei:foreign[not(ancestor::tei:note)]">
    <emph rend="italic" xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:copy>
        <w xmlns="http://www.tei-c.org/ns/1.0">
          <xsl:apply-templates select="@*|node()"/>
        </w>
      </xsl:copy>
    </emph>
  </xsl:template>

  <xsl:template match="tei:title">
    <emph rend="italic" xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </emph>
  </xsl:template>

  <!-- Temporary patch to get the correct rendering of <hi rend="italic"> content in TXM editions: <emph> must be used instead of <hi> -->
  <xsl:template match="tei:hi[matches(@rend,'italic')]" priority="1">
    <xsl:element name="emph" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
</code>

==== txm-posttok-addRef-perseus.xsl ====

<code XML>
<?xml version="1.0"?>
<xsl:stylesheet xmlns:edate="http://exslt.org/dates-and-times"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0"
  xmlns:txm="http://textometrie.org/ns/1.0"
  exclude-result-prefixes="tei edate" xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">

  <!--
This software is dual-licensed:

1. Distributed under a Creative Commons Attribution-ShareAlike 3.0
Unported License http://creativecommons.org/licenses/by-sa/3.0/

2. http://www.opensource.org/licenses/BSD-2-Clause

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

This software is provided by the copyright holders and contributors
"as is" and any express or implied warranties, including, but not
limited to, the implied warranties of merchantability and fitness for
a particular purpose are disclaimed. In no event shall the copyright
holder or contributors be liable for any direct, indirect, incidental,
special, exemplary, or consequential damages (including, but not
limited to, procurement of substitute goods or services; loss of use,
data, or profits; or business interruption) however caused and on any
theory of liability, whether in contract, strict liability, or tort
(including negligence or otherwise) arising in any way out of the use
of this software, even if advised of the possibility of such damage.

This stylesheet adds a ref attribute to w elements that will be used for
references in TXM concordances. It can be used with the TXM XTZ import module.

Written by Alexei Lavrentiev, UMR 5317 IHRIM, 2017
  -->

  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>

  <!-- General patterns: all elements, attributes, comments and processing instructions are copied -->

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" mode="position"><xsl:value-of select="count(preceding-sibling::*)"/></xsl:template>

  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Base name of the source file (no directory, no .xml extension), used when <text> has no id -->
  <xsl:variable name="filename">
    <xsl:analyze-string select="document-uri(.)" regex="^(.*)/([^/]+)\.xml$">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Build the ref attribute of each word token: text id (or file name), then chapter,
       section, page and paragraph numbers when available -->
  <xsl:template match="tei:w">
    <xsl:variable name="ref">
      <xsl:choose>
        <xsl:when test="ancestor::tei:text/@*:id">
          <xsl:value-of select="ancestor::tei:text[1]/@*:id[1]"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$filename"/>
        </xsl:otherwise>
      </xsl:choose>
      <!-- Perseus addition: chapter and section milestones -->
      <xsl:if test="preceding::tei:milestone[@unit='chapter'][1][@n]">
        <xsl:text>, c. </xsl:text>
        <xsl:value-of select="preceding::tei:milestone[@unit='chapter'][1]/@n"/>
      </xsl:if>
      <xsl:if test="preceding::tei:milestone[@unit='section'][1][@n]">
        <xsl:text>, s. </xsl:text>
        <xsl:value-of select="preceding::tei:milestone[@unit='section'][1]/@n"/>
      </xsl:if>
      <!-- end of Perseus addition -->

      <xsl:if test="preceding::tei:pb[1]/@n">
        <xsl:text>, p. </xsl:text>
        <xsl:value-of select="preceding::tei:pb[1]/@n"/>
      </xsl:if>
      <xsl:if test="ancestor::tei:p[@n]">
        <xsl:text>, § </xsl:text>
        <xsl:value-of select="ancestor::tei:p/@n"/>
      </xsl:if>
      <!--<xsl:if test="preceding::tei:lb[1]/@n">
        <xsl:text>, l. </xsl:text>
        <xsl:value-of select="preceding::tei:lb[1]/@n"/>
      </xsl:if>-->
    </xsl:variable>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="ref"><xsl:value-of select="$ref"/></xsl:attribute>
      <xsl:apply-templates select="*|processing-instruction()|comment()|text()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
</code>

public/perseus.txt · Last modified: 2017/12/01 17:54 by benedicte.pincemin@ens-lyon.fr