public:perseus_agdt_201705_plato [2017/12/01 17:53] (current version)
benedicte.pincemin@ens-lyon.fr created
====== PLATO corpus: demonstration of Perseus Greek & Treebank texts (AGDT 2) in TXM ======
**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**

===== Project presentation =====

  * Context: Heidelberg, May 2017: [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis]]

  * Goals:
    * demonstrate that texts available from the Perseus project can be worked on in TXM
    * TEI-compliant import
    * compatibility of TXM with the Greek language
    * show that TXM can work with the POS annotation provided by the Treebank (TreeTagger is not the only way to get tagged texts into TXM)

  * Corpus:
    * Plato's Euthyphro from [[https://perseusdl.github.io/treebank_data/|AGDT 2]]: tlg0059.tlg001.perseus-grc1.tb.xml

  * Available resources (approximate list):
    * txm-filter-perseustreebank-xmlw.xsl

===== Solution =====

Create a directory (e.g. "plato") and put the XML text file(s) downloaded from Perseus AGDT into it.
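
The directory setup can also be scripted. The following is a minimal Python sketch, assuming the treebank file has already been downloaded by hand. The function name ''prepare_source_dir'' and the check on the root element name are illustrative choices made for this example; they are not part of TXM or of the Perseus tools.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def prepare_source_dir(downloaded_files, source_dir="plato"):
    """Copy downloaded treebank files into the directory TXM will import."""
    target = Path(source_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, downloaded_files):
        # ET.parse raises ParseError if the file is not well-formed XML.
        root = ET.parse(src).getroot()
        # Compare the local name only, in case the root element is namespaced.
        if root.tag.rsplit("}", 1)[-1] != "treebank":
            raise ValueError(f"{src} does not look like a treebank file")
        # shutil.copy2 returns the path of the newly created copy.
        copied.append(Path(shutil.copy2(src, target / src.name)))
    return copied
```

For instance, ''prepare_source_dir(["tlg0059.tlg001.perseus-grc1.tb.xml"])'' would copy the Euthyphro file from the current directory into "plato", provided the file is there. Checking well-formedness before the import is only a convenience: it surfaces a truncated download early rather than during the TXM import.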

Then run the TXM command File > Import > XML/w + CSV with the following settings:

1. Source directory: "plato" (in our example).

2. Import parameters:
  * Main Language: untick "Annotate the corpus" (i.e. do not use TreeTagger)
  * Lexical Segmentation: no change (default settings)
  * Front XSL: select your local copy of txm-filter-perseustreebank-xmlw.xsl
  * Editions: default settings (Build edition, Words per page = 500, Page break tag = pb)
  * Display font: default setting (Font name = <default>)
  * Commands: default setting (Concordance context structure limits = text)

3. Click "Start corpus import" (at the top of the page).

===== Feedback =====

We made two changes to the stylesheet:
  * a correction: the Perseus @id attribute of each word is renamed (to perseus-id) on the resulting <w> element, for compatibility with TXM
  * an improvement: an <lb/> element is added after each sentence for better rendering in the HTML edition
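
To make the effect of these two changes concrete, here is a small Python sketch that mimics them on a single sentence. It is not the stylesheet itself and is not used by TXM: the function ''convert_sentence'' is hypothetical, the sample fragment and its attribute values are made up for illustration, and the sketch leaves out the other things the stylesheet does (the <text> wrapper and the annotator attribute).

```python
import xml.etree.ElementTree as ET

# Hypothetical treebank fragment; the attribute values are illustrative only.
SAMPLE = (
    '<sentence id="1">'
    '<word id="1" form="τί" lemma="τίς" postag="p-s---nn-" relation="OBJ" head="2"/>'
    "</sentence>"
)


def convert_sentence(sentence):
    """Mimic the stylesheet on one sentence: return (new_sentence, lb)."""
    out = ET.Element("sentence", dict(sentence.attrib))
    for word in sentence.findall("word"):
        w = ET.SubElement(out, "w")
        for name, value in word.attrib.items():
            if name == "form":
                continue  # @form becomes the text content of <w>
            # Change 1: @id is renamed to perseus-id; other attributes are kept.
            w.set("perseus-id" if name == "id" else name, value)
        w.text = word.get("form")
    # Change 2: an <lb/> element follows each sentence.
    return out, ET.Element("lb")


if __name__ == "__main__":
    new_sentence, lb = convert_sentence(ET.fromstring(SAMPLE))
    print(ET.tostring(new_sentence, encoding="unicode"))
    print(ET.tostring(lb, encoding="unicode"))
```

In the output, the <w> element carries the word form as its text, keeps lemma, postag, relation and head as attributes, and has perseus-id in place of id.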

===== XSL Perseus stylesheet used for this import =====

==== txm-filter-perseustreebank-xmlw.xsl ====

<code XML>
<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xd="http://www.pnp-software.com/XSLTdoc"
  xmlns:edate="http://exslt.org/dates-and-times"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:treebank="http://nlp.perseus.tufts.edu/syntax/treebank/1.5"
  exclude-result-prefixes="edate xd xsi treebank" version="2.0">

  <xd:doc type="stylesheet">
    <xd:short>
      A stylesheet to prepare PERSEUS Treebank XML texts for TXM XML/w import.
    </xd:short>
    <xd:detail>
      This stylesheet is free software; you can redistribute it and/or
      modify it under the terms of the GNU Lesser General Public
      License as published by the Free Software Foundation; either
      version 3 of the License, or (at your option) any later version.

      This stylesheet is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
      Lesser General Public License for more details.

      You should have received a copy of the GNU Lesser General Public
      License with this stylesheet. If not, see http://www.gnu.org/licenses/lgpl.html
    </xd:detail>
    <xd:author>Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</xd:author>
    <xd:copyright>2012, CNRS / ICAR (ICAR3 LinCoBaTO)</xd:copyright>
  </xd:doc>

  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>

  <!-- Default rule: copy each element, then process its attributes and children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Attributes and comments are copied unchanged -->
  <xsl:template match="@*|comment()">
    <xsl:copy/>
  </xsl:template>

  <!-- Processing instructions are dropped -->
  <xsl:template match="processing-instruction()"/>

  <xsl:template match="text()"><xsl:value-of select="."/></xsl:template>

  <!-- The treebank root becomes <text>; its metadata is stored in attributes -->
  <xsl:template match="treebank">
    <text type="treebank" version="{@version}" date="{normalize-space(child::date[1])}" annotator-short="{normalize-space(child::annotator[1]/short)}" annotator-name="{normalize-space(child::annotator[1]/name)}" annotator-address="{normalize-space(child::annotator[1]/address)}">
      <xsl:apply-templates select="descendant::sentence"/>
    </text>
  </xsl:template>

  <!-- annotator elements are not copied; their content is kept in attributes -->
  <xsl:template match="annotator"/>

  <!-- Each sentence gets an annotator attribute and is followed by <lb/> -->
  <xsl:template match="sentence">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="annotator"><xsl:value-of select="child::annotator"/></xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
    <lb/>
  </xsl:template>

  <!-- Each word becomes <w>: @form is the text content, other attributes are kept -->
  <xsl:template match="word">
    <w>
      <xsl:apply-templates select="@*[not(name()='form')]"/>
      <xsl:value-of select="@form"/>
    </w>
  </xsl:template>

  <!-- The Perseus @id is renamed to perseus-id for compatibility with TXM -->
  <xsl:template match="word/@id">
    <xsl:attribute name="perseus-id"><xsl:value-of select="."/></xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
</code>

**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**