
public:perseus_agdt_201705_plato

public:perseus_agdt_201705_plato [2017/12/01 17:53] (current version)
benedicte.pincemin@ens-lyon.fr created
====== PLATO corpus: demonstration of Perseus Greek & Treebank texts (AGDT 2) in TXM ======

**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**

===== Project presentation =====

  * context: Heidelberg, May 2017 ([[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis]])

  * goals:
    * demonstrate that texts available from the Perseus project can be processed in TXM
    * TEI-compliant import
    * compatibility of TXM with the Greek language
    * show that TXM can work with the POS annotation provided by the Treebank (TreeTagger is not the only way to get tagged texts into TXM)

  * corpus:
    * Plato's Euthyphro, from [[https://perseusdl.github.io/treebank_data/|AGDT 2]]: tlg0059.tlg001.perseus-grc1.tb.xml

  * available resources (approximate list):
    * txm-filter-perseustreebank-xmlw.xsl
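In AGDT files, the POS annotation mentioned in the goals is stored in attributes of each <word> element. The stylesheet used here keeps every word attribute except @form, and renames @id. As an illustration, the Python sketch below decodes one such tag. It rests on two assumptions. First, the attribute is named postag. Second, it follows the AGDT positional scheme: nine characters for part of speech, person, number, tense, mood, voice, gender, case and degree, with "-" meaning "not applicable". Both the layout and the partial POS_CODES table should be checked against the AGDT annotation guidelines.

```python
# Sketch: decode an AGDT positional morphological tag (postag).
# ASSUMPTION: the 9-position layout and the codes below follow the AGDT
# guidelines as we understand them; check them before relying on this.

POSITIONS = ("pos", "person", "number", "tense", "mood",
             "voice", "gender", "case", "degree")

# Partial, assumed table of part-of-speech codes (first position).
POS_CODES = {
    "n": "noun", "v": "verb", "t": "participle", "a": "adjective",
    "d": "adverb", "l": "article", "g": "particle", "c": "conjunction",
    "r": "preposition", "p": "pronoun", "u": "punctuation",
}


def decode_postag(postag: str) -> dict:
    """Split a 9-character postag into named positions.

    Positions holding '-' (not applicable) are left out; the part of
    speech is expanded with POS_CODES when its code is known.
    """
    if len(postag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {postag!r}")
    features = {name: code for name, code in zip(POSITIONS, postag) if code != "-"}
    if "pos" in features:
        features["pos"] = POS_CODES.get(features["pos"], features["pos"])
    return features


if __name__ == "__main__":
    # Hypothetical tag: a noun, singular, masculine, nominative.
    print(decode_postag("n-s---mn-"))
    # → {'pos': 'noun', 'number': 's', 'gender': 'm', 'case': 'n'}
```

Decoding is not needed for the import itself, which works on the tags as they are. The sketch only shows what the annotation encodes.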

===== Solution =====

Create a directory (e.g. "plato") and put in it the XML text file(s) downloaded from Perseus AGDT.
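A quick check of the source directory before launching the import can catch a wrong or truncated download. The Python sketch below (standard library only) is one way to do it; it is not a TXM feature. It assumes that each file has a <treebank> root in no namespace, holding <sentence> elements with <word> children. This is the same assumption the stylesheet makes with its match="treebank" template. The function name check_source_dir is hypothetical.

```python
# Sketch: sanity-check a TXM source directory of Perseus AGDT files.
# ASSUMPTION: each file has a <treebank> root (no namespace) holding
# <sentence> elements with <word> children, as the stylesheet expects.
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def check_source_dir(directory: str) -> dict:
    """Return {file name: (sentence count, word count)} for each XML file.

    Raises ValueError if the directory holds no .xml file or if a file's
    root is not <treebank>; ET.ParseError if a file is not well-formed.
    """
    files = sorted(Path(directory).glob("*.xml"))
    if not files:
        raise ValueError(f"no .xml file found in {directory!r}")
    report = {}
    for path in files:
        root = ET.parse(path).getroot()
        if root.tag != "treebank":
            raise ValueError(f"{path.name}: root is <{root.tag}>, not <treebank>")
        sentences = root.findall(".//sentence")
        words = root.findall(".//sentence/word")
        report[path.name] = (len(sentences), len(words))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script> plato
    for name, (n_sent, n_words) in check_source_dir(sys.argv[1]).items():
        print(f"{name}: {n_sent} sentences, {n_words} words")
```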

Then run the TXM command File>Import>XML/w + CSV with the following settings:

1. Source directory: "plato" (in our example).

2. Import parameters:
  * Main Language: untick "Annotate the corpus" (i.e. do not use TreeTagger)
  * Lexical Segmentation: no change (default settings)
  * Front XSL: select your local copy of txm-filter-perseustreebank-xmlw.xsl
  * Editions: default settings (Build edition, Words per page = 500, Page break tag = pb)
  * Display font: default setting (Font name = <default>)
  * Commands: default setting (Concordance context structure limits = text)

3. Click "Start corpus import" (at the top of the page).

===== Feedback =====

We made two changes to the stylesheet:
  * a correction: the Perseus @id attribute of <word> elements (output as <w>) is renamed @perseus-id, for compatibility with TXM;
  * an improvement: an <lb/> element is added after each sentence, for better rendering in the HTML edition.
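Both changes can be illustrated outside TXM. The Python sketch below re-implements the stylesheet's word and sentence rules with xml.etree.ElementTree. It is a simplified illustration, not the stylesheet itself. It keeps only the <word> children of a sentence and leaves out the @annotator attribute that the stylesheet also adds. The sample sentence is hypothetical and not taken from Euthyphro.

```python
# Sketch (simplified re-implementation, NOT the XSL itself):
#   * <word form="..."/> becomes <w>...</w>, @form giving the content;
#   * the Perseus @id is renamed @perseus-id (the "correction");
#   * an <lb/> follows each sentence (the "improvement").
# Unlike the stylesheet, only <word> children are kept and no
# @annotator attribute is added to the sentence.
import xml.etree.ElementTree as ET


def convert_sentence(sentence: ET.Element) -> list:
    """Return [converted sentence, <lb/>] for one AGDT <sentence>."""
    out = ET.Element("sentence", dict(sentence.attrib))
    for word in sentence.findall("word"):
        # Drop @form (it becomes the text) and rename @id to @perseus-id.
        attrs = {("perseus-id" if name == "id" else name): value
                 for name, value in word.attrib.items() if name != "form"}
        w = ET.SubElement(out, "w", attrs)
        w.text = word.get("form", "")
    return [out, ET.Element("lb")]


if __name__ == "__main__":
    # Hypothetical two-word sentence, in the AGDT attribute style.
    sample = ET.fromstring(
        '<sentence id="1"><word id="1" form="A" lemma="a"/>'
        '<word id="2" form="B" lemma="b"/></sentence>')
    for element in convert_sentence(sample):
        print(ET.tostring(element, encoding="unicode"))
```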

===== Perseus XSL stylesheet used for this import =====

==== txm-filter-perseustreebank-xmlw.xsl ====

<code XML>
<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xd="http://www.pnp-software.com/XSLTdoc"
  xmlns:edate="http://exslt.org/dates-and-times"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:treebank="http://nlp.perseus.tufts.edu/syntax/treebank/1.5"
  exclude-result-prefixes="edate xd xsi treebank" version="2.0">

  <xd:doc type="stylesheet">
    <xd:short>
      A stylesheet to prepare PERSEUS Treebank XML texts for TXM XML/w import.
    </xd:short>
    <xd:detail>
      This stylesheet is free software; you can redistribute it and/or
      modify it under the terms of the GNU Lesser General Public
      License as published by the Free Software Foundation; either
      version 3 of the License, or (at your option) any later version.

      This stylesheet is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
      Lesser General Public License for more details.

      You should have received a copy of GNU Lesser Public License with
      this stylesheet. If not, see http://www.gnu.org/licenses/lgpl.html
    </xd:detail>
    <xd:author>Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</xd:author>
    <xd:copyright>2012, CNRS / ICAR (ICAR3 LinCoBaTO)</xd:copyright>
  </xd:doc>

  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>

  <!-- Default rule: copy each element with its attributes, comments and text -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@*|comment()">
    <xsl:copy/>
  </xsl:template>

  <!-- Processing instructions are dropped -->
  <xsl:template match="processing-instruction()"/>

  <xsl:template match="text()"><xsl:value-of select="."/></xsl:template>

  <!-- The <treebank> root becomes the TXM <text>: its version, date and first
       annotator are stored as attributes, and only sentences are processed -->
  <xsl:template match="treebank">
    <text type="treebank" version="{@version}" date="{normalize-space(child::date[1])}" annotator-short="{normalize-space(child::annotator[1]/short)}" annotator-name="{normalize-space(child::annotator[1]/name)}" annotator-address="{normalize-space(child::annotator[1]/address)}">
      <xsl:apply-templates select="descendant::sentence"/>
    </text>
  </xsl:template>

  <!-- <annotator> elements are not copied -->
  <xsl:template match="annotator"/>

  <!-- Sentences are copied with their attributes, plus an @annotator attribute
       built from the <annotator> child; an <lb/> is added after each sentence
       for better rendering in the HTML edition -->
  <xsl:template match="sentence">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="annotator"><xsl:value-of select="child::annotator"/></xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
    <lb/>
  </xsl:template>

  <!-- Each <word> becomes a TXM <w>: @form gives the word itself and the
       other attributes are kept (in TXM they become word properties) -->
  <xsl:template match="word">
    <w>
      <xsl:apply-templates select="@*[not(name()='form')]"/>
      <xsl:value-of select="@form"/>
    </w>
  </xsl:template>

  <!-- The Perseus @id is renamed @perseus-id for compatibility with TXM -->
  <xsl:template match="word/@id">
    <xsl:attribute name="perseus-id"><xsl:value-of select="."/></xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
</code>
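After conversion, the attributes left on each <w> are what TXM should be able to use as word properties. These are perseus-id plus the annotation attributes of the original <word>, and this is how the Treebank annotation reaches TXM without TreeTagger. The Python sketch below counts the attribute names on <w> elements, to show which properties a converted file will provide. It assumes that the stylesheet output has been saved to a file, for instance by running it with an XSLT processor. It also assumes that the output elements are in no namespace. The name word_properties is hypothetical.

```python
# Sketch: list the attributes carried by <w> elements in the output of
# txm-filter-perseustreebank-xmlw.xsl, i.e. the candidate TXM word properties.
# ASSUMPTION: the converted file is available and its elements are in no
# namespace (the stylesheet outputs <text>, <sentence>, <w> and <lb/>).
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def word_properties(source) -> Counter:
    """Count, for each attribute name, how many <w> elements carry it.

    `source` is a file name or a file object, as accepted by ET.parse.
    """
    counts = Counter()
    for w in ET.parse(source).getroot().iter("w"):
        counts.update(w.attrib.keys())
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, n in sorted(word_properties(sys.argv[1]).items()):
        print(f"{name}: {n} words")
```

If the counts still show an "id" attribute, the stylesheet was probably not applied.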

**[[public:perseus|>>> Back to TXM Perseus Projects main page]]**
public/perseus_agdt_201705_plato.txt · Last modified: 2017/12/01 17:53 by benedicte.pincemin@ens-lyon.fr