PLATO corpus: demonstration of Perseus Greek & Treebank texts (AGDT 2) in TXM

Project presentation

  • goals:
    • demonstrate that texts available from the Perseus project can be worked on in TXM
    • TEI-compliant import
    • compatibility of TXM with the Greek language
    • show that TXM can use the POS annotation provided by the Treebank (TreeTagger is not the only way to get tagged texts into TXM)
  • corpus:
    • Plato's Euthyphro from AGDT 2: tlg0059.tlg001.perseus-grc1.tb.xml
  • available resources (approximate list):
    • txm-filter-perseustreebank-xmlw.xsl

Solution

Create a directory (e.g. “plato”) and put the XML text file(s) downloaded from Perseus AGDT in it.
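
Before importing, it can help to check that each downloaded file is well-formed and has the expected shape. The following is a minimal Python sketch, not part of TXM or of the Perseus tools: the function names `summarize_treebank` and `summarize_directory` are hypothetical, and the sketch assumes the layout that the stylesheet on this page relies on (a `treebank` root element without namespace, containing `sentence` elements with `word` children).

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_treebank(path):
    """Return (sentence_count, word_count) for one treebank XML file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed,
    and ValueError if the root element is not <treebank>.
    """
    root = ET.parse(path).getroot()
    if root.tag != "treebank":
        raise ValueError(f"{path}: root element is <{root.tag}>, expected <treebank>")
    sentences = root.findall(".//sentence")
    words = root.findall(".//sentence/word")
    return len(sentences), len(words)


def summarize_directory(source_dir):
    """Map each *.xml file name in source_dir to its (sentences, words) counts."""
    return {
        xml_file.name: summarize_treebank(xml_file)
        for xml_file in sorted(Path(source_dir).glob("*.xml"))
    }
```

For the example above, `summarize_directory("plato")` would return one entry per XML file placed in the source directory; a file that raises an error here is unlikely to import cleanly.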

Then run the TXM command File>Import>XML/w + CSV with the following settings:

1. Source directory is “plato” (in our example).

2. Import parameters:

  • Main Language: untick “Annotate the corpus” (i.e. do not use TreeTagger)
  • Lexical Segmentation: no change (default settings)
  • Front XSL: select your local copy of txm-filter-perseustreebank-xmlw.xsl
  • Editions: default settings (Build edition, Words per page = 500, Page break tag = pb)
  • Display font: default setting (Font name = <default>)
  • Commands: default setting (Concordance context structure limits = text)

3. Click on “Start corpus import” (at the top of the page).

Feedback

We made two changes to the stylesheet:

  • a correction: the Perseus @id attribute of words is renamed to @perseus-id on <w> elements, for compatibility with TXM
  • an improvement: an <lb/> element is added after each sentence for better rendering in the HTML edition.
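
To make the effect of these two changes concrete, here is a small Python sketch that mimics the stylesheet's main steps on a hand-written sample. It is an illustration only, not the code TXM runs: the function name `convert` is hypothetical, the sample attribute values are invented for the example rather than copied from the Euthyphro treebank, and the sketch leaves out the date and annotator metadata that the real stylesheet also carries over.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an AGDT file. The attribute values are
# illustrative assumptions, not data taken from the actual corpus.
SAMPLE = """<treebank version="2.0">
  <sentence id="1" subdoc="2a">
    <word id="1" form="τί" lemma="τίς" postag="p-s---nn-" relation="SBJ" head="2"/>
    <word id="2" form="γέγονεν" lemma="γίγνομαι" postag="v3sria---" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def convert(treebank):
    """Mimic the stylesheet: <word> -> <w>, @id -> @perseus-id, <lb/> after each sentence."""
    text = ET.Element("text", {"type": "treebank", "version": treebank.get("version", "")})
    for sentence in treebank.iter("sentence"):
        # Sentence attributes are copied unchanged (only word ids are renamed).
        s = ET.SubElement(text, "sentence", dict(sentence.attrib))
        for word in sentence.findall("word"):
            # Keep every attribute except @form; rename @id to @perseus-id.
            attrs = {
                ("perseus-id" if name == "id" else name): value
                for name, value in word.attrib.items()
                if name != "form"
            }
            w = ET.SubElement(s, "w", attrs)
            w.text = word.get("form")  # the word form becomes the element content
        ET.SubElement(text, "lb")  # line break after each sentence
    return text


result = convert(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

In the printed result, each word appears as a `<w>` element whose content is the former `form` value, the annotation (lemma, postag, relation, head) stays available as attributes, and an `<lb/>` follows the sentence.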

XSL Perseus stylesheet used for this import

txm-filter-perseustreebank-xmlw.xsl

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xd="http://www.pnp-software.com/XSLTdoc"
  xmlns:edate="http://exslt.org/dates-and-times"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:treebank="http://nlp.perseus.tufts.edu/syntax/treebank/1.5"
  exclude-result-prefixes="edate xd xsi treebank" version="2.0">
 
 
  <xd:doc type="stylesheet">
    <xd:short>
      A stylesheet to prepare PERSEUS Treebank XML texts to TXM XML/w import.
    </xd:short>
    <xd:detail>
      This stylesheet is free software; you can redistribute it and/or
      modify it under the terms of the GNU Lesser General Public
      License as published by the Free Software Foundation; either
      version 3 of the License, or (at your option) any later version.
 
      This stylesheet is distributed in the hope that it will be useful,
      but WITHOUT ANY WARRANTY; without even the implied warranty of
      MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
      Lesser General Public License for more details.
 
      You should have received a copy of GNU Lesser Public License with
      this stylesheet. If not, see http://www.gnu.org/licenses/lgpl.html
    </xd:detail>
    <xd:author>Alexei Lavrentiev alexei.lavrentev@ens-lyon.fr</xd:author>
    <xd:copyright>2012, CNRS / ICAR (ICAR3 LinCoBaTO)</xd:copyright>
  </xd:doc>
 
 
  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no"/>
 
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>	
    </xsl:copy>
  </xsl:template>
 
  <xsl:template match="@*|comment()">
    <xsl:copy/>
  </xsl:template>
 
  <xsl:template match="processing-instruction()"/>
 
  <xsl:template match="text()"><xsl:value-of select="."/></xsl:template>
 
  <!-- Root element: becomes a TXM <text>, with the treebank metadata kept as attributes -->
  <xsl:template match="treebank">
    <text type="treebank" version="{@version}" date="{normalize-space(child::date[1])}" annotator-short="{normalize-space(child::annotator[1]/short)}" annotator-name="{normalize-space(child::annotator[1]/name)}" annotator-address="{normalize-space(child::annotator[1]/address)}">
      <xsl:apply-templates select="descendant::sentence"/>
    </text>
  </xsl:template>
 
  <!-- Annotator elements are not copied; their content is kept in attributes -->
  <xsl:template match="annotator"/>
 
  <!-- Sentences: copied with their attributes, followed by a line break for the HTML edition -->
  <xsl:template match="sentence">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="annotator"><xsl:value-of select="child::annotator"/></xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
    <lb/>
  </xsl:template>
 
  <!-- Words: the @form value becomes the content of <w>; other attributes are kept -->
  <xsl:template match="word">
    <w>
      <xsl:apply-templates select="@*[not(name()='form')]"/>
      <xsl:value-of select="@form"/>
    </w>
  </xsl:template>
 
  <!-- The Perseus @id is renamed to @perseus-id for compatibility with TXM -->
  <xsl:template match="word/@id">
    <xsl:attribute name="perseus-id"><xsl:value-of select="."/></xsl:attribute>
  </xsl:template>
</xsl:stylesheet>


public/perseus_agdt_201705_plato.txt · Last modified: 2017/12/01 17:53 by benedicte.pincemin@ens-lyon.fr