This is an old revision of the document!

This page is dedicated to projects using TXM on texts taken from the Perseus Digital Library.

Please note that this is a public page.

Anybody who has subscribed to the txm-users mailing list can edit this page.

CICERO corpus: demonstration of Perseus Latin texts in TXM

Project presentation

  • Objective:
    • demonstrate that texts available from the Perseus project can be worked on in TXM
    • TEI compliant import
    • if possible, nice editions (could be shown through another corpus)
  • Available resources (approximate list)
    • txm-filter-perseus-tei-xtz.xsl
      • p4 to p5 conversion
      • management of numbered divs: div1, div2
      • management of nested <text> elements: inside a <group>, the stylesheet uses <subtext> instead of <text>
        • teiheader-to-metadata.xsl (?): gets information from the teiHeader and adds it as attributes on the <text> element.
    • a useful macro: text2metadata (to be checked): generates a metadata.csv from the XML-TXM files of a corpus


Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet).

Metadata : from <teiHeader><fileDesc><titleStmt>, get

  • first <title> content,
  • first <author> content,
  • first <editor> content.
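The extraction described above can be sketched in Python with the standard library. This is only an illustration of the logic, not the project's XSLT stylesheet: the function names, the attribute names written on <text> (title, author, editor) and the sample document are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop a namespace prefix, so P4 (no namespace) and P5 (TEI namespace) both match."""
    return tag.rsplit("}", 1)[-1]

def first_content(parent, name):
    """Whitespace-normalized text of the first descendant named `name`, or ''."""
    for el in parent.iter():
        if local(el.tag) == name:
            return " ".join("".join(el.itertext()).split())
    return ""

def header_metadata(root):
    """Read title, author and editor from the first <titleStmt> in document order."""
    for el in root.iter():
        if local(el.tag) == "titleStmt":
            return {k: first_content(el, k) for k in ("title", "author", "editor")}
    return {}

def annotate_text(root):
    """Copy the header metadata onto the first <text> element as attributes."""
    meta = header_metadata(root)
    for el in root.iter():
        if local(el.tag) == "text":
            for key, value in meta.items():
                if value:
                    el.set(key, value)
            break
    return meta

# Hypothetical miniature document, for illustration only.
SAMPLE = """<TEI.2><teiHeader><fileDesc><titleStmt>
<title>Sample title</title><title type="sub">Ignored subtitle</title>
<author>Sample  Author</author><editor>Sample Editor</editor>
</titleStmt></fileDesc></teiHeader><text><body/></text></TEI.2>"""

root = ET.fromstring(SAMPLE)
print(annotate_text(root))
```

Only the first <title>, <author> and <editor> are kept, so subtitles and additional editors are ignored, which matches the list above.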

Handle XML-TEI features that would not work with CQP:

  • div1, div2 → div
  • <text><group><text> → <text><group><textgroupitem> (or another, better tag name)
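As a rough illustration of these two renamings, here is a Python sketch using only the standard library. It is not the stylesheet used in the project; the function names are hypothetical and "textgroupitem" is simply the placeholder name proposed above.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED_DIV = re.compile(r"div[0-9]+")

def split_tag(tag):
    """Split an ElementTree tag into ('{namespace}', local name); namespace may be ''."""
    if tag.startswith("{"):
        ns, name = tag[1:].split("}", 1)
        return "{" + ns + "}", name
    return "", tag

def normalize(root, nested_name="textgroupitem"):
    """Rename div1, div2, ... to div, and any <text> located inside a <group>."""
    def walk(el, inside_group):
        ns, name = split_tag(el.tag)
        if NUMBERED_DIV.fullmatch(name):
            el.tag = ns + "div"
        elif name == "text" and inside_group:
            el.tag = ns + nested_name
        for child in el:
            # Decide on the original name, so children of <group> are flagged.
            walk(child, inside_group or name == "group")
    walk(root, False)
    return root

# Hypothetical input, for illustration only.
doc = ET.fromstring(
    "<text><group><text><body><div1><div2/></div1></body></text></group></text>"
)
normalize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The outer <text> keeps its name, so each source file still yields a single top-level text unit while the nested texts become ordinary structural elements.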

Distribute the information carried by <milestone> attributes onto word tokens (when available).
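One possible reading of this step is sketched below in Python. It assumes that word tokens are <w> elements, that each <milestone> carries @unit and @n, and that a word inherits the most recent value seen for each unit. The "ms_" attribute prefix and the function name are hypothetical, and the actual txm-posttok-addRef-perseus.xsl may well proceed differently.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop a namespace prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def distribute_milestones(root, prefix="ms_"):
    """Copy the latest milestone value of each unit onto every following <w>."""
    current = {}  # unit -> most recent n
    for el in root.iter():  # iter() walks the tree in document order
        name = local(el.tag)
        if name == "milestone":
            unit, n = el.get("unit"), el.get("n")
            if unit and n:
                current[unit] = n
        elif name == "w":
            # Assumes unit values are usable as XML attribute names.
            for unit, n in current.items():
                el.set(prefix + unit, n)
    return root

# Hypothetical tokenized fragment, for illustration only.
doc = ET.fromstring(
    '<text><p><w>a</w><milestone unit="section" n="1"/><w>b</w>'
    '<milestone unit="section" n="2"/><w>c</w></p></text>'
)
distribute_milestones(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Words that occur before the first milestone of a unit receive no attribute for that unit, which corresponds to the "when available" condition.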

Get the page number when available and put it as an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition.

Render foreign words (tagged with the <foreign> element) and titles (content of <title> elements) in italics.


Make a directory (e.g. “cicero”).

This directory includes :

  • a copy of every XML file for the Latin texts of Cicero downloaded from the Perseus DL.
  • a directory named “xsl”, which includes :
    • a directory named “2-front”, which includes :
      • p4top5.xsl
      • txm-front-teiperseus-xtz.xsl
    • a directory named “3-posttok”, which includes :
      • txm-posttok-addRef-perseus.xsl
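Before launching the import, the layout above can be checked with a short Python script. This is a convenience sketch, not part of TXM: the function name check_layout is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# Stylesheets expected under the source directory, as listed above.
EXPECTED = {
    "xsl/2-front": ["p4top5.xsl", "txm-front-teiperseus-xtz.xsl"],
    "xsl/3-posttok": ["txm-posttok-addRef-perseus.xsl"],
}

def check_layout(source_dir):
    """Return a list of problems found in the source directory (empty if none)."""
    src = Path(source_dir)
    problems = []
    if not any(src.glob("*.xml")):
        problems.append("no XML source files at the top level")
    for sub, names in EXPECTED.items():
        for name in names:
            if not (src / sub / name).is_file():
                problems.append(f"missing {sub}/{name}")
    return problems

if __name__ == "__main__":
    # "cicero" is the example directory name used on this page.
    for problem in check_layout("cicero"):
        print(problem)
```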

Then run the TXM command File>Import>XML-XTZ + CSV.

  • Source directory is “cicero” (in our example).
  • Import parameters :


Step 1

Step 2


  • txm-filter-perseustreebank-xmlw.xsl



  • Context: the University of Leipzig eHumanities Seminar of 2012-12-05
  • the goal was to demo TXM on the Latin texts and English translations of Plautus' plays from Perseus


Corpus of Plautus' plays in Latin and their English translations from Perseus.

Import parameters (updated from XML/w to XTZ):

  • 2-front :
    • txm-filter-teiperseus-xmlw.xsl
    • txm-filter-teip5-xmlw-preserve.xsl
  • lat.par TreeTagger model


public/perseus.1494567062.txt.gz · Last modified: 2017/05/12 07:31 by benedicte.pincemin@ens-lyon.fr