

This page is dedicated to projects using TXM on texts taken from the Perseus Digital Library.

Please note that this is a public page.

Anybody who has subscribed to the txm-users mailing list can edit this page.

CICERO corpus : demonstration of Perseus Latin texts in TXM

Project presentation

  • Objective :
    • demonstrating that one can work in TXM on texts available from the Perseus project
    • TEI compliant import
    • if possible, nice editions (could be shown through another corpus)
  • Available resources (approximate list)
    • txm-filter-perseus-tei-xtz.xsl
      • p4 to p5 conversion
      • handling of numbered divs : div1, div2
      • handling of nested <text> : inside a <group>, uses <subtext> instead of <text>
        • teiheader-to-metadata.xsl (?) : gets information from the teiHeader and adds it as attributes to the <text> element.
    • a useful macro : text2metadata (to be checked) : generates a metadata.csv from the XML-TXM files of a corpus

Specifications

Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet).

Metadata : from <teiHeader><fileDesc><titleStmt>, get

  • first <title> content,
  • first <author> content,
  • first <editor> content.
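
As an illustration of the metadata rule above, here is a minimal Python sketch (not the project's XSL stylesheet). It assumes the file has already been converted to TEI P5, so that elements are in the TEI namespace; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the source is TEI P5, so every element is in the TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"


def first_content(parent, name):
    """Return the whitespace-normalized text of the first <name> child, or ''."""
    if parent is None:
        return ""
    element = parent.find(TEI + name)
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def header_metadata(xml_source):
    """Read title, author and editor from teiHeader/fileDesc/titleStmt."""
    root = ET.fromstring(xml_source)
    title_stmt = root.find(f"{TEI}teiHeader/{TEI}fileDesc/{TEI}titleStmt")
    return {name: first_content(title_stmt, name)
            for name in ("title", "author", "editor")}


# Made-up sample document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>Sample title</title>
    <title type="sub">Sample subtitle</title>
    <author>Sample author</author>
    <editor>Sample editor</editor>
  </titleStmt></fileDesc></teiHeader>
  <text><body><p>Sample text</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(header_metadata(SAMPLE))
```

Only the first occurrence of each element is kept, so a subtitle or a second editor is ignored; a missing element yields an empty string.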

Handle XML-TEI features that would not work with CQP :

  • div1, div2 → div
  • <text><group><text> → <text><group><textgroupitem> (or another, better tag name)
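
The two renaming rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the content of the XSL stylesheets: the function names are hypothetical, and "textgroupitem" is just the placeholder tag name suggested above.

```python
import re
import xml.etree.ElementTree as ET

NUMBERED_DIV = re.compile(r"^div[0-9]+$")


def split_tag(tag):
    """Split '{namespace}local' into ('{namespace}', 'local')."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return "{" + namespace + "}", local
    return "", tag


def normalize_structure(root, nested_name="textgroupitem"):
    """Rename numbered divs to <div> and <text> children of <group>."""
    for element in root.iter():
        namespace, local = split_tag(element.tag)
        # div1, div2, ... -> div
        if NUMBERED_DIV.match(local):
            element.tag = namespace + "div"
        # <group><text> -> <group><textgroupitem>
        if local == "group":
            for child in element:
                child_ns, child_local = split_tag(child.tag)
                if child_local == "text":
                    child.tag = child_ns + nested_name
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        "<text><group><text><body>"
        "<div1><div2><p>sample</p></div2></div1>"
        "</body></text></group></text>"
    )
    normalize_structure(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

The outermost <text> is left untouched, so each source file still yields a single text unit; the original div level is not preserved in this sketch.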

Distribute the information carried by <milestone> attributes onto the word tokens (when available).
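
One possible reading of this step, sketched in Python: walk the document in order, remember the latest @n value seen for each milestone @unit, and copy those values onto every following word token. The assumptions here are that tokens are <w> elements, that milestones carry @unit and @n, and that the token attribute is named after the unit; none of this is confirmed by the actual stylesheet.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def distribute_milestones(root):
    """Copy the current milestone values onto each following <w> token."""
    current = {}  # unit -> latest n value seen so far
    for element in root.iter():  # document order
        name = local_name(element.tag)
        if name == "milestone":
            unit, n = element.get("unit"), element.get("n")
            if unit and n:
                current[unit] = n
        elif name == "w":
            # Hypothetical naming: the attribute is named after the unit.
            for unit, n in current.items():
                element.set(unit, n)
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<p><w>a</w><milestone unit="section" n="1"/><w>b</w>'
        '<milestone unit="section" n="2"/><w>c</w></p>'
    )
    distribute_milestones(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

Tokens that precede the first milestone receive no attribute, which matches the "when available" condition.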

Get the page number when available and put it in an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition.

Render foreign words (tagged with <foreign> element) and titles (<title> elements content) as italics.

Solution.

Make a directory (e.g. “cicero”).

This directory includes :

  • a copy of every XML file of Cicero's Latin texts downloaded from the Perseus DL.
  • a directory named “xsl”, which includes :
    • a directory named “2-front”, which includes :
      • p4top5.xsl
      • txm-front-teiperseus-xtz.xsl
    • a directory named “3-posttok”, which includes :
      • txm-posttok-addRef-perseus.xsl
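
Before launching the import, the layout described above can be checked with a small script. The following Python sketch is only a convenience suggestion, not part of TXM; the function names are hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# Stylesheets expected under the source directory, as listed above.
EXPECTED = {
    "xsl/2-front": ["p4top5.xsl", "txm-front-teiperseus-xtz.xsl"],
    "xsl/3-posttok": ["txm-posttok-addRef-perseus.xsl"],
}


def missing_stylesheets(source_dir):
    """Return the relative paths of expected stylesheets that are absent."""
    source = Path(source_dir)
    missing = []
    for subdir, names in EXPECTED.items():
        for name in names:
            if not (source / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing


def has_xml_sources(source_dir):
    """Tell whether at least one XML file sits directly in the source directory."""
    return any(Path(source_dir).glob("*.xml"))


if __name__ == "__main__":
    directory = "cicero"  # example directory name used on this page
    print("missing stylesheets:", missing_stylesheets(directory))
    print("XML sources found:", has_xml_sources(directory))
```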

Then run the TXM command File>Import>XML-XTZ + CSV with the following settings :

  • Source directory is “cicero” (in our example).
  • Import parameters :
    • Main Language : la (to use TreeTagger with the Latin parameter file, if TreeTagger has been set up and associated with TXM)
    • Lexical Segmentation : no change - Default settings
    • Editions : Build edition, Words per page = 750, Page break tag = pb
    • Display font : default setting (Font name = <default>)
    • Commands : Concordance context structure limits = text
    • Textual planes :
      • Outside-text = teiHeader,front,back
      • Outside-text to edit = bibl
      • Note elements = note
      • Milestone elements = [nothing, leave blank]
      • Options : default (= remove temporary directories)

Planning

Step 1

Step 2

etc.

  • txm-filter-perseustreebank-xmlw.xsl

PLAUTELAT & PLAUTEEN TXM demo

Goal

  • Context : University of Leipzig eHumanities Seminar, 2012-12-05
  • The goal was to demo TXM on Plautus' plays in Latin and their English translations from Perseus

Corpus

Corpus of Plautus' plays in Latin and their English translations, from Perseus.

Import parameters (updated from XML/w to XTZ):

  • 2-front :
    • txm-filter-teiperseus-xmlw.xsl
    • txm-filter-teip5-xmlw-preserve.xsl
  • lat.par TreeTagger model


public/perseus.1494567812.txt.gz · Last modified: 2017/05/12 07:43 by benedicte.pincemin@ens-lyon.fr