Differences

This page shows the differences between the selected revision and the current version of the page.

public:perseus [2017/05/12 07:31]
benedicte.pincemin@ens-lyon.fr
public:perseus [2017/12/01 17:54] (current version)
benedicte.pincemin@ens-lyon.fr
Line 1: Line 1:
-This page is dedicated to project using TXM on texts taken from the Perseus Digital Library :+This page is dedicated to projects using TXM on texts taken from the Perseus Digital Library :
  * [[http://www.perseus.tufts.edu/hopper|Perseus Digital Library]]   * [[http://www.perseus.tufts.edu/hopper|Perseus Digital Library]]
    * XML edition (Github)     * XML edition (Github)
  * [[ https://perseusdl.github.io/treebank_data|The Ancient Greek and Latin Dependency Treebank]] (Github)   * [[ https://perseusdl.github.io/treebank_data|The Ancient Greek and Latin Dependency Treebank]] (Github)
-Please take care that this is a public page.+Please note that this is a public page.
Anybody who has subscribed to the txm-users mailing list can edit this page. Anybody who has subscribed to the txm-users mailing list can edit this page.
-====== CICERO corpus : demonstration of Perseus Latin texts in TXM ======+====== Projects ======
-===== Project presentation ===== +  * [[public:perseus_201707_plato|July 2017, 29 Greek texts from Plato.]] Context : paper submitted to [[https://chs.harvard.edu/CHS/article/display/1167?menuId=66|Classics@]]. 
- +  * [[public:perseus_201705_cicero|May 2017, 29 Latin texts from Cicero.]] Context : Conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], Concluding conference on the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung - Dokumentation - Reflexion//, Heidelberg, May 11–13, 2017
-  * context : Heidelberg, May 2017 : [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis]] +  * [[public:perseus_agdt_201705_plato|May 2017, 1 annotated Greek text from Plato (AGDT2).]] Context : Conference [[http://www.altphil.uni-freiburg.de/texte-messen/digital-classics-iii-2013-re-thinking-text-analysis|Digital Classics III – Re-thinking Text Analysis]], Concluding conference on the project //Der digital turn in den Altertumswissenschaften: Wahrnehmung - Dokumentation - Reflexion//, Heidelberg, May 11–13, 2017
- +  * [[public:perseus_201212_plautus|December 2012, 20 Latin plays from Plautus.]] Context : presentation at the [[http://www.dh.uni-leipzig.de/wo/e-humanities-seminar/|University of Leipzig eHumanities Seminar]] on December 5th, 2012.
-  * objective : +
-    * demonstrating that one can work on texts available from Perseus project in TXM +
-    * TEI compliant import +
-    * if possible, nice editions (could be shown through another corpus) +
- +
-  * corpus +
-    * Cicero's texts, Latin edition : a copy is here : [[https://sharedocs.huma-num.fr/#/948/3789/Projets/Textom%C3%A9trie/Corpus/src/perseus/Cicero/170502latin]] +
-      * we take all files ending with _lat, except cic.pet_lat.xml, because it is a text by Q. Tullius Cicero rather than M. Tullius Cicero. +
- +
-  * Available resources (approximate list) +
-    * txm-filter-perseus-tei-xtz.xsl +
-      * p4 to p5 conversion +
-      * management of numbered div : div1, div2 +
-      * management of nested <text> : when <group> then includes <subtext> instead of <text> +
-        * teiheader-to-metadata.xsl (?) : gets information from teiHeader and adds it as attributes to the <text> element. +
-    * a useful macro : text2metadata (to be checked) : generates a metadata.csv from the XML-TXM files of a corpus +
- +
-===== Specifications ===== +
- +
-Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet). +
- +
-Metadata : from <teiHeader><fileDesc><titleStmt>, get +
-  * first <title> content, +
-  * first <author> content, +
-  * first <editor> content. +
- +
-Manage XML-TEI features which wouldn't work with CQP : +
-  * div1, div2 -> div +
-  * <text><group><text> -> <text><group><textgroupitem> (or other better tag name) +
- +
-Distribute <milestone> attributes' information on word tokens (when available). +
- +
-Get the page number when available and put it as an @n attribute on the <pb> element so that TXM can use it to number pages in the HTML edition. +
- +
-Render foreign words (tagged with the <foreign> element) and titles (content of <title> elements) in italics. +
- +
-===== Solution ===== +
- +
-Make a directory (e.g. "cicero"). +
- +
-This directory includes : +
-  * a copy of every XML file for the Latin texts of Cicero downloaded from Perseus DL. +
-  * a directory named "xsl", which includes : +
-    * a directory named "2-front", which includes : +
-      * p4top5.xsl +
-      * txm-front-teiperseus-xtz.xsl +
-    * a directory named "3-posttok", which includes : +
-      * txm-posttok-addRef-perseus.xsl +
- +
-Then run the TXM command File>Import>XML-XTZ + CSV. +
-  * Source directory is "cicero" (in our example). +
-  * Import parameters : +
-    *  +
-===== Planning ===== +
- +
-==== Step 1 ==== +
- +
-==== Step 2 ==== +
- +
-etc. +
- +
- +
-    * txm-filter-perseustreebank-xmlw.xsl +
- +
-====== PLAUTELAT & PLAUTEEN TXM demo ====== +
- +
-===== Goal ===== +
- +
-  * Context: the University of Leipzig eHumanities Seminar on 2012-12-05 +
-  * the goal was to demo TXM on the Latin texts and English translations of Plautus' plays from Perseus +
- +
-===== Corpus ===== +
- +
-Corpus of Plautus' plays in Latin and their English translations from Perseus. +
- +
-Import parameters (updated from XML/w to XTZ): +
-  * 2-front : +
-    * txm-filter-teiperseus-xmlw.xsl +
-    * txm-filter-teip5-xmlw-preserve.xsl +
-  * lat.par TreeTagger model +
- +
-  * PLAUTELAT: corpus of Plautus' Latin plays +
-    * source: [[https://sharedocs.huma-num.fr/wl/?id=qftriVBBeFES4jmt2BIobq1IqtypXGnK|davs://sharedocs.huma-num.fr/dav.php/@Shares/(948)%20Cactus/(3792)%20Cactus/Projets/Textométrie/Corpus/src/plautelat-src.zip]] +
-    * binary: [[https://sharedocs.huma-num.fr/wl/?id=eOLdijlvM50Qep1BQTz7UICvYHS3bPDq|davs://sharedocs.huma-num.fr/dav.php/@Shares/(948)%20Cactus/(3792)%20Cactus/Projets/Textométrie/Corpus/bin/PLAUTELAT.txm]] +
-  * PLAUTEEN: corpus of English translations of Plautus' plays +
-    * todo +
- +
----- +
--> [[:|Back to the list of projects]].+
public/perseus.1494567062.txt.gz · Last modified: 2017/05/12 07:31 by benedicte.pincemin@ens-lyon.fr