Below are the differences between two revisions of the page.
public:perseus [01/12/2017 10:57] – benedicte.pincemin@ens-lyon.fr | public:perseus [01/12/2017 17:54] (Current version) – benedicte.pincemin@ens-lyon.fr | ||
---|---|---|---|
Line 10: | Line 10: | ||
====== Projects ====== | ====== Projects ====== | ||
- | [[public:perseus|Perseus: building corpora of texts from Perseus]] | + | * [[public:perseus_201707_plato|July 2017, 29 Greek texts from Plato.]] Context |
- | + | * [[public:perseus_201705_cicero|May 2017, 29 Latin texts from Cicero.]] | |
- | + | * [[public:perseus_agdt_201705_plato|May 2017, 1 Greek annotated | |
- | ====== CICERO corpus: demonstration of Perseus Latin texts in TXM ====== | + | * [[public:perseus_201212_plautus|December |
- | + | ||
- | ===== Project presentation ===== | + | |
- | + | ||
- | * context : Heidelberg, May 2017 : [[http:// | + | |
- | + | ||
- | * goal : | + | |
- | * demonstrate that texts available from the Perseus project can be worked on in TXM | + | |
- | * TEI compliant import | + | |
- | * if possible, nice editions (could be shown through another corpus) | + | |
- | + | ||
- | * corpus | + | |
- | * Cicero' | + | |
- | * we take all files whose names end with _lat, except cic.pet_lat.xml, which is a text by Q. Tullius Cicero rather than M. Tullius Cicero. | + | |
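The file selection rule above can be sketched in a few lines of Python. This is only an illustration, not part of the original workflow: the function name `select_latin_files` and the sample file names are hypothetical, and only `cic.pet_lat.xml` comes from the page itself.

```python
from pathlib import Path

# File excluded by the rule above (a text by Q. Tullius Cicero).
EXCLUDED = {"cic.pet_lat.xml"}


def select_latin_files(names):
    """Return the XML file names whose stem ends with _lat, minus exclusions.

    Hypothetical helper: it only filters a list of names and does not
    touch the file system.
    """
    return sorted(
        name
        for name in names
        if Path(name).suffix == ".xml"
        and Path(name).stem.endswith("_lat")
        and name not in EXCLUDED
    )


if __name__ == "__main__":
    # Sample names are invented for the demonstration.
    sample = [
        "cic.sample1_lat.xml",
        "cic.sample1_eng.xml",
        "cic.pet_lat.xml",
        "cic.sample2_lat.xml",
    ]
    for name in select_latin_files(sample):
        print(name)
```

To apply it to a real download, the list of names could come from `Path(source_dir).iterdir()`, where `source_dir` is wherever the Perseus files were saved.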
- | + | ||
- | * Available resources (approximate list) | + | |
- | * p4top5.xsl | + | |
- | * TEI P4 to P5 conversion | + | |
- | * txm-filter-perseus-tei-xtz.xsl | + | |
- | * management of numbered div: div1, div2 | + | |
- | * management of nested < | + | |
- | * teiheader-to-metadata.xsl: | + | |
- | * a useful macro : text2metadata: | + | |
- | + | ||
- | ===== Specifications ===== | + | |
- | + | ||
- | Conversion from TEI P4 to TEI P5 (Sebastian Rahtz's stylesheet). | + | |
- | + | ||
- | Metadata : from < | + | |
- | * first < | + | |
- | * first < | + | |
- | * first < | + | |
- | + | ||
- | Manage XML-TEI features which wouldn' | + | |
- | * div1, div2 -> div | + | |
- | * < | + | |
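The "div1, div2 -> div" normalisation is done on this page by the front XSL stylesheet. As a rough illustration of the same idea outside XSLT, here is a minimal Python sketch using the standard library's ElementTree. The function name `flatten_numbered_divs` and the sample document are assumptions made for the example, and the sketch covers only the renaming of numbered divisions, not the other rules in the list.

```python
import re
import xml.etree.ElementTree as ET

# Matches div1, div2, ... with or without a "{namespace}" prefix.
NUMBERED_DIV = re.compile(r"^(\{[^}]*\})?div[0-9]+$")


def flatten_numbered_divs(root):
    """Rename every numbered TEI division (div1, div2, ...) to a plain div.

    Attributes, children and nesting are left untouched, so the hierarchy
    is still expressed by the nesting of the renamed elements.
    """
    for elem in root.iter():
        match = NUMBERED_DIV.match(elem.tag)
        if match:
            elem.tag = (match.group(1) or "") + "div"
    return root


if __name__ == "__main__":
    # Invented sample, not taken from a Perseus file.
    sample = (
        '<body><div1 type="book" n="1">'
        '<div2 type="chapter" n="1"><p>text</p></div2>'
        "</div1></body>"
    )
    root = flatten_numbered_divs(ET.fromstring(sample))
    print(ET.tostring(root, encoding="unicode"))
```

For the actual import, the XSLT approach stays preferable because TXM applies front stylesheets automatically during the XTZ import.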
- | + | ||
- | Distribute < | + | |
- | + | ||
- | Get the page number when available and store it as an @n attribute on the <pb> element, so that TXM can use it to number pages in the HTML edition. | + | |
- | + | ||
- | Render foreign words (tagged with < | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Make a directory (e.g. " | + | |
- | + | ||
- | This directory includes : | + | |
- | * a copy of every XML file for Latin texts of Cicero | + | |
- | * a directory named " | + | |
- | * a subdirectory named " | + | |
- | * p4top5.xsl | + | |
- | * txm-front-teiperseus-xtz.xsl | + | |
- | * a subdirectory named " | + | |
- | * txm-posttok-addRef-perseus.xsl | + | |
- | + | ||
- | Then run the TXM command File> | + | |
- | + | ||
- | 1. Source directory is " | + | |
- | + | ||
- | 2. Import parameters : | + | |
- | * Main Language : la (to use TreeTagger with the Latin parameter file, if TreeTagger has been set up and associated with TXM) | + | |
- | * Lexical Segmentation : no change - Default settings | + | |
- | * Editions : Build edition, Words per page = 750, Page break tag = pb | + | |
- | * Display font : default setting (Font name = < | + | |
- | * Commands : Concordance context structure limits = text | + | |
- | * Textual planes : | + | |
- | * Outside-text = teiHeader, | + | |
- | * Outside-text to edit = bibl | + | |
- | * Note elements = note | + | |
- | * Milestone elements = [nothing, leave blank] | + | |
- | * Options : default (= remove temporary directories) | + | |
- | + | ||
- | 3. Click on "Start corpus import" | + | |
- | + | ||
- | + | ||
- | A second import can be run with a metadata.csv file added, to get more metadata than the fields automatically extracted from the teiHeader (title, first author, first editor). | + | |
- | + | ||
- | ===== Feedback ===== | + | |
- | + | ||
- | Some features of the XML-XTZ import have not been implemented yet; in particular, the @rend attribute does not seem to be used to interpret < | + | |
- | + | ||
- | < | + | |
- | + | ||
- | ===== XSL Perseus stylesheets used for this import ===== | + | |
- | + | ||
- | ==== txm-front-teiperseus-xtz.xsl ==== | + | |
- | + | ||
- | <code XML> | + | |
- | <?xml version=" | + | |
- | < | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | exclude-result-prefixes=" | + | |
- | + | ||
- | <xd:doc type=" | + | |
- | < | + | |
- | A stylesheet to prepare PERSEUS XML-TEI texts for TXM import. | + | |
- | </ | + | |
- | < | + | |
- | This stylesheet is free software; you can redistribute it and/or | + | |
- | modify it under the terms of the GNU Lesser General Public | + | |
- | License as published by the Free Software Foundation; either | + | |
- | version 3 of the License, or (at your option) any later version. | + | |
- | + | ||
- | This stylesheet is distributed in the hope that it will be useful, | + | |
- | but WITHOUT ANY WARRANTY; without even the implied warranty of | + | |
- | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | + | |
- | Lesser General Public License for more details. | + | |
- | + | ||
- | You should have received a copy of GNU Lesser Public License with | + | |
- | this stylesheet. If not, see http:// | + | |
- | </ | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | <!-- Copy the current node --> | + | |
- | < | + | |
- | <!-- Including any attributes it has and any child nodes --> | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | <!-- Comment out this template if you use a metadata file that provides the same information: --> | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | <w xmlns=" | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | <!-- Temporary patch for TXM indexing quote elements in notes --> | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | <!-- | + | |
- | (i) adding an < | + | |
- | (ii) adding a <w> element to prevent tokenisation from analysing some content (e.g. foreign) | + | |
- | --> | + | |
- | + | ||
- | < | + | |
- | <emph rend=" | + | |
- | < | + | |
- | <w xmlns=" | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | <emph rend=" | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | <!-- Temporary patch to get the correct rendering for <hi @rend=" | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | </ | + | |
- | </ | + | |
- | + | ||
- | ==== txm-posttok-addRef-perseus.xsl ==== | + | |
- | + | ||
- | <code XML> | + | |
- | <?xml version=" | + | |
- | < | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | exclude-result-prefixes=" | + | |
- | + | ||
- | <!-- | + | |
- | This software is dual-licensed: | + | |
- | + | ||
- | 1. Distributed under a Creative Commons Attribution-ShareAlike 3.0 | + | |
- | Unported License http:// | + | |
- | + | ||
- | 2. http:// | + | |
- | + | ||
- | All rights reserved. | + | |
- | + | ||
- | Redistribution and use in source and binary forms, with or without | + | |
- | modification, are permitted provided that the following conditions are | + | |
- | met: | + | |
- | + | ||
- | * Redistributions of source code must retain the above copyright | + | |
- | notice, this list of conditions and the following disclaimer. | + | |
- | + | ||
- | * Redistributions in binary form must reproduce the above copyright | + | |
- | notice, this list of conditions and the following disclaimer in the | + | |
- | documentation and/or other materials provided with the distribution. | + | |
- | + | ||
- | This software is provided by the copyright holders and contributors | + | |
- | "as is" and any express or implied warranties, including, but not | + | |
- | limited to, the implied warranties of merchantability and fitness for | + | |
- | a particular purpose are disclaimed. In no event shall the copyright | + | |
- | holder or contributors be liable for any direct, indirect, incidental, | + | |
- | special, exemplary, or consequential damages (including, but not | + | |
- | limited to, procurement of substitute goods or services; loss of use, | + | |
- | data, or profits; or business interruption) however caused and on any | + | |
- | theory of liability, whether in contract, strict liability, or tort | + | |
- | (including negligence or otherwise) arising in any way out of the use | + | |
- | of this software, even if advised of the possibility of such damage. | + | |
- | + | ||
- | + | ||
- | This stylesheet adds a ref attribute to w elements that will be used for | + | |
- | references in TXM concordances. Can be used with TXM XTZ import module. | + | |
- | + | ||
- | Written by Alexei Lavrentiev, UMR 5317 IHRIM, 2017 | + | |
- | --> | + | |
- | + | ||
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | <!-- General patterns: all elements, attributes, comments and processing instructions are copied --> | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | | + | |
- | + | ||
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </xsl:when> | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | <!-- Perseus addition --> | + | |
- | <xsl:if test=" | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | <xsl:if test=" | + | |
- | <xsl:text>, s. </ | + | |
- | < | + | |
- | </ | + | |
- | <!-- end of Perseus addition --> | + | |
- | + | ||
- | <xsl:if test=" | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | <xsl:if test=" | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | </ | + | |
- | </ | + | |
- | + | ||
- | ====== PLATO corpus: demonstration of Perseus Greek & Treebank texts (AGDT 2) in TXM ====== | + | |
- | + | ||
- | ===== Project presentation ===== | + | |
- | + | ||
- | * context : Heidelberg, May 2017 : [[http:// | + | |
- | + | ||
- | * goal : | + | |
- | * demonstrate that texts available from the Perseus project can be worked on in TXM | + | |
- | * TEI compliant import | + | |
- | * compatibility of TXM with the Greek language | + | |
- | * show that TXM can use the POS annotation provided by the Treebank (TreeTagger is not the only way to get tagged texts into TXM). | + | |
- | + | ||
- | * corpus | + | |
- | * Plato' | + | |
- | + | ||
- | * Available resources (approximate list) | + | |
- | * txm-filter-perseustreebank-xmlw.xsl | + | |
- | + | ||
- | ===== Solution ===== | + | |
- | + | ||
- | Make a directory (e.g. " | + | |
- | + | ||
- | Then run the TXM command File> | + | |
- | + | ||
- | 1. Source directory is " | + | |
- | + | ||
- | 2. Import parameters : | + | |
- | * Main Language : untick " | + | |
- | * Lexical Segmentation : no change - Default settings | + | |
- | * Front XSL : point to your local copy of txm-filter-perseustreebank-xmlw.xsl | + | |
- | * Editions : default setting (Build edition, Words per page = 500, Page break tag = pb) | + | |
- | * Display font : default setting (Font name = < | + | |
- | * Commands : default setting (Concordance context structure limits = text) | + | |
- | + | ||
- | 3. Click on "Start corpus import" | + | |
- | + | ||
- | ===== Feedback ===== | + | |
- | + | ||
- | We made two changes to the stylesheet: | + | |
- | * a correction: rename the Perseus @id attribute on <w> elements for compatibility with TXM | + | |
- | * an improvement: add an <lb/> element after each sentence for better rendering | + | |
- | + | ||
- | ===== XSL Perseus stylesheet used for this import ===== | + | |
- | + | ||
- | ==== txm-filter-perseustreebank-xmlw.xsl ==== | + | |
- | + | ||
- | <code XML> | + | |
- | <?xml version=" | + | |
- | <xsl:stylesheet | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | xmlns: | + | |
- | exclude-result-prefixes=" | + | |
- | + | ||
- | + | ||
- | <xd:doc type=" | + | |
- | < | + | |
- | A stylesheet to prepare PERSEUS Treebank XML texts for TXM XML/w import. | + | |
- | </ | + | |
- | < | + | |
- | This stylesheet is free software; you can redistribute it and/or | + | |
- | modify it under the terms of the GNU Lesser General Public | + | |
- | License as published by the Free Software Foundation; either | + | |
- | version 3 of the License, or (at your option) any later version. | + | |
- | + | ||
- | This stylesheet is distributed in the hope that it will be useful, | + | |
- | but WITHOUT ANY WARRANTY; without even the implied warranty of | + | |
- | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | + | |
- | Lesser General Public License for more details. | + | |
- | + | ||
- | You should have received a copy of GNU Lesser Public License with | + | |
- | this stylesheet. If not, see http://www.gnu.org/ | + | |
- | </ | + | |
- | < | + | |
- | < | + | |
- | | + | |
- | + | ||
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | <text type=" | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | < | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | <w> | + | |
- | < | + | |
- | < | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | < | + | |
- | < | + | |
- | + | ||
- | </ | + | |
- | </ | + | |
- | </ | + | |
- | + | ||
- | ====== PLAUTELAT & PLAUTEEN TXM demo ====== | + | |
- | + | ||
- | ===== Goal ===== | + | |
- | + | ||
- | * Context: University of Leipzig eHumanities Seminar, 2012-12-05 | + | |
- | * Goal: demo TXM on Latin and English translations of Plautus' | + | |
- | + | ||
- | ===== Corpus ===== | + | |
- | + | ||
- | Corpus of Plautus' | + | |
- | + | ||
- | Import parameters (updated from XML/w to XTZ): | + | |
- | * 2-front : | + | |
- | * txm-filter-teiperseus-xmlw.xsl | + | |
- | * txm-filter-teip5-xmlw-preserve.xsl | + | |
- | * lat.par TreeTagger model | + | |
- | + | ||
- | * PLAUTELAT: corpus of Plautus' | + | |
- | * source: [[https://sharedocs.huma-num.fr/ | + | |
- | * binary: [[https:// | + | |
- | * PLAUTEEN: corpus | + | |
- | * todo | + | |
- | + | ||
- | ---- | + | |
- | -> [[:|Back to the project list]]. | + | |