public:perseus_201707_plato

Differences

Below are the differences between two revisions of the page.

public:perseus_201707_plato [2018/06/12 10:48]
benedicte.pincemin@ens-lyon.fr
public:perseus_201707_plato [2019/07/18 10:50] (current version)
benedicte.pincemin@ens-lyon.fr
Line 30: Line 30:
 ===== Principles and choices ===== ===== Principles and choices =====
  
-When we prepared this corpus in June and July 2017, the TEI encoding of Plato's texts in Perseus was heterogeneous. We had to deal with several states, with last updates made in 2017, 2015, 2014 and 1992. The 2017 texts were clearly a new generation. Two texts (27 = Ion and 30 = Republic) made notably different encoding choices, for instance regarding <div> use and section marking.+When we prepared this corpus in June and July 2017, the TEI encoding of Plato's texts in Perseus was heterogeneous. We had to deal with several states, with last updates made in 2017, 2015, 2014 and 1992. The 2017 texts were clearly a new generation. Two texts (27 = Ion and 30 = Republic) made notably different encoding choices, for instance regarding %%<div>%% use and section marking.
  
 We decided not to modify the sources (which are evolving and improving thanks to the Perseus community), but to apply limited, automated changes during import processing so as to get a usable corpus, even if the TXM user has to cope with some inherited heterogeneity. We decided not to modify the sources (which are evolving and improving thanks to the Perseus community), but to apply limited, automated changes during import processing so as to get a usable corpus, even if the TXM user has to cope with some inherited heterogeneity.
Line 36: Line 36:
 The most relevant import format for this Perseus corpus is the XML-XTZ + CSV import (available since TXM 0.7.8), as it handles XML-TEI files and allows for many simple and useful settings. A [[https://groupes.renater.fr/wiki/txm-info/public/specs_import_xtz_docu|preliminary documentation]] is available online, pending its inclusion in the TXM User Manual (see [[http://textometrie.ens-lyon.fr/spip.php?rubrique64|TXM Documentation]]). The most relevant import format for this Perseus corpus is the XML-XTZ + CSV import (available since TXM 0.7.8), as it handles XML-TEI files and allows for many simple and useful settings. A [[https://groupes.renater.fr/wiki/txm-info/public/specs_import_xtz_docu|preliminary documentation]] is available online, pending its inclusion in the TXM User Manual (see [[http://textometrie.ens-lyon.fr/spip.php?rubrique64|TXM Documentation]]).
  
-As a basis we take the XSL stylesheets prepared for the previous experiment on Perseus texts (Cicero, Heidelberg 2017), also described on the txm-users wiki ([[public:perseus_201705_cicero|here]]). These stylesheets already handle some XML TEI features of Perseus texts (nested <div> or <text> elements) in order to make them compliant with TXM processing (especially for the CQP search engine component embedded in TXM).+As a basis we take the XSL stylesheets prepared for the previous experiment on Perseus texts (Cicero, Heidelberg 2017), also described on the txm-users wiki ([[public:perseus_201705_cicero|here]]). These stylesheets already handle some XML TEI features of Perseus texts (nested %%<div>%% or %%<text>%% elements) in order to make them compliant with TXM processing (especially for the CQP search engine component embedded in TXM).
  
 ===== Specifications ===== ===== Specifications =====
Line 43: Line 43:
  
 Automatically get text information from teiHeader : Automatically get text information from teiHeader :
-  * title, author and editor from fileDesc<​titleStmt>​ (first mention for each element) +  * title, author and editor from fileDesc%%<​titleStmt>​%% (first mention for each element) 
-  * content of @when attribute for first (or most recent) <​change>​ element in <​revisionDesc>​+  * content of @when attribute for first (or most recent) ​%%<​change>​%% element in %%<​revisionDesc>​%%
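
The header extraction listed above can be illustrated with a minimal Python sketch. It is not the XSL actually used in the import: the function name ''header_metadata'', the sample document and the assumption that files use the TEI P5 namespace are all hypothetical (older Perseus files may not declare that namespace).

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the TEI P5 namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def header_metadata(root):
    """Return the first title, author and editor of titleStmt,
    plus the @when value of the first <change> in revisionDesc."""
    meta = {}
    for name in ("title", "author", "editor"):
        # find() returns the first match in document order, or None
        el = root.find(f".//tei:fileDesc/tei:titleStmt/tei:{name}", NS)
        meta[name] = "".join(el.itertext()).strip() if el is not None else None
    change = root.find(".//tei:revisionDesc/tei:change", NS)
    meta["when"] = change.get("when") if change is not None else None
    return meta


# Hypothetical sample header, made up for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>
<fileDesc><titleStmt><title>Ion</title><author>Plato</author>
<editor>Example Editor</editor><title>Second title</title></titleStmt></fileDesc>
<revisionDesc><change when="2017-03-01">made-up entry</change>
<change when="2015-01-01">older made-up entry</change></revisionDesc>
</teiHeader></TEI>"""

print(header_metadata(ET.fromstring(SAMPLE)))
```

Taking the first ''<change>'' only matches "most recent" when the revision list is written newest first, which is why a normalized date is also needed.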
  
 As date formats vary widely throughout the corpus, we also encode this information in a normalized form in a [[public:perseus_201707_plato#content_of_the_metadatacsv_file_used_for_this_import|metadata.csv]] file given as a parameter to the TXM XTZ import (this produces the update10 property on the text structure, as the last change date is written with 10 characters). As date formats vary widely throughout the corpus, we also encode this information in a normalized form in a [[public:perseus_201707_plato#content_of_the_metadatacsv_file_used_for_this_import|metadata.csv]] file given as a parameter to the TXM XTZ import (this produces the update10 property on the text structure, as the last change date is written with 10 characters).
Line 59: Line 59:
 To locate word occurrences precisely in TXM, we would like to have by default the text title and the Stephanus section number. To locate word occurrences precisely in TXM, we would like to have by default the text title and the Stephanus section number.
  
-<div> usage is heterogeneous at the moment. The best solution is to use the @n attribute of <milestone unit="section"> elements to locate words in the Stephanus reference system. We just have to handle the exception of texts 27 and 30, which encode this information only on <div type="section"> or <div subtype="section"> elements.+%%<div>%% usage is heterogeneous at the moment. The best solution is to use the @n attribute of %%<milestone unit="section">%% elements to locate words in the Stephanus reference system. We just have to handle the exception of texts 27 and 30, which encode this information only on %%<div type="section">%% or %%<div subtype="section">%% elements.
  
 Moreover, since we may use section numbers in some sorting operations, we want a version of these numbers encoded with a fixed length (e.g. 0015a instead of 15a), so that sorting them as strings gives a relevant result. Moreover, since we may use section numbers in some sorting operations, we want a version of these numbers encoded with a fixed length (e.g. 0015a instead of 15a), so that sorting them as strings gives a relevant result.
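
To show why the fixed-length form matters, here is a small Python sketch. It is an illustration only, not the code used in the import: the function name ''pad_section'' and the default width of 4 digits (inferred from the 0015a example) are assumptions.

```python
import re


def pad_section(n, width=4):
    """Left-pad the numeric part of a Stephanus section reference,
    e.g. '15a' becomes '0015a'. Unexpected values are returned unchanged."""
    m = re.fullmatch(r"(\d+)([a-z]*)", n.strip())
    if m is None:
        return n
    number, letter = m.groups()
    return number.zfill(width) + letter


sections = ["327a", "15a", "2b", "15b"]
# A plain string sort compares character by character, so "15a" comes before "2b".
print(sorted(sections))
# Sorting on the padded form restores the numeric order.
print(sorted(sections, key=pad_section))
```

With the padded values, a string sort and a numeric sort give the same order, so no special comparison is needed at query time.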
  
-<note>Only the most recent text versions had a pattern declared in <encodingdesc><refsDecl n="CTS"> to identify sections through the CTS system. So we could not use it at the moment, but this could be interesting for a later version of the corpus.+<note>Only the most recent text versions had a pattern declared in %%<encodingdesc><refsDecl n="CTS">%% to identify sections through the CTS system. So we could not use it at the moment, but this could be interesting for a later version of the corpus.
 </​note>​ </​note>​
  
-We can take into account edition pages : in all the files of our corpus the information is available in <​milestone unit="​page">​ element, with @n attribute. Solution : during XTZ import, at the 2-front stage, the [[public:​perseus_201707_plato#​txm-front-teiperseus-xtzxsl|txm-front-teiperseus-xtz.xsl]] stylesheet adds <pb> elements which can be used by TXM.+We can take into account edition pages : in all the files of our corpus the information is available in %%<​milestone unit="​page">​%% element, with @n attribute. Solution : during XTZ import, at the 2-front stage, the [[public:​perseus_201707_plato#​txm-front-teiperseus-xtzxsl|txm-front-teiperseus-xtz.xsl]] stylesheet adds %%<pb>%% elements which can be used by TXM.
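
As an illustration of this step, the following Python sketch inserts a ''<pb>'' element before each page milestone. It is not the txm-front-teiperseus-xtz.xsl stylesheet itself: the function name ''add_page_breaks'', the TEI P5 namespace and the choice to keep the original milestone next to the new ''<pb>'' are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def add_page_breaks(root):
    """Insert <pb n="..."/> before every <milestone unit="page" n="..."/>.
    Returns the number of <pb> elements added."""
    added = 0
    # Snapshot the elements first, since the tree is modified while looping.
    for parent in list(root.iter()):
        # Walk children backwards so insertions do not shift pending indexes.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag == f"{{{TEI_NS}}}milestone" and child.get("unit") == "page":
                pb = ET.Element(f"{{{TEI_NS}}}pb", {"n": child.get("n", "")})
                parent.insert(index, pb)
                added += 1
    return added


# Hypothetical sample fragment, made up for illustration only.
SAMPLE = (
    f'<p xmlns="{TEI_NS}">text <milestone unit="page" n="327"/>'
    '<milestone unit="section" n="327a"/>more</p>'
)

root = ET.fromstring(SAMPLE)
print(add_page_breaks(root))
```

Only milestones with ''unit="page"'' are touched, so section milestones stay available for the Stephanus references.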
  
 ==== Speech turns ==== ==== Speech turns ====
  
 Encoding of speech turns is heterogeneous too : Encoding of speech turns is heterogeneous too :
-  * done with <sp> or <​said>​ element,+  * done with %%<sp>%% or %%<​said>​%% element,
   * speaker indicated with   * speaker indicated with
-    * @who attribute only in <​said>​ elements (but not all of them) +    * @who attribute only in %%<​said>​%% elements (but not all of them) 
-    * a <​label>​ element may introduce the speech turn in <​said>​ elements, +    * a %%<​label>​%% element may introduce the speech turn in %%<​said>​%% elements, 
-    * a <​speaker>​ element introduces and encodes speaker information for <sp> elements+    * a %%<​speaker>​%% element introduces and encodes speaker information for %%<sp>%% elements
  
-<p> elements are sometimes used, sometimes not, and can be either outside <​said>,​ or inside <​said>​ and following <​label>,​...+%%<p>%% elements are sometimes used, sometimes not, and can be either outside ​%%<​said>​%%, or inside ​%%<​said>​%% and following ​%%<​label>​%%,...
  
 We want to keep and **show clearly** speech turn information and speaker information, **without indexing the speaker's name** as a word to be counted and searched as such. We want to keep and **show clearly** speech turn information and speaker information, **without indexing the speaker's name** as a word to be counted and searched as such.
  
 **Solution** : **Solution** :
-  * adding <​label>​ and <p> when missing (during XTZ import processing) ; +  * adding ​%%<​label>​%% and %%<p>%% when missing (during XTZ import processing) ; 
-  * declare <​speaker>​ and <​label>​ as Out-of-text-to-edit element in import parameters.+  * declare ​%%<​speaker>​%% and %%<​label>​%% as Out-of-text-to-edit element in import parameters.
  
 <​note>​ <​note>​
Line 93: Line 93:
 **Castlists** given in files 23 to 27 should be ignored for textometric analysis **Castlists** given in files 23 to 27 should be ignored for textometric analysis
   * Solution :   * Solution :
-    * declare <​castList>​ as an Out-of-text element in TXM XTZ import parameters.+    * declare ​%%<​castList>​%% as an Out-of-text element in TXM XTZ import parameters.
  
  
  
-**Bibliographic citation references** encoded with <bibl> elements should be distinguished from the Ancient Greek text.+**Bibliographic citation references** encoded with %%<bibl>%% elements should be distinguished from the Ancient Greek text.
   * Solution :   * Solution :
-    * declare <​bibl>​ as note element in TXM XTZ import parameters ;+    * declare ​%%<​bibl>​%% as note element in TXM XTZ import parameters ;
     * display its content in gray characters (defined in [[public:​perseus_201707_plato#​perseuscss|perseus.css]] stylesheet for edition).     * display its content in gray characters (defined in [[public:​perseus_201707_plato#​perseuscss|perseus.css]] stylesheet for edition).
  
  
  
-**Versified text** (encoded with <l> elements) should be distinguished in TXM text edition.+**Versified text** (encoded with %%<l>%% elements) should be distinguished in TXM text edition.
   * Solution :   * Solution :
     * process it as blocks in [[public:​perseus_201707_plato#​perseuscss|perseus.css]].     * process it as blocks in [[public:​perseus_201707_plato#​perseuscss|perseus.css]].
Line 154: Line 154:
   * Lexical Segmentation : add — in punctuation list (for example as second character just after "​[",​ Punctuations field content looks then like this : [—\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}])   * Lexical Segmentation : add — in punctuation list (for example as second character just after "​[",​ Punctuations field content looks then like this : [—\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}])
   * Editions : Build edition, Words per page = 1000, Page break tag = pb   * Editions : Build edition, Words per page = 1000, Page break tag = pb
-  * Display font : default setting (Font name = <​default>​)+  * Display font : default setting (Font name = %%<​default>​%%)
   * Commands : Concordance context structure limits = text   * Commands : Concordance context structure limits = text
   * Textual planes :   * Textual planes :
Line 1727: Line 1727:
  <​xsl:​choose>​  <​xsl:​choose>​
  <​xsl:​when test="​matches(@n,'​^[0-9]*[05]$'​)">​  <​xsl:​when test="​matches(@n,'​^[0-9]*[05]$'​)">​
- <​!--<​a title="​{@n}"​ class="​verseline"​ style="​position:​relative">​ </​a>​-->​+ <​!--<​a title="​{@n}"​ class="​verseline"​ style="​position:​relative">​ </​a>​-->​
  <​!--<​span class="​verseline"><​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span></​span>​-->​  <​!--<​span class="​verseline"><​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span></​span>​-->​
  <​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span>​  <​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span>​
Line 1733: Line 1733:
  </​xsl:​when>​  </​xsl:​when>​
  <​xsl:​when test="​matches(@n,'​[^0-9]'​)">​  <​xsl:​when test="​matches(@n,'​[^0-9]'​)">​
- <​!--<​a title="​{@n}"​ class="​verseline"​ style="​position:​relative">​ </​a>​-->​+ <​!--<​a title="​{@n}"​ class="​verseline"​ style="​position:​relative">​ </​a>​-->​
  <​!--<​span class="​verseline"><​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span></​span>​-->​  <​!--<​span class="​verseline"><​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span></​span>​-->​
  <​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span>​  <​span class="​verselinenumber"><​xsl:​value-of select="​@n"/></​span>​
public/perseus_201707_plato.1528793339.txt.gz · Last modified: 2018/06/12 10:48 by benedicte.pincemin@ens-lyon.fr