PLATO170720 corpus : how we put 29 Perseus texts from Plato into TXM corpus analysis software

Project presentation

  • Context : paper submitted to Classics@ : “Introduction to Textometric Methodology -illustrated by a first exploration of the Gorgias in the context of Plato's work” (Bénédicte Pincemin, Stéphane Marchand).
  • Goal :
    • Introducing the Hellenic scientific community to textometric methodology, with examples taken from one famous Ancient Greek author.
    • Demonstrating that one can work in TXM on texts freely available from the Perseus project.
    • TEI-compliant import : how to use and configure an available import module, so as to take into account all the useful information encoded in the XML-TEI text files.
    • Nice editions (text display from within TXM) : one can have both a rich digital edition of Plato's work and advanced functionalities to search and analyse the full text.

Digital text sources

For this experiment we selected every text except numbers 15, 16, 17, 29, 33, 35 and 36 (for scientific reasons, not technical ones : the solution should work with these texts too, but has not been completely tested).

All these XML-TEI files of Plato's texts are then grouped into one directory named plato170720 (that is, the name we have chosen to give to the TXM corpus).

Principles and choices

When we prepared this corpus in June and July 2017, the TEI encoding of Plato's texts in Perseus was heterogeneous. We had to deal with several states, with last updates made in 2017, 2015, 2014 and 1992. The 2017 texts were clearly a new generation. Two texts (27 = Ion and 30 = Republic) differed markedly in their encoding choices, for instance as regards the use of <div> and the marking of sections.

We decided not to modify the sources (which are evolving and improving thanks to the Perseus community), but to apply limited, automated changes within the import processing so as to get a usable corpus, even if the TXM user has to cope with some inherited heterogeneity.

The most relevant import format for this Perseus corpus is the XML-XTZ + CSV import (available since TXM 0.7.8), as it deals with XML-TEI files and allows for many simple and useful settings. Preliminary documentation is available online, pending its inclusion in the TXM user manual (see TXM Documentation).

As a basis we take the XSL stylesheets prepared for the previous experiment on Perseus texts (Cicero, Heidelberg 2017), also described on the txm-users wiki. These stylesheets already handle some XML-TEI features of Perseus texts (such as nested <div> or <text> elements) in order to make them compliant with TXM processing (especially for the CQP search engine component embedded in TXM).

Specifications

Title, date of edition

Automatically get text information from teiHeader :

  • title, author and editor from <titleStmt> in <fileDesc> (first mention for each element)
  • content of @when attribute for first (or most recent) <change> element in <revisionDesc>

As the date formulation varies widely throughout the corpus, we also encode this information in a normalized form in a metadata.csv file given as a parameter to the TXM XTZ import (this produces the update10 property on the text structure, so named because the last change date is written with 10 characters).
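The header lookup described above can be sketched outside TXM as follows. This is only an illustration, not the XSL used by the import : it assumes a TEI P5 file with the usual TEI namespace, the sample header is hypothetical and heavily shortened (the editor name is a placeholder), and the function name header_info is ours. It takes the first <change> element that has a @when attribute and returns the value as written, without normalizing it.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily shortened header (placeholder editor name).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Gorgias</title>
        <author>Plato</author>
        <editor>Example Editor</editor>
      </titleStmt>
    </fileDesc>
    <revisionDesc>
      <change when="2017-06-23">latest change</change>
      <change when="2014-07-01">older change</change>
    </revisionDesc>
  </teiHeader>
</TEI>"""


def header_info(xml_text):
    """Return title, author, editor (first mention) and the first <change> @when."""
    root = ET.fromstring(xml_text)

    def first_text(name):
        # find() returns the first match in document order, or None.
        node = root.find(f".//tei:fileDesc/tei:titleStmt/tei:{name}", TEI)
        return node.text.strip() if node is not None and node.text else None

    change = root.find(".//tei:revisionDesc/tei:change[@when]", TEI)
    return {
        "title": first_text("title"),
        "author": first_text("author"),
        "editor": first_text("editor"),
        "when": change.get("when") if change is not None else None,
    }


print(header_info(SAMPLE))
```

Older files that are not in the TEI namespace would need the paths without the tei: prefix.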

The title is useful as the default identifier of the text for references in the concordance view. Special cases : we have to deal with some long titles such as “Republic (Greek). Machine readable text” (30), and likewise for Laws (34). We developed 2 solutions :

  • automatic processing : cut them before the first punctuation mark.
  • hand-coded declaration of titles in a metadata.csv file in the TXM XTZ import (which produces the “title1” property on the text structure, as the title is then encoded as 1 or 2 words).
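The automatic shortening can be illustrated with a small sketch. The exact rule implemented in the XSL stylesheet is not reproduced here; the version below assumes that “punctuation” means any character that is neither a letter, a digit nor whitespace, and the function name short_title is ours.

```python
import re


def short_title(title):
    """Keep the part of the title that precedes the first punctuation mark."""
    # Split once on the first character that is neither a word character
    # nor whitespace, and keep what comes before it.
    head = re.split(r"[^\w\s]", title, maxsplit=1)[0]
    return head.strip()


print(short_title("Republic (Greek). Machine readable text"))  # Republic
print(short_title("Alcibiades 1"))                             # Alcibiades 1
```

Titles without punctuation are returned unchanged, which is why the hand-coded title1 column remains useful when a different short form is wanted.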

Nevertheless, CTS URN information must remain available and can be chosen to locate words in the corpus (cf. the ctsurn and ctsurn5 properties on the word structure, and the id property on the text structure).

Word localizations and references

To precisely locate word occurrences in TXM, we would like to have by default the text title and the number of the Stephanus section.

<div> usage is heterogeneous at the moment. The best solution is to use the @n attribute of <milestone unit="section"> elements to locate words in the Stephanus reference system. We just have to deal with the exception of texts 27 and 30, which encode this information only on <div type="section"> or <div subtype="section"> elements.

Moreover, since we may use section numbers in some sorting operations, we want a version of these numbers encoded with a fixed length (e.g. 0015a instead of 15a), so that sorting them as strings gives a relevant result.
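A minimal sketch of this fixed-length encoding, assuming that a section number is made of leading digits followed by an optional letter suffix, and that four digits are enough (as in the 0015a example). The function name pad_section is ours; in the import itself this is done by the XSL stylesheets.

```python
import re


def pad_section(n, width=4):
    """Left-pad the numeric part of a section number with zeros."""
    m = re.fullmatch(r"(\d+)(.*)", n)
    if m is None:
        return n  # no leading digits: leave the value untouched
    digits, suffix = m.groups()
    return digits.zfill(width) + suffix


sections = ["447a", "15a", "2b"]
print(sorted(sections))                            # ['15a', '2b', '447a']
print(sorted(pad_section(s) for s in sections))    # ['0002b', '0015a', '0447a']
```

The first sort shows the problem (2b lands after 15a when compared as strings), the second shows that padded values sort in page order.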

Only the most recent text versions had a pattern declared in <encodingDesc><refsDecl n="CTS"> to identify sections through the CTS system. So we could not use it at the moment, but this could be interesting for a later version of the corpus.

We can take edition pages into account : in all the files of our corpus the information is available in <milestone unit="page"> elements, with the @n attribute. Solution : during the XTZ import, at the 2-front stage, the XSL stylesheet adds <pb> elements, which can be used by TXM.

Speech turns

Encoding of speech turns is heterogeneous too :

  • done with <sp> or <said> element,
  • speaker indicated with
    • @who attribute only in <said> elements (but not all of them)
    • a <label> element may introduce the speech turn in <said> elements,
    • a <speaker> element introduces and encodes speaker information for <sp> elements

<p> elements are sometimes used, sometimes not, and can be either outside <said>, or inside <said> and following <label>,…

We want to keep and clearly show speech turn and speaker information, without indexing the speaker's name as a word to be counted and searched as such.

Solution :

  • add <label> and <p> when missing (during XTZ import processing) ;
  • declare <speaker> and <label> as Outside-text to edit elements in the import parameters.

Speaker information is thus available in TXM but is heterogeneously encoded. We decided not to edit this transitional state of the Perseus texts ourselves ; once the source texts are homogeneous in Perseus, they will be homogeneous in TXM too.

Miscellaneous

Cast lists given in files 23 to 27 should be ignored for textometric analysis.

  • Solution :
    • declare <castList> as an Out-of-text element in TXM XTZ import parameters.

Bibliographic citation references encoded with <bibl> elements should be distinguished from the Ancient Greek text.

  • Solution :
    • declare <bibl> as note element in TXM XTZ import parameters ;
    • display its content in gray characters (defined in CSS stylesheet for edition).

Versified text (encoded with <l> elements) should be distinguished in TXM text edition.

  • Solution :
    • process it as blocks in CSS.

Solution : how to import

Data preparation

Make a directory (e.g. “plato”).

Add files and subdirectories in it, so that this directory includes :

  • a copy of every XML file for the Greek texts of Plato downloaded from the Perseus DL.
  • (optional) a file named “import.xml” (this file is automatically created or updated during the import processing ; it records all the parameter values used for the import, see below).
  • (optional) a file named “metadata.csv”, which brings additional information to describe the texts (i.e. normalized titles, normalized edition dates, etc.).
  • a directory named “css”, which includes :
    • perseus.css
  • a directory named “xsl”, which includes :
    • (depending on your TXM version, see note below) a subdirectory named “1-split-merge”, which includes :
      • rename-no-dots.xsl
    • a subdirectory named “2-front”, which includes :
      • p4top5.xsl
      • txm-front-teiperseus-xtz.xsl
    • a subdirectory named “3-posttok”, which includes :
      • txm-posttok-addRef-perseus.xsl
    • a subdirectory named “4-edition”, which includes :
      • 1-default-html.xsl
      • 2-default-pager.xsl
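Before launching the import, the layout listed above can be checked with a short script. This is only a convenience sketch : the function name missing_entries is ours, TXM does not require it, and the optional entries (import.xml, metadata.csv, the 1-split-merge subdirectory) are deliberately left out of the check.

```python
from pathlib import Path

# Files expected in the corpus directory, relative to its root
# (optional entries are not listed).
EXPECTED = [
    "css/perseus.css",
    "xsl/2-front/p4top5.xsl",
    "xsl/2-front/txm-front-teiperseus-xtz.xsl",
    "xsl/3-posttok/txm-posttok-addRef-perseus.xsl",
    "xsl/4-edition/1-default-html.xsl",
    "xsl/4-edition/2-default-pager.xsl",
]


def missing_entries(corpus_dir):
    """Return the expected files that are absent from corpus_dir."""
    root = Path(corpus_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_entries("plato"):
        print("missing :", rel)
```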

Note about the 1-split-merge directory

  • A bug in the TXM 0.7.8 version that we had in July 2017 prevented us from keeping dots in filenames. The rename-no-dots.xsl stylesheet works around this bug : dots are replaced by underscores at the first stage of the import.
  • Later, another bug in the TXM version delivered in August 2017 occurred at this first stage of the import. A workaround is to skip this first stage (for instance by renaming the 1-split-merge folder so that it is not recognized and thus not taken into account, or by deleting it) and to rename the text files manually, or automatically before the TXM import (this can be done from within TXM with the ExecXSL macro).

These two bugs have been reported and should be corrected in one of the next versions of TXM, so try and see if file name corrections are still needed.
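If the first import stage has to be skipped, the renaming can also be done before the import with a few lines of script. The sketch below is an alternative to the ExecXSL route, not part of TXM ; it assumes source names of the form tlg0059.tlg001.perseus-grc1.xml (the ids of metadata.csv with their original dots), and the function names are ours. Only dots in the base name are replaced, the .xml extension is kept.

```python
from pathlib import Path


def no_dots_name(filename):
    """Replace dots by underscores in the base name, keeping the extension."""
    p = Path(filename)
    return p.stem.replace(".", "_") + p.suffix


def rename_sources(corpus_dir):
    """Rename every XML file of corpus_dir whose base name contains dots."""
    # sorted() materializes the listing before any file is renamed.
    for src in sorted(Path(corpus_dir).glob("*.xml")):
        target = src.with_name(no_dots_name(src.name))
        if target != src and not target.exists():
            src.rename(target)


print(no_dots_name("tlg0059.tlg001.perseus-grc1.xml"))
# tlg0059_tlg001_perseus-grc1.xml
```

Files such as import.xml have no dot in their base name and are left untouched.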

Executing the import process

Then run the TXM command File>Import>XML-XTZ + CSV with the following settings :

1. Source directory is “plato” (in our example).

2. Import parameters :

  • Main Language : untick “annotate the corpus” and select “el” for Greek language.
  • Lexical Segmentation : add — in punctuation list (for example as second character just after ”[”, Punctuations field content looks then like this : [—\p{Ps}\p{Pe}\p{Pi}\p{Pf}\p{Po}\p{S}])
  • Editions : Build edition, Words per page = 1000, Page break tag = pb
  • Display font : default setting (Font name = <default>)
  • Commands : Concordance context structure limits = text
  • Textual planes :
    • Outside-text = teiHeader,front,back,castList
    • Outside-text to edit = label,speaker
    • Note elements = bibl,note
    • Milestone elements = [nothing, leave blank]
    • Options : default (= remove temporary directories)

3. Click on “Start corpus import” (above - beginning of the form)

The import parameters are read from, and saved in, the import.xml file located in the corpus directory (here the “plato” directory). So if you have already imported the corpus, you recover your previous settings for this corpus and can update or modify them before the new import. This file is edited through the graphical user interface (XML-XTZ + CSV import form).

Content of the metadata.csv file used for this import

Our metadata.csv looks like this in spreadsheet software (such as Calc or Excel) :

[Screenshot : display of metadata.csv content with Calc]

And here is a view of its content as comma-separated text (same file, opened with a plain text editor) :

"id","title1","update10"
"tlg0059_tlg001_perseus-grc1","Euthyphro","2017-03-13"
"tlg0059_tlg002_perseus-grc2","Apology","2017-03-16"
"tlg0059_tlg003_perseus-grc2","Crito","2017-03-16"
"tlg0059_tlg004_perseus-grc2","Phaedo","2017-03-28"
"tlg0059_tlg005_perseus-grc2","Cratylus","2017-03-29"
"tlg0059_tlg006_perseus-grc2","Theaetetus","2017-03-30"
"tlg0059_tlg007_perseus-grc2","Sophist","2017-04-06"
"tlg0059_tlg008_perseus-grc2","Statesman","2017-04-11"
"tlg0059_tlg009_perseus-grc2","Parmenides","2017-04-13"
"tlg0059_tlg010_perseus-grc2","Philebus","2017-04-18"
"tlg0059_tlg011_perseus-grc2","Symposium","2017-04-19"
"tlg0059_tlg012_perseus-grc2","Phaedrus","2017-04-18"
"tlg0059_tlg013_perseus-grc2","Alcibiades 1","2017-05-12"
"tlg0059_tlg014_perseus-grc2","Alcibiades 2","2017-05-12"
"tlg0059_tlg015_perseus-grc2","Hipparchus","2017-05-17"
"tlg0059_tlg016_perseus-grc2","Lovers","2017-05-17"
"tlg0059_tlg017_perseus-grc2","Theages","2017-05-17"
"tlg0059_tlg018_perseus-grc2","Charmides","2017-06-06"
"tlg0059_tlg019_perseus-grc2","Laches","2017-06-06"
"tlg0059_tlg020_perseus-grc2","Lysis","2017-06-07"
"tlg0059_tlg021_perseus-grc2","Euthydemus","2017-06-13"
"tlg0059_tlg022_perseus-grc2","Protagoras","2017-06-19"
"tlg0059_tlg023_perseus-grc2","Gorgias","2017-06-23"
"tlg0059_tlg024_perseus-grc2","Meno","2017-07-10"
"tlg0059_tlg025_perseus-grc1","Hippias Major","2014-07-01"
"tlg0059_tlg026_perseus-grc1","Hippias Minor","2014-07-01"
"tlg0059_tlg027_perseus-grc1","Ion","2014-07-01"
"tlg0059_tlg028_perseus-grc1","Menexenus","2014-07-01"
"tlg0059_tlg029_perseus-grc1","Cleitophon","2014-07-01"
"tlg0059_tlg030_perseus-grc1","Republic","1992-07-01"
"tlg0059_tlg030_perseus-grc2","Republic","2015-04-15"
"tlg0059_tlg031_perseus-grc1","Timaeus","2014-07-01"
"tlg0059_tlg032_perseus-grc1","Critias","2014-07-01"
"tlg0059_tlg033_perseus-grc1","Minos","2014-07-01"
"tlg0059_tlg034_perseus-grc1","Laws","1992-07-01"
"tlg0059_tlg035_perseus-grc1","Epinomis","2014-07-01"
"tlg0059_tlg036_perseus-grc1","Epistles","1992-07-01"

Note that the filenames (in the first column, entitled “id”) are adjusted : the dots have been replaced by underscores, because of the bug in early TXM 0.7.8 versions described above. Once the bug is fixed, the “1-split-merge” directory should be removed and the real names (with their original dots) can be used everywhere (in metadata.csv and for the original XML source files in the import directory).
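A file with this shape can be produced by a script rather than typed by hand. The following sketch assumes that each text is described by a (source filename, short title, last change date) triple prepared beforehand ; the function name metadata_csv is ours. It derives the id by dropping the extension and replacing dots by underscores, quotes every field as in the listing above, and keeps the first 10 characters of the date.

```python
import csv
import io


def metadata_csv(rows):
    """Build the content of metadata.csv from (filename, title, date) triples."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["id", "title1", "update10"])
    for filename, title, date in rows:
        # Drop the extension, then replace the remaining dots by underscores.
        text_id = filename.rsplit(".", 1)[0].replace(".", "_")
        writer.writerow([text_id, title, date[:10]])
    return buf.getvalue()


print(metadata_csv([("tlg0059.tlg023.perseus-grc2.xml", "Gorgias", "2017-06-23")]))
```

Once dots are accepted in filenames again, the replace step can simply be dropped.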

CSS and XSL Perseus stylesheets used for this import

perseus.css

This file is put in the css subdirectory (inside the corpus import directory, next to the XML-TEI files of Plato's texts).

/*  
   Copyright © 2017 ENS de Lyon, CNRS, University of Franche-Comté
   Licensed under the terms of the GNU General Public License (http://www.gnu.org/licenses)
   @author cbourdot
   @author sheiden
 
   TXM default CSS 06-2017
 
*/
 
.txmeditionpage {
	background-color: #f8f7ee;
	font-family: brill, 'Arial Unicode MS',ubuntu,verdana; /*junicode (Greek is displayed with italics font)*/
	font-size: 14px;
	text-indent:0px;
	text-align: justify;
	box-shadow: .3125em .3125em .625em #888;
	margin: 1.25em auto;
	padding: 1.25em;
	/*width: 400px;*/
	min-height: 90%;
}
 
.txmeditionpb {
	text-align: center;
}
 
.txmeditionpb:before {
	content: "- ";
}
 
.txmeditionpb:after {
	content: " -";
}
 
.txmlettrinep:first-letter {
    float: left;
    font-size: 6em;
    line-height: 1;
    margin-right: 0.2em;
}
 
.editionpage {
    display:block;
    text-align:center;
    color:gray;
}
 
 
a {
	color:#802520;
}
 
h1 {
	font-size: 20px;
	font-variant: small-caps;
	text-align: center;
	color:#802520;
}
 
h2 {
	font-size: 18px;
	font-variant: small-caps;
	text-align: center;
	color:#802520;
}
 
h3 {
	font-size: 16px;
	font-variant: small-caps;
	text-align: center;
	color:#802520;
}
 
p {
	text-indent: 0.2cm;
	text-align: justify;
	text-justify: inter-word;
}
 
img {
    margin: 10px 10px 10px 10px;
}
 
td[rend="table-cell-align-right"] {
	text-align: right;
}
 
td[rend="table-cell-align-left"] {
	text-align: left;
}
 
td[rend="table-cell-align-center"] {
	text-align: center;
}
 
.bibl {
    color:gray;
}
 
.bibl:before {
    content:"(";
}
 
.bibl:after {
    content:")";
}
 
.hi-italic {
    font-style:italic;
}
 
.foreign {
    font-style:italic;
    color:darkred;
}
 
.label, .speaker {
    font-style:italic;
    color:gray;
}
 
 
.l {
    display:block;
}

rename-no-dots.xsl

This file is put in the xsl/1-split-merge subdirectory (inside the corpus import directory, next to the XML-TEI files of Plato's texts).

It was added to work around a bug in an early TXM 0.7.8 version. See the note about the 1-split-merge subdirectory above.

<!-- Copies each source document unchanged into a file whose base name has its dots replaced by underscores -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
<xsl:param name="output-directory">
  <xsl:analyze-string select="document-uri(.)" regex="^(.*)/([^/]+)\.[^/.]+$">
    <xsl:matching-substring>
      <xsl:value-of select="regex-group(1)"/>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:param>
 
  <xsl:variable name="filename">
    <xsl:analyze-string select="document-uri(.)" regex="^(.*)/([^/]+)\.[^/.]+$">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>
 
<xsl:template match="/">
  <xsl:result-document href="{$output-directory}/{replace($filename,'\.','_')}.xml">
    <xsl:copy-of select="."></xsl:copy-of>
  </xsl:result-document>
  <warning>Result file written to <xsl:value-of select="concat($output-directory,'/',replace($filename,'\.','_'),'.xml')"/></warning>
</xsl:template>
 
 
</xsl:stylesheet>

p4top5.xsl

This file is put in the xsl/2-front subdirectory (inside the corpus import directory, next to the XML-TEI files of Plato's texts).

Note that this file has been edited to deal with Perseus texts in which some pointers already start with the “#” character (see the comment in the file).

<?xml version="1.0"?>
<xsl:stylesheet 
    xmlns:edate="http://exslt.org/dates-and-times"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" 
    exclude-result-prefixes="tei edate" version="1.0">
  <!-- 
 
       P4 to P5 converter 
 
       Sebastian Rahtz <sebastian.rahtz@oucs.ox.ac.uk>
 
       $Date: 2007-11-01 16:33:34 +0000 (Thu, 01 Nov 2007) $  $Id: p4top5.xsl 3927 2007-11-01 16:33:34Z rahtz $
 
       Copyright 2007 TEI Consortium
 
       Permission is hereby granted, free of charge, to any person obtaining
       a copy of this software and any associated documentation files (the
       ``Software''), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:
 
       The above copyright notice and this permission notice shall be included
       in all copies or substantial portions of the Software.
 
  -->
  <xsl:output method="xml" encoding="utf-8"
    cdata-section-elements="tei:eg" omit-xml-declaration="yes"/>
 
  <xsl:variable name="processor">
    <xsl:value-of select="system-property('xsl:vendor')"/>
  </xsl:variable>
 
  <xsl:variable name="today">
    <xsl:choose>
      <xsl:when test="function-available('edate:date-time')">
	<xsl:value-of select="edate:date-time()"/>
      </xsl:when>
      <xsl:when test="contains($processor,'SAXON')">
	<xsl:value-of select="Date:toString(Date:new())"
		      xmlns:Date="/java.util.Date"/>
      </xsl:when>
      <xsl:otherwise>0000-00-00</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
 
  <xsl:variable name="uc">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
  <xsl:variable name="lc">abcdefghijklmnopqrstuvwxyz</xsl:variable>
 
  <xsl:template match="*">
    <xsl:choose>
      <xsl:when test="namespace-uri()=''">
	<xsl:element namespace="http://www.tei-c.org/ns/1.0" name="{local-name(.)}">
	  <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
	</xsl:element>
      </xsl:when>
      <xsl:otherwise>
	<xsl:copy>
	  <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
	</xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
 
  <xsl:template match="@*|processing-instruction()|comment()">
    <xsl:copy/>
  </xsl:template>
 
 
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
 
 
  <!-- change of name, or replaced by another element -->
  <xsl:template match="teiCorpus.2">
    <teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </teiCorpus>
  </xsl:template>
 
  <xsl:template match="witness/@sigil">
    <xsl:attribute name="xml:id">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
  <xsl:template match="witList">
    <listWit xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </listWit>
  </xsl:template>
 
 
  <xsl:template match="TEI.2">
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </TEI>
  </xsl:template>
 
  <xsl:template match="xref">
    <xsl:element namespace="http://www.tei-c.org/ns/1.0" name="ref">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:element>
  </xsl:template>
 
 
  <xsl:template match="xptr">
    <xsl:element namespace="http://www.tei-c.org/ns/1.0" name="ptr">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:element>
  </xsl:template>
 
 
  <xsl:template match="figure[@url]">
    <figure xmlns="http://www.tei-c.org/ns/1.0">
      <graphic xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:copy-of select="@*"/>
      </graphic>
      <xsl:apply-templates/>
    </figure>
  </xsl:template>
 
 
  <xsl:template match="figure/@url"/>
 
  <xsl:template match="figure/@entity"/>
 
  <xsl:template match="figure[@entity]">
    <figure xmlns="http://www.tei-c.org/ns/1.0">
      <graphic xmlns="http://www.tei-c.org/ns/1.0" 
	       url="{unparsed-entity-uri(@entity)}">
	<xsl:apply-templates select="@*"/>
      </graphic>
      <xsl:apply-templates/>
    </figure>
  </xsl:template>
 
  <xsl:template match="event">
    <incident  xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|*|text()|comment()|processing-instruction()"/>
    </incident>
  </xsl:template>
 
  <xsl:template match="state">
    <refState  xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|*|text()|comment()|processing-instruction()"/>
    </refState>
  </xsl:template>
 
 
  <!-- lost elements -->
  <xsl:template match="dateRange">
    <date xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </date>
  </xsl:template>
 
 
  <xsl:template match="dateRange/@from">
    <xsl:copy-of select="."/>
  </xsl:template>
 
  <xsl:template match="dateRange/@to">
    <xsl:copy-of select="."/>
  </xsl:template>
 
  <xsl:template match="language">
    <xsl:element namespace="http://www.tei-c.org/ns/1.0" name="language">
	<xsl:if test="@id">
        <xsl:attribute name="ident">
         	<xsl:value-of select="@id"/>
        </xsl:attribute>
        </xsl:if>
      <xsl:apply-templates select="*|processing-instruction()|comment()|text()"/>
    </xsl:element>
  </xsl:template>
 
  <!-- attributes lost -->
  <!-- dropped from TEI. Added as new change records later -->
  <xsl:template match="@date.created"/>
 
  <xsl:template match="@date.updated"/>
 
  <!-- dropped from TEI. No replacement -->
  <xsl:template match="refsDecl/@doctype"/>
 
  <!-- attributes changed name -->
 
  <xsl:template match="date/@value">
    <xsl:attribute name="when">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
 
  <xsl:template match="@url">
    <xsl:attribute name="target">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
 
  <xsl:template match="@doc">
    <xsl:attribute name="target">
      <xsl:value-of select="unparsed-entity-uri(.)"/>
    </xsl:attribute>
  </xsl:template>
 
 
  <xsl:template match="@id">
    <xsl:choose>
      <xsl:when test="parent::lang">
	<xsl:attribute name="ident">
	  <xsl:value-of select="."/>
	</xsl:attribute>
      </xsl:when>
      <xsl:otherwise>
	<xsl:attribute name="xml:id">
	  <xsl:value-of select="."/>
	</xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
 
  <xsl:template match="@lang">
    <xsl:attribute name="xml:lang">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
 
  <xsl:template match="change/@date"/>
 
  <xsl:template match="date/@certainty">
    <xsl:attribute name="cert">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
  <!-- all pointing attributes preceded by # -->
 
  <xsl:template match="variantEncoding/@location">
    <xsl:copy-of select="."/>
  </xsl:template>
 
<!-- Modified for Perseus texts where some pointers already have # -->
 
  <xsl:template match="@ana|@active|@adj|@adjFrom|@adjTo|@children|@children|@class|@code|@code|@copyOf|@corresp|@decls|@domains|@end|@exclude|@fVal|@feats|@follow|@from|@hand|@inst|@langKey|@location|@mergedin|@new|@next|@old|@origin|@otherLangs|@parent|@passive|@perf|@prev|@render|@resp|@sameAs|@scheme|@script|@select|@since|@start|@synch|@target|@targetEnd|@to|@to|@value|@value|@who|@wit">
    <xsl:attribute name="{name(.)}">
      <xsl:choose>
        <xsl:when test="starts-with(.,'#')">
          <xsl:copy-of select="."/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="splitter">
            <xsl:with-param name="val">
              <xsl:value-of select="."/>
            </xsl:with-param>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>
 
 
  <xsl:template name="splitter">
    <xsl:param name="val"/>
    <xsl:choose>
      <xsl:when test="contains($val,' ')">
	<xsl:text>#</xsl:text>
	<xsl:value-of select="substring-before($val,' ')"/>
	<xsl:text> </xsl:text>
	<xsl:call-template name="splitter">
	  <xsl:with-param name="val">
	    <xsl:value-of select="substring-after($val,' ')"/>
	  </xsl:with-param>
	</xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
	<xsl:text>#</xsl:text>
	<xsl:value-of select="$val"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
 
  <!-- fool around with selected elements -->
 
 
 <!-- imprint is no longer allowed inside bibl -->
 <xsl:template match="bibl/imprint">
    <xsl:apply-templates/>
  </xsl:template>
 
  <xsl:template match="editionStmt/editor">
    <respStmt xmlns="http://www.tei-c.org/ns/1.0">    
      <resp><xsl:value-of select="@role"/></resp>
      <name><xsl:apply-templates/></name>
    </respStmt>
  </xsl:template>
 
  <!-- header -->  
 
  <xsl:template match="teiHeader">
    <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|*|comment()|processing-instruction()"/>
 
      <xsl:if test="not(revisionDesc) and (@date.created or @date.updated)">
	<revisionDesc  xmlns="http://www.tei-c.org/ns/1.0">
	  <xsl:if test="@date.updated">
	    <change  xmlns="http://www.tei-c.org/ns/1.0">
	    <label>updated</label>
	    <date  xmlns="http://www.tei-c.org/ns/1.0">
	      <xsl:value-of select="@date.updated"/>
	    </date>
	    <label  xmlns="http://www.tei-c.org/ns/1.0">Date edited</label>
	    </change>
	  </xsl:if>
	  <xsl:if test="@date.created">
	    <change  xmlns="http://www.tei-c.org/ns/1.0">
	      <label>created</label>
	      <date  xmlns="http://www.tei-c.org/ns/1.0">
		<xsl:value-of select="@date.created"/>
	      </date>
	      <label  xmlns="http://www.tei-c.org/ns/1.0">Date created</label>
	    </change>
	  </xsl:if>
	</revisionDesc>
      </xsl:if>
      <!--
	  <change when="{$today}"  xmlns="http://www.tei-c.org/ns/1.0">Converted to TEI P5 XML by p4top5.xsl
	  written by Sebastian
	  Rahtz at Oxford University Computing Services.</change>
	  </revisionDesc>
	  </xsl:if>
      -->
    </teiHeader>
  </xsl:template>
 
  <xsl:template match="revisionDesc">
    <revisionDesc xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates
	  select="@*|*|comment()|processing-instruction()"/>
    </revisionDesc>
  </xsl:template>
 
  <xsl:template match="publicationStmt">
    <publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="@*|*|comment()|processing-instruction()"/>
      <!--
	  <availability xmlns="http://www.tei-c.org/ns/1.0">
	  <p xmlns="http://www.tei-c.org/ns/1.0">Licensed under <ptr xmlns="http://www.tei-c.org/ns/1.0" target="http://creativecommons.org/licenses/by-sa/2.0/uk/"/></p>
	  </availability>
      -->
    </publicationStmt>
  </xsl:template>
 
 <!-- space does not have @extent any more -->
  <xsl:template match="space/@extent">
    <xsl:attribute name="quantity">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
  <!-- tagsDecl has a compulsory namespace child now -->
  <xsl:template match="tagsDecl">
    <xsl:if test="*">
      <tagsDecl xmlns="http://www.tei-c.org/ns/1.0">
	<namespace name="http://www.tei-c.org/ns/1.0">
	  <xsl:apply-templates select="*|comment()|processing-instruction()"/>
	</namespace>
      </tagsDecl>
    </xsl:if>
  </xsl:template>
 
  <!-- orgTitle inside orgName? redundant -->
  <xsl:template match="orgName/orgTitle">
      <xsl:apply-templates/>
  </xsl:template>
 
 <!-- no need for empty <p> in sourceDesc -->  
  <xsl:template match="sourceDesc/p[string-length(.)=0]"/>
 
  <!-- start creating the new choice element -->
  <xsl:template match="corr[@sic]">
    <choice  xmlns="http://www.tei-c.org/ns/1.0">
      <corr  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="text()" />
      </corr>
      <sic  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="@sic" />
      </sic>
    </choice>
  </xsl:template>
 
  <xsl:template match="sic[@corr]">
    <choice  xmlns="http://www.tei-c.org/ns/1.0">
      <sic  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="text()" />
      </sic>
      <corr  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="@corr" />
      </corr>
    </choice>
  </xsl:template>
 
  <xsl:template match="abbr[@expan]">
    <choice  xmlns="http://www.tei-c.org/ns/1.0">
      <abbr  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="text()" />
      </abbr>
      <expan  xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="@expan" />
      </expan>
    </choice>
  </xsl:template>
 
  <xsl:template match="expan[@abbr]">
    <choice xmlns="http://www.tei-c.org/ns/1.0">
      <expan xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="text()" />
      </expan>
      <abbr xmlns="http://www.tei-c.org/ns/1.0">
	<xsl:value-of select="@abbr" />
      </abbr>
    </choice>
  </xsl:template>
 
  <!-- special consideration for <change> element -->
  <xsl:template match="change">
    <change xmlns="http://www.tei-c.org/ns/1.0">
 
      <xsl:apply-templates select="date"/>
 
      <xsl:if test="respStmt/resp">
	<label>
	  <xsl:value-of select="respStmt/resp/text()"/>
	</label>
      </xsl:if>
	<xsl:for-each select="respStmt/name">
	  <name xmlns="http://www.tei-c.org/ns/1.0">
	    <xsl:apply-templates
		select="@*|*|comment()|processing-instruction()|text()"/>
	  </name>
	</xsl:for-each>
	<xsl:for-each select="item">
	  <xsl:apply-templates
	      select="@*|*|comment()|processing-instruction()|text()"/>
	</xsl:for-each>
    </change>
  </xsl:template>
 
 
  <xsl:template match="respStmt[resp]">
    <respStmt xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:choose>
	<xsl:when test="resp/name">
	  <resp  xmlns="http://www.tei-c.org/ns/1.0">
	    <xsl:value-of select="resp/text()"/>
	  </resp>
	    <xsl:for-each select="resp/name">
	      <name xmlns="http://www.tei-c.org/ns/1.0">
		<xsl:apply-templates/>
	      </name>
	    </xsl:for-each>
	</xsl:when>
	<xsl:otherwise>
	  <xsl:apply-templates/>
	  <name  xmlns="http://www.tei-c.org/ns/1.0">
	  </name>
	</xsl:otherwise>
      </xsl:choose>
    </respStmt>
  </xsl:template>
 
  <xsl:template match="q/@direct"/>
 
  <xsl:template match="q">
    <q xmlns="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates
	  select="@*|*|comment()|processing-instruction()|text()"/>
    </q>
  </xsl:template>
 
 
<!-- if we are reading the P4 with a DTD,
       we need to avoid copying the default values
       of attributes -->
 
  <xsl:template match="@targOrder">
    <xsl:if test="not(translate(.,$uc,$lc) ='u')">
      <xsl:attribute name="targOrder">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@opt">
    <xsl:if test="not(translate(.,$uc,$lc) ='n')">
      <xsl:attribute name="opt">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@to">
    <xsl:if test="not(translate(.,$uc,$lc) ='ditto')">
      <xsl:attribute name="to">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@default">
    <xsl:choose>
      <xsl:when test="translate(.,$uc,$lc)= 'no'"/>
      <xsl:otherwise>
	<xsl:attribute name="default">
	  <xsl:value-of select="."/>
	</xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
 
  <xsl:template match="@part">
    <xsl:if test="not(translate(.,$uc,$lc) ='n')">
      <xsl:attribute name="part">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@full">
    <xsl:if test="not(translate(.,$uc,$lc) ='yes')">
      <xsl:attribute name="full">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@from">
    <xsl:if test="not(translate(.,$uc,$lc) ='root')">
      <xsl:attribute name="from">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@status">
    <xsl:choose>
      <xsl:when test="parent::teiHeader">
	<xsl:if test="not(translate(.,$uc,$lc) ='new')">
	  <xsl:attribute name="status">
	    <xsl:value-of select="."/>
	  </xsl:attribute>
	</xsl:if>
      </xsl:when>
      <xsl:when test="parent::del">
	<xsl:if test="not(translate(.,$uc,$lc) ='unremarkable')">
	  <xsl:attribute name="status">
	    <xsl:value-of select="."/>
	  </xsl:attribute>
	</xsl:if>
      </xsl:when>
      <xsl:otherwise>
	<xsl:attribute name="status">
	  <xsl:value-of select="."/>
	</xsl:attribute>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
 
 
  <xsl:template match="@place">
    <xsl:if test="not(translate(.,$uc,$lc) ='unspecified')">
      <xsl:attribute name="place">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@sample">
    <xsl:if test="not(translate(.,$uc,$lc) ='complete')">
      <xsl:attribute name="sample">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
 
  <xsl:template match="@org">
    <xsl:if test="not(translate(.,$uc,$lc) ='uniform')">
      <xsl:attribute name="org">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
  <xsl:template match="teiHeader/@type">
    <xsl:if test="not(translate(.,$uc,$lc) ='text')">
      <xsl:attribute name="type">
	<xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:if>
  </xsl:template>
 
  <!-- yes|no to boolean -->
 
  <xsl:template match="@anchored">
    <xsl:attribute name="anchored">
      <xsl:choose>
	<xsl:when test="translate(.,$uc,$lc)='yes'">true</xsl:when>
	<xsl:when test="translate(.,$uc,$lc)='no'">false</xsl:when>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>
 
  <xsl:template match="sourceDesc/@default"/>
 
  <xsl:template match="@tei">
    <xsl:attribute name="tei">
      <xsl:choose>
	<xsl:when test="translate(.,$uc,$lc)='yes'">true</xsl:when>
	<xsl:when test="translate(.,$uc,$lc)='no'">false</xsl:when>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>
 
  <xsl:template match="@langKey"/>  
 
  <xsl:template match="@TEIform"/>  
 
<!-- assorted atts -->
  <xsl:template match="@old"/>  
 
  <xsl:template match="@mergedin">  
    <xsl:attribute name="mergedIn">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>
 
<!-- deal with the loss of div0 -->  
 
  <xsl:template match="div1|div2|div3|div4|div5|div6">
    <xsl:variable name="divName">
    <xsl:choose>
      <xsl:when test="ancestor::div0">
	<xsl:text>div</xsl:text>
	<xsl:value-of select="number(substring-after(local-name(.),'div')) + 1"/>
      </xsl:when>
      <xsl:otherwise>
	<xsl:value-of select="local-name()"/>
      </xsl:otherwise>
    </xsl:choose>
    </xsl:variable>
    <xsl:element name="{$divName}" namespace="http://www.tei-c.org/ns/1.0">
      <xsl:apply-templates select="*|@*|processing-instruction()|comment()|text()"/>
    </xsl:element>
  </xsl:template>
 
  <xsl:template match="div0">
    <div1 xmlns="http://www.tei-c.org/ns/1.0">
    <xsl:apply-templates 
	select="*|@*|processing-instruction()|comment()|text()"/>
    </div1>
  </xsl:template>
 
</xsl:stylesheet>
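
The last two templates of this stylesheet handle the disappearance of div0: a div0 becomes a div1, and every numbered div that has a div0 ancestor is shifted down one level, while numbered divs outside any div0 keep their name. As a reading aid only, the rule can be restated in a few lines of Python. This sketch is not part of the import process, and the function name p5_div_name is hypothetical.

```python
def p5_div_name(local_name: str, has_div0_ancestor: bool) -> str:
    """Restate the div renumbering rule of the two templates above.

    local_name        -- name of the source element, "div0" to "div6"
    has_div0_ancestor -- True when the element is nested inside a div0
    """
    if not (local_name.startswith("div") and local_name[3:].isdigit()):
        raise ValueError(f"not a numbered div: {local_name!r}")
    level = int(local_name[3:])
    if level > 6:
        raise ValueError(f"the templates only match div0 to div6: {local_name!r}")
    if level == 0:
        # match="div0": always emitted as div1
        return "div1"
    if has_div0_ancestor:
        # match="div1|...|div6" with ancestor::div0: shift one level down
        return f"div{level + 1}"
    # no div0 above: the name is kept unchanged
    return local_name


for name, nested in [("div0", False), ("div1", True), ("div3", True), ("div2", False)]:
    print(name, "->", p5_div_name(name, nested))
```

In every case the output element is created in the TEI namespace (http://www.tei-c.org/ns/1.0), which the sketch does not model.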

txm-front-teiperseus-xtz.xsl

This file goes in the xsl/2-front subdirectory, inside the corpus import directory, next to the XML-TEI files of Plato's texts.

 

txm-posttok-addRef-perseus.xsl

This file goes in the xsl/3-posttok subdirectory, inside the corpus import directory, next to the XML-TEI files of Plato's texts.

 

1-default-html.xsl

This file goes in the xsl/4-edition subdirectory, inside the corpus import directory, next to the XML-TEI files of Plato's texts.

 

2-default-pager.xsl

This file goes in the xsl/4-edition subdirectory, inside the corpus import directory, next to the XML-TEI files of Plato's texts.
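
Before launching the import, it can help to check that the four stylesheets listed above sit in the expected subdirectories. The following Python sketch is one possible way to do so. It is not part of TXM, the names XSL_LAYOUT and missing_stylesheets are hypothetical, and it assumes the import directory is called plato170720 as on this page. It covers only the four files named in this section.

```python
from pathlib import Path

# Stylesheet file name -> subdirectory of the corpus import directory,
# as stated in this section.
XSL_LAYOUT = {
    "txm-front-teiperseus-xtz.xsl": "xsl/2-front",
    "txm-posttok-addRef-perseus.xsl": "xsl/3-posttok",
    "1-default-html.xsl": "xsl/4-edition",
    "2-default-pager.xsl": "xsl/4-edition",
}


def missing_stylesheets(import_dir):
    """Return the relative paths of the stylesheets not found under import_dir."""
    root = Path(import_dir)
    return [
        f"{subdir}/{name}"
        for name, subdir in XSL_LAYOUT.items()
        if not (root / subdir / name).is_file()
    ]


# Assumption: the script is run from the parent of the import directory.
for path in missing_stylesheets("plato170720"):
    print("missing:", path)
```

An empty result means that all four files are in place. The check says nothing about their content.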

 

public/perseus_201707_plato.1512145668.txt.gz · Last modified: 2017/12/01 17:27 by benedicte.pincemin@ens-lyon.fr