We will use the TXM “XML/w + CSV” import module, which needs each text of the corpus to be stored in a separate file. So the first step is to split the teiCorpus.
Apply the txm-filter-teicorpustextgrid-xmlw.xsl stylesheet to the Dramen.xml file
you may use the TXM “ExecXSLMacro” for that purpose
it will create an “out” subfolder and produce 8 separate TEI files, each named by concatenating the following metadata:
creationDate
author
title
it will also add the attributes @author, @title and @creationDate to the <text> element, which will then be available for corpus exploitation
it will delete the teiHeader, which does not belong to the source text
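The naming and attribute handling described above can be pictured with a minimal Python sketch. It is not the stylesheet itself: the function names `output_name` and `prepare_tei`, the underscore separator, the replacement of spaces by hyphens and the sample metadata values are all assumptions made for illustration, and the real stylesheet reads the metadata from the teiCorpus.

```python
import xml.etree.ElementTree as ET

# Assumption: the texts use the standard TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def output_name(creation_date, author, title, sep="_"):
    """Build a file name by concatenating creationDate, author and title."""
    parts = [creation_date, author, title]
    # Assumption: spaces are replaced so the name is safe on any file system.
    return sep.join(p.strip().replace(" ", "-") for p in parts) + ".xml"


def prepare_tei(tei, creation_date, author, title):
    """Drop the <teiHeader> and copy the metadata onto the <text> element."""
    header = tei.find(f"{{{TEI_NS}}}teiHeader")
    if header is not None:
        tei.remove(header)
    text = tei.find(f"{{{TEI_NS}}}text")
    if text is None:
        raise ValueError("no <text> element found")
    text.set("creationDate", creation_date)
    text.set("author", author)
    text.set("title", title)
    return tei


# Illustrative input only; the metadata values are made up for this example.
sample = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}"><teiHeader/><text><body/></text></TEI>'
)
prepared = prepare_tei(sample, "1808", "Goethe", "Faust I")
file_name = output_name("1808", "Goethe", "Faust I")
```

Check the files in the “out” subfolder for the separator and character handling that the stylesheet actually uses.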
2. Primary tokenization
Copy the source files produced by the XSL transformation to a folder, e.g. “dramentemp”
Open TXM and use the File / Import / XML/w + CSV menu
Select the source directory (where you placed the files)
Start the import (“Play” button)
You may stop the import as soon as “Tokenization complete” is displayed on the console
3. Adjust word properties and perform final import
Get the tokenized files from the $TXMHOME/corpora/dramentemp/tokenized folder and place them in a new source folder, e.g. “dramen” (by default the folder name will become the name of the corpus)
Select the new source directory in the TXM “Import parameters of XML:w + CSV” form
Specify the main language “de” and check the “Annotate the corpus” box if you want to annotate the files
Select the txm-filter-teitextgrid-xmlw-posttok.xsl stylesheet in the “Front XSL” field. This stylesheet will:
add @ref to every word (<w>) for default concordance references (filename + page number)
normalize and transform <speaker> elements into a @who attribute of the <sp> element (to allow comparing speakers)
raise initial <pb> tags as high as possible in the xml structure
Run the import process to completion
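To make the <speaker> to @who step more concrete, here is a minimal Python sketch of the idea. It is an illustration under assumptions, not a port of the stylesheet: the function name `speakers_to_who` is hypothetical, the sample input has no namespace, the <speaker> element is left in place, and the normalization shown (collapsing whitespace, dropping a trailing period or colon) is only a guess at what a useful normalization might look like.

```python
import xml.etree.ElementTree as ET


def speakers_to_who(root):
    """Copy the text of each <speaker> into @who on its parent <sp>."""
    for sp in root.iter("sp"):
        speaker = sp.find("speaker")
        if speaker is None:
            continue  # speech without an explicit speaker label
        # Join all text inside <speaker> and collapse runs of whitespace.
        name = " ".join("".join(speaker.itertext()).split())
        # Assumed normalization: drop trailing "." or ":" so that
        # "FAUST." and "FAUST" compare as the same speaker.
        sp.set("who", name.rstrip(".:"))
    return root


# Illustrative input only.
sample = ET.fromstring(
    "<div>"
    "<sp><speaker>FAUST.</speaker><p>...</p></sp>"
    "<sp><p>no speaker label</p></sp>"
    "</div>"
)
speakers_to_who(sample)
speeches = sample.findall("sp")
```

With the speaker stored as an attribute of <sp>, it can be used as a structure property when comparing speakers in the corpus.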
4. Customize edition pages
Get the XML-TXM files from the $TXMHOME/corpora/dramen/txm/DRAMEN folder
Run the txm-edition-xmltxm-textgrid.xsl stylesheet on all XML-TXM files
all original XML-TEI elements will be transformed into <div>, <p> or <span> HTML elements with a @class created by concatenating the original TEI element name and, if available, its @type and @subtype
Save or rename the standard output files with an .html extension
Run the txm-edition-page-split.xsl stylesheet on every .html file with the parameter cssname=txm-textgrid
the results will be written to the “default” subfolder
Create a “css” subfolder in the “default” directory and copy the tei.css and txm-textgrid.css files there
Replace the original “default” folder in $TXMHOME/corpora/dramen/HTML/DRAMEN with the one you have just generated
Your TXM corpus is ready!
you can customize the style of the TXM edition by editing the txm-textgrid.css file
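When writing rules for txm-textgrid.css, it helps to keep in mind how the class names from step 4 are formed. The following Python sketch shows the concatenation rule only; the function name `html_class` and the hyphen separator are assumptions, so check the generated HTML or the shipped CSS file for the exact class names.

```python
def html_class(name, type_=None, subtype=None, sep="-"):
    """Concatenate a TEI element name with its @type and @subtype, if present."""
    parts = [name]
    if type_:
        parts.append(type_)
    if subtype:
        parts.append(subtype)
    # Assumption: the parts are joined with a hyphen.
    return sep.join(parts)


# An element without @type keeps its bare name as the class.
plain = html_class("stage")
# An element with @type gets both parts in its class name.
typed = html_class("div", "act")
# @subtype is appended last when it is available.
full = html_class("div", "act", "first")
```

Under these assumptions, a CSS selector such as `.div-act` would target every TEI <div type="act"> in the edition.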
public/tutorial_textgriddramen.txt · Last modified: 2016/06/23 14:01 by slh@ens-lyon.fr