TXM XTZ Import Tutorial for DEMM 2015-03-12 Workshop

This tutorial was written for the TXM workshop at the DEMM (Digital Editing of Medieval Manuscripts) programme training session in Lyon (12 March 2015). The working plan of the session is available in a Google document.

Software Preparation

  • Copy (merge) the content of the archive into the “USERHOME/TXM” folder. To do so:
    • unzip the archive on your disk
    • then copy the folders named “scripts”, “xsl” and “css” into the “USERHOME/TXM” folder
  • Start TXM
  • Call the 'File' / 'Import' / 'XML/w+CSV' import module and create an import configuration file (“import.xml”) by selecting the source folder in the import parameters form
  • Open the XTZImporter folder in the 'View' / 'Macro' view
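If you prefer a script to a file manager, the merge step can be sketched as follows. This is not part of TXM: it assumes Python 3.8 or later, that USERHOME/TXM is the “TXM” folder in your home directory, and that the unzipped archive contains the “scripts”, “xsl” and “css” folders at its top level. `merge_into_txm_home` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

# Folders from the workshop archive that must be merged into USERHOME/TXM
FOLDERS = ("scripts", "xsl", "css")


def merge_into_txm_home(unzipped_archive, txm_home):
    """Copy scripts/, xsl/ and css/ from the unzipped archive into txm_home,
    merging them with existing folders instead of replacing them."""
    copied = []
    for name in FOLDERS:
        src = Path(unzipped_archive) / name
        if not src.is_dir():
            raise FileNotFoundError(f"missing folder in archive: {src}")
        dst = Path(txm_home) / name
        # dirs_exist_ok=True (Python 3.8+) merges into an existing folder;
        # files with the same name are overwritten by the archive version,
        # other files already present in dst are left untouched.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

For example, `merge_into_txm_home(Path.home() / "XTZImporterPack", Path.home() / "TXM")` if the archive was unzipped into your home directory (location assumed). Merging rather than replacing keeps any scripts or stylesheets you already had in USERHOME/TXM.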

Preparing the sources folder

The source folder must:

  • contain the “import.xml” file
    • note that the “Front XSL” import option is currently incompatible with the macro's “synoptic” option
  • contain the “import.properties” file with the following lines:
    ignoredelements=note|teiHeader
    editionpage=cb
  • if the “synoptic” option is used, the source folder must also contain an “img” folder with an “xxx” subfolder for each “xxx.xml” source document. Each subfolder contains the images of the pages of the edition; the alphabetical order of the image file names must match the page order of the text.
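Before launching the macro, the requirements above can be checked with a small script. The sketch below is hypothetical (the function names are illustrative, not part of TXM). It assumes that import.properties uses simple key=value lines and that every “*.xml” file in the folder other than import.xml is a source document. It also lists the images of one text in plain alphabetical (code-point) order; whether TXM sorts file names in exactly this way is an assumption.

```python
from pathlib import Path


def read_properties(path):
    """Minimal reader for key=value lines (no escapes or continuation lines)."""
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and Java-style comment lines
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_source_folder(src, synoptic):
    """Return a list of problems; an empty list means the folder looks ready."""
    src = Path(src)
    problems = []
    if not (src / "import.xml").is_file():
        problems.append("missing import.xml")
    props_file = src / "import.properties"
    if not props_file.is_file():
        problems.append("missing import.properties")
    else:
        props = read_properties(props_file)
        if props.get("editionpage") != "cb":
            problems.append("import.properties: expected editionpage=cb")
        ignored = set(props.get("ignoredelements", "").split("|"))
        if not {"note", "teiHeader"} <= ignored:
            problems.append("import.properties: ignoredelements should include note|teiHeader")
    if synoptic:
        # Each source document xxx.xml needs an img/xxx subfolder
        for xml in sorted(src.glob("*.xml")):
            if xml.name == "import.xml":
                continue
            if not (src / "img" / xml.stem).is_dir():
                problems.append(f"missing image folder img/{xml.stem}")
    return problems


def page_images(src, text_id):
    """Image file names of one text, in plain alphabetical (code-point) order."""
    folder = Path(src) / "img" / text_id
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```

Because the order is alphabetical rather than numeric, a file named p10.jpg sorts before p2.jpg. Zero-padding the page numbers (p01.jpg, p02.jpg, p10.jpg) keeps the image order aligned with the page order.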

A. Import sources + "stylable" edition

  • In TXM, open the 'View' / 'Macro' view and launch the XTZImporter macro
    • Select the following options:
      • sourceDirectory : '…/XTZImporterPack/EXCERPTUMROBERTI-XML-TEI'
      • SpannedEdition : 'true'
      • synoptic : 'false'
      • mediapath : ''
      • facsEditionName : ''
    • Run the macro
    • The console should display something like:

Running XTZImporterMacro.groovy
Parameters: 
	srcDirectory: /media/alexey/data/Mes documents/Enseignement/DEMM 2015-03-12/XTZImporterPack/EXCERPTUMROBERTI-XML-TEI
	SpannedEdition: true
	synoptic: false
	mediaPath: img/
	facsEditionName: facs
Trying to read import properties file: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/EXCERPTUMROBERTI-XML-TEI/import.properties
import properties: 
 sort metadata: null
 edition page tag: cb
 normalize attributes: false
 ignored elements: note|teiHeader|back
 stop if a XML source is malformed: false
-- Apply xsl /media/alexey/data/TXM/xsl/txm-filter-teip5-xmlw-excerptumroberti.xsl with parameters: {}
.

Trying to read metadatas from: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/EXCERPTUMROBERTI-XML-TEI/metadata.csv
-- IMPORTER - Reading source files
Sources validation
.
Files processed: [/home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI/excerptumroberti-tei.xml]
Tokenizing 1 files
.
Building XML-TXM (1 files)
.
-- INJECTING METADATA - [text, author, title, creationdatedesc, creationdate, msidentifier, msdate, msdatedesc, language, copyright] in texts of directory /home/alexey/TXM/corpora/excerptumrobertixmltei/txm
.
-- ANNOTATE - Running NLP tools
Building TT source files (1) from directory /home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI
.
Applying la.par TreeTagger model on dir: /home/alexey/TXM/corpora/excerptumrobertixmltei/treetagger (1 files)
.
Building stdoff files (1) from dir:/home/alexey/TXM/corpora/excerptumrobertixmltei/treetagger to /home/alexey/TXM/corpora/excerptumrobertixmltei/annotations
.
Injecting stdoff files (1) data from /home/alexey/TXM/corpora/excerptumrobertixmltei/annotations to xml-txm files of /home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI
.
-- COMPILING - Building Search Engine indexes
Compiling 1 [/home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI/excerptumroberti-tei.xml] 
.
P-attributes: [id, n, lalemma, type, lapos]
S-attributes: [ab:0+n, author:0+n, back:0+n, bibl:1+id+subtype+type+n, body:0+n, cb:0+break+n, div:0+n+type, gap:0+reason+n, head:0+n, lb:0+break+n, listbibl:0+n, name:0+ref+type+n, note:0+id+n+type, p:0+n, pb:0+break+n, quote:0+source+n, ref:0+target+n, text:0+id+base+project+author+title+text+creationdatedesc+msidentifier+copyright+language+msdate+creationdate+msdatedesc, title:0+n, txmcorpus:0+lang]
-- EDITION - Building edition
Paginating texts: [/home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI/excerptumroberti-tei.xml]
.-- Building Spanned edition...
ARGS=[monitor:Running XTZImporterMacro.groovy(200), params:BaseParameters [name=excerptumrobertixmltei, date=Thu Jan 15 00:03:00 CET 11, author=alexey, version=0.7, description=,
 links={}, corpora={EXCERPTUMROBERTIXMLTEI=[corpus: null]},
 root=[import: null], corporaElement=[corpora: null]], binDirectory:/home/alexey/TXM/corpora/excerptumrobertixmltei, txmDirectory:/home/alexey/TXM/corpora/excerptumrobertixmltei/txm/EXCERPTUMROBERTIXMLTEI, corpus:/media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/EXCERPTUMROBERTI-XML-TEI, xslEdition:txm-edition-xtz-cb.xsl, xslPages:txm-edition-page-split.xsl, editionname:default, useTokenizedDirectory:false]
Parameters:
	xslEdition = txm-edition-xtz-cb.xsl
	xslPages = txm-edition-page-split.xsl
	editionName = default
	useTokenizedDirectory = false
XSLs: txm-edition-xtz-cb.xsl & txm-edition-page-split.xsl

Backup of /home/alexey/TXM/corpora/excerptumrobertixmltei/HTML/EXCERPTUMROBERTIXMLTEI/default directory to /home/alexey/TXM/corpora/excerptumrobertixmltei/HTML-default-back...
......
Applying XSL 1: /home/alexey/TXM/xsl/txm-edition-xtz-cb.xsl...
.
Applying XSL 2: /home/alexey/TXM/xsl/txm-edition-page-split.xsl...
.
New edition created.
Loading corpus...
Running SearchEngine in memory mode.
Reloading subcorpora and partitions...Done.
Reloading corpora view...
import done.
Done: 27506 ms 
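The S-attributes line in the log lists the structural units indexed for the corpus, for example `cb:0+break+n`. Reading each entry as `element:number+attribute+attribute` (our interpretation of the log format; the meaning of the number is not explained in this tutorial) makes it easy to check which attributes of an element such as `cb` are searchable. The hypothetical helper below parses such a line.

```python
def parse_s_attributes(line):
    """Parse a console line like 'S-attributes: [cb:0+break+n, div:0+n+type]'
    into {'cb': (0, ['break', 'n']), 'div': (0, ['n', 'type'])}.
    Assumes the 'element:number+attr+attr' format seen in the TXM log."""
    body = line.split(":", 1)[1].strip().strip("[]")
    result = {}
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, rest = item.partition(":")
        number, *attrs = rest.split("+")
        result[name] = (int(number), attrs)
    return result
```

Run on the S-attributes line of the part B log, it should show that `cb` also carries the `facs` attribute, which the synoptic option adds to the `cb` elements.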

B. Import sources + synoptic edition + "stylable" edition

  • In TXM, call the 'File' / 'Import' / 'XML/w+CSV' import module and select the “XTZImporterPack/EXCERPTUMROBERTI-XML-TXMW” source folder
  • Select “la” as the main language
  • Start import

The following step is only necessary on Linux and Mac OS systems (where directory names are case sensitive):

  • Once the import has finished, open the import.xml file
    • on line 5, replace
        name="excerptumrobertixmltxmw"
      with
        name="EXCERPTUMROBERTIXMLTXMW"
  • Open the 'View' / 'Macro' view
  • Double-click on the “org/txm/macro/importer/XTZImporter” macro
    • Select the following options:
      • sourceDirectory : '…/XTZImporterPack/EXCERPTUMROBERTI-XML-TXMW'
      • SpannedEdition : 'true'
      • synoptic : 'true'
      • mediapath : ''
      • facsEditionName : ''
    • Run the macro
    • The console should display something like:

Running XTZImporterMacro.groovy
Parameters: 
	srcDirectory: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw
	SpannedEdition: true
	synoptic: true
	mediaPath: img/
	facsEditionName: facs
Trying to read import properties file: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw/import.properties
import properties: 
 sort metadata: null
 edition page tag: cb
 normalize attributes: false
 ignored elements: note|teiHeader|back
 stop if a XML source is malformed: false
-- Adding the cb@facs attributes
ARGS=[monitor:Running XTZImporterMacro.groovy(267), sourceDirectory:/media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw, imageDirectory:/media/alexey/data/Mes documents/Enseignement/DEMM 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw/img, outputDirectory:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/facs_src, element:cb, attribute:facs, prefix:img/]
Parameters:
	sourceDirectory: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw
	imageDirectory: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw/img
	outputDirectory: /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/facs_src
	element: cb
	attribute: facs
srcFiles=[/media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw/excerptumroberti-w.xml]
Processing 'excerptumroberti-w' directory with 5 images.
switching srcDir to /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/facs_src
Trying to read metadatas from: /media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw/metadata.csv
-- IMPORTER - Reading source files
Sources validation
.
Files processed: [/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW/excerptumroberti-w.xml]
Tokenizing 1 files
.
Building XML-TXM (1 files)
.
-- INJECTING METADATA - [text, author, title, creationdatedesc, creationdate, msidentifier, msdate, msdatedesc, language, copyright] in texts of directory /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm
.
-- ANNOTATE - Running NLP tools
Building TT source files (1) from directory /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW
.
Applying en.par TreeTagger model on dir: /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/treetagger (1 files)
.
Building stdoff files (1) from dir:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/treetagger to /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/annotations
.
Injecting stdoff files (1) data from /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/annotations to xml-txm files of /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW
.
-- COMPILING - Building Search Engine indexes
Compiling 1 [/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW/excerptumroberti-w.xml] 
.
P-attributes: [id, ref, enpos, enlemma, n, type, mlalemma, mlapos]
S-attributes: [ab:0+n, author:0+n, back:0+n, bibl:1+id+subtype+type+n, body:0+n, cb:0+break+facs+n, div:0+n+type, gap:0+reason+n, head:0+n, lb:0+break+n, listbibl:0+n, name:0+ref+type+n, note:0+id+n+type, p:0+n, pb:0+break+n, quote:0+source+n, ref:0+target+n, text:0+id+base+project+author+title+text+creationdatedesc+msidentifier+copyright+language+msdate+creationdate+msdatedesc, title:0+n, txmcorpus:0+lang]
-- EDITION - Building edition
Paginating texts: [/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW/excerptumroberti-w.xml]
.-- Building the 'facs' image edition...
ARGS=[monitor:Running XTZImporterMacro.groovy(267), params:BaseParameters [name=EXCERPTUMROBERTIXMLTXMW, date=Fri Jan 15 00:03:00 CET 12, author=alexey, version=0.7, description=,
 links={}, corpora={EXCERPTUMROBERTIXMLTXMW=[corpus: null]},
 root=[import: null], corporaElement=[corpora: null]], binDirectory:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW, txmDirectory:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW, corpus:/media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw, editionName:facs, tag:cb, attribute:facs]
Parameters:
	attribute = facs
	tag =  cb
	editionName =  facs
Working directory=/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW
** Updating corpus configuration...
** Building new edition HTML files...
 Creating edition 'facs' directory: '/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/HTML/EXCERPTUMROBERTIXMLTXMW/facs'
 Building HTML pages of text=excerptumroberti-w
 add page 0 w_0
 add page 1 w_0
 add page 2 w_ExcerptumRoberti_col_1-5_1
 add page 3 w_ExcerptumRoberti_col_1-5_176
 add page 4 w_ExcerptumRoberti_col_1-5_390
 add page 5 w_ExcerptumRoberti_col_1-5_626
 Building edition references in corpus configuration
 Saving corpus configuration...
New edition created.
copying images into binary corpus: /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/HTML/EXCERPTUMROBERTIXMLTXMW/facs/img ...
copy done
-- Building Spanned edition...
ARGS=[monitor:Running XTZImporterMacro.groovy(267), params:BaseParameters [name=EXCERPTUMROBERTIXMLTXMW, date=Fri Jan 15 00:03:00 CET 12, author=alexey, version=0.7, description=,
 links={}, corpora={EXCERPTUMROBERTIXMLTXMW=[corpus: null]},
 root=[import: null], corporaElement=[corpora: null]], binDirectory:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW, txmDirectory:/home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/txm/EXCERPTUMROBERTIXMLTXMW, corpus:/media/alexey/data/Mes documents/Enseignement/DEMM Erasmus M. Burghart 2015-03-12/XTZImporterPack/excerptumroberti-xml-txmw, xslEdition:txm-edition-xtz-cb.xsl, xslPages:txm-edition-page-split.xsl, editionname:default, useTokenizedDirectory:false]
Parameters:
	xslEdition = txm-edition-xtz-cb.xsl
	xslPages = txm-edition-page-split.xsl
	editionName = default
	useTokenizedDirectory = false
XSLs: txm-edition-xtz-cb.xsl & txm-edition-page-split.xsl

Backup of /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/HTML/EXCERPTUMROBERTIXMLTXMW/default directory to /home/alexey/TXM/corpora/EXCERPTUMROBERTIXMLTXMW/HTML-default-back...
......
Applying XSL 1: /home/alexey/TXM/xsl/txm-edition-xtz-cb.xsl...
.
Applying XSL 2: /home/alexey/TXM/xsl/txm-edition-page-split.xsl...
.
New edition created.
Loading corpus...
Running SearchEngine in memory mode.
Reloading subcorpora and partitions...Done.
Reloading corpora view...
import done.
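On Linux and Mac OS, the manual import.xml correction of part B (putting the corpus name on line 5 in upper case) can also be scripted. The sketch below is a hypothetical helper, not a TXM feature: it assumes, as the instructions state, that the lowercase name sits on line 5, and it changes nothing if that line does not contain the expected attribute.

```python
from pathlib import Path


def uppercase_corpus_name(import_xml, name, line_no=5):
    """On the given line of import.xml, replace name="<name>" with the
    upper-case name. Returns True if the file was changed."""
    path = Path(import_xml)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    old = f'name="{name}"'
    idx = line_no - 1
    # Refuse to edit if the expected attribute is not on that line
    if idx >= len(lines) or old not in lines[idx]:
        return False
    lines[idx] = lines[idx].replace(old, f'name="{name.upper()}"')
    path.write_text("".join(lines), encoding="utf-8")
    return True
```

Call it with the path of the import.xml file in the source folder and the name "excerptumrobertixmltxmw", after the first import and before launching the macro. Checking the exact line, rather than replacing every occurrence, mirrors the single manual edit described above.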

public/tutorial_macro_xtzimporter_en_demm20150312.txt · Last modified: 2015/10/16 16:25 by alexei.lavrentev@ens-lyon.fr