Tutorial for patching page numbers in the SCAT corpus

This tutorial was produced as part of the SCAT-TXM joint project.

  1. Download and unzip the import pack containing stylesheets and import settings for SCAT; this will create a SCAT2018 directory
  2. Copy the source files into the SCAT2018 directory
  3. Make sure that page numbers are unique in each source text
    • You may use the “check-page-duplicates.xsl” stylesheet from the xsl/lib subfolder of the import pack to check for duplicate or missing page numbers
    • Apply it to any XML file in the corpus source directory
  4. Import the SCAT corpus using the XTZ module
  5. Close TXM
  6. Access the binary corpus directory $USER/TXM/corpora/SCAT2018
  7. Apply the “txm-import-pageNamePatch-scat.xsl” stylesheet from xsl/lib to import.xml in the binary corpus directory
    • This will copy the original import.xml file to import-back.xml
    • A “main” directory will be created in the HTML/SCAT2018 subfolder
    • Save the result of the transformation as import.xml (overwrite the original file)
  8. Copy the “css” folder from “default” to “main”
  9. The original “default” folder will no longer be used; you may delete it to save disk space
  10. Open TXM and check that the page numbers shown at the top of the edition pages match those in the navigation panel
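
The uniqueness check in step 3 can also be sketched outside XSLT. The Python example below is not the “check-page-duplicates.xsl” stylesheet and does not reproduce its logic; it is a minimal illustration of the idea. It assumes that page breaks are encoded as pb elements carrying the page number in an n attribute, which may not match the SCAT sources. The function name check_page_numbers and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_page_numbers(xml_text, element="pb", attribute="n"):
    """Return (duplicates, unnumbered_count) for page-break elements.

    Assumption: page breaks look like <pb n="..."/>; adjust `element`
    and `attribute` if the source files use another encoding.
    """
    root = ET.fromstring(xml_text)
    values = []
    unnumbered = 0
    for node in root.iter():
        if not isinstance(node.tag, str) or local_name(node.tag) != element:
            continue
        value = node.get(attribute)
        if value is None or not value.strip():
            unnumbered += 1  # page break with no usable number
        else:
            values.append(value.strip())
    counts = Counter(values)
    duplicates = sorted(v for v, c in counts.items() if c > 1)
    return duplicates, unnumbered


# Hypothetical sample: page "2" appears twice, one page break has no number.
sample = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <pb n="1"/><p>first page</p>
  <pb n="2"/><p>second page</p>
  <pb n="2"/><p>numbered twice</p>
  <pb/><p>no number</p>
</text>"""

duplicates, unnumbered = check_page_numbers(sample)
print("duplicate page numbers:", duplicates)
print("page breaks without a number:", unnumbered)
```

To check a real source file, read its content and pass it to check_page_numbers; an empty list and a count of zero suggest the file is ready for import under the stated assumption.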

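Steps 8 and 9 are plain file operations and can be scripted. The sketch below assumes that the “default” and “main” folders both sit in the HTML/SCAT2018 subfolder of the binary corpus directory, which the steps above imply but do not state outright. The function name finish_patch and the file name txm.css are hypothetical, and the demonstration runs in a temporary directory rather than on a real corpus.

```python
import shutil
import tempfile
from pathlib import Path


def finish_patch(corpus_dir, corpus_name="SCAT2018", delete_default=False):
    """Copy the css folder from 'default' to 'main'; optionally delete 'default'.

    Assumption: both folders live in <corpus_dir>/HTML/<corpus_name>.
    """
    editions = Path(corpus_dir) / "HTML" / corpus_name
    default_dir = editions / "default"
    main_dir = editions / "main"
    if not main_dir.is_dir():
        raise FileNotFoundError(f"{main_dir} not found; apply the stylesheet first")
    # dirs_exist_ok requires Python 3.8 or later
    shutil.copytree(default_dir / "css", main_dir / "css", dirs_exist_ok=True)
    if delete_default:
        shutil.rmtree(default_dir)
    return main_dir / "css"


# Demonstration on a throwaway directory tree (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    editions = Path(tmp) / "HTML" / "SCAT2018"
    (editions / "default" / "css").mkdir(parents=True)
    (editions / "default" / "css" / "txm.css").write_text("body {}")
    (editions / "main").mkdir()

    css_dir = finish_patch(tmp, delete_default=True)
    copied = sorted(p.name for p in css_dir.iterdir())
    default_still_there = (editions / "default").exists()

print("copied files:", copied)
print("default folder still present:", default_still_there)
```

Leaving delete_default at False keeps the “default” folder in place, which may be the safer choice until the page numbers have been checked in TXM.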