TXM 0.7.2 feedback

This page contains only problem reports for TXM 0.7.2. Feature requests go here: Feature requests

TXM 0.7.2 bug archive

Feedback format

  • tester's initials (date): description of the problem
    • Example

BP (2011-06-28): the form…

  • initials (date), reply (“OK”, “to be retested”…)
  • further dialogue, if any
  • use the <quote> tag for quotations
  • use the <code> tag for script excerpts

Participants

  • Matthieu Decorde : MD
  • Alexei Lavrentiev : AL
  • Bénédicte Pincemin : BP
  • Frédéric Erlos : FE
  • Julien Bonneau : JB
  • Serge Heiden : SH

Feedback

General

CWB import

  • KN (2013-08-09): As a CWB user, I am trying to feed TXM with corpora that come from CWB scripts (a set of .vrt files). The import fails every time, however, because TXM looks for a .wtc file, which is obviously not there (error: “No WTC file in source directory”). Since, as someone already pointed out, the .wtc format is not well documented (is it?), I would like to ask whether there is an easy way to convert a .vrt corpus to a .wtc file. I am sure that corpus size is not the issue: the import fails with both large and small corpora. My OS: Ubuntu/Xubuntu/Kubuntu 64-bit 13.04.
    • SH (2013-08-11): Currently, the only WTC/CWB format(*) that TXM imports is the one TXM itself generates in its various import modules (for examples, look in the 'wtc' directories of corpora already imported into TXM, under the '$HOME/TXM/corpora' directory).
    • Here is a simple definition:
      1. a corpus is defined by exactly one wtc file. The TXM CWB import module must be given the full path to the source directory that contains the wtc file, and optionally a cqp registry file;
      2. a wtc file must have the '.wtc' extension. Note 1: The IMS Corpus Workbench project never gave a name to that format. Files may have extensions like '.vrt', '.cqp', '.cwb' or '.wtc' depending on context. The 'wtc' extension was coined in the now-defunct Weblex project. Note 2: in the future TXM will use the '.cwb' extension;
      3. the wtc format follows the general format used by CWB to compile corpus sources for the CQP search engine (encode, makeall…);
      4. positional attributes are automatically named 'p1', 'p2'… unless you drop a standard cqp 'registry' file for your corpus, containing the correct information, into the source directory;
      5. the wtc file encodes all the text units of the corpus. To get (very basic) text editions built in TXM, you need to use the TXM format for the text unit tags:

Each text unit must be delimited by the following tags:

<text id="textid1" @your-attributes...>
...
</text>

Each text unit must have a different @id attribute value.
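
The uniqueness rule above can be checked before launching an import. The following is a minimal sketch, not part of TXM: the function name `duplicate_text_ids` is hypothetical, and it assumes that each opening <text> tag sits on its own line with a double-quoted id attribute.

```python
import re
from collections import Counter

# Matches an opening <text ...> tag and captures the value of its id attribute.
# Assumption: the tag starts the line and the value is double-quoted.
TEXT_TAG = re.compile(r'<text\b[^>]*?\bid="([^"]*)"')


def duplicate_text_ids(lines):
    """Return the sorted id values that appear on more than one <text> tag."""
    counts = Counter()
    for line in lines:
        match = TEXT_TAG.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return sorted(text_id for text_id, n in counts.items() if n > 1)
```

It can be fed an open file object directly, for example `duplicate_text_ids(open(path, encoding="utf-8"))`, where `path` points to the wtc file; an empty list means that every text unit has its own id.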

(*) Although the 'WTC import module' has already been renamed 'CWB import module' in TXM 0.7.2, its parameter window still uses the 'wtc' name and it still expects the '.wtc' file extension. Note that this will change in the near future: the file extension will need to be '.cwb'.
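
As an illustration of the rules above (one file per corpus, each text unit wrapped in <text id="…"> tags, extension '.wtc' for now), here is a hedged sketch of how a set of .vrt files might be gathered into a single file. It is not a TXM tool: the function `merge_vrt_to_wtc` and its parameters are hypothetical. It assumes that each .vrt file holds exactly one text unit without its own <text> tags and that the files are UTF-8 encoded. Whether TXM accepts the result has not been verified here. The extension is a parameter because it is expected to change to '.cwb'.

```python
from pathlib import Path
from xml.sax.saxutils import quoteattr


def merge_vrt_to_wtc(source_dir, corpus_name, extension=".wtc"):
    """Concatenate all .vrt files of source_dir into one file, one <text> unit per file.

    Assumption: each .vrt file is a single text unit with no <text> tags of its own.
    The file name (without '.vrt') is used as the id, so ids are unique in a directory.
    """
    source = Path(source_dir)
    target = source / (corpus_name + extension)
    with target.open("w", encoding="utf-8") as out:
        for vrt in sorted(source.glob("*.vrt")):
            content = vrt.read_text(encoding="utf-8")
            if content and not content.endswith("\n"):
                content += "\n"  # keep the closing tag on its own line
            # quoteattr adds the surrounding quotes and escapes special characters
            out.write(f"<text id={quoteattr(vrt.stem)}>\n")
            out.write(content)
            out.write("</text>\n")
    return target
```

The returned path lies inside the source directory, which is the directory the CWB import module is given according to point 1 above.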

public/retours_de_bugs_logiciel/txm_0.7.2.txt · Last modified: 2013/08/11 13:53 by slh@ens-lyon.fr