
Talismane Command-Line Options

Option                                  Description                            
------                                  -----------                            
-?, --help                              show help                              
--algorithm <MachineLearningAlgorithm>  machine learning algorithm: [MaxEnt,   
                                          LinearSVM, Perceptron,               
--analyse                               analyse text                           
--beamWidth <Integer>                   beam width in pos-tagger and parser    
                                          beam search                          
--blockSize <Integer>                   The block size to use when applying
                                          filters; if a text filter regex
                                          match extends beyond the block size,
                                          Talismane will fail.
--builtInTemplate                       pre-defined output template:           
  <Talismane$BuiltInTemplate>             [standard, with_location, with_prob, 
                                          with_comments, original]             
--compare                               compare two annotated corpora          
--crossValidationSize <Integer>         number of cross-validation folds       
--csvEncoding <String>                  CSV file encoding in output            
--csvLocale <String>                    CSV file locale in output              
--csvSeparator <String>                 CSV file separator in output           
--cutoff <Integer>                      minimum number of distinct events in
                                          which a feature must appear to be
                                          included in the model
--earlyStop <Boolean>                   stop as soon as the beam contains n    
                                          terminal configurations              
--encoding <String>                     encoding for input and output          
--endModule <Talismane$Module>          where to end analysis:                 
                                          [languageDetector, sentenceDetector, 
                                          tokeniser, posTagger, parser]        
--evalFile <File>                       evaluation corpus file                 
--evalPattern <String>                  input pattern for evaluation           
--evalPatternFile <File>                input pattern file for evaluation      
--evaluate                              evaluate annotated corpus              
--excludeIndex <Integer>                cross-validation index to exclude for  
--features <File>                       a file containing the training feature 
--inFile <File>                         input file or directory                
--includeIndex <Integer>                cross-validation index to include for  
--includeUnknownWordResults <Boolean>   if true, will add a file ending with
                                          ".lexiconCoverage.csv" giving lexicon
                                          word coverage
--inputEncoding <String>                encoding for input                     
--inputPattern <String>                 input pattern                          
--inputPatternFile <File>               input pattern file                     
--iterations <Integer>                  the number of training iterations      
                                          (MaxEnt, Perceptron)                 
--keepDirStructure <Boolean>            for analyse and process: if true, and  
                                          inFile is a directory, outFile will  
                                          be generated as a directory and the  
                                          inFile directory structure will be   
--labeledEvaluation <Boolean>           if true, takes both governor and       
                                          dependency label into account when   
                                          determining errors                   
--languageCorpusMap <File>              a file giving a mapping of languages
                                          to corpora for language detection
--languageModel <File>                  statistical model for language         
--lexicalEntryRegex <File>              file describing regex for reading      
                                          lexical entries in the corpus        
--lexicon <File>                        semi-colon delimited list of pre-      
                                          compiled lexicon files               
--linearSVMCost <Double>                parameter C, typical values are powers 
                                          of 2, from 2^-5 to 2^5               
--linearSVMEpsilon <Double>             parameter epsilon, typical values are  
                                          0.01, 0.05, 0.1, 0.5                 
--locale <String>                       locale                                 
--logConfigFile <File>                  logback configuration file             
--maxParseAnalysisTime <Integer>        maximum time, in seconds, to spend
                                          parsing a sentence before leaving
                                          the parse as is
--minFreeMemory <Integer>               minimum amount of remaining free       
                                          memory to continue a parse, in       
--mode <Talismane$Mode>                 execution mode: [normal, server]       
--module <Talismane$Module>             training / evaluation / processing     
                                          module: [languageDetector,           
                                          sentenceDetector, tokeniser,         
                                          posTagger, parser]                   
--newline <String>                      how to handle newlines: options are    
                                          SPACE (will be replaced by a space)  
                                          and SENTENCE_BREAK (will break       
--oneVsRest <Boolean>                   should we treat each outcome explicitly
                                          as one vs. rest, allowing for an
                                          event to have multiple outcomes?
--option <Talismane$ProcessingOption>   process command option: [output,       
--outDir <File>                         output directory (for writing          
                                          evaluation and analysis files other  
                                          than the standard output)            
--outFile <File>                        output file or directory (when inFile  
                                          is a directory)                      
--outputDivider <String>                a string to insert between sections
                                          marked for output (e.g. XML tags to
                                          be kept in the analysed output). The
                                          string NEWLINE is interpreted as
                                          "\n"; any other value is used
                                          literally.
--outputEncoding <String>               encoding for output                    
--parserModel <File>                    statistical model for dependency       
--parserRules <File>                    semi-colon delimited list of files     
                                          containing parser rules              
--port <Integer>                        which port to listen on                
--posTaggerModel <File>                 statistical model for pos-tagging      
--posTaggerRules <File>                 semi-colon delimited list of files     
                                          containing pos-tagger rules          
--predictTransitions                    should the transitions leading to the  
  <Parser$PredictTransitions>             corpus dependencies be predicted -   
                                          normally only required for training  
                                          (leave at "depends"). Options are:   
                                          [yes, no, depends]                   
--process                               process annotated corpus               
--processByDefault <Boolean>            If true, the input file is processed   
                                          from the very start (e.g. TXT files).
                                          If false, we wait until a text       
                                          filter tells us to start processing  
                                          (e.g. XML files).                    
--propagateBeam <Boolean>               should we propagate the pos-tagger     
                                          beam to the parser                   
--sentenceAnnotators <File>             semi-colon delimited list of files     
                                          containing sentence annotators       
--sentenceCount <Integer>               max sentences to process               
--sentenceFile <File>                   the text of sentences represented by   
                                          the tokenised input is provided by   
                                          this file, one sentence per line     
--sentenceModel <File>                  statistical model for sentence         
--sessionId <String>                    the current session id; configuration
                                          is read from talismane.core.[sessionId]
--startModule <Talismane$Module>        where to start analysis (or            
                                          evaluation): [languageDetector,      
                                          sentenceDetector, tokeniser,         
                                          posTagger, parser]                   
--startSentence <Integer>               first sentence index to process        
--suffix <String>                       suffix to all output files             
--template <File>                       user-defined template for output       
--testWords <String>                    comma-delimited test words for pos-    
                                          tagger feature tester                
--textAnnotators <File>                 semi-colon delimited list of files     
                                          containing text annotators           
--tokeniserBeamWidth <Integer>          beam width in tokeniser beam search    
--tokeniserModel <File>                 statistical model for tokenisation     
--tokeniserPatterns <File>              a file containing the patterns for     
                                          tokeniser training                   
--train                                 train model                            
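
The --beamWidth option (and --tokeniserBeamWidth for the tokeniser) sets how many partial analyses are kept at each step of beam search. The sketch below is a generic beam search, not Talismane's implementation: the tuple-of-tags state, the expand function and the toy scores are hypothetical, and --earlyStop is not modelled. With a width of 1 the search is greedy, while a width of 2 can recover a sequence whose first step scored lower.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

State = Tuple[str, ...]  # hypothetical state: the tags assigned so far


def beam_search(
    initial: State,
    expand: Callable[[State], Iterable[Tuple[State, float]]],
    is_terminal: Callable[[State], bool],
    beam_width: int,
) -> Tuple[float, State]:
    """Return the best (log_score, state) found when keeping beam_width states per step."""
    beam: List[Tuple[float, State]] = [(0.0, initial)]
    while not all(is_terminal(state) for _, state in beam):
        candidates: List[Tuple[float, State]] = []
        for score, state in beam:
            if is_terminal(state):
                # Finished analyses compete unchanged with the extended ones.
                candidates.append((score, state))
                continue
            for next_state, increment in expand(state):
                candidates.append((score + increment, next_state))
        # Prune: keep only the beam_width highest-scoring candidates.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])


# Hypothetical toy model: log-scores for the first tag, then for each tag given the previous one.
FIRST = {"A": -0.1, "B": -0.5}
NEXT = {"A": {"A": -3.0, "B": -3.0}, "B": {"A": -0.1, "B": -2.0}}


def expand(state: State) -> List[Tuple[State, float]]:
    if not state:
        return [((tag,), score) for tag, score in FIRST.items()]
    return [(state + (tag,), score) for tag, score in NEXT[state[-1]].items()]


def is_terminal(state: State) -> bool:
    return len(state) == 2  # two-word toy sentence


greedy = beam_search((), expand, is_terminal, beam_width=1)
wide = beam_search((), expand, is_terminal, beam_width=2)
print(greedy[1], wide[1])  # ('A', 'A') ('B', 'A')
```

Here the greedy search commits to A for the first word and ends at a log-score of about -3.1, while the wider beam keeps B alive and finds B A at about -0.6.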
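
The --cutoff option filters out rare features before training. A minimal sketch of such a filter, assuming each training event is represented as a collection of feature names (a hypothetical representation; Talismane's internal feature objects differ) and that a feature is kept when it occurs in at least cutoff distinct events:

```python
from collections import Counter
from typing import Iterable, Set


def features_above_cutoff(events: Iterable[Iterable[str]], cutoff: int) -> Set[str]:
    """Return the features that appear in at least `cutoff` distinct events."""
    event_counts = Counter()
    for features in events:
        # A feature firing several times in one event still counts once.
        event_counts.update(set(features))
    return {feature for feature, count in event_counts.items() if count >= cutoff}


# Hypothetical feature names for three training events.
events = [
    ["suffix=s", "prev=le"],
    ["suffix=s", "suffix=s"],
    ["prev=la"],
]
print(features_above_cutoff(events, cutoff=2))  # {'suffix=s'}
```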
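
The --labeledEvaluation option decides what counts as a parser error. As a sketch (the data layout, the use of 0 for the root and the dependency labels are illustrative, not Talismane's evaluation code): when the option is false only the governor must match, and when it is true both governor and label must match, roughly the distinction between unlabeled and labeled attachment scores.

```python
from typing import Sequence, Tuple

# One (governor_index, dependency_label) pair per token; 0 stands for the root here.
Arc = Tuple[int, str]


def attachment_accuracy(gold: Sequence[Arc], predicted: Sequence[Arc], labeled: bool) -> float:
    """Fraction of tokens whose governor (and, if labeled, label) is correct."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    correct = sum(
        1
        for (gold_gov, gold_label), (pred_gov, pred_label) in zip(gold, predicted)
        if gold_gov == pred_gov and (not labeled or gold_label == pred_label)
    )
    return correct / len(gold)


gold = [(2, "suj"), (0, "root"), (2, "obj")]
predicted = [(2, "suj"), (0, "root"), (2, "mod")]
print(attachment_accuracy(gold, predicted, labeled=False))  # 1.0
print(attachment_accuracy(gold, predicted, labeled=True))   # 0.6666666666666666
```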
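
Several options (--lexicon, --parserRules, --posTaggerRules, --sentenceAnnotators, --textAnnotators) take a semicolon-delimited list of files, and the Boolean options take an explicit value. Because ; separates commands in most shells, it can be safer to build the argument list programmatically. A sketch assuming the parser accepts the --name=value form; the build_args helper, the session id and the file names are hypothetical:

```python
from typing import Dict, List, Sequence, Union

OptionValue = Union[str, int, float, bool, Sequence[str]]


def build_args(flags: Sequence[str], options: Dict[str, OptionValue]) -> List[str]:
    """Turn flag names and option values into a Talismane-style argument list."""
    args = [f"--{flag}" for flag in flags]  # options without a value, e.g. --analyse
    for name, value in options.items():
        if isinstance(value, bool):
            text = "true" if value else "false"  # <Boolean> options need an explicit value
        elif isinstance(value, (list, tuple)):
            text = ";".join(value)  # semicolon-delimited file lists
        else:
            text = str(value)
        args.append(f"--{name}={text}")
    return args


args = build_args(
    ["analyse"],
    {"sessionId": "fr", "lexicon": ["lexA.zip", "lexB.zip"], "propagateBeam": False},
)
print(args)
# ['--analyse', '--sessionId=fr', '--lexicon=lexA.zip;lexB.zip', '--propagateBeam=false']
```

Passing this list to subprocess.run after the java command and the Talismane jar (whose name depends on your installation), rather than as a shell string, keeps the semicolons from being interpreted by the shell.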
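
The typical ranges given for --linearSVMCost (powers of 2 from 2^-5 to 2^5) and --linearSVMEpsilon (0.01, 0.05, 0.1, 0.5) suggest a simple grid search over 11 x 4 = 44 training runs. A sketch that only enumerates the argument fragments, assuming the --name=value form; launching and scoring each run is left out, and svm_grid is a hypothetical helper:

```python
from itertools import product
from typing import List

COSTS = [2.0 ** k for k in range(-5, 6)]  # 2^-5 ... 2^5, i.e. 11 values
EPSILONS = [0.01, 0.05, 0.1, 0.5]


def svm_grid() -> List[List[str]]:
    """One argument fragment per (C, epsilon) combination for LinearSVM training."""
    return [
        ["--algorithm=LinearSVM", f"--linearSVMCost={cost}", f"--linearSVMEpsilon={eps}"]
        for cost, eps in product(COSTS, EPSILONS)
    ]


grid = svm_grid()
print(len(grid))  # 44
print(grid[0])    # ['--algorithm=LinearSVM', '--linearSVMCost=0.03125', '--linearSVMEpsilon=0.01']
```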
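
The --outputDivider rule (the literal string NEWLINE means a line break, anything else is inserted verbatim) can be sketched as follows. The join_marked_sections helper is hypothetical, and treating the divider as a plain separator between consecutive marked sections is an assumption about where Talismane places it:

```python
from typing import Sequence


def resolve_divider(value: str) -> str:
    """Map the --outputDivider value to the text actually inserted."""
    return "\n" if value == "NEWLINE" else value


def join_marked_sections(sections: Sequence[str], divider_option: str) -> str:
    # Hypothetical helper: insert the divider between consecutive marked sections.
    return resolve_divider(divider_option).join(sections)


print(repr(join_marked_sections(["<title>", "</title>"], "NEWLINE")))  # '<title>\n</title>'
print(join_marked_sections(["<p>", "</p>"], " | "))                   # <p> | </p>
```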
public/options_talismane.txt · Last modified: 2018/04/06 17:09 by slh@ens-lyon.fr