Tutorial: using the stylometric tools of the Stylo R package in TXM

Install the stylo package

If needed, install Stylo (reference documentation https://sites.google.com/site/computationalstylistics/stylo):

  • switch to the R perspective, using the “R” perspective button in the toolbar or the “View > Perspectives > R” menu
  • create a new R session: click on the “New Session” button in the toolbar
    → a new “sessionX.R” script file is opened in a text editor
  • copy the following R code into the script:
    install.packages("stylo")
  • run the script: click on the “Run” button in the toolbar
    → the Console displays installation messages and finishes with: Rserve>* DONE (stylo)
  • should the install process abort for some reason, you can try to install Stylo from R directly:
    • under Windows, TXM uses its own R environment, so you need to run that R to install the package
    • under Mac and Linux, TXM uses the R installed on the system, so you can install the package from that R as usual
    • when you start TXM again, it will be able to use the freshly installed package through the R it communicates with.
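
The install step can also be written so that re-running a session script does not reinstall the package every time. The following is a minimal sketch, not something TXM or Stylo provides: the helper name ensure_package is hypothetical, and the sketch assumes that a CRAN mirror is configured for the R that TXM talks to.

```r
# Hypothetical helper (not part of TXM or stylo): install a package only
# when the current R cannot already find it.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # TRUE if the package can be loaded afterwards, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}

# In a TXM session script one would call: ensure_package("stylo")
# "stats" ships with R, so this call never triggers an installation.
stats_available <- ensure_package("stats")
```

The helper returns FALSE instead of stopping when the installation fails, so a script can check the result before calling library(stylo).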

Import a corpus

If needed, import a new corpus into TXM:

  • let's install the Stylo sample corpus: download its archive
  • unzip the archive and rename the “corpus” directory to “BritishFiction”
  • launch TXM
  • run the “File > Import > TXT+CSV” import command
  • in the “Import parameters” form:
    • select the “BritishFiction” source directory with the folder icon
    • if needed, select the “en” main language to tune the tagging process (TreeTagger must be installed for the annotation process to work, see http://txm.sourceforge.net/installtreetagger_en.html)
    • start the import process by clicking on the “Start” icon (green arrow)

Load the stylo package in the TXM R workspace

Load the stylo package into TXM:

  • edit your R session script or create a new one (by clicking on the “New Session” button in the toolbar)
  • copy the following R code into the script:

library(stylo)

  • select the R code with the mouse and execute it through the contextual menu command “R > Execute selected text”

Call the stylo() function on a TXM frequency table

Now, let's use Stylo:

  • if needed, switch to the “Corpus perspective”
  • run the “Partition” command on the “BRITISHFICTION” corpus to compare all the texts of the corpus (with the “Corpus > Partition” main menu or contextual menu); in the parameters window:
    • use “Simple” mode
    • select the “text” structure
    • select the “id” property
    • click on “OK”
      → a new “text_id” partition is created
  • let's build a frequency table of all the “Adj Noun” patterns:
    • run the “Index” command on the “BRITISHFICTION > text_id” partition (with the “Corpus > Partition” main menu or contextual menu):
      • use the query [enpos="JJ.*"] [enpos="NN.*"]
      • click on the “Edit” button of the “Properties” parameter and select only the “enlemma” property (to count the sequences by lemma rather than by graphical form), then click on “OK”
      • set the “Vmax” parameter to 500 (to only build the table of the 500 most frequent lemma sequences)
      • click on “Search”
        → a new “BRITISHFICTION > text_id > [enpos="JJ.*"] [enpos="NN.*"]:enlemma” index is created

  • let's call the stylo() function on the frequency table:
    • send the Index object to R (with the “Tools > Send to R” main menu or contextual menu)
      → the index icon gets a small red “R” letter decoration to confirm the transfer to R
      → the Console displays the name of the R symbol created (“Index1” for example):
      [enpos="JJ.*"] [enpos="NN.*"]:enlemma >> Index1
    • switch to the R perspective
    • edit your R session script or create a new one (by clicking on the “New Session” button in the toolbar)
    • copy the following R code into the script:
      stylo(frequencies=t(subset(Index1$data, select= -F)), relative.frequencies=FALSE)
      Remark: you can also modify “Index1” itself before passing it to stylo:
      Index1$data <- subset(Index1$data, select= -F)
      stylo(frequencies=t(Index1$data), relative.frequencies=FALSE)
    • select the R code with the mouse and execute it through the contextual menu command “R > Execute Text Selection”
      → choose Stylo parameters settings and press “OK”
      → the result is displayed in a new window
    • when finished, close the window by executing the “dev.off()” R code
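
To see what the expression passed as “frequencies” does, it can help to run it on a small hand-made table first. The sketch below is only an illustration under assumptions: toy_index is a made-up stand-in for the object created by “Send to R”, assumed here to hold a data frame with one row per lemma sequence, a total-frequency column F and one column per text. The real Index1 object may be structured differently, and all names and counts are invented.

```r
# Hypothetical stand-in for the object that "Send to R" creates in TXM:
# one row per lemma sequence, a total-frequency column F, then one
# column per part of the partition (names and counts are made up).
toy_index <- list(data = data.frame(
  F      = c(9, 6, 6),
  text_a = c(4, 1, 3),
  text_b = c(3, 2, 2),
  text_c = c(2, 3, 1),
  row.names = c("old_man", "young_lady", "great_deal")
))

# Drop the total column F, then transpose so that each row is a text
# and each column a feature.
toy_freqs <- t(subset(toy_index$data, select = -F))

print(dim(toy_freqs))       # number of texts, number of features
print(rownames(toy_freqs))  # the text names
print(colnames(toy_freqs))  # the lemma sequences
```

Inside subset(), “-F” refers to the column named F, not to the logical constant FALSE. The example uses the name toy_index rather than Index1 so that running it in a TXM session does not overwrite the real transferred object.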

Displaying your graphics in TXM

By default, R graphics generated by Stylo are displayed in an external window and are frozen. To display Stylo graphics in regular TXM windows, you can embed the R code calling Stylo in a Groovy script that manages the R device for you.

To execute the same Stylo script as before but with graphics managed by TXM:

  • create a new “stylo.groovy” file and open it in the TXM text editor (with the “File > New file” main menu command)
  • copy the following Groovy code into the script:

import org.txm.stat.engine.r.RWorkspace
import org.txm.rcpapplication.commands.*

def r = RWorkspace.getRWorkspaceInstance()

// start logging R output in the console
r.setLog(true)

// use a temporary file to save the graphic
def f = File.createTempFile("txm", ".svg")

// execute R code generating the graphics in a SVG device
r.plot(f, "stylo(frequencies=t(subset(Index1\$data, select= -F)), relative.frequencies=FALSE)")

// open a new window to display the graphics
monitor.syncExec(new Runnable() {
	@Override
	public void run() {
		OpenSVGGraph.OpenSVGFile(f.getAbsolutePath(), "stylo plot")
	}
});

// stop logging R output
r.setLog(false)

  • run the script: click on the “Execute script” button in the toolbar (green arrow)
    → each time the script is run, a new, non-frozen TXM window is created to hold the new graphic.
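
What “managing the R device” amounts to can be sketched in plain R: open a file-based SVG device, draw, then close the device so that the file is complete. This is only an illustration of the idea, not the code that the Groovy r.plot() call actually runs. The toy matrix and the dendrogram stand in for the Stylo output, and svg() assumes an R build with cairo support.

```r
# Toy frequency table: rows are texts, columns are features (made-up values).
toy_freqs <- matrix(
  c(4, 1, 3,
    3, 2, 2,
    2, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("text_a", "text_b", "text_c"),
                  c("old_man", "young_lady", "great_deal"))
)

# Open an SVG device that writes to a temporary file instead of a window.
svg_file <- tempfile(pattern = "txm", fileext = ".svg")
svg(svg_file, width = 7, height = 7)

# Any plotting code goes here; a dendrogram stands in for the Stylo graphic.
plot(hclust(dist(toy_freqs)), main = "toy dendrogram")

# Closing the device flushes the graphic to disk.
invisible(dev.off())
```

Because the graphic ends up in a file rather than in an R-owned window, any viewer can display it afterwards.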
public/tutorial_to_use_stylo_into_txm.txt · Last modified: 2014/09/06 00:53 by slh@ens-lyon.fr