About the Sentence Demo

Sentences form small, coherent units of text. Operating at the sentence level makes downstream extraction tasks such as tokenization, entity mention extraction, part-of-speech tagging, and relation discovery faster and more focused. LingPipe extracts sentences heuristically by identifying tokens in context that end sentences.
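To make the idea of "tokens in context that end sentences" concrete, here is a toy Python sketch. It is not LingPipe's sentence model: the stop tokens, the abbreviation list, and the capitalization check are all assumptions chosen only to illustrate the general approach of testing a candidate end token against its neighbors.

```python
import re

# Toy rules, chosen for illustration only; not LingPipe's actual rule set.
STOP_TOKENS = {".", "?", "!"}
NON_FINAL_BEFORE_STOP = {"Mr", "Mrs", "Dr", "vs"}


def split_sentences(text):
    """Split text at stop tokens whose context suggests a sentence end."""
    # Tokens are runs of word characters or single punctuation marks.
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    sentences = []
    start = 0  # character offset where the current sentence begins
    for k, match in enumerate(matches):
        if match.group() not in STOP_TOKENS:
            continue
        prev_tok = matches[k - 1].group() if k > 0 else ""
        next_tok = matches[k + 1].group() if k + 1 < len(matches) else None
        # Context check: an abbreviation before the stop, or a token after
        # it that does not start with an uppercase letter, means the stop
        # is not treated as a sentence end.
        if prev_tok in NON_FINAL_BEFORE_STOP:
            continue
        if next_tok is not None and not next_tok[0].isupper():
            continue
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Keep any trailing text that lacks a final stop token.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith ran. He won."):
        print(sentence)
```

The period after "Dr" is skipped because of the token before it, while the period after "ran" is accepted because the next token is capitalized. A genre-specific model would differ mainly in which tokens and contexts it accepts.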

Genre-Specific Models

These demos provide examples of sentence extraction using the package com.aliasi.sentence. There are two sentence models included with LingPipe, one for English news and one for English biomedical text.

Sentence XML Markup

For this demo, sentences are marked by wrapping their text in a specified element, s, whose attribute i gives an order identifier counting from zero; for example, <s i="2"> marks the third sentence.
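As a rough illustration of consuming this markup, the following Python sketch pulls the sentences back out of annotated output with the standard library XML parser. The doc wrapper element and the sample text are assumptions made for the example; the actual demo output may wrap the sentence elements differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample output: the <doc> wrapper is an assumption; only the
# <s i="..."> sentence elements come from the description above.
SAMPLE = '<doc><s i="0">See Spot.</s> <s i="1">See Spot run.</s></doc>'


def extract_sentences(xml_text):
    """Return (index, text) pairs for every <s> element, ordered by i."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elt in root.iter("s"):
        # itertext() also collects text nested inside child elements.
        text = "".join(elt.itertext())
        pairs.append((int(elt.get("i")), text))
    return sorted(pairs)


if __name__ == "__main__":
    for index, sentence in extract_sentences(SAMPLE):
        print(index, sentence)
```

Reading the i attribute as an integer keeps the sentences in document order even if a later processing step reorders the elements.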

Sentence Demo on the Web

The demos are hosted on the web at the following URLs:

Sentence Demo: English News Text

http://lingpipe-demos.com:8080/lingpipe-demos/sentence_en_news/textInput.html

Sentence Demo: English Biomedical Text

http://lingpipe-demos.com:8080/lingpipe-demos/sentence_en_bio/textInput.html

For detailed information about using web demos, including web form, file upload, and web service instructions, see the web demo instructions.

Sentence Demo via GUI

To launch the demo in a GUI, first change directories to the command directory and then invoke the demo batch script. Note: Parameters are set in the GUI, not as arguments to the launch script.

Windows Operating System

English News

> cd %LINGPIPE_HOME%\demos\generic\bin
> gui_sentence_en_news.bat 

English Biomedical

> cd %LINGPIPE_HOME%\demos\generic\bin
> gui_sentence_en_bio.bat 

Unix-like Operating Systems

English News

> cd $LINGPIPE_HOME/demos/generic/bin
> sh gui_sentence_en_news.sh

English Biomedical

> cd $LINGPIPE_HOME/demos/generic/bin
> sh gui_sentence_en_bio.sh

For detailed information about running demos in a GUI, see the GUI demo instructions.

Sentence Demo via Shell Command

Shell commands may be run over a single file, over all of the files in a directory, or through standard input/output.

Running over a Directory

English News

> cd %LINGPIPE_HOME%\demos\generic\bin
> cmd_sentence_en_news.bat -inDir=../../data/testdir -outDir=/testout

English Biomedical

> cd %LINGPIPE_HOME%\demos\generic\bin
> cmd_sentence_en_bio.bat -inDir=../../data/testdir -outDir=/testout

Running a Single File

English News

> cd %LINGPIPE_HOME%\demos\generic\bin
> cmd_sentence_en_news.bat -inFile=../../data/testdir/foo.txt -outFile=foo.out.xml

Running through a Pipe (Standard input/output)

English News

> cd demos/generic/bin
> echo See Spot. See Spot run. | cmd_sentence_en_news.bat

Running in Unix-like Operating Systems

For Unix-like operating systems such as Unix, Solaris, Linux, or Macintosh OS X, use the corresponding .sh scripts listed under Sentence Demo Scripts in place of the .bat scripts.

For detailed information about running demos from the command line, see the command line demo instructions.

Sentence Demo Scripts

The following scripts are available in $LINGPIPE_HOME/demos/generic/bin for running the demo. Note that each script comes in four flavors, distinguishing command line from GUI, and the Windows DOS shell from the Unix shell.

Language  Genre       Mode     Windows DOS               Unix/Linux/Mac sh
English   News        Command  cmd_sentence_en_news.bat  cmd_sentence_en_news.sh
English   News        GUI      gui_sentence_en_news.bat  gui_sentence_en_news.sh
English   Biomedical  Command  cmd_sentence_en_bio.bat   cmd_sentence_en_bio.sh
English   Biomedical  GUI      gui_sentence_en_bio.bat   gui_sentence_en_bio.sh

Sentence Demo Parameters

The following is a complete list of parameters for the demo.

General Demo Parameters

These parameters apply to every version (web/GUI/command) of every demo.

Parameter    Description             Usage Constraints
inCharset    Input character set     Optional. Defaults to platform default.
outCharset   Output character set    Optional. Defaults to platform default.
contentType  Input content type      May be one of text/plain, text/html, or text/xml. Defaults to text/plain.
removeElts   Element tags to remove  Optional. May only be used with contentType=text/html or contentType=text/xml. Each value may be a comma-separated list. If neither removeElts nor includeElts is specified, all text content is processed.
includeElts  Elements to annotate    Same constraints as removeElts.
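The constraints in this table can be checked before a demo is launched. The Python sketch below is one way to do that; the function name check_general_params and the dictionary-based interface are hypothetical helpers for illustration and are not part of LingPipe.

```python
ALLOWED_CONTENT_TYPES = ("text/plain", "text/html", "text/xml")


def check_general_params(params):
    """Return a copy of params with defaults applied and constraints checked.

    Hypothetical helper: it only mirrors the constraints listed in the
    table above and does not call any LingPipe code.
    """
    checked = dict(params)
    # contentType defaults to text/plain when it is not given.
    content_type = checked.setdefault("contentType", "text/plain")
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError("unsupported contentType: " + content_type)
    for name in ("removeElts", "includeElts"):
        if name not in checked:
            continue
        if content_type == "text/plain":
            raise ValueError(name + " requires contentType text/html or text/xml")
        # Each value may be a comma-separated list of element names.
        checked[name] = [e.strip() for e in checked[name].split(",") if e.strip()]
    return checked


if __name__ == "__main__":
    print(check_general_params({"contentType": "text/html", "removeElts": "script, style"}))
```

Character set parameters are passed through unchanged here, since the table only says that they default to the platform default.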

Command-Line Only Parameters

These parameters apply to every command-line demo, but are not relevant for the GUI or web versions of the demos.

Parameter  Description                 Usage Constraints
inFile     Readable input file         May not be used with inDir. If either inFile or outFile is not specified, it defaults to standard input or standard output, respectively.
outFile    Writeable output file       Same constraints as inFile.
inDir      Readable input directory    May not be used with inFile or outFile. If used, inDir and outDir must both be specified.
outDir     Writeable output directory  Same constraints as inDir.
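When the command-line scripts are driven from another program, these constraints can be enforced while the argument list is assembled. The Python sketch below assumes the -name=value flag syntax shown in the shell examples; build_io_args is a hypothetical helper, not something shipped with LingPipe.

```python
def build_io_args(in_file=None, out_file=None, in_dir=None, out_dir=None):
    """Build the input/output arguments for a command-line demo script.

    Hypothetical helper that mirrors the table above. An empty result
    means the script reads standard input and writes standard output.
    """
    using_files = in_file is not None or out_file is not None
    using_dirs = in_dir is not None or out_dir is not None
    if using_files and using_dirs:
        raise ValueError("inFile/outFile may not be combined with inDir/outDir")
    if using_dirs and (in_dir is None or out_dir is None):
        raise ValueError("inDir and outDir must both be specified")
    named = (
        ("inFile", in_file),
        ("outFile", out_file),
        ("inDir", in_dir),
        ("outDir", out_dir),
    )
    # Emit only the parameters that were actually given.
    return ["-{}={}".format(name, value) for name, value in named if value is not None]


if __name__ == "__main__":
    print(build_io_args(in_file="../../data/testdir/foo.txt", out_file="foo.out.xml"))
```

The resulting list can be appended to the script name when the command is launched, for example through Python's subprocess module.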