List of Command Demos

Each demo has its own page, with examples of invoking it from the command line. The navigation bar to the left has a listing, as does the first section of the top-level demos page:

The rest of this page provides instructions for running the demos in a shell from the command line.

Ways to Run Commands

Demo Interfaces

LingPipe's command demos provide three interfaces, each with its own instructions below: single file input/output, directory input/output, and piped standard input/output.

Content Types

All three interfaces accept the same content types (HTML, XML and plain text) and specify input/output character set encodings in the same way, as described under Content Types and Character Encodings.

Running the Commands

The commands can be run directly or through the shell scripts in the demos/generic/bin directory.

Unix-like Scripts

Shell scripts (with suffix .sh) can be run directly from the demos/generic/bin directory. For example, the English news sentence extraction demo can be run with:

% cd $lingpipe/demos/generic/bin
% sh cmd_sentence_en_news.sh "-Param1=Val1" ... "-ParamN=ValN"

Windows DOS Batch Scripts

Windows batch scripts (with suffix .bat) can be run directly in Windows from the demos/generic/bin directory. For example:

% cd $lingpipe/demos/generic/bin
% cmd_sentence_en_news.bat "-Param1=Val1" ... "-ParamN=ValN"

In the installation instructions, we provide an overview of

Running Directly

The scripts are fairly simple; the only complexity comes from the fact that they call several subscripts to set up the classpath and parameters. The scripts may be inspected, modified and run directly from the command line.

Parameter Specification

Parameters are specified on the command-line in the format:

"-Param=Value"

Note that the parameter/value pair must be quoted; this prevents DOS or a Unix-like shell from expanding or dropping any of its contents.
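The effect of quoting can be checked without running a demo. The sketch below assumes a POSIX shell; count_args is a throwaway helper (not part of LingPipe) and the file name containing a space is purely illustrative. It reports how many arguments a program would receive with and without quotes.

```shell
# count_args is a throwaway helper for illustration; it is not part of LingPipe.
count_args() { echo "$#"; }

# Unquoted: the shell splits at the space, so the program sees two arguments.
count_args -inFile=my report.txt

# Quoted: the whole parameter/value pair arrives as a single argument.
count_args "-inFile=my report.txt"
```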

Escaping Reserved Symbols

Depending on the command-line interpreter, symbols such as semicolons or quotes might be reserved and thus need to be escaped. This is typically done by preceding them with a backslash, as in "-a=\"b\"\;", which is intended to set the parameter a to the value "b"; (with the value including the quotes and the semicolon).
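Because escaping rules differ between interpreters, it can help to print an argument exactly as a program would receive it. The following sketch assumes a POSIX shell such as sh or bash; show_args is a hypothetical helper, not a LingPipe command. In these shells a semicolon inside double quotes needs no backslash (a backslash placed there is passed through literally), and single quotes avoid escaping altogether.

```shell
# show_args is a hypothetical helper: it prints each argument in brackets,
# exactly as the shell hands it to a program.
show_args() { printf '[%s]\n' "$@"; }

# Double quotes: only the inner quotes need escaping in a POSIX shell.
show_args "-a=\"b\";"

# Single quotes: nothing inside is interpreted, so no escaping is needed.
show_args '-a="b";'

# A backslash before the semicolon inside double quotes is kept literally.
show_args "-a=\"b\"\;"
```

Other interpreters, including the Windows command prompt, follow different rules, so the same check is worth repeating there with an equivalent echo-style command.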

Parameter Order

The command-line arguments are not order dependent, but should not be duplicated.
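Since duplicated parameters should be avoided, a wrapper script might check for them before calling a demo. The following is a minimal sketch for a POSIX shell; find_duplicate_params is a hypothetical helper (not part of LingPipe) and the file names are illustrative.

```shell
# Hypothetical helper, not part of LingPipe: print every parameter name that
# occurs more than once in a list of "-Param=Value" arguments.
find_duplicate_params() {
  for arg in "$@"; do
    name=${arg%%=*}             # keep the text before the first '='
    printf '%s\n' "${name#-}"   # drop the leading '-'
  done | sort | uniq -d         # uniq -d prints only repeated lines
}

# inFile is given twice here, so its name is reported.
find_duplicate_params "-inFile=a.txt" "-outFile=b.xml" "-inFile=c.txt"
```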

Parameter Documentation

The parameters available for all commands are documented on this page and repeated on each demo's page. The parameters specific to the command-line interface are all discussed on this page.

Interface 1: Single File Input/Output

Specifying Files as Parameters

Input and output file names may be provided using the parameters:

command  "-inFile=path1"  "-outFile=path2" ...

Note that the quotes are required around each key/value pair in order to prevent DOS from expanding the equals sign (or splitting an argument that contains a space into multiple command-line arguments).

For single-file input, the following parameters should be used to specify the file paths (relative or absolute).

Parameter | Description | Usage Constraints
inFile | Readable input file | May not be used with inDir. If not specified, defaults to standard input.
outFile | Writeable output file | May not be used with inDir. If not specified, defaults to standard output.

Everything's Case Sensitive

Other than file names on Windows, everything is case sensitive, including the names of parameters. So infile cannot be used in place of inFile.

File Permissions

The input file must be an ordinary readable file. The output file must either exist and be writeable or not exist and be creatable.
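These conditions can be approximated in a pre-flight check before a demo is invoked. The sketch below assumes a POSIX shell; check_files is a hypothetical helper (not part of LingPipe), and it treats "creatable" as "the parent directory exists and is writable", which is an assumption about how the demos behave.

```shell
# Hypothetical pre-flight check, not part of LingPipe.
# Usage: check_files INPUT_PATH OUTPUT_PATH
check_files() {
  src=$1
  dst=$2
  # Input must be an ordinary, readable file.
  [ -f "$src" ] && [ -r "$src" ] || { echo "bad input: $src" >&2; return 1; }
  if [ -e "$dst" ]; then
    # Existing output must be an ordinary, writable file.
    [ -f "$dst" ] && [ -w "$dst" ] || { echo "bad output: $dst" >&2; return 1; }
  else
    # Assumption: a missing output is creatable if its parent is writable.
    [ -w "$(dirname -- "$dst")" ] || { echo "cannot create: $dst" >&2; return 1; }
  fi
  echo "ok"
}

# Try it on a temporary input file and a not-yet-existing output file.
tmp=$(mktemp -d)
echo "Hello. Goodbye." > "$tmp/in.txt"
check_files "$tmp/in.txt" "$tmp/out.xml"
```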

Interface 2: Directory Input/Output

Batch Processing and Layout

Directory input allows the batch processing of a group of files in a directory. This is more efficient than running files one at a time, because resources such as LingPipe models, not to mention the Java virtual machine, only need to be loaded once.

Recursive Crawl

The directory-based commands will walk the entire directory recursively and process each file they find. The output is written in the specified output directory with a parallel directory structure.
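The parallel layout can be pictured with a short sketch that lists, for every file under an input directory, the mirrored path under an output directory. It assumes a POSIX shell with find; map_outputs is a hypothetical helper (not part of LingPipe), and it assumes output files keep the same names as their inputs, which this page does not state.

```shell
# Illustration only; map_outputs is not part of LingPipe.
# Usage: map_outputs IN_DIR OUT_DIR
map_outputs() {
  in_dir=$1
  out_dir=$2
  # List regular files relative to in_dir, then print the mirrored path.
  ( cd "$in_dir" && find . -type f ) | sort | while IFS= read -r rel; do
    rel=${rel#./}
    printf '%s -> %s\n' "$in_dir/$rel" "$out_dir/$rel"
  done
}

# Build a small sample tree and show the mapping.
sample=$(mktemp -d)
mkdir -p "$sample/in/news"
echo "Hello." > "$sample/in/a.txt"
echo "Goodbye." > "$sample/in/news/b.txt"
map_outputs "$sample/in" "$sample/out"
```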

Specifying Directories

Input and output directory names may be provided using the parameters:

command  "-inDir=path1"  "-outDir=path2" ...

Note that the quotes around each key/value pair are required to prevent the DOS shell from removing the equals signs.

For directory input, the parameters to specify (absolute or relative) directory paths are:

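Parameter | Description | Usage Constraints
inDir | Readable input directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified.
outDir | Writeable output directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified.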

File Permissions

The input directory must be an ordinary listable directory. The output directory must either exist and allow file creation within it, or not exist and be creatable.

Interface 3: Piped Standard Input/Output

Unix-like Defaults

The commands are set up with Unix-like defaults: standard input or output is used if files are not specified. If a directory is specified for input or output, then standard input/output is not available. If no input directory or input file is specified, the input is read from the standard input stream. If no output directory or output file is specified, the output is written directly to standard output.
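The rules for combining file, directory and standard stream parameters can be summarized in a small sketch. It assumes a POSIX shell; io_mode is a hypothetical helper (not LingPipe code) that only inspects the arguments and reports which input and output would be used, following the constraints stated on this page.

```shell
# Hypothetical helper, not LingPipe code: describe the I/O mode implied by
# a list of "-Param=Value" arguments.
io_mode() {
  in_file='' out_file='' in_dir='' out_dir=''
  for arg in "$@"; do
    case $arg in
      -inFile=*)  in_file=${arg#*=} ;;
      -outFile=*) out_file=${arg#*=} ;;
      -inDir=*)   in_dir=${arg#*=} ;;
      -outDir=*)  out_dir=${arg#*=} ;;
    esac
  done
  if [ -n "$in_dir" ] || [ -n "$out_dir" ]; then
    if [ -n "$in_file" ] || [ -n "$out_file" ]; then
      echo "error: inDir/outDir may not be combined with inFile/outFile"
      return 1
    fi
    if [ -z "$in_dir" ] || [ -z "$out_dir" ]; then
      echo "error: inDir and outDir must both be specified"
      return 1
    fi
    echo "directory: $in_dir -> $out_dir"
  else
    # Missing file parameters fall back to the standard streams.
    echo "stream: ${in_file:-stdin} -> ${out_file:-stdout}"
  fi
}

io_mode                                  # both streams default
io_mode "-inFile=a.txt"                  # file in, standard output
io_mode "-inDir=in" "-outDir=out"        # directory mode
io_mode "-inDir=in" || true              # incomplete directory mode is rejected
```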

Piping Input/Output

Because the commands default to standard input and output, they may be used in a streaming fashion. For instance, the following invocation will pipe the quoted text through the sentence detector for English news and write the resulting XML to standard output.

> echo "Hello. Goodbye." | cmd_sentence_en_news.bat

Streaming Behavior

Whether the commands will stream output as they receive input depends on the particular demo. Most of the annotation demos (sentence detection, part-of-speech, entity detection, etc.) stream; most of the classification demos do not.

Content Types and Character Encodings

Input: Text, HTML or XML

The demos process data in one of three formats: plain text, HTML or XML. The contentType parameter selects the format:

Parameter | Description | Usage Constraints
contentType | Input content type | May be one of text/plain, text/html or text/xml. Defaults to text/plain.

For XML and HTML content types, the following two parameters control the elements annotated:

Parameter | Description | Usage Constraints
removeElts | Element tags to remove | Optional. May only be used with contentType=text/html or contentType=text/xml. The value may be a comma-separated list. If neither removeElts nor includeElts is specified, all text content is processed.
includeElts | Elements to annotate | Same constraints as removeElts.

Output: XML

The demo output format is XML in all cases. Plain text is minimally wrapped in an element. HTML is parsed using NekoHTML into well-formed XML. XML is passed through with inline annotation. In all cases, stripping the LingPipe-specific tags from the output yields content identical to the input; that is, no information about whitespace, case, etc., is lost. The one exception is when the removeElts parameter is used to remove content (for example, HTML style markup such as italics) from the input.

Character Encoding

The demos support any character set supported by the Java runtime engine. The following parameters are used to control character sets for input and output:

Parameter | Description | Usage Constraints
inCharset | Input character set | Optional. Defaults to platform default.
outCharset | Output character set | Optional. Defaults to platform default.
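To show how the content type, element and character set parameters combine on one command line, the sketch below prints a complete invocation without running it. It assumes a POSIX shell; demo_command is a hypothetical dry-run helper (not part of LingPipe), and the file names, the element names i and b, and the charset values are illustrative. This page does not say whether element names are matched case-sensitively after HTML parsing, so that detail is left open.

```shell
# demo_command is a hypothetical dry-run helper: it prints the command line
# that would be run, with each parameter quoted, instead of running it.
demo_command() {
  printf 'sh %s' "$1"
  shift
  printf ' "%s"' "$@"
  printf '\n'
}

# HTML in ISO-8859-1 goes in, UTF-8 XML comes out, with i and b elements removed.
demo_command cmd_sentence_en_news.sh \
  "-inFile=in.html" "-outFile=out.xml" \
  "-contentType=text/html" "-removeElts=i,b" \
  "-inCharset=ISO-8859-1" "-outCharset=UTF-8"
```

Copying the printed line into a shell in the demos/generic/bin directory would run the demo with those settings.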