List of Command Demos
Each demo has its own page, with examples of invoking it from the command line. The navigation bar to the left lists the demos, as does the first section of the top-level demos page.
The rest of this page provides instructions for running the demos in a shell from the command line.
Ways to Run Commands
Demo Interfaces
LingPipe's command demos provide three interfaces, each with its own instructions: single file input/output, directory input/output, and piped standard input/output.
Content Types
All three interfaces accept the same content types (HTML, XML and plain text) and specify input/output character set encodings in the same way, as described under Content Types and Character Encodings.
Running the Commands
The commands can be run directly or through the shell scripts
in the demos/generic/bin directory.
Unix-like Scripts
Shell scripts (with suffix .sh) can be run
directly from the demos/generic/bin directory. For example,
the English news sentence extraction demo can be run with:
% cd $lingpipe/demos/generic/bin
% sh cmd_sentence_en_news.sh "-Param1=Val1" ... "-ParamN=ValN"
Windows DOS Batch Scripts
Windows batch scripts (with suffix .bat) can
be run directly in Windows from the demos/generic/bin directory. For
example:
% cd $lingpipe/demos/generic/bin
% cmd_sentence_en_news.bat "-Param1=Val1" ... "-ParamN=ValN"
In the installation instructions, we provide an overview of
Running Directly
The scripts are fairly simple; the only complexity comes from the fact that they call several subscripts to set up the classpath and parameters. The scripts may be inspected, modified and run directly from the command line.
Parameter Specification
Parameters are specified on the command line in the format:
"-Param=Value"
Note that the parameter/value pair must be quoted; this prevents DOS or the Unix-like shell from trying to expand or omit any of its contents.
Escaping Reserved Symbols
Depending on the command-line interpreter, symbols such as semicolons
or quotes might be reserved and thus need to be escaped. This is
typically done by preceding them with a backslash, as in
"-a=\"b\"\;", which sets the parameter
a to the value "b"; (with the value
including the quotes and the semicolon).
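The effect of quoting can be checked without running any demo. The sketch below assumes a POSIX-style shell; `show_args` is a hypothetical helper written for this illustration, not part of LingPipe. It prints each argument it receives so the argument boundaries become visible.

```shell
# Print each argument on its own line, in brackets, so boundaries are visible.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the pair arrives as a single argument, space included.
show_args "-inFile=my docs/in.txt"

# Unquoted: the shell splits on the space and passes two arguments.
show_args -inFile=my docs/in.txt

# Inside double quotes, a POSIX shell only needs the inner quotes escaped;
# the semicolon is already protected by the surrounding quotes.
show_args "-a=\"b\";"
```

In a POSIX shell, a backslash placed before a semicolon inside double quotes is kept as a literal character, so the escaping rules for the interpreter in use should be checked before copying an escaped example.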
Parameter Order
The command-line arguments are not order dependent, but should not be duplicated.
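Because duplicated parameters should be avoided, a wrapper script might check for them before invoking a demo. The following is a minimal sketch under that assumption; `duplicate_params` is a hypothetical helper, not something the demos provide.

```shell
# Hypothetical helper (not part of LingPipe): print the name of every
# parameter that appears more than once in a list of "-Name=Value" arguments.
duplicate_params() {
    for arg in "$@"; do
        name=${arg%%=*}            # text before the first '='
        printf '%s\n' "${name#-}"  # drop the leading '-'
    done | sort | uniq -d
}

# Example: inFile is given twice, so its name is reported once.
duplicate_params "-inFile=a.txt" "-outFile=b.xml" "-inFile=c.txt"
```

An empty result means no parameter name was repeated.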
Parameter Documentation
The parameters available for all commands are documented on this page and repeated on each demo's page. The command-line-specific parameters are all discussed on this page.
Interface 1: Single File Input/Output
Specifying Files as Parameters
Input and output file names may be provided using the parameters:
command "-inFile=path1" "-outFile=path2" ...
Note that the quotes are required around the key/value pair specifications in order to prevent DOS from expanding the equals sign (or splitting an argument containing a space into multiple command-line arguments).
For single-file input, the following parameters should be used to specify the file paths (relative or absolute).
| Parameter | Description | Usage Constraints |
|---|---|---|
| inFile | Readable input file | May not be used with inDir. If not specified, input defaults to standard input. |
| outFile | Writeable output file | May not be used with inDir. If not specified, output defaults to standard output. |
Everything's Case Sensitive
Other than file names on Windows, everything is case sensitive,
including the names of parameters. So infile cannot
be used in place of inFile.
File Permissions
The input file must be an ordinary readable file. The output file must either exist and be writeable or not exist and be creatable.
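These permission rules can be tested before launching a demo. The sketch below is a hypothetical pre-flight check, not LingPipe code. It assumes that "creatable" means the parent directory of the output file already exists and is writable, which the text above does not state explicitly.

```shell
# Hypothetical pre-flight check mirroring the rules stated above.
# Returns 0 when the file pair looks usable, 1 otherwise.
check_files() {
    in=$1
    out=$2
    # Input: must be an ordinary file that can be read.
    if [ ! -f "$in" ] || [ ! -r "$in" ]; then
        echo "input is not a readable file: $in" >&2
        return 1
    fi
    if [ -e "$out" ]; then
        # Output exists: it must be writable.
        if [ ! -w "$out" ]; then
            echo "output is not writable: $out" >&2
            return 1
        fi
    else
        # Output missing: assume its parent directory must allow file creation.
        parent=$(dirname "$out")
        if [ ! -d "$parent" ] || [ ! -w "$parent" ]; then
            echo "output cannot be created: $out" >&2
            return 1
        fi
    fi
    return 0
}

# Usage (commented out so the sketch runs without LingPipe installed):
# check_files in.txt out.xml &&
#     sh cmd_sentence_en_news.sh "-inFile=in.txt" "-outFile=out.xml"
```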
Interface 2: Directory Input/Output
Batch Processing and Layout
Directory input allows the batch processing of a group of files in a directory. This is more efficient than running files one at a time, because resources such as LingPipe models, not to mention the Java virtual machine, only need to be loaded once.
Recursive Crawl
The directory-based commands will walk the entire directory recursively and process each file they find. The output is written in the specified output directory with a parallel directory structure.
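The parallel layout can be pictured with a short shell sketch. It only illustrates the mapping from input paths to output paths described above; it is not LingPipe's implementation, and `parallel_paths` is a hypothetical name.

```shell
# Illustration only: list each file under an input directory next to the
# path it would have in a parallel output tree.
parallel_paths() {
    in_dir=${1%/}    # strip one trailing slash, if present
    out_dir=${2%/}
    find "$in_dir" -type f | sort | while IFS= read -r path; do
        rel=${path#"$in_dir"/}    # path relative to the input directory
        printf '%s -> %s\n' "$path" "$out_dir/$rel"
    done
}

# Usage (commented out; the directory names are placeholders):
# parallel_paths corpus annotated
```

For an input tree containing `corpus/a.txt` and `corpus/sub/b.txt`, the sketch lists `annotated/a.txt` and `annotated/sub/b.txt` as the corresponding output paths.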
Specifying Directories
Input and output directory names may be provided using the parameters:
command "-inDir=path1" "-outDir=path2" ...
Note that the quotes around each key/value pair are required to prevent the DOS shell from removing the equal signs.
For directory input, the parameters to specify (absolute or relative) directory paths are:
| Parameter | Description | Usage Constraints |
|---|---|---|
| inDir | Readable input directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified. |
| outDir | Writeable output directory | May not be used with inFile or outFile. If used, inDir and outDir must both be specified. |
File Permissions
The input directory must be an ordinary listable directory. The output directory must either exist and allow file creation within it, or not exist and be creatable.
Interface 3: Piped Standard Input/Output
Unix-like Defaults
The commands are set up with Unix-like defaults to standard input or output if files are not specified. If a directory is specified for input or output, then standard input/output is not available. If no input directory or input file is specified, the input is read from the standard input stream. If no output directory or output file is specified, the output is written directly to standard output.
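The selection rules for the three interfaces can be summarized in a few lines of shell. This is a hypothetical sketch of the logic as described on this page, not the code the demos actually run; `resolve_io` and its output format are invented for illustration.

```shell
# Hypothetical sketch: decide where input comes from and where output goes,
# given "-Name=Value" arguments, following the rules described above.
resolve_io() {
    in_file= out_file= in_dir= out_dir=
    for arg in "$@"; do
        case $arg in
            -inFile=*)  in_file=${arg#*=} ;;
            -outFile=*) out_file=${arg#*=} ;;
            -inDir=*)   in_dir=${arg#*=} ;;
            -outDir=*)  out_dir=${arg#*=} ;;
        esac
    done
    if [ -n "$in_dir" ] || [ -n "$out_dir" ]; then
        # Directory mode excludes file parameters and needs both directories.
        if [ -n "$in_file" ] || [ -n "$out_file" ] ||
           [ -z "$in_dir" ] || [ -z "$out_dir" ]; then
            echo "invalid parameter combination" >&2
            return 1
        fi
        echo "input=dir:$in_dir output=dir:$out_dir"
        return 0
    fi
    # File mode, falling back to the standard streams when a file is omitted.
    if [ -n "$in_file" ]; then src="file:$in_file"; else src="stdin"; fi
    if [ -n "$out_file" ]; then dst="file:$out_file"; else dst="stdout"; fi
    echo "input=$src output=$dst"
}

resolve_io "-inFile=a.txt"
# prints input=file:a.txt output=stdout
```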
Piping Input/Output
With the command-line defaulting to standard input/output behavior, it may be used in a streaming fashion. For instance, the following invocation will pipe the quoted text through the sentence detector for English news and write the resulting XML to standard output.
> echo "Hello. Goodbye." | cmd_sentence_en_news.bat
Streaming Behavior
Whether the commands will stream output as they receive input depends on the particular demo. Most of the annotation demos (sentence detection, part-of-speech, entity detection, etc.) stream; most of the classification demos do not.
Content Types and Character Encodings
Input: Text, HTML or XML
The demos process data in one of three formats: plain text, HTML or XML. The usual parameter is used for this:
| Parameter | Description | Usage Constraints |
|---|---|---|
| contentType | Input content type | May be one of text/plain, text/html or text/xml. |
For XML and HTML content types, the following two parameters control the elements annotated:
| Parameter | Description | Usage Constraints |
|---|---|---|
| removeElts | Element tags to remove | Optional. May only be used with contentType=text/html or contentType=text/xml. Each value may be a comma-separated list. If neither parameter is specified, all text content is processed. |
| includeElts | Elements to annotate | Same constraints as removeElts. |
Output: XML
The demo output format is XML in all cases. Plain text is minimally
wrapped in an element. HTML is parsed using NekoHTML into well-formed
XML. XML is passed through with inline annotation. In all cases,
stripping LingPipe-specific tags from the output will result in
content identical to the input; that is, no information about
whitespace, case, etc., is lost. The one exception is when the
removeElts parameter is used to remove content (for example,
HTML style markup such as italics) from the input.
Character Encoding
The demos support any character set supported by the Java runtime engine. The following parameters are used to control character sets for input and output:
| Parameter | Description | Usage Constraints |
|---|---|---|
| inCharset | Input character set | Optional. Defaults to platform default. |
| outCharset | Output character set | Optional. Defaults to platform default. |