How Can We Help You?
- Get the latest version: Free and Paid Licenses/Downloads
- Learn how to use LingPipe: Tutorials
- Get expert help using LingPipe: Services
What is LingPipe?
LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do tasks like:
- Find the names of people, organizations or locations in news
- Automatically classify Twitter search results into categories
- Suggest correct spellings of queries
To get a better idea of the range of possible LingPipe uses, visit our tutorials and sandbox.
Architecture
LingPipe's architecture is designed to be efficient, scalable, reusable, and robust. Highlights include:
- Java API with source code and unit tests;
- multi-lingual, multi-domain, multi-genre models;
- training with new data for new tasks;
- n-best output with statistical confidence estimates;
- online training (learn-a-little, tag-a-little);
- thread-safe models and decoders for concurrent-read exclusive-write (CREW) synchronization; and
- character encoding-sensitive I/O.
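As a rough illustration of the concurrent-read exclusive-write (CREW) pattern mentioned above, here is a minimal sketch using the standard java.util.concurrent read/write lock. The class CrewCounter and its methods are hypothetical and are not part of the LingPipe API; LingPipe's own classes may organize synchronization differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical token-count "model" guarded by a read/write lock.
// Not a LingPipe class; for illustration only.
final class CrewCounter {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Training is a write: it takes the exclusive lock.
    void train(String token) {
        lock.writeLock().lock();
        try {
            counts.merge(token, 1, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Lookup is a read: any number of threads may hold the read lock at once.
    int count(String token) {
        lock.readLock().lock();
        try {
            return counts.getOrDefault(token, 0);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Interleaving calls to train and count from different threads is the "learn-a-little, tag-a-little" usage: readers never see a half-applied update, and they block only while a write is in progress.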
Latest Release: LingPipe 4.1.0
Intermediate Release
The latest release of LingPipe is LingPipe 4.1.0, a feature release that also patches some bugs. It is fully backward compatible with LingPipe version 4.0.1.
Character, Token, and Document Suffix Arrays
The largest addition in LingPipe 4.1 is suffix arrays.
The package com.aliasi.suffixarray contains classes for suffix arrays of characters, of tokens, or of tokenized documents with links back to the documents from the suffix array. Suffix arrays support finding arbitrary-length repeated strings in a large text collection.
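To show how a suffix array exposes repeated strings, here is a minimal character-level sketch in plain Java. It does not use the com.aliasi.suffixarray classes; the class and method names are hypothetical, and the naive sort is only suitable for short inputs.

```java
import java.util.Arrays;

// Hypothetical sketch of a character suffix array; not the LingPipe API.
final class CharSuffixArraySketch {

    // Returns the start positions of all suffixes, ordered by suffix text.
    static Integer[] build(String text) {
        Integer[] sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i;
        }
        // Naive comparison of whole suffixes; fine for a small example.
        Arrays.sort(sa, (a, b) -> text.substring(a).compareTo(text.substring(b)));
        return sa;
    }

    // Length of the common prefix of the suffixes starting at i and j.
    static int commonPrefixLength(String text, int i, int j) {
        int n = 0;
        while (i + n < text.length() && j + n < text.length()
                && text.charAt(i + n) == text.charAt(j + n)) {
            n++;
        }
        return n;
    }

    // Repeated substrings show up as shared prefixes of adjacent suffixes,
    // so the longest repeat is the longest such shared prefix.
    static String longestRepeat(String text) {
        Integer[] sa = build(text);
        int bestStart = 0;
        int bestLen = 0;
        for (int k = 1; k < sa.length; k++) {
            int len = commonPrefixLength(text, sa[k - 1], sa[k]);
            if (len > bestLen) {
                bestLen = len;
                bestStart = sa[k];
            }
        }
        return text.substring(bestStart, bestStart + bestLen);
    }
}
```

For the text "banana", the sorted suffixes are a, ana, anana, banana, na, nana, and the longest prefix shared by neighbours is "ana". Occurrences found this way may overlap.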
Serialization for Language Models
We also added serializability to a number of the language model implementations, which helps them play nicely with our classifiers, taggers, etc.
TF/IDF Classifier Access Methods
We added methods to TF/IDF classifiers to access the raw IDF values for terms and raw IDF values for term/document pairs.
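For readers unfamiliar with the quantity involved, the sketch below computes inverse document frequency using the common textbook definition log(N / df). This is an assumption for illustration: the class IdfSketch is hypothetical, and the exact weighting used by LingPipe's TF/IDF classifiers may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical IDF computation; not the LingPipe API.
final class IdfSketch {

    // Maps each term to log(N / df), where N is the number of documents
    // and df is the number of documents that contain the term.
    static Map<String, Double> idf(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }
}
```

Under this definition a term that occurs in every document gets an IDF of zero, and rarer terms get larger values.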
Line Tagging Parser
The line tagging parser was updated to handle more general end-of-line markers across platforms.
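One way to handle end-of-line markers from different platforms is to treat CRLF, a lone CR, and a lone LF all as line terminators. The sketch below is an assumption about the general approach, not the parser's actual implementation; LineSplitSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical platform-neutral line splitter; not the LingPipe parser.
final class LineSplitSketch {
    // CRLF is listed first so that it is consumed as a single terminator.
    private static final Pattern EOL = Pattern.compile("\\r\\n|\\r|\\n");

    // Splits text into lines; trailing empty lines are dropped by split().
    static String[] lines(String text) {
        return EOL.split(text);
    }
}
```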
Single-Link Clustering Bug
We fixed a bug in single-link clustering that caused elements farther than the distance bound from all other elements to disappear.
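The expected behavior can be illustrated with a small sketch: single-link clustering with a distance bound amounts to linking every pair of elements within the bound, and an element with no such neighbour should still come back as a singleton cluster. The code below uses one-dimensional points and a union-find structure; it is a hypothetical illustration, not LingPipe's clusterer.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-link clustering with a distance bound; not the LingPipe API.
final class SingleLinkSketch {

    static Collection<List<Double>> cluster(double[] points, double maxDistance) {
        int[] parent = new int[points.length];
        for (int i = 0; i < parent.length; i++) {
            parent[i] = i; // every point starts in its own cluster
        }
        // Link every pair of points that lies within the distance bound.
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                if (Math.abs(points[i] - points[j]) <= maxDistance) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        // Group points by cluster root; isolated points remain as singletons.
        Map<Integer, List<Double>> clusters = new LinkedHashMap<>();
        for (int i = 0; i < points.length; i++) {
            clusters.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(points[i]);
        }
        return clusters.values();
    }

    // Finds the cluster root of i, shortening the path along the way.
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }
}
```

With points 0, 1, 2, 10 and a bound of 1.5, the first three points chain into one cluster and 10 is returned as its own cluster rather than being dropped.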
Tests Fork
If you run our top-level API tests through Ant, you'll find they're much slower, about four times slower. This isn't because LingPipe is slower, but because we rewrote the test call to fork a new process for each test. This allows the tests to succeed out of the box with under 1MB memory on the Mac OS X platform with Apple's Java.
Migration from LingPipe 3 to LingPipe 4
LingPipe 4.1.0 is not backward compatible with LingPipe 3.9.3.
However, programs that compile in LingPipe 3.9.3 without deprecation warnings should compile and run in LingPipe 4.1.0.
Downloading Last 3.9 Version: LingPipe 3.9.3
The last 3.9 version of LingPipe before the major refactoring is available at: