com.aliasi.chunk
Class RescoringChunker<B extends NBestChunker>

java.lang.Object
  extended by com.aliasi.chunk.RescoringChunker<B>
Type Parameters:
B - the type of the underlying n-best chunker
All Implemented Interfaces:
Chunker, ConfidenceChunker, NBestChunker
Direct Known Subclasses:
AbstractCharLmRescoringChunker

public abstract class RescoringChunker<B extends NBestChunker>
extends Object
implements NBestChunker, ConfidenceChunker

A RescoringChunker provides first-best, n-best, and confidence chunking by rescoring the n-best chunkings derived from a contained chunker.

Concrete subclasses must implement the abstract method rescore(Chunking), which provides a score for a chunking. There are no restrictions on how this score is computed. Typically, the rescoring model is a longer-distance or higher-order model than the contained chunker and so provides more accurate results.
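As a rough illustration of the subclass contract, the sketch below uses hypothetical stand-in types (RescorerSketch, LengthPenaltyRescorer) rather than the LingPipe classes, and represents a chunking as a plain list of chunk type names. In LingPipe itself, the subclass would extend RescoringChunker and override rescore(Chunking). The fixed per-chunk penalty is an invented toy model, not anything this class prescribes.

```java
import java.util.List;

// Hypothetical stand-in for the abstract rescoring contract; not a LingPipe type.
abstract class RescorerSketch {
    // Plays the role of rescore(Chunking): a higher value means a better analysis.
    abstract double rescore(List<String> chunkTypes);
}

// Toy rescorer (an assumption for illustration): each chunk costs a fixed
// number of bits, so analyses with fewer chunks score higher.
final class LengthPenaltyRescorer extends RescorerSketch {
    private final double log2CostPerChunk;

    LengthPenaltyRescorer(double log2CostPerChunk) {
        this.log2CostPerChunk = log2CostPerChunk;
    }

    @Override
    double rescore(List<String> chunkTypes) {
        // Score is a log (base 2) value: zero for no chunks, more negative as chunks are added.
        return -log2CostPerChunk * chunkTypes.size();
    }
}
```

A realistic implementation would score the whole chunking with a richer model; the only requirement carried over from the class is that the method returns one double per chunking.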

The n-best chunker works by generating the top analyses from the contained chunker. The number of analyses considered is set in the constructor for this class. These are then rescored and placed in a bounded priority queue, with the bound given by the maximum specified in the call to nBest(char[],int,int,int).
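The bounded-queue idea can be sketched in isolation as follows. This is a minimal sketch, not the class's actual implementation: BoundedNBestSketch and Scored are hypothetical names, the candidates list stands in for the analyses produced by the contained chunker, and the rescorer function stands in for rescore(Chunking).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToDoubleFunction;

final class BoundedNBestSketch {

    // Hypothetical (item, score) pair used only in this sketch.
    static final class Scored<T> {
        final T item;
        final double score;
        Scored(T item, double score) { this.item = item; this.score = score; }
    }

    // Rescores every candidate and keeps at most maxNBest of them, best first.
    static <T> List<Scored<T>> nBest(List<T> candidates,
                                     ToDoubleFunction<? super T> rescorer,
                                     int maxNBest) {
        List<Scored<T>> result = new ArrayList<>();
        if (maxNBest <= 0) return result;
        // Min-heap on score, so the head is always the weakest survivor.
        PriorityQueue<Scored<T>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Scored<T> s) -> s.score));
        for (T candidate : candidates) {
            heap.add(new Scored<>(candidate, rescorer.applyAsDouble(candidate)));
            if (heap.size() > maxNBest) heap.poll(); // evict the lowest score
        }
        result.addAll(heap);
        result.sort(Comparator.comparingDouble((Scored<T> s) -> s.score).reversed());
        return result;
    }
}
```

Keeping the queue bounded means memory grows with maxNBest rather than with the number of candidates rescored.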

The first-best chunker methods chunk(CharSequence) and chunk(char[],int,int) operate by choosing the top-scoring chunking among the rescored chunkings from the contained chunker. The number of chunkings from the contained chunker that are rescored is set in the constructor. This is more memory- and time-efficient than running the n-best chunker.
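Selecting only the top-scoring analysis needs a single pass and no queue, which is one way to see where the savings come from. The sketch below is an assumption about the general technique, not the class's code: FirstBestSketch is a hypothetical name, and returning null for an empty candidate list is a choice made here for illustration.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

final class FirstBestSketch {

    // Returns the candidate with the highest rescored value,
    // or null when there are no candidates (a choice made for this sketch).
    static <T> T firstBest(List<T> candidates, ToDoubleFunction<? super T> rescorer) {
        T best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (T candidate : candidates) {
            double score = rescorer.applyAsDouble(candidate);
            // Strict comparison: on ties, the earlier candidate is kept.
            if (best == null || score > bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```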

N-Best Chunks

The nBestChunks(char[],int,int,int) method is implemented by walking over the n-best analyses generated by nBest(char[],int,int,int) with a maximum n-best for full analyses set to the value of numChunkingsRescored(), which may be changed using setNumChunkingsRescored(int). For each analysis, the chunks are pulled out and their weight is incremented by the n-best analysis weight. Normalization is carried out by dividing by the total probability mass in the returned n-best list.
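The accumulation and normalization described above can be sketched as follows, assuming analysis scores are log (base 2) values and that each chunk occurs at most once per analysis. ChunkConfidenceSketch and Analysis are hypothetical names, and chunks are encoded as plain string keys such as "0-4:PER" purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChunkConfidenceSketch {

    // Hypothetical analysis: a log2 score plus the chunks it contains.
    static final class Analysis {
        final double log2Score;
        final List<String> chunks;
        Analysis(double log2Score, List<String> chunks) {
            this.log2Score = log2Score;
            this.chunks = chunks;
        }
    }

    // Maps each chunk to its share of the probability mass in the n-best list.
    static Map<String, Double> chunkConfidences(List<Analysis> nBest) {
        Map<String, Double> mass = new LinkedHashMap<>();
        double total = 0.0;
        for (Analysis analysis : nBest) {
            double weight = Math.pow(2.0, analysis.log2Score); // back to linear scale
            total += weight;
            for (String chunk : analysis.chunks) {
                mass.merge(chunk, weight, Double::sum); // add this analysis's weight
            }
        }
        if (total > 0.0) {
            final double normalizer = total;
            mass.replaceAll((chunk, w) -> w / normalizer);
        }
        return mass;
    }
}
```

For very negative scores the linear-scale weights can underflow to zero; subtracting the maximum log score from every score before exponentiating leaves the ratios unchanged and avoids that.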

Caching

There is no caching in the rescoring chunker per se. Any caching needs to be carried out in the contained n-best chunker, which is available as the return result of baseChunker().

Since:
LingPipe2.3
Version:
3.8
Author:
Bob Carpenter

Constructor Summary
RescoringChunker(B chunker, int numChunkingsRescored)
          Construct a rescoring chunker that contains the specified base chunker and considers the specified number of chunkings for rescoring.
 
Method Summary
 B baseChunker()
          The base chunker that generates hypotheses to rescore.
 Chunking chunk(char[] cs, int start, int end)
          Returns the first-best chunking for the specified character slice.
 Chunking chunk(CharSequence cSeq)
          Returns the first-best chunking for the specified character sequence.
 Iterator<ScoredObject<Chunking>> nBest(char[] cs, int start, int end, int maxNBest)
          Returns the n-best chunkings of the specified character slice.
 Iterator<Chunk> nBestChunks(char[] cs, int start, int end, int maxNBest)
          Returns the n-best chunks for the specified character slice up to the specified maximum number of chunks.
 int numChunkingsRescored()
          Return the number of chunkings to generate from the base chunker for rescoring.
abstract  double rescore(Chunking chunking)
          Returns the score for a chunking.
 void setNumChunkingsRescored(int numChunkingsRescored)
          Set the number of base chunkings to rescore.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RescoringChunker

public RescoringChunker(B chunker,
                        int numChunkingsRescored)
Construct a rescoring chunker that contains the specified base chunker and considers the specified number of chunkings for rescoring.

Parameters:
chunker - Base n-best chunker.
numChunkingsRescored - Number of chunkings generated by the base chunker to rescore.
Method Detail

rescore

public abstract double rescore(Chunking chunking)
Returns the score for a chunking. This method is used to rescore the chunkings returned by the base chunker to order them for n-best or first-best return by this chunker. Although the base chunker's score is ignored, it may be incorporated in a subclass's implementation of this method.

The rescoring should be in the form of a log (base 2) joint probability estimate for the specified chunking. For the simple whole-analysis rescoring method nBest(char[],int,int,int), this is not checked, and any values may be used in practice. For the n-best chunk method nBestChunks(char[],int,int,int), the scores are treated as log probabilities, but renormalized in order to compute conditional chunk probability estimates.
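For implementations that start from ordinary probabilities, a log (base 2) score can be produced as in the sketch below. Log2ScoreSketch and its methods are hypothetical helpers, and treating the factors as independent is an assumption made only to show that log scores add where probabilities multiply.

```java
final class Log2ScoreSketch {
    private static final double LN_2 = Math.log(2.0);

    // Converts a probability in (0, 1] to its base-2 logarithm.
    static double log2(double probability) {
        return Math.log(probability) / LN_2;
    }

    // Joint log2 score of factors assumed independent: the sum of their log2 values.
    static double jointLog2(double... probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += log2(p);
        }
        return sum;
    }
}
```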

Parameters:
chunking - Chunking to rescore.
Returns:
The new score for this chunking.

baseChunker

public B baseChunker()
The base chunker that generates hypotheses to rescore. Note that this is the actual chunker used by this class, so any changes to it will affect this class's behavior. Common changes involve setting the underlying chunker's configuration.

Returns:
The base chunker.

numChunkingsRescored

public int numChunkingsRescored()
Return the number of chunkings to generate from the base chunker for rescoring.

Returns:
The number of base chunkings to rescore.

setNumChunkingsRescored

public void setNumChunkingsRescored(int numChunkingsRescored)
Set the number of base chunkings to rescore. This value will be used in every chunking method to determine the underlying number of chunkings considered.

Parameters:
numChunkingsRescored - Number of base chunkings to rescore.

chunk

public Chunking chunk(CharSequence cSeq)
Returns the first-best chunking for the specified character sequence. See the class documentation above for implementation details.

Specified by:
chunk in interface Chunker
Parameters:
cSeq - Character sequence to chunk.
Returns:
First-best chunking of the specified character sequence.

chunk

public Chunking chunk(char[] cs,
                      int start,
                      int end)
Returns the first-best chunking for the specified character slice. See the class documentation above for implementation details.

Specified by:
chunk in interface Chunker
Parameters:
cs - Underlying character array.
start - Index of first character to analyze.
end - Index of one past the last character to analyze.
Returns:
First-best chunking of the specified character slice.

nBest

public Iterator<ScoredObject<Chunking>> nBest(char[] cs,
                                              int start,
                                              int end,
                                              int maxNBest)
Returns the n-best chunkings of the specified character slice. See the class documentation above for implementation details.

Specified by:
nBest in interface NBestChunker
Parameters:
cs - Underlying character array.
start - Index of first character to analyze.
end - Index of one past the last character to analyze.
maxNBest - The maximum number of results to return.
Returns:
Iterator over the n-best chunkings of the specified character slice.

nBestChunks

public Iterator<Chunk> nBestChunks(char[] cs,
                                   int start,
                                   int end,
                                   int maxNBest)
Returns the n-best chunks for the specified character slice up to the specified maximum number of chunks.

See the class documentation above for implementation details.

Specified by:
nBestChunks in interface ConfidenceChunker
Parameters:
cs - Underlying characters.
start - Index of first character in slice.
end - Index of one past last character in slice.
maxNBest - Maximum number of chunks to return.