com.aliasi.tag
Class StringTagging

java.lang.Object
  extended by com.aliasi.tag.Tagging<String>
      extended by com.aliasi.tag.StringTagging

public class StringTagging
extends Tagging<String>

A StringTagging is a tagging over string-based tokens that indexes each token to a position in an underlying character sequence.

Because tokenizers may normalize inputs, the underlying characters between a token's start and end are not necessarily equivalent to the token itself. That is, token(n) does not need to be equal to characters().substring(tokenStart(n),tokenEnd(n)).
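
The difference between a normalized token and its underlying characters can be seen in a small sketch. It uses plain Java rather than LingPipe itself: the class `NormalizedTokenDemo`, its fields, and the lowercasing normalization are assumptions made for illustration, and the parallel token list and offset arrays only mirror the shape of the constructor arguments.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not part of LingPipe.
class NormalizedTokenDemo {
    // Underlying characters (note the double space after "The").
    static final String CHARS = "The  Quick fox";
    // Tokens as a lowercasing tokenizer might emit them (assumed normalization).
    static final List<String> TOKENS = Arrays.asList("the", "quick", "fox");
    // Start (inclusive) and end (exclusive) offsets, parallel to TOKENS.
    static final int[] STARTS = {0, 5, 11};
    static final int[] ENDS = {3, 10, 14};

    // The characters lying between the start and end offsets of token n.
    static String underlying(int n) {
        return CHARS.substring(STARTS[n], ENDS[n]);
    }

    public static void main(String[] args) {
        for (int n = 0; n < TOKENS.size(); ++n) {
            boolean same = TOKENS.get(n).equals(underlying(n));
            System.out.println(TOKENS.get(n) + " <- \"" + underlying(n) + "\" same=" + same);
        }
    }
}
```

Only the last token matches its underlying characters here; the first two differ in case, which is why the offsets, and not the tokens, are the reliable link back to the input.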

Since:
LingPipe3.9
Version:
3.9
Author:
Bob Carpenter

Constructor Summary
StringTagging(List<String> tokens, List<String> tags, CharSequence cs, int[] tokenStarts, int[] tokenEnds)
          Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and arrays representing the position at which each token starts and ends.
StringTagging(List<String> tokens, List<String> tags, CharSequence cs, List<Integer> tokenStarts, List<Integer> tokenEnds)
          Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and lists representing the position at which each token starts and ends.
 
Method Summary
 String characters()
          Returns the characters underlying this string tagging.
 boolean equals(Object that)
          Returns true if the specified object is a string tagging that's structurally identical to this tagging.
 int hashCode()
          Returns a hash code computed from the underlying string and tags.
 String rawToken(int n)
          Return the string underlying the token in the specified position.
 int tokenEnd(int n)
          Return the character offset of the end of the token in the specified input position in the underlying characters.
 int tokenStart(int n)
          Return the character offset of the start of the token in the specified input position in the underlying characters.
 String toString()
          Returns the chunking-based representation of this tagging, with chunks for each token spanning the underlying token and providing the type specified by the tag.
 
Methods inherited from class com.aliasi.tag.Tagging
size, tag, tags, token, tokens
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

StringTagging

public StringTagging(List<String> tokens,
                     List<String> tags,
                     CharSequence cs,
                     int[] tokenStarts,
                     int[] tokenEnds)
Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and arrays representing the position at which each token starts and ends.

The lists and arrays are copied, and the character sequence converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.

Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags, token starts, and token ends are not all the same length, or if a token start/end index is not possible for the underlying characters.
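
The conditions in the Throws clause can be sketched as a stand-alone check. This is not the library's implementation: the helper `TaggingArgs.validate` is hypothetical, and reading "not possible" as "negative, past the end of the characters, or ending before it starts" is an assumption.

```java
import java.util.List;

// Hypothetical helper; the real constructor's checks may differ.
class TaggingArgs {
    static void validate(List<String> tokens, List<String> tags,
                         CharSequence cs, int[] starts, int[] ends) {
        int n = tokens.size();
        // All four parallel sequences must have the same length.
        if (tags.size() != n || starts.length != n || ends.length != n) {
            throw new IllegalArgumentException(
                "tokens, tags, starts and ends must all be the same length");
        }
        for (int i = 0; i < n; ++i) {
            // Assumed reading of "not possible": outside [0, cs.length()]
            // or an end offset that precedes its start offset.
            if (starts[i] < 0 || ends[i] < starts[i] || ends[i] > cs.length()) {
                throw new IllegalArgumentException(
                    "illegal offsets for token " + i
                    + ": start=" + starts[i] + " end=" + ends[i]);
            }
        }
    }
}
```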

StringTagging

public StringTagging(List<String> tokens,
                     List<String> tags,
                     CharSequence cs,
                     List<Integer> tokenStarts,
                     List<Integer> tokenEnds)
Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and lists representing the position at which each token starts and ends.

The lists are copied, and the character sequence converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.

Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags, token starts, and token ends are not all the same length, or if a token start/end index is not possible for the underlying characters.
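
The copying guarantee stated for this constructor can be illustrated with a small sketch in plain Java. The class `OffsetsSnapshot` is hypothetical and only shows the general defensive-copy pattern; how LingPipe stores the offsets internally is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder illustrating defensive copying; not LingPipe code.
class OffsetsSnapshot {
    private final int[] mStarts;

    OffsetsSnapshot(List<Integer> starts) {
        // Copy element by element so the caller's list is never shared.
        mStarts = new int[starts.size()];
        for (int i = 0; i < mStarts.length; ++i) {
            mStarts[i] = starts.get(i);
        }
    }

    int start(int n) {
        return mStarts[n];
    }

    public static void main(String[] args) {
        List<Integer> starts = new ArrayList<>(Arrays.asList(0, 4));
        OffsetsSnapshot snapshot = new OffsetsSnapshot(starts);
        starts.set(0, 99); // later change to the argument
        System.out.println(snapshot.start(0)); // still the value at construction time
    }
}
```
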
Method Detail

tokenStart

public int tokenStart(int n)
Return the character offset of the start of the token in the specified input position in the underlying characters.

Parameters:
n - Position of token in input token list.
Returns:
Character offset of first character in the token in the underlying characters.

tokenEnd

public int tokenEnd(int n)
Return the character offset of the end of the token in the specified input position in the underlying characters.

Parameters:
n - Position of token in input token list.
Returns:
Character offset one past the last character of the token in the underlying characters.

rawToken

public String rawToken(int n)
Return the string underlying the token in the specified position.

Parameters:
n - Token input position.
Returns:
Underlying token string.

characters

public String characters()
Returns the characters underlying this string tagging.

Returns:
Underlying character string.

toString

public String toString()
Returns the chunking-based representation of this tagging, with chunks for each token spanning the underlying token and providing the type specified by the tag.

Overrides:
toString in class Tagging<String>
Returns:
Chunking representation of this string tagging.
 public Chunking toChunking() {
     ChunkingImpl chunking = new ChunkingImpl(characters());
     for (int n = 0; n < mTokenStarts.length; ++n) {
         Chunk chunk = ChunkFactory.createChunk(tokenStart(n), tokenEnd(n), tag(n));
         chunking.add(chunk);
     }
     return chunking;
 }

equals

public boolean equals(Object that)
Returns true if the specified object is a string tagging that's structurally identical to this tagging. For taggings to be identical, their underlying strings must be equal, all tags and tokens must be equal, and all token starts and ends must be equal.

Overrides:
equals in class Object
Parameters:
that - Object to compare to this tagging.
Returns:
true if the specified object is a string tagging equal to this tagging.
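
The structural equality described above can be sketched with a stand-alone class. `SimpleTagging` and its fields are hypothetical and exist only to show the five comparisons the description lists; this is not the LingPipe implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; not the LingPipe class.
final class SimpleTagging {
    private final String chars;
    private final List<String> tokens;
    private final List<String> tags;
    private final int[] starts;
    private final int[] ends;

    SimpleTagging(String chars, List<String> tokens, List<String> tags,
                  int[] starts, int[] ends) {
        this.chars = chars;
        this.tokens = new ArrayList<>(tokens);
        this.tags = new ArrayList<>(tags);
        this.starts = starts.clone();
        this.ends = ends.clone();
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) return true;
        if (!(that instanceof SimpleTagging)) return false;
        SimpleTagging t = (SimpleTagging) that;
        // Underlying string, tokens, tags, starts and ends must all match.
        return chars.equals(t.chars)
            && tokens.equals(t.tokens)
            && tags.equals(t.tags)
            && Arrays.equals(starts, t.starts)
            && Arrays.equals(ends, t.ends);
    }

    @Override
    public int hashCode() {
        // Any hash that depends only on compared fields stays consistent with equals.
        return 31 * chars.hashCode() + tokens.hashCode();
    }
}
```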

hashCode

public int hashCode()
Returns a hash code computed from the underlying string and tags. The hash code is computed for a size N tagging as:
 31**N * characters().hashCode()
   + 31**(N-1) * token(N-1).hashCode()
   + 31**(N-2) * token(N-2).hashCode()
   + ...
   + 31**1 * token(1).hashCode()
   + 31**0 * token(0).hashCode()
 

This hash code is consistent with equality.
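
The formula can be evaluated without computing explicit powers by using Horner's rule. The sketch below follows the formula as printed, which is stated in terms of tokens; the class `TaggingHash` and its method names are hypothetical, and the actual LingPipe code may be organized differently. Both methods rely on Java's wrap-around `int` arithmetic, so they agree even when intermediate values overflow.

```java
import java.util.List;

// Hypothetical helper mirroring the documented formula; not LingPipe code.
class TaggingHash {
    // Horner's rule: start from the characters' hash, then fold in
    // tokens from position N-1 down to 0, multiplying by 31 each step.
    static int hash(String characters, List<String> tokens) {
        int h = characters.hashCode();
        for (int n = tokens.size() - 1; n >= 0; --n) {
            h = 31 * h + tokens.get(n).hashCode();
        }
        return h;
    }

    // Direct term-by-term evaluation of the same formula, for comparison.
    static int hashByTerms(String characters, List<String> tokens) {
        int size = tokens.size();
        int sum = pow31(size) * characters.hashCode();
        for (int n = 0; n < size; ++n) {
            sum += pow31(n) * tokens.get(n).hashCode();
        }
        return sum;
    }

    // 31 raised to the k-th power with int wrap-around.
    private static int pow31(int k) {
        int p = 1;
        for (int i = 0; i < k; ++i) {
            p *= 31;
        }
        return p;
    }
}
```

For an empty tagging (N = 0) both methods reduce to `characters.hashCode()`.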

Overrides:
hashCode in class Object
Returns:
Hash code for this string tagging.