java.lang.Object
com.aliasi.tag.Tagging<String>
com.aliasi.tag.StringTagging
public class StringTagging
A StringTagging is a tagging over string-based tokens
that indexes each token to a position in an underlying character
sequence.
Because tokenizers may normalize inputs, the underlying
characters between a token's start and end are not necessarily
equivalent to the token itself. That is, token(n) does not
need to be equal to characters().substring(tokenStart(n),tokenEnd(n)).
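As a rough illustration of that last point, the sketch below assumes a hypothetical tokenizer that lowercases its tokens. It does not use LingPipe; OffsetSketch, its fields, and span(n) are stand-in names chosen for this example, with span(n) playing the role of characters().substring(tokenStart(n),tokenEnd(n)).

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration; does not depend on LingPipe.
// OffsetSketch and all of its members are hypothetical names.
final class OffsetSketch {
    // Underlying character sequence.
    static final String CHARS = "The Cat sat";
    // Tokens as a lowercasing tokenizer might emit them (an assumption).
    static final List<String> TOKENS = Arrays.asList("the", "cat", "sat");
    // Start (inclusive) and end (exclusive) offsets, parallel to TOKENS.
    static final int[] STARTS = {0, 4, 8};
    static final int[] ENDS = {3, 7, 11};

    // Characters spanned by token n in the underlying sequence.
    static String span(int n) {
        return CHARS.substring(STARTS[n], ENDS[n]);
    }

    public static void main(String[] args) {
        for (int n = 0; n < TOKENS.size(); ++n) {
            boolean same = TOKENS.get(n).equals(span(n));
            System.out.println(TOKENS.get(n) + " spans \"" + span(n)
                               + "\", equal: " + same);
        }
    }
}
```

Only the third token matches its span here; the first two differ in case, which is the situation the class description warns about.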
| Constructor Summary | |
|---|---|
| StringTagging(List<String> tokens, List<String> tags, CharSequence cs, int[] tokenStarts, int[] tokenEnds) | Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and arrays representing the position at which each token starts and ends. |
| StringTagging(List<String> tokens, List<String> tags, CharSequence cs, List<Integer> tokenStarts, List<Integer> tokenEnds) | Construct a string tagging from the specified string-based tokens and tags, an underlying character sequence, and lists representing the position at which each token starts and ends. |
| Method Summary | |
|---|---|
| String | characters() Returns the characters underlying this string tagging. |
| boolean | equals(Object that) Returns true if the specified object is a string tagging that's structurally identical to this tagging. |
| int | hashCode() Returns a hash code computed from the underlying string and tags. |
| String | rawToken(int n) Return the string underlying the token in the specified position. |
| int | tokenEnd(int n) Return the character offset of the end of the token in the specified input position in the underlying characters. |
| int | tokenStart(int n) Return the character offset of the start of the token in the specified input position in the underlying characters. |
| String | toString() Returns the chunking-based representation of this tagging, with chunks for each token spanning the underlying token and providing the type specified by the tag. |
| Methods inherited from class com.aliasi.tag.Tagging |
|---|
| size, tag, tags, token, tokens |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public StringTagging(List<String> tokens,
List<String> tags,
CharSequence cs,
int[] tokenStarts,
int[] tokenEnds)
The lists and arrays are copied, and the character sequence is converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.
Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags,
token starts, and token ends are not all the same length, or if a token
start/end index is not possible for the underlying characters.
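The argument checks described above might look roughly like the following. This is a stand-alone sketch, not LingPipe's implementation: TaggingArgsSketch, validate, and accepts are hypothetical names, and the reading of "not possible" as violating 0 <= start <= end <= cs.length() is an assumption, since the documentation does not spell out the exact condition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone check; not LingPipe's actual code.
final class TaggingArgsSketch {

    // Throws IllegalArgumentException if the arguments are inconsistent.
    static void validate(List<String> tokens, List<String> tags,
                         CharSequence cs, int[] starts, int[] ends) {
        int n = tokens.size();
        if (tags.size() != n || starts.length != n || ends.length != n) {
            throw new IllegalArgumentException(
                "Tokens, tags, starts, and ends must all have the same length.");
        }
        for (int i = 0; i < n; ++i) {
            // Assumed meaning of a "possible" span:
            // 0 <= start <= end <= cs.length().
            if (starts[i] < 0 || starts[i] > ends[i] || ends[i] > cs.length()) {
                throw new IllegalArgumentException(
                    "Illegal span for token " + i
                    + ": start=" + starts[i] + " end=" + ends[i]);
            }
        }
    }

    // Convenience wrapper that reports validity as a boolean.
    static boolean accepts(List<String> tokens, List<String> tags,
                           CharSequence cs, int[] starts, int[] ends) {
        try {
            validate(tokens, tags, cs, starts, ends);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b");
        List<String> tags = Arrays.asList("X", "Y");
        System.out.println(accepts(tokens, tags, "a b",
                                   new int[] {0, 2}, new int[] {1, 3}));
        System.out.println(accepts(tokens, tags, "a b",
                                   new int[] {0, 2}, new int[] {1, 9}));
    }
}
```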
public StringTagging(List<String> tokens,
List<String> tags,
CharSequence cs,
List<Integer> tokenStarts,
List<Integer> tokenEnds)
The lists are copied, and the character sequence is converted to a string. Subsequent changes to these arguments will not affect the constructed tagging.
Parameters:
tokens - List of strings representing token inputs.
tags - List of strings representing tag outputs, parallel to tokens.
cs - Underlying character sequence.
tokenStarts - Starting positions of tokens, parallel to tokens.
tokenEnds - Ending positions of tokens, parallel to tokens.
Throws:
IllegalArgumentException - If the list of tokens, list of tags,
token starts, and token ends are not all the same length, or if a token
start/end index is not possible for the underlying characters.

| Method Detail |
|---|
public int tokenStart(int n)
Parameters:
n - Position of token in input token list.
public int tokenEnd(int n)
Parameters:
n - Position of token in input token list.
public String rawToken(int n)
Parameters:
n - Token input position.
public String characters()
public String toString()
Overrides:
toString in class Tagging<String>
public boolean equals(Object that)
Returns true if the specified object is a string
tagging that's structurally identical to this tagging.
For taggings to be identical, their underlying strings must
be equal, all tags and tokens must be equal, and all token
starts and ends must be equal.
Overrides:
equals in class Object
Parameters:
that - Object to compare to this tagging.
Returns:
true if the specified object is a string
tagging equal to this tagging.
public int hashCode()
Returns a hash code computed as follows, where N is the number of tokens:
31**N * characters().hashCode()
+ 31**(N-1) * token(N-1).hashCode()
+ 31**(N-2) * token(N-2).hashCode()
+ ...
+ 31**1 * token(1).hashCode()
+ 31**0 * token(0).hashCode()
This hash code is consistent with equality.
Overrides:
hashCode in class Object
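The sum above can be evaluated without computing any powers by starting from the hash of the characters and folding in tokens from the last to the first. The sketch below follows the formula exactly as printed on this page; it has not been checked against LingPipe's source, and HashSketch, hash, hashDirect, and pow31 are hypothetical names. Both versions rely on Java's wrapping int arithmetic, so they agree even when intermediate values overflow.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone illustration of the documented formula; not LingPipe code.
final class HashSketch {

    // Horner-style evaluation: after the loop, the characters' hash has been
    // multiplied by 31 N times and token(i)'s hash by 31 exactly i times.
    static int hash(String characters, List<String> tokens) {
        int h = characters.hashCode();
        for (int i = tokens.size() - 1; i >= 0; --i) {
            h = 31 * h + tokens.get(i).hashCode();
        }
        return h;
    }

    // Direct term-by-term evaluation of the same sum, for comparison.
    static int hashDirect(String characters, List<String> tokens) {
        int n = tokens.size();
        int sum = pow31(n) * characters.hashCode();
        for (int i = 0; i < n; ++i) {
            sum += pow31(i) * tokens.get(i).hashCode();
        }
        return sum;
    }

    // 31 raised to the power k, with int overflow wrapping.
    static int pow31(int k) {
        int p = 1;
        for (int i = 0; i < k; ++i) {
            p *= 31;
        }
        return p;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "sat");
        String chars = "The Cat sat";
        System.out.println(hash(chars, tokens) == hashDirect(chars, tokens));
    }
}
```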