java.lang.Object
  com.aliasi.chunk.RegExChunker
public class RegExChunker
extends Object
implements Chunker, Compilable, Serializable
A RegExChunker finds chunks that match regular
expressions. Specifically, a Matcher is created for the input, and its Matcher.find() method is called repeatedly to iterate over matching text
segments, each of which is converted to a chunk.
The behavior of the find method is largely determined by the
specific Pattern instance on which the chunker is
based. For more information, see Sun's RegEx
Tutorial.
All found chunks receive the same type and score, both specified at construction time.
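The find loop described above can be sketched with java.util.regex alone. This is an illustration, not LingPipe's implementation: `Span` is a hypothetical stand-in for a LingPipe chunk, and, as the text describes, the same type and score are simply copied onto every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoopSketch {

    // Hypothetical stand-in for a chunk: [start, end) offsets plus a type and score.
    static final class Span {
        final int start;
        final int end;
        final String type;
        final double score;

        Span(int start, int end, String type, double score) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.score = score;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")/" + type + "/" + score;
        }
    }

    // Call Matcher.find() until it fails, turning each match into a Span.
    // Every span receives the same type and score.
    static List<Span> chunk(Pattern pattern, String type, double score, CharSequence text) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            spans.add(new Span(matcher.start(), matcher.end(), type, score));
        }
        return spans;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        // prints [[4,6)/NUM/1.0, [12,16)/NUM/1.0]
        System.out.println(chunk(digits, "NUM", 1.0, "Buy 12 eggs 2024"));
    }
}
```

Offsets follow the Matcher convention: start is inclusive and end is exclusive.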
Warning: Java's regular expressions follow Perl's matching
semantics. Perl uses a greedy
strategy for quantifiers, so an expression such as .* 
matches as many characters as possible. In contrast, disjunction
uses a first-match strategy: alternatives are tried from left to right,
and the first one that succeeds is used. For example, the regular expression
ab|abc will not produce the same chunker as
abc|ab; for input abcde, the former
returns ab as a chunk, whereas the latter returns
abc. This first-match behavior of disjunctions
takes precedence over any quantifiers applied to the alternatives.
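The following JDK-only snippet reproduces the ab|abc versus abc|ab example and contrasts it with a greedy quantifier. The class and method names are illustrative. The last call wraps the disjunction in +, which still yields ab: once the first alternative lets the overall match succeed, the longer alternative at that position is never tried.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DisjunctionOrderDemo {

    // Return the text of the first match found by Matcher.find(), or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Alternatives are tried left to right; the first that succeeds wins.
        System.out.println(firstMatch("ab|abc", "abcde"));      // ab
        System.out.println(firstMatch("abc|ab", "abcde"));      // abc
        // A greedy quantifier, by contrast, takes as much as it can.
        System.out.println(firstMatch("a.*c", "abcabc"));       // abcabc
        // Quantifying the disjunction does not make it prefer the longer
        // alternative: the first iteration takes "ab", the second fails at "c".
        System.out.println(firstMatch("(?:ab|abc)+", "abcde")); // ab
    }
}
```

In practice, listing longer alternatives first (abc|ab) is a simple way to get longest-alternative behavior from a disjunction.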
For convenience, this class implements both the com.aliasi.util.Compilable
and java.io.Serializable interfaces. Both store the
same state, namely the string underlying the regex pattern, the chunk type,
and the score. The reconstituted object is also an instance of this
class.
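The idea of persisting only the regex string, type, and score can be sketched with standard Java serialization. `SerializableChunkerSketch` is a hypothetical class, not LingPipe's: it marks the compiled Pattern transient and recompiles it from the stored string on read, which is one plausible way to reconstitute such an object. The library's actual serialized form is not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

// Hypothetical class mirroring the documented serialized state
// (regex string, chunk type, score); not LingPipe's RegExChunker.
public class SerializableChunkerSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String regex;
    private final String chunkType;
    private final double chunkScore;
    // The compiled Pattern is not written out; it is rebuilt from regex on read.
    private transient Pattern pattern;

    public SerializableChunkerSketch(String regex, String chunkType, double chunkScore) {
        this.regex = regex;
        this.chunkType = chunkType;
        this.chunkScore = chunkScore;
        this.pattern = Pattern.compile(regex);
    }

    // Deserialization hook: restore the stored fields, then recompile the pattern.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        pattern = Pattern.compile(regex);
    }

    public String regex() { return regex; }
    public String chunkType() { return chunkType; }
    public double chunkScore() { return chunkScore; }
    public boolean finds(CharSequence text) { return pattern.matcher(text).find(); }

    // Serialize to a byte array and read the object back.
    static SerializableChunkerSketch roundTrip(SerializableChunkerSketch c)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableChunkerSketch) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableChunkerSketch copy =
            roundTrip(new SerializableChunkerSketch("[0-9]+", "NUM", 1.0));
        // prints: [0-9]+ NUM 1.0 true
        System.out.println(copy.regex() + " " + copy.chunkType() + " "
            + copy.chunkScore() + " " + copy.finds("a42"));
    }
}
```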
| Constructor Summary | |
|---|---|
| RegExChunker(Pattern pattern, String chunkType, double chunkScore) | Construct a chunker based on the specified regular expression pattern, producing the specified chunk type and score. |
| RegExChunker(String regex, String chunkType, double chunkScore) | Construct a chunker based on the specified regular expression, producing the specified chunk type and score. |
| Method Summary | | |
|---|---|---|
| Chunking | chunk(char[] cs, int start, int end) | Return the chunking of the specified character slice. |
| Chunking | chunk(CharSequence cSeq) | Return the chunking of the specified character sequence. |
| void | compileTo(ObjectOutput out) | Compiles this regular-expression chunker to the specified object output. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public RegExChunker(String regex,
String chunkType,
double chunkScore)
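Construct a chunker based on the specified regular expression, producing the specified chunk type and score. The regular expression is compiled using Pattern.compile(String).
Parameters:
regex - Regular expression for chunks.
chunkType - Type for all found chunks.
chunkScore - Score for all found chunks.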
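Because the String constructor compiles its argument with Pattern.compile(String), which takes no flags, matching options such as case insensitivity need either a precompiled Pattern passed to the other constructor or an inline flag like (?i) in the regex itself. The JDK-only snippet below (names are illustrative) shows the difference. Since the class description says the pattern string is what gets stored when the chunker is serialized or compiled, note that pattern() does not include flags passed separately; whether such flags survive that round trip is not stated here, so inline flags are the cautious choice if persistence matters.

```java
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    // True if the pattern matches anywhere in the input.
    static boolean finds(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // What the String constructor is documented to do: compile without flags.
        Pattern plain = Pattern.compile("acme");
        // A flag supplied separately, as one could do before using the Pattern constructor.
        Pattern flagged = Pattern.compile("acme", Pattern.CASE_INSENSITIVE);
        // The same option embedded in the regex string itself.
        Pattern inline = Pattern.compile("(?i)acme");

        System.out.println(finds(plain, "ACME Corp"));    // false
        System.out.println(finds(flagged, "ACME Corp"));  // true
        System.out.println(finds(inline, "ACME Corp"));   // true

        // pattern() returns only the source string; separate flags live in flags().
        System.out.println(flagged.pattern());            // acme
        System.out.println(inline.pattern());             // (?i)acme
        System.out.println((flagged.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
    }
}
```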
public RegExChunker(Pattern pattern,
String chunkType,
double chunkScore)
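Construct a chunker based on the specified regular expression pattern, producing the specified chunk type and score.
Parameters:
pattern - Regular expression pattern for chunks.
chunkType - Type for all found chunks.
chunkScore - Score for all found chunks.
| Method Detail |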
|---|
public Chunking chunk(CharSequence cSeq)
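Return the chunking of the specified character sequence. The chunks are the successive matches returned by Matcher.find() as applied
to the regular expression pattern underlying this chunker.
Specified by:
chunk in interface Chunker
Parameters:
cSeq - Character sequence to chunk.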
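Because the chunks come from successive Matcher.find() calls, they inherit its search semantics. The JDK-only sketch below (class and method names are illustrative) shows two consequences: each search resumes where the previous match ended, so matches never overlap, and a regex that can match the empty string produces zero-length matches. How this chunker treats zero-length matches is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonOverlapDemo {

    // Collect "start-end" offsets of successive Matcher.find() matches.
    static List<String> spans(String regex, CharSequence text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Each search resumes at the end of the previous match: no overlaps.
        System.out.println(spans("aa", "aaaaa")); // [0-2, 2-4]
        // A regex that can match nothing yields zero-length matches.
        System.out.println(spans("a*", "baa"));   // [0-0, 1-3, 3-3]
    }
}
```

A pattern that requires at least one character (for example a+ instead of a*) avoids empty matches altogether.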
public void compileTo(ObjectOutput out)
throws IOException
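Compiles this regular-expression chunker to the specified object output.
Specified by:
compileTo in interface Compilable
Parameters:
out - Object output to which this chunker is compiled.
Throws:
IOException - If there is an underlying I/O error during the write.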
public Chunking chunk(char[] cs,
int start,
int end)
Return the chunking of the specified character slice.
Specified by:
chunk in interface Chunker
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - Index of one past the last character in the slice.
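The slice variant takes a character array plus start and end offsets, with end exclusive. One straightforward way to handle such a slice, sketched below with the JDK only, is to copy it into a String and run the same find loop. Whether the library reports chunk offsets relative to the slice or to the whole array is not stated here; the hypothetical `absolute` parameter shows both conventions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SliceDemo {

    // Find matches within cs[start, end). Matcher offsets are relative to the
    // slice; when absolute is true, start is added to map them into the array.
    static List<String> sliceMatches(Pattern p, char[] cs, int start, int end, boolean absolute) {
        // new String(char[], offset, count) throws IndexOutOfBoundsException for a bad slice.
        CharSequence slice = new String(cs, start, end - start);
        int shift = absolute ? start : 0;
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(slice);
        while (m.find()) {
            out.add((m.start() + shift) + "-" + (m.end() + shift));
        }
        return out;
    }

    public static void main(String[] args) {
        char[] cs = "xx 42 yy 7".toCharArray();
        Pattern digits = Pattern.compile("[0-9]+");
        // The slice [3, 8) is "42 yy"; the trailing "7" lies outside it.
        System.out.println(sliceMatches(digits, cs, 3, 8, false)); // [0-2]
        System.out.println(sliceMatches(digits, cs, 3, 8, true));  // [3-5]
    }
}
```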