com.aliasi.util
Class Sgml

java.lang.Object
  extended by com.aliasi.util.Sgml

public class Sgml
extends Object

The Sgml class contains static methods for converting SGML entities into Unicode characters. The method entityToCharacter(String) returns the Unicode character corresponding to a single SGML entity, and the method replaceEntities(String,String) substitutes the corresponding characters for all entities appearing in an input string.

See the following document for a complete list of over 1000 entities known by this class:

Since:
LingPipe 3.2
Version:
3.9.1
Author:
Bob Carpenter (from data provided by John Cowan)

Method Summary
static Character entityToCharacter(String entity)
          Returns the character represented by the specified SGML entity, or null if the entity is undefined.
static String replaceEntities(String in)
          Convenience method to call replaceEntities(String,String) with a question mark (?) used as the replacement for unknown entities.
static String replaceEntities(String in, String unknownReplacement)
          Returns the result of replacing all the entities appearing in the specified string with their corresponding Unicode characters, using the specified replacement string for unknown entities.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

entityToCharacter

public static Character entityToCharacter(String entity)
Returns the character represented by the specified SGML entity, or null if the entity is undefined. Note that the entity name must be passed without its leading ampersand or trailing semicolon (for example, amp rather than &amp;).

Parameters:
entity - Name of SGML entity (without initial ampersand and final semicolon).
Returns:
The character for the entity, or null if it is undefined.
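To illustrate the lookup contract described above, here is a minimal sketch, not LingPipe's implementation. The class name EntityLookupSketch and its six-entry table are hypothetical stand-ins for the 1000+ entities the real class knows; the sketch only shows that names are passed without delimiters and that null signals an undefined entity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not LingPipe's code: a tiny table standing in for
// the full entity list known by com.aliasi.util.Sgml.
class EntityLookupSketch {
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
        ENTITIES.put("quot", '"');
        ENTITIES.put("eacute", '\u00E9'); // e with acute accent
        ENTITIES.put("copy", '\u00A9');   // copyright sign
    }

    // Mirrors the documented contract: the name is given without '&' and ';',
    // and null is returned when the entity is undefined.
    static Character entityToCharacter(String entity) {
        return ENTITIES.get(entity);
    }

    public static void main(String[] args) {
        System.out.println(entityToCharacter("amp"));   // prints &
        System.out.println(entityToCharacter("&amp;")); // prints null: delimiters were not stripped
        System.out.println(entityToCharacter("bogus")); // prints null: unknown entity
    }
}
```

Because the method returns a boxed Character, callers should check for null before unboxing to a char.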

replaceEntities

public static String replaceEntities(String in,
                                     String unknownReplacement)
Returns the result of replacing all the entities appearing in the specified string with their corresponding Unicode characters, using the specified replacement string for unknown entities.

Parameters:
in - Input string.
unknownReplacement - String with which to replace unknown entities.
Returns:
The input string with entities replaced with their corresponding characters.
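The substitution can be pictured as a scan for entity references, each resolved by name or replaced by unknownReplacement. The following is a minimal sketch under stated assumptions, not LingPipe's implementation: the class name ReplaceEntitiesSketch, its four-entry table, and the entity syntax (an ampersand, an ASCII letter followed by letters or digits, then a semicolon) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not LingPipe's code.
class ReplaceEntitiesSketch {
    // Small stand-in for the real entity table.
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
        ENTITIES.put("eacute", '\u00E9');
    }

    // Assumed entity syntax: '&', a name of ASCII letters/digits, then ';'.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String replaceEntities(String in, String unknownReplacement) {
        Matcher m = ENTITY.matcher(in);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(in, last, m.start());         // text before the entity
            Character c = ENTITIES.get(m.group(1)); // look up the bare name
            sb.append(c != null ? c.toString() : unknownReplacement);
            last = m.end();
        }
        sb.append(in, last, in.length());           // text after the last entity
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEntities("a &lt; b &amp; &foo;", "[?]")); // prints a < b & [?]
    }
}
```

The replacement text is appended literally rather than passed through Matcher.appendReplacement, so a replacement containing $ or \ needs no escaping.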

replaceEntities

public static String replaceEntities(String in)
Convenience method to call replaceEntities(String,String) with a question mark (?) used as the replacement for unknown entities.

Parameters:
in - Input string.
Returns:
The input string with entities replaced with their corresponding characters.
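The one-argument overload simply delegates with "?" as the replacement. The sketch below shows that delegation in a self-contained form; it is hypothetical, not LingPipe's code, and the class name ReplaceEntitiesDefaultSketch, its three-entry table, and the assumed entity syntax are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not LingPipe's code.
class ReplaceEntitiesDefaultSketch {
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", '&');
        ENTITIES.put("lt", '<');
        ENTITIES.put("gt", '>');
    }

    // Assumed entity syntax: '&', a name of ASCII letters/digits, then ';'.
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String replaceEntities(String in, String unknownReplacement) {
        Matcher m = ENTITY.matcher(in);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(in, last, m.start());
            Character c = ENTITIES.get(m.group(1));
            sb.append(c != null ? c.toString() : unknownReplacement);
            last = m.end();
        }
        sb.append(in, last, in.length());
        return sb.toString();
    }

    // Convenience overload: unknown entities become a question mark.
    static String replaceEntities(String in) {
        return replaceEntities(in, "?");
    }

    public static void main(String[] args) {
        System.out.println(replaceEntities("x &lt; y &bogus;")); // prints x < y ?
    }
}
```

Because a literal question mark can also occur in ordinary text, callers that need to detect unknown entities afterwards may prefer the two-argument form with a distinctive marker.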