com.aliasi.cluster
Class AbstractHierarchicalClusterer<E>

java.lang.Object
  extended by com.aliasi.cluster.AbstractHierarchicalClusterer<E>
Type Parameters:
E - the type of objects being clustered
All Implemented Interfaces:
Clusterer<E>, HierarchicalClusterer<E>
Direct Known Subclasses:
CompleteLinkClusterer, SingleLinkClusterer

public abstract class AbstractHierarchicalClusterer<E>
extends Object
implements HierarchicalClusterer<E>

An AbstractHierarchicalClusterer adapts hierarchical clusterers to the basic clustering interface. The abstract method hierarchicalCluster(Set) defines hierarchical clustering for the specified input set, returning a dendrogram. The basic clustering method cluster(Set) is then defined by cutting that dendrogram at a maximum distance.

Distance measures between elements are measures of dissimilarity: the larger the distance, the more dissimilar the elements. A distance of zero indicates perfect similarity, and elements that are closer together are clustered more readily. A typical distance metric is Euclidean distance between vector objects. Other Minkowski metrics are also common, such as the Manhattan metric, which reduces to Hamming distance for binary vectors. Edit distance, as implemented in the com.aliasi.spell package, is another popular dissimilarity metric for text. Two texts, text1 and text2, may also be compared by sample cross-entropy. If M1 and M2 are the language models trained on text1 and text2, respectively, then a symmetric measure of dissimilarity is M1.crossEntropy(text2) + M2.crossEntropy(text1). The average, minimum, or maximum of the two cross-entropies may be used in place of the sum.
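The vector metrics mentioned above can be sketched in plain Java. This is an illustrative sketch only: the class DistanceSketch and its methods are hypothetical names that are not part of LingPipe, and vectors are assumed to be double arrays of equal length.

```java
/** Illustrative distance functions over double[] vectors (hypothetical, not LingPipe classes). */
final class DistanceSketch {

    /** Euclidean distance: the Minkowski metric with p = 2. */
    static double euclidean(double[] x, double[] y) {
        checkSameLength(x, y);
        double sumOfSquares = 0.0;
        for (int i = 0; i < x.length; ++i) {
            double diff = x[i] - y[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Manhattan distance: the Minkowski metric with p = 1. */
    static double manhattan(double[] x, double[] y) {
        checkSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    /** Hamming distance: the number of positions at which the vectors differ. */
    static int hamming(double[] x, double[] y) {
        checkSameLength(x, y);
        int count = 0;
        for (int i = 0; i < x.length; ++i) {
            if (x[i] != y[i]) {
                ++count;
            }
        }
        return count;
    }

    private static void checkSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length.");
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 1, 1};
        double[] b = {0, 0, 1, 0};
        // For binary vectors, Manhattan and Hamming distances agree.
        System.out.println(manhattan(a, b)); // prints 2.0
        System.out.println(hamming(a, b));   // prints 2
        System.out.println(euclidean(new double[] {0, 0}, new double[] {3, 4})); // prints 5.0
    }
}
```

All three functions return zero for identical vectors and grow as the vectors diverge, which is the dissimilarity behavior a clusterer's distance function is expected to have.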

Since:
LingPipe2.0
Version:
3.8
Author:
Bob Carpenter

Constructor Summary
AbstractHierarchicalClusterer(double maxDistance, Distance<? super E> distance)
          Construct an abstract hierarchical clusterer with the specified maximum distance.
 
Method Summary
 Set<Set<E>> cluster(Set<? extends E> elements)
          Returns the clustering of the specified elements.
 Distance<? super E> distance()
          Returns the distance function for this hierarchical clusterer.
 double getMaxDistance()
          Returns the maximum distance for clusters in a dendrogram.
abstract  Dendrogram<E> hierarchicalCluster(Set<? extends E> elements)
          Returns the dendrogram derived from performing clustering with this class's specified maximum distance.
 void setMaxDistance(double maxDistance)
          Sets the maximum distance at which two clusters may be merged.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractHierarchicalClusterer

public AbstractHierarchicalClusterer(double maxDistance,
                                     Distance<? super E> distance)
Construct an abstract hierarchical clusterer with the specified maximum distance. The maximum distance must be a number greater than or equal to zero; positive infinity is allowed.

Parameters:
maxDistance - Maximum distance between clusters that can be linked.
distance - Distance function used to compare elements.
Throws:
IllegalArgumentException - If the specified maximum distance is negative or is not a number.
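
The constraint on the maximum distance can be sketched as a small check. This is a hypothetical helper written for illustration, not the library's actual code; it assumes only what is stated above, namely that zero and positive infinity are accepted while negative values and NaN are rejected.

```java
/** Hypothetical validation helper; the class and method names are illustrative only. */
final class MaxDistanceCheck {

    /** Returns true if the value is a number greater than or equal to zero (infinity included). */
    static boolean isValidMaxDistance(double maxDistance) {
        return !Double.isNaN(maxDistance) && maxDistance >= 0.0;
    }

    /** Throws IllegalArgumentException for negative or NaN values. */
    static void requireValidMaxDistance(double maxDistance) {
        if (!isValidMaxDistance(maxDistance)) {
            throw new IllegalArgumentException(
                "Max distance must be a non-negative number. Found maxDistance=" + maxDistance);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidMaxDistance(0.0));                      // prints true
        System.out.println(isValidMaxDistance(Double.POSITIVE_INFINITY)); // prints true
        System.out.println(isValidMaxDistance(-1.0));                     // prints false
        System.out.println(isValidMaxDistance(Double.NaN));               // prints false
    }
}
```

The explicit Double.isNaN test documents the intent, since every ordered comparison involving NaN evaluates to false.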
Method Detail

distance

public Distance<? super E> distance()
Returns the distance function for this hierarchical clusterer.

Returns:
The distance function for this hierarchical clusterer.

hierarchicalCluster

public abstract Dendrogram<E> hierarchicalCluster(Set<? extends E> elements)
Returns the dendrogram derived from performing clustering with this class's specified maximum distance. Setting the maximum distance to Double.POSITIVE_INFINITY should result in a complete clustering.

Specified by:
hierarchicalCluster in interface HierarchicalClusterer<E>
Parameters:
elements - Set of objects to cluster.
Returns:
The dendrogram representing the hierarchical clustering of the elements.

cluster

public Set<Set<E>> cluster(Set<? extends E> elements)
Returns the clustering of the specified elements. The clustering is determined by splitting a complete hierarchical clustering at this class's distance bound. Thus the pairwise distances between the sets in the clustering returned will all be greater than this clusterer's maximum distance.

Specified by:
cluster in interface Clusterer<E>
Parameters:
elements - Elements to cluster.
Returns:
Clustering of elements.
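
The effect of the distance bound can be illustrated with a small self-contained sketch. It does not use LingPipe: CutoffClusteringSketch is a hypothetical class that clusters numbers on a line with single-link merging, which is one possible linkage and is assumed here purely for illustration. Merging stops once the closest pair of clusters is farther apart than the bound, so every pair of returned clusters is separated by more than the maximum distance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-link clustering of numbers with a distance cutoff (not LingPipe code). */
final class CutoffClusteringSketch {

    /** Single-link distance: the smallest pairwise distance between members of two clusters. */
    static double singleLink(Set<Double> a, Set<Double> b) {
        double min = Double.POSITIVE_INFINITY;
        for (double x : a) {
            for (double y : b) {
                min = Math.min(min, Math.abs(x - y));
            }
        }
        return min;
    }

    /** Merges the closest clusters until none are within maxDistance of each other. */
    static Set<Set<Double>> cluster(Set<Double> elements, double maxDistance) {
        // Start with one singleton cluster per element.
        List<Set<Double>> clusters = new ArrayList<>();
        for (Double element : elements) {
            Set<Double> singleton = new HashSet<>();
            singleton.add(element);
            clusters.add(singleton);
        }
        while (clusters.size() > 1) {
            int bestI = -1;
            int bestJ = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); ++i) {
                for (int j = i + 1; j < clusters.size(); ++j) {
                    double d = singleLink(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Stop when the closest pair is beyond the bound (the "cut").
            if (bestI < 0 || best > maxDistance) {
                break;
            }
            // bestJ > bestI, so removing bestJ leaves index bestI unchanged.
            Set<Double> merged = clusters.get(bestI);
            merged.addAll(clusters.remove(bestJ));
        }
        return new HashSet<>(clusters);
    }

    public static void main(String[] args) {
        Set<Double> points = new HashSet<>(Arrays.asList(1.0, 2.0, 3.0, 10.0, 11.0));
        System.out.println(cluster(points, 1.5).size());                      // prints 2
        System.out.println(cluster(points, Double.POSITIVE_INFINITY).size()); // prints 1
    }
}
```

With a bound of 1.5 the points split into {1, 2, 3} and {10, 11}, whose single-link distance of 7 exceeds the bound. An infinite bound never triggers the cut, so all points end up in one cluster.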

getMaxDistance

public double getMaxDistance()
Returns the maximum distance for clusters in a dendrogram.

Returns:
The maximum distance score for a dendrogram to remain after cutting.

setMaxDistance

public final void setMaxDistance(double maxDistance)
Sets the maximum distance at which two clusters may be merged.

Parameters:
maxDistance - New value for maximum distance.