com.aliasi.cluster
Class CompleteLinkClusterer<E>

java.lang.Object
  extended by com.aliasi.cluster.AbstractHierarchicalClusterer<E>
      extended by com.aliasi.cluster.CompleteLinkClusterer<E>
Type Parameters:
E - the type of objects being clustered
All Implemented Interfaces:
Clusterer<E>, HierarchicalClusterer<E>

public class CompleteLinkClusterer<E>
extends AbstractHierarchicalClusterer<E>

A CompleteLinkClusterer implements complete link agglomerative clustering. Complete link clustering is a greedy algorithm in which the two closest clusters are always merged up to a specified distance threshold. Distance between clusters for complete link clustering is defined be the maximum of the distances between the members of the clusters. See SingleLinkClusterer for a clusterer that takes the minimum rather than the maximum in making clustering decisions.

For example, consider the following distance matrix (which is also analyzed by way of example in SingleLinkClusterer):

 ABCDE
A01275
B 0386
C  059
D   04
E    0
The result of complete-link clustering is the following dendrogram:
         A   B   C   D   E
         |   |   |   |   |
      1  -----   |   |   |
      2    |     |   |   |
      3    -------   |   |
      4       |      -----
      5       |        |
      6       |        |
      7       |        |
      8       |        |
      9       ----------
 

First, the objects A and B are merged at distance one. At distance 3, the object C is merged with the cluster {A,B} because max(distance(C,A),distance(C,B))=3, and that is smaller than any other pair of distances. Next, D and E are merged at distance 4, because that is the distance between them, and they are both further than 4 away from the cluster {A,B,C}. This continues with one more step, which happens at distance 9, the distance between C and E, which is the maximum distance between pairs drawn from {A,B,C} and {D,E}.

The various clusters at each distance bound threshold are:

Threshold RangeClusters
[Double.NEGATIVE_INFINITY,1){A}, {B}, {C}, {D}, {E}
[1,3) {A,B}, {C}, {D}, {E}
[3,4){A,B,C}, {D}, {E}
[4,9){A,B,C}, {D,E}
[9,Double.POSITIVE_INFINITY]{A,B,C,D,E}
The intervals show the clusters returned for thresholds within the specified interval. As usual, square brackets denote inclusive range bounds and round brackets exclusive bounds. Although this example has the same single-link clustering, see the class documentation in SingleLinkClusterer for an example that does not.

Note that results may not be well-defined in the case of ties between proximities. If there are ties, there is no guarantee as to how this implementation will break the tie.

Implementation Note: This algorithm requires worst-case O(n3) running time to cluster n elements. The initialization loop considers all pairs of elements, thus requiring n2 running time. The main loop then walks over these pairs, requiring up to n steps each to compute new distances. If the initial array of pairs is heavily pruned, the outer loop is smaller and the updates are actually smaller, too.

Since:
LingPipe2.0
Version:
3.8.3
Author:
Bob Carpenter

Constructor Summary
CompleteLinkClusterer(Distance<? super E> distance)
          Construct a complete link clusterer with no distance bound.
CompleteLinkClusterer(double maxDistance, Distance<? super E> distance)
          Construct a complete link clusterer with the specified distance bound.
 
Method Summary
 Dendrogram<E> hierarchicalCluster(Set<? extends E> elementSet)
          Returns the array of clusters derived from performing clustering with this class's specified maximum distance.
 
Methods inherited from class com.aliasi.cluster.AbstractHierarchicalClusterer
cluster, distance, getMaxDistance, setMaxDistance
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CompleteLinkClusterer

public CompleteLinkClusterer(double maxDistance,
                             Distance<? super E> distance)
Construct a complete link clusterer with the specified distance bound.

Parameters:
maxDistance - Maximum distance between pairs of clusters to allow clustering.
distance - Distance measure to use among instances.

CompleteLinkClusterer

public CompleteLinkClusterer(Distance<? super E> distance)
Construct a complete link clusterer with no distance bound. This constructor uses Double.POSITIVE_INFINITY as the bound value, which effectively makes clustering unbounded.

Method Detail

hierarchicalCluster

public Dendrogram<E> hierarchicalCluster(Set<? extends E> elementSet)
Description copied from class: AbstractHierarchicalClusterer
Returns the array of clusters derived from performing clustering with this class's specified maximum distance. Setting the maximum distance to Double.POSITIVE_INFINITY should result in a complete clustering.

Specified by:
hierarchicalCluster in interface HierarchicalClusterer<E>
Specified by:
hierarchicalCluster in class AbstractHierarchicalClusterer<E>
Parameters:
elementSet - Set of objects to cluster.
Returns:
The dendrogram representing the hierarchical clustering of the elements.