com.aliasi.util
Class CompactHashSet<E>

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractSet<E>
          extended by com.aliasi.util.CompactHashSet<E>
Type Parameters:
E - the type of element stored in the set
All Implemented Interfaces:
Serializable, Iterable<E>, Collection<E>, Set<E>

public class CompactHashSet<E>
extends AbstractSet<E>
implements Serializable

A CompactHashSet implements the set interface more tightly in memory and more efficiently than Java's HashSet.

Sizing and Resizing

This hash set allows arbitrary capacities sized hash sets to be created, including hash sets of size 0. When resizing is necessary due to objects being added, it resizes to the next largest capacity that is at least 1.5 times

What's wrong with HashSet?

Java's hash set implementation HashSet wraps a HashMap, requiring a map entry with a dummy value for each set entry. Map entries are even heavier as the map entries also contain a next field to implement a linked list structure to deal with collisions. This class represents hash set entries in terms of the entries themselves. Iterators are based directly on the underlying hash table.

Java's hash set implementation requires hash tables with capacities that are even powers of 2. This allows a very quick modulus operation on hash codes by using a mask, but restricts the size of hash sets to be powers of 2.

Implementation

The implementation in this class uses open addressing, which uses a simple array to store all of the set elements based on their hash codes. Hash codes are taken modulo the capacity of the hash set, so capacities above a certain small level are rounded up so that they are not powers of 2.

The current implementation uses linear probing with a step size of 1 for the case of hash code collisions. This means that if the position for an entry is full, the next position is considered (wrapping to the beginning at the end of the array).

We borrowed the supplemental hashing function from HashMap (version ID 1.73). If the initial hashFunction is h the supplemental hash function is computed by:

 static int supplementalHash(int n) {
     int n2 = n ^ (n >>> 20) ^ (n >>> 12);
     return n2 ^ (n2 >>> 7) ^ (n2 >>> 4);
 }
This is required to scramble the hash code of strings that share the same prefix with a different final character from being adjacent. Recall that the hash code for strings s consisting of characters s[0], ..., s[n-1] is defined in String.hashCode() to be:
 s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Null Elements

Attempts to add, remove, or test null objects for membership will throw null pointer exceptions.

Illegal Types and Class Casts

When adding, removing, or checking for membership, the Object.equals(Object) method may throw class cast exceptions if the specified object is not of a type comparable to the elements of the set.

Resizing

As more elements are added to the set, it will automatically resize its underlying array of buckets.

As of this release, sets will never be resized downward.

Equality and Hash Codes

Compact hash sets satisfy the specification of the the equality metho, AbstractSet.equals(Object), and hash code method AbstractSet.hashCode() specified by the Set interface.

The implementations are inherited from the superclass AbstractSet, which uses the underlying iterators.

Concurrent Modification

This set implementation does not support concurrent modification. (Note that this does not exclusively have to do with threading, though multiple threads may lead to concurrent modifications.) If the set is modified during an iteration, or conversion to array, its behavior will be unpredictable, though it is likely to make errors in basic methods and to throw an array index or null pointer exception.

Thread Safety

This class is only thread safe with read-write locking. Any number of read operations may be executed concurrently, but writes must be executed exclusively. The write operations include any method that may potentially change the set.

Serializability

A small hash set is serializable if all of its elements are serializable. The deserialized object will be an instance of this class.

References

Since:
LingPipe3.9.1
Version:
3.9.1
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
CompactHashSet(E... es)
          Construct a compact hash set containing the specified list of values.
CompactHashSet(int initialCapacity)
          Construct a compact hash set with the specified initial capacity.
 
Method Summary
 boolean add(E e)
          Add the specified object to the set if it is not already present, returning true if the set didn't already contain the element.
 void clear()
          Remove all of the elements from this set.
 boolean contains(Object o)
          Returns true if this set contains the specified object.
 Iterator<E> iterator()
          Returns an iterator over the elements in this set.
 boolean remove(Object o)
          Removes the specified object from the set, returning true if the object was present before the remove.
 boolean removeAll(Collection<?> collection)
          Removes all the elements of the specified collection from this set, returning true if the set was modified as a result.
 boolean retainAll(Collection<?> collection)
          Remove all elements from this set that are not members of the specified collection, returning true if this set was modified as a result of the operation.
 int size()
          Returns the number of objects in this set.
 Object[] toArray()
          Returns an object array containing all of the members of this set.
<T> T[]
toArray(T[] array)
          Returns an array of elements contained in this set, the runtime type of which is that of the specified array argument.
 
Methods inherited from class java.util.AbstractSet
equals, hashCode
 
Methods inherited from class java.util.AbstractCollection
addAll, containsAll, isEmpty, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.Set
addAll, containsAll, isEmpty
 

Constructor Detail

CompactHashSet

public CompactHashSet(int initialCapacity)
Construct a compact hash set with the specified initial capacity.

Throws:
IllegalArgumentException - If the capacity is less than 1.
OutOfMemoryException - If the initial capacity is too large for the JVM.

CompactHashSet

public CompactHashSet(E... es)
Construct a compact hash set containing the specified list of values. It begins with a set of initial capacity 1.

Parameters:
es - Initial values to add to set.
Method Detail

add

public boolean add(E e)
Add the specified object to the set if it is not already present, returning true if the set didn't already contain the element. If the set already contains the object, there is no effect.

Specified by:
add in interface Collection<E>
Specified by:
add in interface Set<E>
Overrides:
add in class AbstractCollection<E>
Parameters:
e - Object to add to set.
Returns:
true if the set didn't already contain the element.
Throws:
NullPointerException - If the specified element is null.
ClassCastException - If the class of this object prevents it from being added to the set.

clear

public void clear()
Remove all of the elements from this set.

Note that this operation does not affect the underlying capacity of the set itself.

Specified by:
clear in interface Collection<E>
Specified by:
clear in interface Set<E>
Overrides:
clear in class AbstractCollection<E>

contains

public boolean contains(Object o)
Returns true if this set contains the specified object.

Specified by:
contains in interface Collection<E>
Specified by:
contains in interface Set<E>
Overrides:
contains in class AbstractCollection<E>
Parameters:
o - Object to test.
Returns:
true if this set contains the specified object.
Throws:
NullPointerException - If the specified object is null.
ClassCastException - If the type of this object is incompatible with this set.

iterator

public Iterator<E> iterator()
Returns an iterator over the elements in this set. The iterator supports the Iterator.remove() operation, which is not considered a concurrent modification.

Iteration order for sets is not guaranteed to be stable under adds or deletes, or for sets of different capacities containing the same elements.

The set must not be modified by other operations while an iteration is in process. Doing so may cause an illegal state and unpredictable behavior such as null pointer or array index exceptions.

Specified by:
iterator in interface Iterable<E>
Specified by:
iterator in interface Collection<E>
Specified by:
iterator in interface Set<E>
Specified by:
iterator in class AbstractCollection<E>
Returns:
Iterator over the set elements.

remove

public boolean remove(Object o)
Removes the specified object from the set, returning true if the object was present before the remove.

Specified by:
remove in interface Collection<E>
Specified by:
remove in interface Set<E>
Overrides:
remove in class AbstractCollection<E>
Parameters:
o - Object to remove.
Returns:
true if the object was present before the remove operation.
Throws:
ClassCastException - If the specified object is not compatible with this set.

removeAll

public boolean removeAll(Collection<?> collection)
Removes all the elements of the specified collection from this set, returning true if the set was modified as a result.

Implementation Note: Unlike the implementation the parent class AbstractSet inherits from its parent class AbstractCollection, this implementation iterates over the argument collection, removing each of its elements.

Specified by:
removeAll in interface Collection<E>
Specified by:
removeAll in interface Set<E>
Overrides:
removeAll in class AbstractSet<E>
Parameters:
collection - Collection of objects to remove.
Returns:
true if the set was modified as a result.
Throws:
NullPointerException - If any of the collection members is null, or if the specified collection is itself null.
ClassCastException - If attempting to remove a member of the collection results in a cast exception.

retainAll

public boolean retainAll(Collection<?> collection)
Remove all elements from this set that are not members of the specified collection, returning true if this set was modified as a result of the operation.

Implementation Note: Unlike the implementation that the parent class AbstractSet inherits from AbstractCollection, this implementation directly visits the underlying hash entries rather than invoking the overhead of an iterator.

Specified by:
retainAll in interface Collection<E>
Specified by:
retainAll in interface Set<E>
Overrides:
retainAll in class AbstractCollection<E>
Parameters:
collection - Collection of objects to retain.
Returns:
true if this set was modified as a result of the operation.
Throws:
ClassCastException - If comparing elements of the specified collection to elements of this set throws a class cast exception.
NullPointerException - If the specified collection is null.

size

public int size()
Returns the number of objects in this set.

Specified by:
size in interface Collection<E>
Specified by:
size in interface Set<E>
Specified by:
size in class AbstractCollection<E>
Returns:
The number of objects in this set.

toArray

public Object[] toArray()
Returns an object array containing all of the members of this set. The order of elements in the array is the iteration order, but this is not guaranteed to be stable under modifications or changes in capacity.

The returned array is fresh and may be modified without affect this set.

Specified by:
toArray in interface Collection<E>
Specified by:
toArray in interface Set<E>
Overrides:
toArray in class AbstractCollection<E>
Returns:
Array of objects in this set.

toArray

public <T> T[] toArray(T[] array)
Returns an array of elements contained in this set, the runtime type of which is that of the specified array argument.

If the specified array argument is long enough to hold all of the elements in this set, it will be filled starting from index 0. If the specified array is longer than the size of the set, the array entry following the last entry filled by this set will be set to null.

If the specified array is not long enough to hold all of the elements, a new array will be created of the appropriate type through reflection.

Specified by:
toArray in interface Collection<E>
Specified by:
toArray in interface Set<E>
Overrides:
toArray in class AbstractCollection<E>
Type Parameters:
T - Type of target array.
Parameters:
array - Array of values determining type of output and containing output elements if long enough.
Throws:
ArrayStoreException - If the members of this set cannot be inserted into an array of the specified type.
NullPointerException - If the specified array is null.