Lecture 11 - Chaining

3.2. Chaining

Open addressing has the advantage that amount of memory needed is fixed. It has two disadvantages:

  1. The first one we've just seen: RETRIEVE is very inefficient when the key does not occur in the table.

  2. The other disadvantage relates to the load factor defined earlier. With open addressing, the amount you can store in the table is. of course, limited by the size of the table; and, what is worse, as the load factor gets large, all the operations degrade to linear-time.

These disadvantages are all overcome by the ``chaining'' technique.

Each position in the array contains a Collection of values of unlimited size (we use a linked implementation of some sort, with dynamically allocated storage).

To insert a value with key K, we add it to the collection in position H(K). Similarly, to delete a value with key K, we remove it from the collection in position H(K). We did not have an explicit retrieval operation on collections, but one could be added, or MAP could be called with an appropriate match-and-save function.

Most discussions on chaining assume that the collections in each array position are implemented as linked lists - `chains' (hence the name of this technique). In this case INSERT is constant time, but DELETE and RETRIEVE are LINEAR in the number of values in the chain.

However, there is no reason why the collections cannot be implemented with more efficient data structures - balanced binary search trees, for example. If this is done, all operations are log(L) where L is the number of values in the collection.

If the hashing function maps the keys uniformly to all the positions in the array, the expected number of values mapped to each position is the load factor, L = k/N where k is the number of key values and N the number of positions in the array. If a balanced binary search tree had been used on its own, the operations would have been O(log(k)); by using a hash table this has been reduced to O(log(k/N)).