Lecture 11 - Chaining

## 3.2. Chaining

Open addressing has the advantage that the amount of memory needed is
fixed. It has two disadvantages:

- The first one we've just seen: RETRIEVE is very inefficient when
the key does not occur in the table.

- The other disadvantage relates to the load factor defined
earlier. With open addressing, the amount you can store in the
table is, of course, limited by the size of the table; and, what
is worse, as the load factor gets large, all the operations
degrade to *linear* time.

These disadvantages are all overcome by the "chaining" technique.

Each position in the array contains a Collection of values of
unlimited size (we use a linked implementation of some sort, with
dynamically allocated storage).

To insert a value with key K, we add it to the collection in
position H(K). Similarly, to delete a value with key K, we remove it
from the collection in position H(K). We did not have an explicit
retrieval operation on collections, but one could be added, or MAP
could be called with an appropriate match-and-save function.
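As a concrete illustration, here is a minimal sketch of a chained hash table in Python, with list-based buckets standing in for the collections. The class and method names are illustrative, not from the lecture, and Python's built-in `hash` stands in for H(K):

```python
class ChainedHashTable:
    """A sketch of chaining: each array position holds a list of (key, value) pairs."""

    def __init__(self, n_buckets=8):
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def _h(self, key):
        # Any hash function H(K) mapping keys to 0..N-1 would do here.
        return hash(key) % self.n_buckets

    def insert(self, key, value):
        # Add the pair to the collection at position H(K).
        self.buckets[self._h(key)].append((key, value))

    def retrieve(self, key):
        # Scan the chain at H(K) -- linear in the length of that chain.
        for k, v in self.buckets[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Remove the pair from the collection at position H(K).
        bucket = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Note that only the chain at position H(K) is ever touched; the other N - 1 buckets are irrelevant to any single operation.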

Most discussions of chaining assume that the collections in each
array position are implemented as linked lists, or "chains" (hence the
name of this technique). In this case INSERT is constant time, but
DELETE and RETRIEVE are LINEAR in the number of values in the chain.

However, there is no reason why the collections cannot be
implemented with more efficient data structures - balanced binary
search trees, for example. If this is done, all operations are
O(log L), where L is the number of values in the collection.
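To sketch the idea: Python's standard library has no balanced binary search tree, so a sorted list searched with `bisect` can stand in for the bucket structure. This gives O(log L) RETRIEVE as described above, though list insertion remains O(L); a true balanced tree (e.g. a red-black tree) would make INSERT O(log L) as well. The function names here are illustrative:

```python
import bisect

def bucket_insert(bucket, key, value):
    # Keep the bucket sorted by key; binary search finds the slot in
    # O(log L), though the list shift makes the insert itself O(L).
    i = bisect.bisect_left(bucket, (key,))
    bucket.insert(i, (key, value))

def bucket_retrieve(bucket, key):
    # Binary search within the bucket: O(log L) comparisons.
    # ((key,) sorts just before any (key, value) tuple.)
    i = bisect.bisect_left(bucket, (key,))
    if i < len(bucket) and bucket[i][0] == key:
        return bucket[i][1]
    return None
```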

If the hashing function maps the keys uniformly to all the
positions in the array, the expected number of values mapped to each
position is the *load factor*, L = k/N where k is the number of
key values and N the number of positions in the array. If a balanced
binary search tree had been used on its own, the operations would have
been O(log(k)); by using a hash table this has been reduced to
O(log(k/N)).
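A worked instance of this saving (the figures are illustrative): with k = 1,000,000 keys spread over N = 1,000 array positions, the load factor is L = k/N = 1,000, so a balanced tree in each bucket costs about log2(1000) ≈ 10 comparisons per operation, versus log2(1,000,000) ≈ 20 for a single balanced tree over all the keys:

```python
import math

k, N = 1_000_000, 1_000
L = k / N  # expected chain length under uniform hashing

print(math.log2(k))  # one big tree: ~20 comparisons
print(math.log2(L))  # tree per bucket: ~10 comparisons
```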