Hash Tables With Open Addressing

April 30, 2019

We have studied hash tables in several previous exercises, most of them of the “chaining” variety, where collisions are resolved by inserting each item in a linked list in the bucket to which it hashes. Today’s exercise looks at two different varieties of hash tables that use “open addressing,” in which all keys are stored in the buckets themselves, one key per bucket, with collisions resolved by moving them to a different bucket. We studied one type of open addressing in the exercise on cuckoo hashing, today we will see two others: linear probing and double hashing. Our hash tables consist of M memory addresses that hold N keys (or key/value pairs). Obviously, N must be less than M, and the load factor N M is critical to the performance of the algorithm; if the table becomes too loaded, search times increase dramatically. Open address hash tables work best when N can be predicted with some reliability, so an appropriate M can be chosen.

The linear probing algorithm hashes the key onto the range 0 .. M−1 and looks first in the indicated bucket. If the key is there, it is returned. If the bucket is empty, the key is not present in the table. If the bucket is non-empty, but the key doesn’t match, the search goes to the next bucket in order, wrapping around at the end of the table. Double hashing is similar, except that a second hash function determines the increment between probes. When double hashing, we must arrange that the increment is non-zero (otherwise the search will never visit any bucket except the first) and is co-prime to M (otherwise some buckets will never be visited), which is most easily arranged by making M prime.

Deletions require some care. We can’t just mark a bucket available when its key is deleted, because subsequent searches will miss keys that are beyond the newly-empty bucket. Instead, we mark the bucket with a special value showing that it is deleted, and consider that bucket to be non-empty but with a key that never matches.

Your task is to implement hash tables using linear probing and double hashing. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

1 Comment »

One Response to “Hash Tables With Open Addressing”

Paul said

May 2, 2019 at 1:06 PM

In Python. The full code is on ideone.

from itertools import count, accumulate, chain, cycle
from random import randrange, sample

empty = ("_empty", "_empty", 0)
deleted = ("_deleted", "_deleted", 0)

def inc_linprob(hashval, M):
    """ increment for linear probing"""
    return 1

def inc_double_hashing(hashval, M):
    """ increment for double hashing - make sure it is in range [1, M-1]"""
    return hashval % (M-1) + 1

class dictlh(object):
    """ Store (key, value, hash(key))
    """
    def __init__(self, mode=inc_linprob, load_factor=0.5):
        self.M = 997
        self.buckets = [empty] * self.M
        self.load_factor = load_factor
        self.max_load = load_factor * self.M
        self.n_elements = 0
        self.n_deleted = 0
        self._increment = mode
        self.searches = 0

    def _find_insert_bucket(self, key):
        """ result: find the key OR find an empty bucket to insert """
        hashval = hash(key)
        b, M = self.buckets, self.M
        inc = self._increment(hashval, M)
        h = hashval % M
        bh = b[h]
        while bh != empty and bh[0] != key:
            h = (h + inc) % M
            bh = b[h]
            self.searches += 1
        return h

    def _insert(self, key, value, hashval=None):
        if hashval is None:
            hashval = hash(key)
        bi = self._find_insert_bucket(key)
        isempty = self.buckets[bi] == empty
        self.buckets[bi] = (key, value)
        if isempty:
            self.n_elements += 1
            if self.n_elements > self.max_load:
                self._rehash()

    def _find_key(self, key):
        bi = self._find_insert_bucket(key)
        if self.buckets[bi][0] == key:
            return bi
        raise KeyError("key not found")

    def _lookup(self, key):
        h = self._find_key(key)
        return self.buckets[h][1]

    def _rehash(self):
        load = self.n_elements - self.n_deleted
        self.M = next_prime(4 * load)
        self.max_load = load_factor * self.M
        self.n_elements = self.n_deleted = 0
        old_buckets = list(self.buckets)
        self.buckets = [empty] * self.M
        for item in old_buckets:
            if item not in (deleted, empty):
                self._insert(*item)

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Programming Praxis