Perfect(ing) hashing in NetBSD

Jörg Sonnenberger

joerg@NetBSD.org

Tokyo, March 16, 2013

AsiaBSDCon 2013

Overview

  • What are Perfect Hash Functions (PHF)?
  • Why should you care?
  • Characteristics
  • History of perfect hash functions
  • Excursion I: Randomized hash functions
  • Excursion II: Random graphs
  • CHM
  • BPZ
  • nbperf
  • NetBSD's cdb
  • The future: route lookup

What are Perfect Hash Functions (PHF)?

  • Hash functions: map a set of keys to a set of integers
  • ...used to implement associative arrays
  • Perfect: injective, i.e. distinct keys are mapped to distinct values
  • A.k.a.: collision-free
  • Minimal: if the value range is [0..n-1] for n keys

Why should you care?

  • Algorithmic attacks try to force high collision rates as DoS
  • Deterministic timing
  • Compact, even if keys don't fit into memory
  • It's the S in CS
  • Because they are used in NetBSD

Characteristics

  • Does the PHF preserve the key order? I.e. can I decide what value each key gets assigned?
  • What is the space requirement for the PHF itself?
  • What is the computational cost of the PHF?
  • If it is not minimal, how dense are the values?

History of perfect hash functions

  • Bostic: predecessor of GNU gperf, 1984
  • Knuth: examples in TAoCP
  • Pearson: Fast Hashing of Variable-Length Text Strings, 1990
  • ...looks quite a bit like RC4
  • Common issue: lack of scalability
  • If they work: very fast, reasonable compact

Excursion I: Randomized hash functions

  • Families of hash functions derived from a parameter (seed)
  • Can be used to prevent complexity attacks
  • Strong property: independent random hash functions behave like random variables

Excursion II: Random graphs

  • Edges in the graph formed from random variable
  • Very interesting part of graph theory
  • Property: cycles in 2-graphs and 3-graphs are rare, if graph is dense enough
  • 2-graphs: |V| / |E| >= 2
  • 3-graphs: |V| / |E| > 1.24

CHM

  • Found by Czech, Havas, Majewski in 1992
  • One of the first expected linear run time algorithm
  • MPHF
  • Order-preserving
  • n * c * ceil(ld(n)) bits per key
  • Compute 2/3 hash values, reduce by modulo, 2/3 table lookups, sum, modulo

CHM illustrated

Key 0 Key 1 Key 2 Key 3
h0 0 2 3 1
h1 1 3 4 4
=> =>

BPZ

  • Found by Botelho, Pagh, Ziviani in 2007
  • Similar to CHD
  • Not order-preserving , not minimal
  • Same computation cost as CHD for PHF
  • Using 3-graphs: 1.98 bits per key
  • Post-process with counting filter to obtain MPHF
  • Including counting filter: 2.79 bits per key for MPHF
  • Adds two table lookups and one 64bit population count

BPZ illustrated

Key 0 Key 1 Key 2 Key 3
h0 0 2 3 1
h1 1 3 4 4
=> =>

CHD

  • "Hash, displace, compress"
  • Found by Belazzougui, Botelho, Dietzfelbinger in 2009
  • 1.4 bit per key for PHF
  • Significantly more complicated than BPZ: no further details

nbperf

  • Started in 2009 to provide an alternative for GNU gperf for large key sets
  • cmph didn't fit: license-wise and in terms of code
  • Provides CHM for 2-graphs and 3-graphs, BPZ for 3-graphs
  • CHD is too new, will follow later
  • Options for algorithm, density, stable seeding

nbperf performance

  • Webster's Second International Dictionary
  • 234977 and 76295 lines


Input Algorithm Tries Run time in s
web2 CHM 1 0.58
CHM3 39 0.58
BPZ 11 0.51
web2a CHM 12 0.35
CHM3 7 0.17
BPZ 18 0.16

nbperf output for CHM


#include <stdlib.h>

uint32_t
hash(const void * __restrict key, size_t keylen)
{
  static const uint32_t g[469955] = {
    /* ... */
  };
  uint32_t h[3];

  mi_vector_hash(key, keylen, 0x00000000U, h);

  return (g[h[0] % 469955] + g[h[1] % 469955]) % 234977;
}
            

Berkeley DB for index databases

  • No atomic transactions
  • Result: Copy-update-move schemes
  • High database size overhead
  • Size matters for pre-built databases
  • Complex code
  • Per-application database caching
  • Needs locking in multi-threaded programs

SQLite

  • Supports transactions
  • Limited support for concurrent read-only access
  • Still high database size overhead
  • Even larger code base

The NetBSD constant database

  • First part: indexed variable size storage
  • Second part: perfect hash index
  • No storage of keys: regenerate keys from data
  • Tiny: 1.6KB reader, 4.3KB writer on AMD64
  • Databases similar or smaller size than textual source
  • Lock-free, memory mapped IO
  • Worst case: 3 block reads for key lookup, 1 block read for offset
  • NetBSD 6: services, terminfo and device database
  • Database size: 1/4 of BDB and faster creation

Route lookup in the BSD network stack

  • Problem with CIDR: find best matching prefix (BMP)
  • Implementation: radix tree
  • Very complex code
  • Solves more generic problem: non-continous netmasks
  • Kitchen-sink approach in the network stack
  • Mixes policy and implementation details
  • David Young started separation in NetBSD with thin wrapper layer

Multi-level hashing

  • Find BMP using binary search on the possible prefix lengths
  • Needs markers if a longer prefix exists and the search starts at a shorter prefix
  • References to the covered prefix in the marker avoid backtracking

Multi-level hashing example

Destination network Gateway
0/0 192.168.0.10
127/8 127.0.0.1
192.168/16 R: 0/0
192.168.0/24 192.168.0.2
192.168.1/24 192.168.0.1
10/8 192.168.0.1
10.0/16 R: 10/8
10.0.10/24 192.168.0.5
10.192/12 192.168.0.6
11/8 R: 10/8
11.192/12 192.168.0.7
  • 192.168.0.5: 192.168/16 -> 192.168.0/16 done
  • 10.0.9.8: 10.0/16 -> 10.0.9/24 done

Multi-level performance

  • Perfect hashing: deterministic cost (memory accesses!)
  • BPZ with 2-graph and no post-filter: 50% usage, 2 bit per key
  • Separate tables for larger prefixes
  • IPv4: 64bit per entry (32bit network, 32bit next-hop ID)
  • IPv6: from 64bit to 160bit
  • Fits into L3 cache even with 100,000 entries
  • Question: construction time
  • Dynamic branching: less than 2 lookups on average

Summary

  • Modern algorithms like CHM, BPZ and CHD are fast and practical
  • Useful for performance sensitive read-mostly data structures
  • Useful for compact on-disk databases
  • Available in NetBSD 6

Q&A

Questions?