RHIK: Resizable Hash-based Indexing for KV-SSD

June 19, 2023

data-systemskv-ssdindexinghash-table

Key-Value SSDs manage their own internal index to map keys to physical flash locations. The current state-of-the-art uses fixed-size hash tables, which creates two problems that limit practical adoption: performance degrades as the device fills up, and the device can only store a fraction of its nominal capacity in key-value pairs.

The Degradation Problem

Through systematic experiments on the open-source KV-SSD emulator, we observed that performance drops significantly as more data is stored on the device. Worse, the device hits its key-value pair limit well before its physical capacity is exhausted. The root cause is the fixed-size hash index — as it fills, collision chains grow longer, slowing every lookup. And the fixed table size imposes a hard cap on the number of entries regardless of available flash space.

Dynamic Resizing Without Rehashing

Traditional hash table resizing requires rehashing every existing entry into the new table — an operation that causes unacceptable latency spikes on a storage device that must maintain real-time I/O service. RHIK (Re-configurable Hash-based Indexing for KV-SSD) solves this with a design that enables dynamic resizing without full-table rehashing. The index grows incrementally as needed, maintaining consistent lookup performance regardless of occupancy.

Results

We implemented RHIK on the open-source KV-SSD emulator and evaluated it using both real workload traces and synthetic microbenchmarks. The results demonstrate that RHIK maintains high performance at high occupancy levels where the baseline design degrades, and supports storing significantly more key-value pairs on the same physical device.

This work is covered by granted U.S. Patent 11474699 (Oct 2022).

Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing). DOI: 10.1145/3588195.3595945