Leveraging Keys in Key-Value SSD for Production Workloads

June 19, 2023

data-systems, kv-ssd, indexing, key-hierarchy

Key-Value SSDs promise to simplify storage management by moving data indexing from the host into the device itself. But the key-to-page mapping in current KV-SSDs has a fundamental problem: because the NVMe KV namespace is sparsely populated, the index structures grow very large and cannot be optimized with the techniques used in traditional block SSDs.

The Sparse Namespace Problem

In a block SSD, logical block addresses (LBAs) are dense, consecutive integers, so the mapping table can be a compact array indexed directly by LBA. In a KV-SSD, keys are arbitrary byte strings drawn from a vast namespace, of which only a tiny fraction is ever used. The resulting index is sparse and large, consuming scarce controller memory and slowing lookups. Traditional flash translation layer (FTL) optimizations such as hybrid or block mapping don’t apply: they rely on consecutive logical addresses mapping to consecutive physical pages, and arbitrary keys have no such inherent spatial locality.
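
To make the contrast concrete, here is a minimal Python sketch of the two mapping styles under simplifying assumptions: a 4-byte physical page number per entry (PAGE_ADDR_BYTES), no hash-table or metadata overhead, and a deliberately naive KV index that stores every full key. All class and function names are hypothetical and do not come from any real FTL or KV-SSD firmware. Real devices may store key hashes rather than full keys, which shrinks entries but still does not produce a dense, directly indexable table.

```python
from typing import Optional

PAGE_ADDR_BYTES = 4  # assumption: one 4-byte physical page number (PPN) per entry


class DenseLbaPageMap:
    """Block-SSD style page map: a flat array indexed directly by LBA."""

    def __init__(self, num_lbas: int) -> None:
        self.table: list[Optional[int]] = [None] * num_lbas  # slot i -> PPN of LBA i

    def lookup(self, lba: int) -> Optional[int]:
        return self.table[lba]  # O(1); the LBA is the array index and is never stored

    def footprint_bytes(self) -> int:
        return len(self.table) * PAGE_ADDR_BYTES


def block_map_footprint_bytes(num_lbas: int, pages_per_block: int) -> int:
    # Block mapping keeps one entry per erase block and derives the page offset
    # as lba % pages_per_block. This only works because LBAs are consecutive integers.
    num_blocks = -(-num_lbas // pages_per_block)  # ceiling division
    return num_blocks * PAGE_ADDR_BYTES


class SparseKeyPageMap:
    """Naive KV-SSD style index: every entry must carry its own variable-length key."""

    def __init__(self) -> None:
        self.table: dict[bytes, int] = {}

    def insert(self, key: bytes, ppn: int) -> None:
        self.table[key] = ppn

    def lookup(self, key: bytes) -> Optional[int]:
        return self.table.get(key)

    def footprint_bytes(self) -> int:
        # Key bytes plus one PPN per entry; hash-table overhead is ignored.
        return sum(len(k) + PAGE_ADDR_BYTES for k in self.table)


dense = DenseLbaPageMap(1000)
print(dense.footprint_bytes())              # 4000
print(block_map_footprint_bytes(1000, 64))  # 64

sparse = SparseKeyPageMap()
for u in range(10):
    for i in range(100):
        # 1,000 keys of 20 bytes each, e.g. b"bucket/user0/obj0000"
        sparse.insert(f"bucket/user{u}/obj{i:04d}".encode(), ppn=u * 100 + i)
print(sparse.footprint_bytes())             # 24000
```

Under these assumptions, 1,000 keys of 20 bytes need six times the memory of a page map covering 1,000 LBAs, and the block map shrinks further because it stores one entry per 64-page block. There is no obvious equivalent of that coarsening for keys that are not consecutive integers.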

Keys Carry Information

The insight behind this work is that keys in real production workloads are not random — they carry meaningful structure. Applications encode hierarchy, grouping, and type information into key prefixes. Object storage systems use paths like bucket/user/object_id. Databases use composite keys that encode table, partition, and row information. This structure is invisible to current KV-SSD implementations, which treat keys as opaque byte strings.
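
The sketch below illustrates the kind of structure described above, assuming two hypothetical encodings: '/'-separated object paths, grouped by their first two components, and a fixed-width big-endian composite database key. The helper names and the 16-byte layout (4-byte table, 4-byte partition, 8-byte row) are illustrative assumptions, not a format used by any particular system. Real applications may use other separators or fixed-length prefixes.

```python
import struct
from collections import defaultdict


def path_prefix(key: bytes, sep: bytes = b"/", depth: int = 2) -> bytes:
    # Keep the first `depth` components: b"photos/alice/img001" -> b"photos/alice"
    return sep.join(key.split(sep)[:depth])


def group_by_prefix(keys, sep: bytes = b"/", depth: int = 2) -> dict:
    # Bucket keys by their hierarchical prefix, preserving arrival order within a group.
    groups = defaultdict(list)
    for key in keys:
        groups[path_prefix(key, sep, depth)].append(key)
    return dict(groups)


def composite_key(table_id: int, partition: int, row: int) -> bytes:
    # Hypothetical fixed-width layout: 4-byte table, 4-byte partition, 8-byte row id,
    # big-endian so that byte-wise key order matches numeric order.
    return struct.pack(">IIQ", table_id, partition, row)


def composite_prefix(key: bytes) -> bytes:
    return key[:8]  # table + partition identify the group; the row id is the suffix


object_keys = [b"photos/alice/img001", b"photos/alice/img002",
               b"photos/bob/img001", b"logs/svc1/0001"]
print({p: len(ks) for p, ks in group_by_prefix(object_keys).items()})
# {b'photos/alice': 2, b'photos/bob': 1, b'logs/svc1': 1}

k1, k2 = composite_key(7, 3, 42), composite_key(7, 3, 43)
print(composite_prefix(k1) == composite_prefix(k2))  # True: same table and partition
```

A device that treats keys as opaque byte strings sees none of these groups; recovering them requires knowing, or detecting, how the application encodes its keys.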

Exploiting Key Hierarchy

We designed a scheme that exploits the information keys carry about application keyspaces and groups, which is typically encoded in key prefixes, to improve I/O request handling performance. By recognizing key structure, the device can make better placement and indexing decisions, grouping related key-value pairs so they can be accessed more efficiently.
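
One way to picture prefix-aware handling is a two-level index: a small table maps each group prefix to a per-group index, and each group appends its pairs into its own open flash page. The following is a minimal sketch of that general idea under stated assumptions, not the scheme from the paper: the grouping rule (first two '/'-separated components), PAGE_CAPACITY, PAGE_ADDR_BYTES, and every name are hypothetical, and garbage collection of stale pages is ignored.

```python
PAGE_ADDR_BYTES = 4  # assumption: 4-byte physical page number (PPN)
PAGE_CAPACITY = 4    # assumption: KV pairs per flash page (tiny, for illustration)


def split_key(key: bytes, sep: bytes = b"/", depth: int = 2) -> tuple[bytes, bytes]:
    # Hypothetical grouping rule: the first `depth` components form the group prefix.
    parts = key.split(sep)
    return sep.join(parts[:depth]), sep.join(parts[depth:])


class PrefixGroupedIndex:
    """Hypothetical two-level index: prefix -> group, suffix -> PPN within the group."""

    def __init__(self) -> None:
        self.groups: dict[bytes, dict[bytes, int]] = {}     # prefix -> {suffix: ppn}
        self.open_page: dict[bytes, tuple[int, int]] = {}   # prefix -> (ppn, used slots)
        self.next_ppn = 0

    def _allocate_slot(self, prefix: bytes) -> int:
        # Each group appends into its own open page, so pairs that share a prefix
        # land on the same physical pages even when their writes arrive interleaved.
        ppn, used = self.open_page.get(prefix, (-1, PAGE_CAPACITY))
        if used == PAGE_CAPACITY:  # no open page yet, or the current one is full
            ppn, used = self.next_ppn, 0
            self.next_ppn += 1
        self.open_page[prefix] = (ppn, used + 1)
        return ppn

    def put(self, key: bytes) -> int:
        prefix, suffix = split_key(key)
        ppn = self._allocate_slot(prefix)
        # Out-of-place update: a rewrite gets a new slot; the old one becomes stale.
        self.groups.setdefault(prefix, {})[suffix] = ppn
        return ppn

    def get(self, key: bytes):
        prefix, suffix = split_key(key)
        return self.groups.get(prefix, {}).get(suffix)

    def scan_group(self, prefix: bytes) -> list[int]:
        # Pages that hold a group's pairs; clustered placement keeps this list short.
        return sorted(set(self.groups.get(prefix, {}).values()))

    def footprint_bytes(self) -> int:
        # Each prefix is stored once per group; per-key entries store only the suffix.
        return sum(len(p) + sum(len(s) + PAGE_ADDR_BYTES for s in g)
                   for p, g in self.groups.items())


idx = PrefixGroupedIndex()
keys = []
for i in range(8):  # two groups whose writes arrive interleaved
    for user in ("alice", "bob"):
        key = f"photos/{user}/img{i:03d}".encode()
        keys.append(key)
        idx.put(key)

print(idx.scan_group(b"photos/alice"))  # [0, 2]
print(idx.scan_group(b"photos/bob"))    # [1, 3]
print(idx.get(b"photos/alice/img005"))  # 2
print(idx.footprint_bytes())            # 182
print(sum(len(k) + PAGE_ADDR_BYTES for k in keys))  # 352 (full key per entry)
```

In this toy run the two groups' writes arrive interleaved, yet each group ends up on two dedicated pages; placing pairs purely in arrival order would spread both groups across all four pages. Storing each prefix once also cuts the modeled index footprint from 352 to 182 bytes. Actual gains depend on the workload and on how the device detects and stores prefixes.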

Impact

This work demonstrates that breaking the abstraction barrier between applications and storage, specifically by making the storage device aware of key semantics, can yield meaningful performance improvements for KV and object storage devices serving production workloads.

Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing). DOI: 10.1145/3588195.3595949