2012: Lookup Tables: Fine-Grained Partitioning for Distributed Databases

The standard way to get linear scaling in a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this partitioning will result in each query being executed at just one node, to avoid the overheads of distributed transactions and allow nodes to be added without increasing the amount of required coordination. For some applications, simple strategies, such as hashing on primary key, provide this property. Unfortunately, for many applications, including social networking and order-fulfillment, many-to-many relationships cause simple strategies to result in a large fraction of distributed queries. Instead, what is needed is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located together in the same partition. Maintaining such a fine-grained partitioning requires the database to store a large amount of metadata about which partition each tuple resides in. We call such metadata a lookup table, and present the design of a data distribution layer that efficiently stores these tables and maintains them in the presence of inserts, deletes, and updates. We show that such tables can provide scalability for several difficult to partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better performance on these workloads than either simple range or hash partitioning and shows greater potential for further scale-out.

@inproceedings{tatarowicz2012lookup,   author    = {Aubrey Tatarowicz and                Carlo Curino and                Evan P. C. Jones and                Sam Madden},   title     = {Lookup Tables: Fine-Grained Partitioning for Distributed                Databases},   booktitle = {ICDE},   year      = {2012},   pages     = {102-113},   ee        = {http://doi.ieeecomputersociety.org/10.1109/ICDE.2012.26}, }

Leave a Reply Cancel reply