2010: Schism: a Workload-Driven Approach to Database Replication and Partitioning

2010: “Schism: a Workload-Driven Approach to Database Replication and Partitioning”, Carlo Curino, Yang Zhang, Evan Jones, Sam Madden, accepted for publication in the Proceedings of Very Large Data Bases (VLDB)


We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner).
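
The first phase can be illustrated with a small Python sketch. It is not the Schism implementation: the function names, the toy workload, and the exhaustive two-way search are all assumptions made for illustration, and the search merely stands in for a real graph partitioner on a graph small enough to enumerate. The sketch builds the co-access graph from a transaction trace, finds a balanced two-way split with minimum cut weight, and counts how many transactions become distributed under a given placement.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Map each tuple pair (u, v), u < v, to the number of transactions touching both."""
    edges = defaultdict(int)
    for txn in transactions:
        for u, v in combinations(sorted(set(txn)), 2):
            edges[(u, v)] += 1
    return dict(edges)


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints are placed on different partitions."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def brute_force_bipartition(nodes, edges):
    """Hypothetical stand-in for a graph partitioner: try every balanced
    two-way split and keep the one with the smallest cut. Only usable on
    tiny graphs, since the number of splits grows exponentially."""
    nodes = sorted(nodes)
    candidates = []
    for left in combinations(nodes, len(nodes) // 2):
        left_set = set(left)
        candidates.append({n: 0 if n in left_set else 1 for n in nodes})
    return min(candidates, key=lambda a: cut_weight(edges, a))


def distributed_transactions(transactions, assignment):
    """Count transactions whose tuples span more than one partition."""
    return sum(1 for txn in transactions if len({assignment[t] for t in txn}) > 1)


# Toy workload (assumed): each transaction is the set of tuple ids it accesses.
workload = [(1, 2), (1, 2), (3, 4), (2, 3)]
graph = build_coaccess_graph(workload)

hash_placement = {t: t % 2 for t in (1, 2, 3, 4)}    # simple hash partitioning
graph_placement = brute_force_bipartition({1, 2, 3, 4}, graph)

print(distributed_transactions(workload, hash_placement))   # 4
print(distributed_transactions(workload, graph_placement))  # 1
```

On this toy trace, hashing on the tuple id separates every pair of co-accessed tuples, so all four transactions are distributed, while the minimum-cut split keeps tuples 1 and 2 together and 3 and 4 together, leaving only one distributed transaction. Replication and the explanation phase are not covered by this sketch.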

The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E) to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schemas contain multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.


Contact me to receive a copy of the paper.