2010: SCHISM: a Workload-driven Approach to Database Replication and Partitioning

2010: “Schism: a Workload-Driven Approach to Database Replication and Partitioning”, Carlo Curino, Yang Zhang, Evan Jones, Sam Madden, accepted for publication in the Proceedings of Very Large Data Bases (VLDB).

ABSTRACT:

We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner).
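
The first phase can be illustrated with a small Python sketch. It is not the Schism implementation: the toy workload, the function names, and the two hand-written placements are assumptions made for illustration, and no real graph partitioner is invoked. The sketch only builds the co-access graph described above and measures how many transactions a given placement turns into distributed ones.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Return {(u, v): weight}; weight counts transactions touching both tuples."""
    edges = defaultdict(int)
    for txn in transactions:
        # One edge per pair of tuples accessed by the same transaction.
        for u, v in combinations(sorted(set(txn)), 2):
            edges[(u, v)] += 1
    return dict(edges)


def count_distributed(transactions, assignment):
    """Count transactions whose tuples live on more than one partition."""
    return sum(1 for txn in transactions
               if len({assignment[t] for t in txn}) > 1)


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints fall in different partitions."""
    return sum(w for (u, v), w in edges.items()
               if assignment[u] != assignment[v])


# Hypothetical workload: each transaction is the set of tuple ids it accesses.
transactions = [{1, 2}, {1, 2}, {2, 3}, {4, 5}, {4, 5}, {5, 6}, {3, 4}]
edges = build_coaccess_graph(transactions)

# Two balanced 2-way placements of tuples 1..6 (three tuples per partition).
round_robin = {t: t % 2 for t in range(1, 7)}
workload_aware = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}

for name, placement in [("round robin", round_robin),
                        ("workload aware", workload_aware)]:
    print(name,
          "distributed:", count_distributed(transactions, placement),
          "cut weight:", cut_weight(edges, placement))
```

In this toy workload every transaction touches exactly two tuples, so the cut weight equals the number of distributed transactions. For transactions touching more tuples, the pairwise cut weight is only a proxy for that count.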

The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.

 

Contact me to receive a copy of the paper.

 

 

2008 VLDB: “Managing and querying transaction-time databases under schema evolution”

“Managing and querying transaction-time databases under schema evolution”, H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, and C. Zaniolo. Very Large Data Bases (VLDB), 2008.

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. This transparency is achieved by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) using an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
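
The idea of rewriting a query across schema versions can be sketched in Python. This is a deliberately simplified illustration, not PRIMA's XIC-based rewriting: it handles only two rename-style operators, models a query as a table name plus a column list, and all class names, table names, and column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameTable:
    old: str
    new: str


@dataclass(frozen=True)
class RenameColumn:
    table: str  # table name in the newer schema version
    old: str
    new: str


@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple


def rewrite_back(query, smo):
    """Translate a query on the newer schema into one on the older schema."""
    if isinstance(smo, RenameTable) and query.table == smo.new:
        return Query(smo.old, query.columns)
    if isinstance(smo, RenameColumn) and query.table == smo.table:
        cols = tuple(smo.old if c == smo.new else c for c in query.columns)
        return Query(query.table, cols)
    return query  # this operator does not affect the query


def rewrite_to_version(query, smos, target):
    """smos[i] maps version i to version i + 1; query is on the latest version."""
    for smo in reversed(smos[target:]):
        query = rewrite_back(query, smo)
    return query


# Hypothetical evolution history with three schema versions (v0, v1, v2).
smos = [
    RenameTable("cur", "page"),                   # v0 -> v1
    RenameColumn("page", "cur_title", "title"),   # v1 -> v2
]

current = Query("page", ("title", "id"))          # written against v2
for version in (2, 1, 0):
    print(version, rewrite_to_version(current, smos, version))
```

The user writes only the query against the current schema; the loop shows the equivalent query for each earlier version, obtained by undoing the operators one step at a time.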

 

For more information on this project visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima