OLTP-Bench at VLDB 2014

The work behind this paper started long ago (2009?), when I was working with Evan Jones and Yang Zhang on some infrastructure to test our DBaaS project (http://relationalcloud.com), and continued with Andy Pavlo and me cursing about how painful it is to build a reasonable testing infrastructure (it is like a tax on DB/system PhD students).

We decided to make it easier for future generations, and started to combine Andy’s workloads and mine, and the strengths of both infrastructures (at the time a beautiful mess of hacky code and half-fast scripts). Djellel Difallah and Phil Cudre-Maroux joined the effort (and arguably Djellel has put in more hours than anyone else on this since then). We polished the infrastructure and added several more workloads, with help and input from many people including Rusty Sears, Ippokratis Pandis, Barzan Mozafari, Dimitri Vorona, Sam Madden, and Mark Callaghan.

The goal was to produce enough critical mass of features and workloads that other researchers would prefer to pick up this infrastructure and contribute to it, rather than building from scratch. This seems to be working, as we have received many requests and contributions from companies and academics all around the world. Andy Pavlo is now heading a revamp of the website, including much-needed graphing and comparison interfaces.

Hopefully our community can rally behind this effort and drive it in whichever direction seems appropriate (we are open to extensions and changes, even drastic ones), reducing repeated work and fostering better repeatability and ease of comparison among “scientific” results in papers.

Check out the paper here:


Our website at:


And get the code from GitHub:



2008: Schema Evolution in Wikipedia: toward a Web Information System Benchmark

Evolving the database that is at the core of an Information System represents a difficult maintenance problem 
that has only been studied in the framework of traditional information systems. However, the problem is likely 
to be even more severe in web information systems, where open-source software is often developed through 
the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an
in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known
example of a large family of web information systems built using the open-source MediaWiki software. Our 
study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation 
for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed 
us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of 
growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis 
suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, 
we briefly discuss documentation and automation support systems for database evolution, and suggest that the 
Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems. 
To appear at ICEIS 2008.
Here you can find a copy of the paper.