2012 VLDB Journal: Automating the Database Schema Evolution Process

Abstract Supporting database schema evolution represents a long-standing challenge of practical and theoretical importance for modern information systems. In this paper we describe techniques and systems for automating the critical tasks of migrating the database and rewriting the legacy applications. Beyond the labor savings, these advances deliver many benefits, including reliable prediction of outcome, minimization of downtime, system-produced documentation, and support for archiving, historical queries, and provenance. The PRISM/PRISM++ system delivers these benefits by solving the difficult problem of automating the migration of databases and the rewriting of queries and updates. In this paper we present the PRISM/PRISM++ system and the novel technology that made it possible. In particular, we focus on the difficult and previously unsolved problem of supporting legacy queries and updates under schema and integrity-constraint evolution.

The PRISM/PRISM++ approach consists of providing the users with a set of SQL-based Schema Modification Operators (SMOs), which describe how the tables in the old schema are modified into those in the new schema. In order to support updates, SMOs are extended with Integrity Constraints Modification Operators (ICMOs). By using recent results on schema mapping, the paper (i) characterizes the impact on integrity constraints of structural schema changes, (ii) devises representations that enable the rewriting of updates, and (iii) develops a unified approach for query and update rewriting under constraints.
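To give a flavor of the operator language (an illustrative sketch only: the syntax is approximated from the operator names above, and the table, column, and constraint names are made up):

-- SMO: a structural change, splitting one table into two on the shared key
DECOMPOSE TABLE engineer INTO personal(id, name), job(id, title, salary);

-- ICMO: a constraint change that the query/update rewriting must respect
ALTER TABLE personal ADD PRIMARY KEY pk_personal(id);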

We complement the system with two novel tools: the first automatically collects and provides statistics on schema evolution histories, whereas the second derives equivalent sequences of SMOs from the migration scripts that were used for schema upgrades. These tools were used to produce an extensive testbed containing 15 evolution histories of scientific databases and web information systems, providing over 100 years of aggregate evolution histories and almost 2000 schema evolution steps.

 

To appear in VLDB Journal “Best of VLDB 2011”


2012: Keynote at CloudDB “Benchmarking OLTP/Web Databases in the Cloud: the OLTP-Bench Framework”

Benchmarking is a key activity in building and tuning data management systems, but the lack of reference workloads and a common platform makes it a time-consuming and painful task. The need for such a tool is heightened by the advent of cloud computing, with its pay-per-use cost models, shared multi-tenant infrastructures, and lack of control over system configuration. Benchmarking is the only avenue for users to validate the quality of service they receive and to optimize their deployments for performance and resource utilization.

In this talk, we present our experience in building several ad-hoc benchmarking infrastructures for various research projects targeting several OLTP DBMSs, ranging from traditional relational databases to main-memory distributed systems and cloud-based scalable architectures. We also discuss our struggle to build meaningful micro-benchmarks and to gather workloads representative of real-world applications to stress-test our systems. This experience motivates the OLTP-Bench project, a “batteries-included” benchmarking infrastructure designed for and tested on several relational DBMSs and cloud-based database-as-a-service (DBaaS) offerings. OLTP-Bench can control the transaction rate, mixture, and workload skew dynamically during the execution of an experiment, allowing the user to simulate a multitude of practical scenarios that are typically hard to test (e.g., time-evolving access skew). Moreover, the infrastructure provides an easy way to monitor performance and resource consumption of the database under test. We also introduce the ten included workloads, derived from synthetic micro-benchmarks, popular benchmarks, and real-world applications, and show how they can be used to investigate various performance and resource-consumption characteristics of a data management system. We showcase the effectiveness of our benchmarking infrastructure and the usefulness of the selected workloads by reporting sample results from hundreds of side-by-side comparisons on popular DBMSs and DBaaS offerings.
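As a taste of the kind of control the framework exposes, here is a sketch in the spirit of OLTP-Bench’s XML workload configuration (element names and values are indicative only; see the project site below for the actual format). It describes two consecutive phases that change the offered rate and the transaction mixture mid-run:

<works>
  <work>
    <time>300</time>                <!-- phase 1: five minutes... -->
    <rate>1000</rate>               <!-- ...at 1,000 requests/sec... -->
    <weights>45,43,4,4,4</weights>  <!-- ...with a TPC-C-like mixture -->
  </work>
  <work>
    <time>300</time>                <!-- phase 2: same duration... -->
    <rate>5000</rate>               <!-- ...at five times the rate... -->
    <weights>80,10,4,3,3</weights>  <!-- ...with the mixture skewed toward one transaction type -->
  </work>
</works>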

 

More details at: http://oltpbenchmark.com

Compile Hadoop Trunk on Mac with Java 7

(UPDATED)

I saw my Java 6 disappear… and I don’t even want to figure out whom to blame or rant at, but I need to compile Hadoop (trunk as of early November) on my MacBook running 10.7.5 (latest updates at the time of this post) with Oracle Java 1.7.0_08.

This is, briefly, what worked for me… but as a disclaimer, this is a hackish quick fix to get the thing to compile, not a clean mvn marvel.

There are two parts to this:

1) Fix the pom.xml in hadoop-common-project/hadoop-annotations by adding the following dependency.
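(A sketch of the addition, reconstructed from the final profile quoted at the end of this post; the systemPath assumes Oracle JDK 1.7.0_08, so adjust it to your install.)

<!-- in hadoop-common-project/hadoop-annotations/pom.xml -->
<dependency>
  <groupId>jdk.tools</groupId>
  <artifactId>jdk.tools</artifactId>
  <scope>system</scope>
  <systemPath>/Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/lib/tools.jar</systemPath>
  <version>1.7</version>
</dependency>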

 

This fixes the compile errors coming from hadoop-common-project. (Apologies for the image in the original post, but Joomla was kicking and screaming about the XML tags.)

 

2) (ugly) Fix the compile errors for hadoop-hdfs

sudo mkdir /Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/Classes
cd !$
sudo ln -s ../lib/tools.jar classes.jar

This solves the problem of ant/mvn/make/whoever looking for tools.jar in the wrong place, by linking the wrong place to the right place. Now, this is a disgusting trick, but after fussing around with mvn (which I don’t know well at all) for a bit, I decided to be pragmatic.
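To sanity-check the hack (an illustrative check only, using the same JDK path as above):

# the new symlink should resolve to the real tools.jar
ls -l /Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/Classes/classes.jar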

A potentially cleaner solution, suggested by Markus Weimer, is the following:

I haven’t tested it yet, but here it is.

 

The final version of this consists of using the following profile:


<profile>
  <id>macdep</id>
  <activation>
    <!-- activate on any Mac: matching on <os><name> as well would require
         os.name to equal that string exactly, so the family test alone is used -->
    <os>
      <family>mac</family>
    </os>
  </activation>
  <dependencies>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <scope>system</scope>
      <systemPath>/Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/lib/tools.jar</systemPath>
      <version>1.7</version>
    </dependency>
  </dependencies>
</profile>

This goes in the <profiles> section of the main pom.xml, and likewise in the hadoop-annotations pom.xml.
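Since the profile activates automatically on the mac OS family, a regular build should then pick up tools.jar; for example, the usual invocation (nothing specific to this fix):

mvn clean install -DskipTests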