OLTPBenchmark

Together with Andy Pavlo, Djellel Difallah, and Phil Cudré-Mauroux, I just completed an interesting project: an extensible testbed for benchmarking relational databases.

This was motivated by the huge amount of time each one of us has wasted in our research building workloads (gathering real/realistic data and query loads, implementing driving infrastructures, collecting statistics, plotting statistics).

To reduce this wasted effort for others and to promote repeatability and ease of comparison of scientific results, we pulled together our independent efforts and created an extensible driving infrastructure for OLTP/Web workloads with the following capabilities:

  • Targets all major relational DBMSs via JDBC (tested on MySQL, PostgreSQL, Oracle, SQL Server, DB2, HSQLDB)
  • Precise rate control (define and change over time the rate at which requests are submitted; see the sketch after the workload list)
  • Precise transaction mixture control (define and change over time the percentage of each transaction type)
  • Access distribution control (emulate evolving hot spots, temporal skew, etc.)
  • Trace-based execution support (ideal for replaying real data)
  • Extensible design
  • Elegant management of SQL dialect translations (to target the various DBMSs)
  • Stored-procedure-friendly architecture
  • Ten included workloads:
    • AuctionMark
    • Epinions
    • JPAB
    • Resource Stresser
    • SEATS
    • TATP
    • TPCC
    • Twitter
    • Wikipedia
    • YCSB
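
To make the rate and mixture control concrete, here is a minimal sketch of the idea: a loop that submits JDBC transactions at a fixed target rate with a weighted mix of transaction types. This is not OLTPBenchmark’s actual API; the connection URL, credentials, weights, and SQL below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Random;

    public class RateControlledDriver {
        public static void main(String[] args) throws Exception {
            double targetTps = 100.0;            // target requests per second
            double[] mix = {0.6, 0.3, 0.1};      // weights of three transaction types
            long intervalNanos = (long) (1_000_000_000L / targetTps);
            Random rnd = new Random();

            // placeholder connection string and credentials
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/bench", "user", "pass")) {
                conn.setAutoCommit(false);
                long next = System.nanoTime();
                for (int n = 0; n < 10_000; n++) {
                    // rate control: wait until the next scheduled submission time
                    long sleep = next - System.nanoTime();
                    if (sleep > 0) {
                        Thread.sleep(sleep / 1_000_000L, (int) (sleep % 1_000_000L));
                    }
                    next += intervalNanos;

                    // mixture control: pick a transaction type according to the weights
                    double r = rnd.nextDouble();
                    double acc = 0.0;
                    int type = mix.length - 1;
                    for (int i = 0; i < mix.length; i++) {
                        acc += mix[i];
                        if (r <= acc) { type = i; break; }
                    }
                    runTransaction(conn, type);
                }
            }
        }

        // placeholder: run the SQL of the chosen transaction type and commit
        static void runTransaction(Connection conn, int type) throws Exception {
            try (Statement st = conn.createStatement()) {
                st.execute("SELECT 1 /* transaction type " + type + " */");
            }
            conn.commit();
        }
    }

Rate and mixture are exactly the knobs the infrastructure lets you define and change over time; the sketch only shows the mechanics behind them.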

If you are interested in using our testbed or in contributing to it, visit: http://oltpbenchmark.com/

A Relational rant about HBase/Pig/Hadoop

Disclaimer: my background is heavily skewed in favor of relational databases, and I am fairly new to the entire Hadoop/Pig/HBase world.

For a project I am involved in, I am in the process of setting up a 40-node Hadoop/Pig/HBase cluster. I am learning the tools as I go, and I feel like sharing the pains and excitements of this phase.

While I am 100% sure that someone with more experience in this field would have avoided many of the dumb errors I made, I represent an ideal example of how easy/hard it is to pick up Hadoop/Pig/HBase: zero background in Hadoop/HBase/Pig, strong background in related fields (DB and Java programming).

I am running HBase 0.90.3 on Hadoop 0.20.3 with Pig 0.9.1.

RANT 1: No documentation

My first rant is about the painful lack of documentation, examples, and tutorials. I was expecting to join a “modern” project run by a hip community, with amazing tools and great documentation. Instead, the code is mostly bare-bones, there are only a few partial examples around, and most of the knowledge is trapped in mailing lists and bug-tracking software. Not a very pleasant experience, especially because the project evolves quickly enough that most of what you find is already obsolete.

RANT 2: The classpath, what a nightmare

One of the most horrible things to deal with is, surprise surprise, the classpath! With Pig and Hadoop the actual active classpath is born as a funky combination of many variables, register statements in the Pig scripts, and the scanning of directories. This would not be a problem in itself; what made my life miserable was the following: you load the wrong .jar file (e.g., a slightly different version of Hadoop or HBase or Pig or Guava or Commons, etc.) and the error that comes out is not something reasonable like “wrong class version” but rather various bizarre things like “IndexOutOfBoundsException”, “EOFException”, and many others… This is annoying and rather hard to debug (if you don’t expect it), especially when it is some obscure Maven script sneaking the wrong jar into a directory you don’t even know exists, much less suspect that every jar in that directory is part of the classpath. Another interesting thing I observed is that Pig 0.9.1’s “register” does not always work: sometimes you have to put the jar both in the register statement and in the -cp you pass to Pig for it to work. Oh joy…

At least in my experience, having a more systematic way to load and check the jars in the classpath would save lots of time, especially when you first start (I am now very careful to always check the classpath any time there is an error).
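
For what it is worth, here is the kind of quick sanity check I mean: a tiny Java snippet that prints which jar each class was actually loaded from. The class names below are just examples of the usual suspects; adjust them to the libraries you are worried about.

    public class ClasspathCheck {
        public static void main(String[] args) throws Exception {
            // classes whose origin I want to verify; adjust to your own suspects
            String[] suspects = {
                "org.apache.hadoop.conf.Configuration",
                "org.apache.hadoop.hbase.client.HTable",
                "org.apache.pig.PigServer"
            };
            for (String name : suspects) {
                Class<?> c = Class.forName(name);
                java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
                System.out.println(name + " -> "
                        + (src == null ? "bootstrap/unknown" : src.getLocation()));
            }
        }
    }

Run it with the same classpath you hand to Hadoop/Pig, and you immediately see whether the jar you think you are using is the one actually being picked up.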

RANT 3: Poor error propagation

I run in a very privileged setting: I am root on every box in the cluster, and I debugged many of my errors by looking at the logs of ZooKeeper, the JobTracker, Pig, the HBase master node, and so on. But the propagation of errors seems rather bad; in a normal “shared grid” environment, where you don’t necessarily have access to all these logs, I would have had an even harder time debugging my code (e.g., the Pig script completes successfully, and the error is lost in the JobTracker log).

RANT 4: HBase is (not) easy to configure/run

I am doing the installation with the precious help of a skilled sysadmin, so I was expecting everything to run rather smoothly. After all, HBase gives up almost everything I love in life (transactions, secondary indexes, joins) in order to be super-scalable, robust, and zero-admin. Well, not really… Getting HBase to work decently at scale seems to require a rather manual process of pampering and convincing the thing that “it’s ok” to go fast.

I am loading data into HBase via Pig’s HBaseStorage and/or importtsv (after spending enough time to find the combination of versions that work properly together), and I was expecting the thing to scale linearly and trivially (and to load at some 10k rows per box per second). On my first attempt the Pig script was failing because of too many waits on a region, or something along those lines. Even after I pre-split my table across many regions and generated keys as MD5(id)+id, there is a huge amount of skew (50% of the requests hit a single node), but after some more parameter tuning the thing at least completes the loading. I will do more performance debugging in the future (I plan to generate HFiles directly and bulk-load them).
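
For reference, here is a rough sketch of the kind of pre-splitting and MD5-prefixed keys I am describing. The table name, column family, and split points are placeholders, and the calls reflect my understanding of the 0.90-era client API; treat it as an illustration rather than the exact code I run.

    import java.security.MessageDigest;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // placeholder table with a single column family
            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("d"));

            // 16 regions, split on the first hex digit of the MD5 prefix
            String hex = "123456789abcdef";
            byte[][] splits = new byte[hex.length()][];
            for (int i = 0; i < hex.length(); i++) {
                splits[i] = Bytes.toBytes(hex.substring(i, i + 1));
            }
            admin.createTable(desc, splits);

            // example: what a salted row key looks like
            System.out.println(Bytes.toString(saltedKey("12345")));
        }

        // row key = MD5(id) + id, so keys no longer cluster by insertion order
        static byte[] saltedKey(String id) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md5.digest(Bytes.toBytes(id))) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return Bytes.toBytes(sb.toString() + id);
        }
    }

Even with keys salted this way I still see the skew mentioned above, so the key scheme alone clearly does not explain everything.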

Bottom line: I am OK with giving up all the fancy relational transactional guarantees for a worry-free, super-scalable, zero-admin, self-balancing something… I am just not sure HBase is quite there yet, and I feel there is a lot of hype associated with it.

Altogether I am a bit underwhelmed by HBase; I haven’t seen anything I could not do about 5X faster with a set of MySQL boxes… OK, I know MySQL performance tuning much better than anything HBase… but I will work on tuning the performance more and at some point I will run some comparisons. I am still skeptical.

PROS:

Now that I have released some steam with my rants, let me say a few words about what’s nice… What is very nice for me is Pig, because it allows me to write very few lines of code that automagically get distributed and run like crazy. Let me reformulate: it is Pig+Hadoop… fine, it is Pig+Hadoop+HBase. I have to admit that whenever you get the entire software stack to play nice, it is very exciting to be able to hack up some code in a few minutes and have it parallelized across the entire cluster (and even more exciting on our main grid, where you can spawn thousands of mappers in parallel).

Altogether, I am probably spoiled by having worked in the relational world for a long time, where the systems are 30+ years old and thus many of these small issues have been solved (while plenty of worse problems are lurking in the darkness of an unpredictable optimizer or a hard-to-scale data model, etc.). But this brings me to the moral of this post:

MORAL

My word of advice to people who plan to start using HBase/Pig/Hadoop: by all means get into it, it is an exciting world, but be ready to deal with a hackish, unpolished, untamed, kicking-and-screaming system… be ready to open the Java source code of the system to figure out why things are not working, or how to use an API… be ready for interfaces that are as stable as butterflies… If you are ready for that, you will not be disappointed.

 

Schema Evolution: Datasets…

With the help of various students of Prof. Carlo Zaniolo at UCLA, I have gathered a long list of datasets for testing schema evolution systems. You can find a brief summary of each system and links to its schemas at the following link: Schema Evolution Benchmark Datasets

Enjoy! And if you use this in a paper, don’t forget to acknowledge/cite us…


Database in the Cloud

The advent of cloud computing and hosted software-as-a-service is creating a novel market for data management. Cloud-based DB services are starting to appear, and they have the potential to attract customers from very diverse sectors of the market, from small businesses aiming at reducing the total cost of ownership (perfectly suited for multi-tenancy solutions) to very large enterprises (seeking high-profile solutions spanning potentially thousands of machines and capable of absorbing unexpected bursts of traffic). At the same time, the database community is producing new architectures and systems that tackle various classes of typical DB workloads in novel ways. These novel, dedicated approaches often outshine traditional DBMSs trying to deliver an unreachable “one-size-fits-all” dream. This increase in the number of available data management products exacerbates the problem of selecting, deploying, and tuning in-house solutions for data management.

In this context we are building a cloud-based database architecture enabling the implementation of DB-as-a-service. We envision a database provider that (i) provides the illusion of infinite resources by continuously meeting user expectations under evolving workloads, and (ii) minimizes the operational costs associated with each user by amortizing administrative costs across users and by developing software techniques to automate the management of many databases running inside a single data center. Thus, the traditional provisioning problem (what resources to allocate for a single database) becomes an optimization problem, where a large user base, multiple DBMS engines, and a very large data center provide an unprecedented opportunity to exploit economies of scale, smart load balancing, higher power efficiency, and principled overselling.

The “relational cloud” infrastructure has several strong points for database users: (i) predictable and lower costs, proportional to the quality of service and actual workloads, (ii) reduced technical complexity, thanks to a unified access interface and the delegation of DB tuning and administration, and (iii) elasticity and scalability, providing the perception of virtually infinite resources ready at hand. To achieve this, we are working on harnessing many recent technological advances in data management by efficiently exploiting multiple DBMS engines (targeting different types of workloads) in a self-balancing solution that optimizes the assignment of the resources of very large data centers to serve potentially thousands of users with very diverse needs. Among the critical research challenges to be faced we have identified: (i) automatic database partitioning, (ii) multi-DB, multi-engine workload balancing, (iii) scalable multi-tenancy strategies, (iv) high-profile distributed transaction support, and (v) automated replication and backup. These and many others are the key ingredients of the RelationalCloud (http://relationalcloud.com) prototype currently under development.

Data Integration

The current business reality induces frequent acquisitions and mergers of companies and organizations, and ever tighter interaction between the information systems of cooperating companies. We present our contributions to Context-ADDICT, a framework supporting on-the-fly (semi-)automatic data integration, context modeling, and context-aware data filtering. Data integration is achieved in Context-ADDICT by means of automatic ontology-based data source wrapping and integration. While this approach rests on a solid theoretical foundation, it also provides heuristics to solve the practical aspects of data integration in a dynamic context.

More details on this research effort can be found at: http://kid.dei.polimi.it/

 

Java for HW reconfigurable architectures

Another research topic I have worked on is the exploitation of the Java language as a hardware description language for IP-core definition, in the context of the DRESD project. The main advantage is leveraging developers’ familiarity with an existing language and exploiting existing IDEs for Java, while maintaining the performance typical of the Caronte architecture developed in the DRESD project.

More details can be found at: http://www.dresd.org/

Wireless Sensor Networks: the TinyLIME experience!

In the past I spent some time working on wireless sensor networks. The most complete result of this research has been the design and development of TinyLIME, in collaboration with Gian Pietro Picco and Amy Murphy, the original designers and developers of LIME.

TinyLIME is a middleware for wireless sensor networks (WSN) that departs from the traditional WSN setting where sensor data is collected by a central monitoring station, and enables instead multiple mobile monitoring stations to access the sensors in their proximity and share the collected data through wireless links. This intrinsically context-aware setting is demanded by applications where the sensors are sparse and possibly isolated, and where on-site, location-dependent data collection is required. An extension of LIME, TinyLIME makes sensor data available through a tuple space interface, providing the illusion of shared memory between applications and sensors. Data aggregation capabilities and a power-savvy architecture complete the middleware features.

 Please refer to the official webpage for more details and to download the system. 

Schema Evolution

Under the guidance of Professor Carlo Zaniolo I am currently working on schema evolution and temporal databases. Every information system (IS) is subject to an uninterrupted evolution process aimed at adapting the system to changing requirements. One of the most critical portions of an IS to evolve is its data management core.

Often based on relational database technologies, the data management core of a system needs to evolve whenever the revision process requires modifications to the data or to the way they are stored or maintained, e.g., to increase performance. Given its fundamental role, the evolution of the database underlying an IS has a very strong impact on all the applications accessing the data, and support for a graceful evolution is of paramount importance nowadays. The complexity of database and software maintenance clearly grows with the size and complexity of the system.

Furthermore, when moving from intra-company systems, typically managed by rather small and stable teams of developers/administrators, to collaboratively developed and maintained public systems, well-behaved evolution becomes indispensable. In this web-scale scenario, due to the collaborative nature and fast rate of growth of such systems, the forces driving a system to change become “wilder”, while stability, shared agreement, and evolution documentation become crucial. This part of my research is thus devoted to developing methodologies and tools that support seamless database evolution by means of query rewriting and migration support.

Within this context we developed two systems: PRISM, a system to support schema evolution in relational snapshot databases, and PRIMA, a system to support data archival and querying under schema evolution.

 

VSDB: Very Small Data Base

 

This was one of my first topics of interest. For my Bachelor’s thesis I developed, under the guidance of Cristiana Bolchini and Letizia Tanca, the PoLiDBMS tool discussed in the following.

The use of handheld devices, such as smart cards, personal digital assistants (PDAs), palm PCs, and cell phones, to store data locally and to issue transactions against both local and remote data from information systems has been widely discussed in recent times.

The features required by portable devices in order to manage data are, in some respects, similar to those found in embedded database systems, and range from very simple file system functions to a full set of database management capabilities, including some ACID transaction properties. Databases for very small devices – henceforth called Very Small Data Bases (VSDBs) – are useful in various circumstances:

  • The personal (micro-)information system, the so-called citizen’s card, which records administrative personal data such as driver’s license and car information, passport, judicial record, etc.;
  • The personal medical record, reporting the owner’s clinical history complete with all past clinical tests and diagnoses; this is most useful for patients suffering from some form of physical handicap or needing critical treatment such as periodic dialysis;
  • The traveling salesman database, i.e., the “client portfolio”, storing the visit schedule and purchase orders along with relevant information about each client’s particular needs;
  • The personal travel database, recording all the travel (e.g., touristic) information considered interesting by the device owner.

 

In this scenario a twofold research project began a couple of years ago, tackling the problems of a) efficiently managing data stored locally on devices with limited resources – the VSDB Database Management System – and b) designing and selecting the portion of data to be held on the device – the VSDB Design Methodology.

The two aspects of the problem are closely related, and the research first investigated the opportunity of defining new physical and logical data structures that exploit the technological characteristics of the digital mobile devices hosting the DBMS and the data. On top of such ad-hoc data structures, a Portable Light DBMS has been designed and a prototype is currently available (code name: PoLiDBMS) to process queries and manage the stored data.

On the other hand, we also worked on the design methodology to define the DB to be stored locally on the device and be readily available to the user.

More specifically, the methodology we propose for Very Small Data Base design is based on the classical three levels of the ANSI-SPARC model, and shares many issues with methodologies for distributed/federated database design. However, it introduces three main differences with respect to traditional design methodologies: first, since most interesting microdevices are portable, the main mobility issues must be considered along with data distribution; second, context awareness is included among the data design issues to allow a full exploitation of context-sensitive application functionality; third, the peculiarities of the storage device(s) must be taken into account from the early steps, so a logistic phase is added after the usual conceptual and logical phases, which supports the designer in the physical design task by taking into account the logistic aspects of data storage.

By examining these three aspects together we delineate the “VSDB ambient”, i.e., the set of personal and environmental characteristics determining the portion of data that must be stored on the portable device.

Context-ADDICT

One of my main research topics is context-aware data tailoring. The goal of this research is to exploit contextual information, properly captured by a model we are developing, to filter information coming from heterogeneous data sources. The result will be a common semantic view over the relevant portion of the available data. The system we are developing is named Context-ADDICT (Context-Aware Data Design, Integration, Customization and Tailoring); more details can be found here.

A presentation of Context-ADDICT is available here.