November 2009 – Carlo Curino

The advent of cloud computing and hosted software as a service is creating a novel market for data management. Cloud-based DB services are starting to appear, and have the potential to attract customers from very diverse sectors of the market, from small businesses aiming at reducing the total cost of ownership (perfectly suited for multi-tenancy solutions), to very large enterprises (seeking high-profile solutions spanning on potentially thousands of machines and capable of absorbing unexpected burst of traffic). At the same time, the database community is producing new architectures and systems tackling in novel ways various classes of typical DB workloads. These novel, dedicated approaches often outshine the traditional DBMS trying to provide an unreachable “one-size-fit-all” dream. Such increase in the number of available data management products is exacerbating the problem of selecting, deploying and tuning in-house solutions for data management.

In this context we are building a cloud-based database architecture, enabling the implementation of DB-as-a-service. We envision a database provider that (i) provides the illusion of infinite resources, by continuously meeting user expectations under evolving workloads, and (ii) minimizes the operational costs associated to each user by amortizing administrative costs across users and developing software techniques to automate the management of many databases running inside of a single data center. Thus, the traditional provisioning problem (what resources to allocate for a single database) becomes an optimization issue, where a large user base, multiple DBMS engines, and a very large data center provide an unprecedented opportunity to exploit economy of scale, smart load balancing, higher power efficiency, and principled overselling.

The “relational cloud” infrastructure has several strong points for database users: (i) predictable and lower costs, proportional to the quality of service and actual workloads, (ii) reduced technical complexity, thanks to a unified access interface and the delegation of DB tuning and administration, and (iii) elasticity and scalability, providing the perception of virtually infinite resources ready at hand. In order to achieve this, we are working on harnessing many recent technological advances in data management by efficiently exploiting multiple DBMS engines (targeting different types of workloads) in a self-balancing solution that optimizes the assignment of resources of very large data centers to serve potentially thousands of users with very diverse needs. Among the critical research challenges to be faced we have identified: (i) automatic database partitioning, (ii) multi-db, multi-engine workload balancing, (iii) scalable multi-tenancy strategies, (iv) high profile distributed transaction support, (v) automated replication and backup. These and many others are the key ingredients of the RelationalCloud (http://relationalcloud.com) prototype currently under development.