2011: “Workload-aware Database Monitoring and Consolidation”

Conference: SIGMOD 2011

Authors:

Carlo Curino, Evan P. C. Jones, Sam Madden, Hari Balakrishnan

 

ABSTRACT:

In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can be harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources, which are not well-suited to the assumptions made by VM-based consolidation.

Our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load, while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to 12× higher throughput on a TPC-C-like benchmark. We also test the effectiveness of our approach on real-world data collected from production servers at Wikia.com, Wikipedia, Second Life, and our institution, showing absolute consolidation ratios ranging between 5.5:1 and 17:1.
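To give a flavor of the assignment problem Kairos solves, here is a deliberately simplified sketch. The paper formalizes consolidation as a non-linear optimization program; this toy version instead uses a first-fit-decreasing greedy over a single resource (peak CPU). All names (`Workload`, `SERVER_CPU_CAPACITY`) are invented for this illustration and are not from the paper.

```python
# Simplified sketch only: Kairos solves a non-linear optimization program over
# multiple resources; this greedy first-fit-decreasing packing over peak CPU
# merely illustrates the consolidation idea of fitting many underutilized
# database workloads onto fewer servers.

from dataclasses import dataclass

SERVER_CPU_CAPACITY = 0.8  # target peak CPU per server, leaving headroom for spikes

@dataclass
class Workload:
    name: str
    peak_cpu: float  # measured peak CPU utilization (fraction of one server)

def consolidate(workloads):
    """Assign workloads to servers, first-fit decreasing by peak CPU."""
    servers = []  # each server is a list of workloads
    for w in sorted(workloads, key=lambda w: w.peak_cpu, reverse=True):
        for s in servers:
            if sum(x.peak_cpu for x in s) + w.peak_cpu <= SERVER_CPU_CAPACITY:
                s.append(w)
                break
        else:
            servers.append([w])
    return servers
```

With average utilizations as low as the 4% reported in the abstract, even this naive packing collapses many workloads onto one machine; the paper's contribution is doing this safely, with measured resource models and near-zero performance degradation.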

 

2011: “Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System”

Supporting legacy applications when the database schema evolves represents a long-standing challenge of practical and theoretical importance. Recent work has produced algorithms and systems that automate the process of data migration and query adaptation; however, the problems of evolving integrity constraints and supporting legacy updates under schema and integrity-constraint evolution are significantly more difficult and have thus far remained unsolved. In this paper, we address this issue by introducing a formal evolution model for the database schema structure and its integrity constraints, and use it to derive update mapping techniques akin to the rewriting techniques used for queries. Thus, we (i) propose a new set of Integrity Constraints Modification Operators (ICMOs), (ii) characterize the impact on integrity constraints of structural schema changes, (iii) devise representations that enable the rewriting of updates, and (iv) develop a unified approach for query and update rewriting under constraints. We then describe the efficient implementation of these techniques provided by our PRISM++ system. The effectiveness of PRISM++ and its enabling technology has been verified on a testbed containing the evolution histories of several scientific databases and web information systems, including the genetic database Ensembl (410+ schema versions in 9 years) and Wikipedia (240+ schema versions in 6 years).

2010: “Schism: a Workload-Driven Approach to Database Replication and Partitioning”

2010: “Schism: a Workload-Driven Approach to Database Replication and Partitioning”, Carlo Curino, Yang Zhang, Evan Jones, Sam Madden, accepted for publication in the Proceedings of the Very Large Data Bases conference (VLDB)

ABSTRACT:

We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner).

The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E) to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best-known manual partitioning, reducing the cost of distributed transactions by up to 30%.
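The first phase described above can be sketched in a few lines. This is not Schism's implementation: the paper uses a scalable k-way graph partitioner (such as METIS), whereas this self-contained toy brute-forces a balanced bisection and only works at tiny scale. It does, however, show the core construction: one node per tuple, edges weighted by co-access frequency, and a cut whose weight counts potential distributed transactions.

```python
# Toy sketch of Schism's graph-based phase: build a co-access graph from a
# transaction trace, then find a balanced bisection minimizing cut weight.
# (Schism uses a scalable k-way partitioner; brute force is for illustration.)

from itertools import combinations
from collections import Counter

def coaccess_edges(transactions):
    """transactions: iterable of sets of tuple ids accessed together."""
    edges = Counter()
    for txn in transactions:
        for a, b in combinations(sorted(txn), 2):
            edges[(a, b)] += 1  # edge weight = co-access frequency
    return edges

def bisect(transactions):
    """Return (left, right, cut_weight) for the best balanced split."""
    edges = coaccess_edges(transactions)
    nodes = sorted({n for e in edges for n in e})
    best = None
    for left in combinations(nodes, len(nodes) // 2):  # all balanced splits
        left = set(left)
        cut = sum(w for (a, b), w in edges.items() if (a in left) != (b in left))
        if best is None or cut < best[2]:
            best = (left, set(nodes) - left, cut)
    return best
```

For a trace where tuples 1 and 2 are usually accessed together, as are 3 and 4, the minimal cut places {1, 2} and {3, 4} on separate partitions, leaving only the rare cross-pair transaction distributed.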

 

Contact me to receive a copy of the paper.

 

 

2010: Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas

Title: Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas

Venue: SIGMOD 2010

Abstract:

The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves over time. Indeed, archival quality can only be guaranteed by storing past database contents under the schema versions in which they were originally created. This causes major usability and scalability problems in the preservation, retrieval, and querying of databases with intense evolution histories, i.e., hundreds of schema versions. Such scenarios are common in web information systems and scientific databases, which frequently accumulate that many versions in just a few years.

Our system, the Archival Information Management System (AIMS), solves this usability issue by letting users write queries against a chosen schema version and then performing, on their behalf, the rewriting and execution of those queries over all the appropriate schema versions. AIMS achieves scalability by using (i) an advanced storage strategy based on relational technology and attribute-level timestamping of the history of the database content, (ii) suitable temporal indexing and clustering techniques, and (iii) novel temporal query optimizations. In particular, with AIMS we introduce a novel technique called CoalNesT that achieves unprecedented performance when temporally coalescing tuples fragmented by schema changes. Extensive experiments show that the performance and scalability thus achieved greatly exceed those of previous approaches. The AIMS technology is easily deployed by plugging into existing DBMS replication technologies, leading to very low overhead; moreover, by decoupling the logical and physical layers, it provides multiple query interfaces, from the basic archive-and-query features considered in the upcoming SQL standards to the much richer XML/XQuery capabilities proposed by temporal database researchers.
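For readers unfamiliar with temporal coalescing, the basic operation is merging value-equivalent records whose validity periods touch or overlap back into maximal periods; schema changes fragment tuple histories and make this step expensive at scale. Below is a minimal sketch of the classic sort-and-merge coalescing step, not the CoalNesT technique itself; the record layout is invented for this example.

```python
# Minimal sketch of temporal coalescing (classic sort-and-merge, not CoalNesT):
# records with the same key and value whose half-open [start, end) periods
# touch or overlap are merged into one maximal period.

def coalesce(records):
    """records: list of (key, value, start, end) tuples with [start, end) periods."""
    out = []
    for key, value, start, end in sorted(records):
        if out and out[-1][0] == key and out[-1][1] == value and out[-1][3] >= start:
            prev = out[-1]
            out[-1] = (key, value, prev[2], max(prev[3], end))  # extend period
        else:
            out.append((key, value, start, end))
    return out
```

A tuple whose history was split at a schema-change boundary, e.g. ("e1", "A", 1, 3) and ("e1", "A", 3, 5), coalesces back into the single period ("e1", "A", 1, 5).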

 

More information available at Panta Rhei: Schema Evolution and Temporal Database Tools

 

2009 SIGMOD: “PRIMA: Querying Historical Data with Evolving Schemas”

“PRIMA: Archiving and Querying Historical Data with Evolving Schemas”, Hyun J. Moon, Carlo A. Curino, MyungWon Ham, Carlo Zaniolo, accepted as a demo paper at the International Conference on Management of Data ’09 (SIGMOD)

ABSTRACT:

Schema evolution poses serious challenges in historical data management. Traditionally, archival data has either been (i) migrated under the current schema version, which eases querying but compromises archival quality, or (ii) maintained under the original schema version in which it first appeared, preserving perfect archival quality but imposing a taxing query interface.
The PRIMA system presented here achieves the best of both worlds by archiving data under the original schema version, while automatically adapting user temporal queries to the appropriate schema versions. Users can query the archive under a schema version of their choice, letting the system rewrite the queries for the potentially many schema versions involved. Moreover, the system automatically documents the schema history and allows temporal queries over the metadata history itself.
The proposed demonstration highlights the system's features using both a synthetic educational running example and real-life evolution histories (schemas and data).
The selected real-life systems include, but are not limited to, the popular genomic database Ensembl and Wikipedia, with their hundreds of schema versions.
The demonstration offers a thorough walkthrough of the system's features and a hands-on testing phase, in which the audience is invited to interact directly with the advanced query interface of PRIMA. Conference participants can freely pose complex temporal queries over transaction-time databases subject to schema evolution, observing PRIMA's rewriting and query-execution capabilities.

2009 ICDE: “The PRISM Workbench: Database Schema Evolution Without Tears”

“The PRISM Workbench: Database Schema Evolution Without Tears”, Carlo A. Curino, Hyun J. Moon, MyungWon Ham, Carlo Zaniolo, demo paper at ICDE 2009

 

Information systems are subject to perpetual evolution, which is particularly pressing in web information systems due to their distributed and often collaborative nature. This continuous adaptation process comes at a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes for database-centric information system software. Therefore, there is a need to automate and simplify the schema evolution process and to ensure predictability and logical independence upon schema changes. Current relational technology makes it easy to change the database content or to revise the underlying storage and indexes, but does little to support logical schema evolution, which remains poorly supported by commercial tools. The PRISM system demonstrates a major new advance toward automating schema evolution (including query mapping and database conversion) by improving the predictability, logical independence, and auditability of the process. In fact, PRISM exploits recent theoretical results on mapping composition, invertibility, and query rewriting to provide DB administrators with an intuitive, operational workbench usable in their everyday activities, thus enabling graceful schema evolution. In this demonstration, we will show (i) the functionality of PRISM and its supportive AJAX interface, (ii) its architecture, built upon a simple SQL-inspired language of Schema Modification Operators, and (iii) we will allow conference participants to directly interact with the system to test its capabilities. Finally, some of the most interesting evolution steps of popular web information systems, such as Wikipedia, will be reviewed in a brief “Saga of Famous Schema Evolutions”.

2008: “Improving search and navigation by combining Ontologies and Social Tags”

“Improving search and navigation by combining Ontologies and Social Tags”, Silvia Bindelli, Claudio Criscione, Carlo A. Curino, Mauro L. Drago, Davide Eynard, Giorgio Orsi, OTM Workshop on Ambient Data Integration (ADI), 2008
 
ABSTRACT: 
The Semantic Web has the ambitious goal of enabling complex autonomous applications to reason on a machine-processable version of the World Wide Web. This, however, would require a coordinated effort not easily achievable in practice. On the other hand, spontaneous communities based on social tagging have recently achieved noticeable consensus and diffusion.
The goal of the TagOnto system is to bridge between these two realities by automatically mapping (social) tags to more structured domain ontologies, thus providing the assistive, navigational features typical of the Semantic Web. These novel search and navigation capabilities are complementary to more traditional search-engine functionality. The system, with its intuitive AJAX interface, is released and demonstrated online.
 
 Demo on-line available at: http://kid.dei.polimi.it/tagonto/ 

2008 VLDB: “Managing and querying transaction-time databases under schema evolution”

“Managing and querying transaction-time databases under schema evolution”, H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, and C. Zaniolo. Very Large Data Bases (VLDB), 2008.

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.

 

For more information on this project visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima

2008 VLDB: “Graceful database schema evolution: the prism workbench”

“Graceful database schema evolution: the prism workbench”, Carlo A. Curino, Hyun J. Moon, and Carlo Zaniolo. Very Large Data Bases (VLDB), 2008. PDF

Supporting graceful schema evolution represents an unsolved problem for traditional information systems, one that is further exacerbated in web information systems such as Wikipedia and public scientific databases: in these projects, based on multiparty cooperation, the frequency of database schema changes has increased while tolerance for downtime has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.

Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators (SMOs) to concisely express complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of the changes applied, as needed to support data provenance, database flashback, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM's tools and their ability to support legacy queries.
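As a toy illustration of the SMO idea, the sketch below rewrites a legacy query forward along an evolution history consisting only of rename operators. The table and column names are invented for this example, and simple token substitution only works for renames; PRISM's actual rewriting handles structural SMOs (decompose, merge, partition, ...) via mapping composition and invertibility.

```python
# Toy sketch of SMO-style rewriting, restricted to rename operators: a legacy
# query written against schema version v1 is rewritten to run on v2 by applying
# the renames recorded in the evolution history. (Invented names; PRISM's real
# SMO language also covers structural changes, which need genuine rewriting.)

import re

# Hypothetical evolution history v1 -> v2, as (smo, old_name, new_name).
HISTORY = [
    ("RENAME TABLE", "user", "page_author"),
    ("RENAME COLUMN", "user_text", "author_name"),
]

def rewrite(query, history):
    """Rewrite identifiers in a legacy query forward along the SMO history."""
    for _smo, old, new in history:
        # \b keeps 'user' from matching inside 'user_text' or 'user_id'
        query = re.sub(rf"\b{re.escape(old)}\b", new, query)
    return query

legacy = "SELECT user_text FROM user WHERE user_id = 42"
```

Here `rewrite(legacy, HISTORY)` yields a query over the v2 names, letting an application written against v1 keep running after the schema change.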

 For more details and an on-line DEMO of the system visit:  http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prism

Bibtex:

@INPROCEEDINGS{curino-vldb2008a,
author = {Carlo A. Curino and Hyun J. Moon and Carlo Zaniolo},
title = {Graceful database schema evolution: the prism workbench},
booktitle = {Very Large Data Bases (VLDB)},
year = {2008}
}