prima – Carlo Curino

“PRIMA: Archiving and Querying Historical Data with Evolving Schemas” Hyun J. Moon, Carlo A. Curino, MyungWon Ham, Carlo Zaniolo, accepted as a demo paper at the International Conference on Management of Data (SIGMOD) ’09

ABSTRACT:

Schema evolution poses serious challenges in historical data management. Traditionally, archival data has been either (i) migrated to the current schema version, which eases querying but compromises archival quality, or (ii) maintained under the original schema version in which it first appeared, which preserves archival quality perfectly but makes querying taxing.
The PRIMA system we present achieves the best of both worlds by archiving data under the original schema version while automatically adapting the user's temporal queries to the appropriate schema versions. The user can query the archive under a schema version of choice and let the system rewrite the queries for the potentially many schema versions involved. Moreover, the system offers automatic documentation of the schema history and allows users to pose temporal queries over the metadata history itself.
The proposed demonstration highlights the system's features using both a synthetic, educational running example and real-life evolution histories (schemas and data).
The selected real-life systems include, but are not limited to, the popular genomic database Ensembl and Wikipedia, with their hundreds of schema versions.
The demonstration offers a thorough walk-through of the system's features and a hands-on testing phase in which the audience is invited to interact directly with PRIMA's advanced query interface. Conference participants will freely pose complex temporal queries over transaction-time databases subject to schema evolution and observe PRIMA's query rewriting and execution capabilities.
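
The abstract notes that a single temporal query may involve many schema versions. As a rough illustration of that idea only, the sketch below selects the schema versions whose period of use overlaps a query's time range. The version names, dates, and the function `versions_for_range` are hypothetical and are not taken from PRIMA.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class SchemaVersion(NamedTuple):
    name: str
    start: date            # first day the version was in use (hypothetical data)
    end: Optional[date]    # day the version was replaced; None means still current


def versions_for_range(history: List[SchemaVersion], q_start: date, q_end: date) -> List[str]:
    """Return the names of versions whose [start, end) interval overlaps [q_start, q_end)."""
    hits = []
    for v in history:
        v_end = v.end or date.max  # treat the current version as open-ended
        if v.start < q_end and q_start < v_end:
            hits.append(v.name)
    return hits


# Hypothetical three-version schema history.
HISTORY = [
    SchemaVersion("v1", date(2003, 1, 1), date(2005, 6, 1)),
    SchemaVersion("v2", date(2005, 6, 1), date(2007, 3, 1)),
    SchemaVersion("v3", date(2007, 3, 1), None),
]
```

Under these assumptions, a query over the calendar year 2005 would have to be rewritten for both `v1` and `v2`, since the schema changed partway through that year.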

“Managing and querying transaction-time databases under schema evolution” H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, and C. Zaniolo. Very Large Data Bases (VLDB), 2008.

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia face the challenge of managing the history of their databases under intense schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and of its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second is that schema evolution is made transparent to the user: she writes queries against the current schema while retrieving data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. It is realized by (i) introducing Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) using an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
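
To make the role of SMOs in query rewriting more concrete, here is a minimal sketch that models only two rename-style operators and maps a table and column reference from the newest schema version back to an older one by undoing each operator in reverse order. This is a deliberately simplified name substitution, not the XIC-based rewriting described above; the classes `RenameTable` and `RenameColumn`, the function `rewrite_to_older`, and the table and column names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class RenameTable:
    old: str
    new: str


@dataclass(frozen=True)
class RenameColumn:
    table: str  # table name at the time this operator is applied
    old: str
    new: str


SMO = Union[RenameTable, RenameColumn]


def rewrite_to_older(table: str, columns: List[str], smos: List[SMO]) -> Tuple[str, List[str]]:
    """Map a (table, columns) reference on the newest version to the oldest one.

    `smos` lists the operators in the order they were applied (oldest first),
    so they are undone in reverse order.
    """
    cols = list(columns)
    for smo in reversed(smos):
        if isinstance(smo, RenameTable) and smo.new == table:
            table = smo.old
        elif isinstance(smo, RenameColumn) and smo.table == table:
            cols = [smo.old if c == smo.new else c for c in cols]
    return table, cols


# Hypothetical evolution: a column is renamed first, then the table.
EVOLUTION = [
    RenameColumn("page", "title", "page_title"),
    RenameTable("page", "article"),
]
```

With this hypothetical history, a reference to `article(page_title, id)` in the newest version maps back to `page(title, id)` in the oldest one. Structural operators that split or merge tables cannot be handled by simple renaming, which is where a constraint-based rewriting such as the one the paper describes becomes necessary.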

For more information on this project visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima

2009 SIGMOD 09: “PRIMA: Querying Historical Data with Evolving Schemas”

2008 VLDB: “Managing and querying transaction-time databases under schema evolution”