schema evolution – Carlo Curino

Schema Evolution: Datasets…

With the help of various students of Prof. Carlo Zaniolo at UCLA, I have gathered a long list of datasets to test schema evolution systems. You can find a brief summary of the systems and links to their schemas at the following link: Schema Evolution Benchmark Datasets

Enjoy! And if you use this in a paper, don’t forget to acknowledge/cite us…

2010: Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas

Title: Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas

Venue: SIGMOD 2010

Abstract:

The problem of archiving and querying the history of a database is made more complex by the fact that, along with the database content, the database schema also evolves with time. Indeed, archival quality can only be guaranteed by storing past database contents using the schema versions under which they were originally created. This causes major usability and scalability problems in preservation, retrieval and querying of databases with intense evolution histories, i.e., hundreds of schema versions. These scenarios are common in web information systems and scientific databases that frequently accumulate that many versions in just a few years.

Our system, Archival Information Management System (AIMS), solves this usability issue by letting users write queries against a chosen schema version and then performing for the users the rewriting and execution of queries on all the appropriate schema versions. AIMS achieves scalability by using (i) an advanced storage strategy based on relational technology and attribute-level timestamping of the history of the database content, (ii) suitable temporal indexing and clustering techniques, and (iii) novel temporal query optimizations. In particular, with AIMS we introduce a novel technique called CoalNesT that achieves unprecedented performance when temporally coalescing tuples fragmented by schema changes. Extensive experiments show that the performance and scalability thus achieved greatly exceed those obtained by previous approaches. The AIMS technology is easily deployed by plugging into existing DBMS replication technologies, leading to very low overhead; moreover, by decoupling the logical and physical layers, it provides multiple query interfaces, from the basic archive&query features considered in the upcoming SQL standards, to the much richer XML XQuery capabilities proposed by temporal database researchers.
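For readers unfamiliar with the term, temporal coalescing means merging tuples that carry the same value over adjacent or overlapping time intervals. The snippet below is only a minimal sketch of that textbook operation and is not the CoalNesT technique from the paper; the row layout `(key, value, start, end)`, the half-open intervals, and the function name `coalesce` are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical row layout: (key, value, start, end),
# valid over the half-open interval [start, end).
Row = Tuple[str, str, int, int]


def coalesce(rows: List[Row]) -> List[Row]:
    """Merge rows with the same key and value whose intervals touch or overlap."""
    result: List[Row] = []
    # Sorting groups equal (key, value) pairs together, ordered by start time.
    for key, value, start, end in sorted(rows):
        if result:
            pkey, pvalue, pstart, pend = result[-1]
            if (pkey, pvalue) == (key, value) and start <= pend:
                # Extend the previous interval instead of emitting a new row.
                result[-1] = (pkey, pvalue, pstart, max(pend, end))
                continue
        result.append((key, value, start, end))
    return result


# Two fragments of the same fact (e.g. split by a schema change) become one row.
fragments = [
    ("emp1", "Engineer", 1, 5),
    ("emp1", "Engineer", 5, 9),
    ("emp1", "Manager", 9, 12),
]
merged = coalesce(fragments)
```

In this sketch the two "Engineer" fragments collapse into a single row covering [1, 9), while the "Manager" row is left untouched.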

More information available at Panta Rhei: Schema Evolution and Temporal Database Tools

2009 ICDE 2009: “The PRISM Workwench: Database Schema Evolution Without Tears”

“The PRISM Workwench: Database Schema Evolution Without Tears” Carlo A. Curino, Hyun J. Moon, MyungWon Ham, Carlo Zaniolo, DEMO paper at ICDE 2009

Information Systems are subject to a perpetual evolution, which is particularly pressing in Web Information Systems, due to their distributed and often collaborative nature. Such a continuous adaptation process comes with a very high cost, because of the intrinsic complexity of the task and the serious ramifications of such changes upon database-centric Information System software. Therefore, there is a need to automate and simplify the schema evolution process and to ensure predictability and logical independence upon schema changes. Current relational technology makes it easy to change the database content or to revise the underlying storage and indexes but does little to support logical schema evolution, which nowadays remains poorly supported by commercial tools. The PRISM system demonstrates a major new advance toward automating schema evolution (including query mapping and database conversion), by improving predictability, logical independence, and auditability of the process. In fact, PRISM exploits recent theoretical results on mapping composition, invertibility and query rewriting to provide DB Administrators with an intuitive, operational workbench usable in their everyday activities, thus enabling graceful schema evolution. In this demonstration, we will (i) show the functionality of PRISM and its supportive AJAX interface, (ii) present its architecture, built upon a simple SQL-inspired language of Schema Modification Operators, and (iii) allow conference participants to directly interact with the system to test its capabilities. Finally, some of the most interesting evolution steps of popular Web Information Systems, such as Wikipedia, will be reviewed in a brief “Saga of Famous Schema Evolutions”.

2008: “Managing the History of Metadata in support for DB Archiving and Schema Evolution”

TO APPEAR “Managing the History of Metadata in support for DB Archiving and Schema Evolution”, Carlo A. Curino, Hyun J. Moon, Carlo Zaniolo, ER International Workshop on Evolution and Change in Data Management (ECDM) 2008

Modern information systems, and web information systems in particular, are faced with frequent database schema changes, which generate the necessity to manage such evolution and preserve their history. In this paper, we describe the Panta Rhei Framework designed to provide powerful tools that: (i) facilitate schema evolution and guide the Database Administrator in planning and evaluating changes, (ii) support automatic rewriting of legacy queries against the current schema version, (iii) enable efficient archiving of the histories of data and metadata, and (iv) support complex temporal queries over such histories. We then introduce the Historical Metadata Manager (HMM), a tool designed to facilitate the process of documenting and querying the schema evolution itself. We use the schema history of the Wikipedia database as a telling example of the many uses and benefits of HMM.

For more information: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima

2008 VLDB: “Managing and querying transaction-time databases under schema evolution”

“Managing and querying transaction-time databases under schema evolution” H. J. Moon, C. A. Curino, A. Deutsch, C.-Y. Hou, and C. Zaniolo. Very Large Data Base VLDB, 2008.

The old problem of managing the history of database information is now made more urgent and complex by fast-spreading web information systems. Indeed, systems such as Wikipedia are faced with the challenge of managing the history of their databases in the face of intense database schema evolution. Our PRIMA system addresses this difficult problem by introducing two key pieces of new technology. The first is a method for publishing the history of a relational database in XML, whereby the evolution of the schema and its underlying database are given a unified representation. This temporally grouped representation makes it easy to formulate sophisticated historical queries on any given schema version using standard XQuery. The second key piece of technology provided by PRIMA is that schema evolution is transparent to the user: she writes queries against the current schema while retrieving the data from one or more schema versions. The system then performs the labor-intensive and error-prone task of rewriting such queries into equivalent ones for the appropriate versions of the schema. This feature is particularly relevant for historical queries spanning potentially hundreds of different schema versions. The latter is realized by introducing (i) Schema Modification Operators (SMOs) to represent the mappings between successive schema versions and (ii) an XML integrity constraint language (XIC) to efficiently rewrite the queries using the constraints established by the SMOs. The scalability of the approach has been tested against both synthetic data and real-world data from the Wikipedia DB schema evolution history.
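To give a rough feel for how a sequence of SMOs can drive query rewriting, here is a minimal sketch that handles only two rename-style operators and maps a column reference written against the newest schema back to the oldest version. The class names `RenameTable` and `RenameColumn`, the function `to_older_version`, and the example table and column names are all hypothetical; the actual PRIMA rewriting relies on XIC constraints and covers structural changes that this toy does not attempt.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class RenameTable:
    """Hypothetical SMO: table `old` becomes `new`."""
    old: str
    new: str


@dataclass(frozen=True)
class RenameColumn:
    """Hypothetical SMO: in `table`, column `old` becomes `new`."""
    table: str
    old: str
    new: str


SMO = Union[RenameTable, RenameColumn]
ColumnRef = Tuple[str, str]  # (table, column)


def to_older_version(ref: ColumnRef, smos: List[SMO]) -> ColumnRef:
    """Undo the SMOs from newest to oldest to find the original name of `ref`."""
    table, column = ref
    for smo in reversed(smos):
        if isinstance(smo, RenameTable) and table == smo.new:
            table = smo.old
        elif (isinstance(smo, RenameColumn)
              and table == smo.table and column == smo.new):
            column = smo.old
    return table, column


# Made-up evolution history, listed oldest change first.
history: List[SMO] = [
    RenameColumn("employee", "name", "full_name"),
    RenameTable("employee", "staff"),
]
old_ref = to_older_version(("staff", "full_name"), history)
```

Here a reference to `staff.full_name` on the newest schema resolves to `employee.name` on the original one. Renames are trivially invertible, which is why the sketch stops there; operators that split, merge, or drop data need the richer mapping machinery described in the paper.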

For more information on this project visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Prima

2008: Information Systems Integration and Evolution: Ontologies at Rescue

“Information Systems Integration and Evolution: Ontologies at Rescue”, Carlo A. Curino, Letizia Tanca, Carlo Zaniolo, International Workshop on Semantic Technologies for System Maintenance (STSM) 2008

The life of a modern Information System is often characterized by (i) a push toward integration with other systems, and (ii) the evolution of its data management core in response to continuously changing application requirements. Most of the current proposals dealing with these issues from a database perspective rely on the formal notions of mapping and query rewriting. This paper presents the research agenda of ADAM (Advanced Data And Metadata Manager); by harvesting the recent theoretical advances in this area into a unified framework, ADAM seeks to deliver practical solutions to the problems of automatic schema mapping and assisted schema evolution. The evolution of an Information System (IS) reflects the changes occurring in the application reality that the IS is modelling: thus, ADAM exploits ontologies to capture such changes and provide traceability and automated documentation for such evolution. Initial results and immediate benefits of this approach are presented.

PRISM: a tool for schema evolution

I just posted online a demo (work in progress) of a tool for schema evolution support that I designed and implemented during my stay at UCLA, under the guidance of Carlo Zaniolo and in collaboration with Hyun J. Moon.

Feel free to test it, criticize it, and send me feedback of any kind: Prism a tool for schema evolution (http://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrismDemo)

A Video of a typical interaction with the interface is available at: Prism a tool for schema evolution (http://yellowstone.cs.ucla.edu/schema-evolution/documents/Prism-Demo.mov)

Let me know your opinions …

2008: Schema Evolution in Wikipedia: toward a Web Information System Benchmark

Evolving the database that is at the core of an Information System represents a difficult maintenance problem that has only been studied in the framework of traditional information systems. However, the problem is likely to be even more severe in web information systems, where open-source software is often developed through the contributions and collaboration of many groups and individuals. Therefore, in this paper, we present an in-depth analysis of the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large family of web information systems built using the open-source MediaWiki software. Our study is based on: (i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema changes, and (ii) simple software tools to automate the analysis. This framework allowed us to dissect and analyze the 4.5 years of Wikipedia history, which was short in time, but intense in terms of growth and evolution. Beyond confirming the initial hunch about the severity of the problem, our analysis suggests the need for developing better methods and tools to support graceful schema evolution. Therefore, we briefly discuss documentation and automation support systems for database evolution, and suggest that the Wikipedia case study can provide the kernel of a benchmark for testing and improving such systems.

To appear at ICEIS 2008

Here you can find a copy of the paper.

For more details on this project please visit: http://yellowstone.cs.ucla.edu/schema-evolution/index.php/Schema_Evolution_Benchmark