Activities – Carlo Curino

Compile Hadoop Trunk on Mac Java 7

(UPDATED)

I saw my Java 6 disappear… and I don’t even want to know how to blame or rant about, but I need to compile Hadoop (trunk as of early november) on my macbook running 10.7.5 (latest updates at the time of this post) with the Oracle Java 1.7.0.8.

This is briefly what worked for me… but as a disclaimer this is a hackish quick-fix to get the thing to compile, not a clean mvn marvel.

There are two parts of this:

1) fix a pom.xml in hadoop-common-project/hadoop-annotations by adding the following

This fixes the compile errors coming from the hadoop-common-projects. (apologies for the image, but it is joomla is kicking and screaming on the tags)

2) (ugly) Fix the compile errors for hadoop-hdfs

sudo mkdir /Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/Classes  cd !$ sudo ln -s ../lib/tools.jar classes.jar

This solves the problem of ant/mvn/make/someone that looks for tools.jar into the wrong place by linking the wrong place to the right place. Now this is a disgusting trick, but after fussing around with mvn (which I don’t know well at all) for a bit, I decided to be pragmatical.

Potential (cleaner) solution that Markus Weimer suggested is the following:

I haven’t tested it yet, but here it is.

The final version of this consists in using:

<profile>

   <id>macdep</id>

   <activation>

      <os>

        <name>macprofile</name>

        <family>mac</family>

      </os>

    </activation>

    <dependencies>

    <dependency>

     <groupId>jdk.tools</groupId>

     <artifactId>jdk.tools</artifactId>

     <scope>system</scope>

     <systemPath>/Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/lib/tools.jar</systemPath>

     <version>1.7</version>

    </dependency>

   </dependencies>

  </profile>

In the <profiles> section of the main pom.xml, and in the hadoop-annotations.

TPC-C hardware resources utilization: an analysis

I’ve run several experiments on what are the HW requirements for OLTP applications, in particular for TPC-C. The results are posted on here: TPC-C performance experiments and hardware utilization.

FlyMake Emacs: continuous compilation

This youtube video shows the possibility of using Emacs with a “continuous” compilation feature turned on… it is horribly slow, since the guys who made it do not talk, but only type 🙁 so feel free to skip all the way to 4′:30” (out of 5′) to see the interesting part. Sounds good to me…

{youtube}F5Cc2W6PbL8{/youtube}

PRISM: a tool for schema evolution

I just posted on-line a Demo (work in progress) of a tool for Schema Evolution support i designed and implemented during my stay in UCLA under the guidance of Carlo Zaniolo and with the collaboration of Hyun J. Moon.

Feel free to test it, criticize it, and report me feedback of any kind: Prism a tool for schema evolution (http://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrismDemo)

A Video of a typical interaction with the interface is available at: Prism a tool for schema evolution (http://yellowstone.cs.ucla.edu/schema-evolution/documents/Prism-Demo.mov)

Let me know your opinions …

MySQL: UDF libraries

I found i nice repository of User Defined Functions for MySQL. In particular there are simple extensions of the few statistical functions offered by MySQL.

http://www.mysqludf.org/libraries.php

MicroJena: Jena Ontology Management API for Mobile Clients

In Politecnico di Milano, together with Giorgio Orsi i supervised two students Fulvio Crivellaro and Gabriele Genovese that port Jena to run on J2ME.

The results has been amazing, the API we developed is even faster than the original Jena within 4000 tuples (more than enough on Mobile Systems).

The library available for download at: http://poseidon.elet.polimi.it/ca/?page_id=59

It has also been released in the jena-contrib package, see: http://jena.sourceforge.net/contrib/contributions.html

Please contact me for questions or suggestions.

PRISM: Graceful Schema Evolution Implemented

Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.

Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility, into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.

More details will be available about the Graceful Schema Evolution support tool Prism here.

Wikipedia Profiler: a great in-sight in the Wikipedia DB backend

Wikipedia is one of the biggest DB-based website around, with over 700Gb of data.

Both data and schema are available:

http://download.wikimedia.org/ (data)

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup (schema)

moreover the result of a profiler running on the Wikipedia installation of MediaWiki is made available at:

http://noc.wikimedia.org/cgi-bin/report.py

this means having workload and queries of the actual Wikipedia. I spent in the last months quite a lot of time working on this dataset. Soon i will post the result of my analysis, which has been accepted for publication at ICEIS 2008.

Google WebKit: sweeet!!!

Hi everyone,

I’m playing with the Google Web Kit is actually very nice. The idea is quite simple, it allows you to develop your application in a comfortable Java language (with few limitations e.g., JVM 1.4 compatible + use of their graphical API) and with 1-click convert everything into a super cool Web 2.0 AJAX-based interface “a là” Gmail.

The pro is that in few hours of playing I learned how to use the graphics, including some of the nifty stuff like text-completions, druggable dialogbox, succeed in splitting my application in AJAX frontend and pure Java backend (on tomcat).

I think is definitely a valuable option for interface generation. And helps you to create hot and trendy interfaces with a minimum effort. As soon as i will release the interface i’m working on i’ll link it here to show you the result.

Here follows a YouTube presentation video:

{youtube}NvRa-CxkpZI{/youtube}