Compile Hadoop Trunk on Mac Java 7

(UPDATED)

I saw my Java 6 disappear… and I don’t even want to know whom to blame or what to rant about, but I need to compile Hadoop (trunk as of early November) on my MacBook running OS X 10.7.5 (latest updates at the time of this post) with Oracle Java 1.7.0_08.

This is, briefly, what worked for me… but, as a disclaimer, this is a hackish quick fix to get the thing to compile, not a clean mvn marvel.

There are two parts to this:

1) Fix the pom.xml in hadoop-common-project/hadoop-annotations by adding the profile reproduced at the end of this post.

This fixes the compile errors coming from hadoop-common-project. (Apologies: this was originally posted as an image, because Joomla was kicking and screaming on the tags.)

 

2) (ugly) Fix the compile errors for hadoop-hdfs

sudo mkdir /Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/Classes
cd !$
sudo ln -s ../lib/tools.jar classes.jar

This solves the problem of ant/mvn/make/whoever looking for tools.jar in the wrong place, by linking the wrong place to the right one. It is a disgusting trick, but after fussing around with mvn (which I don’t know well at all) for a bit, I decided to be pragmatic.
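For reference, the same trick can be written without hard-coding the JDK build number. This is an untested sketch: it assumes /usr/libexec/java_home (the standard OS X utility for locating the active JDK) points at the JDK you actually build with.

```shell
# Locate the active JDK home; fall back to the hard-coded path used above.
JDK_HOME="$(/usr/libexec/java_home 2>/dev/null || true)"
JDK_HOME="${JDK_HOME:-/Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home}"

# Create the Classes directory the build mistakenly searches, and link
# the real tools.jar into it under the name the build expects.
# (Prepend sudo if your user cannot write under the JDK home.)
mkdir -p "$JDK_HOME/Classes"
ln -sf "$JDK_HOME/lib/tools.jar" "$JDK_HOME/Classes/classes.jar"
```

Upgrading the JDK then only requires re-running the snippet, since nothing is tied to the 1.7.0_08 build number.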

A potential (cleaner) solution, suggested by Markus Weimer, is to declare tools.jar as a regular (system-scoped) Maven dependency instead of symlinking.

The final version of this consists in using:


<profile>
  <id>macdep</id>
  <activation>
    <os>
      <family>mac</family>
    </os>
  </activation>
  <dependencies>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <scope>system</scope>
      <systemPath>/Library/Java/JavaVirtualMachines/jdk1.7.0_08.jdk/Contents/Home/lib/tools.jar</systemPath>
      <version>1.7</version>
    </dependency>
  </dependencies>
</profile>

(Note: I dropped the <name> element inside <os>, since Maven requires every listed condition to match and no Mac reports an os.name of "macprofile"; matching on <family>mac</family> alone is enough.)

This goes in the <profiles> section of the main pom.xml, and in the hadoop-annotations pom.xml.
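For clarity, the placement in the top-level pom.xml looks roughly like this (an abbreviated sketch; the hadoop-annotations pom.xml gets the same block):

```xml
<project>
  <!-- ... the rest of the top-level pom.xml ... -->
  <profiles>
    <!-- ... any existing profiles ... -->
    <profile>
      <id>macdep</id>
      <!-- activation and the jdk.tools system dependency, as shown above -->
    </profile>
  </profiles>
</project>
```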

 

CFP: Data Management in the Cloud 2013

I am co-organizer with Ashraf Aboulnaga for the Second Workshop on Data Management in the Cloud, co-located with ICDE (Brisbane, Australia). We expect it to be a highly interactive forum for both practitioners and academics interested in the space of data management and cloud computing, and we welcome both novel research and industry experience papers.

CFP and details at: http://db.uwaterloo.ca/dmc2013/; below is the high-level idea of the workshop:

Cloud computing has emerged as a promising computing and business model. By decoupling the management of the infrastructure (cloud providers) from its use (cloud tenants), and by allowing the sharing of massive infrastructures, cloud computing delivers unprecedented economic and scalability benefits for existing applications and enables many new scenarios. This comes at the cost of increased complexity in managing a highly multi-tenant infrastructure and limited visibility/access, posing new questions on attribution, pricing, isolation, scalability, fault tolerance, load balancing, etc. This is particularly challenging for stateful, data-intensive applications.

This unique combination of opportunities and challenges attracted much attention from both academia and industry. The DMC workshop aims at bringing researchers and practitioners in cloud computing and data management systems together to discuss the research issues at the intersection of these two areas, and also to draw more attention from the larger data management and systems research communities to this new and highly promising field.

 

See you in Australia!!!

FlyMake Emacs: continuous compilation

This YouTube video shows Emacs with a “continuous” compilation feature turned on… it is horribly slow, since the guys who made it do not talk but only type 🙁, so feel free to skip all the way to 4′30″ (out of 5′) to see the interesting part. Sounds good to me…

{youtube}F5Cc2W6PbL8{/youtube}

PRISM: a tool for schema evolution

I just posted online a demo (work in progress) of a tool for schema evolution support that I designed and implemented during my stay at UCLA, under the guidance of Carlo Zaniolo and in collaboration with Hyun J. Moon.

Feel free to test it, criticize it, and send me feedback of any kind: Prism, a tool for schema evolution (http://yellowstone.cs.ucla.edu/schema-evolution/index.php/PrismDemo)

A video of a typical interaction with the interface is available at: Prism demo video (http://yellowstone.cs.ucla.edu/schema-evolution/documents/Prism-Demo.mov)

 Let me know your opinions …

 

 

MicroJena: Jena Ontology Management API for Mobile Clients

At Politecnico di Milano, together with Giorgio Orsi, I supervised two students, Fulvio Crivellaro and Gabriele Genovese, who ported Jena to run on J2ME.

The results have been amazing: the API we developed is even faster than the original Jena for up to 4,000 tuples (more than enough on mobile systems).

The library is available for download at: http://poseidon.elet.polimi.it/ca/?page_id=59

It has also been released in the jena-contrib package, see: http://jena.sourceforge.net/contrib/contributions.html

Please contact me for questions or suggestions.

 

PRISM: Graceful Schema Evolution Implemented

Supporting graceful schema evolution represents an unsolved problem for traditional information systems that is further exacerbated in web information systems, such as Wikipedia and public scientific databases: in these projects based on multiparty cooperation the frequency of database schema changes has increased while tolerance for downtimes has nearly disappeared. As of today, schema evolution remains an error-prone and time-consuming undertaking, because the DB Administrator (DBA) lacks the methods and tools needed to manage and automate this endeavor by (i) predicting and evaluating the effects of the proposed schema changes, (ii) rewriting queries and applications to operate on the new schema, and (iii) migrating the database.

Our PRISM system takes a big first step toward addressing this pressing need by providing: (i) a language of Schema Modification Operators to express concisely complex schema changes, (ii) tools that allow the DBA to evaluate the effects of such changes, (iii) optimized translation of old queries to work on the new schema version, (iv) automatic data migration, and (v) full documentation of intervened changes as needed to support data provenance, database flash back, and historical queries. PRISM solves these problems by integrating recent theoretical advances on mapping composition and invertibility, into a design that also achieves usability and scalability. Wikipedia and its 170+ schema versions provided an invaluable testbed for validating PRISM tools and their ability to support legacy queries.

More details about PRISM, the graceful schema evolution support tool, will be available here.

Wikipedia Profiler: a great insight into the Wikipedia DB backend

Wikipedia is one of the biggest DB-based websites around, with over 700 GB of data.

Both data and schema are available:

http://download.wikimedia.org/ (data) 

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup (schema)

Moreover, the results of a profiler running on Wikipedia’s installation of MediaWiki are made available at:

http://noc.wikimedia.org/cgi-bin/report.py

This means having the workload and queries of the actual Wikipedia. Over the last months I have spent quite a lot of time working on this dataset. Soon I will post the results of my analysis, which have been accepted for publication at ICEIS 2008.

 

 

Google Web Toolkit: sweeet!!!

Hi everyone,

I’m playing with the Google Web Toolkit (GWT), and it is actually very nice. The idea is quite simple: it allows you to develop your application in a comfortable Java language (with a few limitations, e.g., JVM 1.4 compatibility plus the use of their graphical API) and, with one click, convert everything into a super cool Web 2.0 AJAX-based interface à la Gmail.

The pro is that in a few hours of playing I learned how to use the graphics, including some of the nifty stuff like text completion and draggable dialog boxes, and succeeded in splitting my application into an AJAX frontend and a pure Java backend (on Tomcat).

I think it is definitely a valuable option for interface generation, and it helps you create hot and trendy interfaces with minimum effort. As soon as I release the interface I’m working on, I’ll link it here to show you the result.

Here follows a YouTube presentation video:  

{youtube}NvRa-CxkpZI{/youtube}