Wikipedia Profiler: a great insight into the Wikipedia DB backend

Wikipedia is one of the biggest database-backed websites around, with over 700 GB of data.

Both data and schema are available:

http://download.wikimedia.org/ (data) 

http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/tables.sql?view=markup (schema)

Moreover, the results of a profiler running on the Wikipedia installation of MediaWiki are made available at:

http://noc.wikimedia.org/cgi-bin/report.py

This means the workload and queries of the actual Wikipedia are available. Over the last few months I have spent quite a lot of time working on this dataset. Soon I will post the results of my analysis, which has been accepted for publication at ICEIS 2008.
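To give a feel for how such profiler output can be explored, here is a minimal sketch. The column layout (event name, call count, total time) and the sample lines are hypothetical assumptions for illustration; the actual report.py output may differ.

```python
# Hypothetical sketch: aggregate a MediaWiki-style profiler report.
# The assumed line format "<event name...> <calls> <total seconds>" is an
# illustration only; the real report.py output may use different columns.

sample_report = """\
query: SELECT page 120000 45.2
query: SELECT revision 80000 30.1
query: INSERT recentchanges 5000 12.7
"""

def parse_report(text):
    """Parse lines of '<name words...> <calls> <seconds>' into tuples."""
    entries = []
    for line in text.strip().splitlines():
        parts = line.split()
        name = " ".join(parts[:-2])       # everything before the two numbers
        calls = int(parts[-2])            # call count
        seconds = float(parts[-1])        # total time spent
        entries.append((name, calls, seconds))
    return entries

def top_by_time(entries, n=3):
    """Return the n entries with the highest total time."""
    return sorted(entries, key=lambda e: e[2], reverse=True)[:n]

entries = parse_report(sample_report)
for name, calls, seconds in top_by_time(entries):
    print(f"{name}: {calls} calls, {seconds:.1f}s")
```

Sorting events by total time like this is a quick way to see which query classes dominate the workload.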
