DBpedia-Live Tutorial

This tutorial describes in detail how to install and configure DBpedia-Live.

1 Requirements

  1. JDK 1.6 or higher.
  2. MySQL, for the local MediaWiki.
  3. Apache HTTP Server.
  4. OpenLink Virtuoso, which can be downloaded from http://sourceforge.net/projects/virtuoso/.
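
To confirm the first requirement, you can check the version reported by `javac -version`. The Python sketch below parses that output, which typically looks like `javac 1.6.0_45` for old releases or `javac 11.0.2` for Java 9 and later, and compares the feature version against 6. The helper names are hypothetical and not part of DBpedia-Live, and the output formats are assumed; adjust the regular expression if your JDK prints something different.

```python
import re
import subprocess

MINIMUM_FEATURE_VERSION = 6  # "JDK 1.6 or higher"


def jdk_feature_version(banner):
    """Extract the Java feature version from `javac -version` output.

    Old releases report e.g. "javac 1.6.0_45" (feature version 6); Java 9
    and later drop the "1." prefix and report e.g. "javac 11.0.2".
    """
    match = re.search(r"javac (\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError("unrecognised javac -version output: %r" % banner)
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def meets_requirement(banner, minimum=MINIMUM_FEATURE_VERSION):
    """True if the reported JDK is at least the given feature version."""
    return jdk_feature_version(banner) >= minimum


def installed_javac_banner():
    """Run `javac -version` and return everything it printed.

    Depending on the release, javac may print its version on stderr or on
    stdout, so both streams are returned together.
    """
    result = subprocess.run(["javac", "-version"], capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, `meets_requirement(installed_javac_banner())` returns True when the `javac` on your PATH is 1.6 or newer; if no JDK is installed at all, `subprocess.run` raises FileNotFoundError.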


The process of installing and running DBpedia-Live is divided into two major steps:

  1. Local MediaWiki installation.
  2. DBpedia-Live source download and configuration.


These steps are described in detail in the following sections.

2 Local MediaWiki Installation

  1. Download the latest Wikipedia dumps from http://dumps.wikimedia.org/enwiki/latest/. The most important file is pages-articles.xml.bz2.
  2. Run the “clean.sh” script, which cleans up the dump files so that they can later be inserted into MySQL.
  3. Use the “import.sh” script to import the data from the dump file(s) into the MySQL database.
  4. Download and install MediaWiki; for details about the process, see http://www.mediawiki.org/wiki/[..]_MediaWiki_on_Ubuntu and https://help.ubuntu.com/community/MediaWiki.
  5. Install the MediaWiki extensions as described in http://www.mediawiki.org/wiki/[..]MediaWiki_extensions. One of these extensions, “OAIRepository”, later enables you to obtain a continuous stream of updates from your local Wikipedia.
  6. You can find your MediaWiki installation under /var/www.
  7. Open the file “LocalSettings.php”, which contains the configuration MediaWiki requires, including the settings it needs to access MySQL.
  8. In the “## Database settings” section of that file, set your own database settings, e.g. the username and password of your MySQL database, as follows:
    • $wgDBname = "dbpediaDB"; // Name of the database into which you imported the Wikipedia dumps
    • $wgDBuser = "xyz"; // MySQL username
    • $wgDBpassword = "xyz"; // MySQL password
  9. You can then open your local MediaWiki at http://localhost/mediawiki.
  10. Complete the configuration of your MediaWiki, e.g. set the root password of your local Wikipedia.
  11. To obtain a stream of updates from your local Wikipedia, install the MediaWiki extension “OAIRepository”. Details on installing this extension can be found at http://www.mediawiki.org/wiki/Extension:OAIRepository.
  12. To keep your local Wikipedia in sync with the official Wikipedia, configure your MediaWiki to fetch the stream of updates from the official Wikipedia and apply them to your local Wikipedia.
  13. In your MediaWiki installation folder, navigate to the “OAI” folder and open the file “oaiUpdate.php”.
  14. Set the variable “$oaiSourceRepository” to the address of Wikipedia's OAI repository, so that your MediaWiki can access its stream of updates. Note that this URL must include the username and password required to access Wikipedia's update stream.
  15. Run the command “php oaiUpdate.php”, which will fetch the updates from Wikipedia and feed them into your local Wikipedia.
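
Step 14 does not spell out how the credentials are attached to the repository URL. A common convention, assumed here, is to embed them in the URL's authority part as `https://username:password@host/...`. The Python sketch below builds such a URL and percent-encodes the credentials, so that a password containing characters like `@` or `:` does not break the URL. The function `with_credentials` and the example.org address are illustrative placeholders, not the actual Wikipedia OAI endpoint.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(repository_url, username, password):
    """Return repository_url with username:password embedded before the host.

    The credentials are percent-encoded so that characters such as '@', ':'
    or '/' in a password cannot be mistaken for URL delimiters.
    """
    parts = urlsplit(repository_url)
    if not parts.hostname:
        raise ValueError("repository URL has no host: %r" % repository_url)
    # Keep host and port as written, dropping any credentials already present.
    host = parts.netloc.rpartition("@")[2]
    userinfo = quote(username, safe="") + ":" + quote(password, safe="")
    return urlunsplit(
        (parts.scheme, userinfo + "@" + host, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Placeholder address and credentials, not the real Wikipedia OAI endpoint.
    print(with_credentials("https://example.org/w/index.php/Special:OAIRepository", "xyz", "p@ss:word"))
    # https://xyz:p%40ss%3Aword@example.org/w/index.php/Special:OAIRepository
```

The resulting string is the kind of value you would assign to “$oaiSourceRepository” in “oaiUpdate.php”, provided your setup indeed expects the credentials in this form.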

3 DBpedia-Live source code download and configuration

  1. Download the latest version of the DBpedia source code by running the command “hg clone http://dbpedia.hg.sourceforge.[..]xtraction_framework”.
  2. Navigate to the “live” subfolder of the project and locate the file “live.ini”.
  3. Adjust the DBpedia-Live settings in that file; in particular, set the following parameters:
    • Store.dsn = jdbc:virtuoso://localhost:1111 (the address of your local Virtuoso server)
    • Store.user = dba (the username required to access Virtuoso)
    • Store.pw = dba (the Virtuoso password)
    • graphURI = http://live.dbpedia.org (the URI of the graph to which the updates are written)
    • publishDiffRepoPath=/home/xyz/dbpedia_publish (the folder to which the changesets are written)
    • statisticsFilePath=/home/xyz/publishdata/instancesstats.txt (the path of the statistics file)
  4. Create a file called “pw.txt” containing the password required to access the update feed of your local Wikipedia.
  5. Edit the file “live.xml” to specify which extractors are enabled and which are disabled.
  6. In the AbstractExtractor, set the variable “apiUrl” to the address of your local MediaWiki. This step is necessary for DBpedia-Live to be able to extract the abstracts correctly.
  7. Compile the application using “mvn scala:compile”.
  8. Run the application using “mvn scala:run”.
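
Before compiling, the parameters from step 3 can be sanity-checked. The Python sketch below assumes that “live.ini” consists of plain `key = value` lines (with full-line `#` or `;` comments), and checks that every parameter listed in step 3 is present and that `Store.dsn` looks like a Virtuoso JDBC address. The helper names are hypothetical and not part of the DBpedia-Live code base; any further parameters in the real file are simply ignored.

```python
REQUIRED_KEYS = (
    "Store.dsn",
    "Store.user",
    "Store.pw",
    "graphURI",
    "publishDiffRepoPath",
    "statisticsFilePath",
)


def parse_settings(text):
    """Parse plain `key = value` lines; blank lines and full-line comments are skipped."""
    settings = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        # Split on the first '=' only: values such as JDBC URLs contain ':' and may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("line %d is not a key = value pair: %r" % (line_number, raw))
        settings[key.strip()] = value.strip()
    return settings


def check_live_settings(settings):
    """Return a list of problems found; an empty list means none were detected."""
    problems = ["missing parameter: " + key for key in REQUIRED_KEYS if not settings.get(key)]
    dsn = settings.get("Store.dsn", "")
    if dsn and not dsn.startswith("jdbc:virtuoso://"):
        problems.append("Store.dsn does not look like a Virtuoso JDBC address: " + dsn)
    return problems


if __name__ == "__main__":
    sample = """
    # Values from the tutorial; statisticsFilePath is deliberately left out.
    Store.dsn = jdbc:virtuoso://localhost:1111
    Store.user = dba
    Store.pw = dba
    graphURI = http://live.dbpedia.org
    publishDiffRepoPath=/home/xyz/dbpedia_publish
    """
    print(check_live_settings(parse_settings(sample)))
    # ['missing parameter: statisticsFilePath']
```

To check your actual configuration, pass the contents of your “live.ini” to `parse_settings` instead of the sample string.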


For more details about the project, please read the paper “DBpedia and the Live Extraction of Structured Data from Wikipedia”.