Wikidata + DBpedia
Wikidata is increasingly replacing the infoboxes in Wikipedia. This is a great chance for DBpedia to innovate, as it frees up resources previously bound to scraping infoboxes (which takes a lot of effort and introduces many potential errors). We now want to use the data provided by Wikidata and also help the project become the editing interface for DBpedia.
Students applying for this topic should first describe how they are going to solve the prerequisite in a professional, sustainable way. They should then choose one of the remaining high-level topics and elaborate on it.
DBpedia already has a mechanism to map Wikipedia infoboxes to the DBpedia ontology. The idea here is to reuse (if possible) the same codebase and mapping syntax to map the Wikidata properties to the DBpedia ontology. Another challenge for this task is to synchronize changes in Wikidata with DBpedia: each relevant edit in Wikidata should be reflected in the triple store at http://live.dbpedia.org. Depending on the number of edits on Wikidata and the number of mappings, this is quite a performance challenge (DBpedia Live already has much less than a second to parse a single Wikipedia article).
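To make the mapping idea concrete, here is a minimal sketch in Python. It does not use the DBpedia mapping syntax or codebase: the `PROPERTY_MAPPINGS` table, the `map_statements` function, the simplified `(property, value)` input format, and the choice of the Wikidata entity URI as subject are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping table: Wikidata property ID -> DBpedia ontology property.
# In a real setup these would come from the DBpedia mappings, not a hard-coded dict.
PROPERTY_MAPPINGS: Dict[str, str] = {
    "P569": "http://dbpedia.org/ontology/birthDate",
    "P19": "http://dbpedia.org/ontology/birthPlace",
}

# Assumption: subjects are identified by their Wikidata entity URI.
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

Triple = Tuple[str, str, str]


def map_statements(item_id: str, statements: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Turn simplified (property ID, value) pairs into ontology triples.

    Statements whose property has no mapping are skipped.
    """
    subject = WIKIDATA_ENTITY + item_id
    triples: List[Triple] = []
    for prop_id, value in statements:
        predicate = PROPERTY_MAPPINGS.get(prop_id)
        if predicate is None:
            continue  # no mapping defined for this property
        triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    # Illustrative input; "P999999" stands in for an unmapped property.
    example = [
        ("P569", "1952-03-11"),
        ("P19", WIKIDATA_ENTITY + "Q350"),
        ("P999999", "ignored"),
    ]
    for triple in map_statements("Q42", example):
        print(triple)
```

Keeping the mapping in a plain lookup table means the per-statement cost stays constant, which matters given the time budget per edit mentioned above.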
Note: The Wikidata topic is really important to DBpedia.
Prerequisite: Wikidata Live
Similar to the DBpedia Live extraction for Wikipedia, we need a live extractor that pulls changes from Wikidata every minute and loads them into the DBpedia triple store.
The tasks are to add to and refactor the framework so that it can parse and extract Wikidata's data. The goal is to create and deploy a live instance that continuously extracts data. This involves writing the extractors needed to get all Wikidata data and connecting them to the DBpedia ontology mappings for Wikidata, so that triples are generated directly in the DBpedia ontology.
See also: http://www.mediawiki.org/wiki/[..]013#3rd_party_client
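The polling part of such a live extractor could be sketched as follows. This is only an outline under stated assumptions: `Change`, `poll_once`, and `run_forever` are hypothetical names, and `fetch_changes` and `apply_change` are placeholders for whatever change feed and triple store client the real implementation would use.

```python
import time
from typing import Callable, Iterable, NamedTuple


class Change(NamedTuple):
    """Hypothetical record for one edit in the change feed."""
    revision: int
    item_id: str


def poll_once(
    fetch_changes: Callable[[int], Iterable[Change]],
    apply_change: Callable[[Change], None],
    last_revision: int,
) -> int:
    """Apply all changes newer than last_revision, oldest first.

    Returns the new checkpoint so that the next poll resumes where this one stopped.
    """
    for change in sorted(fetch_changes(last_revision), key=lambda c: c.revision):
        if change.revision <= last_revision:
            continue  # already processed, e.g. the feed returned overlapping results
        apply_change(change)
        last_revision = change.revision
    return last_revision


def run_forever(fetch_changes, apply_change, last_revision: int = 0, interval: float = 60.0) -> None:
    """Poll once per interval (60 seconds, i.e. every minute, by default)."""
    while True:
        last_revision = poll_once(fetch_changes, apply_change, last_revision)
        time.sleep(interval)
```

Keeping the checkpoint explicit makes each poll idempotent: if the process restarts, it can resume from the last stored revision without applying an edit twice.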
If you have convinced us that you are capable of creating a Wikidata Live instance, you can choose to go deeper into one of the following directions. Note that the live instance, done right (with good documentation and intensive testing), is already worth three quarters of the project.
- Entity recommendations and fact validation – Create a web service that recommends properties and values for new entries. This idea comes from: http://www.mediawiki.org/wiki/[..]013#Entity_Suggester Additionally, it might be possible to check some facts easily by comparing them with authoritative datasets, for example: http://datahub.io/dataset/uk-postcodes
- Data Customization – Several DBpedia stakeholders from academia and enterprises are looking for a way to customize the current mappings to the ontology. We need a power tool that allows them to provide custom-tailored mappings (private or public), apply filters, and then run Wikidata Live themselves to get exactly the data they want. Data quality assurance and tests to improve data quality for academia and enterprises (not for Wikipedia infoboxes) would be nice.
- Admin tools – Create an admin interface, good and fancy statistics, and documentation. Maybe implement new feeders or a new type of syncing in addition to OAI-PMH.
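As an illustration of the first direction, property recommendation could start from simple co-occurrence counting. The sketch below is not the Entity Suggester's method; `suggest_properties` and its inputs are hypothetical, and a real service would need a much larger corpus and a better ranking.

```python
from collections import Counter
from typing import Iterable, List, Set


def suggest_properties(existing: Set[str], corpus: Iterable[Set[str]], limit: int = 3) -> List[str]:
    """Suggest properties that co-occur with the ones an entry already has.

    corpus holds one set of property IDs per known item. Items sharing at
    least one property with the new entry vote for their other properties.
    """
    counts: Counter = Counter()
    for props in corpus:
        if existing & props:
            counts.update(props - existing)
    # Rank by vote count, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [prop for prop, _ in ranked[:limit]]


if __name__ == "__main__":
    # Illustrative property sets; the IDs only serve as labels here.
    known_items = [
        {"P31", "P569", "P19"},
        {"P31", "P569"},
        {"P31", "P17"},
        {"P17", "P36"},
    ]
    print(suggest_properties({"P31"}, known_items, limit=2))
```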