Next Steps
The next steps for the DBpedia project are:
- Synchronize Wikipedia and DBpedia by deploying the DBpedia live extraction, which updates the DBpedia knowledge base immediately when a Wikipedia article changes.
- Enable the community to edit and maintain the DBpedia ontology and the infobox mappings that are used by the extraction framework in a public wiki.
- Increase the quality of the extracted data by improving and fine-tuning the extraction code.
Other next steps include:
- Integrate both data sets into a single data set under a shared URI schema. The new URIs for Wikipedia concepts are likely to be http://DBpedia.org/resource/{article name from the English edition of Wikipedia}.
- Extend the data set to all 1.6 million concepts within the English version of Wikipedia. This will lead to an RDF data set containing about 20–50 million triples.
- Set up a better server to serve the integrated data set as linked data and to provide a SPARQL endpoint over the data set. The data is now hosted entirely in, and the SPARQL endpoint provided by, OpenLink Virtuoso, courtesy of OpenLink Software. See Architecture.
- Improve the information extraction algorithms and apply some data cleansing heuristics to extracted information.
- Put some user-friendly search and browse interfaces on top of DBpedia. Candidates include Longwell. Search DBpedia.org
- Experiment with domain knowledge and inference over the data set.
- Implement some cool client applications for specific use cases.
- Set up the data extraction process to run on a regular schedule.
- Make the DBpedia data set more useful by interlinking it with additional data sources. Candidates include
  - DBLP Bibliography
  - the RDF Book Mashup
  - the MusicBrainz database
  - Freebase
  - Project Gutenberg
  - CIA World Factbook
  - statistical data from EuroStat
  - classification information from the YAGO data set
  - Christian's Flickr Wrapper (this appears to have been at least partially done, as of DBpedia 3.0)
  - Open Cyc (see work on Open Cyc Foundation) (this appears to have been at least partially done, as of DBpedia 3.0)
- Improve the classification of DBpedia entries. We are currently trying different approaches, including importing classification information from the YAGO data set and from Freebase. An overview of the work is given in this blog post by Michael Bergman.
- Give feedback to the Wikipedia community on how their templates could be changed to ease information extraction.
- Grow the DBpedia community and engage more interested parties in the project. We currently welcome any support in improving classification and linking external data sets to DBpedia. See also the Linking Open Data project for the last point.
- Extract infobox data from more language versions of Wikipedia. Top candidates, in order of official Wikipedia article count:
  - English
  - German
  - French
  - Polish
  - Japanese
  - Italian
  - Dutch
  - Portuguese
  - Spanish
  - Russian
  - Swedish
  - Chinese
  - Norwegian (Bokmål)
  - Finnish
  - Catalan
- Improve extraction of infobox data from supported non-English versions of Wikipedia.
- Look for somebody who wants to implement chemistry and bio extractors so that we also get the non-infobox data for these domains.
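As a rough illustration of the shared URI schema and the interlinking step listed above, the following Python sketch builds a resource URI from an English Wikipedia article name and emits one link to an external data set as an N-Triples line. Several details are assumptions rather than anything stated on this page: that spaces in article names become underscores (as in Wikipedia page URLs), that other unsafe characters are percent-encoded, and that links are expressed with `owl:sameAs`. The function names and the `example.org` URI are hypothetical.

```python
from urllib.parse import quote

# Base URI taken from the schema proposed on this page.
DBPEDIA_RESOURCE = "http://DBpedia.org/resource/"
# Assumption: interlinks are expressed with owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def resource_uri(article_name: str) -> str:
    """Build a resource URI from an English Wikipedia article name.

    Assumption: spaces become underscores and any remaining unsafe
    characters are percent-encoded.
    """
    local_name = article_name.strip().replace(" ", "_")
    return DBPEDIA_RESOURCE + quote(local_name, safe="_(),:'")


def same_as_triple(article_name: str, external_uri: str) -> str:
    """Return one N-Triples line linking a resource to an external URI."""
    return f"<{resource_uri(article_name)}> <{OWL_SAME_AS}> <{external_uri}> ."


if __name__ == "__main__":
    print(resource_uri("Tim Berners-Lee"))
    # The external URI below is a placeholder, not a real data set entry.
    print(same_as_triple("Berlin", "http://example.org/place/berlin"))
```

Writing the links as plain N-Triples lines keeps them independent of the extraction code, so link sets for different external sources could be generated and loaded separately.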
Last Modification: 2009-11-09 16:41:22 by Chris Bizer
