Datasets

You can find our latest DBpedia release below, as well as a number of previously released datasets. Enjoy!

Latest Release

This release took us longer than expected. We had to deal with multiple issues and included new data. Most notable is the addition of the NIF annotation datasets for each language, recording the whole wiki text, its basic structure (sections, titles, paragraphs, etc.) and the included text links. We hope that researchers and developers working on NLP-related tasks will find this addition most rewarding. The DBpedia Open Text Extraction Challenge (next deadline Mon 17 July for SEMANTiCS 2017) was introduced to encourage new fact extraction based on these datasets.

Previous Releases

This page provides downloads of the DBpedia datasets.

The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia. The English version of the DBpedia 3.8 data set describes 3.77 million "things" with 400 million "facts".

The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License.

The downloads are provided in N-Triples and CSV format. All files are compressed with bzip2.
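As a rough illustration of working with such files, the following Python sketch streams triples from a bzip2-compressed N-Triples file without unpacking it to disk first. The helper name `iter_triples` is hypothetical, and the line splitting is deliberately simplified (one statement per line, terminated by a dot); it is not a full N-Triples parser.

```python
import bz2
from typing import Iterator, Tuple


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) strings from a bz2-packed N-Triples file.

    Simplified sketch: assumes one triple per line, terminated by ".".
    """
    # bz2.open in text mode decompresses lazily while we iterate over lines.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Drop the "." that terminates the statement.
            if line.endswith("."):
                line = line[:-1].rstrip()
            # Subject and predicate contain no whitespace; the object is the rest.
            parts = line.split(None, 2)
            if len(parts) == 3:
                yield parts[0], parts[1], parts[2]
```

Calling `iter_triples` on a downloaded file and looping over the result processes one triple at a time, which keeps memory use low for large dumps. For anything beyond quick inspection, a dedicated RDF parser is likely the safer choice.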

This page provides the DBpedia dataset for download. The dataset has been extracted from the July 16th, 2007 (enwiki20070716) database dump of Wikipedia.

We are happy to announce the release of DBpedia 2015-04 (also known as: 2015 A). The new release is based on updated Wikipedia dumps dating from February/March 2015 and features an enlarged DBpedia ontology with more infobox-to-ontology mappings, leading to richer and cleaner data.

This page provides all DBpedia datasets as links to bzip2-compressed files.

The English Wikipedia has more than a hundred edits per minute. A large part of the knowledge in Wikipedia is not static but frequently updated, e.g., new movies or sports and political events. This makes Wikipedia an extremely rich, crowdsourced information hub for events. We have created a dataset based on DBpedia Live. Events are therefore extracted not from the resource descriptions themselves, but from the changes made to those descriptions. The dataset is updated daily and provides a list of headlined events linked to the actual update and resource snapshots.
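The idea of deriving events from changes, rather than from the descriptions themselves, can be sketched as a diff between two snapshots of a resource. This is only an illustration: the function name, the triple representation, and the sample data (including the `ex:` identifiers) are assumptions and do not reflect the actual DBpedia Live pipeline.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def diff_snapshots(before: Set[Triple], after: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Return the triples added to and removed from a resource description."""
    return {
        "added": after - before,    # present only in the newer snapshot
        "removed": before - after,  # present only in the older snapshot
    }


# Made-up snapshots of a single resource, for illustration only.
old_snapshot = {
    ("ex:Example_Film", "ex:status", "announced"),
    ("ex:Example_Film", "ex:director", "ex:Example_Director"),
}
new_snapshot = {
    ("ex:Example_Film", "ex:status", "released"),
    ("ex:Example_Film", "ex:director", "ex:Example_Director"),
}

changes = diff_snapshots(old_snapshot, new_snapshot)
```

Here the unchanged director triple drops out, and only the status change remains as a candidate event.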

 

This DBpedia release is based on updated Wikipedia dumps dating from October 2015, featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.

Lexicalization is defined by WordNet as "the process of making a word to express a concept" [1]. In the context of this project, lexicalizations are surface forms referring to a given DBpedia Resource. The DBpedia Lexicalizations Dataset stores the relationships between DBpedia Resources and a set of surface forms that we found to refer to those resources in Wikipedia.


This release is based on updated Wikipedia dumps dating from March/April 2016, featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.

 

Currently available languages:
English, German, French, Russian, Greek, Vietnamese
In the works: Greek, Vietnamese
Need data from other languages? Help us create wrappers for each language edition (if you know regex, XML, and Wiktionary, an initial wrapper can be created in less than one day).

Intro

The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia. The English version of the DBpedia 2014 data set currently describes 4.58 million "things" with 583 million "facts".