Downloads 2015-10

Publication Year: 2016

This page provides all DBpedia datasets as links to files compressed with bzip2 (see footnote 1). The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License.
In addition to the RDF version of the data, we also provide a tabular version of some of the core DBpedia data sets as CSV and JSON files. See DBpediaAsTables.

See also the change log for recent changes and developments.

1. Wikipedia Input Files

 

All XML source files of Wikipedia are now hosted alongside the extracted data and can be found in the dataset table as the 'pages-articles' dataset.

All datasets were extracted from these Wikipedia dump files, generated in October 2015. See also all specific dump dates and times.

 

 

2. Ontology

 

The ontology version used while extracting all datasets can be downloaded here:

 

 

3. Datasets

 

The following table provides all datasets extracted by the extraction framework for every Wikipedia language edition with more than 10,000 articles.
Select the languages you are interested in at the top of the table, and filter the list of datasets with the search function.
Click on the dataset names to obtain additional information. Click on the question mark next to a download link to preview file contents.

Starting with this release we provide all datasets in two serializations:

  • Turtle (ttl): provides data in N-Triples format (<subject> <predicate> <object> .), which is a subset of the Turtle serialization
  • quad-turtle (tql): the quad-turtle serialization (<subject> <predicate> <object> <graph/context> .) adds context information to every triple, containing the graph name and provenance information
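
As an illustration of the two line formats above, the following minimal Python sketch splits one line into its terms. The name parse_line and the simplified regular expression are assumptions made for this example; they are not part of the DBpedia tooling and do not cover the full N-Triples or N-Quads grammar. For production use, a dedicated RDF parser is the safer choice.

```python
import re

# One RDF term: an IRI in angle brackets, a blank node, or a quoted literal
# with an optional language tag or datatype. Simplified on purpose.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
LINE = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM
    + r'(?:\s+' + TERM + r')?\s*\.\s*$'
)


def parse_line(line):
    """Return (subject, predicate, object, graph), or None for blanks/comments.

    graph is None for a triple (ttl file) and a term string for a quad (tql file).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = LINE.match(stripped)
    if match is None:
        raise ValueError(f"not a triple or quad: {line!r}")
    return match.groups()


# Hypothetical sample lines, written by hand for this example.
triple = parse_line(
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .'
)
quad = parse_line(
    '<http://example.org/s> <http://example.org/p> '
    '<http://example.org/o> <http://example.org/g> .'
)
```

For the triple the fourth element is None; for the quad it holds the graph/context term.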

Some datasets are available in two versions:

  • localized: These datasets contain triples extracted from the respective Wikipedia, including the ones whose URIs do not have an equivalent English article.
  • canonicalized: These datasets contain triples extracted from the respective Wikipedia whose subject and object resources have an equivalent English article (marked with an *).

NOTE: You can find DBpedia dumps in 128 languages at our DBpedia download server, or alternatively at the Leipzig University DBpedia download server.

 
 

 

4. Links to other datasets

 

These datasets contain triples linking DBpedia to many other datasets.

The URIs in these dumps use the generic namespace http://dbpedia.org/resource/ .
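
A small sketch of how a consumer might check for the generic namespace and recover the resource name from a URI. The helper name local_name is hypothetical, and percent-decoding the remainder is an assumption about how one may want to display names, not something the dumps require.

```python
from urllib.parse import unquote

# The generic namespace named in the text above.
GENERIC_NS = "http://dbpedia.org/resource/"


def local_name(uri):
    """Return the resource name of a generic-namespace URI, or None otherwise."""
    uri = uri.strip("<>")  # accept the <...> form used in the dump files
    if not uri.startswith(GENERIC_NS):
        return None
    return unquote(uri[len(GENERIC_NS):])


name = local_name("<http://dbpedia.org/resource/Berlin>")
other = local_name("<http://example.org/resource/Berlin>")  # not in the namespace
```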

Click on the dataset names to obtain additional information. Click on the question mark next to a download link to preview file contents.

Dataset links
Links to Amsterdam Museum data nt ?
Links to BBC Wildlife Finder nt ?
Links to RDF Bookmashup nt ?
Links to Bricklink nt ?
Links to CORDIS nt ?
Links to DailyMed nt ?
Links to DBLP nt ?
Links to DBTune nt ?
Links to Diseasome nt ?
Links to DrugBank nt ?
Links to EUNIS nt ?
Links to Eurostat (Linked Statistics) nt ?
Links to Eurostat (WBSG) nt ?
Links to CIA World Factbook nt ?
Links to flickr wrappr nt ?
Links to Freebase nt ?
Links to GADM nt ?
Links to GeoNames nt ?
Links to GeoSpecies nt ?
Links to GHO nt ?
Links to Project Gutenberg nt ?
Links to Italian Public Schools nt ?
Links to LinkedGeoData nt ?
Links to LinkedMDB nt ?
Links to MusicBrainz nt ?
Links to New York Times nt ?
Links to OpenCyc nt ?
Links to OpenEI (Open Energy Info) nt ?
Links to Revyu nt ?
Links to SIDER nt ?
Links to RDF-TCM nt ?
Links to UMBEL nt ?
Links to US Census nt ?
Links to WikiCompany nt ?
Links to WordNet Classes nt ?
YAGO links nt ?
YAGO type links nt ?
YAGO types nt ?
YAGO type hierarchy nt ?

 

5. NLP Datasets

 

DBpedia also includes a number of NLP datasets, which are specifically targeted at supporting Computational Linguistics and Natural Language Processing (NLP) tasks. Among those, we highlight the Lexicalization Dataset, Topic Signatures, Thematic Concepts and Grammatical Genders.

[1] Most files were packed with pbzip2, which generates concatenated streams. Some older bzip2 decompressors, for example Apache Commons Compress before version 1.4, cannot handle this format. Please make sure that you use the latest version of your decompressor, and let us know if you experience any problems.
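
To illustrate the concatenated-stream issue, here is a minimal Python sketch. It assumes Python 3.3 or later, where bz2.open reads multi-stream input transparently. The function name iter_lines and the in-memory sample data are made up for this example; the sample only imitates pbzip2 output by joining two independently compressed streams.

```python
import bz2
import io


def iter_lines(path_or_file):
    """Yield decoded lines from a (possibly multi-stream) bzip2 file."""
    # bz2.open continues past the end of each stream (Python 3.3+).
    with bz2.open(path_or_file, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Imitate pbzip2 output: two separately compressed streams, concatenated.
part_one = bz2.compress(b"<s1> <p> <o1> .\n")
part_two = bz2.compress(b"<s2> <p> <o2> .\n")
data = part_one + part_two

lines = list(iter_lines(io.BytesIO(data)))

# A decoder that stops at the first end-of-stream marker sees only the
# first part; the rest is left over in unused_data.
single = bz2.BZ2Decompressor()
first_only = single.decompress(data)
```

Here lines contains both statements, while first_only holds only the first one, which is the kind of truncation an older decompressor can produce on these files.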


6. Dataset Metadata as DataIDs

 

Starting with this release we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories. Use these files to gather additional information about the datasets and the files that represent them.
A dcat:Catalog file (ttl, json-ld) pointing to all DataIDs (via dcat:record) can be found in the root folder of this release.


7. Previous versions of DBpedia