DBpedia version 2015-10

Dataset category: 
Publication Year: 2016

 

This DBpedia release is based on updated Wikipedia dumps dating from October 2015, featuring a significantly expanded base of information as well as richer and (hopefully) cleaner data based on the DBpedia ontology.

You can download the new DBpedia datasets in a variety of RDF document formats from http://wiki.dbpedia.org/Downloads2015-10 or directly here: http://downloads.dbpedia.org/2015-10/.
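As a rough illustration of working with a downloaded file, the following Python sketch streams a compressed dump and splits each line into subject, predicate, and object. It rests on several assumptions that are not stated above: that the file is bz2-compressed, that it holds one triple per line in an N-Triples-like layout, and that comment lines start with "#". The helper name `iter_triples` is hypothetical, and the sketch does not handle quad files.

```python
import bz2
import re
from typing import Iterator, Tuple

# Matches one N-Triples-style line: <subject> <predicate> object .
# Assumption: one triple per line; the object is returned unparsed.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, raw_object) from a bz2-compressed triple file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment and blank lines
            match = TRIPLE_RE.match(line)
            if match:
                yield match.group(1), match.group(2), match.group(3)
```

A call such as `for s, p, o in iter_triples("some_dataset.ttl.bz2"): ...` (the file name is a placeholder) processes the file line by line without loading it into memory, which matters for dumps of this size.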

Statistics

The English version of the DBpedia knowledge base currently describes 6.2M things, of which 4.6M have abstracts, 955K have geo coordinates, and 1.54M have depictions. In total, 5M resources are classified in a consistent ontology, which comprises 1.6M persons, 800K places (including 500K populated places), 480K works (including 133K music albums, 102K films, and 20K video games), 267K organizations (including 66K companies and 52K educational institutions), 293K species, and 5K diseases. The total number of resources in English DBpedia is 16.4M, which, besides the 6.2M described things, includes 1.3M SKOS concepts (categories), 7.1M redirect pages, 254K disambiguation pages, and 1.6M intermediate nodes.
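The resource totals quoted above can be cross-checked with a few lines of Python. This is only a sanity check on the rounded figures from the paragraph; the variable names are made up for the illustration. The parts add up to roughly 16.45M, which is consistent with the reported 16.4M given the rounding of the inputs.

```python
# Figures quoted in the paragraph above, in millions of resources.
described_things = 6.2
other_resources = {
    "SKOS concepts (categories)": 1.3,
    "redirect pages": 7.1,
    "disambiguation pages": 0.254,
    "intermediate nodes": 1.6,
}

# Sum of all parts, to compare against the reported total of 16.4M.
total = described_things + sum(other_resources.values())

# Share of described things that are classified in the ontology (5M of 6.2M).
classified_share = 5.0 / described_things

print(f"sum of parts: {total:.2f}M (reported total: 16.4M)")
print(f"classified share: {classified_share:.1%}")
```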

 

Altogether, the DBpedia 2015-10 release consists of 8.8 billion (2015-04: 6.9 billion) pieces of information (RDF triples), out of which 1.1 billion (2015-04: 737 million) were extracted from the English edition of Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other language editions, and 3.2 billion (2015-04: 2.4 billion) came from DBpedia Commons and Wikidata. In general, we observed a significant growth in raw infobox and mapping-based statements of close to 10%.
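To put the triple counts above into perspective, the release-over-release growth can be computed directly from the quoted numbers. The sketch below uses only the rounded figures from the paragraph, so the results are approximate: about 28% overall and about 49% for the English edition. The "close to 10%" figure refers specifically to raw infobox and mapping-based statements, for which no absolute counts are given here, so it cannot be reproduced from these totals.

```python
# Triple counts in billions, as quoted above: (2015-04, 2015-10).
triples = {
    "all": (6.9, 8.8),
    "English Wikipedia": (0.737, 1.1),
    "other language editions": (3.8, 4.4),
    "DBpedia Commons and Wikidata": (2.4, 3.2),
}


def growth_percent(old: float, new: float) -> float:
    """Relative growth from the old to the new value, in percent."""
    return (new - old) / old * 100.0


growth = {name: growth_percent(old, new) for name, (old, new) in triples.items()}
for name, pct in growth.items():
    print(f"{name}: {pct:.1f}%")
```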

 

Thorough statistics can be found on the Statistics page, and general information on the DBpedia datasets is here.

Community

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses

  • 739 classes (DBpedia 2015-04: 735)

  • 1,099 properties with reference values (a/k/a object properties) (DBpedia 2015-04: 1,098)

  • 1,596 properties with typed literal values (a/k/a datatype properties) (DBpedia 2015-04: 1,583)

  • 132 specialized datatype properties (DBpedia 2015-04: 132)

  • 407 owl:equivalentClass and 222 owl:equivalentProperty mappings to external vocabularies (DBpedia 2015-04: 408 and 200, respectively)

 

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10 extraction, we used a total of 5,553 template mappings (DBpedia 2015-04: 4,317 mappings). For the first time, the top language, gauged by number of mappings, is Dutch (606 mappings), surpassing the English community (600 mappings).

(Breaking) Changes

  • English DBpedia switched from URIs to IRIs. Some URIs will no longer resolve, and we provide the “uri-same-as-iri” dataset for English to ease the transition. For more technical details on this issue, read section 6, pp. 19-23 (old but still valid).

  • The instance-types dataset is now split into two files:

    • "instance-types" contains only direct types.

    • "instance-types-transitive" contains the transitive types of a resource based on the DBpedia ontology.

  • The "mappingbased-properties" file is now split into three (3) files:

    • “geo-coordinates-mappingbased” contains the coordinates originating from the mappings wiki. (The “geo-coordinates” continues to provide the coordinates originating from the GeoExtractor.)

    • “mappingbased-literals” contains mapping based statements with literal values.

    • “mappingbased-objects” contains mapping based statements with object values.

      • “mappingbased-objects-disjoint-[domain|range]” contains statements that were filtered out from the “mappingbased-objects” datasets as errors, but are still provided.

  • We added a new extractor for citation data.

  • All datasets are available in .ttl and .tql serialization (the nt and nq serializations were dropped for reasons of redundancy and server capacity).

  • We are providing DBpedia as a Docker image.
    Dockerized-DBpedia: Creates and runs a Virtuoso Open Source instance preloaded with the latest DBpedia dataset inside a Docker container.

  • Starting with this release, we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories.

  • In addition, we revamped the dataset table on the download page. It is created dynamically based on the DataIDs of all languages. Likewise, the tables on the statistics page are now based on files providing information about all mapping languages.

  • From now on, we also include the original Wikipedia dump files (‘pages_articles.xml.bz2’) alongside the extracted datasets.

  • A complete changelog can always be found in the git log.
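To illustrate the switch to IRIs mentioned in the first item of the list above, the following Python sketch converts between an IRI and its percent-encoded URI form. It is a generic conversion built on the standard library, not DBpedia's own encoding rules, which may differ in which reserved characters are left unescaped. The function names `iri_to_uri` and `uri_to_iri` are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes (generic sketch)."""
    # ':' and '/' are kept so the scheme and path separators survive.
    return quote(iri, safe=":/")


def uri_to_iri(uri: str) -> str:
    """Decode percent-escapes back into Unicode characters (generic sketch)."""
    # Caveat: this also decodes escaped reserved characters such as %2F,
    # which a strict URI-to-IRI conversion would leave untouched.
    return unquote(uri)


iri = "http://dbpedia.org/resource/Václav_Havel"
uri = iri_to_uri(iri)
print(uri)  # http://dbpedia.org/resource/V%C3%A1clav_Havel
print(uri_to_iri(uri) == iri)  # True
```

Client code that stored the percent-encoded form may need a conversion of this kind, or a lookup in the “uri-same-as-iri” dataset, before resolving resources against the new release.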

Upcoming Changes

  • We are working to move away from the mappings wiki but we will have at least one more mapping sprint.

  • We have some cool ideas for GSoC this year. Additional mentors are more than welcome. :-)

Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without an infobox that are inferred based on the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014. For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.
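The general idea of inferring a type from the link structure can be conveyed with a toy example. The sketch below is a deliberately simplified, unweighted voting scheme and is not the algorithm from the paper, which is more elaborate. For each property it estimates how the types of already-typed objects are distributed, then averages those distributions over the incoming links of an untyped resource. All names, the toy data, and the 0.5 threshold are assumptions made for the illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def object_type_distributions(
    triples: List[Triple], types: Dict[str, Set[str]]
) -> Dict[str, Dict[str, float]]:
    """For each property, estimate P(type | resource is an object of it)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for _, prop, obj in triples:
        if obj in types:  # only already-typed objects contribute evidence
            totals[prop] += 1
            for t in types[obj]:
                counts[prop][t] += 1
    return {p: {t: n / totals[p] for t, n in c.items()} for p, c in counts.items()}


def infer_types(
    resource: str,
    triples: List[Triple],
    types: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Average the per-property distributions over the resource's incoming links."""
    dists = object_type_distributions(triples, types)
    votes: Counter = Counter()
    n_links = 0
    for _, prop, obj in triples:
        if obj == resource and prop in dists:
            n_links += 1
            for t, p in dists[prop].items():
                votes[t] += p
    if n_links == 0:
        return {}  # no usable evidence for this resource
    return {t: s / n_links for t, s in votes.items() if s / n_links >= threshold}


# Toy data (hypothetical): "Leipzig" has no type but is linked via birthPlace.
toy_types = {"Hamburg": {"Place"}, "Paris": {"Place"}, "Siemens": {"Organisation"}}
toy_triples = [
    ("p1", "birthPlace", "Hamburg"),
    ("p2", "birthPlace", "Paris"),
    ("p3", "employer", "Siemens"),
    ("p4", "birthPlace", "Leipzig"),
    ("p5", "birthPlace", "Leipzig"),
]
print(infer_types("Leipzig", toy_triples, toy_types))  # {'Place': 1.0}
```

In the toy data every typed object of birthPlace is a Place, so both incoming links of "Leipzig" vote for Place with full confidence. Real data is noisier, which is why a threshold on the averaged score is used.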

 

In addition, this release includes four new type datasets, which are not loaded into the online SPARQL endpoint: 1) LHD datasets for English, German, and Dutch and 2) DBTax for English.

Both of these datasets use a typing system that goes beyond the DBpedia ontology. For each, we provide a subset mapped to the DBpedia ontology (dbo) and a full version with all types (ext).

Credits

Lots of thanks to

  • Markus Freudenberg (University of Leipzig) for taking over the whole release process and creating the revamped download & statistics pages.

  • Dimitris Kontokostas for conveying his considerable knowledge of the extraction and release process.

  • Volha Bryl (University of Mannheim / Springer) for their work on the previous release and their continuous support in this release.

  • All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.

  • The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.

  • Heiko Paulheim (University of Mannheim) for re-running his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements.

  • Václav Zeman and the whole LHD team (University of Prague) for their contribution of additional DBpedia types.

  • Marco Fossati (FBK) for contributing the DBTax types.

  • Alan Meehan (TCD) for performing a big external link cleanup.

  • Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.

  • Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that provides 5-Star Linked Open Data publication and SPARQL Query Services.

  • OpenLink Software (http://www.openlinksw.com/) altogether for providing the SPARQL Query Services and Linked Open Data publishing infrastructure for DBpedia in addition to their continuous infrastructure support.

  • Ruben Verborgh from Ghent University – iMinds for publishing the dataset as Triple Pattern Fragments, and iMinds for sponsoring DBpedia’s Triple Pattern Fragments server.

  • Ali Ismayilov (University of Bonn) for extending the DBpedia Wikidata dataset.

  • Vladimir Alexiev (Ontotext) for leading a successful mapping and ontology clean up effort.

  • All the GSoC students and mentors working directly or indirectly on the DBpedia release.

  • Special thanks to members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig.

 

The work on the DBpedia 2015-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/).

More information about DBpedia can be found at http://dbpedia.org as well as in the new overview article about the project, available at http://wiki.dbpedia.org/Publications.

 

Have fun with the new DBpedia 2015-10 release!

 

 

 

Released by the DBpedia Association 2016
Markus Freudenberg,
Dimitris Kontokostas,
Sebastian Hellmann