DBpedia Blog

YEAH! We did it !! (and it is not an April fool’s joke) – New 2015-10 DBpedia release

Summary

We proudly present our new 2015-10 DBpedia release, which is abailable now via:  http://dbpedia.org/sparql. Go an check it out!

This DBpedia release is based on updated Wikipedia dumps dating from October 2015 featuring a significantly expanded base of information as well as richer and cleaner data based on the DBpedia ontology.

So, what did we do?

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015-10 ontology encompasses

  • 739 classes (DBpedia 2015-04: 735)
  • 1,099 properties with reference values (a/k/a object properties) (DBpedia 2015-04: 1,098)
  • 1,596 properties with typed literal values (a/k/a datatype properties) (DBpedia 2015-04: 1,583)
  • 132 specialized datatype properties (DBpedia 2015-04: 132)
  • 407 owl:equivalentClass and 222 owl:equivalentProperty mappings external vocabularies (DBpedia 2015-04: 408 and 200, respectively)

The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2015-10 extraction, we used a total of 5553 template mappings (DBpedia 2015-04: 4317 mappings). For the first time the top language, gauged by number of mappings, is Dutch (606 mappings), surpassing the English community (600 mappings).

And what are the (breaking) changes ?

  • English DBpedia switched to IRIs from URIs. 
  • The instance-types dataset is now split to two files:
    • “instance-types” contains only direct types.
    • “Instance-types-transitive” contains transitive types.
    • The “mappingbased-properties” file is now split into three (3) files:
      • “geo-coordinates-mappingbased”
      • “mappingbased-literals” contains mapping based statements with literal values.
      • “mappingbased-objects”
  • We added a new extractor for citation data.
  • All datasets are available in .ttl and .tql serialization 
  • We are providing DBpedia as a Docker image.
  • From now on, we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories.
  • In addition, we revamped the dataset table on the download-page. It’s created dynamically based on the DataID of all languages. Likewise, the tables on the statistics- page are now based on files providing information about all mapping languages.
  • From now on, we also include the original Wikipedia dump files(‘pages_articles.xml.bz2’) alongside the extracted datasets.
  • A complete changelog can always be found in the git log.

And what about the numbers?

Altogether the new DBpedia 2015-10 release consists of 8.8 billion (2015-04: 6.9 billion) pieces of information (RDF triples) out of which 1.1 billion (2015-04: 737 million) were extracted from the English edition of Wikipedia, 4.4 billion (2015-04: 3.8 billion) were extracted from other language editions, and 3.2 billion (2015-04: 2.4 billion) came from  DBpedia Commons and Wikidata. In general we observed a significant growth in raw infobox and mapping-based statements of close to 10%.  Thorough statistics are available via the Statistics page.

And what’s up next?

We will be working to move away from the mappings wiki but we will have at least one more mapping sprint. Moreover, we have some cool ideas for GSOC this year. Additional mentors are more than welcome. 🙂

And who is to blame for the new release?

We want to thank all editors that contributed to the DBpedia ontology mappings via the Mappings Wiki, all the GSoC students and mentors working directly or indirectly on the DBpedia release and the whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.

Special thanks go to Markus Freudenberg and Dimitris Kontokostas (University of Leipzig), Volha Bryl (University of Mannheim / Springer), Heiko Paulheim (University of Mannheim), Václav Zeman and the whole LHD team (University of Prague), Marco Fossati (FBK), Alan Meehan (TCD), Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy), Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software), OpenLink Software (http://www.openlinksw.com/), Ruben Verborgh from Ghent University – iMinds, Ali Ismayilov (University of Bonn), Vladimir Alexiev (Ontotext) and members of the DBpedia Association, the AKSW and the department for Business Information Systems of the University of Leipzig for their committment in putting tremendous time and effort to get this done.

The work on the DBpedia 2015-10 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering  (http://aligned-project.eu/).

 

Detailed information about the new release are available here. For more information about DBpedia, please visit our website or follow us on Facebook!

Have fun and all the best!

Yours

DBpedia Association