DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in new and interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.


Feed Title: News (last 3 items)

DBpedia Version 2014 released

Hi all,

we are happy to announce the release of DBpedia 2014.

The most important improvements of the new release compared to DBpedia 3.9 are:

1. the new release is based on updated Wikipedia dumps dating from April / May 2014 (the 3.9 release was based on dumps from March / April 2013), leading to an overall increase in the number of things described in the English edition from 4.26 million to 4.58 million.

2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner data.

The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.

We provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links.

Altogether, the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples), out of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion were extracted from other language editions.

Detailed statistics about the DBpedia data sets in 28 popular languages are provided on the Dataset Statistics page (http://wiki.dbpedia.org/Datasets2014/DatasetStatistics).

The main changes between DBpedia 3.9 and 2014 are described below. For additional, more detailed information please refer to the DBpedia Change Log (http://wiki.dbpedia.org/Changelog).

 1. Enlarged Ontology

The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2014 ontology encompasses

  • 685 classes (DBpedia 3.9: 529)
  • 1,079 object properties (DBpedia 3.9: 927)
  • 1,600 datatype properties (DBpedia 3.9: 1,290)
  • 116 specialized datatype properties (DBpedia 3.9: 116)
  • 47 owl:equivalentClass and 35 owl:equivalentProperty mappings to http://schema.org

2. Additional Infobox to Ontology Mappings

The editor community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2014 extraction, we used 4,339 mappings (DBpedia 3.9: 3,177 mappings), which are distributed over the languages covered in the release as follows:

  • English: 586 mappings
  • Dutch: 469 mappings
  • Serbian: 450 mappings
  • Polish: 383 mappings
  • German: 295 mappings
  • Greek: 281 mappings
  • French: 221 mappings
  • Portuguese: 211 mappings
  • Slovenian: 170 mappings
  • Korean: 148 mappings
  • Spanish: 137 mappings
  • Italian: 125 mappings
  • Belarusian: 125 mappings
  • Hungarian: 111 mappings
  • Turkish: 91 mappings
  • Japanese: 81 mappings
  • Czech: 66 mappings
  • Bulgarian: 61 mappings
  • Indonesian: 59 mappings
  • Catalan: 52 mappings
  • Arabic: 52 mappings
  • Russian: 48 mappings
  • Basque: 37 mappings
  • Croatian: 36 mappings
  • Irish: 17 mappings
  • Wiki-Commons: 12 mappings
  • Welsh: 7 mappings
  • Bengali: 6 mappings
  • Slovak: 2 mappings
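As a quick sanity check, the per-language counts listed above can be summed and compared against the stated total of 4,339 mappings. The snippet below is only an illustrative sketch; the dictionary name is hypothetical and the numbers are copied from the list.

```python
# Mapping counts per language, copied from the list above.
MAPPINGS_PER_LANGUAGE = {
    "English": 586, "Dutch": 469, "Serbian": 450, "Polish": 383,
    "German": 295, "Greek": 281, "French": 221, "Portuguese": 211,
    "Slovenian": 170, "Korean": 148, "Spanish": 137, "Italian": 125,
    "Belarusian": 125, "Hungarian": 111, "Turkish": 91, "Japanese": 81,
    "Czech": 66, "Bulgarian": 61, "Indonesian": 59, "Catalan": 52,
    "Arabic": 52, "Russian": 48, "Basque": 37, "Croatian": 36,
    "Irish": 17, "Wiki-Commons": 12, "Welsh": 7, "Bengali": 6,
    "Slovak": 2,
}

# Totals reported in the announcement for the two releases.
MAPPINGS_2014 = 4339
MAPPINGS_3_9 = 3177

total = sum(MAPPINGS_PER_LANGUAGE.values())
growth = MAPPINGS_2014 - MAPPINGS_3_9

print(f"sum of listed mappings: {total}")
print(f"new mappings since DBpedia 3.9: {growth}")
```

The listed counts add up to the reported total, and the difference to DBpedia 3.9 is 1,162 mappings.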

3. Extended Type System to cover Articles without Infobox

Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contained an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without an infobox; these are inferred from the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014 (http://www.heikopaulheim.com/documents/ijswis_2014.pdf). For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.
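To give a feel for how types can be inferred from link structure, here is a deliberately simplified sketch of link-based type voting. It is not the algorithm from the cited paper: the function names, the toy triples, the unweighted averaging and the 0.5 threshold are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def learn_type_distributions(triples, types):
    """For each property, estimate how often its (typed) objects carry each type."""
    counts = defaultdict(Counter)  # property -> Counter of object types
    totals = Counter()             # property -> number of typed objects seen
    for _subject, prop, obj in triples:
        if obj in types:
            totals[prop] += 1
            for type_name in types[obj]:
                counts[prop][type_name] += 1
    return {
        prop: {t: n / totals[prop] for t, n in type_counts.items()}
        for prop, type_counts in counts.items()
    }


def infer_types(resource, triples, distributions, threshold=0.5):
    """Average the per-property distributions over the resource's ingoing links."""
    props = [p for _s, p, o in triples if o == resource and p in distributions]
    if not props:
        return {}
    scores = Counter()
    for prop in props:
        for type_name, probability in distributions[prop].items():
            scores[type_name] += probability / len(props)
    return {t: s for t, s in scores.items() if s >= threshold}


# Toy data (hypothetical): "Springfield" has no type statement of its own.
TRIPLES = [
    ("Alice", "birthPlace", "Berlin"),
    ("Bob", "birthPlace", "Paris"),
    ("Carol", "birthPlace", "Springfield"),
    ("Acme", "location", "Berlin"),
    ("Globex", "location", "Springfield"),
]
TYPES = {"Berlin": {"Place"}, "Paris": {"Place"}}

DISTRIBUTIONS = learn_type_distributions(TRIPLES, TYPES)
INFERRED = infer_types("Springfield", TRIPLES, DISTRIBUTIONS)
```

In this toy example, every typed object of birthPlace and location is a Place, so the untyped resource "Springfield" receives the type Place. The published approach is more elaborate; please refer to the paper for the actual method.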

 4. New and updated RDF Links into External Data Sources

We updated the following RDF link sets pointing at other Linked Data sources: Freebase, Wikidata, Geonames and GADM. For an overview of all data sets that are interlinked from DBpedia, please refer to http://wiki.dbpedia.org/Interlinking.

Accessing the DBpedia 2014 Release 

You can download the new DBpedia datasets in RDF format from http://wiki.dbpedia.org/Downloads. In addition, we provide some of the core DBpedia data also in tabular form (CSV and JSON formats) at http://wiki.dbpedia.org/DBpediaAsTables.

 As usual, the new dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
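As a minimal sketch of how the SPARQL endpoint mentioned above might be queried from a script, the following Python code builds a GET request URL and parses a JSON result. The function names are hypothetical, and the `format` parameter is an assumption about the Virtuoso-based endpoint rather than part of the SPARQL protocol itself.

```python
import json
import urllib.parse
import urllib.request

# Endpoint address as given in the announcement.
ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as the URL of an HTTP GET request."""
    params = urllib.parse.urlencode({
        "query": query,
        # Assumption: the endpoint accepts a 'format' parameter
        # to select JSON results.
        "format": "application/sparql-results+json",
    })
    return f"{endpoint}?{params}"


def run_query(query, timeout=30):
    """Send the query and return the list of result bindings."""
    with urllib.request.urlopen(build_query_url(query), timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Example query: a few labels of the resource describing Berlin.
EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label
} LIMIT 5
"""

# Usage (requires network access):
#   for row in run_query(EXAMPLE_QUERY):
#       print(row["label"]["value"])
```

Only the standard library is used here; a dedicated SPARQL client library would work just as well.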

Credits

 Lots of thanks to

  1. Daniel Fleischhacker (University of Mannheim) and Volha Bryl (University of Mannheim) for improving the DBpedia extraction framework, for extracting the DBpedia 2014 data sets for all 125 languages, for generating the updated RDF links to external data sets, and for generating the statistics about the new release.
  2. All editors that contributed to the DBpedia ontology mappings via the Mappings Wiki.
  3. The whole DBpedia Internationalization Committee for pushing the DBpedia internationalization forward.
  4. Dimitris Kontokostas (University of Leipzig) for improving the DBpedia extraction framework and loading the new release onto the DBpedia download server in Leipzig.
  5. Heiko Paulheim (University of Mannheim) for re-running his algorithm to generate additional type statements for formerly untyped resources and to identify and remove wrong statements.
  6. Petar Ristoski (University of Mannheim) for generating the updated links pointing at the GADM database of Global Administrative Areas. Petar will also generate an updated release of DBpedia as Tables soon.
  7. Aldo Gangemi (LIPN University, France & ISTC-CNR, Italy) for providing the links from DOLCE to DBpedia ontology.
  8. Kingsley Idehen, Patrick van Kleef, and Mitko Iliev (all OpenLink Software) for loading the new data set into the Virtuoso instance that serves the Linked Data view and SPARQL endpoint.
  9. OpenLink Software (http://www.openlinksw.com/) altogether for providing the server infrastructure for DBpedia.
  10. Michael Moore (University of Waterloo, as an intern at the University of Mannheim) for implementing the anchor text extractor and for contributing to the statistics scripts.
  11. Ali Ismayilov (University of Bonn) for implementing Wikidata extraction, on which the interlanguage link generation was based.
  12. Gaurav Vaidya (University of Colorado Boulder) for implementing and running Wikimedia Commons extraction.
  13. Andrea Di Menna, Jona Christopher Sahnwaldt, Julien Cojan, Julien Plu, Nilesh Chakraborty and others who contributed improvements to the DBpedia extraction framework via the source code repository on GitHub.
  14. All GSoC mentors and students for working directly or indirectly on this release: https://github.com/dbpedia/extraction-framework/graphs/contributors

 The work on the DBpedia 2014 release was financially supported by the European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/).

More information about DBpedia can be found at http://dbpedia.org/About as well as in the new overview article about the project, available at http://wiki.dbpedia.org/Publications.

Have fun with the new DBpedia 2014 release!

Cheers,

Daniel Fleischhacker, Volha Bryl, and Christian Bizer

 

 

DBpedia Spotlight V0.7 released

DBpedia Spotlight is an entity linking tool for connecting free text to DBpedia through the recognition and disambiguation of entities and concepts from the DBpedia KB.

We are happy to announce Version 0.7 of DBpedia Spotlight, which is also the first official release of the probabilistic/statistical implementation.

More information about DBpedia Spotlight V0.7, as well as updated evaluation results, can be found in this paper:

Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes: Improving Efficiency and Accuracy in Multilingual Entity Extraction. ISEM 2013.

The changes to the statistical implementation include:

  • smaller and faster models through quantization of counts, optimization of search and some pruning
  • better handling of case
  • various fixes in Spotlight and PigNLProc
  • models can now be created without requiring a Hadoop and Pig installation
  • UIMA support by @mvnural
  • support for confidence value
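To illustrate how a confidence value might be passed when annotating text, here is a small sketch that prepares a request for a Spotlight server's annotate service. The base URL is an assumption (a locally running server); the announcement itself does not name a service address, and the helper name is hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: a Spotlight server is running locally at this address.
SPOTLIGHT_BASE = "http://localhost:2222/rest"


def build_annotate_request(text, confidence=0.5, base=SPOTLIGHT_BASE):
    """Prepare (but do not send) an annotation request with a confidence threshold."""
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{base}/annotate?{params}",
        headers={"Accept": "application/json"},  # ask for JSON output
    )


REQUEST = build_annotate_request("Berlin is the capital of Germany.", confidence=0.4)

# Usage (requires a reachable server):
#   with urllib.request.urlopen(REQUEST) as response:
#       print(response.read().decode("utf-8"))
```

A higher confidence value should yield fewer but more reliable annotations; the best setting depends on the application.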

See the release notes at [1] and the updated demos at [4].

Models for Spotlight 0.7 can be found here [2].

Additionally, we now provide the raw Wikipedia counts, which we hope will prove useful for research and development of new models [3].

A big thank you to all developers who made contributions to this version (with special thanks to Faveeo and Idio). Huge thanks to Jo for his leadership and continued support to the community.

Cheers,
Pablo Mendes,

on behalf of Joachim Daiber and the DBpedia Spotlight developer community.

[1] – https://github.com/dbpedia-spotlight/dbpedia-spotlight/releases/tag/release-0.7

[2] – http://spotlight.sztaki.hu/downloads/

[3] – http://spotlight.sztaki.hu/downloads/raw

[4] – http://dbpedia-spotlight.github.io/demo/

(This message is an adaptation of Joachim Daiber’s message to the DBpedia Spotlight list. Edited to suit this broader community and give credit to him.)

Call for Ideas and Mentors for GSoC 2014 DBpedia + Spotlight joint proposal (please contribute within the next few days)

We started to draft a document for submission at Google Summer of Code 2014:
http://dbpedia.org/gsoc2014

We are still in need of ideas and mentors. If you have any improvements to DBpedia or DBpedia Spotlight that you would like to have done, please submit them in the ideas section now. Note that accepted GSoC students will receive about 5000 USD for three months of work, which can help you to estimate the effort and size of proposed ideas. It is also ok to extend/amend existing ideas (as long as you don’t hijack them). Please edit here:
https://docs.google.com/document/d/13YcM-LCs_W3-0u-s24atrbbkCHZbnlLIK3eyFLd7DsI/edit?pli=1

Becoming a mentor is also a very good way to get involved with DBpedia. As a mentor you will also be able to vote on proposals after Google accepts our project. Note that if you are a researcher and have a suitable student, it is also ok to submit an idea and become a mentor. After acceptance by Google, the student then has to apply for the idea and get accepted.

Please take some time this week to add your ideas and apply as a mentor, if applicable. Feel free to improve the introduction as well and comment on the rest of the document.

Information on GSoC in general can be found here:
http://www.google-melange.com/gsoc/homepage/google/gsoc2014

Thank you for your help,
Sebastian and Dimitris

The DBpedia Knowledge Base

Knowledge bases are playing an increasingly important role in enhancing the intelligence of Web and enterprise search and in supporting information integration. Today, most knowledge bases cover only specific domains, are created by relatively small groups of knowledge engineers, and are very cost intensive to keep up-to-date as domains change. At the same time, Wikipedia has grown into one of the central knowledge sources of mankind, maintained by thousands of contributors.


The DBpedia project leverages this gigantic source of knowledge by extracting structured information from Wikipedia and by making this information accessible on the Web under the terms of the Creative Commons Attribution-ShareAlike 3.0 License and the GNU Free Documentation License.



The English version of the DBpedia knowledge base describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.


In addition, we provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links. Altogether, the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples), out of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion were extracted from other language editions. Detailed statistics about the DBpedia datasets in 24 popular languages are provided on the Dataset Statistics page.


The DBpedia knowledge base has several advantages over existing knowledge bases: it covers many domains; it represents real community agreement; it automatically evolves as Wikipedia changes; and it is truly multilingual. The DBpedia knowledge base allows you to ask quite surprising queries against Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century”. Altogether, the use cases of the DBpedia knowledge base are widespread and range from enterprise knowledge management and Web search to revolutionizing Wikipedia search.
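As a rough sketch, the first example question could be phrased as a SPARQL query along the following lines. The class and property names used here (dbo:City, dbo:isPartOf, dbo:populationTotal) are assumptions about how the data is modeled, and the helper function is hypothetical; the actual data may require different terms.

```python
def cities_query(state_title, min_population):
    """Build a SPARQL query for cities in a state above a population threshold."""
    # DBpedia resource names follow Wikipedia titles, with spaces as underscores.
    state_uri = "http://dbpedia.org/resource/" + state_title.replace(" ", "_")
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf <{state_uri}> ;
        dbo:populationTotal ?population .
  FILTER (?population > {int(min_population)})
}}
ORDER BY DESC(?population)"""


# "Give me all cities in New Jersey with more than 10,000 inhabitants"
NEW_JERSEY_QUERY = cities_query("New Jersey", 10000)
print(NEW_JERSEY_QUERY)
```

The resulting query text can be pasted into the form of the public SPARQL endpoint or sent to it programmatically.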

Nucleus for the Web of Data

Within the W3C Linking Open Data (LOD) community effort, an increasing number of data providers have started to publish and interlink data on the Web according to Tim Berners-Lee’s Linked Data principles. The resulting Web of Data currently consists of several billion RDF triples and covers domains such as geographic information, people, companies, online communities, films, music, books and scientific publications. In addition to publishing and interlinking datasets, there is also ongoing work on Linked Data browsers, Linked Data crawlers, Web of Data search engines and other applications that consume Linked Data from the Web.


The DBpedia knowledge base is served as Linked Data on the Web. As DBpedia defines Linked Data URIs for millions of concepts, various data providers have started to set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking-hubs of the emerging Web of Data.
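To make the idea of Linked Data URIs a little more concrete, the sketch below derives a DBpedia resource URI from an English Wikipedia article title and prepares an HTTP request that asks for an RDF representation via content negotiation. The helper names are hypothetical, and the percent-encoding is only an approximation of DBpedia's actual URI encoding rules.

```python
import urllib.parse
import urllib.request

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def resource_uri(wikipedia_title):
    """Map a Wikipedia article title to a DBpedia resource URI (approximation)."""
    name = wikipedia_title.strip().replace(" ", "_")
    return RESOURCE_PREFIX + urllib.parse.quote(name)


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) a request for an RDF serialization of the resource."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


URI = resource_uri("New Jersey")
REQUEST = build_rdf_request(URI)

# Usage (requires network access):
#   with urllib.request.urlopen(REQUEST) as response:
#       print(response.read().decode("utf-8")[:500])
```

Other data sets can point at such URIs in their own RDF links, which is what makes DBpedia usable as an interlinking hub.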

Wiki Contents

This Wiki provides information about the DBpedia community project:

  • Datasets gives an overview of the DBpedia knowledge base.
  • Ontology gives an overview of the DBpedia ontology.
  • Online Access describes how the data set can be accessed via a SPARQL endpoint and as Linked Data.
  • Downloads provides the DBpedia data sets for download.
  • Interlinking describes how the DBpedia data set is interlinked with various other datasets on the Web.
  • Use Cases lists different use cases for the DBpedia data set.
  • Extraction Framework describes the DBpedia information extraction framework.
  • Data Provision Architecture paints a picture of the software and protocols used to serve DBpedia on the Web.
  • Community explains how the DBpedia community collaborates and how people can contribute to the DBpedia effort.
  • DBpedia Mapping Wiki containing the mappings used by the DBpedia extraction.
  • DBpedia Internationalization Effort working towards providing multiple language-specific versions of DBpedia.
  • DBpedia-Live presents the new DBpedia-Live framework.
  • DBpedia Spotlight presents the DBpedia Spotlight tool for the semantic annotation of textual content.
  • Credits lists the people and institutions that have contributed to DBpedia so far.
  • Change Log lists the DBpedia releases and gives an overview of the changes for each release.
  • Next steps describes ideas and future plans for the DBpedia project.

This material is Open Knowledge


       


For a recent overview paper about DBpedia, please refer to: