DBpedia citations & reference challenge

asked 2016-06-07 08:40:30 +0100

updated 2017-02-28 08:56:05 +0100

Announcement updates here:

In the latest release (2015-10) DBpedia started exploring the citation and reference data from Wikipedia and we were pleasantly surprised by the rich data we managed to extract.

update: use this new location for the citation data:
as detailed here

This data holds huge potential, especially for the Wikidata challenge of providing a reference source for every statement. It describes not only a lot of bibliographical data, but also a lot of web pages and many other sources around the web.

The data we extract at the moment is quite raw and can be improved in many different ways. Some of the potential improvements are:

  • Extend the citation extractor to handle other Wikipedia language editions; currently only English Wikipedia is supported.
  • Map the data to a relevant bibliographic ontology (there are many candidates; although BIBO got the most votes, we are open to other ontologies as well as Wikidata properties)
  • Map the data to existing bibliographic LOD (e.g. TEL has 100M records, WorldCat 300M) or online books (e.g. Google Books). See the citationIri issue.
  • Ways to merge / fuse identical citations from multiple articles
  • Use the citation data in the Wikidata primary sources tool
  • Surprise us with your ideas!
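
As one possible starting point for the merge / fuse item above, here is a minimal sketch that groups citation records sharing a normalized identifier. The record layout (plain dicts with `article`, `isbn`, `doi`, `title` and `year` keys) and the function names `citation_key` and `fuse_citations` are assumptions made for illustration; they are not the schema of the DBpedia citation dataset.

```python
from collections import defaultdict


def citation_key(citation):
    """Build a normalized grouping key: ISBN first, then DOI, then title + year.

    The field names are hypothetical, not taken from the DBpedia dataset.
    """
    isbn = citation.get("isbn")
    if isbn:
        # Drop hyphens and spaces so differently formatted ISBNs match.
        return ("isbn", isbn.replace("-", "").replace(" ", "").upper())
    doi = citation.get("doi")
    if doi:
        # DOIs are case-insensitive, so compare them in lower case.
        return ("doi", doi.strip().lower())
    # Fallback: lower-cased title with collapsed whitespace, plus the year.
    title = " ".join(citation.get("title", "").lower().split())
    return ("title", title, citation.get("year"))


def fuse_citations(citations):
    """Group citations by key and merge their fields.

    For each field the first non-empty value seen is kept; the set of
    citing articles is collected separately.
    """
    groups = defaultdict(lambda: {"articles": set(), "fields": {}})
    for citation in citations:
        group = groups[citation_key(citation)]
        group["articles"].add(citation["article"])
        for field, value in citation.items():
            if field != "article" and value:
                group["fields"].setdefault(field, value)
    return dict(groups)
```

Exact-key matching like this only catches citations that carry the same identifier; records with neither an ISBN nor a DOI fall back to title and year, so citations with spelling variants in the title would still need fuzzy matching.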

We welcome contributions that improve the existing citation dataset in any way, and we are open to collaborating and helping. Results will be presented at the next DBpedia meeting on 15 September 2016 in Leipzig, co-located with SEMANTiCS 2016. Each participant should submit a short description of their contribution by Monday 12 September 2016 and present their work at the meeting. Comments and questions can be posted on the DBpedia discussion & developer lists or on our new DBpedia ideas page. Submissions will be judged by the Organizing Committee, and the best two will receive a prize.

Organizing Committee

  • Vladimir Alexiev, Ontotext and DBpedia BG
  • Anastasia Dimou, Ghent University, iMinds
  • Dimitris Kontokostas, KILT/AKSW, DBpedia Association

1 comment

commented 2016-06-26 13:14:07 +0100

updated 2017-02-28 08:55:24 +0100

Challenge results

The prize for the challenge was awarded to Dr. Krzysztof Węcel from Poznań University of Economics and Business (Poland). He presented the results of an analysis of Polish citations in Wikipedia, supplemented by a contextual analysis covering several other languages. The team worked on a Polish citation extractor for the DBpedia framework, their own PyCiExtractor written in Python, mappings of references to external ontologies (BIBO, FABIO), and linking to external datasets such as WorldCat. The results were presented as part of the project, an effort to assess the relative quality of Wikipedia articles and thus to choose the "best" values of infobox attributes:

Link to presentation:

Link to the project:

Follow up work by David Nazarian

originally posted here:

Source code announced here:
