Blog

DBpedia Spotlight V0.7 released

Monday, July 21, 2014 - 9:58am

DBpedia Spotlight is an entity linking tool for connecting free text to DBpedia through the recognition and disambiguation of entities and concepts from the DBpedia KB.

We are happy to announce Version 0.7 of DBpedia Spotlight, which is also the first official release of the probabilistic/statistical implementation.

More information about as well as updated evaluation results for DBpedia Spotlight V0.7 are found in this paper:

Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes: Improving Efficiency and Accuracy in Multilingual Entity ExtractionISEM2013. 

The changes to the statistical implementation include:

  • smaller and faster models through quantization of counts, optimization of search and some pruning
  • better handling of case
  • various fixes in Spotlight and PigNLProc
  • models can now be created without requiring a Hadoop and Pig installation
  • UIMA support by @mvnural
  • support for confidence value

See the release notes at [1] and the updated demos at [4].

Models for Spotlight 0.7 can be found here [2].

Additionally, we now provide the raw Wikipedia counts, which we hope will prove useful for research and development of new models [3].

A big thank you to all developers who made contributions to this version (with special thanks to Faveeo and Idio). Huge thanks to Jo for his leadership and continued support to the community.

Cheers,
Pablo Mendes,

on behalf of Joachim Daiber and the DBpedia Spotlight developer community.

[1] – https://github.com/dbpedia-spotlight/dbpedia-spotlight/releases/tag/release-0.7

[2] – http://spotlight.sztaki.hu/downloads/

[3] – http://spotlight.sztaki.hu/downloads/raw

[4] – http://dbpedia-spotlight.github.io/demo/

(This message is an adaptation of Joachim Daiber’s message to the DBpedia Spotlight list. Edited to suit this broader community and give credit to him.)

Call for Ideas and Mentors for GSoC 2014 DBpedia + Spotlight joint proposal (please contribute within the next days)

Wednesday, February 12, 2014 - 8:32am

We started to draft a document for submission at Google Summer of Code 2014:
http://dbpedia.org/gsoc2014

We are still in need of ideas and mentors.  If you have any improvements on DBpedia or DBpedia Spotlight that you would like to have done, please submit it in the ideas section now. Note that accepted GSoC students will receive about 5000 USD for a three months, which can help you to estimate the effort and size of proposed ideas. It is also ok to extend/amend existing ideas (as long as you don’t hi-jack them). Please edit here:
https://docs.google.com/document/d/13YcM-LCs_W3-0u-s24atrbbkCHZbnlLIK3eyFLd7DsI/edit?pli=1

Becoming a mentor is also a very good way to get involved with DBpedia. As a mentor you will also be able to vote on proposals, after Google accepts our project. Note that it is also ok, if you are a researcher and have a suitable student to submit an idea and become mentor. After acceptance by Google the student then has to apply for the idea and get accepted.

Please take some time this week to add your ideas and apply as a mentor, if applicable. Feel free to improve the introduction as well and comment on the rest of the document.

Information on GSoC in general can be found here:
http://www.google-melange.com/gsoc/homepage/google/gsoc2014

Thank you for your help,
Sebastian and Dimitris

Making sense out of the Wikipedia categories (GSoC2013)

Friday, November 29, 2013 - 2:21pm

(Part of our DBpedia+spotlight @ GSoC mini blog series)

Mentor: Marco Fossati @hjfocs <fossati[at]spaziodati.eu>
Student: Kasun Perera <kkasunperera[at]gmail.com>

The latest version of the DBpedia ontology has 529 classes. It is not well balanced and shows a lack of coverage in terms of encyclopedic knowledge representation.

Furthermore, the current typing approach involves a costly manual mapping effort and heavily depends on the presence of infoboxes in Wikipedia articles.

Hence, a large number of DBpedia instances is either un-typed, due to a missing mapping or a missing infobox, or has a too generic or too specialized type, due to the nature of the ontology.

The goal of this project is to identify a set of senseful Wikipedia categories that can be used to extend the coverage of DBpedia instances.

How we used the Wikipedia category system

Wikipedia categories are organized in some kind of really messy hierarchy, which is of little use from an ontological point of view.

We investigated how to process this chaotic world.

Here’s what we have done

We have identified a set of meaningful categories by combining the following approaches:

  1. Algorithmic, programmatically traversing the whole Wikipedia category system.

Wow! This was really the hardest part. Kasun made a great job! Special thanks to the category guru Christian Consonni for shedding light in the darkness of such a weird world.

  1. Linguistic, identifying conceptual categories with NLP techniques.

We got inspired by the YAGO guys.

  1. Multilingual, leveraging interlanguage links.

Kudos to Aleksander Pohl for the idea.

  1. Post-mortem, cleaning out stuff that was still not relevant

No resurrection without Freebase!

Outcomes

We found out a total amount of 3751 candidates that can be used to type the instances.

We produced a dataset in the following format:

<Wikipedia_article_page> rdf:type <article_category>

You can access the full dump here. This has not been validated by humans yet.

If you feel like having a look at it, please tell us what do you think about.

Take a look at the Kasun’s progress page for more details.

Pages