Call for Ideas and Mentors for GSoC 2014 DBpedia + Spotlight joint proposal (please contribute within the next few days)
Wednesday, February 12, 2014 - 8:32am
We have started drafting a document for submission to Google Summer of Code 2014:
We still need ideas and mentors. If there are improvements to DBpedia or DBpedia Spotlight that you would like to see done, please submit them in the ideas section now. Note that accepted GSoC students will receive about 5000 USD for three months of work, which can help you estimate the effort and size of proposed ideas. It is also OK to extend or amend existing ideas (as long as you don’t hijack them). Please edit here:
Becoming a mentor is also a very good way to get involved with DBpedia. As a mentor, you will also be able to vote on proposals after Google accepts our project. If you are a researcher and have a suitable student, it is also OK to submit an idea and become its mentor. After acceptance by Google, the student then has to apply for the idea and get accepted.
Please take some time this week to add your ideas and apply as a mentor, if applicable. Feel free to improve the introduction as well and comment on the rest of the document.
Information on GSoC in general can be found here:
Thank you for your help,
Sebastian and Dimitris
Friday, November 29, 2013 - 2:21pm
(Part of our DBpedia+spotlight @ GSoC mini blog series)
Mentor: Marco Fossati @hjfocs <fossati[at]spaziodati.eu>
Student: Kasun Perera <kkasunperera[at]gmail.com>
The latest version of the DBpedia ontology has 529 classes. It is not well balanced and shows a lack of coverage in terms of encyclopedic knowledge representation.
Furthermore, the current typing approach involves a costly manual mapping effort and heavily depends on the presence of infoboxes in Wikipedia articles.
Hence, a large number of DBpedia instances are either untyped, due to a missing mapping or a missing infobox, or have a type that is too generic or too specialized, due to the nature of the ontology.
The goal of this project is to identify a set of meaningful Wikipedia categories that can be used to extend the type coverage of DBpedia instances.
How we used the Wikipedia category system
Wikipedia categories are organized in some kind of really messy hierarchy, which is of little use from an ontological point of view.
We investigated how to process this chaotic world.
Here’s what we have done
We have identified a set of meaningful categories by combining the following approaches:
Algorithmic, programmatically traversing the whole Wikipedia category system.
Linguistic, identifying conceptual categories with NLP techniques.
We got inspired by the YAGO guys.
Multilingual, leveraging interlanguage links.
Kudos to Aleksander Pohl for the idea.
Post-mortem, cleaning out categories that were still not relevant.
No resurrection without Freebase!
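The algorithmic step above can be pictured as a breadth-first walk over the category graph. What follows is only a minimal sketch, not the project's actual code: the `SUBCATEGORIES` mapping, its toy contents and the `traverse_categories` helper are hypothetical and made up for illustration. The one point it tries to show is that the walk has to remember visited categories, since a messy hierarchy can contain cycles.

```python
from collections import deque

# Hypothetical toy category graph: parent category -> subcategories.
# The data is invented for illustration and not taken from a Wikipedia dump.
SUBCATEGORIES = {
    "People": ["Scientists", "Artists"],
    "Scientists": ["Physicists", "People"],  # deliberate cycle back to the root
    "Artists": ["Painters"],
    "Physicists": [],
    "Painters": [],
}


def traverse_categories(graph, root):
    """Return every category reachable from root, in breadth-first order."""
    visited = {root}
    order = []
    queue = deque([root])
    while queue:
        category = queue.popleft()
        order.append(category)
        for child in graph.get(category, []):
            # Skip categories already seen, otherwise cycles never terminate.
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return order


reachable = traverse_categories(SUBCATEGORIES, "People")
print(reachable)
```

The linguistic, multilingual and post-mortem steps would then act as filters over the categories collected this way.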
We identified a total of 3751 candidate categories that can be used to type the instances.
We produced a dataset in the following format:
<Wikipedia_article_page> rdf:type <article_category>
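To make the format concrete, here is a small sketch of how one such statement could be written out as a line of N-Triples. This post does not say which serialization the dump uses, so N-Triples is an assumption, and the `type_triple` helper and the `example.org` IRIs are hypothetical placeholders. Only the expansion of `rdf:type` to its full IRI is standard.

```python
# Full IRI that the rdf:type shorthand stands for.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def type_triple(article_iri, category_iri):
    """Format one rdf:type statement as a single N-Triples line."""
    return f"<{article_iri}> <{RDF_TYPE}> <{category_iri}> ."


# Placeholder IRIs for illustration only; the real dump uses its own identifiers.
line = type_triple(
    "http://example.org/wiki/Albert_Einstein",
    "http://example.org/wiki/Category:Physicists",
)
print(line)
```

A line-based format like this would keep the dump easy to stream and filter with ordinary text tools.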
You can access the full dump here. This has not been validated by humans yet.
If you feel like having a look at it, please tell us what you think.
Take a look at Kasun’s progress page for more details.