Wednesday, January 27, 2016 - 7:21pm
The deadline for mentoring organizations to submit their applications for the 2016 Google Summer of Code is approaching quickly. As DBpedia is again planning to be a vital part of the Mentoring Summit, we would like to take the opportunity to give you a little recap of the projects mentored by DBpedia members during the past GSoC, in November 2015.
Dimitris Kontokostas, Marco Fossati, Thiago Galery, Joachim Daiber and Ruben Verborgh, members of the DBpedia community, mentored 8 great students from around the world. Following are some of the projects they completed.
Fact Extraction from Wikipedia Text by Emilio Dorigatti
DBpedia is pretty much mature when dealing with Wikipedia semi-structured content like infoboxes, links and categories. However, unstructured content (typically text) plays the most crucial role, due to the amount of knowledge it can deliver, and few efforts have been carried out to extract structured data out of it. Marco and Emilio built a fact extractor, which understands the semantics of a sentence thanks to Natural Language Processing (NLP) techniques. If you feel playful, you can download the produced datasets. For more details, check out this blog post. P.S.: the project has been cited by Python Weekly and Python Trending! Mentor: Marco Fossati (SpazioDati)
Better context vectors for disambiguation by Philipp Dowling
Better Context Vectors aimed to improve the representation of context used by DBpedia Spotlight by incorporating novel methods from distributional semantics. We investigated the benefits of replacing a word-count based method with one that uses a model based on word2vec. Our student, Philipp Dowling, implemented the model reader based on a preprocessed version of Wikipedia (leading to a few commits to the awesome library gensim) and the integration with the main DBpedia Spotlight pipeline. Additionally, we integrated a method for estimating weights for the different model components that contribute to disambiguating entities. Mentors: Thiago Galery (Analytyca), Joachim Daiber (Amsterdam Univ.), David Przybilla (Idio)
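To give a rough feel for the idea, here is a minimal sketch (not the actual DBpedia Spotlight code) that ranks candidate entities by the cosine similarity between an averaged context vector and each entity's vector. The tiny hand-written vectors, the entity names and all function names are assumptions made up for this example; a real setup would load embeddings trained with word2vec.

```python
import math

def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(context_words, word_vectors, entity_vectors):
    """Return (score, entity) pairs, best first; unknown words are ignored."""
    known = [word_vectors[w] for w in context_words if w in word_vectors]
    if not known:
        return []
    context = average(known)
    scored = [(cosine(context, vec), entity)
              for entity, vec in entity_vectors.items()]
    return sorted(scored, reverse=True)

# Hypothetical 3-dimensional toy embeddings, for illustration only.
WORD_VECTORS = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "loan": [0.0, 0.9, 0.2],
    "money": [0.1, 0.8, 0.3],
}
ENTITY_VECTORS = {
    "Bank_(geography)": [0.85, 0.15, 0.05],
    "Bank": [0.05, 0.85, 0.25],
}

ranking = rank_candidates(["river", "water", "the"], WORD_VECTORS, ENTITY_VECTORS)
best_entity = ranking[0][1]
```

With the toy data above, a mention of "bank" surrounded by "river" and "water" ranks the geographic sense first, because its vector points in nearly the same direction as the averaged context.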
Wikipedia Stats Extractor by Naveen Madhire
Wikipedia Stats Extractor aimed to create a reusable tool to extract raw statistics for Named Entity Linking out of a Wikipedia dump. Naveen built the project on top of Apache Spark and Json-wikipedia, which makes the code more maintainable and faster than its previous alternative (pignlproc). Furthermore, Wikipedia Stats Extractor provides an interface which makes the task of processing Wikipedia dumps for purposes other than Entity Linking easier. Extra changes were made in the way surface form stats are extracted and lots of noise was removed, both of which should in principle help Entity Linking.
Special regards to Diego Ceccarelli who gave us great insight on how Json-wikipedia worked. Mentors: Thiago Galery (Analytyca), Joachim Daiber (Amsterdam Univ.), David Przybilla (Idio)
DBpedia Live extensions by Andre Pereira
DBpedia Live provides near real-time knowledge extraction from Wikipedia. As Wikipedia scales, we needed to move our caching infrastructure from MySQL to MongoDB. This was the first task of Andre’s project. The second task was the implementation of a UI displaying the current status of DBpedia Live along with some admin utils. Mentors: Dimitris Kontokostas (AKSW/KILT), Magnus Knuth (HPI)
Adding live-ness to the Triple Pattern Fragments server by Pablo Estrada
DBpedia currently has a highly available Triple Pattern Fragments interface that offloads part of the query processing from the server to the clients. For this GSoC, Pablo developed a new feature for this server so it automatically keeps itself up to date with new data coming from DBpedia Live. We do this by periodically checking for updates and adding them to an auxiliary database. Pablo developed smart update and smart querying algorithms to manage and serve the live data efficiently. We are excited to let the project out in the wild, and see how it performs in real-life use cases. Mentors: Ruben Verborgh (Ghent Univ. – iMinds) and Dimitris Kontokostas (AKSW/KILT)
Registration for mentors @ GSoC 2016 is starting next month and DBpedia will of course try to participate again. If you want to become a mentor or just have a cool idea that seems suitable, don’t hesitate to ping us via the DBpedia discussion or developer mailing lists.
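The general pattern of keeping a static dataset fresh through an auxiliary store can be sketched as follows. This is only an illustration under our own assumptions, not Pablo's actual algorithms: the class name, its methods and the in-memory sets are hypothetical, and a real server would use persistent storage and paging.

```python
class LiveTripleStore:
    """Static base data plus an auxiliary store of live additions and removals."""

    def __init__(self, base_triples):
        self.base = set(base_triples)   # the static, rarely rebuilt dataset
        self.added = set()              # live triples not present in the base
        self.removed = set()            # base triples deleted by live updates

    def apply_update(self, additions, removals):
        """Fold one batch of live changes into the auxiliary store."""
        for triple in removals:
            self.added.discard(triple)
            if triple in self.base:
                self.removed.add(triple)
        for triple in additions:
            self.removed.discard(triple)
            if triple not in self.base:
                self.added.add(triple)

    def match(self, s=None, p=None, o=None):
        """Answer a triple pattern; None acts as a wildcard."""
        pattern = (s, p, o)

        def fits(triple):
            return all(want is None or want == got
                       for want, got in zip(pattern, triple))

        current = (self.base - self.removed) | self.added
        return sorted(t for t in current if fits(t))


store = LiveTripleStore([
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:populationTotal", "3500000"),
])
# One hypothetical batch of changes coming from a live feed.
store.apply_update(
    additions=[("dbr:Berlin", "dbo:populationTotal", "3600000")],
    removals=[("dbr:Berlin", "dbo:populationTotal", "3500000")],
)
population = store.match(p="dbo:populationTotal")
```

Keeping additions and removals separate from the base data means the large static dataset never has to be rewritten when a small update arrives; only the query path needs to combine the two.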
Your DBpedia Association
Friday, January 15, 2016 - 4:03pm
A belated Happy New Year to all DBpedia enthusiasts !!!
Two weeks of 2016 have already passed and it is about time to reflect on the past three months, which revolved around the 5th DBpedia meeting in the USA.
After 4 successful meetings in Poznan, Dublin, Leipzig and Amsterdam, we thought it was about time to cross the Atlantic and meet the US part of the DBpedia community. On November 5th 2015, our 5th DBpedia Community meeting was held at the world-famous Stanford University in Palo Alto, California.
First and foremost, we would like to thank Michel Dumontier, Associate Professor of Medicine at Stanford University, and his Laboratory for Biomedical Knowledge Discovery for hosting this great event and giving so many US-based DBpedia enthusiasts a platform to exchange ideas and meet in person. The event was constantly commented on and discussed not just inside university premises but also online, via Twitter #DBpediaCA. We would also like to thank the rest of the organizers: Pablo Mendes, Marco Fossati, Dimitris Kontokostas and Sebastian Hellmann for devoting a lot of time to planning the meeting and coordinating with the presenters.
We set out to the US with two main goals. Firstly, we wanted DBpedia and Knowledge Graph professionals and enthusiasts to network and discuss ideas about how to improve DBpedia. Secondly, the event also aimed at finding new partners, developers and supporters to help DBpedia grow professionally, in terms of competencies and data, as well as to enlarge the DBpedia community itself to spread the word and to raise awareness of the DBpedia brand.
Therefore, we invited representatives of the best-known actors in the Data community such as:
…who addressed interesting topics and together with all the DBpedia enthusiasts engaged in productive discussion and raised controversial questions.
The meeting itself was co-located with a pre-event designed as a workshop, giving the attending companies a lot of room and time to raise questions and discuss “hot topics”. Classification schemas and multilingualism were at the top of the list of topics that were most interesting to the companies invited. In this interactive setting, our guests from Evernote, BlippAR, World University and Wikimedia answered questions about the DBpedia ontology and mappings, Wikipedia categories, as well as similarities and differences with Wikidata.
Following the pre-event, the main event attracted attendees with lightning talks from major companies of interest to the DBpedia community.
The host of the DBpedia Meeting, Michel Dumontier from Stanford, opened the main event with a short introduction of his group’s focus on biomedical data. He and his group currently focus on integrating datasets to extract maximal value from data. Right at the beginning of the DBpedia meeting, Dumontier highlighted the value of already existing yet unexploited data out there.
During the meeting there were two main thematic foci. The first concerned the topics companies were interested in and raised during the session. Experts from Yahoo, Netflix, Diffbot, IBM Watson and Unicode addressed issues such as fact extraction from text via NLP, knowledge base construction techniques, recommender systems leveraging data from a knowledge base, and multilingual abbreviation datasets.
The second focus of this event revolved around DBpedia and encyclopedic Knowledge Graphs including augmented reality addressed by BlippAR and by Nuance. We have some of the talks summed up for you here. Also check out the slides provided in addition to the summary of some talks to get a deeper insight into the event.
Nicolas Torzec, Yahoo! – Wikipedia, DBpedia and the Yahoo! Knowledge Graph
He described how DBpedia played a key role in the beginning of the Knowledge Graph effort at Yahoo! They decided on using the Extraction Framework directly, not the provided data dumps, which allowed them to update continuously as Wikipedia changed. Yashar, also from Yahoo!, focused on multilingual NE detection and linking. He described how users make financial choices based on the availability of products in their local language, which highlights the importance of multilinguality (also a core objective of the DBpedia effort).
Anshu Jain, IBM Watson – Watson Knowledge Graph – DBpedia Meetup
In this presentation, the IBM Watson team described their effort not as building a knowledge graph, but as building a platform for working with knowledge graphs. For them, a graph is just an abstraction, not a data structure. He highlighted that context is very important, and one
Yves Raimond, Netflix – Knowledge Graphs @ Netflix
Yves Raimond from Netflix observed that in their platform, every impression is a recommendation. They rely on lots of machine learning algorithms, and pondered on the role of knowledge graphs in that setting. Will everything (user + metadata) end up in a graph so that algorithms learn from that? Click here for the complete presentation.
Joakim Soderberg, BlippAR –
Joakim Soderberg mentioned that at Blippar it’s all about the experience. They are focusing on augmented reality, which can benefit from information drawn from many sources including DBpedia.
David Martin, Nuance – using DBpedia with Nuance
David Martin from Nuance talked about how DBpedia is used as a source of named entities. He observes that multi role ranking is an important issue, for instance, the difference in the role of Arnold Schwarzenegger as politician or actor. Click here for the complete presentation.
Karthik Gomadam, Accenture Technology Labs – Rethinking the Enterprise Data Stack
Karthik Gomadam discussed data harmonization in the context of linked enterprise data.
Alkis Simitsis, Hewlett Packard – Complex Graph Computations over Enterprise Data
He talked about complex graph computations over enterprise data, while Georgia Koutrika from HP Labs presented their solution for fusing knowledge into recommendations.
Other topics discussed were:
You find some more presentations here:
Feedback from attendees and via our Twitter stream #DBpediaCA was generally very positive and insightful. The choice of invited talks was appreciated unanimously, and so was the idea of having lightning talks. In the spirit of previous DBpedia Meetings, we allocated time for all attendees that were interested in speaking. Some commented that they would have liked to have more time to ask questions and discuss, while others thought the meeting was too late. We will consider the trade-offs and try to improve in the next iteration. There was strong support from attendees for meeting again as soon as possible!
So now, we are looking forward to the next DBpedia community meeting, which will be held on February 12, 2016 in The Hague, Netherlands. So, save the date and visit the event page. We will keep you informed via the DBpedia Website and Blog.
Finally, we would like to thank Yahoo! for sponsoring the catering during the DBpedia community meeting. We would also like to acknowledge Google Summer of Code as the reason Marco and Dimitris were in California and for covering part of their travel expenses.
The event was initiated by the DBpedia Association. The following people received travel grants from the DBpedia Association: Marco Fossati, Dimitris Kontokostas and Joachim Daiber.
Friday, September 4, 2015 - 3:30pm
We are happy to announce the release of DBpedia 2015-04 (also known as: 2015 A). The new release is based on updated Wikipedia dumps dating from February/March 2015 and features an enlarged DBpedia ontology with more infobox to ontology mappings, leading to richer and cleaner data.
The English version of the DBpedia knowledge base currently describes 5.9M things, out of which 4.3M resources have abstracts, 452K geo coordinates and 1.45M depictions. In total, 4 million resources are classified in a consistent ontology, consisting of 2.06M persons, 682K places (including 455K populated places), 376K creative works (including 92K music albums, 90K films and 17K video games), 188K organizations (including 51K companies and 33K educational institutions), 278K species and 5K diseases. The total number of resources in English DBpedia is 15.3M which, besides the 5.9M resources, includes 1.2M skos concepts (categories), 6.83M redirect pages, 256K disambiguation pages and 1.13M intermediate nodes.
We provide localized versions of DBpedia in 128 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 128 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links.
In addition we provide DBpedia datasets for Wikimedia Commons and Wikidata.
Altogether the DBpedia 2015-04 release consists of 6.9 billion pieces of information (RDF triples) out of which 737 million were extracted from the English edition of Wikipedia, 3.76 billion were extracted from other language editions and 2.4 billion from DBpedia Commons and Wikidata.
From this release on we will try to provide two releases per year, one in April and the next in October. The 2015-04 release was delayed by 3 months but we will try to keep the schedule and release the 2015-10 at the end of October or early November.
Our plan for the next release is to remove the URI encoding of English DBpedia (dbpedia.org) and switch to IRIs only. This will simplify the release process and align the English dataset with all other DBpedia language datasets. We know that this will probably break some links to DBpedia but we feel it is the only way to move forward. If you have any reasons against this action, please let us know now.
A complete list of changes in this release can be found on GitHub.
From this release on, we have adjusted the download page folder structure, giving us more flexibility to offer more datasets in the near future.
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2015 ontology encompasses
The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. There are six new languages with mappings: Arabic, Bulgarian, Armenian, Romanian, Swedish and Ukrainian.
For the DBpedia 2015 extraction, we used a total of 4317 template mappings (DBpedia 2014: 3814 mappings).
Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contains an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without infobox that are inferred based on the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014. For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.
Both of these datasets use a typing system beyond the DBpedia ontology and we provide a subset, mapped to the DBpedia ontology (dbo) and a full one with all types (ext).
We updated the following RDF link sets pointing at other Linked Data sources: Freebase, Wikidata, Geonames and GADM.
You can download the new DBpedia datasets in RDF format from http://wiki.dbpedia.org/Downloads or
From the following releases we will provide additional datasets related to DBpedia. For 2015-04 we provide a pagerank dataset for English and German, provided by HPI.
As usual, the new dataset is also published in 5-Star Linked Open Data form and accessible via the SPARQL Query Service endpoint at http://dbpedia.org/sparql and Triple Pattern Fragments service at http://fragments.dbpedia.org/.
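As a small illustration of how the SPARQL endpoint mentioned above might be queried, the sketch below builds an HTTP GET request following the standard SPARQL protocol (a `query` parameter plus an `Accept` header) using only the Python standard library. The example query, the chosen resource and the function name are our own assumptions; the request is only constructed here, and the lines that would send it are left commented out.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL as given in this post

# Example query (an assumption for illustration): English label of one resource.
QUERY = """
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}
"""

def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)

request = build_request(QUERY)

# To actually run the query (requires network access):
# import json
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
#     for row in results["results"]["bindings"]:
#         print(row["label"]["value"])
```

Asking for `application/sparql-results+json` keeps the response easy to parse with the standard `json` module; other result formats can be requested the same way by changing the header.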
Lots of thanks to
The work on the DBpedia 2015-04 release was financially supported by the European Commission through the project ALIGNED – quality-centric, software and data engineering (http://aligned-project.eu/).
Have fun with the new DBpedia 2015-04 release!
Markus Freudenberg, Dimitris Kontokostas, Sebastian Hellmann
Tuesday, September 9, 2014 - 10:58am
We are happy to announce the release of DBpedia 2014.
The most important improvements of the new release compared to DBpedia 3.9 are:
1. the new release is based on updated Wikipedia dumps dating from April / May 2014 (the 3.9 release was based on dumps from March / April 2013), leading to an overall increase of the number of things described in the English edition from 4.26 to 4.58 million things.
2. the DBpedia ontology is enlarged and the number of infobox to ontology mappings has risen, leading to richer and cleaner data.
The English version of the DBpedia knowledge base currently describes 4.58 million things, out of which 4.22 million are classified in a consistent ontology (http://wiki.dbpedia.org/Ontology2014), including 1,445,000 persons, 735,000 places (including 478,000 populated places), 411,000 creative works (including 123,000 music albums, 87,000 films and 19,000 video games), 241,000 organizations (including 58,000 companies and 49,000 educational institutions), 251,000 species and 6,000 diseases.
We provide localized versions of DBpedia in 125 languages. All these versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia. The full DBpedia data set features 38 million labels and abstracts in 125 different languages, 25.2 million links to images and 29.8 million links to external web pages; 80.9 million links to Wikipedia categories, and 41.2 million links to YAGO categories. DBpedia is connected with other Linked Datasets by around 50 million RDF links.
Altogether the DBpedia 2014 release consists of 3 billion pieces of information (RDF triples), out of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion were extracted from other language editions.
Detailed statistics about the DBpedia data sets in 28 popular languages are provided on the Dataset Statistics page (http://wiki.dbpedia.org/Datasets2014/DatasetStatistics).
The main changes between DBpedia 3.9 and 2014 are described below. For additional, more detailed information please refer to the DBpedia Change Log (http://wiki.dbpedia.org/Changelog).
1. Enlarged Ontology
The DBpedia community added new classes and properties to the DBpedia ontology via the mappings wiki. The DBpedia 2014 ontology encompasses
2. Additional Infobox to Ontology Mappings
The editors community of the mappings wiki also defined many new mappings from Wikipedia templates to DBpedia classes. For the DBpedia 2014 extraction, we used 4,339 mappings (DBpedia 3.9: 3,177 mappings), which are distributed as follows over the languages covered in the release.
3. Extended Type System to cover Articles without Infobox
Until the DBpedia 3.8 release, a concept was only assigned a type (like person or place) if the corresponding Wikipedia article contains an infobox indicating this type. Starting from the 3.9 release, we provide type statements for articles without infobox that are inferred based on the link structure within the DBpedia knowledge base using the algorithm described in Paulheim/Bizer 2014 (http://www.heikopaulheim.com/documents/ijswis_2014.pdf). For the new release, an improved version of the algorithm was run to produce type information for 400,000 things that were formerly not typed. A similar algorithm (presented in the same paper) was used to identify and remove potentially wrong statements from the knowledge base.
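To illustrate the general idea of inferring types from link structure, here is a deliberately simplified sketch. It is not the algorithm from the paper (which, among other things, weights properties by how informative they are); it only averages, over the ingoing links of an untyped resource, how often the objects of each property carry a given type. The function names, the threshold and the toy resources are assumptions made up for this example.

```python
from collections import Counter, defaultdict

def learn_distributions(triples, types):
    """For each property, estimate P(type | resource is the object of it)."""
    counts = defaultdict(Counter)
    totals = Counter()
    for _subject, prop, obj in triples:
        if obj in types:                      # learn only from typed objects
            totals[prop] += 1
            for t in types[obj]:
                counts[prop][t] += 1
    return {prop: {t: c / totals[prop] for t, c in counter.items()}
            for prop, counter in counts.items()}

def infer_types(resource, triples, distributions, threshold):
    """Average the per-property distributions over the resource's ingoing links."""
    props = [p for _s, p, o in triples if o == resource and p in distributions]
    if not props:
        return {}
    scores = Counter()
    for p in props:
        for t, probability in distributions[p].items():
            scores[t] += probability / len(props)
    return {t: score for t, score in scores.items() if score >= threshold}

# Hypothetical toy data; "dbr:Untyped_Town" has no type statement yet.
TRIPLES = [
    ("dbr:Angela_Merkel", "dbo:birthPlace", "dbr:Hamburg"),
    ("dbr:Albert_Einstein", "dbo:birthPlace", "dbr:Ulm"),
    ("dbr:Some_Person", "dbo:birthPlace", "dbr:Untyped_Town"),
    ("dbr:Some_Company", "dbo:location", "dbr:Untyped_Town"),
    ("dbr:Other_Company", "dbo:location", "dbr:Hamburg"),
]
TYPES = {
    "dbr:Hamburg": {"dbo:Place", "dbo:City"},
    "dbr:Ulm": {"dbo:Place"},
}

distributions = learn_distributions(TRIPLES, TYPES)
inferred = infer_types("dbr:Untyped_Town", TRIPLES, distributions, threshold=0.8)
```

In this toy example the untyped resource is the object of both a birth place and a location link, so `dbo:Place` scores highest, while the more specific `dbo:City` stays below the chosen threshold.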
4. New and updated RDF Links into External Data Sources
We updated the following RDF link sets pointing at other Linked Data sources: Freebase, Wikidata, Geonames and GADM. For an overview about all data sets that are interlinked from DBpedia please refer to http://wiki.dbpedia.org/Interlinking.
Accessing the DBpedia 2014 Release
You can download the new DBpedia datasets in RDF format from http://wiki.dbpedia.org/Downloads.
In addition, we provide some of the core DBpedia data also in tabular form (CSV and JSON formats) at http://wiki.dbpedia.org/DBpediaAsTables.
As usual, the new dataset is also available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql.
Lots of thanks to
The work on the DBpedia 2014 release was financially supported by the European Commission through the project LOD2 – Creating Knowledge out of Linked Data (http://lod2.eu/).
Have fun with the new DBpedia 2014 release!
Daniel Fleischhacker, Volha Bryl, and Christian Bizer
Monday, July 21, 2014 - 9:58am
DBpedia Spotlight is an entity linking tool for connecting free text to DBpedia through the recognition and disambiguation of entities and concepts from the DBpedia KB.
We are happy to announce Version 0.7 of DBpedia Spotlight, which is also the first official release of the probabilistic/statistical implementation.
More information about DBpedia Spotlight V0.7, as well as updated evaluation results, can be found in this paper:
Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes: Improving Efficiency and Accuracy in Multilingual Entity Extraction. ISEM2013.
The changes to the statistical implementation include:
See the release notes at  and the updated demos at .
Models for Spotlight 0.7 can be found here .
Additionally, we now provide the raw Wikipedia counts, which we hope will prove useful for research and development of new models .
A big thank you to all developers who made contributions to this version (with special thanks to Faveeo and Idio). Huge thanks to Jo for his leadership and continued support to the community.
on behalf of Joachim Daiber and the DBpedia Spotlight developer community.
(This message is an adaptation of Joachim Daiber’s message to the DBpedia Spotlight list. Edited to suit this broader community and give credit to him.)
Wednesday, February 12, 2014 - 8:32am
We started to draft a document for submission at Google Summer of Code 2014:
We are still in need of ideas and mentors. If you have any improvements on DBpedia or DBpedia Spotlight that you would like to have done, please submit them in the ideas section now. Note that accepted GSoC students will receive about 5000 USD for three months, which can help you to estimate the effort and size of proposed ideas. It is also OK to extend/amend existing ideas (as long as you don’t hijack them). Please edit here:
Becoming a mentor is also a very good way to get involved with DBpedia. As a mentor you will also be able to vote on proposals, after Google accepts our project. Note that it is also OK, if you are a researcher and have a suitable student, to submit an idea and become a mentor. After acceptance by Google, the student then has to apply for the idea and get accepted.
Please take some time this week to add your ideas and apply as a mentor, if applicable. Feel free to improve the introduction as well and comment on the rest of the document.
Information on GSoC in general can be found here:
Thank you for your help,
Sebastian and Dimitris
Friday, November 29, 2013 - 2:21pm
(Part of our DBpedia+spotlight @ GSoC mini blog series)
Mentor: Marco Fossati @hjfocs <fossati[at]spaziodati.eu>
Student: Kasun Perera <kkasunperera[at]gmail.com>
The latest version of the DBpedia ontology has 529 classes. It is not well balanced and shows a lack of coverage in terms of encyclopedic knowledge representation.
Furthermore, the current typing approach involves a costly manual mapping effort and heavily depends on the presence of infoboxes in Wikipedia articles.
Hence, a large number of DBpedia instances is either un-typed, due to a missing mapping or a missing infobox, or has a too generic or too specialized type, due to the nature of the ontology.
The goal of this project is to identify a set of meaningful Wikipedia categories that can be used to extend the coverage of DBpedia instances.
Wikipedia categories are organized in some kind of really messy hierarchy, which is of little use from an ontological point of view.
We investigated how to process this chaotic world.
We have identified a set of meaningful categories by combining the following approaches:
Algorithmic, programmatically traversing the whole Wikipedia category system.
Linguistic, identifying conceptual categories with NLP techniques.
We got inspired by the YAGO guys.
Multilingual, leveraging interlanguage links.
Kudos to Aleksander Pohl for the idea.
Post-mortem, cleaning out stuff that was still not relevant.
No resurrection without Freebase!
We found a total of 3751 candidates that can be used to type the instances.
We produced a dataset in the following format:
<Wikipedia_article_page> rdf:type <article_category>
You can access the full dump here. This has not been validated by humans yet.
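For readers who want to work with statements of the shape shown above, the following minimal sketch writes and reads such a type statement as a single N-Triples line with full URIs. The helper names and the example article and category URIs are hypothetical and only serve as an illustration; the actual dump may use different URIs or prefixes.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches one simple N-Triples line whose three terms are all URIs.
TRIPLE_RE = re.compile(r"^<([^>]*)> <([^>]*)> <([^>]*)> \.$")

def type_statement(article_uri, category_uri):
    """Serialize '<article> rdf:type <category>' as one N-Triples line."""
    return f"<{article_uri}> <{RDF_TYPE}> <{category_uri}> ."

def parse_statement(line):
    """Split an N-Triples line of URIs into (subject, predicate, object)."""
    match = TRIPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a URI-only N-Triples line: {line!r}")
    return match.groups()

# Hypothetical example URIs, chosen only for illustration.
line = type_statement(
    "http://en.wikipedia.org/wiki/Berlin",
    "http://en.wikipedia.org/wiki/Category:Capitals_in_Europe",
)
subject, predicate, category = parse_statement(line)
```

Because every statement fits on one line, such a dump can be filtered with ordinary line-based tools before loading it into a triple store.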
If you feel like having a look at it, please tell us what you think about it.
Take a look at Kasun’s progress page for more details.