GSoC 2014

DBpedia & DBpedia Spotlight @ GSoC 2014



DBpedia has again been accepted to Google Summer of Code 2014, and we would like to invite everybody who is interested to join our joint DBpedia and DBpedia Spotlight application.

For those who don't know what GSoC is, please scroll to the end of the document.

1 Our GSoC14 ideas

Our ideas list, along with other information, can be found here:

2 For Students

Before you apply, please read this:

You should also check out the application template:

3 Our awesome mentors

Our mentor list consists of reliable members of the DBpedia and DBpedia Spotlight community. Most of them have been active for more than a year and have submitted various commits to our code base or the mappings wiki. They all hold a stake in the DBpedia project, as their daily work or academic projects (e.g. a PhD thesis) rely in some form on the output produced by DBpedia and DBpedia Spotlight.

(name, google account, mail if different, homepage, short bio):

  • Andrea Di Menna (ninniuz) Andrea lives in Rome, Italy. He has been working on mobile devices and social applications and is now focusing on Big Data analytics.
  • Christina Unger (f060047) is a researcher at the Semantic Computing group in Bielefeld. She has a background in formal linguistics and now works on NLP over linked data, including question answering, ontology-based interpretation of natural language, and automatic grammar generation.
  • Dimitris Kontokostas (jimkont) Dimitris lives in Veria, Greece. He is a researcher at the AKSW Group of Leipzig University and co-maintains the DBpedia project.
  • Heiko Paulheim (heiko.paulheim) is a PostDoc at Mannheim University and works at the crossroads of data mining and LOD. Among others, he has worked on automatic data completion and correction algorithms for DBpedia.
  • Jim O’Regan (jimregan) lives in Ireland. He has been involved in GSoC for a number of years with Apertium, and is involved in the internationalization of DBpedia.
  • Jungyeul Park (jungyeul.park) Jungyeul lives in Lannion, France. He is currently a visiting lecturer in the Department of Computer Science at the IUT Lannion (Université de Rennes 1). His research interests include statistical syntactic and semantic parsing and machine translation.
  • Kyungtae Lim (jujbob) Kyungtae lives in Daejeon, Korea. He is a PhD student at the Korea Advanced Institute of Science and Technology and is currently focusing on relation extraction and question answering systems over linked data.
  • Magnus Knuth (mgns, magnus.knuth@hpi.uni-potsdam.de[..]ts/magnus_knuth.html) Magnus is a researcher and PhD student at HPI in Potsdam, living in Berlin. His work focuses on Linked Data cleansing and change management.
  • Marco Fossati (marfos, fossati@spaziodati.eu) lives in Trento, Italy. He is the Italian DBpedia chapter representative and was a GSoC mentor last year. He works as a data scientist at SpazioDati, in collaboration with the Web of Data research unit at Fondazione Bruno Kessler. He is currently focused on structured data quality, crowdsourcing for lexical semantics, and Linked Data-enabled recommender systems.
  • Martin Brümmer (der.bruemmer, bruemmer@informatik.uni-leipzig.de) Martin works in Leipzig, with the AKSW Research group at the University of Leipzig. He is focused on the creation of LOD language resources and is interested in everything related to Open Data and NLP.
  • Mariano Rico (mariano.rico, mariano.rico@upm.es) lives and works in Madrid (Spain). He is the person responsible for the Spanish chapter of DBpedia. His research concerning DBpedia focuses on finding the most common SPARQL queries made by human users in order to provide them with an intuitive user interface.
  • Sebastian Hellmann (kurzum, hellmann@informatik.uni-leipzig.de) lives in Leipzig and develops the NLP Interchange Format. He has been a GSoC mentor for two years, for DBpedia and on behalf of Apertium, and developed the initial version of the DBpedia-Live extraction. Furthermore, he (co-)chaired the Open Knowledge Conference 2011 and the Linked Data Cup 2012, and is a member of over a dozen open source projects on Google Code, SourceForge and GitHub.
  • Uroš Milošević (uros.milosevic, uros.milosevic@pupin.rs) is a researcher at the Mihajlo Pupin Institute (Belgrade, Serbia) and a PhD student at the University of Leipzig, currently working on LOD2 and GeoKnow.
  • Michel Dumontier (micheldumontier, michel.dumontier@stanford.edu) is an Associate Professor of Medicine (Biomedical Informatics) at Stanford University. His research aims to understand disease and drug mechanisms of action and to use this understanding to develop new and improved drug therapies. His work focuses on large-scale data integration, semantic data mining, formal knowledge representation and automated reasoning.
  • Alexandru Todor (bakaranma, todor@inf.fu-berlin.de, http://www.corporate-semantic-[..]alexandru-todor.html) lives in Berlin, Germany, and works as a researcher at the Free University of Berlin. He is the maintainer of the German Chapter of DBpedia, and his current PhD work focuses on entity recognition.
  • Michele Mostarda (michele.mostarda, mostarda@fbk.eu) is a software engineer who has been working on the Semantic Web since 2006. He is currently employed as a technologist at Fondazione Bruno Kessler in Trento, Italy. He has been involved in Big Data, Linked Open Data and Machine Learning projects such as Sindice, Any23, JSONpedia and MachineLinking.
  • Petar Petrovski (petar@informatik.uni-mannheim.de, http://dws.informatik.uni-mann[..]ers/petar-petrovski/) is a researcher and PhD student at the University of Mannheim, currently working on the LOD2 and WebDataCommons projects.

4 Document structure



5 What is GSoC

Google funds open source projects by paying students to work for 3 months on a specific task. The gain is twofold: students gain experience, and open source projects get some work done and (possibly) new community members.

The workflow is as follows:

  • Open source projects apply to Google by providing a list of possible projects that students can complete within 3 months (the application period ends on 14/02).
  • Once a project is accepted, there is an application period in which students apply for specific ideas of that project.
  • Google grants a number of student slots to each project, and mentors vote on the student selection.
  • Once the selection is over, there is a bonding period (1 month) in which selected students take on some warm-up tasks to get familiar with the technologies.
  • Each student is assigned an official mentor and additional co-mentors. The only difference is that the official mentor is responsible for filling in two evaluation forms: one in the middle of the programming period and one at the end.

5.1 (co-)mentors

By becoming a DBpedia GSoC (co-)mentor you get:

  • a free Google t-shirt :)
  • an opportunity to fly to SF and attend the Google Mentor Summit at the Google headquarters
  • the chance to help DBpedia

OK, there are some responsibilities, but not too many. We try to assign multiple mentors to each student to divide the workload. During the application period (2 weeks), it will take some time to help students write good applications for the ideas in which you have expertise.

In the end, not all candidate mentors will be assigned a student; this depends on the number of student slots Google gives us and on the ideas students apply for. Those who do become (co-)mentors will be responsible for guiding their student and making sure they stay on schedule.

There are plenty of links and FAQs on the GSoC homepage: [..]page/google/gsoc2014