Almost every major Web company has now announced their work on a Knowledge Graph, including Google’s Knowledge Graph, Yahoo!’s Web of Objects, Walmart Lab’s Social Genome, Microsoft's Satori Graph / Bing Snapshots and Facebook’s Entity Graph.
DBpedia is a community-run project that has been working on a free, open-source knowledge graph since 2006!
DBpedia currently describes 38.3 million “things” of 685 different “types” in 125 languages, with over 3 billion “facts” (September 2014). It is interlinked with many other databases (e.g., Freebase, Wikidata, New York Times, CIA Factbook). The knowledge in DBpedia is exposed through a set of technologies called Linked Data. Linked Data has been revolutionizing the way applications interact with the Web. While Web 2.0 technologies opened up much of the “guts” of websites for third parties to reuse and repurpose data on the Web, they still require that developers create one client per target API. With Linked Data technologies, all APIs are interconnected via standard Web protocols and languages.
One can navigate this Web of facts with standard Web browsers or automated crawlers, or pose complex queries with SQL-like query languages (e.g., SPARQL). Have you thought of asking the Web about all cities with low crime rates, warm weather and open jobs? That's the kind of query we are talking about.
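To give a feel for what such a query looks like, here is a minimal Python sketch that builds a SPARQL query and the request URL for the public DBpedia endpoint (https://dbpedia.org/sparql). It filters cities by population only, a much simpler stand-in for the crime, weather and jobs example. The helper names `build_city_query` and `build_request_url` are made up for this illustration, and the choice of `dbo:City` and `dbo:populationTotal` is an assumption about which DBpedia ontology terms fit best.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_city_query(min_population: int, limit: int = 10) -> str:
    """Return a SPARQL query for cities above a population threshold.

    Assumption: dbo:City and dbo:populationTotal are the terms used to
    describe cities and their population in the DBpedia ontology.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:populationTotal ?population .
  FILTER (?population > {min_population})
}}
ORDER BY DESC(?population)
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as the standard `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_city_query(min_population=1_000_000, limit=5)
    print(query)
    # Fetch this URL with any HTTP client; an Accept header such as
    # "application/sparql-results+json" asks for JSON results.
    print(build_request_url(query))
```

The sketch only constructs the request, so it runs without network access. Sending the URL with an HTTP client of your choice returns the matching cities.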
This new Web of interlinked databases provides useful knowledge that can complement the textual Web in many ways. See, for example, how bloggers tag their posts or assign them to categories in order to organize and interconnect their blog posts. This is a very simple way to connect unstructured text to a structure (a hierarchy of tags). For a more advanced example, see how the BBC created its World Cup 2010 website by interconnecting textual content and facts from its knowledge base. Identifiers and data provided by DBpedia played a major role in creating this knowledge graph. Or, more recently, did you see that IBM's Watson used DBpedia data to win the Jeopardy challenge?
DBpedia Spotlight is an open source (Apache license) text annotation tool that connects text to Linked Data by marking names of things in text (we call that Spotting) and selecting between multiple interpretations of these names (we call that Disambiguation). For example, “Washington” can be interpreted in more than 50 ways including a state, a government or a person. You can already imagine that this is not a trivial task, especially when we're talking about millions of things and hundreds of types.
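The two steps can be illustrated with a toy Python sketch. This is not how DBpedia Spotlight is implemented: the tiny `LEXICON`, its context words and the word-overlap scoring are all assumptions made up for this example, whereas the real tool works over millions of things.

```python
import re

# Hypothetical toy lexicon: surface form -> candidate DBpedia resources,
# each with a few hand-picked context words (illustration only).
LEXICON = {
    "washington": {
        "http://dbpedia.org/resource/Washington_(state)": {"state", "seattle", "pacific"},
        "http://dbpedia.org/resource/Washington,_D.C.": {"capital", "congress", "government"},
        "http://dbpedia.org/resource/George_Washington": {"president", "general", "born"},
    },
}


def spot(text):
    """Spotting: return (surface_form, offset) for each known name in the text."""
    return [
        (match.group(0), match.start())
        for match in re.finditer(r"[A-Za-z]+", text)
        if match.group(0).lower() in LEXICON
    ]


def disambiguate(surface_form, text):
    """Disambiguation: pick the candidate whose context words overlap most
    with the words of the text (ties go to the first candidate listed)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    candidates = LEXICON[surface_form.lower()]
    return max(candidates, key=lambda uri: len(candidates[uri] & tokens))


def annotate(text):
    """Run spotting, then disambiguation, and return (name, offset, URI) triples."""
    return [(form, offset, disambiguate(form, text)) for form, offset in spot(text)]


if __name__ == "__main__":
    for sentence in (
        "Washington was the first president and a general.",
        "Congress met in Washington, the capital.",
    ):
        print(annotate(sentence))
```

The same name resolves to a person in the first sentence and to a city in the second, purely because of the surrounding words. That is the core idea behind disambiguation, even though a production system uses far richer context models.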
We are regularly growing our community through GSoC and can deliver more and more opportunities to you.
We got excited about our new ideas, and we hope you will get excited too!
What is GSoC
TLDR: Google funds open source projects by paying students to work for three months on a specific task. The gain here is twofold: students gain experience, and open source projects get some work done and (possibly) new community members.
The workflow is as follows:
- Open source projects apply to Google by providing a list of possible projects that students can complete within 3 months (the application ends on February 20).
- Once a project is accepted there is an application period where students apply for specific ideas on a project (February 20 – March 16).
- Google grants a number of student slots to each project and mentors vote on the student selection.
- Once the selection is over, there is a bonding period (1 month) where selected students take on some warm-up tasks to get familiar with the technologies.
- Each student is assigned an official mentor and additional co-mentors. The only difference is that the official one is responsible for filling in 2 evaluation forms, one in the middle of the programming period and one at the end.
Become a DBpedia (co-)mentor
By becoming a DBpedia GSoC (co-)mentor you get:
- a free Google T-Shirt :)
- an opportunity to fly to SF for the Google mentor summit at the Google headquarters
- the chance to help DBpedia
OK, there are some responsibilities, but not too many. We try to assign multiple mentors to each student to divide the workload. It will take some time during the application period (2 weeks) to help students write good applications for the ideas in which you have expertise.
In the end, not all candidate mentors will be assigned a student. It depends on the number of students Google gives us and the ideas students apply for. Those who do become (co-)mentors are responsible for guiding the student and making sure they stay on schedule.
There are plenty of links and FAQs on the GSoC homepage: http://www.google-melange.com/[..]page/google/gsoc2015