DBpedia Spotlight – User's Manual


DBpedia Spotlight is a tool for annotating mentions of DBpedia concepts in plain text.


We offer three basic functions: Annotate, Disambiguate and Candidates (Best K). They can be accessed through a Scala / Java API, a REST Web Service and a user interface on the Web (HTML/JavaScript). The Scala / Java API exposes a number of configuration parameters that control the annotation and disambiguation functions. The classes DefaultAnnotator, DefaultDisambiguator and DefaultParagraphDisambiguator offer the configuration that we found to provide the best results. The configuration interface offers ways to control the output quality of both annotation and disambiguation.

Architecture


The DBpedia Spotlight architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript interface) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful/SOAP Web API that exposes the functionality of annotating and/or disambiguating entities in text.
  • Annotation Java / Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java / Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
  • Evaluation module, where we test disambiguators, log results and use those to train our system to perform better.

External dependencies:

  • DBpedia Extraction Framework (only for the indexing module), extracting the necessary data from the Wikipedia dumps.
  • Lucene 2.9.3, providing the low level indexing framework used by DBpedia Spotlight.
  • LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.

System Requirements

  • Java 1.6+
  • Scala 2.8+

  • Spotlight JAR 
  • Spotlight Library JARs

  • Lucene disambiguation index
  • Spotter dictionary
  • Enough RAM to give the Spotter a sufficiently large Java heap (approx. 8 GB)

  • Maven 2 for the automagic installation of dependencies.


Programmatic usage


If you want to use DBpedia Spotlight in your Java / Scala code, take a look at core/SpotlightFactory to see how you can create your objects, and then look at rest/Candidates.java to see how you can wire them together.

Online Usage

Web Application

The Web Application is located at http://spotlight.dbpedia.org/demo/index.xhtml .

Web Service

The Web Service is located at http://spotlight.dbpedia.org/rest/annotate and http://spotlight.dbpedia.org/rest/disambiguate.
It can also be queried with filtering parameters such as confidence, support, types and sparql. Example calls are provided below.

Content Negotiation


You can request different types of output by setting the Accept request header.
For example, in order to request JSON output, you can add "Accept:application/json" to the request headers.


The cURL call in the POST example below sets this header.



The content types we currently support are:

  • text/html
  • application/xhtml+xml
  • text/xml
  • application/json

The application/xhtml+xml output comes with embedded RDFa, which you can pass to the RDFa Distiller to get RDF triples in Turtle, RDF+XML, etc. as output.
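
The content negotiation described above can be sketched in Python using only the standard library. This is a minimal illustration, not part of DBpedia Spotlight itself: the helper name build_annotate_request is hypothetical, the endpoint is the one given in this manual, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from this manual.
ANNOTATE_URL = "http://spotlight.dbpedia.org/rest/annotate"


def build_annotate_request(text, accept="application/json", confidence=0.2, support=20):
    """Build (but do not send) a GET request for /annotate with an Accept header.

    Hypothetical helper for illustration; the parameter names mirror the
    example calls in this manual.
    """
    query = urlencode({"text": text, "confidence": confidence, "support": support})
    return Request(ANNOTATE_URL + "?" + query, headers={"Accept": accept})


request = build_annotate_request("President Obama called Wednesday on Congress")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would send it. Changing the accept argument to one of the content types listed above should select the corresponding output format, assuming the service is reachable.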


If your input text is long, you may prefer using POST instead of GET.


curl -i -X POST \
-H "Accept:application/json" \
-H "content-type:application/x-www-form-urlencoded" \
-d "disambiguator=Document&confidence=-1&support=-1&text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package" \
http://spotlight.dbpedia.org/dev/rest/annotate/

Please note that you *must* use content-type application/x-www-form-urlencoded for POST requests.
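
For readers calling the service from a script rather than from cURL, the same POST request might be built as follows. This is a sketch under stated assumptions: the helper name build_post_request is hypothetical, the parameter values are copied from the cURL call above, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and parameter values copied from the cURL example in this manual.
ANNOTATE_URL = "http://spotlight.dbpedia.org/dev/rest/annotate/"


def build_post_request(text):
    """Build (but do not send) a form-encoded POST request for /annotate.

    Hypothetical helper for illustration only.
    """
    body = urlencode({
        "disambiguator": "Document",
        "confidence": -1,
        "support": -1,
        "text": text,
    }).encode("ascii")
    headers = {
        "Accept": "application/json",
        # Required for POST requests, as noted above.
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(ANNOTATE_URL, data=body, headers=headers)


post_request = build_post_request("President Obama called Wednesday on Congress")
print(post_request.get_method())
print(post_request.data.decode("ascii"))
```

Because the text travels in the request body instead of the URL, this form avoids URL length limits for long inputs.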

Example 1: without type restriction


http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20


returns the XML


<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
<Resource URI="http://dbpedia.org/resource/Tax_break"
support="32" types="" surfaceForm="tax break" offset="57"
similarityScore="0.35041093826293945" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Student"
support="1701" types="" surfaceForm="students" offset="71"
similarityScore="0.32534149289131165" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/Policy"
support="557" types="" surfaceForm="policy" offset="148"
similarityScore="0.3228176236152649" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
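
A client will usually want to turn this XML into native data structures. The following Python sketch parses a shortened copy of the response above with the standard library. The function name parse_annotations and the dictionary keys are illustrative choices, not part of the service.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown above (two of the five resources,
# with the text attribute truncated after "Congress").
SAMPLE = """<Annotation text="President Obama called Wednesday on Congress"
confidence="0.2" support="20">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
</Resources>
</Annotation>"""


def parse_annotations(xml_text):
    """Return one dict per <Resource> element, with numeric fields converted."""
    root = ET.fromstring(xml_text)
    resources = []
    for node in root.iter("Resource"):
        types = node.get("types", "")
        resources.append({
            "uri": node.get("URI"),
            "surface_form": node.get("surfaceForm"),
            "offset": int(node.get("offset")),
            "support": int(node.get("support")),
            # An empty types attribute means the resource has no type.
            "types": types.split(",") if types else [],
            "similarity": float(node.get("similarityScore")),
        })
    return resources


annotations = parse_annotations(SAMPLE)
for a in annotations:
    print(a["offset"], a["surface_form"], a["uri"])
```

The offset value is the character position of the surface form in the submitted text, so it can be used to highlight the mention in the original input.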

Example 2: with type restriction


http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&types=Person,Organisation


returns the XML


<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20" types="Person,Organisation">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.31504717469215393" percentageOfSecondRank="-1.0"/>
<Resource URI="http://dbpedia.org/resource/United_States_Congress"
support="8569" types="Organisation,Legislature" surfaceForm="Congress" offset="36"
similarityScore="0.2348192036151886" percentageOfSecondRank="0.8635579006818564"/>
</Resources>
</Annotation>

Example 3: with SPARQL restriction


http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20&sparql=SELECT+DISTINCT+%3Fx%0D%0AWHERE+%7B%0D%0A%3Fx+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2FOfficeHolder%3E+.%0D%0A%3Fx+%3Frelated+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FChicago%3E+.%0D%0A%7D


returns the XML


<Annotation text="President Obama called Wednesday on Congress to extend a tax break
for students included in last year's economic stimulus package, arguing that the policy
provides more generous assistance."
confidence="0.2" support="20"
sparql="SELECT DISTINCT ?x WHERE { ?x a &lt;http://dbpedia.org/ontology/OfficeHolder&gt; .
?x ?related &lt;http://dbpedia.org/resource/Chicago&gt; }"
policy="whitelist">
<Resources>
<Resource URI="http://dbpedia.org/resource/Barack_Obama"
support="5761" types="Person,Politician,President" surfaceForm="President Obama" offset="0"
similarityScore="0.2730408310890198" percentageOfSecondRank="-1.0"/>
</Resources>
</Annotation>
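
The sparql parameter in the Example 3 URL is a percent-encoded SPARQL query. The Python sketch below shows one way to produce that encoding with the standard library. The helper name build_sparql_url is hypothetical, and the query text is the one obtained by decoding the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from this manual.
ANNOTATE_URL = "http://spotlight.dbpedia.org/rest/annotate"

# The query from Example 3, decoded from the URL above.
SPARQL = (
    "SELECT DISTINCT ?x\r\n"
    "WHERE {\r\n"
    "?x a <http://dbpedia.org/ontology/OfficeHolder> .\r\n"
    "?x ?related <http://dbpedia.org/resource/Chicago> .\r\n"
    "}"
)


def build_sparql_url(text, sparql, confidence=0.2, support=20):
    """Return an /annotate URL with the SPARQL filter percent-encoded."""
    params = {"text": text, "confidence": confidence, "support": support, "sparql": sparql}
    return ANNOTATE_URL + "?" + urlencode(params)


url = build_sparql_url("President Obama called Wednesday on Congress", SPARQL)
print(url)

# Decoding the query string recovers the original query unchanged.
decoded = parse_qs(urlsplit(url).query)["sparql"][0]
print(decoded == SPARQL)
```

Letting urlencode handle the escaping avoids hand-encoding characters such as ?, {, < and the line breaks.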

Example 4: Candidates Interface


Input:
The parameters are the same as in /annotate, but you will send your request to http://spotlight.dbpedia.org/rest/candidates


Output example:

<annotation text="President Obama on Monday will call for a new minimum tax rate for individuals making more than $1 million a year to ensure that they pay at least the same percentage of their earnings as other taxpayers, according to administration officials. ">
<surfaceForm name="individuals" offset="67">
<resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
<resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
<resource label="The Individuals (Chicago band)" uri="The_Individuals_%28Chicago_band%29" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>

</surfaceForm>
<surfaceForm name="officials" offset="233">

<resource label="Official" uri="Official" contextualScore="0.1324356347322464" percentageOfSecondRank="-1.0" support="196" priorScore="0.0" finalScore="0.1324356347322464"/>
<resource label="Rugby league match officials" uri="Rugby_league_match_officials" contextualScore="0.04376954212784767" percentageOfSecondRank="-1.0" support="9" priorScore="0.0" finalScore="0.04376954212784767"/>

</surfaceForm>
<surfaceForm name="President Obama" offset="0">

<resource label="Presidency of Barack Obama" uri="Presidency_of_Barack_Obama" contextualScore="0.5634340643882751" percentageOfSecondRank="-1.0" support="134" priorScore="0.0" finalScore="0.5634340643882751"/>

</surfaceForm>
<surfaceForm name="1 million" offset="97">

<resource label="Million" uri="Million" contextualScore="0.527919590473175" percentageOfSecondRank="-1.0" support="492" priorScore="0.0" finalScore="0.527919590473175"/>

</surfaceForm>
<surfaceForm name="percentage" offset="156">

<resource label="Percentage" uri="Percentage" contextualScore="0.6362485885620117" percentageOfSecondRank="-1.0" support="165" priorScore="0.0" finalScore="0.6362485885620117"/>

</surfaceForm>
<surfaceForm name="earnings" offset="176">

<resource label="Income" uri="Income" contextualScore="0.5776156187057495" percentageOfSecondRank="-1.0" support="648" priorScore="0.0" finalScore="0.5776156187057495"/>

</surfaceForm>
<surfaceForm name="taxpayers" offset="194">

<resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
<resource label="Tax Payers' Alliance" uri="Tax Payers%27_Alliance" contextualScore="0.12765906751155853" percentageOfSecondRank="-1.0" support="15" priorScore="0.0" finalScore="0.12765906751155853"/>
<resource label="The Taxpayer (Luxembourg)" uri="The_Taxpayer_%28Luxembourg%29" contextualScore="0.024930020794272423" percentageOfSecondRank="-1.0" support="3" priorScore="0.0" finalScore="0.024930020794272423"/>
<resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
</annotation>
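
Because the candidates output lists several resources per surface form, a client typically has to choose among them. The Python sketch below picks the candidate with the highest finalScore for each surface form, using a shortened copy of the output above. The function name best_candidates is illustrative, and ranking by finalScore is an assumption about how a client might use these scores, not documented service behaviour.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the candidates output shown above
# (the text attribute is omitted for brevity).
SAMPLE = """<annotation>
<surfaceForm name="individuals" offset="67">
<resource label="Individual" uri="Individual" contextualScore="0.26683980226516724" percentageOfSecondRank="-1.0" support="312" priorScore="0.0" finalScore="0.26683980226516724"/>
<resource label="The Individuals (New Jersey band)" uri="The_Individuals_%28New_Jersey_band%29" contextualScore="0.011762913316488266" percentageOfSecondRank="-1.0" support="17" priorScore="0.0" finalScore="0.011762913316488266"/>
</surfaceForm>
<surfaceForm name="taxpayers" offset="194">
<resource label="Tax" uri="Tax" contextualScore="0.7484055757522583" percentageOfSecondRank="-1.0" support="1540" priorScore="0.0" finalScore="0.7484055757522583"/>
<resource label="The Taxpayers" uri="The_Taxpayers" contextualScore="0.0" percentageOfSecondRank="-1.0" support="0" priorScore="0.0" finalScore="0.0"/>
</surfaceForm>
</annotation>"""


def best_candidates(xml_text):
    """Map each surface form name to its (uri, finalScore) pair with the top score."""
    root = ET.fromstring(xml_text)
    best = {}
    for form in root.iter("surfaceForm"):
        candidates = [
            (res.get("uri"), float(res.get("finalScore")))
            for res in form.iter("resource")
        ]
        if candidates:  # skip surface forms without any candidate
            best[form.get("name")] = max(candidates, key=lambda c: c[1])
    return best


best = best_candidates(SAMPLE)
for name, (uri, score) in best.items():
    print(name, uri, score)
```

A client could rank by contextualScore or support instead by changing the attribute read in the list comprehension.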


 

Information

Last Modification: 2011-09-29 18:28:46 by Pablo Mendes