SPARQL over Online Databases

Querying DBpedia

The DBpedia data set makes it possible to answer sophisticated queries against data extracted from Wikipedia. You can query the DBpedia data set online via a SPARQL endpoint (described on this page) and as Linked Data.

Public SPARQL Endpoint

A public SPARQL endpoint for querying the DBpedia data set is at http://dbpedia.org/sparql. OpenLink Virtuoso serves as both the back-end database and SPARQL query engine and the front-end HTTP/SPARQL server, with an nginx overlay used primarily to cache results for each submitted query string.

NOTE: The public endpoint does NOT include all available DBpedia data sets. The Loaded Datasets subsection below provides a list of all DBpedia data sets currently loaded into the public SPARQL endpoint.

You can run queries against DBpedia using:

NOTE: Please read the documentation and usage notes about the public SPARQL endpoint carefully before you run queries against it. Be aware that as part of our Fair Use Policy, complex queries may be subject to various restrictions and limitations.
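As a rough illustration of querying the public endpoint described above, the following Python sketch builds a request using only the standard library. The `query` URL parameter and the `application/sparql-results+json` media type come from the W3C SPARQL 1.1 Protocol and results format, not from this page; the helper names `build_request` and `run_query` and the example query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given on this page.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query only; any valid SPARQL SELECT query would do.
EXAMPLE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> rdfs:label ?label
} LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that passes the query in the `query` URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for JSON results through standard content negotiation.
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query, timeout=30):
    """Send the query and return (parsed JSON body, response headers as a dict)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp), dict(resp.headers.items())


# Example (performs a network call, so it is left commented out):
# results, headers = run_query(EXAMPLE_QUERY)
```

Returning the response headers alongside the body is a deliberate choice in this sketch, because the endpoint reports partial results through headers rather than through the HTTP status code.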

Triple Pattern Fragments

Triple Pattern Fragments provide triple-pattern-based access to a dataset. This enables client-side querying of live data with high availability at low cost. You can access the DBpedia dataset as Triple Pattern Fragments, and even perform federated querying over multiple datasets.

Public Faceted Web Service Interface

There is a public Faceted Browser “search and find” user interface at http://dbpedia.org/fct. Usage details can be found in the Virtuoso Facets Web Service documentation.

REST API

There is a new, experimental REST-style API based on Spring and Swagger, driven by the DBpedia Ontology and the public English SPARQL endpoint. More details can be found in the documentation and usage notes.

Demo Query Script for Text Search on Virtuoso

We developed and published a simple script as a software study before starting the development of RelFinder. We think it will help you get familiar with combining SPARQL and string search on a Virtuoso server that hosts DBpedia. The demo is deployed here and you can find the source code here.

Another demo is deployed here for getting familiar with RelFinder and the query plan. The code for this demo is available here.

Example Queries displayed with the Berlin SNORQL Query Explorer

Example rendering DBpedia Data with Google Map

Example displaying DBpedia Data with Exhibit

  • Persons by birthplace (in French; does not work with Internet Explorer). As of 2009-11-09 the demo no longer appears to work, and the link might be removed in the future.

Example displaying DBpedia Data with gFacet

  • gFacet is a new approach for browsing RDF data that combines graph-based visualization and faceted filtering techniques. A demo for DBpedia and other Linked Data resources is available online: http://www.visualdataweb.org/gfacet.php

This page describes basic configuration parameters and important usage notes and limitations for the public DBpedia SPARQL endpoint query service. The information on this page applies only to the English DBpedia SPARQL endpoint. For information about the endpoints of individual chapters, see the website of the respective chapter.

Details of the Public SPARQL Endpoint

Back-end and front-end type

The SPARQL back-end is powered by Virtuoso and hosted by OpenLink Software. The HTTP/S front-end uses nginx to cache requests for each submitted query string.

Rates and Limitations

A Fair Use Policy is in place in order to provide a stable and responsive endpoint for the community. The number of connections-per-second you can make is limited, as are result set sizes and query times, conforming to the following settings:

ResultSetMaxRows           = 10000
MaxQueryExecutionTime      =   120  (seconds)
MaxQueryCostEstimationTime =  1500  (seconds)
Connection limit           =    50  (parallel connections per IP address)
maximum request rate       =   100  (requests per second per IP address, 
                                     with an initial burst of 120 requests)
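One way a client might stay under the request-rate figures above is a client-side token bucket. The sketch below is an assumption about how to respect the limit, not a description of how the server enforces it: it treats the 120-request burst as the bucket capacity and the 100 requests per second as the refill rate. The class name `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Client-side throttle: `capacity` requests up front, then `rate` per second."""

    def __init__(self, rate=100.0, capacity=120.0, clock=time.monotonic):
        self.rate = rate          # tokens added per second (assumed from the limit above)
        self.capacity = capacity  # maximum burst size (assumed from the limit above)
        self.clock = clock        # injectable clock, which makes the class testable
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, consuming one token."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `try_acquire()` before each request and sleep briefly when it returns `False`. Running well below these figures is the safer choice, since the limits apply per IP address and may be shared with other clients behind the same address.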

NOTE: Queries that time out return PARTIAL results on a best-effort basis and do NOT return an error. This behavior comes from Virtuoso’s Anytime Query feature, which is enabled for the public endpoint. See the subsection below for more details.

Partial Results for Large Result Sets

The public endpoint limits the size of result sets, currently to 10000 rows.

ATTENTION: Partial result sets are returned the same way as complete result sets. There is no HTTP error status code, just a 200 OK code.

To check whether a query has delivered partial results, an application must evaluate the HTTP return headers. If full execution of the query would have returned more than the configured maximum number of rows, the X-SPARQL-MaxRows HTTP response header is added, as shown below:

X-SPARQL-default-graph: http://dbpedia.org
X-SPARQL-MaxRows: 10000
Expires: Tue, 07 Jan 2018 12:00:00 GMT
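A minimal sketch of that header check, assuming the response headers are already available as a plain mapping of names to values. The function name `hit_row_limit` is hypothetical; the header name is the one shown above.

```python
def hit_row_limit(headers):
    """Return the row cap if X-SPARQL-MaxRows is present, otherwise None."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-sparql-maxrows":
            return int(value)
    return None


sample = {
    "X-SPARQL-default-graph": "http://dbpedia.org",
    "X-SPARQL-MaxRows": "10000",
}
print(hit_row_limit(sample))  # prints 10000
```

When the function returns a number, the result set was cut off at that many rows, and the application may want to narrow the query or fetch the data in smaller pieces.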

Partial Results for Timed-Out Queries (Anytime Queries)

The public endpoint features Virtuoso’s anytime queries to return partial results for timed-out queries. See the Virtuoso product manual for a detailed explanation.

NOTE: Timed-out queries return partial results the same way as complete result sets. There is no HTTP error status code, just a 200 OK code. This means that the values returned for any aggregate query may not be correct with respect to the whole DBpedia dataset and can vary between query executions!

To discover whether a query has delivered partial results, an application must evaluate the HTTP return headers. When results are impacted by the anytime query feature, several additional response headers (X-SQL-State, X-SQL-Message, X-Exec-Milliseconds, and X-Exec-DB-Activity) will be added to the query response, as shown here:

X-SPARQL-default-graph: http://dbpedia.org
X-SQL-State: S1TAT
X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:      7 rnd  64.87M seq      0 same seg       1 same pg      0 same par      0 disk      0 spec disk      0B /      0 mess
X-Exec-Milliseconds: 30000
X-Exec-DB-Activity: 7 rnd  64.87M seq      0 same seg       1 same pg      0 same par      0 disk      0 spec disk      0B /      0 messages      0 fork
Expires: Tue, 07 Jan 2018 12:00:00 GMT
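The check for timed-out queries could be sketched along the same lines. This example assumes that the presence of any of the four headers named above signals a partial result, and that the headers are available as a plain mapping; the names `ANYTIME_HEADERS` and `anytime_info` are hypothetical.

```python
# The four header names listed in the text above, in lower case.
ANYTIME_HEADERS = (
    "x-sql-state",
    "x-sql-message",
    "x-exec-milliseconds",
    "x-exec-db-activity",
)


def anytime_info(headers):
    """Return the anytime-related headers that are present, or None if there are none."""
    # Normalize names, since HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    found = {name: lowered[name] for name in ANYTIME_HEADERS if name in lowered}
    return found or None


sample = {
    "X-SPARQL-default-graph": "http://dbpedia.org",
    "X-SQL-State": "S1TAT",
    "X-Exec-Milliseconds": "30000",
}
info = anytime_info(sample)
if info is not None:
    print("partial results, SQL state:", info.get("x-sql-state"))
```

An application that computes aggregates may prefer to discard or retry a response for which this function returns a value, since the figures can differ between executions.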

Loaded Datasets

The public endpoint for the English chapter does NOT include all available DBpedia data sets. Data in the endpoint is loaded from the Databus Latest Core Collection, which is explained on the Databus Latest Core Releases page.

Before the frequently updated Latest Core releases existed, the endpoint loaded the 2016-10 dataset.

Further Reading

For more information about the current restrictions on the public DBpedia endpoint and how to deal with them, you can read this usage report from October 2020.

You may also find these threads from the DBpedia Discussion mailing list useful: