DBpedia as Tables


Because some potential users of DBpedia may not be familiar with the RDF data model and the SPARQL query language, we also provide some of the core DBpedia 3.9 data in tabular form, as Comma-Separated Values (CSV) files and as JSON files. These files can easily be processed with standard tools such as spreadsheet applications, relational databases, or data mining tools.


For each class in the DBpedia ontology (such as Person, Radio Station, Ice Hockey Player, or Band) we provide a single CSV/JSON file which contains all instances of this class. Each instance is described by its URI, an English label and a short abstract, the mapping-based infobox data describing the instance (extracted from the English edition of Wikipedia), and geo-coordinates.


Altogether we provide 530 CSV/JSON files in the form of a single ZIP file (3 GB compressed, 73.4 GB uncompressed). In addition, we provide separate CSV/JSON files for each class for download.



File Structure

Each of the CSV files corresponds to a class from the DBpedia ontology and contains all of its instances. The names of the files are derived from the class labels.


Each file starts with four header rows:

  • The first header contains the property labels.
  • The second header contains the property URIs.
  • The third header contains the property range labels.
  • The fourth header contains the property range URIs.

The first column of each file contains the URIs of the instances of the current class. The remaining columns contain the values for the properties that are defined for this class in the DBpedia ontology.
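
The header layout described above can be read with a few lines of Python. This is a minimal sketch, not the code used to produce the files: the function name `read_dbpedia_table` is hypothetical, and the sample rows (including the header text of the first column) are invented for illustration.

```python
import csv
import io

def read_dbpedia_table(handle):
    """Split a table into column metadata (four header rows) and data rows."""
    reader = csv.reader(handle)
    labels = next(reader)        # header 1: property labels
    uris = next(reader)          # header 2: property URIs
    range_labels = next(reader)  # header 3: property range labels
    range_uris = next(reader)    # header 4: property range URIs
    columns = [
        {"label": l, "uri": u, "range_label": rl, "range_uri": ru}
        for l, u, rl, ru in zip(labels, uris, range_labels, range_uris)
    ]
    # Every remaining row describes one instance; key the cells by label.
    rows = [dict(zip(labels, row)) for row in reader]
    return columns, rows

# Invented sample data; real files are much wider and longer.
SAMPLE = (
    "URI,rdf-schema#label,populationTotal\n"
    "URI,http://www.w3.org/2000/01/rdf-schema#label,"
    "http://dbpedia.org/ontology/populationTotal\n"
    "URI,XMLSchema#string,XMLSchema#integer\n"
    "URI,http://www.w3.org/2001/XMLSchema#string,"
    "http://www.w3.org/2001/XMLSchema#integer\n"
    "http://dbpedia.org/resource/Example_Town,Example Town,1234\n"
)

columns, rows = read_dbpedia_table(io.StringIO(SAMPLE))
```

For a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.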


The tables contain data properties and object properties. Data properties have literal values, while object properties link the instance to another instance of the same class or of a different class. The property labels are retrieved from the DBpedia ontology; where a label is missing in the ontology, we derive it from the property URI. The property ranges are also retrieved from the DBpedia ontology; where a range is missing in the ontology, we use a type guesser to identify it. If the type guesser cannot identify the range, the property is assigned the range http://www.w3.org/2001/XMLSchema#string for data properties and http://www.w3.org/2002/07/owl#Thing for object properties.
To simplify the use of object properties, each object property is represented by two columns in the table. The first column contains the labels of the objects; its property label header is the concatenation of the property label and the string “_label”, and its property range header is always set to http://www.w3.org/2001/XMLSchema#string. The second column contains the URIs of the objects; its label header is the property label, and its property range header is the URI of the object's class.
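
The two-column convention for object properties can be detected from the first header row alone. The following sketch assumes that a label ending in “_label” whose base name also occurs as a label marks such a pair; the function name `object_property_pairs` and the sample labels are hypothetical.

```python
def object_property_pairs(labels):
    """Map each object property label to (label column index, URI column index)."""
    index = {label: i for i, label in enumerate(labels)}
    suffix = "_label"
    pairs = {}
    for label, i in index.items():
        if label.endswith(suffix):
            base = label[: -len(suffix)]
            # Only treat it as a pair if the matching URI column exists.
            if base in index:
                pairs[base] = (i, index[base])
    return pairs

# Invented first header row with one object property and one data property.
labels = ["URI", "birthPlace_label", "birthPlace", "height"]
pairs = object_property_pairs(labels)
```

A data property whose own label happens to end in “_label” would only be misread if its base name were also a column, which is why the check for the matching column is kept.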


Note that the tables may contain multi-valued properties. Their values are represented as an array inside curly braces, with the individual values separated by pipe characters.

  • Example: {value1|value2|...|valueN}
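
A multi-valued cell in this notation can be split as sketched below. The helper name `parse_cell` is hypothetical, and two behaviours are assumptions rather than documented facts: an empty cell is treated as having no values, and values are assumed not to contain a literal pipe character.

```python
def parse_cell(cell):
    """Return the values of a cell as a list, whether single or multi-valued."""
    text = cell.strip()
    if text == "":
        return []  # assumption: an empty cell means "no value"
    if len(text) >= 2 and text.startswith("{") and text.endswith("}"):
        # Multi-valued cell: drop the braces and split on the pipe separator.
        return text[1:-1].split("|")
    return [text]

values = parse_cell("{value1|value2|value3}")
single = parse_cell("value1")
```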

Because we wanted to preserve the ontology structure without losing any level of class abstraction, an instance may be included in several files (for example, in the file for its class and in the files for its superclasses).

Loading the Tables into a Relational Database

The files can easily be imported into a relational database, as each file corresponds to a table.
Each of the headers corresponds to a relational database term:

  • The first header is equivalent to an attribute name in a table.
  • The second header identifies the same attribute as the first header, but more precisely, by its URI.
  • The third header is equivalent to an attribute type.
  • The fourth header identifies the same attribute type as the third header, but more precisely, by its URI.
  • The first column of each file contains URIs which uniquely identify the instances in the file and can be used as the primary key.
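
As an illustration of this correspondence, the sketch below builds a SQLite table from the first and fourth header rows. The mapping from XSD range URIs to SQL types is an assumption made for this example (anything unlisted falls back to TEXT), and the helper names, table name, and sample values are invented.

```python
import sqlite3

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from range URIs to SQLite types; extend as needed.
SQL_TYPES = {
    XSD + "integer": "INTEGER",
    XSD + "nonNegativeInteger": "INTEGER",
    XSD + "double": "REAL",
    XSD + "float": "REAL",
}

def quote_ident(name):
    """Quote an identifier so labels containing '#' or '-' are accepted."""
    return '"' + name.replace('"', '""') + '"'

def create_table_sql(table, labels, range_uris):
    """Build a CREATE TABLE statement from header 1 and header 4."""
    cols = []
    for i, (label, range_uri) in enumerate(zip(labels, range_uris)):
        if i == 0:
            # The first column holds the instance URI: the primary key.
            cols.append(f"{quote_ident(label)} TEXT PRIMARY KEY")
        else:
            sql_type = SQL_TYPES.get(range_uri, "TEXT")
            cols.append(f"{quote_ident(label)} {sql_type}")
    return f"CREATE TABLE {quote_ident(table)} ({', '.join(cols)})"

labels = ["URI", "rdf-schema#label", "populationTotal"]
range_uris = ["URI", XSD + "string", XSD + "integer"]

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("Town", labels, range_uris))
conn.execute(
    'INSERT INTO "Town" VALUES (?, ?, ?)',
    ("http://dbpedia.org/resource/Example_Town", "Example Town", 1234),
)
```

Multi-valued cells do not fit a single typed column, so in practice they would either be stored as text or moved into a separate table.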

If you want to use the relationships between the instances in your relational database, you need to convert them into primary key / foreign key relationships:

  • For this, you need to convert the object properties into relationships. If you don't want to completely resolve the relationships, you can use only the first column of each object property, which contains the label of the related instance. Otherwise, the second column, which contains the URI of the related instance, needs to be used as a foreign key, and based on the values of the property you can set the cardinality of the relationship.
  • To resolve the is-a relationships you can use the DBpedia ontology structure, where the is-a relationships are explicitly shown.
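
One possible way to resolve a multi-valued object property is a link table with two foreign keys, which models a many-to-many relationship. The sketch below uses SQLite; the table names, column names, and example.org URIs are invented for illustration, and the small `split_values` helper assumes the brace-and-pipe notation for multi-valued cells.

```python
import sqlite3

def split_values(cell):
    """Split a '{a|b}' cell into its values; a plain cell yields one value."""
    text = cell.strip()
    if text.startswith("{") and text.endswith("}"):
        return text[1:-1].split("|")
    return [text] if text else []

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE band (uri TEXT PRIMARY KEY, label TEXT);
CREATE TABLE person (uri TEXT PRIMARY KEY, label TEXT);
CREATE TABLE band_member (
    band_uri TEXT NOT NULL REFERENCES band(uri),
    person_uri TEXT NOT NULL REFERENCES person(uri),
    PRIMARY KEY (band_uri, person_uri)
);
""")

band_uri = "http://example.org/resource/Band_A"
conn.execute("INSERT INTO band VALUES (?, ?)", (band_uri, "Band A"))
conn.executemany("INSERT INTO person VALUES (?, ?)", [
    ("http://example.org/resource/Person_1", "Person 1"),
    ("http://example.org/resource/Person_2", "Person 2"),
])

# URI column of a hypothetical object property on the band row.
member_cell = (
    "{http://example.org/resource/Person_1"
    "|http://example.org/resource/Person_2}"
)
conn.executemany(
    "INSERT INTO band_member VALUES (?, ?)",
    [(band_uri, uri) for uri in split_values(member_cell)],
)

members = conn.execute(
    "SELECT p.label FROM band_member m "
    "JOIN person p ON p.uri = m.person_uri "
    "WHERE m.band_uri = ? ORDER BY p.label",
    (band_uri,),
).fetchall()
```

With foreign keys enforced, inserting a link to an instance that is missing from the target table fails, so the target tables should be loaded first. A single-valued object property could instead be kept as a foreign key column in the instance table itself.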

Generating your own Custom Tables


The tables that we provide for download were generated from the following DBpedia dumps downloaded from the DBpedia download server:

  • instance_types_en
  • labels
  • short_abstracts_en
  • mappingbased_properties_en
  • geo_coordinates_en

If you want other DBpedia data to be included in the tables, you can download the code that we used to generate the CSV tables and run it using other DBpedia dump files. For instance, you could generate CSV dumps for a language other than English or also include the raw infobox data.


The project can be downloaded here.


In order to run the extractor you need to provide a DBpedia SPARQL endpoint. You can use either the official DBpedia SPARQL endpoint or your own local SPARQL endpoint. When using the official DBpedia SPARQL endpoint, you should be aware of the time and resource consumption, as the endpoint contains additional external types, links and properties. If you choose to use a local SPARQL endpoint, you might find this tutorial on setting up a local DBpedia mirror with Virtuoso helpful.
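
Before running the extractor against an endpoint, it can be useful to check that the endpoint answers a small query. The sketch below only builds such a request using the standard SPARQL protocol `query` parameter; it does not reproduce the extractor's own queries. The function name is hypothetical, and the local endpoint URL is an assumption that should be replaced with the address of your own installation.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_sparql_request(endpoint, class_uri, limit=10):
    """Build a GET request that lists a few instances of one ontology class."""
    query = (
        "SELECT ?instance WHERE { "
        f"?instance a <{class_uri}> "
        "} "
        f"LIMIT {int(limit)}"
    )
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

# Assumed address of a local endpoint; adjust host, port and path as needed.
request = build_sparql_request(
    "http://localhost:8890/sparql",
    "http://dbpedia.org/ontology/Band",
)
```

Passing the request to `urllib.request.urlopen` sends it; a small `LIMIT` keeps the check cheap on either the official or a local endpoint.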

Feedback


Please send questions and feedback to Petar Ristoski and/or the DBpedia-discussion mailing list.


 

Information

Last Modification: 2013-12-13 09:43:01 by Chris Bizer