DBpedia DataID

April 29, 2015, Categories: User Applications, Query Results Visualization, URI Lookup Services

DataID

The DBpedia DataID vocabulary is a meta-data system for detailed descriptions of datasets and their different manifestations, as well as relations to agents like persons or organizations, in regard to their rights and responsibilities.

 

Read more about DBpedia Groups at http://dbpedia.org/get-involved/dbpedia-groups
You can support DBpedia Groups via Donations to the DBpedia Association

DataID Ontology:
https://github.com/dbpedia/DataId-Ontology

Issues:
Please use the GitHub issue tracker if you have issues with the ontology:
https://github.com/dbpedia/DataId-Ontology/issues

For every other concern, please use the mailing list.

Mailing list:
Subscribe | mailto:dbpedia-dataid@lists.sourceforge.net


Content

 

The DBpedia DataID Unit is a DBpedia Group that aims to describe datasets of any kind via RDF files, to host and deliver these metadata files together with the dataset in a uniform way, to create and validate such files, and to deploy the results for DBpedia and its local chapters. Established vocabularies like DCAT, VoID, Prov-O and FOAF are reused for maximum compatibility, both to establish a uniform and accepted way to describe and deliver metadata for arbitrary datasets and to put existing standards into practice. Many use cases might profit from adding a simple top-level ontology on top of the DataID vocabulary to fit a particular domain, as demonstrated in the DMP example below. In addition, the DBpedia DataID Unit is creating a service stack that implements a simple API for managing and validating DataIDs. A website for creating and versioning DataIDs, as well as a search interface for existing datasets, will go online in a short while.

How to participate

You can join the DataID unit by writing your name and affiliation under Members. At the moment, discussion takes place on the DBpedia discussion mailing list.

We are open to suggestions concerning the ontology, possible use cases, support for additional metadata platforms and your general participation in this project.

Motivation

A number of established vocabularies for describing datasets exist and are recommended by the W3C. They can be used to indicate where and how the dataset is distributed, what category it belongs to, what other datasets it links to, where example resources can be found, who published it under which license and much more. However, there is no best practice on where this metadata should be published, how it should be maintained and what it is supposed to contain. Distributing this metadata with the dataset can greatly ease the maintenance of dataset entries in data repositories like http://datahub.io/, as well as semantic search and dataset usage. Defining the rights and responsibilities of agents together with the dataset metadata also resolves common uncertainties, such as whom to contact about a dataset or who published it.

Multi layer ontology

Due to its growing complexity and different usage purposes, we modularised the DataID ontology into a core and multiple mid-level ontologies. While every mid-level ontology presented here must import the core ontology, none of the mid-level ontologies is required for describing data. That said, in many use cases some or all of the mid-level ontologies will be a useful extension.

 

The DataID core model

You can take a look at the data model of the DataID core here:

 

The model integrates DCAT, VoID, Prov-O and FOAF. Extensions can be made for typical use cases. Please refer to the mailing lists for more information.
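
As a rough illustration of how these vocabularies can be combined in one dataset description, the following sketch renders a small Turtle document from a handful of metadata fields. It is not the DataID ontology itself: it uses only well-known DCAT, Dublin Core Terms, VoID, Prov-O and FOAF terms, and the class `DatasetDescription`, the function `to_turtle` and all `example.org` URIs are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Namespaces of the reused vocabularies.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


@dataclass
class DatasetDescription:
    """Hypothetical container for the metadata fields of one dataset."""
    uri: str
    title: str
    license_uri: str
    download_url: str
    publisher_uri: str
    publisher_name: str
    triples: int


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(d: DatasetDescription) -> str:
    """Render the description as Turtle, one block per resource."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{d.uri}> a dcat:Dataset, void:Dataset ;",
        f'    dct:title "{escape_literal(d.title)}" ;',
        f"    dct:license <{d.license_uri}> ;",
        f"    dct:publisher <{d.publisher_uri}> ;",
        f"    prov:wasAttributedTo <{d.publisher_uri}> ;",
        f"    void:triples {d.triples} ;",
        f"    dcat:distribution <{d.uri}#distribution> .",
        "",
        f"<{d.uri}#distribution> a dcat:Distribution ;",
        f"    dcat:downloadURL <{d.download_url}> .",
        "",
        f"<{d.publisher_uri}> a foaf:Agent, prov:Agent ;",
        f'    foaf:name "{escape_literal(d.publisher_name)}" .',
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only; none of these URIs refer to real resources.
example = DatasetDescription(
    uri="http://example.org/dataset/demo",
    title="Demo dataset",
    license_uri="http://creativecommons.org/licenses/by/4.0/",
    download_url="http://example.org/dataset/demo.nt",
    publisher_uri="http://example.org/agent/demo-publisher",
    publisher_name="Demo Publisher",
    triples=1000,
)
print(to_turtle(example))
```

The output is plain text, so it can be stored next to the dataset and checked with any Turtle parser. A real DataID would additionally use terms from the DataID ontology linked at the top of this page.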

 

Reporting

DBpedia Groups report to other relevant community groups to get feedback, e.g. W3C groups, OKFN or Wikimedia.
Furthermore, summary reports are sent to associated industry partners of DBpedia (sign-up via dbpedia@infai.org )
This group will report to:

 

Specific goals

  1. Creating a DataID file for the DBpedia project as a whole: In the process of creating this file, upcoming development and modeling questions will be solved iteratively on the go. The result will be deployed for the main DBpedia project, as well as the local chapters.
  2. DataID generator: A generator app will be developed that can be used to generate a DataID file from metadata entered into a form.
  3. Validator Service for the File: RDFUnit will be used to establish a validator service for DataID files. This service will consume a dataset URI, access the DataID file for this dataset and validate it for compliance with the established format.
  4. Compliance with DataHub: We will try to either establish a service that automatically transfers the DBpedia DataID metadata to http://datahub.io/ or, preferably, get the datahub.io team to allow for automatic retrieval of DataID files by datahub.io at regular intervals.
  5. Statistical module: A statistical module will be developed that automatically generates statistical data about the dataset (like triple count, SPARQL service uptime, ontology usage and links to other datasets) from a DataID file given as input. LODStats and SparqlES will be used to facilitate this task.
  6. Data Management Plan: Document the progress of a DMP for (research) projects by extending the DataID base ontology with additional information like a preservation concept. This should be of help for any project dealing with datasets of any sort.
  7. Spread the word: Implement the resulting practice by establishing DataID for as many datasets as possible to finally have a universally accepted way of dataset description that is delivered by the datasets themselves.
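
To make the statistical module (goal 5) more concrete, here is a minimal sketch of how triple count, predicate usage and links to other datasets could be derived from an N-Triples dump. It is a plain-Python stand-in and does not use LODStats or SparqlES. The function name `dataset_statistics`, the simplified line pattern and the idea of treating every IRI object on a foreign host as an outgoing link are assumptions made for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified N-Triples line pattern: subject, predicate, object, final dot.
# It is an approximation and does not cover the full N-Triples grammar.
TRIPLE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def dataset_statistics(ntriples: str, home_host: str) -> dict:
    """Count triples, predicate usage and IRI objects on foreign hosts."""
    triple_count = 0
    predicate_usage = Counter()
    linked_hosts = Counter()
    for line in ntriples.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip lines the simplified pattern cannot read
        triple_count += 1
        predicate_usage[match.group(2)] += 1
        obj = match.group(3)
        if obj.startswith("<") and obj.endswith(">"):
            host = urlparse(obj[1:-1]).netloc
            if host and host != home_host:
                linked_hosts[host] += 1  # crude proxy for an outgoing link
    return {
        "triple_count": triple_count,
        "predicate_usage": dict(predicate_usage),
        "linked_hosts": dict(linked_hosts),
    }


# Illustrative data; the example.org and example.com URIs are made up.
sample = """\
<http://example.org/a> <http://www.w3.org/2002/07/owl#sameAs> <http://other.example.com/a> .
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "A" .
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "B. C." .
# a comment line
"""
stats = dataset_statistics(sample, home_host="example.org")
print(stats)
```

Counting foreign hosts overestimates links, since class and property IRIs from external ontologies are counted as well. A production module would need a full RDF parser and a clearer definition of what counts as a link.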

 

Benefits

Beyond providing an expressive metadata vocabulary for datasets, DataID and its ecosystem will offer many additional benefits for data engineers, publishers, maintainers and users. We try to give an overview of our overall vision in this presentation:

In summary:

  • Easy generation of DataIDs via web interface
  • Validation of DataIDs
  • Publishing validated DataIDs to dataid.dbpedia.org, datahub.io and others
  • Providing a search interface for existing datasets
  • Dockerizing datasets
  • RDF datasets are provided with additional perks (Dynamic LOD...)
  • Automation for many use cases via a simple API
  • Event message support for important life cycle events
  • Version control system for DataIDs and datasets
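
The validation step listed above can be pictured as a set of checks for required properties. The sketch below is a plain-Python stand-in, not RDFUnit, and the rules in `REQUIRED` are assumptions made for illustration, not the actual constraints of the DataID ontology. A description is given as a list of `(subject, predicate, object)` tuples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Assumed rule set: properties every instance of a class must carry.
REQUIRED = {
    DCAT + "Dataset": [DCT + "title", DCT + "license", DCAT + "distribution"],
    DCAT + "Distribution": [DCAT + "downloadURL"],
}


def validate(triples):
    """Return one message per required property that a typed resource lacks."""
    types = {}  # subject -> set of class IRIs
    props = {}  # subject -> set of predicate IRIs used on it
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
        else:
            props.setdefault(subject, set()).add(predicate)
    violations = []
    for subject in sorted(types):
        for cls in sorted(types[subject]):
            for required in REQUIRED.get(cls, []):
                if required not in props.get(subject, set()):
                    violations.append(
                        f"{subject}: missing {required} (required for {cls})"
                    )
    return violations


# Illustrative description with a missing license; all URIs are made up.
description = [
    ("http://example.org/ds", RDF_TYPE, DCAT + "Dataset"),
    ("http://example.org/ds", DCT + "title", "Demo dataset"),
    ("http://example.org/ds", DCAT + "distribution", "http://example.org/ds#dist"),
    ("http://example.org/ds#dist", RDF_TYPE, DCAT + "Distribution"),
    ("http://example.org/ds#dist", DCAT + "downloadURL", "http://example.org/ds.nt"),
]
for message in validate(description):
    print(message)
```

An empty result means that all assumed rules are satisfied. Keeping the rules in a data structure rather than in code makes it easy to swap in a different rule set.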

 

Members

Markus Freudenberg - <PI>
Martin Brümmer – http://aksw.org/MartinBruemmer
Ciro Baron - http://aksw.org/CiroBaron.html
Ivan Ermilov – http://aksw.org/IvanErmilov.html
Dimitris Kontokostas – http://aksw.org/DimitrisKontokostas

Results

DataID generator

A visual, easy-to-use tool for creating DataIDs can be found here:

http://dataid.dbpedia.org/