DBpedia DataID Unit

Read more about DBpedia Groups at http://wiki.dbpedia.org/coop/
You can support DBpedia Groups via Donations to the DBpedia Association
GitHub: https://github.com/dbpedia/dataId
Mailing list: Subscribe | email: dbpedia-dataid@lists.sourceforge.net


The DBpedia DataID Unit is a DBpedia Group whose goal is to describe LOD datasets via RDF files, to host and deliver these metadata files together with the dataset in a uniform way, to create and validate such files, and to deploy the results for DBpedia and its local chapters. Established vocabularies such as DCAT, VoID, Prov-O and SPARQL Service Description are reused for maximum compatibility. In this way, we hope to establish a uniform and accepted way to describe and deliver dataset metadata for arbitrary LOD datasets and to put existing standards into practice.

How to participate


You can join the DataID Unit by adding your name and affiliation under Members. For now, discussion takes place on the DBpedia discussion mailing list.


Contributions can take many forms: for example, you can implement a service for DataID, generate statistics with your own tool and link them, or add custom properties such as dataid:lodStatsLink or dataid:sparqlesLink (or whatever you might need).
A simple but important contribution is to add a DataID Turtle file to your dataset, so that you describe your data yourself.
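As a rough illustration of what such a self-description might contain, the following Python sketch renders a small Turtle document from a metadata dictionary. It uses only terms from DCAT, VoID and Dublin Core Terms. The dictionary keys, the function names and the example.org URIs are assumptions made for this example, and the output is not a complete or validated DataID file.

```python
# Namespace IRIs of the reused vocabularies (DCAT, Dublin Core Terms, VoID).
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "void": "http://rdfs.org/ns/void#",
}


def turtle_literal(value):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def build_dataset_description(meta):
    """Render a small Turtle description from a metadata dict.

    The keys of `meta` are hypothetical names chosen for this sketch.
    """
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{meta['uri']}>")
    lines.append("    a dcat:Dataset, void:Dataset ;")
    lines.append(f"    dct:title {turtle_literal(meta['title'])} ;")
    lines.append(f"    dct:license <{meta['license']}> ;")
    lines.append(f"    void:sparqlEndpoint <{meta['sparql_endpoint']}> ;")
    lines.append("    dcat:distribution [")
    lines.append("        a dcat:Distribution ;")
    lines.append(f"        dcat:downloadURL <{meta['download_url']}>")
    lines.append("    ] .")
    return "\n".join(lines) + "\n"


# Placeholder metadata; every URI below is an example value, not a real dataset.
example = {
    "uri": "http://example.org/dataset/my-data",
    "title": "My example dataset",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "sparql_endpoint": "http://example.org/sparql",
    "download_url": "http://example.org/dumps/my-data.nt",
}

if __name__ == "__main__":
    print(build_dataset_description(example))
```

The resulting text could be saved next to the dataset dump and extended with further properties as needed.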

Motivation


A number of established vocabularies for describing datasets exist and are recommended by the W3C. They can be used to indicate where and how a dataset is distributed, what category it belongs to, what other datasets it links to, where example resources can be found, who published it under which license, and much more. However, there is no best practice on where this metadata should be published, how it should be maintained and what it is supposed to contain. Distributing this metadata with the dataset can greatly ease the maintenance of dataset entries in data repositories like http://datahub.io/, as well as semantic search and dataset usage.


Reporting

DBpedia Groups report to other relevant community groups to get feedback, e.g. W3C groups, OKFN or Wikimedia.
Furthermore, summary reports are sent to associated industry partners of DBpedia (sign-up via dbpedia@infai.org).
This group will report to:

Specific goals


  1. Creating a DataID file for the DBpedia project as a whole: In the process of creating this file, upcoming development and modeling questions will be solved iteratively on the go. The result will be deployed for the main DBpedia project, as well as the local chapters.
  2. DataID generator: A generator app will be developed that can be used to generate a DataID file from metadata entered into a form.
  3. Validator service for the file: RDFUnit will be used to establish a validator service for DataID files. This service will consume a dataset URI, access the DataID file for this dataset and validate it for compliance with the established format.
  4. Compliance with DataHub: We will try either to establish a service that automatically transfers the DBpedia DataID metadata to http://datahub.io/ or, preferably, to get the datahub.io team to allow automatic retrieval of DataID files by datahub.io at regular intervals.
  5. Statistical module: A statistical module will be developed that, given a DataID file as input, automatically generates statistical data about the dataset (such as triple count, SPARQL service uptime, ontology usage and links to other datasets). LODStats and SparqlES will be used to facilitate this task.
  6. Spread the word: Implement the resulting practice by establishing DataID for as many datasets as possible to finally have a universally accepted way of dataset description that is delivered by the datasets themselves.
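To make the kind of figures named in the statistical module goal more concrete, here is a minimal Python sketch that computes a triple count, predicate usage and a rough count of links to other datasets from N-Triples text. It is not how LODStats or SparqlES work; the function name, the namespace argument and the sample data are assumptions for illustration, and the link count is a crude heuristic that treats every IRI object outside the dataset namespace as an external link.

```python
from collections import Counter


def dataset_statistics(ntriples_text, dataset_namespace):
    """Compute simple statistics over N-Triples text (one triple per line).

    `dataset_namespace` is the IRI prefix regarded as belonging to the dataset.
    """
    triple_count = 0
    external_links = 0
    predicate_usage = Counter()

    for raw_line in ntriples_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip lines that are not subject/predicate/object
        _subject, predicate, obj = parts
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator

        triple_count += 1
        predicate_usage[predicate] += 1
        # Heuristic: an IRI object outside the dataset namespace is a link.
        if obj.startswith("<") and not obj.startswith("<" + dataset_namespace):
            external_links += 1

    return {
        "triples": triple_count,
        "external_links": external_links,
        "predicate_usage": dict(predicate_usage),
    }


# Made-up sample data using example.org as the dataset namespace.
SAMPLE = """\
# sample data
<http://example.org/r/A> <http://www.w3.org/2002/07/owl#sameAs> <http://dbpedia.org/resource/A> .
<http://example.org/r/A> <http://www.w3.org/2000/01/rdf-schema#label> "A." .
<http://example.org/r/A> <http://example.org/p/next> <http://example.org/r/B> .
"""

if __name__ == "__main__":
    print(dataset_statistics(SAMPLE, "http://example.org/"))
```

A real module would use a proper RDF parser and would obtain the dump location from the dataset's DataID file rather than from a string.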

Members


Martin Brümmer – http://aksw.org/MartinBruemmer
Ciro Baron
Ivan Ermilov – http://aksw.org/IvanErmilov.html
Markus Freudenberg
Dimitris Kontokostas – http://aksw.org/DimitrisKontokostas

Results

Data model


You can take a look at the data model of the DataID here:


http://mlode.nlp2rdf.org/dataid_vocab.png


The model integrates DCAT, VoID, Prov-O and SPARQL Service Description. Extensions can be made for typical use cases. Please refer to the mailing lists for more information.

DataID generator


A visual, easy-to-use tool for creating DataID files can be found here:


http://dataid.dbpedia.org/