NIF Abstract Datasets

DBpedia currently focuses primarily on representing factual knowledge as contained in Wikipedia infoboxes. A vast amount of information, however, is contained in the unstructured text of Wikipedia articles. To broaden and deepen the structured data in DBpedia, we are going a step further.
By representing wiki pages in the NLP Interchange Format (NIF), we provide all information that can be extracted directly from the HTML source code, divided into three datasets:

  • nif-context: the full text of a page as the context (including its begin and end indices)
  • nif-page-structure: the structure of the page in sections and paragraphs (titles, subsections etc.)
  • nif-text-links: all in-text links to other DBpedia resources as well as external references

These datasets will serve as the groundwork for further NLP fact extraction tasks to enrich the gathered knowledge of DBpedia.

Note: As a trial run, this first iteration of the extraction covers only the abstract of each wiki page. Starting with release 2016-10, we will provide the full text of each wiki page in NIF.

IRIs: As the examples below show, and unlike the IRI scheme used for other DBpedia datasets, these IRIs carry a query string that records the DBpedia version under which the instances were extracted (dbpv) and the NIF element they denote (nif).
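A minimal Python sketch of how such IRIs can be built and taken apart, assuming that `dbr:` expands to http://dbpedia.org/resource/ and that every IRI follows the `?dbpv=<version>&nif=<element>` pattern seen in the examples below. `nif_iri` and `parse_nif_iri` are hypothetical helpers, not part of any DBpedia tool.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: "dbr:" expands to the standard DBpedia resource namespace.
DBR = "http://dbpedia.org/resource/"


def nif_iri(resource, version, element, begin=None, end=None):
    """Build an IRI like dbr:Anthropology?dbpv=2016-04&nif=word_29_37.

    Hypothetical helper: the pattern is inferred from the examples, and the
    resource name is used as-is (no percent-encoding of special characters).
    """
    nif = element if begin is None else f"{element}_{begin}_{end}"
    return f"{DBR}{resource}?dbpv={version}&nif={nif}"


def parse_nif_iri(iri):
    """Split such an IRI into (resource, DBpedia version, NIF element)."""
    parts = urlsplit(iri)
    query = parse_qs(parts.query)
    resource = parts.path.removeprefix("/resource/")
    return resource, query["dbpv"][0], query["nif"][0]


iri = nif_iri("Anthropology", "2016-04", "word", 29, 37)
print(iri)                 # http://dbpedia.org/resource/Anthropology?dbpv=2016-04&nif=word_29_37
print(parse_nif_iri(iri))  # ('Anthropology', '2016-04', 'word_29_37')
```

A page title containing `&` or `?` would need percent-encoding before this parsing step works; the sketch does not handle that case.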

If you find inconsistencies in these files, please contact the DBpedia mailing lists or the DBpedia Association directly. Thank you.

Downloads

All files are listed in the table below and are also available directly here: http://downloads.dbpedia.org/2016-04/ext/nif-abstracts/

Language  nif-abstract-context  nif-page-structure  nif-text-links
de        .ttl  .tql            .ttl  .tql          .ttl  .tql
en        .ttl  .tql            .ttl  .tql          .ttl  .tql
es        .ttl  .tql            .ttl  .tql          .ttl  .tql
fr        .ttl  .tql            .ttl  .tql          .ttl  .tql
it        .ttl  .tql            .ttl  .tql          .ttl  .tql
ja        .ttl  .tql            .ttl  .tql          .ttl  .tql
ko        .ttl  .tql            .ttl  .tql          .ttl  .tql
pl        .ttl  .tql            .ttl  .tql          .ttl  .tql
pt        .ttl  .tql            .ttl  .tql          .ttl  .tql


Examples

nif-context.ttl

The full text of a wiki page as the context for all subsequent information about this page.

dbr:Anthropology?dbpv=2016-04&nif=context    a    nif:Context .

dbr:Anthropology?dbpv=2016-04&nif=context    nif:isString    "Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history." .

dbr:Anthropology?dbpv=2016-04&nif=context    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:endIndex      "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:sourceUrl     <http://en.wikipedia.org/wiki/Anthropology> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:predLang     <http://lexvo.org/id/iso639-3/eng> .
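Because the context holds the entire abstract, its beginIndex and endIndex should span the whole nif:isString value. A minimal Python check, assuming offsets count characters from 0 with an exclusive end; `context_span_ok` is a hypothetical helper.

```python
# Values from the nif-context example above.
BEGIN_INDEX = 0
END_INDEX = 634

# nif:isString value of dbr:Anthropology?dbpv=2016-04&nif=context
CONTEXT = (
    "Anthropology is the study of humanity. Its main subdivisions are social "
    "anthropology and cultural anthropology, which describes the workings of "
    "societies around the world, linguistic anthropology, which investigates the "
    "influence of language in social life, and biological or physical "
    "anthropology, which concerns long-term development of the human organism. "
    "Archaeology, which studies past human cultures through investigation of "
    "physical evidence, is thought of as a branch of anthropology in the United "
    "States, although in Europe, it is viewed as a discipline in its own right, "
    "or grouped under related disciplines such as history."
)


def context_span_ok(text, begin, end):
    """Check that the half-open range [begin, end) covers the whole text."""
    return begin == 0 and end == len(text)


print(len(CONTEXT))                                      # 634
print(context_span_ok(CONTEXT, BEGIN_INDEX, END_INDEX))  # True
```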

nif-page-structure.ttl

The structure of the wiki page as nif:Structure instances, such as Section, Paragraph and Title.

dbr:Anthropology?dbpv=2016-04&nif=context    nif:hasSection    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .

dbr:Anthropology?dbpv=2016-04&nif=section_0_634    a    nif:Section    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:firstParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:lastParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:endIndex    "330"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:beginIndex    "331"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .
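The nesting in nif-page-structure can be checked with plain offsets. A minimal Python sketch, assuming spans are zero-based with an exclusive end and that nif:firstParagraph and nif:lastParagraph name the paragraphs with the smallest and largest begin index; `Span` and `Section` are hypothetical classes, not DBpedia code.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A NIF string span: zero-based begin, exclusive end (assumed)."""
    kind: str    # e.g. "section" or "paragraph"
    begin: int
    end: int

    @property
    def fragment(self):
        # Local part used in the example IRIs, e.g. "paragraph_0_330"
        return f"{self.kind}_{self.begin}_{self.end}"

    def contains(self, other):
        # True if `other` lies entirely inside this span (nif:superString)
        return self.begin <= other.begin and other.end <= self.end


@dataclass
class Section(Span):
    paragraphs: list = field(default_factory=list)

    def first_paragraph(self):
        return min(self.paragraphs, key=lambda p: p.begin)

    def last_paragraph(self):
        return max(self.paragraphs, key=lambda p: p.begin)


section = Section("section", 0, 634)
section.paragraphs = [Span("paragraph", 0, 330), Span("paragraph", 331, 634)]

assert all(section.contains(p) for p in section.paragraphs)
print(section.fragment)                    # section_0_634
print(section.first_paragraph().fragment)  # paragraph_0_330
print(section.last_paragraph().fragment)   # paragraph_331_634
```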

 

nif-text-links.ttl

All in-text links of a wiki page as nif:Word or nif:Phrase.

dbr:Anthropology?dbpv=2016-04&nif=word_29_37    a    nif:Word .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:beginIndex    "29"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:endIndex    "37"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Human .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:anchorOf    "humanity" .

dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    a    nif:Phrase    .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:beginIndex    "65"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:endIndex    "84"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Social_anthropology .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:anchorOf    "social anthropology" .
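Each link's nif:anchorOf should equal the slice of the context string between its begin and end index, and its superString should be the paragraph whose span encloses it. A minimal Python sketch that checks this for the two links above, assuming zero-based, end-exclusive character offsets. The Word/Phrase rule in the hypothetical `link_type` helper (several tokens means nif:Phrase) is an assumption drawn from these two examples only.

```python
# Opening of the nif:isString value; both links below fall inside it.
CONTEXT_PREFIX = ("Anthropology is the study of humanity. Its main subdivisions "
                  "are social anthropology and cultural anthropology")

# Paragraph spans (begin, end) from the nif-page-structure example.
PARAGRAPHS = [(0, 330), (331, 634)]

# (begin, end, taIdentRef, anchorOf) from the nif-text-links example.
LINKS = [
    (29, 37, "dbr:Human", "humanity"),
    (65, 84, "dbr:Social_anthropology", "social anthropology"),
]


def link_type(anchor):
    # Assumption: a single token is a nif:Word, several tokens a nif:Phrase.
    return "nif:Phrase" if any(ch.isspace() for ch in anchor) else "nif:Word"


def containing_paragraph(begin, end):
    """Return the fragment of the paragraph enclosing [begin, end), if any."""
    for p_begin, p_end in PARAGRAPHS:
        if p_begin <= begin and end <= p_end:
            return f"paragraph_{p_begin}_{p_end}"
    return None


for begin, end, target, anchor in LINKS:
    assert CONTEXT_PREFIX[begin:end] == anchor  # anchorOf is the sliced text
    kind = link_type(anchor).split(":")[1].lower()
    print(f"{kind}_{begin}_{end}", link_type(anchor), target,
          containing_paragraph(begin, end))
# word_29_37 nif:Word dbr:Human paragraph_0_330
# phrase_65_84 nif:Phrase dbr:Social_anthropology paragraph_0_330
```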