DBpedia 2015-10 Dataset Statistics
This page provides detailed statistics about the DBpedia 2015-10 release. The release contains localized editions of DBpedia for 127 languages, each extracted from the Wikipedia edition in the corresponding language. For the 53 languages with mappings between the raw Wikipedia infoboxes and the DBpedia ontology, we computed statistical overviews containing the overall number of things (instances) as well as the number of facts (statements) extracted from the infoboxes describing these things. The second table compares these numbers with the corresponding ones from the previous release (2015-04). Additionally, we report the number of instances of popular classes and properties within these 53 DBpedia editions.
All statistics are also available as files in JSON format here.
1. Entity Type Instances, Properties, and Statements per Language
The same thing, for instance a person or city, might be described by multiple pages within Wikipedia editions in different languages. Pages describing the same thing are often interlinked by cross-language links within Wikipedia.
When DBpedia extracts data from these pages, it produces two types of data sets. The localized data sets contain all things that are described in a specific language; in these, things are identified with a language-specific IRI. In addition, we produce a canonicalized data set for each language. The canonicalized data sets only contain things for which a corresponding page exists in the English edition of Wikipedia. Within all canonicalized data sets, the same thing is identified with the same IRI from the generic namespace http://dbpedia.org/resource/.
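As a rough illustration of the two identification schemes, the following Python sketch derives a localized and a canonicalized IRI for a page. The `http://<lang>.dbpedia.org/resource/` pattern, the function names, and the toy cross-language link table are assumptions made for illustration; only the generic namespace is taken from the text above.

```python
from typing import Dict, Optional, Tuple

# Generic namespace shared by all canonicalized data sets (from the text above).
GENERIC_NS = "http://dbpedia.org/resource/"


def localized_iri(lang: str, title: str) -> str:
    """Build a language-specific IRI (assumed namespace pattern)."""
    return f"http://{lang}.dbpedia.org/resource/{title.replace(' ', '_')}"


def canonical_iri(
    lang: str,
    title: str,
    english_links: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Return the generic IRI if the page links to an English page, else None.

    `english_links` is a hypothetical lookup table that maps
    (language, local title) to the title of the English Wikipedia page.
    """
    english_title = english_links.get((lang, title))
    if english_title is None:
        # No English counterpart: the thing is absent from the canonicalized set.
        return None
    return GENERIC_NS + english_title.replace(" ", "_")


# Toy cross-language link table (illustrative only).
links = {("de", "München"): "Munich"}

munich_local = localized_iri("de", "München")
munich_canonical = canonical_iri("de", "München", links)
unlinked = canonical_iri("de", "Seite ohne Gegenstück", links)
```

In this sketch a page without an English counterpart still receives a localized IRI but yields `None` for the canonicalized one, which mirrors why the canonicalized data sets are subsets of the localized ones.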
DBpedia uses two different extractors to extract data from Wikipedia infoboxes. The mapping-based extractor extracts data only for the infoboxes for which a language-specific extraction mapping to the DBpedia ontology exists in the DBpedia mapping wiki. Based on these mappings, it normalizes the different names that are used in various languages to refer to the same property. The second extractor is the raw infobox extractor which uses a generic heuristic to extract data from all infoboxes. The raw infobox extractor does not normalize property names but produces language-specific properties that directly reflect the property name in the Wikipedia infobox.
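The difference between the two extractors can be sketched in Python as follows. This is a simplified model, not the actual extraction framework: the mapping table, the template and property names, and both namespace strings are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

# Assumed namespace for normalized ontology properties.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"

# Hypothetical extraction mappings: (language, infobox template) ->
# {infobox property name -> ontology property name}.
MAPPINGS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "Infobox settlement"): {"population_total": "populationTotal"},
    ("de", "Infobox Ort"): {"einwohner": "populationTotal"},
}

Statement = Tuple[str, str]  # (property IRI, raw value)


def mapping_based_extract(
    lang: str, template: str, infobox: Dict[str, str]
) -> List[Statement]:
    """Emit statements only for mapped templates, using normalized property names."""
    mapping = MAPPINGS.get((lang, template))
    if mapping is None:
        return []  # no mapping for this template: nothing is extracted
    return [
        (ONTOLOGY_NS + mapping[name], value)
        for name, value in infobox.items()
        if name in mapping
    ]


def raw_extract(lang: str, infobox: Dict[str, str]) -> List[Statement]:
    """Emit one statement per infobox entry, keeping the original property name."""
    ns = f"http://{lang}.dbpedia.org/property/"  # assumed namespace pattern
    return [(ns + name, value) for name, value in infobox.items()]


german_infobox = {"einwohner": "1500000", "wappen": "Wappen.svg"}
mapped = mapping_based_extract("de", "Infobox Ort", german_infobox)
raw = raw_extract("de", german_infobox)
```

Here the German `einwohner` and the English `population_total` both end up under one ontology property in the mapping-based output, while the raw output keeps every infobox key as a language-specific property.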
Below we report the overall number of things (instances), different ontology and raw-infobox properties, infobox statements and type statements for all 53 languages for which mappings exist in the DBpedia mapping wiki.
2. Comparison with the 2015-04 release
The following table integrates the dataset statistics for DBpedia 2015-04 with the statistics presented above, thus allowing for comparison between the versions. The %-columns contain the increase in the number of instances/statements in version 2015-10 with respect to 2015-04.
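The %-columns can be read as a relative change between the two releases. A minimal sketch, assuming the usual definition (new count minus old count, divided by the old count); the function name and the sample counts are placeholders, not figures from either release.

```python
from typing import Optional


def percent_increase(old_count: int, new_count: int) -> Optional[float]:
    """Relative change from old_count to new_count, in percent.

    Returns None when old_count is zero, because the relative change
    is undefined in that case (e.g. a language newly added to a release).
    """
    if old_count == 0:
        return None
    return 100.0 * (new_count - old_count) / old_count


# Placeholder counts for illustration only.
growth = percent_increase(1000, 1100)
shrinkage = percent_increase(2000, 1500)
undefined = percent_increase(0, 500)
```

A negative result indicates that a count dropped between releases under this definition.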
3. Entity Type (or Class) Statistics
This table provides an overview of the number of instances of a given type in each language edition.