Dataset 2015-04 statistics

under construction

DBpedia 2015-04 Data Set Statistics


This page provides statistics about the DBpedia 2015-04 release. The release contains localized editions of DBpedia for 128 languages, each extracted from the Wikipedia edition in the corresponding language. For the 34 languages for which mappings exist in the DBpedia mapping wiki, we report the overall number of things (instances) described in the localized version of DBpedia, as well as the number of facts (statements) extracted from the infoboxes describing these things. Afterwards, we report the number of instances of popular classes within these 34 DBpedia editions.

Dataset statistics for DBpedia 2014 are available on a separate page. Below, we compare the numbers of the two releases.



1 Instances, Properties, and Statements per Language

The same thing, for instance a person or city, might be described by multiple pages within Wikipedia editions in different languages. Pages describing the same thing are often interlinked by cross-language links within Wikipedia.

When DBpedia extracts data from these pages, it produces two types of data sets. The localized data sets contain all things that are described in a specific language edition, and each thing is identified by a language-specific URI. In addition, we produce a canonicalized data set for each language. The canonicalized data sets only contain things for which a corresponding page exists in the English edition of Wikipedia. Within all canonicalized data sets, the same thing is identified by the same URI from the generic namespace http://dbpedia.org/resource/.
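The relation between localized and canonicalized data sets can be illustrated with a small Python sketch. It assumes that localized resources use a namespace of the form http://<lang>.dbpedia.org/resource/ and that the cross-language links are available as a plain dictionary from local page titles to English page titles. The function names and example titles are hypothetical; this is not the code of the DBpedia extraction framework.

```python
# Hypothetical sketch: derive canonicalized URIs from localized ones.
# Assumptions (not taken from the DBpedia code base):
#   - localized resources use the namespace http://<lang>.dbpedia.org/resource/
#   - cross-language links are a dict: local page title -> English page title
GENERIC_NS = "http://dbpedia.org/resource/"


def localized_uri(lang: str, title: str) -> str:
    """Build a language-specific URI for a page title (assumed pattern)."""
    return f"http://{lang}.dbpedia.org/resource/{title}"


def canonicalize(lang: str, titles: list[str], en_links: dict[str, str]) -> dict[str, str]:
    """Map localized URIs to generic URIs; drop things without an English page."""
    result = {}
    for title in titles:
        en_title = en_links.get(title)
        if en_title is None:
            continue  # no English counterpart: not part of the canonicalized data set
        result[localized_uri(lang, title)] = GENERIC_NS + en_title
    return result


# Illustrative input: two German pages (one hypothetical), one French page.
de = canonicalize("de", ["München", "Beispielort"], {"München": "Munich"})
fr = canonicalize("fr", ["Munich"], {"Munich": "Munich"})
print(de)
# Both language editions end up with the same generic URI for the city.
print(set(de.values()) == set(fr.values()))  # True
```

The German page without an English counterpart is dropped, while the German and French pages about the same city receive the same generic URI, mirroring the two rules described above.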

DBpedia uses two different extractors to extract data from Wikipedia infoboxes. The mapping-based extractor extracts data only from infoboxes for which a language-specific extraction mapping to the DBpedia ontology exists in the DBpedia mapping wiki. Based on these mappings, it normalizes the different names that are used in various languages to refer to the same property. The second extractor is the raw infobox extractor, which uses a generic heuristic to extract data from all infoboxes. The raw infobox extractor does not normalize property names but produces language-specific properties that directly reflect the property names used in the Wikipedia infobox.
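The difference between the two extractors can be sketched as follows. In this simplified Python model an infobox is a plain dictionary, the mapping table is keyed by (language, property name) instead of by infobox template, and the raw property namespace http://<lang>.dbpedia.org/property/ is assumed. All mapping entries, property names, and subjects are illustrative, not taken from the mapping wiki.

```python
# Hypothetical sketch contrasting the two infobox extractors.
ONTOLOGY_NS = "http://dbpedia.org/ontology/"

# Illustrative mapping entries keyed by (language, infobox property name);
# real mappings in the mapping wiki are defined per infobox template.
MAPPINGS = {
    ("de", "geburtsort"): "birthPlace",
    ("fr", "lieu_de_naissance"): "birthPlace",
}


def raw_extract(lang, subject, infobox):
    """Raw extractor: keep every infobox property under its original name."""
    ns = f"http://{lang}.dbpedia.org/property/"  # assumed namespace pattern
    return [(subject, ns + name, value) for name, value in infobox.items()]


def mapping_extract(lang, subject, infobox):
    """Mapping-based extractor: keep only mapped properties, with normalized names."""
    triples = []
    for name, value in infobox.items():
        prop = MAPPINGS.get((lang, name))
        if prop is not None:  # unmapped properties are skipped
            triples.append((subject, ONTOLOGY_NS + prop, value))
    return triples


de_box = {"geburtsort": "Ulm", "farbe": "blau"}
fr_box = {"lieu_de_naissance": "Ulm"}
print(mapping_extract("de", "de:Beispiel", de_box))
print(mapping_extract("fr", "fr:Exemple", fr_box))
print(raw_extract("de", "de:Beispiel", de_box))
```

The mapping-based extractor yields the same ontology property for the German and the French infobox and ignores the unmapped property, whereas the raw extractor keeps every property under its language-specific name.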

Below we report the overall number of things (instances), different ontology and raw-infobox properties, infobox statements, and type statements for all 34 languages for which mappings exist in the DBpedia mapping wiki. Note that the rows of this table are not consistently sorted by any single column.

The column headings have the following meanings:

  • LD = Localized Data Sets.
  • CD = Canonicalized Data Sets.
  • all = Overall number of instances in the data set, calculated based on the labels and redirects dumps.
  • withMD = Number of instances for which mapping-based infobox data exists.
  • Raw Properties = Number of different properties that are generated by the raw infobox extractor.
  • Mapping Properties = Number of different properties that are generated by the mapping-based infobox extractor.
  • Raw Statements = Number of statements (facts) that are generated by the raw infobox extractor.
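  • Mapping Statements = Number of statements (facts) that are generated by the mapping-based infobox extractor; includes type statements.
  • Type Statements = Number of type statements; these are included in the Mapping Statements count.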
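The legend only states that the "all" column is calculated based on the labels and redirects dumps. One plausible reading, which is an assumption rather than the documented procedure, is to count the distinct subjects of the labels dump that do not also appear as subjects in the redirects dump. The Python sketch below does this on a few illustrative N-Triples lines, using rdfs:label and the DBpedia ontology property wikiPageRedirects as predicates.

```python
# One possible reading of "calculated based on the labels and redirects dumps";
# this is an assumption, not the documented DBpedia procedure.
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
REDIRECT = "<http://dbpedia.org/ontology/wikiPageRedirects>"


def subjects(ntriples_lines, predicate):
    """Collect the subjects of N-Triples lines that use the given predicate."""
    found = set()
    for line in ntriples_lines:
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] == predicate:
            found.add(parts[0])
    return found


def count_instances(label_lines, redirect_lines):
    """Count labelled subjects that are not themselves redirect pages."""
    return len(subjects(label_lines, LABEL) - subjects(redirect_lines, REDIRECT))


# Illustrative dump excerpts (not real DBpedia data).
labels = [
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://dbpedia.org/resource/Berlin,_Germany> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin, Germany"@en .',
    '<http://dbpedia.org/resource/Paris> <http://www.w3.org/2000/01/rdf-schema#label> "Paris"@en .',
]
redirects = [
    '<http://dbpedia.org/resource/Berlin,_Germany> <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/Berlin> .',
]
print(count_instances(labels, redirects))  # 2
```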

 

Language Instances, LD, all Instances, CD, withMD Raw Properties, CD Mapping Properties, CD Raw Statements, CD Mapping Statements, CD Type Statements, CD
en 4,806,150 4,563,644 58,781 1,354 73,627,718 67,054,254 35,361,157
sv 1,957,255 317,205 8,653 292 6,368,428 2,424,462 1,744,362
nl 1,809,753 470,836 8,410 682 8,593,067 7,010,748 3,986,099
de 1,783,367 618,708 12,278 449 10,558,444 9,531,499 4,941,011
fr 1,596,749 479,250 15,867 617 13,060,439 7,844,148 4,393,714
ru 1,200,495 289,957 15,210 146 7,739,284 4,130,013 2,483,020
it 1,184,624 572,527 11,026 243 15,145,492 8,376,000 4,697,637
es 1,137,454 452,002 17,739 461 9,804,993 7,478,271 4,170,975
pl 1,093,887 421,177 7,931 216 9,268,776 6,431,245 3,905,668
ja 954,225 139,230 17,093 341 4,901,217 2,286,184 1,227,785
pt 867,242 333,853 15,798 509 7,813,707 5,492,790 2,730,559
uk 562,859 173,765 11,477 146 4,868,431 1,812,301 1,472,988
sr 313,993 143,815 6,219 465 2,516,774 2,151,739 1,147,021
ca 451,597 131,582 10,683 172 3,956,645 1,878,009 1,245,404
eu 206,201 102,029 3,153 124 2,099,038 1,262,626 893,497
hu 285,462 85,716 8,002 261 2,678,349 1,055,096 718,434
eo 210,455 72,421 4,779 76 1,656,603 674,639 533,400
cs 313,670 64,037 6,519 318 2,495,725 1,086,169 603,325
ko 300,615 62,977 8,990 377 1,537,444 1,047,136 578,888
tr 243,976 51,731 9,250 366 1,824,827 780,119 447,588
id 357,191 42,072 12,325 319 1,763,491 656,726 376,475
ro 264,973 507 8,950 27 2,622,736 7,984 4,235
ar 337,476 41,032 12,598 248 2,248,799 508,173 342,520
el 104,882 38,517 5,132 455 487,085 469,462 328,483
sl 144,508 31,157 5,702 371 1,035,836 393,844 296,179
az 100,222 30,826 3,341 31 390,279 260,739 247,801
be 79,966 23,604 5,471 151 631,289 328,558 196,752
cy 63,315 12,321 2,452 29 236,659 62,020 57,622
ga 33,002 5,398 1,248 102 90,768 83,038 52,555
sk 202,590 5,303 4,916 24 1,852,647 72,249 27,467
hy 146,738 266 4,161 14 802,168 4,061 3,458
hr 139,886 4,941 3,723 115 853,743 93,634 51,037
bn 33,915 474 7,532 54 346,442 5,656 3,481
bg 189,486 71,944 5,073 247 1,303,777 925,181 546,523
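Because the Mapping Statements column includes type statements, further figures can be derived from the table above. The Python sketch below, with values copied from the en, sv, and ro rows, computes the number of mapping statements that are not type statements and the ratio of mapping-based to raw infobox statements; the choice of these two figures is only an illustration.

```python
# Derived figures for a few rows of the table above (values copied from it).
# Per the legend, "Mapping Statements" include the type statements.
ROWS = {
    # language: (raw_statements_cd, mapping_statements_cd, type_statements_cd)
    "en": (73_627_718, 67_054_254, 35_361_157),
    "sv": (6_368_428, 2_424_462, 1_744_362),
    "ro": (2_622_736, 7_984, 4_235),
}


def derived(raw, mapping, types):
    """Return (non-type mapping statements, mapping-to-raw statement ratio)."""
    return mapping - types, mapping / raw


for lang, (raw, mapping, types) in ROWS.items():
    non_type, ratio = derived(raw, mapping, types)
    print(f"{lang}: {non_type:,} non-type mapping statements, mapping/raw = {ratio:.2f}")
```

The ro row stands out: its mapping-based statements amount to well under one percent of its raw infobox statements.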


The following table integrates the dataset statistics for DBpedia 2014 with the statistics presented above, allowing the two versions to be compared. The %-columns contain the relative change (an increase, or a decrease if negative) of each number in version 2015-04 with respect to 2014. Six languages are new in the 2015-04 statistics because property mappings have become available for them: Romanian (ro), Swedish (sv), Ukrainian (uk), Esperanto (eo), Armenian (hy), and Azerbaijani (az). For these languages, only the 2015-04 values are given.
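The %-columns can be reproduced from the 2014 and 2015-04 values. The Python sketch below assumes that each percentage is the relative change (new - old) / old, rounded to the nearest whole percent and printed with an explicit sign; this convention is inferred from the table values, not documented on this page. The value pairs are copied from the comparison table.

```python
# Sketch reproducing the %-columns; the rounding convention (nearest whole
# percent) is an assumption inferred from the table values.
def pct_change(old: int, new: int) -> str:
    """Relative change from the 2014 value to the 2015-04 value, e.g. '+5%'."""
    return f"{round((new - old) / old * 100):+d}%"


# Pairs copied from the comparison table (2014, 2015-04).
print(pct_change(4_584_616, 4_806_150))  # en, Instances, CD, all -> +5%
print(pct_change(12_003, 4_941))          # hr, Instances, CD, withMD -> -59%
print(pct_change(342, 341))               # ja, Mapping Properties, CD -> +0%
```

The ja example shows why the table contains entries such as +0%: a decrease of about 0.3% rounds to zero. Python's round() rounds exact halves to the nearest even number, which could differ from the convention used for this page in rare tie cases.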

  Instances, CD, all Instances, CD, withMD Instances, LD, all Mapping Properties, CD Mapping Statements, CD Raw Properties, CD Raw Statements, CD Type Statements, CD  
Language 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % 2014 2015-04 % Language
en 4,584,616 4,806,150 +5% 4,232,626 4,563,644 +8% 4,584,616 4,806,150 +5% 1,122 1,354 +21% 56,549,445 67,054,254 +19% 55,986 58,781 +5% 68,091,260 73,627,718 +8% 28,563,803 35,361,157 +24% en
fr 942,505 1,003,870 +7% 415,390 479,250 +15% 1,504,453 1,596,749 +6% 595 617 +4% 6,234,623 7,844,148 +26% 15,111 15,867 +5% 11,521,313 13,060,439 +13% 3,396,756 4,393,714 +29% fr
de 857,196 901,733 +5% 479,731 618,708 +29% 1,692,634 1,783,367 +5% 420 449 +7% 6,059,745 9,531,499 +57% 11,695 12,278 +5% 9,677,586 10,558,444 +9% 3,468,237 4,941,011 +42% de
it 745,345 790,000 +6% 540,474 572,527 +6% 1,128,909 1,184,624 +5% 249 243 -2% 7,413,922 8,376,000 +13% 10,591 11,026 +4% 13,840,025 15,145,492 +9% 3,929,338 4,697,637 +20% it
es 683,251 717,023 +5% 419,328 452,002 +8% 1,086,296 1,137,454 +5% 457 461 +1% 6,538,847 7,478,271 +14% 17,347 17,739 +2% 9,728,204 9,804,993 +1% 3,190,529 4,170,975 +31% es
nl 674,849 710,952 +5% 455,222 470,836 +3% 1,774,536 1,809,753 +2% 634 682 +8% 5,857,801 7,010,748 +20% 8,100 8,410 +4% 8,044,539 8,593,067 +7% 3,118,581 3,986,099 +28% nl
pl 653,571 685,754 +5% 411,883 421,177 +2% 1,043,400 1,093,887 +5% 219 216 -1% 5,590,196 6,431,245 +15% 7,751 7,931 +2% 8,554,227 9,268,776 +8% 3,189,677 3,905,668 +22% pl
ru 579,612 621,769 +7% 266,562 289,957 +9% 1,119,142 1,200,495 +7% 141 146 +4% 3,717,635 4,130,013 +11% 15,665 15,210 -3% 8,825,572 7,739,284 -12% 1,986,532 2,483,020 +25% ru
pt 552,362 583,018 +6% 321,211 333,853 +4% 812,610 867,242 +7% 522 509 -2% 4,801,340 5,492,790 +14% 14,637 15,798 +8% 7,069,586 7,813,707 +11% 2,185,948 2,730,559 +25% pt
sv (new)   537,237     317,205     1,957,255     292     2,424,462     8,653     6,368,428     1,744,362   sv (new)
ja 397,907 417,820 +5% 134,380 139,230 +4% 913,488 954,225 +4% 342 341 +0% 2,028,745 2,286,184 +13% 15,981 17,093 +7% 4,403,612 4,901,217 +11% 1,002,180 1,227,785 +23% ja
uk (new)   310,496     173,765     562,859     146     1,812,301     11,477     4,868,431     1,472,988   uk (new)
ca 289,485 306,031 +6% 128,544 131,582 +2% 426,696 451,597 +6% 175 172 -2% 1,574,797 1,878,009 +19% 10,183 10,683 +5% 3,643,659 3,956,645 +9% 962,352 1,245,404 +29% ca
sr 189,158 207,131 +10% 138,166 143,815 +4% 246,996 313,993 +27% 470 465 -1% 1,853,525 2,151,739 +16% 6,069 6,219 +2% 2,278,757 2,516,774 +10% 873,394 1,147,021 +31% sr
cs 193,674 205,124 +6% 48,356 64,037 +32% 296,094 313,670 +6% 291 318 +9% 649,900 1,086,169 +67% 6,368 6,519 +2% 2,272,303 2,495,725 +10% 377,149 603,325 +60% cs
ar 170,430 203,150 +19% 44,298 41,032 -7% 266,386 337,476 +27% 254 248 -2% 479,823 508,173 +6% 11,008 12,598 +14% 1,185,465 2,248,799 +90% 316,167 342,520 +8% ar
ro (new)   202,450     507     264,973     27     7,984     8,950     2,622,736     4,235   ro (new)
ko 178,872 188,800 +6% 58,937 62,977 +7% 276,881 300,615 +9% 377 377 +0% 878,745 1,047,136 +19% 8,503 8,990 +6% 1,409,638 1,537,444 +9% 458,870 578,888 +26% ko
hu 171,391 187,830 +10% 76,273 85,716 +12% 260,512 285,462 +10% 268 261 -3% 830,290 1,055,096 +27% 7,806 8,002 +3% 2,429,115 2,678,349 +10% 536,290 718,434 +34% hu
id 142,616 160,343 +12% 43,980 42,072 -4% 354,326 357,191 +1% 329 319 -3% 653,002 656,726 +1% 11,514 12,325 +7% 1,599,822 1,763,491 +10% 347,255 376,475 +8% id
eo (new)   156,618     72,421     210,455     76     674,639     4,779     1,656,603     533,400   eo (new)
tr 143,914 153,822 +7% 57,034 51,731 -9% 233,737 243,976 +4% 370 366 -1% 825,459 780,119 -5% 9,008 9,250 +3% 1,636,893 1,824,827 +11% 443,345 447,588 +1% tr
eu 139,023 152,220 +9% 90,948 102,029 +12% 178,822 206,201 +15% 118 124 +5% 916,523 1,262,626 +38% 2,947 3,153 +7% 2,010,728 2,099,038 +4% 577,224 893,497 +55% eu
sk 138,492 142,334 +3% 5,268 5,303 +1% 192,410 202,590 +5% 25 24 -4% 70,207 72,249 +3% 4,757 4,916 +3% 1,814,997 1,852,647 +2% 21,148 27,467 +30% sk
bg 112,571 139,171 +24% 44,698 71,944 +61% 161,427 189,486 +17% 223 247 +11% 599,891 925,181 +54% 5,095 5,073 +0% 964,269 1,303,777 +35% 333,355 546,523 +64% bg
hr 92,952 97,073 +4% 12,003 4,941 -59% 135,272 139,886 +3% 139 115 -17% 200,690 93,634 -53% 3,674 3,723 +1% 827,890 853,743 +3% 106,691 51,037 -52% hr
sl 85,167 89,710 +5% 25,494 31,157 +22% 140,612 144,508 +3% 406 371 -9% 323,292 393,844 +22% 4,844 5,702 +18% 950,604 1,035,836 +9% 212,340 296,179 +39% sl
el 67,390 74,115 +10% 36,255 38,517 +6% 96,301 104,882 +9% 445 455 +2% 382,708 469,462 +23% 4,437 5,132 +16% 389,068 487,085 +25% 252,492 328,483 +30% el
hy (new)   66,215     266     146,738     14     4,061     4,161     802,168     3,458   hy (new)
be 52,040 57,153 +10% 23,512 23,604 +0% 71,656 79,966 +12% 175 151 -14% 301,188 328,558 +9% 4,998 5,471 +9% 557,540 631,289 +13% 168,132 196,752 +17% be
az (new)   56,533     30,826     100,222     31     260,739     3,341     390,279     247,801   az (new)
cy 43,127 46,540 +8% 11,945 12,321 +3% 57,127 63,315 +11% 28 29 +4% 59,428 62,020 +4% 2,084 2,452 +18% 204,058 236,659 +16% 54,578 57,622 +6% cy
bn 26,136 29,729 +14% 2,160 474 -78% 29,631 33,915 +14% 83 54 -35% 30,350 5,656 -81% 6,609 7,532 +14% 271,070 346,442 +28% 19,015 3,481 -82% bn
ga 27,674 29,708 +7% 4,176 5,398 +29% 30,670 33,002 +8% 67 102 +52% 51,086 83,038 +63% 1,231 1,248 +1% 83,457 90,768 +9% 31,872 52,555 +65% ga

 

 

2 Instances of Selected Classes per Language

The table below reports the number of instances for a set of selected classes within the canonicalized DBpedia data sets for each language.
Type En Ar Az Be Bg Bn Ca Cs Cy De El Eo Es Eu Fr Ga Hr Hu Hy Id It Ja Ko Nl Pl Pt Ro Ru Sk Sl Sr Sv Tr Uk
Agent 2,248,238 12,121 6,745 7,528 23,722 59 11,498 23,011 591 301,034 9,852 5,043 126,438 5,997 182,282 2,107 2,841 33,435 266 16,830 221,708 60,628 29,978 80,902 120,096 76,022 60 112,943 695 15,728 19,527 55,028 21,388 40,979
Person 2,060,507 9,435 6,564 7,208 20,261 59 10,207 22,259 512 272,326 7,077 4,319 109,897 4,930 152,529 1,866 2,841 28,479 266 15,771 205,223 51,792 23,574 61,148 103,774 62,223 4 96,603 695 14,928 16,064 40,865 16,265 33,497
Politician 36,221 0 0 35 283 0 2,129 2,538 97 0 1,003 656 7,724 1,151 13,117 320 0 2,828 0 0 5,776 2,787 721 2,085 11,094 4,440 0 0 0 65 1,321 0 0 0
Athlete 278,512 1,973 0 904 3,453 0 565 5,672 0 47,880 1,796 0 36,596 561 74,136 201 0 8,697 0 7,084 74,688 18,940 6,534 29,119 49,789 20,535 0 20,076 695 1,719 2,947 25,791 1,508 13,903
SoccerPlayer 102,618 1,973 0 560 2,705 0 565 3,636 0 23,196 1,484 0 737 561 25,372 200 0 7,579 0 6,980 38,330 10,706 5,061 15,465 27,220 12,107 0 20,076 0 0 46 6,699 164 6,907
Artist 100,919 4,320 1,712 2,080 4,201 0 3,011 8,571 0 0 2,028 1,129 37,225 1,168 37,512 979 2,204 7,182 266 2,504 17,306 24,059 11,146 17,807 21,798 16,019 0 33,533 0 876 4,629 61 8,434 5,025
Actor 6,591 2,773 0 27 1,792 0 2,224 2,436 0 0 0 324 14,881 1,168 15,956 552 82 2,892 0 0 516 12,768 7,579 8,580 10,987 8,517 0 0 0 57 2,270 61 3,047 183
MusicalArtist 47,238 1,124 843 458 1,244 0 787 6,132 0 0 315 661 15,402 0 12,099 193 2,044 0 0 2,198 16,790 8,345 2,917 6,379 7,399 6,750 0 9,853 0 664 952 0 3,851 2,094
Place 681,916 20,426 6,472 11,387 22,303 0 76,944 25,151 11,688 175,856 5,301 44,227 165,217 51,917 163,464 2,556 172 22,423 0 5,374 182,412 21,186 11,802 185,036 221,880 126,580 0 96,002 0 12,083 89,210 66,650 10,134 91,944
ArchitecturalStructure 132,100 898 0 460 287 0 10 609 0 9,404 147 1,195 13,667 1,137 14,532 0 16 971 0 492 12,118 13,542 6,998 15,134 13,140 1,576 0 466 0 78 463 2,188 652 2,247
Infrastructure 68,233 712 0 45 0 0 10 178 0 6,284 111 1,138 7,282 1 5,846 0 0 518 0 245 7,527 12,139 6,717 9,956 10,384 1,320 0 371 0 40 195 1,047 348 1,343
Building 58,720 186 0 364 287 0 0 431 0 2,153 32 57 5,357 1,136 8,282 0 16 453 0 247 4,543 1,104 281 2,796 2,756 256 0 95 0 38 256 1,068 304 691
NaturalPlace 53,890 87 0 926 483 0 566 3,441 413 19,109 298 1,015 6,783 846 10,262 487 137 1,585 0 151 2,511 1,517 496 4,341 4,989 5,988 0 4,720 0 248 1,901 343 133 2,000
PopulatedPlace 455,398 19,075 6,472 9,530 18,292 0 76,368 20,557 671 102,021 4,552 41,660 138,473 49,510 128,837 2,069 19 18,613 0 4,522 166,034 5,106 3,645 163,644 200,998 117,349 0 90,816 0 11,703 86,292 56,723 9,347 86,161
Settlement 430,834 17,043 6,472 0 3,658 0 76,364 17,388 437 87,219 2,890 15,100 0 162 122,049 1,856 19 17,517 0 3,672 0 5,059 2,817 89,896 157,268 70,212 0 0 0 11,380 55,325 11,368 7,730 14,592
City 19,381 160 2,532 0 0 0 76,364 11,053 437 1,688 2,233 13,521 0 0 44,041 1,273 19 0 0 35 0 5,059 0 625 16,072 6,891 0 0 0 8,154 55,201 850 0 2,125
Region 19,459 1,630 0 80 0 0 4 2,871 0 10,238 629 12,492 135,193 49,095 1,189 0 0 0 0 804 2,912 47 107 69,171 40,832 7,229 0 12,841 0 52 1,463 42,752 1,220 50,113
Work 376,340 5,933 995 495 4,468 415 6,200 9,068 42 65,636 15,371 1,206 55,202 1,683 69,778 132 1,769 13,297 0 5,827 84,639 30,644 10,212 33,076 25,394 46,936 447 50,918 0 894 4,058 24,683 11,432 12,279
WrittenWork 51,019 397 0 111 224 18 110 33 42 27,401 13,017 50 5,554 0 8,030 0 56 858 0 110 7,529 685 381 4,460 2,633 2,420 447 21,719 0 192 297 3,339 1,200 1,526
Book 31,172 395 0 79 30 18 110 1 42 0 12,995 50 3,381 0 4,121 0 53 682 0 15 6,542 571 235 1,057 1,783 1,514 447 21,719 0 185 235 3,039 785 1,150
Software 25,458 1,257 0 104 0 30 547 49 0 5,541 305 0 6,552 0 9,586 0 0 1,159 0 196 7,511 5,395 1,815 4,172 0 3,883 0 0 0 150 236 2,909 1,176 1,964
TelevisionShow 25,802 237 0 0 119 0 0 0 0 3,566 191 0 4,197 0 4,879 0 0 1,259 0 0 624 972 1,436 2,112 3,306 4,019 0 0 0 122 582 1,343 828 195
MusicalWork 143,119 435 0 11 1,865 0 1 5,625 0 8,205 1,399 64 22,595 156 23,889 1 64 5,562 0 2,182 35,693 9,659 2,375 9,382 5,456 21,873 0 13,335 0 208 909 8,873 4,127 937
Album 92,342 212 0 0 1,419 0 1 4,444 0 5,861 1,226 64 13,660 156 15,495 0 52 4,098 0 1,054 34,416 5,705 1,373 5,081 0 13,819 0 9,424 0 24 621 7,789 2,216 840
Film 90,060 3,477 995 106 2,199 367 5,542 3,234 0 19,857 152 692 13,068 1,527 18,272 131 1,491 3,734 0 3,335 27,105 10,708 4,042 11,905 13,307 13,444 0 15,864 0 202 1,949 7,216 3,945 7,101
Species 277,642 0 13,558 2,811 19,849 0 36,940 0 0 29,539 0 8,178 68,524 0 0 276 0 95 0 12,688 24,236 11,086 6,942 146,184 24,821 50,370 0 25,050 0 1 7,642 159,481 0 8,887
Eukaryote 270,244 0 13,558 2,761 19,729 0 36,552 0 0 61 0 8,178 67,441 0 0 262 0 95 0 12,301 31 2,712 6,264 114,014 23,444 49,893 0 0 0 1 14 896 0 8,887
Plant 51,170 0 13,558 1,079 1,739 0 4,878 0 0 0 0 0 19,365 0 0 6 0 35 0 1,411 25 1,398 1,657 0 5,501 14,256 0 0 0 0 14 0 0 0
Animal 208,283 0 0 1,540 17,910 0 31,052 0 0 61 0 1,936 47,300 0 0 251 0 60 0 10,840 6 1,140 4,607 113,872 17,943 34,962 0 0 0 1 0 409 0 66
Insect 114,855 0 0 0 0 0 1,145 0 0 0 0 0 7,084 0 0 0 0 0 0 8,320 0 75 0 74,888 0 6,320 0 0 0 0 0 0 0 0
Organisation 187,731 2,686 181 320 3,461 0 1,291 752 79 28,708 2,775 724 16,541 1,067 29,753 241 0 4,956 0 1,059 16,485 8,836 6,404 14,100 16,322 13,799 0 16,340 0 800 3,424 6,306 5,123 5,751
Company 50,945 1,379 0 13 470 0 184 98 0 10,101 173 155 1,152 0 8,919 59 0 844 0 770 5,740 3,641 1,908 2,842 3,292 2,772 0 4,636 0 135 344 1,849 891 1,722
SportsTeam 25,110 786 0 51 1,144 0 0 0 0 4,542 1,187 186 5,412 536 6,110 74 0 2,641 0 84 8,303 2,130 1,346 6,355 5,370 3,709 0 4,108 0 86 1,797 2,622 2,360 1,917
EducationalInstitution 33,036 394 181 53 20 0 0 157 79 2,791 117 67 2,030 0 3,120 0 0 166 0 109 1,002 2,018 1,058 837 955 326 0 1,516 0 43 113 323 485 254
Band 31,771 0 0 100 873 0 285 0 0 6,793 348 0 0 334 5,497 108 0 1,002 0 0 0 0 1,232 2,224 4,274 4,779 0 4,983 0 14 357 0 19 0
CelestialBody 27,004 0 827 81 142 0 0 7 0 5,415 3,169 12,870 2,667 0 0 88 0 10,587 0 0 17,155 3,691 0 1,946 13,354 15,867 0 608 4,608 734 5,537 1,686 707 14,684
Event 40,219 0 0 539 607 0 0 1,503 0 5,755 352 114 6,268 0 25,179 0 9 3,759 0 765 20,513 1,533 1,690 7,441 4,165 6,964 0 3,687 0 919 1,859 6,219 2,186 2,218
SocietalEvent 32,466 0 0 539 607 0 0 1,503 0 5,483 323 114 6,268 0 25,174 0 9 3,759 0 765 20,472 1,533 1,690 7,192 4,165 6,964 0 3,687 0 919 1,859 6,141 2,186 2,218
MeanOfTransportation 33,644 1,780 0 32 167 0 0 1,230 0 7,874 681 0 5,361 0 2,177 0 0 1,268 0 469 7,081 2,123 464 3,212 4,431 3,581 0 0 0 89 1,119 757 276 520
Disease 5,378 268 0 177 122 0 0 289 0 1,367 275 0 2,387 0 0 0 0 0 0 0 1,990 401 537 1,158 1,479 1,229 0 0 0 239 49 0 320 0
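The class counts can be sanity-checked against the class hierarchy. The Python sketch below assumes, as in the DBpedia ontology, that each class listed is a subclass of its parent (for example, SoccerPlayer of Athlete, City of Settlement), and that instances of a subclass are also counted for the parent class; under these assumptions no subclass count should exceed its parent's count within a language column. The counts are copied from the table above for four language columns.

```python
# Consistency check on the class table. Assumptions: the (child, parent)
# pairs below reflect the DBpedia ontology, and subclass instances are also
# counted for their parent class.
SUBCLASS_OF = [
    ("Person", "Agent"),
    ("Athlete", "Person"),
    ("SoccerPlayer", "Athlete"),
    ("PopulatedPlace", "Place"),
    ("Settlement", "PopulatedPlace"),
    ("City", "Settlement"),
]

# Counts copied from the table for the En, Ca, De, and Sv columns.
COUNTS = {
    "Agent":          {"En": 2_248_238, "Ca": 11_498, "De": 301_034, "Sv": 55_028},
    "Person":         {"En": 2_060_507, "Ca": 10_207, "De": 272_326, "Sv": 40_865},
    "Athlete":        {"En": 278_512, "Ca": 565, "De": 47_880, "Sv": 25_791},
    "SoccerPlayer":   {"En": 102_618, "Ca": 565, "De": 23_196, "Sv": 6_699},
    "Place":          {"En": 681_916, "Ca": 76_944, "De": 175_856, "Sv": 66_650},
    "PopulatedPlace": {"En": 455_398, "Ca": 76_368, "De": 102_021, "Sv": 56_723},
    "Settlement":     {"En": 430_834, "Ca": 76_364, "De": 87_219, "Sv": 11_368},
    "City":           {"En": 19_381, "Ca": 76_364, "De": 1_688, "Sv": 850},
}


def hierarchy_violations(counts, subclass_of):
    """List (child, parent, language) triples where the child count exceeds the parent's."""
    violations = []
    for child, parent in subclass_of:
        for lang, n in counts[child].items():
            if n > counts[parent][lang]:
                violations.append((child, parent, lang))
    return violations


print(hierarchy_violations(COUNTS, SUBCLASS_OF))  # []
```

In the Ca column the City and Settlement counts are identical, so the check passes there with equality.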