Faceted Wikipedia Search
Faceted Wikipedia Search allowed users to pose complex queries against Wikipedia, such as “Which rivers flow into the Rhine and are longer than 50 kilometers?” or “Which skyscrapers in China have more than 50 floors and were constructed before the year 2000?” Unlike the results of search engines such as Google or Yahoo, the answers to these queries were not produced by keyword matching but were generated from structured information extracted from many different Wikipedia articles. Faceted Wikipedia Search thus let users query Wikipedia like a structured database and draw on Wikipedia’s collective intelligence.
Unfortunately, the application is no longer available. It was taken down from the public web in 2012.
Faceted Wikipedia Search, also referred to as DBpedia Search, implemented the faceted search paradigm. Faceted search, also called faceted browsing, lets users find items by restricting the overall set of items along multiple criteria (facets), for example the location of a place, the birth date of a person, or the height of a building. While faceted browsing is a common feature of e-commerce sites and other websites that deal with structured data, it is less common on text-oriented websites, which usually offer only text- or category-based search. By using the DBpedia framework to extract structured information from Wikipedia, it was possible to implement faceted browsing for Wikipedia as well.
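The idea of restricting a set of items along several facets can be illustrated with a minimal Python sketch. It is not the code of the original application: the `Item` class, the `faceted_search` function, the facet names and the sample rivers are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One searchable entry with a name and a dict of facet values."""
    name: str
    facets: dict = field(default_factory=dict)


# Hypothetical sample data; names and numbers are invented.
ITEMS = [
    Item("River A", {"type": "River", "mouth": "Rhine", "length_km": 120}),
    Item("River B", {"type": "River", "mouth": "Rhine", "length_km": 30}),
    Item("River C", {"type": "River", "mouth": "Danube", "length_km": 400}),
    Item("Tower D", {"type": "Skyscraper", "floors": 60}),
]


def matches(item, criteria):
    """Return True if the item satisfies every facet criterion.

    A criterion is either a plain value (equality test) or a callable
    predicate (for range restrictions such as "longer than 50 km").
    """
    for facet, wanted in criteria.items():
        if facet not in item.facets:
            return False
        value = item.facets[facet]
        if callable(wanted):
            if not wanted(value):
                return False
        elif value != wanted:
            return False
    return True


def faceted_search(items, criteria):
    """Keep only the items that match all selected facets."""
    return [item for item in items if matches(item, criteria)]


# "Which rivers flow into the Rhine and are longer than 50 kilometers?"
results = faceted_search(
    ITEMS,
    {"type": "River", "mouth": "Rhine", "length_km": lambda km: km > 50},
)
print([item.name for item in results])
```

Each additional criterion can only shrink the result set, which is what makes step-by-step refinement in a faceted interface predictable.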
The User Interface
The user interface consists of several interacting components, which are highlighted in the following screenshot and described below.
- Search Results: The names, abstracts and (if available) images of the Wikipedia entries matching the current criteria are displayed in the center of the page. If no criteria are selected yet, the entry about Wikipedia itself is displayed by default.
- Facet Selection: Facets that can be used to further restrict the current selection are shown in a list to the left of the results. As not all of the many hundreds of facets extracted by DBpedia apply to all Wikipedia articles, this list changes with the current selection. By default, it displays the six most common facets of the selected item type. The list can be extended by clicking the more facets link below it. For each facet, its three most common values in the current results are shown and can be clicked to modify the selection. Click the more button to see more values, or start typing in the input box to get suggestions for values.
- Item Type Selection: The item type is a special facet and thus always shown at the top of the facet list. DBpedia assigns a type to each extracted Wikipedia entry. It is usually a good idea to select an item type to start a faceted search, as the type determines which facets are relevant.
- Free Text Search: Restricts the results to entries whose name or abstract contains the given keywords.
- Selected Facets: The current criteria are displayed above the result list. Each criterion can be removed by clicking on the yellow X next to it, or the criteria list can be cleared completely by clicking on the yellow X next to reset filters.
- Result Navigation: The matching results are displayed in batches of six per page. Links to the next, previous, first or last result page can be found above and below the result list. The total number of results and the position of the current batch are displayed in the top right corner of the result list.
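The behaviour of the facet list and the result navigation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `facet_summary` and `page`, the record layout (one dict of facet values per result) and the sample data are hypothetical, and the original application may have computed these values differently.

```python
from collections import Counter

PAGE_SIZE = 6      # results per page, as described above
FACETS_SHOWN = 6   # facets listed by default
VALUES_SHOWN = 3   # values listed per facet by default


def facet_summary(results, facets_shown=FACETS_SHOWN, values_shown=VALUES_SHOWN):
    """Return the most common facets and, for each, its most common values."""
    facet_usage = Counter()   # in how many results each facet occurs
    value_counts = {}         # facet -> Counter of its values
    for record in results:
        for facet, value in record.items():
            facet_usage[facet] += 1
            value_counts.setdefault(facet, Counter())[value] += 1
    return {
        facet: value_counts[facet].most_common(values_shown)
        for facet, _ in facet_usage.most_common(facets_shown)
    }


def page(results, page_number, page_size=PAGE_SIZE):
    """Return one batch of results; page numbers start at 1."""
    start = (page_number - 1) * page_size
    return results[start:start + page_size]


# Hypothetical result set: 14 records, each with a single "country" facet.
sample_results = [
    {"country": "B" if i % 3 == 0 else "A"} for i in range(1, 15)
]

summary = facet_summary(sample_results)
print(summary)
print(len(page(sample_results, 1)), len(page(sample_results, 3)))
```

Because the counts are computed over the current results only, the facet list changes whenever a criterion is added or removed.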
Faceted Wikipedia Search has been jointly developed by neofonie GmbH, Berlin and the Web-based Systems Group at Freie Universität Berlin. Technically, Faceted Wikipedia Search is based on the DBpedia data extraction framework and neofonie search technology.
The DBpedia data extraction framework extracts structured data from Wikipedia, such as the content of infoboxes, which summarize relevant facts as a table on the top right-hand side of Wikipedia articles. The extracted data is represented using the Resource Description Framework (RDF), a data model for web-based systems. Currently, the framework extracts around 190 million facts from the English edition of Wikipedia and 289 million facts from Wikipedia editions in 90 further languages. The DBpedia data extraction framework is developed by the Web-based Systems Group at Freie Universität Berlin and the Agile Knowledge Engineering and Semantic Web group at Universität Leipzig.
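To give a rough idea of how infobox content maps onto the Resource Description Framework, the following Python sketch turns the key-value pairs of an infobox into subject-predicate-object triples and prints them in an N-Triples-like form. It is not the DBpedia extraction framework itself, which is considerably more elaborate: the function names, the article title and the infobox keys are hypothetical, and every value is simplified to a plain string literal.

```python
RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"


def infobox_to_triples(article_title, infobox):
    """Map each infobox entry to one (subject, predicate, object) triple."""
    subject = RESOURCE_NS + article_title.replace(" ", "_")
    return [
        (subject, PROPERTY_NS + key, str(value))
        for key, value in infobox.items()
    ]


def to_ntriples(triples):
    """Serialize triples line by line, treating every object as a string literal."""
    lines = []
    for subject, predicate, obj in triples:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Hypothetical article and infobox content, invented for illustration.
triples = infobox_to_triples(
    "Example Tower",
    {"floorCount": 55, "location": "Example City"},
)
print(to_ntriples(triples))
```

Once facts are available as triples, a predicate such as the hypothetical `floorCount` can serve directly as a facet, with the triple objects as its values.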
The neofonie search engine, neofonie search, is employed to execute complex queries over the extracted data. neofonie search aggregates RDF data from DBpedia with full-text data from Wikipedia. The aggregated data is then divided into hierarchical facets, composed of 200 types with 2.9 million values. In addition to providing the search technology and processing power, neofonie is also responsible for hosting Faceted Wikipedia/DBpedia Search on the Amazon Elastic Compute Cloud (Amazon EC2).
Land der Ideen Competition
The German federal government has named Faceted Wikipedia Search one of the 365 most innovative ideas in Germany within the Deutschland – Land der Ideen (Germany – Land of Ideas) competition. The competition showcases innovative ideas in areas such as science and technology, business, education, art and ecology. The patron of the competition is the German President Horst Köhler. Please refer to this blog post for more details on the prize.
BIS2010 Paper about Faceted Wikipedia Search
Rasmus Hahn, Christian Bizer, Christopher Sahnwaldt, Christian Herta, Scott Robinson, Michaela Bürgle, Holger Düwiger, Ulrich Scheel: Faceted Wikipedia Search. 13th International Conference on Business Information Systems (BIS 2010), Berlin, Germany, May 2010