Evaluation

Controlled vocabulary of concepts (FCSS)

Levels Of Knowledge Organization:

In [Hopp2020], pp. 115-129, Hoppe defines five Levels Of Knowledge Organization (LOKO), all of which are covered by our Controlled Vocabulary of Concepts (CVC). Level 1, the lowest, is the controlled vocabulary: concepts may be assigned to classes, to concept definitions, and to synonyms. Level 2, the taxonomy level, adds hypernym / hyponym relationships. Level 3, the thesaurus level, adds further hierarchical relationships between concepts that are not necessarily hypernym / hyponym relationships. Level 4, the WordNet level, also includes the lexical types of concepts (nouns, verbs, adjectives, and adverbs) as well as SynSets, senses, and glosses. Finally, level 5, the ontology level, adds ontological constructs such as instantiations of classes, axioms, and rules for inference.

The CVC can also support automated indexing of documents [Hopp2020], p. 161ff. By passing the Semantic Distance (SD) as a parameter to Search by Meaning (SbM), the user can decide down to which depth of a concept's Concept Binary Tree the entries of the controlled vocabulary are searched. The query SbM('lion', 'EN', 1) returns the result set {leo (in the zodiac), lion, roar (lion)(to), sea lion} and demonstrates how disambiguation can be supported [Hopp2020], p. 172ff. 'Lion' in German and 'lion' in English are polysemes. Given a context, it should be possible in the majority of cases to determine which of the concepts is the relevant one. For example, SbM('animal lion', 2) returns exactly the one relevant concept 'lion'.
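The effect of the Semantic Distance parameter can be illustrated with a minimal Python sketch. It is not the CVC implementation: the Node class, the toy trees, and the rule "every query word must occur in the concept's tree at depth SD or less" are assumptions made purely for illustration, and the language parameter of SbM is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a hypothetical concept binary tree (illustrative only)."""
    term: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def terms_within(node, max_depth, depth=0):
    """Collect all terms at depth <= max_depth (the root has depth 0)."""
    if node is None or depth > max_depth:
        return set()
    return ({node.term}
            | terms_within(node.left, max_depth, depth + 1)
            | terms_within(node.right, max_depth, depth + 1))


def sbm(query, vocabulary, sd):
    """Return the concepts whose tree contains every query word within depth sd."""
    words = query.lower().split()
    return sorted(name for name, tree in vocabulary.items()
                  if all(w in terms_within(tree, sd) for w in words))


# Invented toy trees; the real trees in the CVC are not shown in the text.
VOCABULARY = {
    "lion": Node("lion", Node("cat", Node("animal")), Node("predator")),
    "sea lion": Node("seal", Node("lion"), Node("sea")),
    "leo (in the zodiac)": Node("sign", Node("zodiac"), Node("lion")),
    "roar (lion)(to)": Node("roar", Node("lion"), Node("sound")),
}

print(sbm("lion", VOCABULARY, 1))
print(sbm("animal lion", VOCABULARY, 2))
```

With these toy trees, the first call returns all four concepts, while the second returns only 'lion', because 'animal' occurs within depth 2 in that tree alone. Raising SD widens the search, lowering it narrows the search.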

Using Search by Meaning (SbM) in Apps:

The rObby App is a hyper-personalized notification and search app for iOS and Android devices that also offers multilingual group chat functionality. Based on user-defined rules, it sends intra-day push alerts for exactly those news stories from press, sports, finance, culture, technology, and science that are highly relevant to the user. By using previous query results and testing queries with SbM support, the user can refine the rules to get even more relevant results. In the sciences, researchers spend a significant amount of time on literature research. The following sample scenario is based on rObby's Publication Database (RPDB) and shows how the rObby App can save the researcher work and time. The RPDB currently contains about 6.5 million scientific publications, and its knowledge graph has more than 550 million triples with a database size of more than 260 gigabytes. Every day, crawlers add several thousand new publications from scientific publisher sources as Concept Binary Trees (CBTs). Throughout the day, the rObby rule processor checks the rules of all users to see whether they match the newly added documents. If a rule is triggered, the user is notified with a push alert in the rObby App. The push alert includes a link to the source of the document. In this way, users are automatically informed about new relevant publications in their field on the same day they are published online.
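The rule-processing loop described above can be sketched as follows. This is a simplified illustration, not rObby's actual rule processor: the Rule and Document types, the "all required concepts must be present" matching criterion, and the example.org URLs are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A user-defined rule: alert when a document covers all required concepts."""
    user: str
    required_concepts: frozenset


@dataclass(frozen=True)
class Document:
    """A newly crawled publication, reduced to its source link and concepts."""
    url: str
    concepts: frozenset


def process_new_documents(rules, documents):
    """Return (user, url) pairs for every rule triggered by a new document."""
    alerts = []
    for doc in documents:
        for rule in rules:
            # Subset test: the rule fires only if all its concepts are present.
            if rule.required_concepts <= doc.concepts:
                alerts.append((rule.user, doc.url))
    return alerts


rules = [
    Rule("alice", frozenset({"pipe organ"})),
    Rule("bob", frozenset({"lion", "animal"})),
]
new_documents = [
    Document("https://example.org/doc/1", frozenset({"pipe organ", "wind instrument"})),
    Document("https://example.org/doc/2", frozenset({"lion", "zodiac"})),
]

alerts = process_new_documents(rules, new_documents)
print(alerts)
```

In this sketch only the first rule fires; the second document mentions 'lion' but not 'animal', so no alert is produced for it. A real processor would match rules with SbM rather than plain set inclusion and would send each alert as a push notification carrying the link.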

SbM compared with Google Search:

Google is by far the most popular search engine in the western hemisphere. Almost no information is publicly accessible about how large the knowledge graph (KG) behind Google Search (GS) is and how it supports search. One would nevertheless expect the KG to be used to support semantic search as well. We therefore executed some sample queries with GS. As result snippets (RSs), GS delivers links to websites (URIs) that contain all or some of the query keywords, together with short excerpts of the target page containing the text fragments in which the keywords occur. The following examples examine which GS results can be regarded as results of semantic search. For that purpose, the roughly 10 results on the first result page are analyzed for relevant hits. RSs that contain no semantically related hits are called Non-semantic Search Results (NSSRs): (1) GS('wind, keyboard, instrument') → 22.5 million RSs; page 1 contains only about 5 hits for melodica, the rest are NSSRs; (2) GS('large, wind, keyboard, instrument') → 19.1 million RSs; page 1 contains only NSSRs; (3) GS('large, pipe, wind, keyboard, instrument') → 66.4 million RSs; page 1 correctly lists 'pipe organ' as the first result, the rest are NSSRs. These examples give no indication that GS supports Deep Semantic Search (DSS) to any extent.

SbM compared with ChatGPT:

ChatGPT is not a typical search engine, but it can help retrieve semantically relevant concepts. When ChatGPT is queried with "Give a list of wind keyboard instruments", it returns "Organ, Harmonium, Accordion, Melodica, Claviola, Free-Reed Organ, Sho, Regal Harmonium and Harmoniflute" along with quite precise descriptions of the instruments. On the other hand, there is no indication that ChatGPT represents concepts in a way comparable to Concept Binary Trees (CBTs). It is therefore not possible to control the search by the Semantic Distance (SD) between a concept and its contained sub-concepts in order to narrow or widen the search results.

Applying Deep Semantic Search in the EnArgus Project:

The EnArgus project was funded by the German Federal Ministry for Economic Affairs and Energy. The goal of the project is to make government funding policies in the field of energy research more transparent and to facilitate the evaluation of technological developments. The EnArgus Internet Portal (EIP) provides information on current and completed research projects in the field of energy research. The EnArgus2 ontology was originally developed in Protégé, then imported into our Controlled Vocabulary of Concepts (CVC) and extended with Concept Binary Trees to support retrieval by SbM (Deep Semantic Search). The paper [ScBe2015] shows how Search by Meaning (SbM) has been used to improve the semantic search of the EIP.

ICIB Classification:

Another use case for SbM is to synthesize a nomenclature of knowledge fields based on the ICIB-1 classification by Ingetraut Dahlberg [Dahl1982]: SbM('knowledge, mathematics', 2) = {analysis (mathematics), applied mathematics, logical foundations of mathematics, mathematician, mathematics}. This again demonstrates that SbM, with its intra-concept search, finds more semantically related concepts than conventional semantic search (SS), which uses only Extra Concept Relations (ECRs). It is at the same time an example of multi-language support: when new concepts are entered into the CVC, their names are automatically translated from English into German, French, Spanish, Italian, Dutch, Polish, and Russian using the DeepL API. It is therefore possible to apply SbM to combinations of words from all of these languages, e.g., a German-English combination such as "Statistik science".
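How a mixed-language query such as "Statistik science" could be resolved can be shown with a small Python sketch. It assumes that every concept stores one name per language; the LABELS table and both helper functions are hypothetical, and the automatic translation step via the DeepL API is not shown.

```python
# Hypothetical label table: concept identifier -> name per language code.
LABELS = {
    "statistics": {"EN": "statistics", "DE": "Statistik"},
    "science": {"EN": "science", "DE": "Wissenschaft"},
    "mathematics": {"EN": "mathematics", "DE": "Mathematik"},
}


def build_index(labels):
    """Map every name, in any language, to its concept identifier."""
    index = {}
    for concept_id, names in labels.items():
        for name in names.values():
            index[name.casefold()] = concept_id
    return index


def resolve_query(query, index):
    """Resolve each query word to a concept identifier (None if unknown)."""
    return [index.get(word.casefold()) for word in query.split()]


INDEX = build_index(LABELS)
print(resolve_query("Statistik science", INDEX))
```

Because all names share one index, the German word and the English word resolve to language-independent concept identifiers, which could then be passed on to the search itself.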

Source: taoke.de — Evaluation.