IN3 Summary
InInIn — research goals and outlook
Summary
Achievement of Research Goals
Here we give a compact summary of how the research goals which have been formulated in section Research Methods and Goals were achieved. They are thematically structured according to tasks for the definition of the UNFOLD framework, goals for the technical implementation, goals for the improvement of conceptual modelling, goals for the evaluation of ontologies, goals for linguistic engineering and goals for the consolidation of methods and terminologies.
Methodological Fundamentals
- RG10: Section Mathematics brings together the mathematical definitions and axioms that form the formal framework for our methodologies. These include the axiom for the equality of sets and the cardinalities of sets formed by union, intersection and difference, and of power sets. Of particular importance for our methodology are the axioms on binary relations. These concern, among other things, identity, reflexivity, transitivity, symmetry, inverse formation and the formation of transitive closures.
- RG11: The O4Top-Level Ontology is a minimal set of concepts with which all domain-specific ontologies can be modeled. These include ^Class, °Process, .DataProperty, ◊ObjectProperty and the resource concepts »Resource and ^DataPropertyResource. It also contains the definition patterns for Data Properties and Object Properties. In addition, it defines counters for instances, particulars and the number of upper and lower concepts as class attributes.
- RG12: Section Parsimony summarizes how extreme savings potential can be used for the representation of knowledge both on the schema layer and on the particulars layer. This includes the short definitions for data properties and object properties, the use of relational concept composition (RCC), the use of partitioning classes and the use of power type absorbance in connection with saving the use of the materialization pattern.
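The binary-relation notions listed under RG10 (inverse formation, transitivity, transitive closure) can be illustrated with a short sketch. It assumes a relation is stored as a plain Python set of pairs. The function names and the class name Eagle are illustrative and not taken from the thesis.

```python
# Sketch only: a binary relation is modeled as a set of (a, b) pairs.

def inverse(rel):
    """Return the inverse relation {(b, a) | (a, b) in rel}."""
    return {(b, a) for a, b in rel}

def is_symmetric(rel):
    """A relation is symmetric if it contains the inverse of every pair."""
    return all((b, a) in rel for a, b in rel)

def is_transitive(rel):
    """Check (a, b) and (b, d) in rel implies (a, d) in rel."""
    return all((a, d) in rel for a, b in rel for c, d in rel if b == c)

def transitive_closure(rel):
    """Add derived pairs until no new pair appears (naive fixpoint)."""
    closure = set(rel)
    while True:
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

# Hypothetical subclass chain used only for illustration.
sub_class_of = {("GoldenEagle", "Eagle"), ("Eagle", "Bird"), ("Bird", "Animal")}
closure = transitive_closure(sub_class_of)
print(("GoldenEagle", "Animal") in closure)
```

The naive fixpoint is sufficient for small relations; a production store would more likely use a graph traversal per query.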
Technical Implementations
- RG20: With OQL, a lean, functional and portable query language was developed that manages with only three retrieval functions, getRelation, getSubject and getObject, and only three update functions, insert, delete and update. The implementation of all knowledge repository operations discussed in this work, including the formation of transitive closures and the provision of inferencing functions, is based on this.
- RG21: In order to increase the speed of processing conjunctive queries, which play a major role in inferencing in particular, the O4Store system was implemented as a memory storage system in addition to the triple store implementation. It uses a separate table for each property. More complex queries can be accelerated by a factor of 60 compared to normal processing. In addition, the main memory requirement during processing is significantly lower.
- RG22: The complete history of the development of an ontology is obtained by inserting, deleting and changing individual triples. A transactional logic is defined for this in the Ontology Evolution section. For each operation on an ontology, it can first be determined whether the prerequisites for the execution are met. If they are not, the operation is not carried out at all. If an error occurs during processing, the knowledge repository is reset to the state before the operation was carried out. Performing an operation can also automatically force subsequent operations to be performed. The follow-up operations belonging to an operation run in a transaction bracket, so that all follow-up operations can also be reversed if necessary. Among other things, the discovery of cycles in transitive relations and the detection of syntactic errors in triples can largely prevent the knowledge repository from being transferred into an inconsistent state.
- RG23: The basic functions for reasoning are already covered by the Ontology Query Language (OQL). It can therefore also cover more complex reasoning tasks such as checking the integrity of the Knowledge Graph and detecting cycles in transitive object properties, as well as other tasks familiar from DL reasoners such as Object Property Subsumption, the Retrieval Problem and the review of Property Fillers. Other reasoning functions covered by our technology are the determination of Inverse Object Properties and the high-performance processing of conjunctive queries (CQA). With these functions, a large part of the required reasoning functions is already covered, with the advantage that our reasoners do not have to run as a separate, autonomous process, but can be seamlessly integrated via OQL into the applications that work on the ontologies.
- RG24: This thesis is not only a proof of concept, but also shows that an ontology-based publishing system (ObPS) could be developed that can be used for productive development. The ontology on which this thesis is based currently consists of approx. 1.1 million triples. The website table contains about 100 subpages for the thesis. Using the cms2latex.pub generation tool, around 6,100 pages of LaTeX source are generated from this in around 10 seconds. All bibliographic information for creating the bibliography is also obtained from the ontology.
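To make the idea behind RG20 and RG21 concrete, the following sketch shows a toy in-memory store with three retrieval and three update functions and one table per property. The function names follow the text, but their signatures and exact semantics here are assumptions; this is not the OQL or O4Store implementation.

```python
from collections import defaultdict

class TripleStore:
    """Toy store: one table (set of subject/object pairs) per property."""

    def __init__(self):
        self.tables = defaultdict(set)

    # --- update functions (signatures are assumptions) ---
    def insert(self, s, p, o):
        self.tables[p].add((s, o))

    def delete(self, s, p, o):
        self.tables[p].discard((s, o))

    def update(self, s, p, old_o, new_o):
        """Replace the object of an existing triple."""
        self.delete(s, p, old_o)
        self.insert(s, p, new_o)

    # --- retrieval functions (semantics are assumptions) ---
    def get_object(self, s, p):
        """All objects o with (s, p, o) in the store."""
        return {o for (s2, o) in self.tables.get(p, ()) if s2 == s}

    def get_subject(self, p, o):
        """All subjects s with (s, p, o) in the store."""
        return {s for (s, o2) in self.tables.get(p, ()) if o2 == o}

    def get_relation(self, s, o):
        """All properties p with (s, p, o) in the store."""
        return {p for p, rows in self.tables.items() if (s, o) in rows}

store = TripleStore()
store.insert("^Vehicle", "◊Motor", "^Motor")
store.insert("^GoldenEagle", "◊is", "^Bird")
print(store.get_relation("^Vehicle", "^Motor"))
```

Keeping a separate table per property means a lookup for a given property touches only that property's pairs, which is the intuition behind accelerating conjunctive queries.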
Progress for Conceptual Modeling
- RG30: With the introduction of two additional types of data properties, conceptual models can be significantly simplified. This is done, for example, by merging pairs of classes using the PowerType Absorbance method. This is only possible because data properties can now be specified as to whether they can only be instantiated in the particulars layer, or only in the schema layer, or as a third option in both layers at the same time.
- RG31: The section Simplifying the modeling of Object Properties helps the modeler to find names for object properties quickly and reliably. The recommendation to derive these from nouns and not to use verbs makes it possible to automatically derive the name of the inverse object property for each designation of an object property. In addition, the direction in which the object property is defined is decisive for the choice of name. The recommendation here is to make the definition from the aggregating concept to the aggregated concept, e.g. (^Vehicle, ◊Motor, ^Motor). The name of the object property then results from the aggregated concept by simply replacing the ^ character with ◊ (^Motor → ◊Motor = ◊hasMotor). The name of the inverse object property results from appending the suffix Of, i.e. ◊Motor⁻¹ = ◊MotorOf = ◊isMotorOf.
- RG32: The topic of Labeled Property Graphs has received a lot of attention in recent years in the area of graph database systems such as Neo4j. In the field of formal ontologies, however, it has hardly been discussed so far. We have closed this gap by allowing the connections between knowledge subjects established via object properties to be attributed in the same way as the knowledge subjects themselves. In addition, the connection subjects can also be connected to other connection subjects in order to map states of affairs of any complexity. This methodology also supports modeling and inferencing over meta-knowledge.
- RG33: In addition to the regular classes of ontology modeling, Partitioning Classes were formally introduced as a supplementary modeling concept. They make it possible to model characteristics not only as internal properties of knowledge subjects, but also additionally or alternatively as instances of classes with only one attribute value, e.g. (^Bird, ◊is, ^.Legs-2) or (^Orange, ◊is, ^.orange) or (>NPS-Elizabeth_Taylor, ◊iof, ^.female). We have shown that the combination of partitioning classes can result in significant savings in the number of triples while also resulting in faster query processing. In addition, it is possible to introduce meta-attributes in the partitioning classes, which e.g. log the number of members of the partitioning class. Additional runtime gains can also be achieved in this way.
- RG34: To date, there has been no formal methodology to define which knowledge subjects are involved in n-ary relators. This is formally defined by the Cascaded Role Sets that we introduced. The special object properties ◊andRole, ◊orRole, ◊xorRole and the empty object »Nil were introduced for this purpose. When storing new n-ary knowledge subjects, it can be checked whether they syntactically meet the structural requirements. The example for Marriage shows that at least the husband and wife and the witnesses must be involved in a marriage (◊andRole), but that the location is optional (◊orRole). The marriage example also shows that the Cascaded Role Sets can be assigned to processes and sub-processes via the Object Property ◊Result.
- RG35: Reification is a long-established modeling principle in ontology engineering. It allows facts modeled as knowledge atoms to be referenced by other knowledge subjects. As a result, these facts are no longer regarded as certain or assigned facts. What is new are the functions we introduced for transforming knowledge atoms into reified knowledge subjects and, conversely, for returning reified knowledge subjects into knowledge atoms. We also use reification in linguistic engineering, where concepts of natural language are assembled into other concepts by composition. An example of this is the Genus-Differentiae Pattern.
- RG36: The generalized modeling patterns for composing conceptual concepts are introduced formally as Concept Binary Trees and as Concept Ternary Trees. They form the basis for the recursive definition of any deeply nested terms. The sub-terms that occur at any level of a definition tree can then be used to implement the Search by Meaning (SbM) search method. The relevance of the search can be limited by limiting the distance between the sub-terms and the root term of the tree. The evaluation of the SbM method has shown that significantly more relevant search results are delivered than with semantic searches that only use hypernym, hyponym and synonym relationships.
- RG37: With the method of Relational Concept Composition (RCC) it is possible to assemble concepts with arithmetic operators. Compound concepts that would otherwise require two triples can now be expressed in just one triple, e.g. (^Day, *7, ^Week) or (^Week, :7, ^Day), or (^Bird, *^Water, ^WaterBird) and (^WaterBird, :^Water, ^Bird) instead of (^WaterBird, ◊Gen, ^Bird) and (^WaterBird, ◊Dif, ^Water). The RCGDAx axiom shows how the two models can be transferred into one another in the case of genus differentiae definitions.
- RG38: One of the biggest sources of errors in conceptual modeling is the undifferentiated use of the word "is". We have shown that disambiguation in this regard can be achieved through our naming conventions. The word "is" alone in the object property ◊is is equivalent to ◊subClassOf and always designates the subclass relationship between two classes. The instantiation of particulars is always expressed with ◊iof, which is the abbreviation of ◊isInstanceOf. Furthermore, we have shown that the word "is" as used in "Rome is large" cannot simply be taken from natural language without reflection.
- RG39: The formal capture of object property instantiations is highly demanding, since they are based on object property definitions affecting classes at arbitrary levels of two parallel class hierarchies. The inheritance or hierarchy of the respective object properties must also be taken into account. In the Object Property Instantiation Pattern section we have defined general patterns for the instantiation of object properties. The two fundamentally different types of object property instantiation are, on the one hand, the object properties that connect particulars to particulars (PP) and, on the other hand, those that connect classes and particulars (CP). For each of the two types, a distinction must also be made as to whether it is a regular object property OP (aggregator) or an object property from the set of inverse object properties of OP (aggregatee). The functions instPP, instCP, instInvPP and instInvCP, which cover all types of object property instantiations, could then be defined on this basis.
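The naming convention from RG31 and the conversion between an RCC triple and its genus-differentiae pair from RG37 can be sketched as plain string and tuple manipulation. The function names are hypothetical. The sketch covers only the `*` operator with a class as operand and does not restate the RCGDAx axiom itself.

```python
def object_property_name(aggregated_class):
    """Derive the property name from the aggregated class: ^Motor -> ◊Motor."""
    if not aggregated_class.startswith("^"):
        raise ValueError("expected a class name starting with '^'")
    return "◊" + aggregated_class[1:]

def inverse_property_name(prop):
    """Derive the inverse name by appending the suffix Of: ◊Motor -> ◊MotorOf."""
    return prop + "Of"

def rcc_to_genus_differentia(triple):
    """(^Bird, *^Water, ^WaterBird) -> the two ◊Gen / ◊Dif triples."""
    genus, operator, compound = triple
    if not operator.startswith("*"):
        raise ValueError("only the '*' composition is handled in this sketch")
    differentia = operator[1:]
    return [(compound, "◊Gen", genus), (compound, "◊Dif", differentia)]

def genus_differentia_to_rcc(gen_triple, dif_triple):
    """Inverse direction: merge a ◊Gen and a ◊Dif triple into one RCC triple."""
    compound, _, genus = gen_triple
    compound_2, _, differentia = dif_triple
    if compound != compound_2:
        raise ValueError("both triples must describe the same compound concept")
    return (genus, "*" + differentia, compound)

prop = object_property_name("^Motor")
pair = rcc_to_genus_differentia(("^Bird", "*^Water", "^WaterBird"))
print(prop, inverse_property_name(prop), pair)
```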
Advanced Evaluation of Ontologies
- RG40: All of the methods we introduce for analyzing knowledge subjects for similarity or identity are based on the Features that we formally introduce. Compared to other possible uses of the term feature, we always use features to denote property-value pairs. On this basis, two fundamentally different similarity measures can then be defined, namely that for the similarity of abstract concepts on the Schema Layer and that for the similarity of concrete knowledge subjects on the Particulars Layer.
- RG41: The definition of quality dimensions is the prerequisite for the arrangement of knowledge subjects in n-dimensional conceptual spaces. On this basis, the absolute and relative similarity measures ksd(u,w) and ksdr(u,w) are then defined by us to determine the distances from concrete knowledge subjects (particulars). In the Detecting Concept Similarities section, we show how people's growth over time can be used to compare individual people.
- RG42: The similarity of abstract knowledge subjects of the schema layer is determined using the similarity measure Concept Similarity. For this purpose, only the property definitions of the term are used, i.e. which characteristics make up a concept, but NOT the values of the characteristics as with the quality dimensions. The graphic below the Feature Similarity Space then shows the percentage of similarity between two abstract concepts, with the X-axis indicating the number of features defined for this concept. Here, for example, Bird (BD) and GoldenEagle (GE) have a similarity of 76.4%. See Prototypes for an explanation of how this can be used to find prototypes of concepts. For bear (BR), brown bear is more representative than panda bear (PDB), panda bear more representative than polar bear (PLB).
- RG43: The Knowledge Subject Equivalence axiom defines when two knowledge subjects are identical. The equivalence axiom can be used to detect possible inconsistencies in the knowledge repository. Knowledge subjects with different identifiers should usually also be different. The Knowledge Subject Equivalence Axiom we defined is a generalization of the Identity Criterion. The latter merely detects the identity with regard to a specific property P.
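As a rough illustration of comparing abstract concepts by their property definitions only (RG42), the sketch below uses the Jaccard index over sets of property names. This is a stand-in chosen for illustration: the actual Concept Similarity measure is not given in this summary. The feature lists are invented, so the result does not reproduce the 76.4% quoted above.

```python
def feature_overlap(features_a, features_b):
    """Jaccard index of two sets of property names (values are ignored)."""
    if not features_a and not features_b:
        return 1.0  # two concepts without any features are treated as identical
    return len(features_a & features_b) / len(features_a | features_b)

# Invented property definitions, for illustration only.
bird = {"wings", "beak", "feathers"}
golden_eagle = bird | {"talons"}

similarity = feature_overlap(bird, golden_eagle)
print(f"{similarity:.1%}")
```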
Advances in Linguistic Engineering
- RG50: To date, there is no unifying approach to reconcile the concepts of ontological engineering with those of linguistic engineering. With the creation of the Controlled Vocabulary we provide a fundamental solution. Textual descriptions as well as formal descriptions of concepts are required for both areas. We define formal descriptions using the concept binary trees already mentioned above.
- RG51: Semantic search is based on exploiting the knowledge of semantic relationships between terms. The deep semantic search (DSS) we introduced also uses the composition of terms from sub-terms. It is based on the entries of the Controlled Vocabulary. As our analyses show, this not only achieves a significant improvement in search queries in terms of recall and precision, but the DSS is also language-independent, i.e. queries can be executed with terms from any language.
- RG52: Princeton WordNet (PWN) uses unique identifiers for synsets. Synsets group words with synonymous meanings. With our Concept Numbering System we go far beyond that. We not only assign a unique integer number and an alphanumeric identifier derived from it to each concept. With the numbers we assign based on Cantor's pairing function (CPF), it is even possible to encode all partial concepts of a concept in one integer. Due to the bijectivity of the CPF, not only can all partial terms be recovered, but also the complete binary tree that defines this term.
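The role of Cantor's pairing function in RG52 can be illustrated as follows. The pairing function and its inverse are standard. The tree encoding around them is only a sketch under the assumption that atomic concept numbers are known in advance and never coincide with a composite code; the actual Concept Numbering System may differ.

```python
from math import isqrt

def cantor_pair(x, y):
    """Cantor's pairing function: maps two naturals to one natural."""
    return (x + y) * (x + y + 1) // 2 + y

def cantor_unpair(z):
    """Inverse of cantor_pair, using exact integer arithmetic."""
    w = (isqrt(8 * z + 1) - 1) // 2
    t = w * (w + 1) // 2
    y = z - t
    return w - y, y

def encode(tree):
    """Encode a binary tree of concept numbers (ints or 2-tuples) as one integer."""
    if isinstance(tree, int):
        return tree
    left, right = tree
    return cantor_pair(encode(left), encode(right))

def decode(code, atoms):
    """Rebuild the tree; 'atoms' is the assumed set of atomic concept numbers."""
    if code in atoms:
        return code
    left, right = cantor_unpair(code)
    return (decode(left, atoms), decode(right, atoms))

atoms = {1, 2, 3}            # hypothetical atomic concept numbers
tree = ((1, 2), 3)           # concept composed from sub-concepts
code = encode(tree)
print(code, decode(code, atoms))
```

Because each composite number unpairs into exactly one pair of components, every sub-concept and the full tree structure can be read back from the single integer.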
Consolidating Methods and Terminology
- RG60: There is a big debate in multilevel modeling (MLM) about the assignment of concepts to levels of abstraction. To this end, new concepts such as Regularity Attributes and additional meta-annotations for concepts and their properties are frequently introduced. With the introduction of the three facets of property definitions and their instantiation, we have shown that all these methods, some of which are quite complex and incompatible with standard modeling, can then be dispensed with. In particular, we have shown with the Triple Facets Theorem that when using our methodology there are only exactly the two abstraction levels schema layer and particulars layer. As a result, this leads to much simpler and more transparent conceptual models.
- RG61: For ontology engineering, there are a number of criteria that an ontology should satisfy, such as analysability, leanness (absence of redundancy), expressiveness, performance, completeness, symmetry, reliability, modification stability, recoverability, maintainability and stability. For all these criteria we have defined checking methods that can be used to optimize the ontologies. Once the deficits have been identified, the recommendations we have developed for optimizing the ontologies can then be applied. These include, among other things, the naming recommendations, the use of partitioning classes and the application of the three facet definitions of properties.
- RG62: One area that is often discussed among ontologists is the juxtaposition of the Open World Assumption (OWA) and the Closed World Assumption (CWA). We have shown that even among experts, the assessment of the OWA in relation to its use in database systems is simply wrong. In addition, there is the question of what practical relevance the discussion entails. We often come across the discussion of the CWA in connection with the computability of inferencing. Here we have shown that the state of the art for reasoners falls far short of the demands and needs of practical use. However, we have also shown that it is possible to optimize certain reasoning tasks in terms of time in such a way that they can be executed directly in operations on the knowledge repository with acceptable response times.
Ontology Development Cycle
The most important individual steps with the respective quality assurance measures for concept modeling are listed here:
- Inserting a data property definition in a class. It must be checked whether the data property meets the requirements for instantiation.
- Injecting a Data Property Instance into an instance of a class. The syntactic correctness should be checked using the Data Property Pattern.
- Inserting an Object Property Definition between two classes. Crucial to correctness is getting the direction from the Aggregator class to the Aggregatee class right. The modeling should comply with the Object Property Modeling Guideline.
- Inserting an Object Property Instance between two particulars or a class and a particular. The knowledge subjects to be connected should already exist in the ontology. The syntactic correctness should be checked using the Object Property Patterns.
- Inserting/changing a subclass relationship: After that, check whether the class hierarchies are still cycle-free.
- Deleting a subclass relationship: After that, check whether the class hierarchies are still connected.
- Insertion of a particular: then the consistency check has to be carried out to determine whether the particular has been assigned to an instantiating class.
- Insertion of a relator based on a Cascaded Role Set (CRS) definition. All logical conditions specified by the CRS must have been met. Otherwise an error message will appear or the insertion will be undone.
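The step "insert a subclass relationship, then check that the hierarchy is still cycle-free", together with the undo behaviour mentioned for failed insertions, could look roughly like the sketch below. It assumes the hierarchy is held as a set of (subclass, superclass) pairs. The function names and the class ^Animal are illustrative and do not describe the thesis implementation.

```python
class CycleError(Exception):
    """Raised when an insertion would make the class hierarchy cyclic."""

def reaches(edges, start, target):
    """Depth-first search: can 'target' be reached from 'start' via superclass links?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(sup for sub, sup in edges if sub == node)
    return False

def insert_subclass(edges, sub, sup):
    """Check the precondition first, then insert the (sub, sup) link."""
    if sub == sup or reaches(edges, sup, sub):
        raise CycleError(f"{sub} -> {sup} would create a cycle")
    edges.add((sub, sup))

def run_transaction(edges, operations):
    """Apply all insertions or none: restore the snapshot on failure."""
    snapshot = set(edges)
    try:
        for sub, sup in operations:
            insert_subclass(edges, sub, sup)
    except CycleError:
        edges.clear()
        edges.update(snapshot)
        return False
    return True

edges = {("^GoldenEagle", "^Bird")}
ok = run_transaction(edges, [("^Bird", "^Animal"), ("^Animal", "^GoldenEagle")])
print(ok, edges)
```

The second operation in the example would close a cycle, so the first one is rolled back as well and the hierarchy is left unchanged.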
Ontology development and operation is usually a continuous process. First of all, it must be determined for the project whether a top-level ontology such as BFO, GFO, DOLCE, OntoUML, schema.org etc. should be used. The advantage of this is that a large number of basic concepts or application classes can already be accessed, for which axioms with associated semantics can usually also be used. The downside is that none of the top ontologies can offer all the basic concepts that might be needed. Once you have got involved with one of the top ontologies, it is not possible to switch to another top ontology later with justifiable effort.
For the first phase of ontology development, the following recommendations for modeling steps can be given, whereby the order, unless otherwise stated, is irrelevant.
- Analysis of concrete objects/particulars provides the lists of features that need to be modeled.
- Each data property is assigned to a class, where it is modeled as an internal attribute.
- All data properties of similar knowledge subjects are to be modeled in classes that are in a class hierarchy.
- The more items have the data property, the higher the class will be in the class hierarchy.
- This provides the basis for connecting the classes together in the right order via the subClassOf object property.
- The second group of properties are the relationship types between the knowledge subjects, which we have called object properties.
- An object property can then be defined at class level for two particulars >P1 and >P2 that are connected via an object property ◊OP with (>P1, ◊OP, >P2). If ^C1 and ^C2 are the instantiating classes with (>P1, ◊iof, ^C1) and (>P2, ◊iof, ^C2), and ^C1 is the aggregating class of ^C2, then this results in the object property definition (^C1, ◊OP, ^C2) and the inverse definition (^C2, ◊OPOf, ^C1). The aggregating class is called the Aggregator and the aggregated class the Aggregatee. The relation between Aggregator and Aggregatee can be seen in analogy to the relation between Subsumer and Subsumee in a subclass relation.
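The last modeling step, lifting an observed connection between two particulars to an object property definition between their classes, can be sketched as follows. The function name and the example particulars are hypothetical. The sketch assumes each particular has exactly one instantiating class and that the subject of the observed triple belongs to the aggregating class.

```python
def derive_definitions(instance_triple, iof):
    """From (>P1, ◊OP, >P2) and the ◊iof mapping, derive
    (^C1, ◊OP, ^C2) and the inverse definition (^C2, ◊OPOf, ^C1)."""
    p1, op, p2 = instance_triple
    # A KeyError here means a particular has no instantiating class yet.
    c1, c2 = iof[p1], iof[p2]
    definition = (c1, op, c2)
    inverse_definition = (c2, op + "Of", c1)
    return definition, inverse_definition

# Hypothetical particulars; the classes follow the ^Vehicle / ^Motor example.
iof = {">MyCar": "^Vehicle", ">Engine42": "^Motor"}
definition, inverse_definition = derive_definitions((">MyCar", "◊Motor", ">Engine42"), iof)
print(definition, inverse_definition)
```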
The modeling of more complex knowledge subjects by means of reification or relators is then a sequential execution of several individual modeling steps. Rules, state of affairs and modeling patterns can then also be expressed.
Ontology Operation Cycle
During the operation of an ontology, the requirements can change. At the beginning of the development and also possibly later when the ontologies are small, the performance requirement probably does not play a major role. During operation, it may turn out that in order to meet the performance requirements, it may be necessary to increase redundancy or to use additional caching mechanisms such as O4Store for query acceleration. The storage requirements are less critical, since the costs for storage space in cloud systems play a subordinate role in relation to the CPU costs.
Final Considerations
What are the essential requirements for the successful implementation of ontology engineering projects? Once the project requirements have been clarified, the feasibility must first be checked. Ontologies should then be developed in such a way that they are as error-free as possible, syntactically and semantically correct, interoperable, modular, reusable, extensible and maintainable. Then nothing stands in the way of a successful application. This experience is the basis for a large number of large, complex ontology projects that we have successfully carried out in recent years in the fields of energy research, chat bots, natural language text generation and semantic search.
Conclusion
The UNFOLD Framework developed in this work and the ontology modeling methods based on it form the basis for ontology and linguistic development frameworks that make it possible to develop leaner, more expressive and more correct ontologies. The triple-facet axiom for instantiation, which unifies the application of inheritance and inherence, provides the decisive methodology for this.
Linguistics Future Work
In a small team of researchers we are currently implementing a Semi-Automatic Concept Acquisition Pipeline (SACAP). Natural language texts are automatically translated to English as the reference language. The translated texts are parsed for complex phrases that are candidates for lexical concepts. From the stemmed and lemmatized candidates, those that already exist in the CoVoC are excluded. From the remaining candidates, a human editor approves those that are meaningful for inclusion in the CoVoC. As the CoVoC is updated several times per day, it allows large communities of researchers and scientists to contribute to and profit from the CoVoC. It is also planned to provide a REST API for querying the CoVoC with SbM.
Extension: deriver.app
Source: taoke.de — IN3 — Summary.