UNFOLD Framework

IN3 — Implementation

UNFOLD Overview

This section provides a complete summary of the concepts, methods, and guidelines that make up the UNFOLD Framework. They were selected and implemented so that a minimal set of definitions suffices, and these definitions can be mapped to the methods commonly used in RDF/RDFS, OWL and SPARQL. The UNFOLD Framework can thus be used as a template for the implementation of lean and high-performance systems for ontology engineering and linguistic engineering.

All technical components described in this thesis have been developed by the bense.com publishing company since the early 2000s. The O4*UNFOLD system architecture is depicted in figure o4unfold-architecture. The components include the ontology editor O4-Builder, the UNFOLD Database Server providing the query language OQL, the O4*Inference-Engine and a REST API. There are also import and export tools for OWL files. The conventions and notations have been continuously improved over the entire period. The technology has been and is being used in many projects, especially those working with very large ontologies and models.

Fig. o4architecture: O4*UNFOLD System Architecture

In contrast to other ontology engineering frameworks, the UNFOLD Framework provides not only technical components but also notations, guidelines and recommendations, design patterns and axioms as a holistic approach. Care was taken to ensure a high degree of interoperability with other ontologies. There are therefore import and export interfaces for OWL files and a REST API, which enables the development of scalable applications against a lean programming interface.

UNFOLD Notations

Fundamental to the power and efficiency of the UNFOLD Framework are the Naming Conventions used for ontological and linguistic concepts. The visualization of the Knowledge Graphs in the form of Ontographs builds on these conventions. The various concepts are immediately recognizable through different colors and shapes, which provides essential support in correcting faulty modelling. Many rules, axioms and formulas are based on Logic Notations. We have designed these notations so that they are largely independent of a specific syntax. Logical expressions can be converted into any other syntactic format.

UNFOLD Knowledge Graph

The technical data model on which the framework is based essentially corresponds to the storage model of triple stores. The sum of all triples forms the Knowledge Repository. However, we have extended the internal storage model so that a range of meta information is annotated for each triple, such as the user ID of the creator and of the last updater as well as the time at which changes were made. The most important extensions are that the name KB of a subset of the knowledge graph can be stored with each triple and that each triple can be referenced via a unique ID. We call the triples (s, p, o) of the knowledge graph Knowledge Atoms. We define triples with the same subject s as Knowledge Subjects (KS) and triples with the same object o as Knowledge Objects (KO) or as the context of o. The Reification of triples then results in so-called Reified Knowledge Atoms (RKA). The latter make it possible to model knowledge about knowledge in a simple way.
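
The extended triple model can be illustrated with a small sketch. It is not the UNFOLD implementation: the field names (atom_id, kb, created_by), the helper functions and the reification layout with three descriptive atoms per reified atom are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeAtom:
    atom_id: int          # unique ID that makes the triple referenceable (assumed name)
    s: str                # subject
    p: str                # property
    o: str                # object
    kb: str = "default"   # name of the knowledge-graph subset (assumed name)
    created_by: str = ""  # user ID of the creator (assumed name)


def knowledge_subjects(atoms):
    """Group atoms by subject: atoms sharing s form one Knowledge Subject."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[atom.s].append(atom)
    return dict(groups)


def knowledge_objects(atoms):
    """Group atoms by object: atoms sharing o form the context of o."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[atom.o].append(atom)
    return dict(groups)


def reify(atom, first_free_id):
    """Describe one atom by three new atoms about a node that stands for it.

    The node name and the role names are hypothetical.
    """
    node = f"atom:{atom.atom_id}"
    parts = [("subject", atom.s), ("property", atom.p), ("object", atom.o)]
    return [
        KnowledgeAtom(first_free_id + i, node, role, value, atom.kb, atom.created_by)
        for i, (role, value) in enumerate(parts)
    ]


atoms = [
    KnowledgeAtom(1, "Car", "is", "Vehicle"),
    KnowledgeAtom(2, "Car", "hasPart", "Engine"),
    KnowledgeAtom(3, "Truck", "is", "Vehicle"),
]
```

Because every atom carries its own ID, statements about a statement can point at the node derived from that ID, which is the point of the reification step.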

UNFOLD Conceptual Modeling

The internal characteristics of knowledge subjects are modeled via Data Properties. A data property has exactly one Atomic Data Type to which a Set of Values belongs.

The external properties of knowledge subjects are defined via Object Properties. These connect knowledge subjects with other knowledge subjects.

We define Features as property-value pairs (p, v). If the feature property p is a data property, then the value v must come from the value set VS(p). If the feature property p is an object property, then v must be from the range of p. Features form the basis for introducing measures to calculate absolute and relative similarities between knowledge subjects. Several quality dimensions specified by value sets of data properties define a conceptual space. The basis for calculating the similarities of knowledge subjects is the distance between them within the conceptual space. Ontological knowledge subjects are instantiated based on Classes, which form a class hierarchy. Linguistic knowledge objects and Word Sense Definitions are modeled with Concept Binary Trees. Analogously to the class hierarchies in ontological concepts, the sub-concepts contained in linguistic concepts are also managed in hierarchical structures, which are the basis for the efficient implementation of the search-by-meaning search method.
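
The two validity conditions for a feature (p, v) can be sketched as follows. The dictionaries value_sets, ranges and instances and the example names are hypothetical. Reading "v must be from the range of p" as "v is an instance of the range class" is an assumption, and the Euclidean distance is only one possible distance over numeric quality dimensions, not necessarily the measure used in UNFOLD.

```python
import math


def is_valid_feature(p, v, value_sets, ranges, instances):
    """Check a feature (p, v) against VS(p) or against the range of p."""
    if p in value_sets:                 # p is a data property
        return v in value_sets[p]
    if p in ranges:                     # p is an object property
        return v in instances.get(ranges[p], set())
    return False                        # unknown property


def distance(x, y, dimensions):
    """Euclidean distance between two subjects over numeric quality dimensions."""
    return math.dist([x[d] for d in dimensions], [y[d] for d in dimensions])


# Hypothetical example data
value_sets = {"doors": {2, 3, 4, 5}}
ranges = {"hasEngine": "Engine"}
instances = {"Engine": {"V8", "Electric_Motor"}}
```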

UNFOLD Patterns

The Reification Pattern is of fundamental importance for ontology modeling. It allows modeling knowledge about knowledge by transforming knowledge atoms into reified knowledge atoms. The Genus-Differentiae Pattern is based on the reification pattern and is the most important pattern for bringing together pairs of ontological and linguistic concepts for classification and taxonomy construction. Ontological concepts are arranged in a class hierarchy with the ◊is Object Property, while linguistic concepts are modeled in a tree structure as Concept Binary Trees with the ◊Gen and ◊Dif Object Properties.

The Materialization Pattern and the Powertype Pattern go hand in hand. Powertype pairs like (Car, Car_Model) are instantiated using the materialization pattern. With the Powertype absorbance method developed by us, both patterns can be reshaped in such a way that significantly slimmer models result.

Cascaded Role Sets (CRS) are a structuring methodology that enables the hierarchically nested definition of object properties using logical connectives such as and, or and xor. This creates, for the first time, the prerequisite for the formal application of complex relators to model arbitrarily complex issues.

UNFOLD Functions

The Reification Function converts a knowledge atom into a reified knowledge atom; the De-Reification Function converts a reified knowledge atom back into a knowledge atom. In the first case, knowledge is converted into a referenceable form. With the de-reification function, the knowledge is converted back so that it can be checked for truth or falsity.

The Search by Meaning function takes a combination of search terms and returns an extended set of search terms in which the given combination is contained as a partial meaning. This allows searches in search engines or on controlled vocabularies to be carried out with significantly better recall and precision.

To date, there are no mature, comprehensive systems for the unambiguous identification and numbering of linguistic and ontological concepts. Building on our controlled vocabularies, we developed a Concept Numbering system, which standardizes both the identification and the numbering of terms.

With functions such as cycle detection in graphs and Powertype Detection, we enable checking of the Knowledge Graph Integrity.

UNFOLD Storage System

The UNFOLD storage system was implemented as a triple store using MySQL. The database schema contains only a single relation, which holds, in addition to the triple itself, metadata about the authors and storage locations of each triple. Optionally, an additional O4Store can be used for the performance optimization of an UNFOLD knowledge repository.
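
A single-relation triple store of this kind might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name and all column names are assumptions and do not reproduce the actual UNFOLD schema.

```python
import sqlite3

# In-memory database as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")

# One relation holds every triple together with its metadata (assumed columns).
conn.execute(
    """
    CREATE TABLE triple (
        id         INTEGER PRIMARY KEY,  -- unique ID of the triple
        s          TEXT NOT NULL,        -- subject
        p          TEXT NOT NULL,        -- property
        o          TEXT NOT NULL,        -- object
        kb         TEXT NOT NULL,        -- storage location / knowledge base name
        created_by TEXT,                 -- author metadata
        updated_by TEXT,
        updated_at TEXT
    )
    """
)

rows = [
    ("Car", "is", "Vehicle", "demo", "alice"),
    ("Car", "hasPart", "Engine", "demo", "alice"),
    ("Truck", "is", "Vehicle", "demo", "bob"),
]
conn.executemany(
    "INSERT INTO triple (s, p, o, kb, created_by) VALUES (?, ?, ?, ?, ?)", rows
)

# All objects for a given subject and property.
objects = [
    row[0]
    for row in conn.execute(
        "SELECT o FROM triple WHERE s = ? AND p = ? ORDER BY id", ("Car", "is")
    )
]
```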

UNFOLD Publishing System

The UNFOLD Publishing System is an ontology-based publication system that manages both the content for websites and the ontologies for generating scientific publications in a single-source process. The websites are edited using the content management system cms2web. The ontologies are managed via the ontology editor O4Builder. The LaTeX source for the created subdirectories of the website can be generated at any time by calling a URL. Author collectives can therefore work simultaneously on publications with web-based tools and generate PDF documents that meet the requirements of publications at any time.

UNFOLD Query Language

With OQL we have developed a functional, variable-free query language that gets by with a minimal set of only three retrieval functions, namely getObjects: gO(S, P, R), getSubjects: gS(P, O, R) and getRelations: gR(S, P, O, R). Going beyond the possibilities of plain SQL, OQL can also compute transitive closures, which are essential for inferencing, for resolving hierarchical structures and for traversing knowledge graphs. A minimal set of update functions is also provided for manipulating knowledge graphs, namely insert(s, p, o), delete(s, p, o) and update(s1, p1, o1, s2, p2, o2). The section Atomic Functions on Knowledge Graphs describes how these functions can be applied in a way that preserves the integrity of the knowledge repository.
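
The idea behind the retrieval functions and the transitive closure can be approximated in a few lines of Python. This is not OQL and does not reproduce its signatures: the function names are hypothetical, the parameter R is omitted because this section does not define it, and get_relations is assumed to return the properties that link a subject to an object.

```python
def get_objects(triples, s, p):
    """All o such that (s, p, o) is in the knowledge graph."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def get_subjects(triples, p, o):
    """All s such that (s, p, o) is in the knowledge graph."""
    return {s for (s, p2, o2) in triples if p2 == p and o2 == o}


def get_relations(triples, s, o):
    """All p such that (s, p, o) is in the knowledge graph (assumed semantics)."""
    return {p for (s2, p, o2) in triples if s2 == s and o2 == o}


def transitive_objects(triples, s, p):
    """Everything reachable from s by following p one or more times."""
    seen = set()
    frontier = [s]
    while frontier:
        current = frontier.pop()
        for o in get_objects(triples, current, p):
            if o not in seen:       # the seen set also guards against cycles
                seen.add(o)
                frontier.append(o)
    return seen


triples = {
    ("Car", "is", "Vehicle"),
    ("Vehicle", "is", "Thing"),
    ("Car", "hasPart", "Engine"),
}
```

For example, transitive_objects(triples, "Car", "is") resolves the whole chain of superclasses of Car in one call, which is the kind of hierarchy traversal that inferencing relies on.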

UNFOLD Inferencing

The automatic derivation and entailment of knowledge is a key feature of ontology engineering. This can be done implicitly by applying the axioms already presented, or it can be triggered explicitly by applying rules. In most cases it is about the application of functions that are purely syntactic in nature. That is, the inference engine does not need to understand the meaning of the terms. It can be different with Truth Making. Here we show how authors can decide on the truth content of statements and the result is stored as an assessment of the facts.

UNFOLD Axioms

The Set Cardinalities Axioms form the mathematical basis for determining the cardinalities of sets that are formed by the set operations union, intersection and difference. Based on them, the Superfeatures and Subfeatures Axioms compare two feature sets FS(X) and FS(Y), from which similarity measures of feature sets can then be defined.
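
As a rough illustration, the cardinalities of union and difference can be computed from the sizes of the two feature sets and of their intersection. The sketch uses the standard identities |X ∪ Y| = |X| + |Y| − |X ∩ Y| and |X \ Y| = |X| − |X ∩ Y|. The Jaccard coefficient is shown only as one common similarity measure built on these cardinalities; it is an assumption that it resembles the measures defined in UNFOLD, and the function names are hypothetical.

```python
def set_cardinalities(fs_x, fs_y):
    """Cardinalities of intersection, union and difference of two feature sets."""
    common = len(fs_x & fs_y)
    return {
        "intersection": common,
        "union": len(fs_x) + len(fs_y) - common,   # inclusion-exclusion
        "difference": len(fs_x) - common,          # |FS(X) \ FS(Y)|
    }


def jaccard_similarity(fs_x, fs_y):
    """Shared features divided by all features; 1.0 for two empty sets."""
    c = set_cardinalities(fs_x, fs_y)
    return c["intersection"] / c["union"] if c["union"] else 1.0


# Hypothetical feature sets of (property, value) pairs
fs_car = {("wheels", 4), ("fuel", "petrol"), ("doors", 4)}
fs_bike = {("wheels", 2), ("fuel", "petrol")}
```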

The Relation Property Axioms apply to any binary mathematical relation in general. In ontology engineering, they are mainly used in connection with object properties. For example, for a transitive object property ◊op such as ◊is or ◊PartOf, it can be checked whether the integrity of the knowledge repository is violated because the repository contains cycles with respect to ◊op.
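
A cycle check for one object property can be sketched with a depth-first search. The function name and the triple layout are assumptions, and the property names are written without the ◊ prefix to keep the example plain ASCII.

```python
def find_cycle(triples, op):
    """Return a list of nodes forming a cycle along property op, or None."""
    graph = {}
    for s, p, o in triples:
        if p == op:
            graph.setdefault(s, []).append(o)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}

    def visit(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:                      # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None


part_of = [
    ("Wheel", "PartOf", "Axle"),
    ("Axle", "PartOf", "Car"),
    ("Car", "PartOf", "Wheel"),    # deliberately faulty atom
]
```

The recursive form is adequate for small graphs; for very large repositories an iterative traversal would avoid Python's recursion limit.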

The Inverse Property Axiom (IPAx) states that every knowledge atom has a dual inverse knowledge atom. If one is known, the other can be derived. This forms one basis for inferencing and entailment.
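
Deriving the inverse atoms can be sketched as follows, assuming that a mapping from each object property to its inverse is available. The mapping inverse_of, the property names and the function name are hypothetical.

```python
def derive_inverse_atoms(triples, inverse_of):
    """Return the inverse atoms (o, p_inv, s) that are not yet stored."""
    known = set(triples)
    derived = set()
    for s, p, o in known:
        p_inv = inverse_of.get(p)
        if p_inv is not None and (o, p_inv, s) not in known:
            derived.add((o, p_inv, s))
    return derived


# Hypothetical inverse pairs, declared in both directions
inverse_of = {"PartOf": "HasPart", "HasPart": "PartOf"}

stored = {
    ("Engine", "PartOf", "Car"),
    ("Car", "HasPart", "Engine"),   # inverse already present
    ("Wheel", "PartOf", "Car"),     # inverse missing
}
```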

The ValueSet Subsumption Axiom states that if the data property b is a sub-property of data property a, then the value set of b is also a subset of the value set of a, i.e. VS(b) ⊆ VS(a). If this axiom is violated, then this is an indication that the value set VS(a) may have to include values contained in VS(b).
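
A check of the condition VS(b) ⊆ VS(a) can be sketched as below. The data layout (a dictionary from sub-property to super-property and a dictionary of value sets), the example properties and the function name are assumptions.

```python
def value_set_violations(sub_property_of, value_sets):
    """Map each violating pair (b, a) to the values of VS(b) missing from VS(a)."""
    violations = {}
    for b, a in sub_property_of.items():
        missing = value_sets.get(b, set()) - value_sets.get(a, set())
        if missing:
            violations[(b, a)] = missing
    return violations


# Hypothetical data properties: b -> a means "b is a sub-property of a"
sub_property_of = {"warmColor": "color", "coldColor": "color"}
value_sets = {
    "color": {"red", "orange", "blue"},
    "warmColor": {"red", "orange"},
    "coldColor": {"blue", "cyan"},   # "cyan" is missing from VS(color)
}
```

The returned values are exactly the candidates that VS(a) may have to be extended with.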

For two classes C1 and C2, it can be derived from the Particulars Subsumption Axiom (PSAx) which subset relationship must exist between their sets of particulars sop(C1) and sop(C2). The PSAx axiom can also be used to verify the integrity of the knowledge repository.

If it is found with the Knowledge Subject Equivalence Axiom (KSEAx) that the feature sets of two knowledge subjects with different names x and y are identical, then this indicates that either the same name should be used or the knowledge subjects should be further differentiated.
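
Finding knowledge subjects with identical feature sets amounts to grouping them by their features. In this sketch the function name and the example data are hypothetical, and feature sets are assumed to be given as sets of (property, value) pairs.

```python
from collections import defaultdict


def equivalent_subjects(feature_sets):
    """Return groups of differently named subjects with identical feature sets."""
    groups = defaultdict(list)
    for name, features in feature_sets.items():
        groups[frozenset(features)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical knowledge subjects
feature_sets = {
    "Automobile": {("wheels", 4), ("fuel", "petrol")},
    "Car": {("wheels", 4), ("fuel", "petrol")},
    "Bike": {("wheels", 2)},
}
```

Each returned group is a candidate either for merging under one name or for adding a distinguishing feature.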

For every ontological concept (OC) there should be a linguistic concept (LC) that correctly and unambiguously describes the ontological concept. The Concept Congruence between both concept types is defined by the Linguistic and Ontological Concept Congruence Axiom (LOCCAx). Ideally, the LOCCAx axiom should always be satisfied. If it is not, this indicates that either the OC or the LC should be supplemented accordingly.

UNFOLD Theorems

The Subsumption Theorem states that subsumption is only allowed between ontological concepts of the same type. For example, data properties cannot subsume classes, and classes cannot subsume processes. Checking for compliance with the Subsumption Theorem in this form is only possible if naming conventions of the kind introduced by us are used.

The Triple Facets Theorem is only applicable in ontology engineering if the instantiation methods introduced by us are used for the three different data properties. Without the three facets, it is not possible to take advantage of the optimization methods we have introduced for ontology slimming.

The three-facet approach is also a prerequisite for the Abstraction Layer Theorem (ALT). This is the only way to avoid the many problems regarding layer mistakes. The essential statement behind the ALT is that in ontology engineering, the organization of concepts in abstraction levels can be significantly simplified, because with our approach only two ontological abstraction levels are required in the end.

UNFOLD Guidelines

In order to be able to use the advantages of the methods we have developed, it is necessary to apply guidelines that have arisen in the course of the development of our methodology or that have already been introduced elsewhere in ontology engineering.

The Partitioning Class Naming Guidelines significantly extend the differentiated application of class hierarchies. They make it clear when conventional classes are used, which can have any number of data properties and object properties, and when partitioning classes are used, which are usually based on exactly one data property.

Ontology Evaluation Guidelines generally take into account desirable properties of ontologies such as cohesion, freedom from redundancy, reliability, recoverability, maintainability, stability, and analysability. For each of the criteria, we discussed scenarios and presented possible solutions.

The definitions of concepts in linguistic engineering are based on the Language Design Guidelines introduced by us. We have shown how formal definitions of terms in the form of Word Sense Definitions can arise from terms defined in natural language (Textual Definitions), which in turn can serve as the basis for a global Controlled Vocabulary.

Source: taoke.de — IN3 — UNFOLD.