Modeling guidelines

Criteria for building and naming ontologies

Motivation

The analysis, modeling, representation and processing of knowledge is one of the most demanding tasks for the human intellect. In data modeling, (description) logic and computational linguistics, strict formal requirements are often imposed. In practice, one must then ask whether these formal requirements have to be met in full or whether better pragmatic solutions can be found by relaxing some of them. This treatise tries to show such a way; it is paved with compromises, but it also offers a multitude of guidelines and recommendations for efficient knowledge processing.

It is based on the most common standards such as RDF/RDFS, OWL and SPARQL. All methods and guidelines presented can be traced back to these standards. However, notations and methods that go far beyond these standards are also presented; they are always backward compatible with the standards. The purpose is to provide better visualizations and greater expressiveness for modeling ontologies.

There is an ongoing debate about when to model knowledge subjects as instances and when as classes. We will argue that this distinction is not differentiated enough. The concept of particulars is used for this purpose. As we will show later, the problem is that knowledge subjects can switch between being a class and being a particular. We will also show that it is beneficial to retain the qualities of being a class for as long as possible.

Preface

A Knowledge Base (KB) will be represented as a Knowledge Graph (KG). Throughout this treatise, as in OWL 2, we will often use the term ontology for it. There are many definitions of ontology in the literature. To avoid conflicts, the term Knowledge Graph will generally be used for a KB.

Referencing

Throughout the sections different types of referencing are used:

  • Sections, e.g., Description Logic
  • Sub-sections, e.g., The Artist Pablo Picasso
  • Figures, e.g., EoS, or pdf:figure:EoS:100mm or 10.0cm or 1.2 for width scaling factor
  • Tables, e.g., Acronym referenced by Block-Anker="Acronym"
  • Index entries, e.g.,
  • Bibliography, e.g., [Bens2014], UML [UML], MOF [MOF]
  • Definition, e.g., {{definition:KG:Knowledge Graph: Knowledge Graph KG = ⊆ NS x NP x NO; KGN = ⊆ NS x NP x NO x NN}}

Quotations are always placed in double quotation marks. If a quotation itself contains double quotation marks, the passages highlighted in this way are set in italics instead. In a series of chapters, such as Wittgenstein or , many passages from the respective article or book are quoted. In that case the source is cited only once at the beginning of the article, e.g. with [LuWi1921] and Building Ontologies with BFO; all other quoted passages use a reference like [LW 2.0121] or the page number like (Page 35 : "Citation text").

For the signatures of the references we have used an internal scheme, which combines up to four letters from the author's name(s) with the year of publication; if this is not unique, one more letter is appended, e.g. ApBe1982a. The signatures can be found using the author search https://cms2latex.pub/en/References/Authors/index,appelrath,bense.htm. The signature can then be used to search for corresponding citation points within the website of this publication, e.g. https://www.taoke.de/kg/Search?q=ApBe1982a. You can also use the full-text search to search directly for your name.

If you find yourself in the list of works cited, I kindly ask you to review the citations and send comments to the email address mentioned under Reading this Treatise Online.

American Language Use

We use American English as the preferred language instead of British English, i.e. we do not write Modelling but Modeling and Color instead of Colour. In the case of quotations, we naturally use the original spelling, even if it differs from American English.

Remarks

This treatise tries to use the most up-to-date sources of knowledge and publications possible in order to be able to make reliable statements and conclusions. Nevertheless, due to the large number of publications, it cannot be ruled out that not all available information was included. We have provided warning symbols in traffic light colors in places that we consider particularly worth mentioning:

We have marked passages that we consider particularly noteworthy in a positive sense with the green warning symbol. This includes, for example, recommendations that give rise to better, more efficient, more consistent and more compatible models.

Statements that we consider questionable or that somehow contradict others have been marked with the orange warning symbol. This includes, for example, different naming conventions or recommendations for modeling methods. The user or modeler must then decide on a case-by-case basis whether or not to adopt such points of view. Such a decision can also have pragmatic reasons, e.g. if complying with standards or recommendations would require a disproportionate amount of effort.

We have marked statements that we believe are incorrect, that describe bad practices, or that take incomprehensible positions with the red warning symbol. One example is the claim, made in relation to the Closed World Assumption, that using the query language SQL in database systems would yield different results compared to reasoning under the Open World Assumption. Likewise, for the usual examples given for the use of punning, we assume that the problems are more likely due to incorrect or at least questionable modeling.

Sections that we consider to be particularly new, and that present ideas that, to the best of our knowledge, have not yet been discussed or have only been discussed to a very limited extent, are marked with the idea icon.

Graphical Visualization

The Graphical Visualization (GV) of knowledge graphs (KG) is based on the conventions described in [Bens2014] and is implemented with graphviz/dot [dot]. To differentiate these pictorial graphs from other KGs and UML diagrams, such a GV is called an OntoGraph.

Introduction

Knowledge Engineering covers aspects like conceptual modeling (CM), logics, reasoning (inferencing on knowledge) and linguistics. The aim is to build knowledge bases (KB) that store facts, rules, axioms, formulas etc. in order to represent human knowledge and make it accessible for retrieval and inferencing. Since the inception of the internet and the World Wide Web (WWW), an additional focus has been placed on making knowledge connectable in the so-called Semantic Web (SW) in order to profit from reusing knowledge. The process of knowledge acquisition can also be described as a kind of knowledge mining, in analogy to data mining.

Since the introduction of computers for data processing, millions and billions of programs have been written to solve everyday tasks such as accounting, booking and reservation, data analysis etc. To allow for transactional access, data is kept in database systems. Although database technology has evolved over time, most applications still use relational database systems, as foundationally discussed by Codd in [Codd1970] and [Codd1972a], with SQL as the query and update language. The schema behind every relational database is a set of tables (database relations), each of them having rows (tuples, records) and columns (attributes, values). Tuples of one table can be connected to those of other tables by so-called foreign keys. n-ary relations such as n-to-m relations can be realized by having two or more foreign keys in a table. SQL queries are based on the relational calculus. With join operators and aggregate functions, the data needed for processing can be accessed comfortably. The external layer of the three-level ANSI-SPARC architecture allows views to be created that abstract from the conceptual layer, which is the set of tables defined in the data dictionary (DD). Smith and Smith present in [SmSm1977a] an early systematic approach to describing the abstraction mechanisms aggregation and generalization. In the late 1990s, Booch, Rumbaugh and Jacobson proposed in [BoRu1999a] the Unified Modeling Language (UML) for object-oriented analysis and design.
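The relational building blocks described above (tables, foreign keys, an n-to-m relation, a join with an aggregate function) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions chosen for illustration only; they do not come from the cited works.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE museum (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- n-to-m relation realized by two foreign keys in one table
CREATE TABLE exhibited_in (
    artist_id INTEGER NOT NULL REFERENCES artist(id),
    museum_id INTEGER NOT NULL REFERENCES museum(id),
    PRIMARY KEY (artist_id, museum_id)
);
""")

# Illustrative sample rows (assumed data).
conn.executemany("INSERT INTO artist VALUES (?, ?)",
                 [(1, "Pablo Picasso"), (2, "Joan Miro")])
conn.executemany("INSERT INTO museum VALUES (?, ?)",
                 [(1, "Reina Sofia"), (2, "Museo Picasso Malaga")])
conn.executemany("INSERT INTO exhibited_in VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# Join plus aggregate function: number of museums per artist.
rows = conn.execute("""
SELECT a.name, COUNT(*) AS museums
FROM artist AS a
JOIN exhibited_in AS e ON e.artist_id = a.id
GROUP BY a.id
ORDER BY a.name
""").fetchall()

for name, museums in rows:
    print(name, museums)
```

The link table exhibited_in carries no data of its own; it exists only to connect tuples of the two other tables, which is the pattern the paragraph refers to.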

So — what is wrong with it?

Perhaps there is nothing wrong with handling huge amounts of data and having many, many users working on it in parallel. Unfortunately, in areas such as artificial intelligence (AI) and ontologies, the simple schema of relational database systems does not work well enough. The reason for this is that in the knowledge engineering world, the relationship between the size of the schema and the number of extensions (instances) is inverse compared to database systems. Knowledge bases and ontologies usually contain large complex schemas like class hierarchies with many axioms, rules and other types of meta-knowledge. And that's something relational database systems just weren't designed for. A major handicap of classical relational query languages is that they do not support transitive closure. Without recursive extensions (recursive common table expressions were only added to SQL with SQL:1999), it is not possible to retrieve all subclasses of a class or all parts of a car with a single SQL query. For performance reasons, most relational database schemas are not in 3rd normal form. The consequence is that many attribute values in tuples must be assigned NULL values. Many scientific papers have been written to analyze the problem and provide solutions to circumvent this complication.
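To make the transitive-closure problem concrete, here is a minimal sketch in plain Python that collects all direct and indirect subclasses of a class from a list of direct subclass pairs. The function name, the pair representation and the class names are assumptions for illustration, not notation from this treatise.

```python
from collections import defaultdict


def all_subclasses(sub_class_of, root):
    """Return every direct or indirect subclass of `root`.

    `sub_class_of` is an iterable of (subclass, superclass) pairs.
    """
    children = defaultdict(set)
    for sub, sup in sub_class_of:
        children[sup].add(sub)

    seen = set()
    stack = [root]
    while stack:  # depth-first traversal down the hierarchy
        current = stack.pop()
        for child in children[current]:
            if child not in seen:  # also guards against cycles
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical class hierarchy.
SUB_CLASS_OF = [
    ("Car", "Vehicle"),
    ("Truck", "Vehicle"),
    ("SportsCar", "Car"),
    ("Convertible", "Car"),
    ("Vehicle", "Artifact"),
]

result = all_subclasses(SUB_CLASS_OF, "Vehicle")
print(sorted(result))
```

The same traversal works for part-of hierarchies (all parts of a car) if the pairs are read as (part, whole).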

But why is Knowledge Engineering (KE) an art?

Obviously, the world of conceptual modeling is not as simple as the promise of relational database systems might suggest. Not only is the design of large database schemas non-trivial (the SAP data model contains thousands of tables and tens of thousands of attributes), but when it comes to schema changes due to new requirements, things get even worse. For example, the requirement to keep a salary history for an employee when previously only the last salary was stored becomes a tedious task. In the conceptual schema, it is necessary to insert a new table in which salaries for specific periods are linked to the table of employees by a foreign key. This also affects the applications running against the database.
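The salary-history change described above can be sketched with sqlite3 as follows. The table layout, the period columns valid_from and valid_to, and the sample employee are assumptions made for this illustration; a real migration would look different in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Original design (hypothetical): only the last salary is stored.
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER);
INSERT INTO employee VALUES (1, 'Ada', 5000);
""")

# New requirement: a separate table links salaries for specific periods
# to the employee table by a foreign key.
conn.executescript("""
CREATE TABLE salary_history (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    valid_from  TEXT NOT NULL,
    valid_to    TEXT,              -- NULL marks the currently valid salary
    amount      INTEGER NOT NULL,
    PRIMARY KEY (employee_id, valid_from)
);
-- Seed the history from the old column (start date is an assumed value).
INSERT INTO salary_history (employee_id, valid_from, valid_to, amount)
SELECT id, '2024-01-01', NULL, salary FROM employee;
""")

# A raise now closes the open period and opens a new one.
conn.execute(
    "UPDATE salary_history SET valid_to = '2024-12-31' "
    "WHERE employee_id = 1 AND valid_to IS NULL"
)
conn.execute("INSERT INTO salary_history VALUES (1, '2025-01-01', NULL, 5500)")

history = conn.execute(
    "SELECT valid_from, amount FROM salary_history "
    "WHERE employee_id = 1 ORDER BY valid_from"
).fetchall()
current = conn.execute(
    "SELECT amount FROM salary_history WHERE employee_id = 1 AND valid_to IS NULL"
).fetchone()[0]
print(history, current)
```

Every application that previously read employee.salary directly now has to query the open period in salary_history instead, which is the knock-on effect on applications mentioned in the text.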

Where do we go from here?

In the early days of artificial intelligence (AI), Doug Lenat was credited as one of the key figures behind the statement: everything is an object. This could equally be a statement in the context of object-oriented programming (OOP). The view presented here is controversial and probably new to the discussion in the field of ontologies: Everything is a relation. Why is that? From a linguistic point of view, statements are made in the form of sentences. The most basic sentence pattern in western languages is subject-predicate-object (SPO). The sentence Picasso was born in Málaga is an example of a minimally meaningful piece of knowledge that one can express. The words Picasso and Málaga by themselves are meaningless; they are just strings of characters. Worse still, they lend themselves to ambiguity. In the sentence One of the Most Famous Picassos is exhibited in the Reina Sofia Museum in Madrid, the name Picasso obviously designates one of his works of art and not the natural person Pablo Picasso. This problem has long been known and has been studied in the branch of science called semiotics since the days of Aristotle, and by the likes of Charles S. Peirce and Saussure. In the semiotic triangle, an attempt is made to relate symbols (words, names, images) to the concepts they represent and to concrete instances of these concepts. Following this approach, one could disambiguate the concepts named by Picasso by creating symbol names like ^Picasso and >Pablo_Picasso and making statements like The word "Picasso" symbolizes the artworks of the painter Pablo Picasso with (Picasso, <>Symbolized, ^Picasso). The sentence also makes it clear that in natural language words are often placed in quotation marks to indicate that the word itself is meant and not one of the concepts behind Picasso.
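The idea that every statement is a relation, and that a word must be distinguished from the concepts it symbolizes, can be sketched with a tiny in-memory triple set. The identifiers ^Picasso, >Pablo_Picasso and <>Symbolized follow the notation used above; the predicates bornIn and createdBy, the identifier >Malaga, and the second symbolization triple are assumptions added only for this illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The quoted string '"Picasso"' stands for the word itself,
# the prefixed names stand for the concepts behind it.
KG = {
    Triple(">Pablo_Picasso", "bornIn", ">Malaga"),            # assumed predicate
    Triple('"Picasso"', "<>Symbolized", "^Picasso"),          # the artworks
    Triple('"Picasso"', "<>Symbolized", ">Pablo_Picasso"),    # the person (assumed)
    Triple("^Picasso", "createdBy", ">Pablo_Picasso"),        # assumed predicate
}


def meanings(word):
    """Return all concepts that the given word symbolizes, sorted by name."""
    symbol = f'"{word}"'
    return sorted(
        t.obj for t in KG
        if t.subject == symbol and t.predicate == "<>Symbolized"
    )


print(meanings("Picasso"))
```

Because the word and its concepts are separate nodes, an ambiguous word simply has more than one outgoing symbolization triple, and a word unknown to the graph has none.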

What is an Ontology?

Ontology definitions from ontology.co:

  1. Ontology as a philosophical discipline
  2. Ontology as an informal conceptual system
  3. Ontology as a formal semantic account
  4. Ontology as a specification of a conceptualization
  5. Ontology as a representation of a conceptual system via a logical theory
    1. characterized by specific formal properties
    2. characterized only by its specific purposes
  6. Ontology as the vocabulary used by a logical theory
  7. Ontology as a (meta-level) specification of a logical theory

Some more formal textual definitions of ontological concepts are given in [ArSm2015], pages 1-2:

  • "ontology = def. a representational artifact, comprising a taxonomy as proper part, whose representations are intended to designate some combinations of universals, defined classes, and certain relations between them"
  • "taxonomy = def. a hierarchy consisting of terms denoting types (or universals or classes) linked by subtype relations"
  • "entity = def. anything that exists, including objects, processes, and qualities"
  • "representation = def. an entity (for example a term, an idea, an image, a label, a description, an essay) that refers to some other entity or entities"

In that text passage the authors also provide less formal definitions for hierarchy, types and universals.

An ontology can also be defined, following [HeHe2006a], pages 8-9, as O = {L, V, Ax}, where V is a structured vocabulary and L is a formal language in which the axioms Ax are defined. We essentially agree with this definition, but will expand the concept of vocabulary. In our view, the vocabulary consists of the knowledge-subjects, which make up the actual content of the ontology, and a controlled vocabulary, which assigns the linguistic concepts to the knowledge-subjects. The ontology definition is further enriched by a derivability relation ⊢, a semantic consequence relation ⊧ and a class Mod(V) of interpretations that serve as semantics for the language L.

OWL Ontologies have a very formal definition.

Knowledge Models

A knowledge model represents knowledge about domains. It consists of data models, behavioral models, hypotheses, axioms, rules, classes and objects.

  • Education: Learning Models and Curricula
  • Computer science: data models, knowledge bases, domain model, program flowcharts, business process model, human-machine system, human-machine interface, digital terrain model
  • Mathematics: theories and axioms, computability, model theory
  • Chemistry: Periodic Table
  • General concepts: abstraction, realism, constructivism, idealization, process of knowledge, homomorphy

Questions:

  • How to transform knowledge models? Which models are equivalent in terms of completeness and consistency?
  • Which knowledge models are equivalent in terms of the knowledge they represent?
  • How do you recognize contradictions and paradoxes (quality of knowledge models)?
  • How complete are knowledge models (quantity of knowledge models, measure of completeness)?
  • How can knowledge be gained and classified automatically (ID3 (Iterative Dichotomiser 3), C4.5, Nextra)?

On the difficulty of creating complex ontologies

In recent years, we have had several opportunities to discuss ontologies with leading academics and industry at academic meetings and conferences. We got the impression that the task of creating complex (enterprise) ontologies is very much appreciated.

There are now some good tools and approaches to make knowledge from a wide variety of sources (databases, documents in the file system, Internet) accessible. However, there are significant problems in verifying, homogenizing and consolidating this knowledge. In our view, the main obstacles to this are that there are neither generally accepted top ontologies nor uniform conventions for nomenclature and visualization.

To make matters worse, knowledge and know-how from computer science, philosophy, linguistics and last but not least from the application domains are required.

This inevitably leads to the failure of many ontology projects, since knowledge diffusion rather than knowledge convergence must be assumed. What is needed is a method of knowledge acquisition that starts from a solid core of knowledge (top ontology) and which is then gradually supplemented with knowledge. With each further step, the new knowledge must be checked and consolidated with the existing knowledge. It may be necessary to have several knowledge engineers vote on the veracity of the new knowledge. If you cannot come to a sufficiently reliable conclusion that the new findings are correct, you may have to remove them later.

Although the field of ontologies has been worked on intensively for 10 to 15 years, it seems to us that it will take another 10 to 15 years before technology, conventions and procedural know-how are advanced enough that one can speak of knowledge norms, or of a normality of knowledge.

One can only hope that the field of ontologies will not suffer the same fate as that of expert systems and artificial intelligence, which failed in the late 1980s to mid-1990s because the methodology, tools and, above all, the experts were not advanced enough to master the topic. Unfortunately, ontologies also have to include the original aspect of knowledge processing from artificial intelligence in order to meet all knowledge processing and knowledge management requirements.

What can you do?

There is no way around creating simple but robust top ontologies (Upper Ontologies) first. There are currently about half a dozen candidates, mostly in academia, with DOLCE, GFO, Schema.org and SUMO being the most in-depth in my view. The only problem is that the concepts represented in these ontologies have little in common and different names are used for the same concepts. They also differ greatly in the level of detail and the intended areas of application.

In order to avoid term inflation, one could try to follow the approaches of Anna Wierzbicka (NSM = Natural Semantic Metalanguage) and Charles Bliss (Blissymbolics) and base concepts on a few basic terms and then combine them. For example, the concept fish could be composed using Relational Concept Construction as ^fish = cold +^blood +^vertebrate +^water or ^MotorBike = 2 +^wheel +^motor.
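As a rough illustration of how such composed concepts could be processed, the following sketch splits an expression of the form shown above into the concept name and its components and then indexes which concepts share a basic term. The parsing rules (one "=" separating name and body, "+" separating components) are an assumption derived only from the two examples; they are not a specification of Relational Concept Construction.

```python
from collections import defaultdict


def parse_concept(expr):
    """Split 'name = a + b + c' into (name, [a, b, c])."""
    name, _, body = expr.partition("=")
    name = name.strip()
    parts = [part.strip() for part in body.split("+")]
    if not name or not all(parts):
        raise ValueError(f"not a concept construction: {expr!r}")
    return name, parts


def index_by_component(expressions):
    """Map each basic term to the set of concepts built from it."""
    index = defaultdict(set)
    for expr in expressions:
        name, parts = parse_concept(expr)
        for part in parts:
            index[part].add(name)
    return dict(index)


EXPRESSIONS = [
    "^fish = cold +^blood +^vertebrate +^water",
    "^MotorBike = 2 +^wheel +^motor",
]

fish = parse_concept(EXPRESSIONS[0])
index = index_by_component(EXPRESSIONS)
print(fish)
print(index["^wheel"])
```

An index of this kind makes it easy to see how many concepts rest on each basic term, which is one way to check whether the set of basic terms stays small.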

For us, Wikipedia is an essential source for naming terms and topics. First, one would check whether the term is already defined in Wikipedia, e.g. for Pablo Picasso (https://en.wikipedia.org/wiki/Pablo_Picasso). In this case, Pablo_Picasso should also be selected as a reference or identifier. If the term is not clear or is a homonym for various concepts, you can also follow Wikipedia conventions and add the area where the term is used, e.g. generalization_(disease). Wikipedia also has disambiguation pages like Disease_(disambiguation).

Smith states in [Smit2005a], page 5: "There are two central components to the formal ontology Husserl himself presented in his Third Logical Investigation: the theory of part and whole (or mereology), which has received some considerable attention in recent years, and the theory of dependence - that is to say the theory of those links between entities of different types in virtue of which entities of one type cannot, as a matter of necessity, exist without some further entity of another, different type"

On page 12, Barry Smith discusses an alleged counter-example for the modeling of artifacts using the example of parking garages. In our opinion, he did not model it correctly. Of course, a parking lot (car park) is not a superclass of a parking level (parking ramp). Rather, a parking garage aggregates several parking levels, with the parking levels (ParkingRamp) being modeled as roles of parking spaces in relation to parking garages in the UML diagrams. On page 14, Barry Smith argues that it is an unrealizable ideal that an ontology would consist of a single taxonomy.

  • "On the one hand, the language used here is inadequate, since ontologies are certainly to be understood more broadly than hierarchies or families of hierarchies."
  • "On the other hand, in my opinion, nothing speaks against introducing a hierarchy that subsumes ALL families of hierarchies."

Domains

Ontology modeling is found in numerous domains. The Information Coding Classification (ICC) is a classification system covering almost all of the approximately 6500 known knowledge areas. It also contains areas of knowledge that are generally not covered in literature. It thus goes beyond the framework of the well-known library classification systems such as the Regensburg Association Classification, the Dewey Decimal Classification, the Universal Decimal Classification and the Library of Congress Classification. Therefore, it can serve as a universal system for classifying literature or other information by field of knowledge.

Pragmatism versus Generalism?

Knowledge engineering is a very demanding discipline. In order to penetrate all facets of it, you need a sound knowledge of databases, software engineering, linguistics, mathematics and, last but not least, philosophy. How much of it is needed depends on the complexity of the task. In this treatise, we will comprehensively analyze the field and identify what knowledge is required. In doing so, we will work our way from simple tasks to more and more demanding ones. In each section, formal definitions, axioms, and theorems are provided alongside many examples and graphics. The many links allow you to quickly jump back and forth between sections to quickly get to the information you need, or to skip areas that are less relevant to completing the task at hand.

Subjects and Objects

The terms subject and object are used very differently in different contexts and by different people. The way we will use them at the lowest level of ontological implementation in Atomic Knowledge Entities is unambiguous. Subjects s always appear on the left side of a triple (s, p, o) and are linked to objects o via a property p.

The situation is different when these terms are used in Top Level Ontologies and in Philosophy. There, for example, Objects are subject to requirements such that they represent a whole and must exist in space and time. In this respect, it is always necessary to question what is meant by entity or thing. The same is true of objects. In German, we would intuitively tend to regard an object (Gegenstand) as a material object that can be sensed, i.e. seen, heard, smelt, felt, measured, etc. This also corresponds to the fact that in German nouns that meet the criteria of sensory perception always start with a capital letter.

So we will work out in the course of the discussion when, how and where the terms subject and object should be used.

Ontological Skeleton

Many of the concepts, methods and perspectives in this treatise can be traced back to Wittgenstein's Tractatus logico-philosophicus [LuWi1921]. That is why many concepts and applications run like a common thread through the work, and an ontological skeleton develops to which everything can be attached.

Source (canonical): taoke.de — TAoKE Introduction.

This section gives guidelines for conceptual modeling and naming of concepts. Several criteria matter when you create ontologies; pattern-oriented design has a long tradition [AlIs1997], while taxonomy construction in ontologies is treated in [BaAl2022] and foundational commitments in [GuBe2021].

  • Completeness — Domain knowledge should be represented as fully as practicable.
  • Correctness — Avoid contradictions; where evidence conflicts, represent the conflict explicitly (e.g. certainty or provenance) rather than hiding it.
  • Symmetry — For base relations, include inverses where appropriate (e.g. subclass and inverse navigation) so you can traverse in both directions.
  • Avoiding redundancy — Do not assert what can be derived automatically; symmetry-driven duplication may still be acceptable if it improves navigation.
  • Usability — Support both human-friendly views (graphs, tables) and machine-oriented serializations.
  • Performance — Large ontologies should remain usable; controlled redundancy may be traded for speed.
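Two of the criteria above, symmetry and avoiding redundancy, lend themselves to simple automated checks. The following sketch reports inverse triples that are missing and subclass assertions that could be derived through transitivity. The property names subClassOf and superClassOf, the tuple representation and the sample triples are assumptions for illustration; they are not taken from TAoKE or deriver.app.

```python
def missing_inverses(triples, inverse_of):
    """Return inverse triples implied by `inverse_of` but not asserted."""
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})
    present = set(triples)
    missing = {
        (o, both_ways[p], s)
        for s, p, o in present
        if p in both_ways and (o, both_ways[p], s) not in present
    }
    return sorted(missing)


def redundant_subclass_assertions(triples, prop="subClassOf"):
    """Return (sub, sup) pairs that are derivable via another superclass."""
    direct = {}
    for s, p, o in triples:
        if p == prop:
            direct.setdefault(s, set()).add(o)

    def reachable(start):
        # All superclasses reachable from `start` along `prop` edges.
        seen, stack = set(), [start]
        while stack:
            for parent in direct.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    redundant = set()
    for sub, parents in direct.items():
        for sup in parents:
            if any(sup in reachable(mid) for mid in parents if mid != sup):
                redundant.add((sub, sup))
    return sorted(redundant)


# Hypothetical sample triples.
TRIPLES = [
    ("Car", "subClassOf", "Vehicle"),
    ("SportsCar", "subClassOf", "Car"),
    ("SportsCar", "subClassOf", "Vehicle"),  # derivable via Car
    ("Vehicle", "superClassOf", "Car"),
]

to_add = missing_inverses(TRIPLES, {"subClassOf": "superClassOf"})
to_review = redundant_subclass_assertions(TRIPLES)
print(to_add)
print(to_review)
```

Whether a reported redundancy is removed or kept is a modeling decision: as the criteria note, controlled redundancy may be acceptable when it improves navigation or performance.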

In deriver.app, these ideas align with keeping triples and rules consistent, using modules to scope large KBs, and using visualizations (triple and rule graphs) alongside list editors.

Naming

Follow the TAoKE naming guidelines on the canonical site for prefixes, human-readable labels, and stability. This mirror links to the bibliography so you can cite sources consistently using signature keys.

Source: taoke.de — Modeling Guidelines. See also Naming guidelines.

References

  1. [AlIs1997] Christopher Alexander, Sara Ishikawa, Murray Silverstein, Max Jacobson, Ingrid F. King, Shlomo Angel, A Pattern Language. Towns, Buildings, Construction, Oxford University Press, New York, 1977, ISBN: 0195019199
  2. [BaAl2022] Jeferson O. Batista, João Paulo A. Almeida, Eduardo Zambon, Giancarlo Guizzardi, Ontologically Correct Taxonomies by Construction, Data & Knowledge Engineering, 2022
  3. [GuBe2021] Giancarlo Guizzardi, Alessander Botti Benevides, Claudenir M. Fonseca, Daniele Porello, João Paulo A. Almeida, Tiago Prince Sales, UFO: Unified Foundational Ontology, Applied Ontology 1-3, 2021, https://www.researchgate.net/publication/355735118_UFO_Unified_Foundational_Ontology, last visit: 09.04.2026