Evaluation

Partitioning Classes (PTCL)

In this section, we examine partitioning classes in terms of storage space savings and query performance optimization.

Storage Efficiency:

Modeling with partitioning classes (PCs) has great potential for saving storage space. This is especially true for features that are rigid/essential to the classes being modeled. For discrete, non-essential characteristics such as .BirthDate, modeling with PCs is just as useful and effective as traditional modeling. For example, instead of (>NPS-Carl_Linnaeus, .DeathDate, 1778-01-10) one could model (>NPS-Carl_Linnaeus, »pof, ^.Person_by_Deathdate-1778-01-10). The same number of triples is required, namely one, and the access speed is identical. The corresponding queries to find all persons who died on January 10, 1778 are gS(.DeathDate, 1778-01-10) and gS(»pof, ^.Person_by_Deathdate-1778-01-10).
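The equivalence of the two modeling styles can be illustrated with a minimal sketch. The TripleStore class below is a hypothetical in-memory stand-in, not the system evaluated here; only the triples and the gS calls are taken from the text, and the method name gS simply mirrors the OQL select.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory store (an assumption), indexed by (predicate, object)."""

    def __init__(self):
        self._by_po = defaultdict(set)
        self._count = 0

    def add(self, subject, predicate, obj):
        bucket = self._by_po[(predicate, obj)]
        if subject not in bucket:
            bucket.add(subject)
            self._count += 1

    def gS(self, predicate, obj):
        """Return all subjects that have the given predicate and object."""
        return set(self._by_po.get((predicate, obj), set()))

    def __len__(self):
        return self._count


# Conventional modeling: the date is stored as an attribute value.
conventional = TripleStore()
conventional.add(">NPS-Carl_Linnaeus", ".DeathDate", "1778-01-10")

# PC modeling: the particular is asserted to be part of a partitioning class.
pc = TripleStore()
pc.add(">NPS-Carl_Linnaeus", "»pof", "^.Person_by_Deathdate-1778-01-10")

died_conventional = conventional.gS(".DeathDate", "1778-01-10")
died_pc = pc.gS("»pof", "^.Person_by_Deathdate-1778-01-10")
```

Both stores hold exactly one triple and both selects are a single index lookup, which is why neither storage nor access time differs for this kind of attribute.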

Fig. pc3: Body of Water (BoW) PC Hierarchy

Such essential features include the 10 or so essential properties from the Body of Water (BoW) example. Every particular of the class ^BoW exhibits these properties. So, instead of (>RIV-Rhine, .NatureType, natural) we can store the same information as (>RIV-Rhine, »is, ^.natural). The Ontograph in Figure pc3 shows that the four classes ^BoW-Stream, ^BoW-Wadi, ^BoW-Creek and ^BoW-TidalCreek inherit the values for the properties .∆FloatingType, .∆LocationType and .∆NatureType from the superclasses of ^BoW_natural_aboveground_floating. Consequently, the values ‘floating’, ‘aboveground’ and ‘natural’ do not need to be stored explicitly in the four classes or in any of their particulars. For the particulars, this is achieved by assertions such as (>RIV-Rhine, »pof, ^BoW-Stream) or (>RIV-Amazon, »pof, ^BoW-Stream). Compared to conventional modeling, only one triple is needed instead of three to entail the values ‘floating’, ‘aboveground’ and ‘natural’ for any particular of the four classes. If n is the number of properties combined within a concept (n = 3 for ^BoW_natural_aboveground_floating), then n - 1 triples are saved for every class and every particular thereof. Assuming that there are about 50,000 streams in Germany, this saves 50,000 * (3 - 1) = 100,000 triples. For non-essential properties such as .flow_velocity, .length, .width etc., modeling with partitioning classes makes less sense, especially for value scales involving real numbers such as (.flow_velocity: 2.367 m/sec).
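The inheritance argument and the savings estimate can be sketched as follows. This is an illustration under stated assumptions: the intermediate class names ^BoW_natural_aboveground and ^BoW_natural, and the level at which each value is attached, are hypothetical, since the exact hierarchy is only given in Figure pc3.

```python
# Hypothetical fragment of the BoW hierarchy: class -> direct superclass.
SUPERCLASS = {
    "^BoW-Stream": "^BoW_natural_aboveground_floating",
    "^BoW_natural_aboveground_floating": "^BoW_natural_aboveground",  # assumed
    "^BoW_natural_aboveground": "^BoW_natural",  # assumed
    "^BoW_natural": "^BoW",
}

# Property values asserted once, at class level (placement is assumed).
CLASS_VALUES = {
    "^BoW_natural": {".∆NatureType": "natural"},
    "^BoW_natural_aboveground": {".∆LocationType": "aboveground"},
    "^BoW_natural_aboveground_floating": {".∆FloatingType": "floating"},
}

# One »pof triple per particular, as in the text.
MEMBER_OF = {
    ">RIV-Rhine": "^BoW-Stream",
    ">RIV-Amazon": "^BoW-Stream",
}


def inherited_values(particular):
    """Collect property values by walking up the superclass chain."""
    values = {}
    cls = MEMBER_OF[particular]
    while cls is not None:
        for prop, value in CLASS_VALUES.get(cls, {}).items():
            values.setdefault(prop, value)  # the most specific class wins
        cls = SUPERCLASS.get(cls)
    return values


def triples_saved(num_particulars, n_properties):
    """n - 1 triples are saved per particular when n properties are combined."""
    return num_particulars * (n_properties - 1)


rhine = inherited_values(">RIV-Rhine")
saved = triples_saved(50_000, 3)
```

Here a single membership triple entails all three values for >RIV-Rhine, and triples_saved(50_000, 3) reproduces the 100,000 triples estimated above.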

Unlike traditional class hierarchies, the depth of PCs is constant. For single attributes, the depth is two, because there is only one more level below the root partitioning class. As already shown in the Evaluation section using the example of the Body of Water (BoW) ontology, it is also possible to model hierarchies of PCs with combinations of data properties. In the future we want to systematically investigate which combinations of attributes are suitable for the implementation of PC hierarchies. Currently we have evidence that this supports the identification of prototypes according to [Rosc1978]. For example, the combination of the features ‘aboveground’, ‘natural’ and ‘floating’ in a PC hierarchy is very characteristic of typical BoWs such as ‘rivers’ and ‘streams’.

Conjunctive Query Performance:

Suchanek et al. present in [SuKa2007] with YAGO “a large ontology with high coverage and precision. ... YAGO is based on a clean logical model with a decidable consistency”. The conjunctive query answering (CQA) performance evaluation was performed on an ontology with a subset of 4 million triples from YAGO on a dedicated database server with the following characteristics. Operating system: Ubuntu 18.04.1 LTS; processors: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, 6 cores, 12 threads; RAM: 64 GB; hard disks: 2 x 6 TB HDD in RAID 1; MySQL version: mysql-server-5.7; transmission rate to clients: up to 1 Gb/s.

The database contains approximately 1.2 million personal records with properties such as GivenName (613k, 63k), FamilyName (1213k, 209k), isMarriedTo (7.7k), ChildOf (6.4k, 4.6k), BornOnDate (500k, 55k), BornIn (24k, 2.5k), DiedOnDate (219k, 36k), DiedIn (10k, 1.1k) and Gender (587k), where the numbers in parentheses indicate the approximate total number of triples for that attribute and the number of different values for that attribute.

Before we perform a detailed benchmark in future work, we want to use an example to determine the magnitude of the performance improvement that can be achieved with our method. The following example evaluates the query “find all women born in London in or after 1980”. To do this, we use the three OQL selects S1 = gS (.BornIn, London), S2 = gS (.BornOnDate, ≥, 1980), and S3 = gS (.Gender, female). The query S0 = S1 ∩ S2 ∩ S3, executed with three selects and two intersections, took 6.66 seconds to execute. For the comparison we inserted a triple (>P, »pof, ^.female_born-in-London_in-or-after-1980) into the KG for every >P ∈ S0. The execution time for S0 = gS (»pof, ^.female_born-in-London_in-or-after-1980) was 0.098 seconds, which is a factor of 6.66 / 0.098 = 68.0, or nearly two orders of magnitude, faster than the conjunctive query.
The significant performance improvement was expected, since the result of the query is already precomputed and only requires one select on the result set. Regardless of the size of the result set, this can be expected to always be achieved within a constant time of about 0.1 seconds. Therefore, if reasoning or CQA indicates that the standard procedures cause timeouts or unacceptable runtimes, the use of our methodology should be considered. For each entity, only one more triple needs to be added, which can easily be done in parallel with the other updates to the dataset. Even if the precomputation is done from scratch, a triple insertion time of ≤ 1 msec [Bens2017] means that no more than 100k / 1000 = 100 seconds = 1.67 minutes can be expected per 100,000 entries (> 540 million per day).
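The comparison between the conjunctive query and the precomputed partitioning class can be sketched on toy data. The particulars >P1 to >P4, the select helper and the representation of .BornOnDate as a plain year are assumptions made for illustration; the sketch shows only the logic, not the measured MySQL setup or its timings.

```python
# Toy data (hypothetical particulars); .BornOnDate is simplified to a year.
triples = [
    (">P1", ".BornIn", "London"), (">P1", ".BornOnDate", 1985), (">P1", ".Gender", "female"),
    (">P2", ".BornIn", "London"), (">P2", ".BornOnDate", 1975), (">P2", ".Gender", "female"),
    (">P3", ".BornIn", "Paris"), (">P3", ".BornOnDate", 1990), (">P3", ".Gender", "female"),
    (">P4", ".BornIn", "London"), (">P4", ".BornOnDate", 1992), (">P4", ".Gender", "male"),
]


def select(data, predicate, test):
    """Return all subjects whose value for `predicate` satisfies `test`."""
    return {s for s, p, o in data if p == predicate and test(o)}


# Conjunctive query: three selects and two intersections.
s1 = select(triples, ".BornIn", lambda o: o == "London")
s2 = select(triples, ".BornOnDate", lambda o: o >= 1980)
s3 = select(triples, ".Gender", lambda o: o == "female")
s0 = s1 & s2 & s3

# Precomputation: one additional »pof triple per member of S0.
PC = "^.female_born-in-London_in-or-after-1980"
materialized = triples + [(p, "»pof", PC) for p in sorted(s0)]

# PC query: a single select on the precomputed class.
s0_pc = select(materialized, "»pof", lambda o: o == PC)


def worst_case_insert_seconds(entries, ms_per_insert=1.0):
    """Upper bound on precomputation time at <= 1 msec per triple insertion."""
    return entries * ms_per_insert / 1000.0


estimate = worst_case_insert_seconds(100_000)
```

Both queries return the same set, but the second needs one select instead of three selects and two intersections. The helper worst_case_insert_seconds reproduces the 100-second bound for 100,000 entries quoted above.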

Extension: deriver.app

Source: taoke.de — Evaluation.