Atomic concept representation

3A-LLM — An Alternative Axiomatic Algebraic LLM

The architecture of the Axiomatic Large Language Model is grounded in a layer of atomic concepts derived from the Longman Defining Vocabulary (LDV). This section introduces the theoretical and methodological rationale for selecting a restricted definitional vocabulary, explains how atomic concepts function as semantic primitives within the model, and demonstrates how these primitives serve as the generative base from which other conceptual representations emerge.

The LDV has been recognized for its capacity to offer definitional clarity through a minimal yet expressive set of lexical items [OgdenRichards1923]. The Longman Dictionary of Contemporary English (LDOCE) [Fox2014], which uses the LDV for all of its definitions, contains about 230,000 words and 165,000 corpus-based examples that use only LDV terms. The print edition lists 65,000 common collocations [Fox2014] and the online version more than 82,000; LDOCE also provides 48,000 synonyms and antonyms. The utility of the LDV extends beyond dictionary construction. As computational lexicon research has shown, controlled defining vocabularies can provide a stable and conceptually coherent foundation for large-scale semantic modeling [Bloomfield1933], [Fellbaum1998]. In A-LLM, the LDV is reinterpreted as a set of conceptual primitives rather than lexical items, and each LDV item is treated as an atomic conceptual node. This shift from lexical to conceptual minimalism aligns with theoretical work in psycholinguistics [Roelofs2018] and cognitive semantics [Gardenfors2000]. The atomic nodes form the backbone of the conceptual graph and serve as the input to the transformation calculus by which the A-LLM is expanded, as described in later sections. In principle, the approximately 2,000 LDV entries can cover a wide range of domains through definitional composition; the transformation calculus and compounding extend this coverage further.

A defining characteristic of the atomic concept layer is its interpretive transparency. Unlike distributional semantic vectors, whose dimensions are opaque and whose meaning emerges implicitly through corpus statistics [Lenci2018], atomic concepts in A-LLM are explicit, stable, and grounded in definitional relations. This property enables the model not only to avoid the interpretability challenges associated with neural embeddings [Devlin2019] but also to maintain complete semantic traceability. Each A-LLM concept representation is the result of a sequence of semantic operations applied to atomic primitives, and that sequence can be recovered from the resulting concept. This reversibility, which underpins the model's structural and epistemic transparency, is achieved by assigning specific identifiers to the modeled concepts, an assignment that exploits Cantor's pairing function [HopcroftUllman1979] (Section ?). The respective algorithms are described in [Bense2024].
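The reversibility argument can be illustrated with the standard Cantor pairing function, π(x, y) = (x + y)(x + y + 1)/2 + y, which maps each pair of non-negative integers to a unique non-negative integer and can be inverted exactly. The following is a minimal sketch only: the actual identifier scheme is the one described in [Bense2024], and the names `encode_chain` and `decode_chain`, the integer operation codes, and the idea of folding one operation code per step into the identifier are assumptions made here for illustration.

```python
import math


def cantor_pair(x: int, y: int) -> int:
    """Standard Cantor pairing: pi(x, y) = (x + y)(x + y + 1) / 2 + y."""
    if x < 0 or y < 0:
        raise ValueError("Cantor pairing is defined for non-negative integers")
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(z: int) -> tuple[int, int]:
    """Inverse of cantor_pair: recover (x, y) from z."""
    if z < 0:
        raise ValueError("z must be non-negative")
    w = (math.isqrt(8 * z + 1) - 1) // 2  # index of the diagonal containing z
    t = w * (w + 1) // 2                  # first value on that diagonal
    y = z - t
    x = w - y
    return x, y


def encode_chain(atom_id: int, op_codes: list[int]) -> int:
    """Hypothetical scheme: fold each operation code into the identifier."""
    code = atom_id
    for op in op_codes:
        code = cantor_pair(code, op)
    return code


def decode_chain(code: int, steps: int) -> tuple[int, list[int]]:
    """Undo `steps` pairings, returning the atom id and the operation codes."""
    ops = []
    for _ in range(steps):
        code, op = cantor_unpair(code)
        ops.append(op)
    ops.reverse()  # operations were peeled off last-to-first
    return code, ops


concept_id = encode_chain(7, [1, 2])           # atom 7, then operations 1 and 2
atom, operations = decode_chain(concept_id, 2)
print(concept_id, atom, operations)
```

In this sketch the number of decoding steps is passed in explicitly; a real scheme would need some convention for recognizing when an atomic identifier has been reached, which is not specified in this section.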

FoC           | base         | opposite    | orthogonal    | opposite orthogonal
n (noun)      | largeness    | smallness   | longness      | shortness
v (verb)      | enlarge (to) | reduce (to) | elongate (to) | shorten (to)
a (adjective) | big          | little      | long          | short

Family of Concepts for largeness

Table ? illustrates a Family of Concepts (FoC) for the concept largeness, showing how vertical transformations (noun, verb, adjective) and horizontal transformations (opposite, orthogonal, opposite-orthogonal) generate related concepts.

Structurally, the atomic concept layer forms the root level of the semantic graph. Nodes at this level exhibit minimal relational structure apart from definitional equivalence and basic semantic relations. Higher levels of the graph emerge through vertical and horizontal transformations that operate directly on atomic nodes. Vertical transformations provide anchor points for docking the lexical entries of a given language to the model. Horizontal transformations establish semantic relations such as opposition, orthogonality, complementarity, and temporal sequencing, drawing on theoretical insights from lexical semantics [Cruse1986] and semantic field theory [Lehrer1990]. Combining these operations yields Families of Concepts, structured clusters of derived concepts that share a common root. Vertical and horizontal transformations are discussed in more detail in the following section.
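One way to picture a Family of Concepts is as a small grid indexed by a vertical and a horizontal transformation. The sketch below encodes only the largeness family from the table; the dictionary layout and the helper names `derive` and `locate` are illustrative assumptions, not the model's actual data structures.

```python
# Vertical transformations (word class) and horizontal transformations,
# taken from the rows and column headers of the largeness table.
VERTICAL = ("n", "v", "a")
HORIZONTAL = ("base", "opposite", "orthogonal", "opposite orthogonal")

# Rows of the table, one tuple of labels per vertical transformation.
ROWS = {
    "n": ("largeness", "smallness", "longness", "shortness"),
    "v": ("enlarge (to)", "reduce (to)", "elongate (to)", "shorten (to)"),
    "a": ("big", "little", "long", "short"),
}

# Hypothetical representation: (vertical, horizontal) -> concept label.
FOC_LARGENESS = {
    (v, h): label
    for v, labels in ROWS.items()
    for h, label in zip(HORIZONTAL, labels)
}


def derive(foc: dict, vertical: str, horizontal: str) -> str:
    """Look up the concept reached by one vertical and one horizontal step."""
    return foc[(vertical, horizontal)]


def locate(foc: dict, label: str) -> tuple[str, str]:
    """Reverse lookup: which pair of transformations yields this label?"""
    for key, value in foc.items():
        if value == label:
            return key
    raise KeyError(label)


print(derive(FOC_LARGENESS, "v", "opposite"))  # reduce (to)
print(locate(FOC_LARGENESS, "short"))          # ('a', 'opposite orthogonal')
```

Because every cell is addressed by its pair of transformations, the path from the root concept to any member of the family stays explicit, which mirrors the traceability property described for the atomic layer.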

Extension: deriver.app

This chapter consolidates material from the allm LaTeX sources (main40.tex, main50.tex, main97.tex). In Deriver documentation, triples, rules, and the Workbench align with the explicit conceptual structure described here.

Source text: parallel project allm/ (LaTeX); HTML generated via taoke/tools/build-3allm-from-tex.php.