Introduction

3A-LLM — An Alternative Axiomatic Algebraic LLM

Today, the term “large language model” (LLM) refers to language-processing AI systems such as ChatGPT. “Language model” (without the modifier “large”) was used in earlier machine translation software to denote the knowledge about a language. In particular, translation software used two such models, one for the source language and one for the target language. Large language models are somewhat different. They are typically trained via self-supervised pretraining on large corpora. Many systems additionally use instruction tuning or reinforcement learning from human feedback (RLHF) for the response layer, which complements the LLM proper. During training, huge amounts of text are consumed: the training algorithms go through the texts word by word, predicting the next word each time and evaluating the predictions. These evaluations shape a neural network that represents the “language model”. Thus, large language models are accumulations of statistical knowledge about all the languages occurring in the processed texts.
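The idea of accumulating next-word statistics can be illustrated with a deliberately tiny sketch. It is not how production LLMs are built (those use neural networks over subword tokens); it only counts which word follows which in a toy corpus. The corpus and the function names `train_bigram` and `predict_next` are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Pair each word with its successor and tally the pair.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus (an assumption, not data from the paper).
corpus = [
    "the gull breeds at the coast",
    "the gull breeds at the coast",
    "the gull breeds at the river",
]
model = train_bigram(corpus)

print(predict_next(model, "gull"))   # the follower seen most often after "gull"
print(predict_next(model, "raven"))  # None: "raven" never occurred in training
```

The sketch makes one property visible: the prediction reflects only how often word sequences occurred in the training texts, not whether the resulting statement is true.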

Linguistics is concerned with orthography (or phonetics and phonology in the case of spoken language), morphology, syntax, semantics, and pragmatics. The evaluation criterion for the first three is correctness. Large language models excel here, since statistics mirror correct orthography, correct morphology, and correct syntax as long as the training texts are (mostly) correct in these respects. Semantics, however, involves truth-conditional aspects and world knowledge that are not reducible to statistics alone. For example, the Roman emperor Nero was not responsible for the outbreak of the Great Fire of Rome in 64 AD, although he has often been blamed for it; a model that only mirrors how often a claim is made has no means of telling the frequent accusation from the fact. Pragmatics is evaluated by appropriateness. For systems like ChatGPT, an appropriate response to a user’s prompt is one the user is satisfied with. Interestingly, systems like ChatGPT include at least one separate layer that is responsible for the responses to user prompts. This layer is not part of the language model proper and is, as already mentioned, trained with human supervision (instruction tuning or RLHF) rather than in the self-supervised manner of pretraining.

This paper presents a kind of language model which we call Axiomatic Large Language Model (A-LLM). The A-LLM models concepts and the conceptual space. It thus focuses on semantics. Among other things, the A-LLM can be used as a semantic resource that can be integrated into language-processing AI systems to reduce the gap those systems have in representing semantic knowledge and world knowledge. As mentioned above, the gap results from representing semantics statistically. One kind of problem arises when the prompt mentions something that does not occur in the training texts. For example, when asking ChatGPT “Where do sea gulls breed in New Zealand and why does the Chroicocephalus novaehollandiae not share a breeding area with the Chroicocephalus albus?”, it explains that C. albus breeds along rivers in New Zealand rather than in coastal areas, although we invented this gull by combining a species epithet from the genus Corvus (ravens and crows) with the genus name Chroicocephalus (gulls). This is an illustrative example of language models’ tendency toward hallucination when faced with invented entities, cf. [Kalai2025]. Another kind of problem stems from fictional texts included in the training corpora, e.g., Thornton Wilder’s “The Ides of March”. Some of the characters in that novel, among them Publius Clodius Pulcher, had in reality already died before Caesar’s assassination. If asked about the role of Clodius Pulcher in Caesar’s death, a language-processing AI trained on “The Ides of March” might give an answer based on Wilder’s novel, even though it might also state Clodius Pulcher’s date of death correctly, without recognizing the contradiction.

Both kinds of problems illustrate that a language-processing system needs a firm basis of true facts to complement a modern AI system’s capability to generate text from text statistics. To serve as such a basis, a resource such as the A-LLM introduced here must meet two requirements: it needs to be large enough to cover the majority of relevant concepts, and it needs to be computational.

The rest of the paper is structured as follows. Section 2 situates the A-LLM within existing research in formal semantics, lexical semantics (controlled defining vocabularies and lexical semantic resources), cognitive science (in particular psycholinguistics), and computational linguistics including ontology building. Section 3 introduces the A-LLM’s atomic concept layer. Section 4 presents the semantic transformation calculus by which the A-LLM is generated from that layer. Section 5 describes the semantic graph structure and “semantic distance”. Section 6 extends the semantic transformation calculus with the functional compounding mechanism. Section 7 examines the A-LLM’s inferential architecture, which exploits “semantic distance”. Section 8 presents canonical numeric encoding via Cantor pairing. Section 9 discusses why the A-LLM qualifies as a Large Language Model. Section 10 reports a pilot evaluation. Section 11 concludes the paper by analyzing some of the A-LLM’s applications and by outlining directions for further research.

Extension: deriver.app

This chapter consolidates material from the allm LaTeX sources (main40.tex, main50.tex, main97.tex). In Deriver documentation, triples, rules, and the Workbench align with the explicit conceptual structure described here.

Source text: parallel project allm/ (LaTeX); HTML generated via taoke/tools/build-3allm-from-tex.php.