Princeton WordNet (PWN)

Controlled vocabulary of concepts (FCSS)

Princeton WordNet (PWN):

Jurafsky and Martin [JuMa2020] provide some statistics about WordNet. English WordNet 3.0 consists of three separate databases: one each for nouns (∼ 118k) and verbs (∼ 11.5k), and a third for adjectives (∼ 22.5k) and adverbs (∼ 4.5k). Each database contains a set of lemmas, each annotated with a set of Word Senses (WS). The set of near-synonyms for a WordNet sense is called a SynSet, short for Synonym Set (SS). WordNet defines 26 Lexicographic Categories (LC) for nouns and 15 for verbs. These categories are often called SuPerSenses (SPS) because they act as coarse semantic categories, or groupings of senses. SPSs have also been defined for adjectives and, by Schneider et al. [ScHw2018], for prepositions. The Noun Categories (NC) are NC = {act, animal, artifact, attribute, body, cognition, communication, feeling, food, group, location, motive, natural event, natural object, other, person, phenomenon, plant, possession, process, quantity, relation, shape, state, substance, time}. PWN distinguishes two kinds of Taxonomic Entities (TE): classes and instances. An instance is an individual, usually named by a proper noun, that represents a unique entity.
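To make the lemma → sense → SynSet organisation and the supersense grouping concrete, here is a minimal in-memory sketch. It is not PWN's storage format and not the API of any WordNet library; the classes `SynSet` and `Lexicon`, their fields, and the toy entries are hypothetical illustrations.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# The 26 noun lexicographic categories (supersenses) listed in the text.
NOUN_CATEGORIES = frozenset({
    "act", "animal", "artifact", "attribute", "body", "cognition",
    "communication", "feeling", "food", "group", "location", "motive",
    "natural event", "natural object", "other", "person", "phenomenon",
    "plant", "possession", "process", "quantity", "relation", "shape",
    "state", "substance", "time",
})


@dataclass(frozen=True)
class SynSet:
    """Hypothetical model: a set of near-synonymous lemmas sharing one sense."""
    synset_id: str
    pos: str                   # "n", "v", "a" or "r"
    supersense: str            # lexicographic category, e.g. "animal"
    lemmas: frozenset[str]
    is_instance: bool = False  # True for instances (unique individuals), False for classes


class Lexicon:
    """Hypothetical in-memory lexicon: each lemma maps to its word senses (SynSets)."""

    def __init__(self) -> None:
        self._senses: dict[str, list[SynSet]] = defaultdict(list)

    def add(self, synset: SynSet) -> None:
        # Only noun categories are validated, since only they are enumerated above.
        if synset.pos == "n" and synset.supersense not in NOUN_CATEGORIES:
            raise ValueError(f"unknown noun category: {synset.supersense}")
        for lemma in synset.lemmas:
            self._senses[lemma].append(synset)

    def senses(self, lemma: str) -> list[SynSet]:
        return list(self._senses.get(lemma, []))

    def supersenses(self, lemma: str) -> set[str]:
        """Coarse grouping: the supersenses into which a lemma's senses fall."""
        return {s.supersense for s in self.senses(lemma)}


# Toy data with invented identifiers: "dog" has an animal sense and a person sense.
lex = Lexicon()
lex.add(SynSet("dog-1", "n", "animal", frozenset({"dog", "domestic dog"})))
lex.add(SynSet("dog-2", "n", "person", frozenset({"dog", "frump"})))
lex.add(SynSet("einstein-1", "n", "person", frozenset({"Einstein"}), is_instance=True))
print(sorted(lex.supersenses("dog")))  # ['animal', 'person']
```

Collapsing the two senses of "dog" into `animal` and `person` is the kind of coarsening supersenses provide, while the `is_instance` flag records the class/instance distinction.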

Collaborative InterLingual Index (CILI):

The CILI project, discussed by Bond et al. [BoVo2016], identifies several problems that arise when Princeton WordNet (PWN) is used to link wordnets across languages. First, SynSet granularity differs between wordnets: large SynSets in one wordnet are linked through PWN to small SynSets in another language, which indicates that one side is underrepresented. Second, editors and algorithms interpret relation names differently, which introduces further inconsistencies. Third, wordnets differ noticeably in vocabulary coverage and polysemy. Moreover, concepts that are absent from PWN cannot be expressed through it at all. Investigating meaning across languages requires cross-lingual linking of SynSets, but the lack of coordination between projects leads to similar or even identical concepts being introduced in several places, which causes duplication and makes validation difficult. Overcoming these issues requires consistent meanings for semantic and lexical relations across languages. Together, these challenges show the need for better representation, coordination, and consistency to support reliable cross-lingual semantic analysis. Below we describe how many of these problems can be addressed by building a Controlled Vocabulary of Concepts as a formal ontology of linguistic concepts.
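Two of the checks motivated above, finding local meanings that have no shared concept and flagging concepts proposed independently by several projects, can be sketched as follows. The sketch assumes a deliberately simplified data model in which each wordnet maps its local synset IDs to a shared concept ID (or to `None`). It is not the CILI data format or tooling; the function names, the ID scheme, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model: a wordnet maps each local synset ID to the ID of a shared
# interlingual concept, or None if no shared concept exists for it yet.
Wordnet = Dict[str, Optional[str]]


def unlinked(wordnet: Wordnet) -> List[str]:
    """Local synsets the shared index cannot express (no interlingual link)."""
    return sorted(local for local, concept in wordnet.items() if concept is None)


def shared_concepts(a: Wordnet, b: Wordnet) -> Set[str]:
    """Interlingual concepts lexicalised in both wordnets: the basis for comparison."""
    linked_a = {c for c in a.values() if c is not None}
    linked_b = {c for c in b.values() if c is not None}
    return linked_a & linked_b


def duplicate_proposals(
    proposals: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, List[str]]:
    """Flag new-concept proposals that several projects submitted independently.

    `proposals` maps a project name to (proposal ID, definition) pairs.
    Definitions are compared after lower-casing and collapsing whitespace,
    a deliberately naive stand-in for real validation.
    """
    by_definition: Dict[str, List[str]] = defaultdict(list)
    for project, items in proposals.items():
        for proposal_id, definition in items:
            key = " ".join(definition.lower().split())
            by_definition[key].append(f"{project}:{proposal_id}")
    return {d: ids for d, ids in by_definition.items() if len(ids) > 1}


# Invented toy data for two wordnets and two projects.
wn_a = {"a-1": "c100", "a-2": "c200"}
wn_b = {"b-1": "c100", "b-2": None}
print(unlinked(wn_b))               # ['b-2']
print(shared_concepts(wn_a, wn_b))  # {'c100'}
print(duplicate_proposals({
    "proj-x": [("p1", "Sweet  made from rice flour")],
    "proj-y": [("p7", "sweet made from rice flour")],
}))  # {'sweet made from rice flour': ['proj-x:p1', 'proj-y:p7']}
```

A single shared index makes these checks possible at all: without one, neither gaps nor duplicates can be detected across projects.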

Schneider et al. [ScHw2018] introduce a new annotation scheme, a new corpus, and a new task for disambiguating prepositions and possessives in English. Starting from the insight that prepositions and possessives overlap semantically, they produced a new hierarchical inventory of 50 supersense classes (SNACS) and a gold-standard corpus in which all types and tokens of prepositions and possessives are disambiguated. In the Concept Composition and Discussion sections we will show how multi-word prepositions and possessives can be modeled as semantic compounds.
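A hierarchical inventory such as SNACS lets fine-grained labels be coarsened to their ancestors. The sketch below assumes a child → parent table covering only a small fragment of the hierarchy as we understand it; the labels should be checked against the SNACS guidelines, and the full inventory has 50 classes. `PARENT`, `ancestors`, `coarsen`, and the annotated tokens are hypothetical illustrations, not part of any released SNACS tooling.

```python
from typing import Dict, List, Optional

# Assumed fragment of a hierarchical supersense inventory (child -> parent).
PARENT: Dict[str, Optional[str]] = {
    "Circumstance": None,
    "Temporal": "Circumstance",
    "Time": "Temporal",
    "Locus": "Circumstance",
    "Goal": "Locus",
    "Configuration": None,
    "Gestalt": "Configuration",
    "Possessor": "Gestalt",
}


def ancestors(label: str) -> List[str]:
    """Path from a supersense up to its root, e.g. Time -> Temporal -> Circumstance."""
    if label not in PARENT:
        raise KeyError(label)
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def coarsen(label: str, depth: int = 1) -> str:
    """Map a label to its ancestor at the given depth (1 = root); deeper = finer."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    root_first = ancestors(label)[::-1]
    return root_first[min(depth, len(root_first)) - 1]


# Toy annotated tokens: (preposition or possessive, fine-grained supersense).
tokens = [("at", "Time"), ("to", "Goal"), ("'s", "Possessor")]
print([(word, coarsen(label)) for word, label in tokens])
```

Coarsening to the root maps "at" and "to" to `Circumstance` and the possessive "'s" to `Configuration`, which is how a hierarchy allows evaluation at several levels of granularity.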

Extension: deriver.app


Source: taoke.de — Princeton WordNet (PWN).