# LEXICON

The Lexicon is a tool in the Composer area of the NLG platform. For each project, it holds the grammatical information about particular words that the software needs in order to generate linguistically correct text.

The software already stores this kind of information about many of the most common words of all languages in a global lexicon, while the project lexicons contain the words that come from domain knowledge. The global lexicon cannot be edited directly; all your entries and changes go into the project lexicon.

You need to create a lexicon entry if the grammatical interpretation of a word is wrong or if a word has different grammar requirements in your use case. Lexicon entries only have to be made for words that require grammatical flexibility, i.e. words for which containers have been created. Words that are used in static text parts do not need lexicon entries.

# Creating and editing lexicon entries

In the lexicon section (accessible under MORE TOOLS) you can view and edit existing lexicon entries, see an overview of missing lexicon entries, and create new entries.

The existing lexicon entries for this project are divided into lists by part of speech: nouns, verbs and adjectives. Click an entry to see and edit its properties. The settings for each word category depend on the requirements of the respective language.

TIP

You can create lexicon entries in any language, even if that language is not present in the project settings or no collections are available in it.

# Choose the part of speech

You have to assign every new entry to one of the parts of speech: noun, verb or adjective.

In some languages, adjectives are never inflected or are inflected purely by rule, so for these languages the menu only offers the choices noun and verb.

# Set the lemma

A lemma is the base form of a word, e.g. the form used in a dictionary. The conventions for how a lemma is built differ between languages, so you have to check them for each language.

Here is a more general overview:

| Part of speech | Lemma |
| --- | --- |
| noun | usually singular, masculine, nominative |
| adjective | usually singular, masculine, nominative |
| verb | in many languages infinitive (singular present). English: the uninflected form |

TIP

To distinguish lemma and stem: a stem is the part of a word that does not change.

  • stem: creat for create (verb), creative (adjective) and creation (noun)
  • lemma: create (verb), creative (adjective) and creation (noun)
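
The difference can be illustrated with a few lines of Python. This is only a sketch for the example words above: it treats the stem as the longest shared prefix, which happens to work for create, creative and creation but is not a general stemming method. The names `LEMMAS` and `shared_stem` are made up for this illustration.

```python
import os

# Lemmas from the example above, one per part of speech.
LEMMAS = {"verb": "create", "adjective": "creative", "noun": "creation"}


def shared_stem(words):
    """Return the longest prefix shared by all words (a crude stand-in for a stem)."""
    return os.path.commonprefix(list(words))


print(shared_stem(LEMMAS.values()))  # creat
```

The lemma is what you enter in the lexicon; the stem is never entered on its own.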

# Settings for Nouns

The lexicon stores the information that is necessary to inflect nouns in a grammatically correct way and to use pronouns properly. Which information is needed differs between languages.

For nouns, this includes for example:

| Category | Possible values |
| --- | --- |
| number | singular, dual, plural |
| grammatical gender | masculine, feminine, neutral, animate, inanimate |
| grammatical cases | nominative, accusative, genitive, dative, locative, vocative, instrumental, ablative |
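
To make the idea of a noun entry more concrete, here is a minimal Python sketch of what such an entry could contain for the German noun Haus. The class `NounEntry`, its fields and the fallback to the lemma are assumptions made for this illustration; they do not describe how the platform stores entries internally.

```python
from dataclasses import dataclass, field


@dataclass
class NounEntry:
    """Hypothetical noun entry: a lemma, a gender, and the forms that differ from the lemma."""
    lemma: str
    language: str
    gender: str
    forms: dict = field(default_factory=dict)  # (number, case) -> inflected form

    def inflect(self, number: str, case: str) -> str:
        # Assumption of this sketch: forms that are not listed equal the lemma.
        return self.forms.get((number, case), self.lemma)


haus = NounEntry(
    lemma="Haus",
    language="de",
    gender="neutral",
    forms={
        ("singular", "genitive"): "Hauses",
        ("plural", "nominative"): "Häuser",
        ("plural", "accusative"): "Häuser",
        ("plural", "genitive"): "Häuser",
        ("plural", "dative"): "Häusern",
    },
)

print(haus.inflect("plural", "dative"))      # Häusern
print(haus.inflect("singular", "accusative"))  # Haus
```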

# Plurale Tantum (Plural only)

If the noun for which you want to create an entry occurs only in the plural, set the slider to plurale tantum. Only the fields for the plural forms that are actually required will then appear below.
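
The effect of the slider on the form fields can be pictured as follows. This is a rough sketch only; the function `form_fields` and the field naming scheme are invented for the illustration and are not taken from the platform.

```python
from itertools import product


def form_fields(cases, plurale_tantum=False):
    """List the form fields an entry needs for the given cases (hypothetical naming)."""
    numbers = ["plural"] if plurale_tantum else ["singular", "plural"]
    return [f"{number}_{case}" for number, case in product(numbers, cases)]


# A regular noun needs singular and plural forms ...
print(form_fields(["nominative", "genitive"]))
# ... a plural-only noun such as "scissors" needs the plural forms only.
print(form_fields(["nominative", "genitive"], plurale_tantum=True))
```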

# Switch Determiner

Some nouns deviate from standard behavior and combine with different determiners than other nouns of the same type. You can configure multiple alternative determiners and specify the grammatical forms with which they appear.

Example: Nouns denoting countries usually take the none determiner (Poland), whereas the Netherlands needs a definite determiner, so you define a replacement here.

# Switch Preposition

As with determiners, some nouns behave irregularly when they are combined with prepositions. You can enter these cases here.

Example: Nouns denoting places usually take the preposition "in" (in Germany), but there are exceptions such as "at Shibuya station".
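
Both switches can be thought of as per-noun overrides of a default. The following Python sketch models this for the English examples used in this section. The class `PlaceNoun`, its defaults and the function `locative_phrase` are assumptions for the illustration and not part of the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceNoun:
    """Hypothetical place noun with optional determiner and preposition overrides."""
    lemma: str
    determiner: Optional[str] = None  # None stands for the "none" determiner default
    preposition: str = "in"           # assumed default preposition for places


def locative_phrase(noun: PlaceNoun) -> str:
    """Join preposition, determiner (if any) and noun into a phrase."""
    parts = [noun.preposition, noun.determiner, noun.lemma]
    return " ".join(part for part in parts if part)


print(locative_phrase(PlaceNoun("Germany")))                             # in Germany
print(locative_phrase(PlaceNoun("Netherlands", determiner="the")))       # in the Netherlands
print(locative_phrase(PlaceNoun("Shibuya station", preposition="at")))   # at Shibuya station
```

Only the nouns that deviate from the default need an override; all other nouns of the same type keep the standard behavior.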

# Settings for Adjectives

The settings for adjectives are similar to the settings for nouns.

Adjective Position: You have to define whether the adjective stands before or after the noun.
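
As a small illustration of what the position setting controls, the sketch below builds a noun phrase with the adjective before or after the noun. The function `noun_phrase` and the values "before" and "after" are hypothetical and only serve this example.

```python
def noun_phrase(noun: str, adjective: str, position: str) -> str:
    """Place the adjective before or after the noun."""
    if position not in ("before", "after"):
        raise ValueError("position must be 'before' or 'after'")
    if position == "before":
        return f"{adjective} {noun}"
    return f"{noun} {adjective}"


print(noun_phrase("house", "white", "before"))    # white house
print(noun_phrase("maison", "blanche", "after"))  # maison blanche
```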

# Settings for Verbs

For verbs you have to fill in the conjugations, i.e. the forms the verb takes under different circumstances.

The conjugations depend on the grammatical person, which combines the number (singular, plural and, in some languages, dual) with the distinction between the speaker (first person), the addressee (second person), and others (third person). All these conjugations must also be completed for the different tenses.

Overview of the basic grammatical persons & tenses you can define in the lexicon:

| Category | Possible values |
| --- | --- |
| grammatical persons | first, second, third |
| number | singular, dual, plural |
| grammatical gender | masculine, feminine, neutral, animate, inanimate |
| tenses | present, past, imperfect, future, past participle, gerund |
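
A conjugation table can be pictured as a mapping from tense, person and number to a verb form. The Python sketch below uses the English verb to be and also shows how incomplete cells could be detected. The names `BE`, `conjugate` and `missing_cells` are invented for this illustration, and dual number and gender are left out for brevity.

```python
from itertools import product

PERSONS = ("first", "second", "third")
NUMBERS = ("singular", "plural")

# Partial table for "to be": the present tense is complete, the past tense is not.
BE = {
    ("present", "first", "singular"): "am",
    ("present", "second", "singular"): "are",
    ("present", "third", "singular"): "is",
    ("present", "first", "plural"): "are",
    ("present", "second", "plural"): "are",
    ("present", "third", "plural"): "are",
    ("past", "first", "singular"): "was",
}


def conjugate(table, tense, person, number):
    """Look up one verb form; raises KeyError if the cell has not been filled in."""
    return table[(tense, person, number)]


def missing_cells(table, tenses):
    """Return every (tense, person, number) combination that still lacks a form."""
    return [key for key in product(tenses, PERSONS, NUMBERS) if key not in table]


print(conjugate(BE, "present", "third", "singular"))  # is
print(missing_cells(BE, ["present"]))                 # []
print(len(missing_cells(BE, ["present", "past"])))    # 5
```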

# Missing lexicon entries

The most common sign that grammatical information is missing is the occurrence of mistakes such as missing determiners or a wrong declension of a noun. If such mistakes occur in your test content, chances are high that a missing lexicon entry is the cause. Fixing grammar errors therefore often involves adding new words to the lexicon. The missing lexicon entries give you an impression of which words are used in your data and what you need to do.

In the Missing entries tab you can see which words in your project do not appear in a lexicon. All these words are initially unhandled entries. You can decide whether to include them in your lexicon or ignore them. All missing entries you ignore are categorized as ignored entries. Please be aware that the engine only looks for missing entries in generated texts (in the RESULTS area).

You can also use the search function to search for entries directly in these two categories (ignored and unhandled entries).
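
The split into unhandled and ignored entries can be sketched as a simple set comparison. The function `classify_missing` and its parameters are hypothetical; in particular, how the engine collects the words used in generated texts is not described here and is simply passed in as a list.

```python
def classify_missing(used_words, project_lexicon, global_lexicon, ignored):
    """Split words without a lexicon entry into unhandled and ignored entries."""
    known = set(project_lexicon) | set(global_lexicon)
    missing = {word for word in used_words if word not in known}
    ignored = set(ignored)
    return {
        "unhandled": sorted(missing - ignored),
        "ignored": sorted(missing & ignored),
    }


result = classify_missing(
    used_words=["sensor", "gearbox", "torque", "child"],
    project_lexicon={"sensor"},
    global_lexicon={"child"},
    ignored={"torque"},
)
print(result)  # {'unhandled': ['gearbox'], 'ignored': ['torque']}
```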

TIP

If words are not listed in either the global or project lexicon, they are adapted according to the usual linguistic standards of the respective language.
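
Taken together, the lookup behavior described on this page can be sketched as a chain of fallbacks. The sketch assumes that project entries take precedence over global entries, which follows from the purpose of the project lexicon but is not spelled out above. The dictionaries and the very rough English plural rule are placeholders for this illustration and not the platform's actual data or rules.

```python
# Hypothetical lexicon contents, for illustration only.
GLOBAL_LEXICON = {"child": {"plural": "children"}}
PROJECT_LEXICON = {"index": {"plural": "indices"}}


def default_plural(lemma: str) -> str:
    """Very rough English fallback rule, used only when no entry exists."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    return lemma + "s"


def plural(lemma: str) -> str:
    # Assumed order: project lexicon first, then global lexicon, then the rule.
    for lexicon in (PROJECT_LEXICON, GLOBAL_LEXICON):
        entry = lexicon.get(lemma)
        if entry is not None:
            return entry["plural"]
    return default_plural(lemma)


print(plural("index"))   # indices  (project lexicon)
print(plural("child"))   # children (global lexicon)
print(plural("sensor"))  # sensors  (rule-based fallback)
```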