Naija SUD Guidelines

This page outlines various features and annotation conventions useful for the annotation of Naija.



Cleft sentences and questions

Cleft sentences are an extremely common construction in Naija, making the comp:cleft relation particularly important for the annotation of this language. The basic cleft construction in Naija consists of the phrase na im (it’s him) followed by a verb phrase, though a number of variants exist. The following provides several such examples.

The comp:cleft relation is also used in questions containing interrogative words such as who or where. In such cases, the wh-word is annotated as the root, and is connected to the verb via a comp:cleft relation.


Be, dey, na and the zero copula

Dey

The term dey in Naija performs two primary roles. The first is that of a copula. In these instances, dey is annotated as a verb and is connected to its complement with a comp:pred relation, as in the examples below.

This word is also used as an auxiliary verb which marks the imperfective aspect. In these cases, dey is annotated as an auxiliary and is connected to the following verb with a comp:aux relation.
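The two roles of dey can be read as a small decision rule. The Python sketch below is illustrative only: the helper name annotate_dey, the single boolean flag standing in for the syntactic context, and the UPOS labels VERB and AUX are assumptions made for this example, not part of any annotation tool.

```python
from typing import NamedTuple


class DeyAnnotation(NamedTuple):
    upos: str      # tag assigned to "dey" (UPOS labels are an assumption)
    relation: str  # relation linking "dey" to the element that follows it


def annotate_dey(followed_by_verb: bool) -> DeyAnnotation:
    """Hypothetical helper: choose the annotation of 'dey' from its context."""
    if followed_by_verb:
        # Imperfective marker: auxiliary linked to the following verb.
        return DeyAnnotation(upos="AUX", relation="comp:aux")
    # Copula: verb linked to its complement.
    return DeyAnnotation(upos="VERB", relation="comp:pred")
```

In practice the choice depends on the full clause rather than on a single flag, so this should be taken as a summary of the rule and not as a disambiguation procedure.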

Be and na

In addition to dey, Naija contains two other words that can function as copulas: be and na. Like dey, be is annotated as a verb and is connected to the subject via a subj relation and to the predicate via a comp:pred relation. We treat na in a similar fashion, though it is tagged as a particle rather than a verb.

Zero copula

However, the copula is not always needed to link subjects to their predicates. In cases where no copula is present, the predicate is connected to its subject via a subj relationship.
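The copula options described in this section can be gathered into a small lookup. The following Python sketch is only a summary device: the names COPULA_UPOS and clause_relations are hypothetical, and the UPOS labels VERB and PART are assumed renderings of "verb" and "particle".

```python
from typing import Optional, Set

# Tag assigned to each overt copula (assumed UPOS labels).
COPULA_UPOS = {
    "dey": "VERB",
    "be": "VERB",
    "na": "PART",
}


def clause_relations(copula: Optional[str]) -> Set[str]:
    """Hypothetical helper: relations used in a copular clause.

    Pass None for a clause with no overt copula.
    """
    if copula is None:
        # Zero copula: the predicate is linked to its subject directly.
        return {"subj"}
    if copula not in COPULA_UPOS:
        raise ValueError(f"not a copula covered by this sketch: {copula!r}")
    # Overt copula: subject and predicate both attach to the copula.
    return {"subj", "comp:pred"}
```

For example, clause_relations("na") yields both subj and comp:pred, while clause_relations(None) yields subj alone.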


Compounds and phrasal verbs

Our annotation of Naija makes frequent use of the compound relation. In our annotation system, this relation is systematically applied to pairs of nouns in which one acts as a modifier of the other. In this sense, compound functions much like the mod relation, except that it links two nouns together rather than a noun to an adjective.

The compound relation is also used for some combinations of nouns and adjectives, such as dry cleaner, which are considered fixed expressions whose meaning cannot be directly understood from their constituent parts.

The subtype compound:prt is also used to connect the components of various phrasal verbs inherited from English.

Please note that other languages might use the compound relation in a more limited set of contexts, if at all. For a more general overview of this relation, please consult the dedicated page.


Numbers and dates

Numbers composed of more than one word, such as five hundred or six thousand, are primarily chained together with the flat relation. If the number contains the coordinating conjunction and, as in one hundred and one, the integer directly preceding the conjunction is connected to the one directly following it with a conj:coord relation.
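The chaining rule for multi-word numbers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name number_edges and the edge format are hypothetical, each edge simply records two token positions in surface order without asserting which one is the governor, and the attachment of and itself is left out because it is not described here.

```python
from typing import List, Optional, Tuple

# (position of earlier token, position of later token, relation)
Edge = Tuple[int, int, str]


def number_edges(tokens: List[str]) -> List[Edge]:
    """Hypothetical sketch: link the words of a multi-word number."""
    edges: List[Edge] = []
    previous: Optional[int] = None  # position of the last numeral seen
    after_and = False               # True right after the conjunction "and"
    for i, token in enumerate(tokens):
        if token.lower() == "and":
            after_and = True
            continue  # the conjunction's own attachment is not covered here
        if previous is not None:
            # Across "and" use conj:coord; otherwise continue the flat chain.
            edges.append((previous, i, "conj:coord" if after_and else "flat"))
        previous = i
        after_and = False
    return edges
```

Under these assumptions, number_edges(["five", "hundred"]) gives a single flat edge, and number_edges(["one", "hundred", "and", "one"]) gives a flat edge between the first two words followed by a conj:coord edge between hundred and the final one.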

If the number contains a decimal, the point is marked as a noun and is integrated into the number with a simple flat relation.

If numerals are listed in a sequence, such as in telephone numbers, the constituents are chained together with the conj:coord relation.

Note that references to radio stations which use this format nevertheless contain a flat relation. This is because we consider the frequency number to function effectively as a title.

When annotating dates, the mod:appos relation is used to connect the month to the numerical day. Meanwhile, the year is connected to the month using the mod relation.


Multi-word placenames and organizations

In Naija, multi-word placenames and organizations are currently annotated with a simple flat relation, though their constituents retain their typical parts of speech.

Titles and honorifics

Honorifics such as Mister or President are connected to the names they precede with a simple flat relation.

However, this is not the case when a title is connected to a determiner or otherwise modified in some way. In these cases, a mod:appos relation is used.
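The choice between the two relations for titles and honorifics can be summarised as follows. The Python sketch is illustrative only: the function name title_relation and its two boolean parameters are hypothetical simplifications of the syntactic context.

```python
def title_relation(has_determiner: bool, is_modified: bool) -> str:
    """Hypothetical helper: relation between a title and the name it precedes."""
    if has_determiner or is_modified:
        # A title with a determiner or other modifier is treated as an apposition.
        return "mod:appos"
    # A bare honorific such as Mister or President attaches with flat.
    return "flat"
```

A bare honorific thus maps to flat, while a title carrying a determiner or any other modifier maps to mod:appos.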

Official multi-word titles such as Minister of Foreign Affairs are treated as titles (see here for a detailed guide). The head of the title is given an ExtPos of PROPN.