Correspondences between SUD and UD
This page explores several key differences between SUD and UD labels and the correspondences between the two schemes. (For information about the conversion process, see SUD corpora.)
SUD is a dependency-based annotation scheme. Annotation choices rely on surface-syntactic distributional criteria, while at the same time attempting to maintain convertibility with the UD annotation scheme as much as possible.
SUD represents an alternative rather than a competitor to UD, and was designed in such a way that the two can convey the same informational content. The two schemes enjoy a nearly perfect degree of two-way convertibility, meaning that conversions between the two schemes can take place without informational loss in most cases. Because of this, correspondences between the two are most often regular and predictable.
Correspondences between SUD and UD relations are shaped by several key properties. First, SUD annotations are less redundant and more economical than UD annotations. For example, the table below shows that SUD uses a single subj relation, which covers both the nsubj (nominal subject) and csubj (clausal subject) relations of UD. However, the information provided by UD's distinction between nominal and clausal subjects is not lost under the simpler SUD scheme: the distinction can be recovered automatically from the POS of the subject and its context, though how this context is taken into account depends on the language. In total, a subset of 17 UD relations (nsubj, csubj, obj, iobj, obl, xcomp, ccomp, amod, nmod, nummod, advmod, acl, advcl, aux, cop, case, mark) is replaced by three major relations in SUD: subj, comp, and mod, as well as udep to a marginal extent.
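The recovery of nominal versus clausal subjects described above can be sketched as a small function. This is a toy heuristic, not the rule used by the actual converter: the function name ud_subject_label and the set CLAUSAL_UPOS are hypothetical, and the choice of VERB, AUX, and SCONJ as "clausal" tags is an assumption that ignores the language-dependent context mentioned above.

```python
# Toy heuristic (an assumption, not the actual SUD-to-UD conversion rules):
# a dependent attached with SUD `subj` is treated as clausal when its UPOS
# tag is one that typically heads a clause in SUD.
CLAUSAL_UPOS = frozenset({"VERB", "AUX", "SCONJ"})


def ud_subject_label(subject_upos, clausal_upos=CLAUSAL_UPOS):
    """Guess the UD label (nsubj or csubj) for a SUD `subj` dependent."""
    return "csubj" if subject_upos in clausal_upos else "nsubj"


print(ud_subject_label("NOUN"))   # nominal subject
print(ud_subject_label("SCONJ"))  # clause introduced by a subordinator
```

The set of clausal tags is passed as a parameter so that it can be adjusted per language, since a real conversion would also need to inspect the subject's context.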
In addition to its more economical set of labels, SUD also diverges from UD in that it does not systematically treat content words as heads. Instead, SUD treats adpositions, subordinating conjunctions, auxiliaries, and copulas as heads. This is because SUD identifies surface-syntactic heads mainly by the criterion that the head determines the distribution of the syntactic unit in question. For example, SUD identifies the preposition "to" in the sentence "Peter talked to Mary" as the head of "to Mary", since it determines the distribution of that phrase. UD instead treats "Mary" as the head because it is a content word. Because of this difference, the direction of certain syntactic relations is reversed between SUD and UD. This applies specifically to the UD relations aux, cop, case, and mark, which are also highlighted in bold in the correspondence table below. You may also find more information about this aspect of SUD relations on the general principles page.
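The reversal of direction can be illustrated on the example sentence with a minimal sketch. The token dictionaries and the function reverse_case are hypothetical and handle only the case relation; the sketch assumes, following the correspondence table, that the former governor becomes a comp:obj dependent of the adposition. It leaves the label inherited by the adposition unchanged (here obl) and does not reattach other dependents, both of which a real conversion would have to deal with.

```python
def reverse_case(tokens):
    """Return a copy of `tokens` where each `case` dependent heads its former governor."""
    out = {t["id"]: dict(t) for t in tokens}
    for t in tokens:
        if t["deprel"] != "case":
            continue
        adp, noun = out[t["id"]], out[t["head"]]
        # The adposition takes over the attachment of the content word...
        adp["head"], adp["deprel"] = noun["head"], noun["deprel"]
        # ...and the content word becomes the complement of the adposition.
        noun["head"], noun["deprel"] = adp["id"], "comp:obj"
    return [out[t["id"]] for t in tokens]


# UD-style analysis of "Peter talked to Mary" (head 0 marks the root).
ud = [
    {"id": 1, "form": "Peter", "head": 2, "deprel": "nsubj"},
    {"id": 2, "form": "talked", "head": 0, "deprel": "root"},
    {"id": 3, "form": "to", "head": 4, "deprel": "case"},
    {"id": 4, "form": "Mary", "head": 2, "deprel": "obl"},
]

for tok in reverse_case(ud):
    print(tok["id"], tok["form"], tok["head"], tok["deprel"])
```

After the call, "to" depends on "talked" and "Mary" depends on "to", which mirrors the SUD view that the preposition is the head of the phrase.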
Table of correspondences between UD and SUD
UD | SUD |
nsubj | subj |
csubj | subj |
**aux** | comp:aux |
**cop** | comp:pred |
xcomp | comp:obj |
**case** | comp:obj |
**mark** | comp:obj |
obj | comp:obj |
ccomp | comp:obj |
ccomp | comp:obl |
obl | comp:obl |
iobj | comp:obl |
nmod | udep |
obl, acl | mod |
advcl | mod |
advmod | mod |
amod | mod |
nummod | mod |
fixed | encoded in node features (see here) |
det | det |
nummod | det |
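For scripting purposes, the table can be transcribed into a lookup structure. The sketch below is a hand transcription, not part of the conversion grammar: the names UD_TO_SUD, sud_candidates, and ud_sources are hypothetical, the fixed row is omitted because it is not encoded as a relation, and any row whose SUD label is not stated explicitly is assumed to share the label of the row above.

```python
# Hand transcription of the correspondence table (an illustrative assumption).
# A UD relation may correspond to more than one SUD relation.
UD_TO_SUD = {
    "nsubj": ["subj"],
    "csubj": ["subj"],
    "aux": ["comp:aux"],
    "cop": ["comp:pred"],
    "xcomp": ["comp:obj"],
    "case": ["comp:obj"],
    "mark": ["comp:obj"],
    "obj": ["comp:obj"],
    "ccomp": ["comp:obj", "comp:obl"],
    "obl": ["comp:obl", "mod"],
    "iobj": ["comp:obl"],
    "nmod": ["udep"],
    "acl": ["mod"],
    "advcl": ["mod"],
    "advmod": ["mod"],
    "amod": ["mod"],
    "nummod": ["mod", "det"],
    "det": ["det"],
}


def sud_candidates(ud_relation):
    """List the SUD relations that a UD relation may correspond to."""
    return list(UD_TO_SUD.get(ud_relation, []))


def ud_sources(sud_relation):
    """List, in sorted order, the UD relations that may map to a SUD relation."""
    return sorted(ud for ud, suds in UD_TO_SUD.items() if sud_relation in suds)


print(sud_candidates("ccomp"))
print(ud_sources("subj"))
```

Because several UD relations have more than one candidate, a lookup of this kind can only narrow down the options; choosing among them requires the syntactic context.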
Example of a sentence annotated in SUD (above) and UD (below).