SUD data • Version 2.12
In version 2.12 of SUD data, released in May 2023:
- 8 corpora are maintained in the SUD format (called Native SUD)
- 236 corpora are automatically converted to SUD from the corresponding UD data (version 2.12)
The full SUD 2.12 release contains 244 corpora. Note that UD 2.12 has 245 corpora, but one corpus cannot be released in a SUD version because its CC license carries the ND (NoDerivatives) element:
- UD_Portuguese-CINTIL (license: CC BY-NC-ND 4.0)
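
The exclusion rule above can be illustrated with a minimal sketch. It assumes licenses are given as plain strings such as "CC BY-NC-ND 4.0"; the function name `allows_derivatives` and the second dictionary entry are hypothetical and only serve the illustration.

```python
def allows_derivatives(license_name: str) -> bool:
    """Return False when a Creative Commons license name carries the ND element."""
    # Split e.g. "CC BY-NC-ND 4.0" into its elements: CC, BY, NC, ND, 4.0
    elements = license_name.upper().replace("-", " ").split()
    return "ND" not in elements


licenses = {
    "UD_Portuguese-CINTIL": "CC BY-NC-ND 4.0",  # stated in the release notes
    "UD_Example-Corpus": "CC BY-SA 4.0",        # hypothetical entry, for contrast
}

# Corpora that cannot be redistributed in a converted (derived) form
excluded = [name for name, lic in licenses.items() if not allows_derivatives(lic)]
print(excluded)

# The corpus counts given above are consistent with one exclusion
assert 8 + 236 == 245 - 1 == 244
```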
Download all corpora
Download the full set of 244 SUD corpora: sud-treebanks-v2.12.tgz.
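
Once the archive has been downloaded, it can be unpacked and checked locally. The sketch below is only an illustration under two assumptions: the archive sits in the current directory under the name given above, and each corpus lives in a directory whose name starts with `SUD_` (as in the native corpus names). The function name `extract_corpora` and the output directory `sud-2.12` are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_corpora(archive: Path, dest: Path) -> list[str]:
    """Unpack a .tgz archive into dest and return the sorted SUD_* directory names."""
    dest.mkdir(parents=True, exist_ok=True)
    # Only unpack archives obtained from a trusted source
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # Assumption: every corpus is a directory named SUD_<Language>-<Name>
    return sorted(p.name for p in dest.rglob("SUD_*") if p.is_dir())


if __name__ == "__main__":
    archive = Path("sud-treebanks-v2.12.tgz")
    if archive.exists():
        names = extract_corpora(archive, Path("sud-2.12"))
        print(len(names), "corpus directories found")
```

If the layout assumption holds, the reported count should match the 244 corpora of the release.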
Native SUD corpora
In the table below, the 8 native SUD corpora are given. Note that each corresponding UD version is obtained by automatic conversion.
Corpus | Files | Grew-match |
---|---|---|
SUD_Beja-NSC | 2.12 – latest | 2.12 – latest |
SUD_Chinese-PatentChar | 2.12 – latest | 2.12 – latest |
SUD_French-GSD | 2.12 – latest | 2.12 – latest |
SUD_French-ParisStories | 2.12 – latest | 2.12 – latest |
SUD_French-Rhapsodie | 2.12 – latest | 2.12 – latest |
🆕 SUD_French-Sequoia | 2.12 – latest | 2.12 – latest |
SUD_Naija-NSC | 2.12 – latest | 2.12 – latest |
SUD_Zaar-Autogramm | 2.12 – latest | 2.12 – latest |
Conversion from UD
236 corpora of SUD 2.12 are converted from UD. The versions of the data and tools used are:
- Input data: version 2.12 of UD corpora
- Grew conversion rules system: tag v2.12 of the conversion system
- Tools: grew version 1.12.0, grewlib version 1.12.4 and conll version 1.15.1
Access to each corpus
In the table below, each corpus links to the Grew-match query system.
Corpus | Grew-match |
---|---|
Abaza-ATB | [Query] [Relations] |
Afrikaans-AfriBooms | [Query] [Relations] |
Akkadian-PISANDUB | [Query] [Relations] |
Akkadian-RIAO | [Query] [Relations] |
Akuntsu-TuDeT | [Query] [Relations] |
Albanian-TSA | [Query] [Relations] |
Amharic-ATT | [Query] [Relations] |
Ancient_Greek-Perseus | [Query] [Relations] |
Ancient_Greek-PROIEL | [Query] [Relations] |
Ancient_Hebrew-PTNK | [Query] [Relations] |
Apurina-UFPA | [Query] [Relations] |
Arabic-NYUAD | [Query] [Relations] |
Arabic-PADT | [Query] [Relations] |
Arabic-PUD | [Query] [Relations] |
Armenian-ArmTDP | [Query] [Relations] |
Armenian-BSUT | [Query] [Relations] |
Assyrian-AS | [Query] [Relations] |
Bambara-CRB | [Query] [Relations] |
Basque-BDT | [Query] [Relations] |
Beja-NSC (Native) | [Query] [Relations] |
Belarusian-HSE | [Query] [Relations] |
Bengali-BRU | [Query] [Relations] |
Bhojpuri-BHTB | [Query] [Relations] |
🆕 Bororo-BDT | [Query] [Relations] |
Breton-KEB | [Query] [Relations] |
Bulgarian-BTB | [Query] [Relations] |
Buryat-BDT | [Query] [Relations] |
Cantonese-HK | [Query] [Relations] |
Catalan-AnCora | [Query] [Relations] |
Cebuano-GJA | [Query] [Relations] |
Chinese-CFL | [Query] [Relations] |
Chinese-GSD | [Query] [Relations] |
Chinese-GSDSimp | [Query] [Relations] |
Chinese-HK | [Query] [Relations] |
Chinese-PatentChar (Native) | [Query] [Relations] |
Chinese-PUD | [Query] [Relations] |
Chukchi-HSE | [Query] [Relations] |
Classical_Chinese-Kyoto | [Query] [Relations] |
Coptic-Scriptorium | [Query] [Relations] |
Croatian-SET | [Query] [Relations] |
Czech-CAC | [Query] [Relations] |
Czech-CLTT | [Query] [Relations] |
Czech-FicTree | [Query] [Relations] |
Czech-PDT | [Query] [Relations] |
Czech-PUD | [Query] [Relations] |
Danish-DDT | [Query] [Relations] |
Dutch-Alpino | [Query] [Relations] |
Dutch-LassySmall | [Query] [Relations] |
English-Atis | [Query] [Relations] |
🆕 English-ESLSpok | [Query] [Relations] |
English-EWT | [Query] [Relations] |
🆕 English-GENTLE | [Query] [Relations] |
English-GUM | [Query] [Relations] |
English-GUMReddit | [Query] [Relations] |
English-LinES | [Query] [Relations] |
English-PUD | [Query] [Relations] |
English-Pronouns | [Query] [Relations] |
Erzya-JR | [Query] [Relations] |
Estonian-EDT | [Query] [Relations] |
Estonian-EWT | [Query] [Relations] |
Faroese-OFT | [Query] [Relations] |
Faroese-FarPaHC | [Query] [Relations] |
Finnish-FTB | [Query] [Relations] |
Finnish-PUD | [Query] [Relations] |
Finnish-TDT | [Query] [Relations] |
Finnish-OOD | [Query] [Relations] |
French-FQB | [Query] [Relations] |
French-GSD (Native) | [Query] [Relations] |
French-ParTUT | [Query] [Relations] |
French-PUD | [Query] [Relations] |
French-Sequoia (Native) | [Query] [Relations] |
French-ParisStories (Native) | [Query] [Relations] |
French-Rhapsodie (Native) | [Query] [Relations] |
Frisian_Dutch-Fame | [Query] [Relations] |
Galician-CTG | [Query] [Relations] |
Galician-TreeGal | [Query] [Relations] |
German-GSD | [Query] [Relations] |
German-HDT | [Query] [Relations] |
German-LIT | [Query] [Relations] |
German-PUD | [Query] [Relations] |
Gothic-PROIEL | [Query] [Relations] |
Greek-GDT | [Query] [Relations] |
🆕 Greek-GUD | [Query] [Relations] |
Guajajara-TuDeT | [Query] [Relations] |
Guarani-OldTuDeT | [Query] [Relations] |
Hebrew-HTB | [Query] [Relations] |
Hebrew-IAHLTwiki | [Query] [Relations] |
Hindi-HDTB | [Query] [Relations] |
Hindi-PUD | [Query] [Relations] |
Hittite-HitTB | [Query] [Relations] |
Hungarian-Szeged | [Query] [Relations] |
Icelandic-PUD | [Query] [Relations] |
Icelandic-Modern | [Query] [Relations] |
Icelandic-IcePaHC | [Query] [Relations] |
Indonesian-GSD | [Query] [Relations] |
Indonesian-PUD | [Query] [Relations] |
Indonesian-CSUI | [Query] [Relations] |
Irish-Cadhan | [Query] [Relations] |
Irish-IDT | [Query] [Relations] |
Irish-TwittIrish | [Query] [Relations] |
Italian-ISDT | [Query] [Relations] |
Italian-MarkIT | [Query] [Relations] |
Italian-ParTUT | [Query] [Relations] |
Italian-PoSTWITA | [Query] [Relations] |
Italian-TWITTIRO | [Query] [Relations] |
Italian-ParlaMint | [Query] [Relations] |
Italian-PUD | [Query] [Relations] |
Italian-Valico | [Query] [Relations] |
Italian-VIT | [Query] [Relations] |
Japanese-BCCWJ | [Query] [Relations] |
Japanese-BCCWJLUW | [Query] [Relations] |
Japanese-GSD | [Query] [Relations] |
Japanese-GSDLUW | [Query] [Relations] |
Japanese-PUD | [Query] [Relations] |
Japanese-PUDLUW | [Query] [Relations] |
Javanese-CSUI | [Query] [Relations] |
Kaapor-TuDeT | [Query] [Relations] |
Kangri-KDTB | [Query] [Relations] |
Karelian-KKPP | [Query] [Relations] |
Karo-TuDeT | [Query] [Relations] |
Kazakh-KTB | [Query] [Relations] |
Khunsari-AHA | [Query] [Relations] |
Kiche-IU | [Query] [Relations] |
Komi_Permyak-UH | [Query] [Relations] |
Komi_Zyrian-IKDP | [Query] [Relations] |
Komi_Zyrian-Lattice | [Query] [Relations] |
Korean-GSD | [Query] [Relations] |
Korean-Kaist | [Query] [Relations] |
Korean-PUD | [Query] [Relations] |
Kurmanji-MG | [Query] [Relations] |
🆕 Kyrgyz-KTMU | [Query] [Relations] |
Latin-ITTB | [Query] [Relations] |
Latin-LLCT | [Query] [Relations] |
Latin-Perseus | [Query] [Relations] |
Latin-PROIEL | [Query] [Relations] |
Latin-UDante | [Query] [Relations] |
Latvian-LVTB | [Query] [Relations] |
Ligurian-GLT | [Query] [Relations] |
Lithuanian-ALKSNIS | [Query] [Relations] |
Lithuanian-HSE | [Query] [Relations] |
Livvi-KKPP | [Query] [Relations] |
Low_Saxon-LSDC | [Query] [Relations] |
Madi-Jarawara | [Query] [Relations] |
🆕 Maghrebi_Arabic_French-Arabizi | [Query] [Relations] |
Makurap-TuDeT | [Query] [Relations] |
Malayalam-UFA | [Query] [Relations] |
Maltese-MUDT | [Query] [Relations] |
Manx-Cadhan | [Query] [Relations] |
Marathi-UFAL | [Query] [Relations] |
Mbya_Guarani-Dooley | [Query] [Relations] |
Mbya_Guarani-Thomas | [Query] [Relations] |
Moksha-JR | [Query] [Relations] |
Munduruku-TuDeT | [Query] [Relations] |
Naija-NSC (Native) | [Query] [Relations] |
Nayini-AHA | [Query] [Relations] |
Neapolitan-RB | [Query] [Relations] |
Nheengatu-CompLin | [Query] [Relations] |
North_Sami-Giella | [Query] [Relations] |
Norwegian-Bokmaal | [Query] [Relations] |
Norwegian-Nynorsk | [Query] [Relations] |
Old_Church_Slavonic-PROIEL | [Query] [Relations] |
Old_East_Slavic-Birchbark | [Query] [Relations] |
Old_East_Slavic-RNC | [Query] [Relations] |
Old_East_Slavic-Ruthenian | [Query] [Relations] |
Old_East_Slavic-TOROT | [Query] [Relations] |
Old_French-SRCMF | [Query] [Relations] |
🆕 Old_Irish-DipSGG | [Query] [Relations] |
🆕 Old_Irish-DipWBG | [Query] [Relations] |
Old_Russian-RNC | [Query] [Relations] |
Old_Russian-TOROT | [Query] [Relations] |
Old_Turkish-Tonqq | [Query] [Relations] |
Persian-Seraji | [Query] [Relations] |
Persian-PerDT | [Query] [Relations] |
Pomak-Philotis | [Query] [Relations] |
Polish-LFG | [Query] [Relations] |
Polish-PDB | [Query] [Relations] |
Polish-PUD | [Query] [Relations] |
Portuguese-Bosque | [Query] [Relations] |
Portuguese-PetroGold | [Query] [Relations] |
Portuguese-PUD | [Query] [Relations] |
Romanian-ArT | [Query] [Relations] |
Romanian-Nonstandard | [Query] [Relations] |
Romanian-RRT | [Query] [Relations] |
Romanian-SiMoNERo | [Query] [Relations] |
Russian-GSD | [Query] [Relations] |
Russian-PUD | [Query] [Relations] |
Russian-SynTagRus | [Query] [Relations] |
Russian-Taiga | [Query] [Relations] |
Sanskrit-UFAL | [Query] [Relations] |
Sanskrit-Vedic | [Query] [Relations] |
Scottish_Gaelic-ARCOSG | [Query] [Relations] |
Serbian-SET | [Query] [Relations] |
Sinhala-STB | [Query] [Relations] |
Skolt_Sami-Giellagas | [Query] [Relations] |
Slovak-SNK | [Query] [Relations] |
Slovenian-SSJ | [Query] [Relations] |
Slovenian-SST | [Query] [Relations] |
Soi-AHA | [Query] [Relations] |
South_Levantine_Arabic-MADAR | [Query] [Relations] |
Spanish-AnCora | [Query] [Relations] |
Spanish-GSD | [Query] [Relations] |
Spanish-PUD | [Query] [Relations] |
Swedish-LinES | [Query] [Relations] |
Swedish-PUD | [Query] [Relations] |
Swedish_Sign_Language-SSLC | [Query] [Relations] |
Swedish-Talbanken | [Query] [Relations] |
Swiss_German-UZH | [Query] [Relations] |
Tagalog-TRG | [Query] [Relations] |
Tagalog-Ugnayan | [Query] [Relations] |
Tamil-TTB | [Query] [Relations] |
Tamil-MWTT | [Query] [Relations] |
Tatar-NMCTT | [Query] [Relations] |
Teko-TuDeT | [Query] [Relations] |
Telugu-MTG | [Query] [Relations] |
Thai-PUD | [Query] [Relations] |
Tupinamba-TuDeT | [Query] [Relations] |
Turkish-Atis | [Query] [Relations] |
Turkish-BOUN | [Query] [Relations] |
Turkish-FrameNet | [Query] [Relations] |
Turkish-GB | [Query] [Relations] |
Turkish-IMST | [Query] [Relations] |
Turkish-Kenet | [Query] [Relations] |
Turkish-PUD | [Query] [Relations] |
Turkish-Penn | [Query] [Relations] |
Turkish-Tourism | [Query] [Relations] |
Turkish_German-SAGT | [Query] [Relations] |
Ukrainian-IU | [Query] [Relations] |
Umbrian-IKUVINA | [Query] [Relations] |
Upper_Sorbian-UFAL | [Query] [Relations] |
Urdu-UDTB | [Query] [Relations] |
Uyghur-UDT | [Query] [Relations] |
Vietnamese-VTB | [Query] [Relations] |
Warlpiri-UFAL | [Query] [Relations] |
Welsh-CCG | [Query] [Relations] |
Western_Armenian-ArmTDP | [Query] [Relations] |
Western_Sierra_Puebla_Nahuatl-ITML | [Query] [Relations] |
Wolof-WTB | [Query] [Relations] |
Xavante-XDT | [Query] [Relations] |
Xibe-XDT | [Query] [Relations] |
Yakut-YKTDT | [Query] [Relations] |
Yoruba-YTB | [Query] [Relations] |
Yupik-SLI | [Query] [Relations] |
Zaar-Autogramm (Native) | [Query] [Relations] |
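
For a rough offline look at the relation labels of a downloaded corpus, the sketch below counts dependency relations in CoNLL-U data. It assumes the standard 10-column, tab-separated CoNLL-U layout with the relation label in column 8. The function name `count_relations` and the sample sentence are hypothetical, and this is not the method Grew-match itself uses.

```python
from collections import Counter
from typing import Iterable


def count_relations(lines: Iterable[str]) -> Counter:
    """Count the dependency relation labels (column 8) of CoNLL-U token lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment lines such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id, deprel = cols[0], cols[7]
        # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if "-" in token_id or "." in token_id:
            continue
        counts[deprel] += 1
    return counts


# Hypothetical three-token sentence with SUD-style labels, for illustration only
sample = (
    "# text = She sleeps.\n"
    "1\tShe\tshe\tPRON\t_\t_\t2\tsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
relation_counts = count_relations(sample.splitlines())
print(relation_counts.most_common())
```

To run it on a real file, pass an open file object instead of the sample, for example `count_relations(open(path, encoding="utf-8"))`.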