Documentation
Licensing
Talkalotta's Agents are built on the Talkalotta Corpora by default. Some Agents also build on select datasets, referenced in Knowledge, from: 1) Universal Dependencies 2.15 (Universal Dependencies v2.15 License Agreement); 2) UniMorph; 3) Common Voice (Datasets); and/or 4) Tatoeba (Datasets).
Dataset usage differs by tier:
Agent Datasets: Focus and Pro Tier
No changes were made to any Universal Dependencies or Common Voice datasets. The UniMorph datasets used were converted from .txt to .csv files in December 2024, and the Tatoeba datasets used were converted from .tsv to .csv files in March 2025.
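For readers who want to reproduce a comparable file-type conversion, the following is a minimal Python sketch, not the procedure Talkalotta actually used. It assumes the source file is tab-separated text; the function name `tsv_to_csv` and the sample rows are hypothetical and are not taken from any of the datasets.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to comma-separated text.

    Field contents are left unchanged; only the delimiter differs.
    Fields that contain a comma are quoted by the CSV writer.
    """
    # QUOTE_NONE: treat quote characters in the input as ordinary text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row)
    return out.getvalue()


# Illustrative rows only (lemma, form, features); not real dataset content.
sample = "loop\tloop\tV;NFIN\nloop\tgeloop\tV;PST\n"
print(tsv_to_csv(sample))
```

Because only the delimiter changes, a conversion like this leaves the underlying data intact.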
The most stringent of the applicable licenses and their terms and conditions govern, extend through, and apply to your use of the indicated Talkalotta Agent(s), including their outputs.
License | URL of Deed |
|---|---|
C-UDA 1.0 | https://cdla.dev/computational-use-of-data-agreement-v1-0/ |
CC BY 4.0 | http://creativecommons.org/licenses/by/4.0/ |
CC BY-SA 3.0 | http://creativecommons.org/licenses/by-sa/3.0/ |
CC BY-SA 4.0 | http://creativecommons.org/licenses/by-sa/4.0/ |
CC-0 | https://creativecommons.org/public-domain/cc0/ |
CC0 1.0 | http://creativecommons.org/publicdomain/zero/1.0/ |
GNU GPL 2.0 | http://opensource.org/licenses/GPL-2.0 |
GNU GPL 3.0 | http://opensource.org/licenses/GPL-3.0 |
LGPL-LR | https://spdx.org/licenses/LGPLLR.html |
PD | public domain |
Talkalotta Corpora
As per our official GitHub: here
Synthetic, Licensable Datasets made with Talkalotta's Custom AI Agents for Language Learning and Language Tool Development
Universal Dependencies
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
Acknowledgement:
Ministerstvo školství, mládeže a tělovýchovy České republiky
Project code: LM2023062
Project name: LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy
Credit:
Zeman, Daniel; et al., 2024, Universal Dependencies 2.15, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-5787.1
UniMorph
The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology in the world’s languages. The goal of UniMorph is to annotate morphological data in a universal schema that allows an inflected word from any language to be defined by its lexical meaning, typically carried by the lemma, and by a rendering of its inflectional form in terms of a bundle of morphological features from our schema. The specification of the schema is described [here].
Tatoeba
Tatoeba is a large database of sentences and translations. Its content is ever-growing and results from the voluntary contributions of thousands of members.
Agent Datasets: Focus and Pro Tier
Language | Agent | Universal Dependencies License (if applicable) | Treebank | UniMorph License (if applicable) | ISO 639-3 (if applicable) | Common Voice Corpus 20.0 License (if applicable) | Common Voice File |
|---|---|---|---|---|---|---|---|
Afrikaans | Afrikaans B1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans C2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans B2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans C1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans A1-C2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans A1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Afrikaans A2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Afrikaans | Learn Afrikaans with Talkalotta | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
Akan | Learn Akan with Talkalotta | N/A | N/A | CC BY-SA 3.0 | aka | N/A | N/A |
Akkadian | Learn Akkadian (𒀀𒆠𒌓𒌓𒉡) with Talkalotta | CC BY-SA 4.0 | Akkadian-PISANDUB | N/A | N/A | N/A | N/A |
Akuntsu | Learn Akuntsu with Talkalotta | CC BY-SA 4.0 | Akuntsu-TuDeT | N/A | N/A | N/A | N/A |
Albanian (Shqipja) | Learn Albanian (Shqipja) with Talkalotta | CC BY-SA 4.0 | Albanian-STAF | CC BY-SA 3.0 | sqi | N/A | N/A |
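To check which third-party licenses a given Agent's row names, the table above can be read programmatically. The following is a minimal Python sketch under stated assumptions: the column keys, `parse_row`, and `applicable_licenses` are hypothetical names introduced here, and rows are assumed to use the pipe-delimited layout shown above. The sketch only lists the licenses named in a row; it does not decide which license is the most stringent.

```python
# Hypothetical keys for the eight columns of the Focus and Pro Tier table.
FOCUS_PRO_COLUMNS = [
    "language", "agent", "ud_license", "treebank",
    "unimorph_license", "iso_639_3", "common_voice_license", "common_voice_file",
]
LICENSE_COLUMNS = ("ud_license", "unimorph_license", "common_voice_license")


def parse_row(line: str) -> dict:
    """Split one pipe-delimited table row into a dict keyed by column."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != len(FOCUS_PRO_COLUMNS):
        raise ValueError(f"expected {len(FOCUS_PRO_COLUMNS)} cells, got {len(cells)}")
    return dict(zip(FOCUS_PRO_COLUMNS, cells))


def applicable_licenses(row: dict) -> list:
    """Return the distinct licenses named in a row, skipping N/A entries."""
    found = []
    for column in LICENSE_COLUMNS:
        value = row[column]
        if value == "N/A":
            continue
        # Some cells in these tables combine licenses, e.g. "CC BY 4.0 and CC BY-SA 4.0".
        for name in value.split(" and "):
            if name not in found:
                found.append(name)
    return found


row = parse_row(
    "Akan | Learn Akan with Talkalotta | N/A | N/A | CC BY-SA 3.0 | aka | N/A | N/A |"
)
print(applicable_licenses(row))
```

Each returned name can then be looked up in the License and URL of Deed table to read the corresponding terms.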
Agent Datasets: Plus Tier
Comprehensive Base Language | Agent | Universal Dependencies License (if applicable) | Treebank | UniMorph License (if applicable) | ISO 639-3 (if applicable) | Common Voice Corpus 20.0 License (if applicable) | Common Voice File (if applicable) | Tatoeba License (if applicable) | Tatoeba File (if applicable) |
|---|---|---|---|---|---|---|---|---|---|
Afrikaans | Leer Jouself Enige Taal met Talkalotta (TKLTA) | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences | CC BY 2.0 FR | afr_sentences |
Akan | Kyerɛ Wo Ho Kasa Biara De Talkalotta (TKLTA) | N/A | N/A | CC BY-SA 3.0 | aka | N/A | N/A | N/A | N/A |
Arabic (العربية الفصحى) | علّم نفسك بنفسك أي لغة مع Talkalotta (TKLTA) | CC BY-SA 4.0 | Arabic-NYUAD | CC BY-SA 3.0 | ara | N/A | N/A | N/A | N/A |
Armenian (Հայերեն) | Ինքնուրույն Սովորեք Ցանկացած Լեզու Talkalotta-ի Միջոցով (TKLTA) | CC BY-SA 4.0 | Armenian-ArmTDP and Armenian-BSUT | CC BY-SA 3.0 | hye | N/A | N/A | N/A | N/A |
Brazilian Portuguese (Português) | Ensine Qualquer Idioma a Você Mesmo com Talkalotta | CC BY 4.0 | Portuguese-Porttinari | N/A | N/A | N/A | N/A | N/A | N/A |
Bulgarian (български език) | Научи сам всеки език с Talkalotta (TKLTA) | N/A | N/A | CC BY-SA 3.0 | bul | N/A | N/A | N/A | N/A |
Czech (Čeština) | Naučte Se Jakýkoli Jazyk s Talkalotta (TKLTA) | CC BY-SA 4.0 | Czech-CAC, Czech-CLTT, and Czech-Poetry | CC BY-SA 3.0 | ces | N/A | N/A | N/A | N/A |
Danish (Dansk) | Lær Dig Et Hvilket Som Helst Sprog med Talkalotta | CC BY-SA 4.0 | Danish-DDT | CC BY-SA 3.0 | dan | N/A | N/A | N/A | N/A |
Dutch (Nederlands) | Leer Jezelf Elke Taal met Talkalotta (TKLTA) | CC BY-SA 4.0 | Dutch-Alpino and Dutch-LassySmall | CC BY-SA 3.0 | nld | N/A | N/A | N/A | N/A |
Estonian (Eesti Keel) | Õppige Ise Mis Tahes Keelt Talkalottaga (TKLTA) | N/A | N/A | CC BY-SA 3.0 | est | N/A | N/A | N/A | N/A |
Finnish (Suomi) | Opeta Itsellesi Mikä Tahansa Kieli TKLTA Avulla | CC BY 4.0 and CC BY-SA 4.0 | Finnish-FTB, Finnish-OOD, and Finnish-TDT | CC BY-SA 3.0 | fin (fin.1 and fin.2) | N/A | N/A | N/A | N/A |
French (Français) | Apprenez N'importe Quelle Langue avec Talkalotta | CC BY-SA 4.0 and LGPL-LR | French-GSD, French-ParisStories, French-Rhapsodie, and French-Sequoia | CC BY-SA 3.0 | fra | N/A | N/A | N/A | N/A |

