Documentation

Licensing

Talkalotta's Agents are built on the Talkalotta Corpora by default. Some Agents also build upon select datasets from 1) Universal Dependencies 2.15 (Universal Dependencies v2.15 License Agreement); 2) UniMorph; 3) Common Voice (Datasets); and/or 4) Tatoeba (Datasets), which they reference in Knowledge.

Dataset usage differs by tier:

Agent Datasets: Focus and Pro Tier

Agent Datasets: Plus Tier

No changes were made to any Universal Dependencies or Common Voice datasets. The UniMorph datasets used were converted from .txt to .csv files in December 2024, and the Tatoeba datasets used were converted from .tsv to .csv files in March 2025.
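
A file-type conversion of the kind described above can be sketched in a few lines of Python. This is only an illustration, not the procedure Talkalotta actually used: it assumes the source files are tab-separated text, and the function name `tsv_to_csv` and the file names in the usage note are hypothetical.

```python
import csv
from pathlib import Path


def tsv_to_csv(src: Path, dst: Path) -> int:
    """Rewrite a tab-separated file as comma-separated; return the row count.

    Assumption: the input is UTF-8 text with one record per line and
    tab-delimited fields. Field contents are copied unchanged.
    """
    rows = 0
    with src.open(newline="", encoding="utf-8") as fin, \
         dst.open("w", newline="", encoding="utf-8") as fout:
        # QUOTE_NONE treats quote characters in the input as literal text.
        reader = csv.reader(fin, delimiter="\t", quoting=csv.QUOTE_NONE)
        # The default writer quotes only fields containing commas or quotes.
        writer = csv.writer(fout)
        for row in reader:
            if not row:  # skip blank lines
                continue
            writer.writerow(row)
            rows += 1
    return rows
```

For example, `tsv_to_csv(Path("afr.txt"), Path("afr.csv"))` would write a comma-separated copy of a hypothetical `afr.txt`. Only the delimiter and quoting change, so the sentences and annotations themselves stay as published.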

The most stringent of the applicable licenses and their terms and conditions govern, extend through, and apply to your use of the indicated Talkalotta Agent(s), including their outputs.

This means that any content generated using these datasets may also require attribution and ShareAlike if it is shared or redistributed, and/or if it adapts the material.

For more information on proper attribution and licensing requirements, consult the links and tables provided on this webpage.

Talkalotta Corpora

As per our official GitHub: here

Synthetic, Licensable Datasets made with Talkalotta's Custom AI Agents for Language Learning and Language Tool Development

Universal Dependencies

As per their website: here

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).

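Acknowledgement:
Ministry of Education, Youth and Sports of the Czech Republic (Ministerstvo školství, mládeže a tělovýchovy České republiky)

Project code: LM2023062

Project name: LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities (Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy)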

Credit:
Zeman, Daniel; et al., 2024, Universal Dependencies 2.15, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-5787.1

Tatoeba

As per their website: here

Tatoeba is a large database of sentences and translations. Its content is ever-growing and results from the voluntary contributions of thousands of members.

Agent Datasets: Focus and Pro Tier

| Language | Agent | Universal Dependencies License (if applicable) | Treebank | UniMorph License (if applicable) | ISO 639-3 (if applicable) | Common Voice Corpus 20.0 License (if applicable) | Common Voice File |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Afrikaans | Afrikaans B1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans C2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans B2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans C1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans A1-C2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans A1 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Afrikaans A2 Exams⚡TKLTA | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Afrikaans | Learn Afrikaans with Talkalotta | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences |
| Akan | Learn Akan with Talkalotta | N/A | N/A | CC BY-SA 3.0 | aka | N/A | N/A |
| Akkadian | Learn Akkadian (𒀀𒆠𒌓𒌓𒉡) with Talkalotta | CC BY-SA 4.0 | Akkadian-PISANDUB | N/A | N/A | N/A | N/A |
| Akuntsu | Learn Akuntsu with Talkalotta | CC BY-SA 4.0 | Akuntsu-TuDeT | N/A | N/A | N/A | N/A |
| Albanian (Shqipja) | Learn Albanian (Shqipja) with Talkalotta | CC BY-SA 4.0 | Albanian-STAF | CC BY-SA 3.0 | sqi | N/A | N/A |

Agent Datasets: Plus Tier

| Comprehensive Base Language | Agent | Universal Dependencies License (if applicable) | Treebank | UniMorph License (if applicable) | ISO 639-3 (if applicable) | Common Voice Corpus 20.0 License (if applicable) | Common Voice File (if applicable) | Tatoeba License (if applicable) | Tatoeba File (if applicable) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Afrikaans | Leer Jouself Enige Taal met Talkalotta (TKLTA) | CC BY-SA 4.0 | Afrikaans-AfriBooms | CC BY-SA 3.0 | afr | CC-0 | validated_sentences | CC BY 2.0 FR | afr_sentences |
| Akan | Kyerɛ Wo Ho Kasa Biara De Talkalotta (TKLTA) | N/A | N/A | CC BY-SA 3.0 | aka | N/A | N/A | N/A | N/A |
| Arabic (العربية الفصحى) | علّم نفسك بنفسك أي لغة مع Talkalotta (TKLTA) | CC BY-SA 4.0 | Arabic-NYUAD | CC BY-SA 3.0 | ara | N/A | N/A | N/A | N/A |
| Armenian (Հայերեն) | Ինքնուրույն Սովորեք Ցանկացած Լեզու Talkalotta-ի Միջոցով (TKLTA) | CC BY-SA 4.0 | Armenian-ArmTDP and Armenian-BSUT | CC BY-SA 3.0 | hye | N/A | N/A | N/A | N/A |
| Brazilian Portuguese (Português) | Ensine Qualquer Idioma a Você Mesmo com Talkalotta | CC BY 4.0 | Portuguese-Porttinari | N/A | N/A | N/A | N/A | N/A | N/A |
| Bulgarian (български език) | Научи сам всеки език с Talkalotta (TKLTA) | N/A | N/A | CC BY-SA 3.0 | bul | N/A | N/A | N/A | N/A |
| Czech (Čeština) | Naučte Se Jakýkoli Jazyk s Talkalotta (TKLTA) | CC BY-SA 4.0 | Czech-CAC, Czech-CLTT, and Czech-Poetry | CC BY-SA 3.0 | ces | N/A | N/A | N/A | N/A |
| Danish (Dansk) | Lær Dig Et Hvilket Som Helst Sprog med Talkalotta | CC BY-SA 4.0 | Danish-DDT | CC BY-SA 3.0 | dan | N/A | N/A | N/A | N/A |
| Dutch (Nederlands) | Leer Jezelf Elke Taal met Talkalotta (TKLTA) | CC BY-SA 4.0 | Dutch-Alpino and Dutch-LassySmall | CC BY-SA 3.0 | nld | N/A | N/A | N/A | N/A |
| Estonian (Eesti Keel) | Õppige Ise Mis Tahes Keelt Talkalottaga (TKLTA) | N/A | N/A | CC BY-SA 3.0 | est | N/A | N/A | N/A | N/A |
| Finnish (Suomi) | Opeta Itsellesi Mikä Tahansa Kieli TKLTA Avulla | CC BY 4.0 and CC BY-SA 4.0 | Finnish-FTB, Finnish-OOD, and Finnish-TDT | CC BY-SA 3.0 | fin (fin.1 and fin.2) | N/A | N/A | N/A | N/A |
| French (Français) | Apprenez N'importe Quelle Langue avec Talkalotta | CC BY-SA 4.0 and LGPL-LR | French-GSD, French-ParisStories, French-Rhapsodie, and French-Sequoia | CC BY-SA 3.0 | fra | N/A | N/A | N/A | N/A |