publications
-
Correcting the Tamazight Portions of FLORES+ and OLDI Seed Datasets
Tamazight -
Awal – Community-Powered Language Technology for Tamazight
Tamazight -
Nós-TTS: a Web User Interface for Galician Text-to-Speech
Galician -
BibleTTS - a large, high-fidelity, multilingual, and uniquely African speech corpus
Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba -
Preparing an endangered language for the digital age - The Case of Judeo-Spanish
Ladino -
Corpora compilation for prosody-informed speech processing
English, Spanish -
Congolese Swahili Machine Translation for Humanitarian Response
Swahili Congo, Coastal Swahili, French -
TICO-19 – The Translation Initiative for COvid-19
Amharic, Arabic (Modern Standard), Bengali, Chinese (Simplified), Dari, Dinka, Farsi, French (European), Hausa, Hindi, Indonesian, Kanuri, Khmer (Central), Kinyarwanda, Kurdish Kurmanji, Kurdish Sorani, Lingala, Luganda, Malay, Marathi, Myanmar, Nepali, Nigerian Fulfulde, Nuer, Oromo, Pashto, Portuguese (Brazilian), Russian, Somali, Spanish (Latin American), Swahili, Congolese Swahili, Tagalog, Tamil, Tigrinya, Urdu, Zulu -
Participatory Research for Low-resourced Machine Translation - A Case Study in African Languages
African Languages -
Gamayun – Language Technology for Humanitarian Response
Kanuri, Hausa, Swahili, Lingala, Nande, Rohingya, Tigrinya, English, French -
CATOTRON – A Neural Text-to-Speech System in Catalan
Catalan -
Masakhane – Machine Translation For Africa
African Languages -
Tigrinya Neural Machine Translation with Transfer Learning for Humanitarian Response
Tigrinya -
Prosodic phrase alignment for machine dubbing
Spanish, English -
Building an open source automatic speech recognition system for Catalan
Catalan -
Bilingual prosodic dataset compilation for spoken language translation
English, Spanish -
Visualizing punctuation restoration in speech transcripts with Prosograph
-
Attentional parallel RNNs for generating punctuation in transcribed speech
English -
Revising the METU-Sabancı Turkish treebank: An exercise in surface-syntactic annotation of agglutinative languages
Turkish -
Prosograph: A tool for prosody visualisation of large speech corpora
-
Automatic extraction of parallel speech corpora from dubbed movies
English, Spanish -
From raw data to semantically enriched hyperlinking: Recent advances in the LinkedTV analysis workflow
German -
Processing the manuscripts of Atatürk
Turkish