alpöktem

publications

  • Correcting the Tamazight Portions of FLORES+ and OLDI Seed Datasets

    Tamazight

    October 24, 2025

  • Awal – Community-Powered Language Technology for Tamazight

    Tamazight

    October 23, 2025

  • Nós-TTS: a Web User Interface for Galician Text-to-Speech

    Galician

    March 15, 2024

  • BibleTTS - a large, high-fidelity, multilingual, and uniquely African speech corpus

    Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, Yoruba

    September 15, 2022

  • Preparing an endangered language for the digital age - The Case of Judeo-Spanish

    Ladino

    June 21, 2022

  • Corpora compilation for prosody-informed speech processing

    English, Spanish

    September 4, 2021

  • Congolese Swahili Machine Translation for Humanitarian Response

    Swahili Congo, Coastal Swahili, French

    April 19, 2021

  • TICO-19 – The Translation Initiative for COvid-19

    Amharic, Arabic (Modern Standard), Bengali, Chinese (Simplified), Dari, Dinka, Farsi, French (European), Hausa, Hindi, Indonesian, Kanuri, Khmer (Central), Kinyarwanda, Kurdish Kurmanji, Kurdish Sorani, Lingala, Luganda, Malay, Marathi, Myanmar, Nepali, Nigerian Fulfulde, Nuer, Oromo, Pashto, Portuguese (Brazilian), Russian, Somali, Spanish (Latin American), Swahili, Congolese Swahili, Tagalog, Tamil, Tigrinya, Urdu, Zulu

    November 19, 2020

  • Participatory Research for Low-resourced Machine Translation - A Case Study in African Languages

    African Languages

    November 16, 2020

  • Gamayun – Language Technology for Humanitarian Response

    Kanuri, Hausa, Swahili, Lingala, Nande, Rohingya, Tigrinya, English, French

    November 1, 2020

  • CATOTRON – A Neural Text-to-Speech System in Catalan

    Catalan

    October 16, 2020

  • Masakhane – Machine Translation For Africa

    African Languages

    April 26, 2020

  • Tigrinya Neural Machine Translation with Transfer Learning for Humanitarian Response

    Tigrinya

    April 26, 2020

  • Prosodic phrase alignment for machine dubbing

    Spanish, English

    September 24, 2019

  • Building an open source automatic speech recognition system for Catalan

    Catalan

    November 22, 2018

  • Bilingual prosodic dataset compilation for spoken language translation

    English, Spanish

    November 21, 2018

  • Visualizing punctuation restoration in speech transcripts with Prosograph

    September 2, 2018

  • Attentional parallel RNNs for generating punctuation in transcribed speech

    English

    October 23, 2017

  • Revising the METU-Sabancı Turkish treebank: An exercise in surface-syntactic annotation of agglutinative languages

    Turkish

    September 18, 2017

  • Prosograph: A tool for prosody visualisation of large speech corpora

    August 20, 2017

  • Automatic extraction of parallel speech corpora from dubbed movies

    English, Spanish

    July 30, 2017

  • From raw data to semantically enriched hyperlinking: Recent advances in the LinkedTV analysis workflow

    German

    October 28, 2013

  • Processing the manuscripts of Atatürk

    Turkish

    May 22, 2010

© 2019–2026 Alp Öktem