Notes from Mozilla Festival 2025
Mozilla Festival happened right in my own neighborhood in Barcelona this year: three days of conversations that brought me up to date on where ethical technology is heading, and on where my work sits in all of it.
This wasn’t your typical conference of talks and panels. There were workshops, hackathons, art installations, pilates, even a fashion show. The venue was Poble Espanyol, an enchanting park themed around Spanish architecture in miniature that I always appreciate visiting, as long as I don’t have to pay the usual pricey entrance fee. Imagine one session at the back of a theater, the next in the art museum, and the one after that at the back of a chapel.
The theme was “Unlearning,” and the crowd matched it: grassroots language organizations like Dagbani Wikimedians User Group and Te Hiku Media; civic tech labs like data_labe and Masakhane; researchers from Makerere University and the University of South Africa; startups like Lelapa AI and Honcho; and Mozilla Foundation itself bringing it all together.
Getting to be there in person with my CLEAR Global colleague Polly Harlow, after years of collaborating remotely, was definitely the highlight. I hope we were able to synthesize what we learned and give our team a fresh update on the landscape. And the landscape is shifting fast!
A shoutout to Francis Tyers for making it possible for us to attend. He’s been a collaborator since our work putting Aranese on Common Voice years ago and is now doing key work with Mozilla Data Collective and Common Voice. It was great to finally meet him and other folks from Common Voice in person.
A striking thing we noticed was that the notion that “more data = better” is being challenged everywhere. The AI Data Real Talk session made this concrete. Keoni Mahelona from Te Hiku Media sat beside a Meta representative and essentially said: “we don’t want you to have our data, and if you want to work on our language, go marry a Māori person, and I’m serious.” Now that’s data sovereignty, not as theory but as practice.
The licensing ecosystem is exploding: the NOODL, Esethu, and Kaitiakitanga frameworks are all trying to encode community ownership into legal infrastructure. It was a special pleasure to meet Lilian D. Wanzare from the Maseno Centre for Applied Artificial Intelligence, who walked me through NOODL’s vision over pizza. Creative Commons just launched CC Signals, an updated open licensing framework adapted for the AI age. Mozilla Data Collective officially launched their platform, positioned to help communities actually implement these kinds of custom licenses. I saw that they’re already hosting valuable datasets, and we’re in ongoing conversations about potentially hosting CLEAR Global’s TWB Voice dataset there.
I had never appreciated how central Common Crawl’s work is to today’s AI boom. Pedro Ortiz Suarez explained how they’re actively making their crawls more inclusive, asking language communities to share URLs of pages in their language rather than sticking to US-centric scraping of the web. A tiny non-profit running the infrastructure that powers so much of what we see today as “AI.”
I also had the chance to speak with startups working on AI-powered applications for local languages, not just data infrastructure. My impression is that making and commercializing sovereign technology is hard. It was both sad and reassuring to hear that promising initiatives still struggle to turn their work into stable income, relying on public funding or equity investment rather than sustainable revenue. I say reassuring because this validates the exact challenge I faced with my recently closed cooperative (more on that later). We developed open-source ASR, TTS, and MT models for languages like Catalan, Galician, and Judeo-Spanish, envisioning that they would power AI applications. They didn’t get picked up beyond occasional art installations and prototype apps. And it’s not just us: even state-funded research organizations’ models sit unused. I guess it needs time, but my other impression is that only big tech cloud APIs are seen as trustworthy enough for real-world integrations like, say, telephone IVR. Something worth exploring in more depth in another post.
These three days helped me see where we actually are in building an inclusive digital world: not just the promising developments, but also the gaps we still need to address. The distance between big tech and localized initiatives is massive, but it’s encouraging to see serious work happening across the spectrum.
And finally, the mandatory festival photo with my dear colleague Polly…