In an era where technology is rapidly reshaping human interaction, a critical question emerges: Will the digital future include the rich tapestry of African languages? At Kabarak University, we are not just asking the question—we are actively building the answer.
We are proud to announce the successful completion of the groundbreaking pilot phase of the African Next Voices data collection initiative, a project in which Kabarak University played an integral role. Funded by the Gates Foundation and its partners, this ambitious project has culminated in a first-of-its-kind dataset in five Kenyan languages: Kikuyu, Dholuo, Maasai, Kalenjin, and Somali.
The challenge with most modern speech recognition technologies, such as those behind voice assistants and transcription services, is that they are built primarily on data from Western languages. This leaves thousands of languages, including many in Africa, as "low-resource" languages, meaning they are digitally marginalized and risk being left behind.
The African Next Voices project directly tackles this disparity. By collecting both scripted and unscripted speech data across eleven key domains of everyday life, the project creates a vital foundation for researchers and developers to train AI models that truly understand how Kenyans speak.
Dr. Andrew K. Kipkebut, Coordinator of Innovation and Business Incubation at our Directorate of Research, Innovation and Outreach and Team Lead for Kabarak University, emphasized the significance of this work:
"This pilot is more than just data collection; it's about digital sovereignty. It's about ensuring that when the next generation of Kenyans interacts with technology, they can do so in the language that resonates most deeply with their culture and identity. The patterns we've unearthed, especially in the very low-resource Maasai language, are not just data points; they are the keys to unlocking linguistic inclusion in the AI age."
Why This Matters: More Than Just Words
This meticulously collected dataset is a treasure trove for:
- AI Researchers & Developers: Providing authentic, locally sourced data to build and test accurate speech-to-text and text-to-speech models for Kenyan languages.
- Educational Technologists: Creating tools for digital learning and literacy programs in native languages.
- Cultural Preservationists: Documenting and digitizing languages for academic study and future generations.
- Every Kenyan: Paving the way for technology that can power everything from agricultural advice in Kalenjin to healthcare information in Somali, all accessible by voice.
This achievement was not ours alone. We extend our congratulations and gratitude to the dedicated teams from:
- Maseno University and Maseno Centre for Applied Artificial Intelligence (MCAAI)
- Dedan Kimathi University of Technology (DeKUT)
- United States International University - Africa (USIU-Africa)
- The Local Development Research Institute (LDRI)
Together, we have laid a strong foundation for a more inclusive technological future.
This pilot is just the beginning. The insights gleaned from this data will fuel further innovation and research right here at Kabarak University.
We will be sharing these fascinating insights soon! Stay tuned to our website and social media channels for deep dives into what the data reveals about speech patterns in native Kenyan languages.
In the meantime, you can read the full story on the TRT Afrika website here: https://trt.global/afrika-english/article/359e1362af39
Kabarak University: Championing Innovation Rooted in Our Heritage.