This case study describes how TechnoLynx built a real-time Kazakh text-to-speech (TTS) solution using ONNX, deep learning, and multiple optimisation methods, targeting deployment on Android, Windows, and iOS.
The client, a small government-backed startup, aimed to improve accessibility for individuals with visual impairment by extending multi-platform screen-reading tools to support Kazakh on consumer devices. Kazakh posed specific constraints: the language is not widely supported by major platforms, few high-quality pre-trained models exist, and existing screen readers lacked native support for it. The client also required fast deployment across Android, Windows, and iOS, with real-time performance and natural-sounding speech within mobile memory and compute limits.
Limited time
The client had a firm release date, so the work relied on pre-trained Kazakh models rather than training or retraining from scratch.
Format compatibility
Deployment required converting PyTorch models into ONNX (Android/Windows) and CoreML (iOS), including reworking inputs, outputs, and inference pipelines (a conversion sketch follows the challenge list).
iOS memory limitations
The iOS screen reader framework (AVSpeechSynthesisProviderAudioUnit) had strict memory limits that the original model could not meet.
Audio quality
Natural-sounding speech was required, including suitability for formal communication (robotic or artificial tones were unacceptable).
Outdated application layers
Some open-source tools were useful but depended on old build systems and dependencies, requiring updates or rewrites of core components.
The work focused on two practical questions: how to deploy a Kazakh TTS model on three platforms using existing tools, and how to maintain high-quality, real-time speech under memory limits.
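To make the format-compatibility work concrete, the sketch below exports a PyTorch acoustic model to ONNX (for Android and Windows) and converts a traced copy to Core ML (for iOS). It is a minimal illustration rather than the project's actual pipeline: the tiny stand-in model, the input name token_ids, and the sequence length are placeholders, and real TTS models typically need additional handling of dynamic shapes and custom operators.

```python
import numpy as np
import torch
import coremltools as ct


class TinyAcousticModel(torch.nn.Module):
    """Stand-in for a pre-trained Kazakh TTS acoustic model (placeholder, not the real network)."""

    def __init__(self, vocab_size: int = 100, n_mels: int = 80):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, 256)
        self.proj = torch.nn.Linear(256, n_mels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len) token IDs -> (batch, seq_len, n_mels) mel-like frames
        return self.proj(self.embed(token_ids))


model = TinyAcousticModel().eval()
example_ids = torch.randint(0, 100, (1, 50), dtype=torch.int64)  # illustrative input length

# ONNX export for the Android and Windows runtimes, with a dynamic sequence axis.
torch.onnx.export(
    model,
    (example_ids,),
    "kazakh_tts.onnx",
    input_names=["token_ids"],
    output_names=["mel"],
    dynamic_axes={"token_ids": {1: "seq_len"}, "mel": {1: "seq_len"}},
    opset_version=17,
)

# Core ML conversion for iOS, via a traced TorchScript copy of the same model.
traced = torch.jit.trace(model, example_ids)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="token_ids", shape=example_ids.shape, dtype=np.int32)],
    convert_to="mlprogram",
)
mlmodel.save("KazakhTTS.mlpackage")
```

In practice this is where inputs, outputs, and the surrounding inference pipeline had to be reworked: tokenisation, length handling, and the vocoder stage typically sit outside the exported graph.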
We converted the PyTorch model to ONNX and reduced its size by cutting the number of layers and quantising weights (see the quantisation sketch below).
Because the screen reader framework imposed hard memory limits that the model could not meet, the iOS delivery was implemented as a standalone app running outside those limits.
The standalone iOS app could not integrate directly with VoiceOver, but it provided full text-to-speech functionality with natural voice quality.
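On the size and latency side, a common way to realise the weight-quantisation step mentioned above is post-training dynamic quantisation of the exported ONNX model, served through ONNX Runtime. The sketch below reuses the placeholder file and input names from the previous example; the layer-reduction work happened at the model level and is not shown here.

```python
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

# Post-training dynamic quantisation: weights are stored as int8, activations stay in float.
quantize_dynamic(
    model_input="kazakh_tts.onnx",
    model_output="kazakh_tts.int8.onnx",
    weight_type=QuantType.QInt8,
)

# Inference through ONNX Runtime; mobile builds expose the same session API.
session = ort.InferenceSession("kazakh_tts.int8.onnx", providers=["CPUExecutionProvider"])
token_ids = np.random.randint(0, 100, size=(1, 50), dtype=np.int64)
(mel,) = session.run(None, {"token_ids": token_ids})
print("mel output shape:", mel.shape)
```

Dynamic quantisation shrinks the stored weights roughly four-fold for the quantised layers, which is usually the first lever to pull when a model has to fit mobile memory budgets; voice quality still needs to be re-checked after the conversion.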
The project ultimately delivered a high-quality text-to-speech system on Android, Windows, and iOS. While the iOS version required a standalone application due to memory limitations, the overall solution was a success.
This case shows how careful planning and the right tools can solve practical language support problems.
We adapted pre-trained models, updated legacy tools, and worked within device limits.
The approach used in this project can be applied to other languages and platforms.
A natural next step is to continue optimising the model for iOS devices, exploring alternative ways to reduce memory consumption.
Future projects could incorporate more advanced deep learning models to enhance the quality of text-to-speech software even further.
If you need to ship an on-device voice solution across platforms while balancing model size, latency, and quality, we can help you define the right conversion, optimisation, and deployment strategy.