Case-Study: Text-to-Speech

Read about our case study in Text-to-speech!

12/03/2024


Problem

Our client, a small government-backed startup, needed to extend multi-platform screen reading capabilities to support the Kazakh language on consumer devices. The task looked straightforward on the surface, but it involved numerous technical hurdles, because building a text-to-speech (TTS) system for a less commonly supported language like Kazakh is complex.

The need for Kazakh language support arose from a broader initiative to improve accessibility for individuals with visual impairments, ensuring they could interact with consumer devices just as easily as those without impairments.

Language support is especially critical for text-to-speech software, which forms the backbone of accessibility tools that read on-screen text aloud. This project illustrates the real-world challenges companies face when working with multiple languages, particularly languages that mainstream platforms support poorly.

Challenges and Constraints

The primary hurdle was the tight deadline. The narrow timeline left no room to develop custom models or to retrain existing ones from scratch, so our team had to rely on pre-trained models that were already available for Kazakh.

However, the pre-trained Kazakh models were mostly available only as PyTorch checkpoints, a format that was not directly compatible with the text-to-speech software our client planned to integrate. To run efficiently on consumer devices, the models had to be converted into runtime formats better suited to real-time, on-device inference, such as ONNX (Open Neural Network Exchange) and Core ML.

The deadline was not the only constraint. The generated speech had to sound natural, which matters all the more because visually impaired users depend on it to use their devices. The voice needed to be as close as possible to human speech, avoiding the robotic or artificial tone that some TTS systems produce.

Another challenge was the outdated build systems used by some of the open-source application-layer solutions. These tools offered a starting point, but they often relied on outdated dependencies, so they could not be used directly: the code had to be reworked, with some functionality ported over from more up-to-date systems. Bringing these systems back up to date took additional time and effort.

Solution

Despite these constraints, our team developed a working text-to-speech solution for multiple platforms. For Android and Windows devices, we converted the pre-trained PyTorch models into a more efficient runtime format, ONNX. This conversion allowed us to deploy high-quality, real-time models that generate natural-sounding voices to read on-screen text aloud.
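Whether a model is fast enough for real-time use is commonly judged by its real-time factor (RTF): the time spent synthesising audio divided by the duration of the audio produced. An RTF below 1 means speech is generated faster than it plays back. The minimal sketch below computes RTF. The function names, the `synthesise` callable and the 22,050 Hz sample rate are illustrative assumptions, not details of the project's implementation.

```python
import time
from typing import Callable, Sequence


def audio_duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Duration of a mono waveform, given its sample count and sample rate."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_samples / sample_rate_hz


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate_hz: int) -> float:
    """RTF = time spent synthesising / duration of the audio produced.

    RTF < 1.0 means audio is produced faster than it plays back.
    """
    duration = audio_duration_seconds(num_samples, sample_rate_hz)
    if duration <= 0:
        raise ValueError("generated audio must be non-empty")
    return synthesis_seconds / duration


def measure_rtf(synthesise: Callable[[str], Sequence[float]], text: str, sample_rate_hz: int) -> float:
    """Time one call to a TTS function and return its RTF.

    `synthesise` is a hypothetical stand-in for a model's inference call
    (for example a wrapper around an ONNX runtime session) that returns raw samples.
    """
    start = time.perf_counter()
    samples = synthesise(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate_hz)


if __name__ == "__main__":
    # Hypothetical measurement: 0.6 s of compute produced 2.0 s of audio at 22,050 Hz.
    print(f"RTF = {real_time_factor(0.6, 44_100, 22_050):.2f}")  # prints "RTF = 0.30"

    # Dummy synthesiser returning one second of silence, used only to exercise measure_rtf.
    def silence(text: str) -> Sequence[float]:
        return [0.0] * 22_050

    print(f"dummy RTF = {measure_rtf(silence, 'Сәлем', 22_050):.4f}")
```

In practice RTF would be averaged over many sentences on the target device, since the first call often includes one-off warm-up costs.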

For iOS, however, the solution proved more complex. The model that worked on Android and Windows turned out to be too memory-intensive once integrated into the iOS screen reader framework, which became a significant roadblock for our team.

To get around this issue, we developed a standalone application for iOS devices. Running as a standalone app let us bypass the restrictive memory limits that apply to voices built on the AVSpeechSynthesisProviderAudioUnit API.

The trade-off, however, was that the app could no longer integrate directly with the native screen reader framework. Instead, users had to launch the separate app for the text-to-speech functionality. Despite this compromise, we still delivered a functional, real-time solution that met the needs of Kazakh-speaking users with visual impairments.

Results

The project ultimately resulted in a high-quality text-to-speech system available on Android, Windows, and iOS. While the iOS version required a standalone application due to memory limitations, the overall solution was deemed a success. It gave users access to natural-sounding speech in Kazakh, improving device accessibility for people with visual impairments.

The generated speech was of high enough quality to be practically indistinguishable from human speech, helping users interact with their devices more naturally and comfortably. This mattered because the speech had to work not only for casual reading but also in more formal settings such as education or business.

Future Steps

Our team proposed several potential improvements for future projects. Specifically, we suggested continuing to optimise the model for iOS devices and exploring alternative ways to reduce its memory consumption. Additionally, as AI voice technologies continue to evolve, future projects could incorporate more advanced deep learning models to improve the quality of text-to-speech software even further.
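The iOS memory problem, and the proposal to reduce memory consumption, can be reasoned about with a simple lower-bound estimate: the weights alone occupy roughly the parameter count multiplied by the bytes stored per parameter. The minimal sketch below does that arithmetic. The 30-million-parameter model and the 50 MiB budget are hypothetical figures chosen for illustration, not numbers from this project or documented iOS limits. Real usage is higher, because activations, audio buffers and the inference runtime also need memory.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

MIB = 1024 * 1024


def weight_memory_mib(num_params: int, dtype: str = "float32") -> float:
    """Lower-bound estimate of the memory taken by model weights alone, in MiB.

    Activations, audio buffers and the inference runtime are NOT included,
    so real usage on a device will be higher.
    """
    try:
        bytes_per_param = BYTES_PER_PARAM[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype!r}") from None
    return num_params * bytes_per_param / MIB


def fits_budget(num_params: int, dtype: str, budget_mib: float) -> bool:
    """True if the weights alone fit within a (hypothetical) memory budget."""
    return weight_memory_mib(num_params, dtype) <= budget_mib


if __name__ == "__main__":
    params = 30_000_000  # hypothetical model size
    budget = 50.0        # hypothetical memory budget in MiB
    for dtype in BYTES_PER_PARAM:
        mib = weight_memory_mib(params, dtype)
        print(f"{dtype:>8}: {mib:6.1f} MiB, fits: {fits_budget(params, dtype, budget)}")
    # float32 is about 114.4 MiB, float16 about 57.2 MiB, int8 about 28.6 MiB
```

Storing weights at lower precision (float16, or int8 quantisation) is one common way to shrink this figure, usually at some risk to output quality. It appears here only to illustrate the arithmetic, not as the technique the team used or recommended.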

Why This Case Study Matters

This case study is an in-depth examination of the unique challenges that arise when developing text-to-speech solutions for a less commonly supported language like Kazakh. It highlights the practical difficulties of working with pre-trained models and outdated systems and underscores the importance of real-world problem-solving in AI research and development.

For companies facing similar projects, this one offers a clear example of how to tackle language-specific challenges under a tight deadline, balancing high-quality output against the constraints of memory, processing power, and existing software.

Text-to-Speech: A Broader Perspective

Text-to-speech technologies have gained importance in recent years, driven by advances in AI. These systems support accessibility for people with visual impairments, and they are also used across a wide range of other industries, for example in customer service applications, virtual assistants, and content generation. Text-to-speech can also save businesses time by automating the reading of documents, reports, and other text-heavy content.

AI voice technology has come a long way from its early, robotic-sounding origins. Today’s systems use deep learning to generate voices that sound more human-like, and they can operate in multiple languages, making them suitable for a wide range of global applications. By developing solutions that can handle niche languages like Kazakh, companies can expand their markets and serve previously underserved populations.

The Role of AI in Improving Text-to-Speech

The quality of text-to-speech depends heavily on the underlying AI models. These models must be trained on large data sets that include a wide variety of speech patterns, accents, and dialects. For example, creating a natural-sounding Kazakh voice required careful attention to the nuances of the language, which differ from those of more commonly supported languages like English or French.

One key advantage of modern AI voice technologies is how well they perform specific tasks. Whether generating a natural-sounding voice in real time or processing large amounts of text quickly, AI-powered text-to-speech solutions are designed to be both efficient and adaptable.

The role of deep learning and neural networks in these systems cannot be overstated. These techniques allow TTS software to learn from massive data sets, refining how it handles speech patterns and improving the overall quality of the generated voice. This is crucial for giving users an experience that feels as natural as possible, whether they use TTS software to read documents, emails, or websites.

Beyond Kazakh: Applications of Text-to-Speech in Multiple Languages

While this case study focused on developing a Kazakh language solution, the underlying technology can be adapted for other languages as well. Companies that operate in multilingual environments can benefit from investing in high-quality text-to-speech systems that support multiple languages. This enables them to reach more customers and provide better service, particularly in regions where access to technology is limited.

For instance, businesses operating in North America, Europe, or Asia can integrate text-to-speech solutions into their customer service platforms, allowing customers to interact with their services in their native languages. This not only improves customer satisfaction but also helps to save time by automating tasks that would otherwise require human intervention.

Conclusion

The success of this project demonstrates the potential for text-to-speech technologies to expand accessibility and improve the user experience for people with visual impairments. By overcoming the challenges posed by outdated build systems, limited memory, and pre-trained models in incompatible formats, our team delivered a real-world solution that met the needs of Kazakh-speaking users across multiple platforms.

As the field of AI research continues to evolve, we expect to see even more natural-sounding, high-quality text-to-speech systems that can handle increasingly complex, specific tasks.

Whether in academic research or in practical applications such as accessibility tools and customer service, AI-powered text-to-speech will continue to play a crucial role in shaping how we interact with technology in the years to come.

At TechnoLynx, we are committed to staying at the forefront of this innovation, helping businesses deploy cutting-edge text-to-speech solutions that make a difference.

