
The problem of transcribing mixed speech: the specifics of the Kazakhstani context and the need for a systematic approach

AICA (aica.kz) is a transcription and speech analytics service.

Speech transcription is a technology that converts audio into text, which makes it indispensable for business, medicine, government agencies and other fields. In Kazakhstan, however, transcription faces distinctive difficulties: residents often mix Kazakh and Russian, adding colloquialisms and fashionable slang on top. This creates problems not only for the technology itself but also for the introduction of artificial intelligence into all spheres of life, which makes the participation of the state and scientific organizations in its development a pressing issue.

Kazakhstan is a country with a rich cultural and linguistic heritage, where conversations often involve several languages at once. Kazakhstanis habitually switch between Kazakh and Russian within a single sentence, producing "language mixes". These include not only standard vocabulary but also regional expressions, which makes them difficult for typical audio-to-text technologies. Depending on the region, Kazakh, Russian, English and words from other languages are mixed in different ways.

For modern AI models, the challenge of mixed speech is to follow switches between languages, identify each language correctly, and preserve context. Speech-to-text models, especially those trained on a single language, cope poorly with transitions between languages, which sharply reduces recognition accuracy and limits the potential of these technologies in Kazakhstan.
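To illustrate why identifying the language of each word is hard, here is a minimal sketch that tags words by script alone. It is an illustration under stated assumptions, not how any particular transcription service works: the function name `tag_word` and its labels are hypothetical. It relies on one fact about the alphabets, namely that Kazakh Cyrillic has nine letters absent from Russian, so a word without those letters stays ambiguous.

```python
# Illustrative heuristic only; production systems use trained language-ID models.
# The nine lowercase letters of Kazakh Cyrillic that the Russian alphabet lacks,
# written as escapes to avoid confusion with look-alike Latin characters.
KAZAKH_ONLY = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def is_russian_letter(ch: str) -> bool:
    """True for a lowercase letter of the Russian alphabet (including yo)."""
    return "\u0430" <= ch <= "\u044f" or ch == "\u0451"


def tag_word(word: str) -> str:
    """Return a rough tag (hypothetical labels): 'kk', 'kk_or_ru', 'latin', 'mixed', 'none'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return "none"
    if all(ch in KAZAKH_ONLY or is_russian_letter(ch) for ch in letters):
        # A Kazakh-specific letter settles it; otherwise the script cannot
        # tell a Kazakh word from a Russian one.
        if any(ch in KAZAKH_ONLY for ch in letters):
            return "kk"
        return "kk_or_ru"
    if all("a" <= ch <= "z" for ch in letters):
        return "latin"
    return "mixed"  # e.g. an English stem with a Kazakh suffix
```

Even on text, the script leaves many words ambiguous, and audio carries no script at all, which is one reason a model has to learn language switches from data rather than from simple rules.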

Mixed speech requires technology that is flexible and can handle several languages at the same time. Transcribing audio files with "complex" speech calls for multilingual models adapted to local culture, together with analysis of the features of spoken language. A model should be able to recognize and analyze video and audio recordings in which languages are switched and mixed in real time, and to understand the dialect differences characteristic of particular regions of the country.

This requires training neural networks. One approach is to train on code-switched speech, that is, recordings in which speakers change language mid-conversation, so that the model learns to handle frequent language changes. Such training needs large amounts of high-quality data with local speech features, which are still hard to obtain for the relatively small Kazakhstani market. For transcribing Kazakh, models with high accuracy are especially important, since the shortage of available language models limits both the quality of transcription and the adoption of AI in various fields.
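Transcription accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The text above does not name a metric, so the choice of WER is an assumption here, and `word_error_rate` is a hypothetical helper sketched for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Computing this separately on monolingual and on mixed-language test recordings is one way to measure how much accuracy a model loses at language switches.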

The successful implementation of mixed speech transcription requires government support and cooperation between universities, research institutes and startups. Government programs and initiatives can help address key challenges:

  1. Data collection and annotation. Developing effective models for transcription and automatic audio-to-text conversion requires a large volume of audio recordings of speech typical of Kazakhstan. Joint work by universities and companies can give developers the high-quality data that will become the basis for training models.
  2. Research and development of mixed speech recognition technologies. Universities in Kazakhstan, with the support of government programs and the participation of startups, can initiate projects to develop transcription and video-to-text technologies adapted to the linguistic characteristics of the region.
  3. Investment in the localization of AI technologies. Government funding and grants will help startups develop solutions for the Kazakhstani market, which will contribute to the wider adoption of AI in the country's economy.

An effective solution to the problem of multilingual transcription opens up major opportunities for business, government agencies and education. The introduction of AI and speech analytics simplifies working with data, improves the quality of customer service and reduces the cost of manually analyzing conversations. Companies and government agencies will be able to make better use of the information obtained from conversations with customers to improve the quality of service and support.

Technologies for automatic audio-to-text conversion and speech analysis will become important tools in many areas, from contact centers to educational institutions. They will help Kazakhstan develop AI technologies adapted to local conditions. This requires close cooperation between the state, scientific organizations and the private sector.