Transcribing mixed speech: the specifics of the Kazakhstani context and the need for a systematic approach

AICA (aica.kz) is a transcription and speech analytics service.

Speech transcription is a technology that converts audio into text, which makes it indispensable for business, medicine, government agencies and other fields. In Kazakhstan, however, transcription faces a particular difficulty: people often mix Kazakh and Russian, adding colloquial and trendy expressions on top. This creates problems not only for the technology itself but also for the adoption of artificial intelligence across all spheres of life, which makes the involvement of the state and research organizations in its development a pressing question.

Kazakhstan is a country with a rich cultural and linguistic heritage, where a conversation often involves several languages at once. Kazakhstanis routinely switch between Kazakh and Russian within a single sentence, producing "language mixes". These include not only standard vocabulary but also regional expressions, which makes them hard for typical audio-to-text technologies. Depending on the region, Kazakh, Russian, English and words from other languages are mixed in different ways.

For modern AI models, the challenge of mixed speech is to identify each language correctly, switch between languages at the right points, and preserve the context. Speech-to-text models, especially those trained on a single language, cope poorly with transitions between languages, which sharply reduces recognition accuracy and limits the potential of these technologies in Kazakhstan.
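
Part of the identification difficulty is visible even at the level of spelling. The sketch below is an illustration under stated assumptions, not a method described here or used by AICA: it tags each word by its script, relying on the nine letters that the Kazakh Cyrillic alphabet has and the Russian one lacks. The names `tag_word` and `tag_sentence` and the tag labels are hypothetical.

```python
# Nine letters present in Kazakh Cyrillic but absent from Russian:
# U+04D9, U+0493, U+049B, U+04A3, U+04E9, U+04B1, U+04AF, U+04BB, U+0456.
KAZAKH_ONLY = set("\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456")


def tag_word(word: str) -> str:
    """Return a rough tag for one word: 'kk', 'en', 'cyr', 'mixed' or 'other'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return "other"  # digits, punctuation, empty input
    if any(ch in KAZAKH_ONLY for ch in letters):
        return "kk"  # contains a Kazakh-specific letter
    if all("a" <= ch <= "z" for ch in letters):
        return "en"  # basic Latin only (assumed to be English here)
    if all("\u0430" <= ch <= "\u044f" or ch == "\u0451" for ch in letters):
        # Only letters shared by both alphabets: Russian or Kazakh,
        # the spelling alone cannot tell which.
        return "cyr"
    return "mixed"  # e.g. a Latin stem with a Cyrillic suffix


def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    """Tag every whitespace-separated word of a sentence."""
    return [(word, tag_word(word)) for word in sentence.split()]
```

Every word spelled only with the shared letters comes back as `cyr`, so spelling leaves a large share of words undecided. A heuristic like this can at most pre-sort text transcripts; it says nothing about audio, where the decision has to come from acoustic and contextual cues.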

Mixed speech requires technology that is flexible and can handle several languages at the same time. Transcribing audio with such "complex" speech calls for multilingual models adapted to the local culture, together with analysis of how the spoken language is actually used. A model should be able to recognize and analyze video and audio recordings in which languages are switched and mixed in real time, and to understand the dialect differences characteristic of particular regions of the country.

This requires training neural networks. One approach is to train on code-switched speech, so that the model learns to handle frequent language changes. That in turn requires large amounts of high-quality data with local speech features, which is still hard to obtain for the relatively small Kazakhstani market. For transcribing Kazakh, highly accurate models are especially important, since the lack of available language models limits both transcription quality and the adoption of AI in various fields.
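
Transcription accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the model output into a reference transcript, divided by the number of words in the reference. No metric is named here, so treating "accuracy" as roughly 1 minus WER is an assumption. A minimal sketch, with the hypothetical name `word_error_rate`:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

On mixed speech it can be useful to compute this separately for utterances with and without language switches, since a single average can hide poor results on exactly the sentences that matter here.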

Successful mixed-speech transcription requires government support and cooperation between universities, research institutes and startups. Government programs and initiatives can help address the key challenges:

  1. Data collection and annotation. Effective models for transcription and automatic audio-to-text conversion need a large volume of audio recordings of typical Kazakhstani speech. Joint work by universities and companies can give developers the high-quality data that will serve as the basis for training models.
  2. Research and development of mixed speech recognition technologies. Universities in Kazakhstan, with the support of government programs and the participation of startups, can launch projects to develop transcription and video-to-text technologies adapted to the linguistic characteristics of the region.
  3. Investment in the localization of AI technologies. Government funding and grants will help startups build solutions for the Kazakhstani market, which will contribute to wider adoption of AI in the country's economy.

An effective solution to multilingual transcription opens up major opportunities for business, government agencies and education. AI and speech analytics simplify working with data, improve the quality of customer service and reduce the cost of manually analyzing conversations. Companies and government agencies will be able to make better use of the information obtained from customer conversations to improve the quality of service and support.

Technologies for automatic audio-to-text conversion and speech analysis will become important tools in many areas, from contact centers to educational institutions. They will help Kazakhstan develop AI technologies adapted to local conditions. This requires close cooperation between the state, research organizations and the private sector.

Comments

A very interesting problem, especially for Kazakhstan, where layered multilingualism is the norm. I agree that transcribing mixed speech effectively requires localized solutions and collaboration with universities. Which steps do you think should come first to stimulate the development of such technologies in Kazakhstan?

Good afternoon. Let's look at the current situation first... Work of this kind is under way, for example at NU (funded by the state) and at Yandex (which has large resources, and Kazakhstan is one of its main markets, so it invests here); some private companies close to government bodies are also doing a little in this area. Yandex recently announced a dual transcription system (Russian-Kazakh), but without deep data: their system analyzes the sentence, then determines which language the words are in and switches the model between languages. This does not solve the problem, because in our speech a single sentence can contain words in two languages plus slang plus regional features. We tested this approach on our own models: yes, it improves accuracy a little, but it does not solve the problem. Rumor has it that one of the large banks is working on its own model in-house. At the moment, no service in Kazakhstan can do this with accuracy above 90%. It is worth noting that this is not only Kazakhstan's problem: the whole world has run into it, and even large language groups have difficulties. As for concrete steps: select 5 universities in the country and, together with Astana Hub and the startups that specialize in this, carry out a major effort to build a model that really works, one that understands our speech and is practical at the same time. Management and coordination should come from Astana Hub, with funding from the state and from large companies (those interested in the technology and ready to deploy it). Development should be applied from the start and rolled out as a product or service within government services, so that citizens can benefit from the technology and the research.
