
KAZ-LLM presented to the Head of State: Beeline Kazakhstan and QazCode participated in its development

On December 11 in Astana, the national language model KAZ-LLM was presented to the President of Kazakhstan, Kassym-Jomart Tokayev. The model was developed under the guidance of the Institute of Smart Systems and Artificial Intelligence (ISSAI NU) in partnership with Beeline Kazakhstan and its IT company QazCode, as well as Astana Hub. The project is coordinated by the Ministry of Digital Development, Innovation and Aerospace Industry of the Republic of Kazakhstan (ICRIAP RK). The model is of strategic importance for the whole country, as it addresses the language gap with the help of AI.

How was the KAZ-LLM model developed?

ISSAI's KAZ-LLM is trained on 150 billion tokens carefully collected from publicly available sources in four languages: Kazakh, Russian, English and Turkish. This allows the model to demonstrate high accuracy and versatility, improving text-processing quality across languages and contributing to better translation. Tokens are the smallest units of text, such as words, parts of words, or individual characters, that the AI uses to analyze and understand information.
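To make tokenization concrete, here is a minimal sketch using the Hugging Face transformers library; the repo id below is a placeholder chosen for illustration, not a confirmed KAZ-LLM model name.

```python
# Minimal tokenization sketch with the Hugging Face transformers library.
# NOTE: "issai/KAZ-LLM-8B" is a placeholder repo id used for illustration only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("issai/KAZ-LLM-8B")

text = "Qazaqstan - uly dala eli."  # any Kazakh, Russian, English or Turkish text
tokens = tokenizer.tokenize(text)   # subword tokens the model actually sees
ids = tokenizer.encode(text)        # the same tokens as integer ids

print(len(tokens), tokens)
print(ids)
```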

The interface and functionality of the KAZ-LLM model were developed in line with the most advanced international standards, which confirms the model's technological maturity and broad potential. To assess its performance, comprehensive benchmarks of question-answer pairs covering a variety of fields of knowledge were used. The benchmark suite included the following tests (a minimal scoring sketch follows the list):

  • ARC (AI2 Reasoning Challenge) — a test of scientific reasoning through multiple-choice questions.
  • GSM8K — assessment of the ability to solve grade-school math problems.
  • HellaSwag — testing the logic of sentence continuation.
  • MMLU (Massive Multitask Language Understanding) — a knowledge test across 57 different subjects.
  • Winogrande — evaluation of common sense in ambiguous sentences.
  • DROP — testing reading comprehension and logical reasoning skills.
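For readers who want a concrete picture of how such question-answer benchmarks are typically scored, below is a hedged sketch of a multiple-choice evaluation loop in the spirit of ARC, MMLU, HellaSwag and Winogrande: each answer option is appended to the question, scored by the model's average log-likelihood, and the best-scoring option is compared with the reference answer. The repo id and the single sample item are placeholders; this is not the actual evaluation harness used for KAZ-LLM.

```python
# Hedged sketch of a multiple-choice evaluation loop (not the ISSAI harness).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "issai/KAZ-LLM-8B"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()

samples = [{"question": "Which planet is known as the Red Planet?",
            "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
            "answer": 1}]

@torch.no_grad()
def option_score(question: str, option: str) -> float:
    """Average log-probability of the option tokens given the question (approximate)."""
    prompt_len = tok(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(question + " " + option, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)              # predictions for tokens 1..N-1
    per_token = logprobs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    option_len = full_ids.shape[1] - prompt_len                       # tokens belonging to the option
    return per_token[0, -option_len:].mean().item()

correct = 0
for s in samples:
    scores = [option_score(s["question"], c) for c in s["choices"]]
    correct += int(max(range(len(scores)), key=scores.__getitem__) == s["answer"])

print(f"accuracy: {correct / len(samples):.2%}")
```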

The Beeline and QazCode partnership accelerated development

Beeline Kazakhstan and its IT company QazCode became key partners in the model's creation, pooling their efforts and their experience in building language models such as Kaz-RoBERTA and in developing AI solutions for small language communities together with foreign partners. Their support in the form of servers with eight NVIDIA DGX H100 systems significantly accelerated training and expanded the model's capabilities. For comparison, an ordinary computer would take several days to analyze an archive of 1 million photos, while the eight DGX H100 servers used for ISSAI KAZ-LLM training would cope with this task in just a few seconds.

Using these servers, the developers trained two versions of the model, with 8 billion and 70 billion parameters, and QazCode data scientists joined the process.

"Our team actively participated in the development and training of the KAZ-LLM model. When creating the LLM, the developers and partners used modern machine-learning technologies such as PyTorch and Torchtune, and also drew on the experience of previous projects on adapting open-source LLM architectures to the Kazakh language. During training, which took 50 days of continuous computation, the model improved its ability to understand context and ensure high-quality user interaction. Testing has shown that the model successfully solves technical tasks while taking into account the cultural and linguistic features of the Kazakh language," said Alexey Sharavar, CEO of QazCode.
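As an illustration of the general approach described in the quote (adapting an open-source LLM with PyTorch), here is a minimal, hedged single-step fine-tuning sketch. It is not the ISSAI/QazCode training pipeline, and the base-model id and example texts are assumptions made for the example.

```python
# Illustrative sketch of one causal-LM fine-tuning step in plain PyTorch.
# This is NOT the ISSAI/QazCode pipeline; the base-model id and texts are placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B"  # assumed Llama-family base; access may be gated
tok = AutoTokenizer.from_pretrained(base_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token    # Llama tokenizers often ship without a pad token

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model.train()
optimizer = AdamW(model.parameters(), lr=2e-5)

texts = ["Example Kazakh-language training sentence.", "Example English training sentence."]
batch = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=512).to(model.device)

# For causal-LM training the labels are the inputs themselves; padding is masked out.
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100

loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print("loss:", loss.item())
```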

About the results and prospects of KAZ-LLM

The researchers note that the project is an important milestone on Kazakhstan's path onto the world stage of artificial intelligence: "This model reflects Kazakhstan's desire for innovation, independence and growth of its technological ecosystem. Our team has prepared two versions of ISSAI KAZ-LLM, with 8 billion and 70 billion parameters, built on the Meta Llama architecture and optimized both for high-performance systems and for environments with limited resources. The models are released under a CC-BY-NC license and are available for non-commercial use on the Hugging Face website, facilitating global academic and research collaboration. In this way, developers will be able to download and run our model both on powerful servers and on laptops," said Professor Hussein Atakan Varol, Director of ISSAI.
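To make the quote concrete, below is a minimal sketch of downloading a released checkpoint from Hugging Face and generating text with it. The repo id is a placeholder, so check the ISSAI organization page on Hugging Face for the actual model names and the CC-BY-NC (non-commercial) license terms.

```python
# Minimal sketch of loading a released checkpoint from Hugging Face and generating text.
# NOTE: "issai/KAZ-LLM-8B" is a placeholder repo id, not a confirmed model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "issai/KAZ-LLM-8B"         # placeholder; the 8B version targets smaller machines
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,       # half precision to fit consumer GPUs
    device_map="auto",               # falls back to CPU when no GPU is available
)

prompt = "Qazaqstan Respublikasynyn astanasy"
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tok.decode(output[0], skip_special_tokens=True))
```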

ISSAI KAZ-LLM is expected to open up new opportunities for startups and innovative AI-based projects. In the future, the team plans to develop next-generation models that integrate language and visual data, which will significantly expand the capabilities of AI. Adding support for other Turkic languages is also being considered, which would strengthen ties between Turkic-speaking communities.
