astanahub.com © 2020-2025. All rights reserved

Audio-speech data

Datasets with audio recordings, labeled speech data, audio clips with different accents and languages.

Common Voice

Common Voice is a multilingual dataset of voice recordings, contributed by volunteers from around the world. It aims to provide a wide variety of speech data for developing and training speech recognition systems. The dataset includes diverse accents, dialects, and languages, making it a valuable resource for researchers and developers working on voice technology and natural language processing.

Google Speech Commands

Google Speech Commands is a dataset of tens of thousands of labeled one-second recordings of short spoken commands, designed for training machine learning models on keyword spotting and other limited-vocabulary speech recognition tasks. The commands are spoken by many different speakers, which supports the development of voice-activated applications and systems. It is widely used in research and development as a benchmark for compact speech recognition models.

OpenSLR 96

OpenSLR 96 is a dataset that provides a collection of speech recordings designed for training and evaluating automatic speech recognition systems. It includes a diverse range of speakers and acoustic environments, making it suitable for developing robust models that can perform well in real-world conditions. The dataset is openly available for research and development purposes, supporting advancements in speech technology.

VoxCeleb 1

VoxCeleb 1 is a large-scale speaker recognition dataset of more than 100,000 utterances from 1,251 celebrities, extracted from YouTube videos. It features speakers from a range of backgrounds, accents, and professions, making it suitable for training and evaluating models in speaker identification and verification tasks. Because the clips were recorded in unconstrained conditions, the dataset includes realistic variation in background noise and recording quality.

OpenSLR 12

OpenSLR 12 is the LibriSpeech corpus, a dataset for automatic speech recognition research containing roughly 1,000 hours of read English speech sampled at 16 kHz. The recordings are derived from LibriVox audiobooks and are segmented and aligned with their transcripts, providing a rich resource for developing and testing speech recognition models. The dataset is openly available to facilitate research and development in speech technology.

TEDLIUM 

TEDLIUM is a dataset derived from TED Talks, featuring a collection of audio recordings along with their corresponding transcripts. This dataset is designed for training and evaluating automatic speech recognition systems and is characterized by diverse speakers, topics, and speaking styles. The rich content of TED Talks provides a valuable resource for research in speech technology and natural language processing.

Urban Sound 8K

Urban Sound 8K is a dataset of 8,732 labeled excerpts of urban sounds, each up to four seconds long, taken from field recordings. It is designed for developing and evaluating models for sound classification and environmental sound recognition. The excerpts are grouped into 10 classes, such as car horns, sirens, drilling, dog barks, and street music, making the dataset a valuable resource for research in audio processing and machine learning applications related to urban environments.

DARPA TIMIT

DARPA TIMIT is a widely used dataset for acoustic-phonetic research and automatic speech recognition. It consists of recorded speech from 630 speakers of American English, with a diverse range of dialects and accents. The dataset includes phonetically balanced sentences and their corresponding transcriptions, providing valuable resources for training and evaluating speech recognition models and conducting linguistic analysis.

FMA (Free Music Archive)

FMA (Free Music Archive) is a dataset that provides a large collection of music tracks across various genres, all available for free and open use. It includes metadata such as artist information, track titles, and genre classifications, making it a valuable resource for music information retrieval, analysis, and machine learning applications. The dataset is widely used for research in audio processing, music recommendation systems, and classification tasks.

Google Audioset

Google Audioset is a large-scale dataset designed for audio event classification. It contains over 2 million human-labeled 10-second audio clips drawn from YouTube videos, covering a wide variety of sound events such as music, speech, environmental sounds, and animal sounds. Its scale and diversity make it well suited to training and evaluating machine learning models for sound recognition and audio classification.
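
To give a sense of scale, the clip count and clip length quoted above can be converted into a total duration. The sketch below is only a back-of-the-envelope estimate: it uses the round figure of 2 million clips as a lower bound, and the helper name `total_hours` is chosen here for illustration.

```python
def total_hours(num_clips: int, clip_seconds: float) -> float:
    """Return the total audio duration in hours for fixed-length clips."""
    return num_clips * clip_seconds / 3600


# Lower-bound figures taken from the description: 2 million clips of 10 seconds.
audioset_hours = total_hours(2_000_000, 10)
print(f"{audioset_hours:.0f} hours")  # 5556 hours
```

That is more than 5,500 hours of audio, which is worth keeping in mind when planning storage and training time.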

VoxForge

VoxForge is an open-source speech corpus that provides a collection of transcribed audio recordings contributed by volunteers from around the world. It is designed to support the development of speech recognition systems in various languages and dialects. The dataset includes diverse speech samples, making it a valuable resource for researchers and developers working on speech technology and natural language processing applications.

REVERB Challenge

The REVERB Challenge dataset is designed for research on reverberant speech, in particular speech enhancement (dereverberation) and robust speech recognition. It consists of both simulated and real recordings of speech in rooms with different levels of reverberation. This dataset is used to evaluate dereverberation algorithms and to improve the performance of speech recognition systems in challenging acoustic conditions.

RAVDESS

RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) is a dataset of emotional speech and song recordings, designed for research in emotion recognition. It consists of 24 professional actors expressing various emotions, including happiness, sadness, anger, and fear, through spoken phrases and singing. The dataset includes both audio and video recordings, making it a valuable resource for developing and evaluating models for emotion detection in speech and audio processing applications.

NSynth (Neural Synth)

NSynth (Neural Synth) is a dataset created by Google that contains over 300,000 musical notes from a wide variety of instruments. Each note is a four-second audio snippet annotated with its pitch, velocity, and instrument, allowing for rich audio synthesis and machine learning applications. NSynth is designed for training neural networks to generate new sounds and explore the possibilities of sound synthesis, making it a valuable resource for researchers and developers in music technology and audio processing.

ESC-50

ESC-50 is a dataset for environmental sound classification, containing 2,000 labeled audio recordings of 50 different sound classes. Each class includes 40 recordings, featuring sounds from nature, human activities, and man-made environments. The dataset is designed to facilitate research in sound recognition and machine learning applications, making it a valuable resource for developing and evaluating models for environmental sound classification.
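
The class balance described above (50 classes with 40 recordings each, 2,000 in total) is easy to verify once labels are loaded. The following is a minimal sketch only: the `labels` list is a synthetic stand-in for real metadata, and its layout is an assumption made for illustration, not the dataset's actual file format.

```python
from collections import Counter

NUM_CLASSES = 50       # sound classes, as stated in the description
CLIPS_PER_CLASS = 40   # recordings per class, as stated in the description

# Hypothetical stand-in for real metadata: one integer class label per recording.
labels = [class_id for class_id in range(NUM_CLASSES) for _ in range(CLIPS_PER_CLASS)]

# Count recordings per class and check that every class has the same number.
counts = Counter(labels)
is_balanced = len(counts) == NUM_CLASSES and set(counts.values()) == {CLIPS_PER_CLASS}

print(len(labels), is_balanced)  # 2000 True
```

A perfectly balanced dataset like this means plain accuracy is a reasonable first metric, since no class dominates the label distribution.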

IEMOCAP (Interactive Emotional Dyadic Motion Capture)

IEMOCAP (Interactive Emotional Dyadic Motion Capture) is a multimodal dataset designed for emotion recognition research. It includes audio, video, and motion capture data of actors performing scripted and improvised dialogues with varying emotional expressions. The dataset features multiple emotions such as happiness, sadness, anger, and frustration, providing a rich resource for developing and evaluating models for emotional analysis in speech and video processing applications.

VoxConverse

VoxConverse is an audio-visual speaker diarization dataset consisting of multi-speaker clips of human speech extracted from YouTube videos. The clips cover conversational settings such as debates, panel discussions, and news segments, with overlapping speech and a varying number of speakers, making the dataset suitable for research on determining who spoke when. It provides a valuable resource for developing and evaluating models that analyze multi-speaker conversational audio.

AVSpeech

AVSpeech is a dataset designed for research in audiovisual speech recognition, consisting of paired audio and visual recordings of speakers. It includes a diverse range of speakers, languages, and contexts, allowing for the study of how visual cues, such as lip movements, enhance speech recognition accuracy. This dataset is valuable for developing and evaluating models that integrate both audio and visual information in speech processing applications.

Kazakh ASR Dataset

The Kazakh ASR Dataset is designed for automatic speech recognition research in the Kazakh language. It includes a collection of audio recordings from various speakers, covering a range of topics and speech styles. The dataset aims to provide valuable resources for training and evaluating speech recognition models tailored to the Kazakh language, facilitating advancements in speech technology and applications in natural language processing.

Kazakh Speech Corpus 

The Kazakh Speech Corpus is a comprehensive dataset designed for speech recognition and linguistic research in the Kazakh language. It comprises a variety of audio recordings from native speakers, covering diverse speech styles, dialects, and topics. This corpus serves as a valuable resource for developing and testing automatic speech recognition systems, phonetic studies, and other applications in natural language processing, promoting advancements in Kazakh language technologies.

EmoReact

EmoReact is a dataset designed for emotion recognition in videos, featuring a collection of video clips annotated with various emotional responses. It includes diverse scenarios, expressions, and contexts, making it suitable for training models to detect and analyze emotions in visual media. This dataset provides a valuable resource for researchers and developers working on applications in affective computing, emotion analysis, and multimedia processing.

Common Voice 17.0

Common Voice 17.0 is a multilingual dataset of voice recordings collected from volunteers around the globe, aimed at improving speech recognition technology. It features a wide variety of spoken phrases in multiple languages, accompanied by diverse accents and dialects. This dataset is valuable for training and evaluating automatic speech recognition systems, making significant contributions to the development of inclusive and accurate voice technologies.

© 2025, Autonomous cluster fund «Park of innovative technologies»
