Creating a modern TTS application with OpenAI: From idea to implementation

In the digital age, Text-to-Speech (TTS) technology is becoming increasingly popular. This is especially true for multilingual countries such as Kazakhstan, where it is important to support both the official Kazakh language and Russian. In this article, we will walk through building a modern TTS web application using the OpenAI API and Next.js.

The OpenAI Speech API offers several key advantages:

  • High-quality natural-sounding speech synthesis
  • Multilingual support, including Kazakh and Russian
  • A variety of voices (6 different options)
  • Flexible settings for output speed and format
  • Easy integration via the REST API
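The last point is easy to demonstrate: the speech endpoint can be called with plain fetch, no SDK required. A minimal sketch (the helper name is ours; the endpoint URL and body fields follow the public Speech API):

```typescript
// Builds the arguments for a direct REST call to the OpenAI speech endpoint.
type SpeechPayload = {
  model: string;            // e.g. 'tts-1' or 'tts-1-hd'
  input: string;            // text to synthesize
  voice: string;            // one of the six voices
  speed?: number;           // 0.25 to 4.0, default 1.0
  response_format?: string; // 'mp3', 'opus', 'aac', 'flac', ...
};

function buildSpeechRequest(apiKey: string, payload: SpeechPayload) {
  return {
    url: 'https://api.openai.com/v1/audio/speech',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(payload),
    },
  };
}
```

Usage is then a single call: `const { url, init } = buildSpeechRequest(key, { model: 'tts-1', input: text, voice: 'alloy' }); const res = await fetch(url, init);` and the response body is the audio stream.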

Our application is built on a modern technology stack:

  • Next.js 15 with App Router for server rendering
  • TypeScript for type safety
  • Tailwind CSS for styling
  • React Hooks for state management
  • Next.js API Routes for processing requests
  • OpenAI SDK for integration with the Speech API
  • Error handling and data validation

The application works with text in multiple languages, which is especially important for Kazakhstan:

// Example of Kazakh and Russian text handling
const kazakhText = "Sälemetsiz be! Bul qazaq tilindegi mätin.";
const russianText = "Привет! Это текст на русском языке.";

Users can choose from six different OpenAI voices:

  • Alloy - a neutral voice
  • Echo - a male voice
  • Fable - a voice with a British accent
  • Onyx - a deep male voice
  • Nova - a female voice
  • Shimmer - a soft female voice
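Since the set of voices is fixed, it is convenient to encode it as a union type so invalid values are caught at the boundary; a small sketch (the helper name is ours):

```typescript
// The six documented voices as a readonly tuple and a derived union type.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] as const;
type Voice = (typeof VOICES)[number];

// Type guard: narrows an arbitrary string to Voice.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).includes(value);
}
```

In the API route this lets you reject unknown voices with a 400 instead of forwarding them to OpenAI.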

A unique feature of our application is the ability to customize intonation and speech style:

const intonationMap = {
  'excited': 'Speak with excitement and enthusiasm: ',
  'calm': 'Speak calmly and peacefully: ',
  'serious': 'Speak seriously and formally: ',
  // ... other options
};
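The prefix is applied before the text is sent to the API; a minimal sketch of that step (treating 'neutral' or any unknown key as a no-op is our assumption):

```typescript
const intonationMap: Record<string, string> = {
  excited: 'Speak with excitement and enthusiasm: ',
  calm: 'Speak calmly and peacefully: ',
  serious: 'Speak seriously and formally: ',
};

// Prepends the style instruction; unknown keys (including 'neutral')
// leave the text unchanged.
function applyIntonation(text: string, intonation: string): string {
  return (intonationMap[intonation] ?? '') + text;
}
```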

The main request processing logic is located in /api/tts/route.ts:

export async function POST(request: NextRequest) {
  const { text, voice, speed, format, model, intonation, speechStyle, apiKey } =
    await request.json();

  // Validate the incoming data
  if (!apiKey || !text) {
    return NextResponse.json({ error: 'Insufficient data' }, { status: 400 });
  }

  // Prepend the intonation instruction to the text
  let enhancedText = text;
  if (intonation !== 'neutral') {
    enhancedText = intonationMap[intonation] + text;
  }

  // Generate the audio
  const mp3 = await openai.audio.speech.create({
    model: model || 'tts-1',
    voice: voice || 'alloy',
    input: enhancedText,
    speed: speed || 1.0,
    response_format: format || 'mp3',
  });
  
  return new NextResponse(Buffer.from(await mp3.arrayBuffer()));
}

The interface is built using React components and Tailwind CSS:

export default function Home() {
  const [text, setText] = useState('');
  const [voice, setVoice] = useState('alloy');
  const [speed, setSpeed] = useState(1.0);
  // ... other states
  
  const handleGenerate = async () => {
    const response = await fetch('/api/tts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, voice, speed, format, model, intonation, speechStyle, apiKey })
    });
    
    if (response.ok) {
      const audioBlob = await response.blob();
      const url = URL.createObjectURL(audioBlob);
      setAudioUrl(url);
    }
  };
  
  return (
    <div className="min-h-screen bg-gradient-to-br from-blue-50 to-indigo-100">
      {/* UI components */}
    </div>
  );
}

It is important never to store API keys in the code:

// ❌ Incorrect: a hard-coded key
const apiKey = 'sk-proj-...';

// ✅ Correct: the user supplies the key at runtime
const [apiKey, setApiKey] = useState('');

All incoming data must be validated:

if (!apiKey) {
  return NextResponse.json({ error: 'API key required' }, { status: 400 });
}

if (!text) {
  return NextResponse.json({ error: 'Text required' }, { status: 400 });
}
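These checks can be collected into one helper that also enforces the limits documented for the Speech API (the 4096-character input cap and the 0.25 to 4.0 speed range; the function name is ours):

```typescript
type TTSBody = { apiKey?: string; text?: string; speed?: number };

// Returns an error message, or null when the body is valid.
function validateBody(body: TTSBody): string | null {
  if (!body.apiKey) return 'API key required';
  if (!body.text) return 'Text required';
  if (body.text.length > 4096) return 'Text exceeds the 4096-character limit';
  if (body.speed !== undefined && (body.speed < 0.25 || body.speed > 4.0)) {
    return 'Speed must be between 0.25 and 4.0';
  }
  return null;
}
```

The route handler then reduces to a single `if (error) return NextResponse.json({ error }, { status: 400 })`.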

Different types of errors should be handled:

try {
  const mp3 = await openai.audio.speech.create({...});
  // ...
} catch (error: any) {
  console.error('TTS Error:', error);
  return NextResponse.json(
    { error: error.message || 'Error when generating audio' },
    { status: 500 }
  );
}
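The generic handler above can be refined by mapping common upstream statuses to clearer messages; a sketch (the wording is ours, the status codes follow standard HTTP semantics):

```typescript
// Maps an upstream HTTP status to a user-facing message.
function describeTTSError(status: number): string {
  switch (status) {
    case 400: return 'Invalid request parameters';
    case 401: return 'Invalid API key';
    case 429: return 'Rate limit exceeded, try again later';
    default:  return 'Error when generating audio';
  }
}
```

This way a user who pasted a wrong key sees "Invalid API key" instead of an opaque 500.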

Next.js 15 with Turbopack provides fast builds and hot reload.

Audio files are created only at the user's request.

The browser automatically caches the generated audio files.

Deployment is straightforward:

# Automatic deployment via Git integration, or manually:
npm run build
npm start

Plans for further development:
  1. SSML support for more precise pronunciation control
  2. Batch processing for long texts
  3. Database integration to save the generation history
  4. User authentication
  5. Support for additional languages
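As a starting point for item 2, long input can be split into chunks that fit the per-request character limit; a rough sketch (the sentence-splitting regex and helper name are ours; single sentences longer than the limit are kept whole):

```typescript
// Splits long text into chunks under maxLen characters, breaking on
// sentence boundaries (. ! ?). Each chunk can then be synthesized
// separately and the audio concatenated.
function chunkText(text: string, maxLen = 4096): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = '';
  for (const s of sentences) {
    if ((current + s).length > maxLen && current) {
      chunks.push(current.trim());
      current = '';
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```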

Creating a modern TTS application with the OpenAI API demonstrates the power of modern web technologies. The combination of Next.js, TypeScript, and the OpenAI Speech API lets you build high-quality applications with minimal effort.

It is especially important that such solutions promote digitalization and accessibility of content in national languages, which is important for multilingual countries like Kazakhstan.

The project demonstrates the best practices of modern web development: type safety, component architecture, API security and user-friendly interface.

The full source code of the project is available on GitHub with detailed documentation and deployment instructions.

https://github.com/AubakirovArman/ttskazakh
