Indian AI startup Sarvam has unveiled its latest AI model, Bulbul-v2, a text-to-speech (TTS) model that supports 11 Indian languages. The Bengaluru-based startup claims that the model comes with authentic accents that sound ‘just like India’.
In its post on LinkedIn, the company said that the AI-generated voice sounds real and not robotic or rehearsed. It also offers lightning-fast performance and comes with custom voices that are ideal for brands and businesses.
The company claimed that the new model has set new benchmarks for speech AI in India. Sarvam AI also said it is making AI more accessible in the country with lower-latency models and India-first pricing for API access. Sarvam AI is the first startup chosen by the central government to build India’s sovereign large language model (LLM) as part of the broader IndiaAI mission.
What is Bulbul-v2?
Bulbul-v2 is the company’s flagship text-to-speech model, designed specifically for Indian languages and accents. According to the company, the model offers natural-sounding speech with human-like prosody and supports multiple voice personalities. It handles multi-language and code-mixed text and is capable of real-time synthesis. It also offers fine-grained control over pitch, pace, and loudness.
When it comes to features, the model comes with voice control, sample rate options, text preprocessing, and language support. Along with fine-grained control over pitch, pace, and loudness, the model offers multiple sample rates from 8 kHz to 24 kHz. The model is also capable of smart normalisation of numbers, dates, and mixed-language text.
What can Bulbul-v2 do?
When it comes to capabilities, the model can convert text to speech with default settings. It allows users to fine-tune the voice characteristics by adjusting pitch, pace, and loudness.
The company claims that the model is perfect for creating the exact voice style one needs. With the sample rate options, users can choose the audio quality best suited to their needs. Moreover, text preprocessing normalises the input to improve the pronunciation of numbers, dates, and mixed-language content.
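To make these controls concrete, here is a minimal sketch of how a client might assemble a request for such a model. It is an illustration only and not Sarvam's actual API: the class name, every field name, the default values, and the "hi-IN" language code are assumptions made for this example. Only the 8 kHz to 24 kHz sample rate range and the existence of pitch, pace, loudness, and preprocessing controls come from the company's description.

```python
from dataclasses import dataclass, asdict

# Sample rate range reported for Bulbul-v2 (8 kHz to 24 kHz).
MIN_SAMPLE_RATE_HZ = 8_000
MAX_SAMPLE_RATE_HZ = 24_000


@dataclass
class TTSRequest:
    """Hypothetical request shape; all field names are assumptions."""
    text: str
    language_code: str = "hi-IN"        # assumed code for Hindi
    pitch: float = 0.0                  # assumed neutral pitch
    pace: float = 1.0                   # assumed normal speaking speed
    loudness: float = 1.0               # assumed normal volume
    sample_rate_hz: int = 22_050        # assumed default inside the range
    enable_preprocessing: bool = False  # normalise numbers, dates, mixed text


def build_payload(req: TTSRequest) -> dict:
    """Validate the request locally and return it as a plain dict."""
    if not req.text.strip():
        raise ValueError("text must not be empty")
    if not MIN_SAMPLE_RATE_HZ <= req.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
        raise ValueError("sample rate must be between 8 kHz and 24 kHz")
    return asdict(req)


if __name__ == "__main__":
    # A slightly faster voice at telephone-quality audio, with preprocessing on.
    request = TTSRequest(
        text="Meeting 3 baje hai",
        pace=1.2,
        sample_rate_hz=8_000,
        enable_preprocessing=True,
    )
    print(build_payload(request))
```

The sketch makes no network call. It only shows how default settings, the voice adjustments, and the sample rate choice could sit side by side in one request, with the sample rate checked against the reported range before anything is sent.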
Since it is designed for low latency and comes with cost-effective pricing, Bulbul-v2 could be seen as an efficient alternative to many of its global counterparts. The company introduced Bulbul-v1 in August last year along with six preset voice personalities.
© IE Online Media Services Pvt Ltd