#TTS

2025-08-12
Hello! Here are some tips:

🔤💬 When searching for this kind of program, the usual term is "#TTS": Text-To-Speech.

So, for pocket-sized electronic ankle monitors (#Android), I could see that there are several TTS apps on F-Droid, the #SoftwareLivre (free software) repository for that platform, but I don't know which one would best suit this case. Maybe someone else can confirm? Judging by its description, I think it could be TTS Util (Apache 2.0 license ✔️).

On #GNU distributions, some such engine usually comes preinstalled, and it can even be used from the terminal or in scripts, for example with the spd-say command:

spd-say -l pt-BR 'Atenção! A reunião tal começa em 5 minutos.'
(The message reads "Attention! Such-and-such meeting starts in 5 minutes.") If the machine's language is already the one you want, you don't need to specify it.
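
To trigger the same announcement from a program rather than the shell, a minimal Node.js sketch might look like this. It assumes `spd-say` is installed and on the PATH; the helper names `spdSayArgs` and `announce` are made up for illustration.

```javascript
const { execFile } = require("child_process");

// Build the argument list for spd-say. The -l flag is only added when a
// language is given, since the system default is used otherwise.
function spdSayArgs(text, lang) {
  return lang ? ["-l", lang, text] : [text];
}

// Hypothetical helper: run spd-say and resolve once the command exits.
function announce(text, lang) {
  return new Promise((resolve, reject) => {
    execFile("spd-say", spdSayArgs(text, lang), (err) =>
      err ? reject(err) : resolve()
    );
  });
}

// Usage (requires speech-dispatcher):
// announce("Atenção! A reunião tal começa em 5 minutos.", "pt-BR");
```

Using `execFile` with an argument array avoids shell quoting problems when the message contains apostrophes or other special characters.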

CC: @Beaux24@mastodon.social
Tao of Mac (@taoofmac)
2025-08-10

AI Speech Technologies

This page is a collection of notes and links related to AI speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other rela(...)

taoofmac.com/space/ai/speech

2025-08-10

I've been playing around with Chatterbox TTS + Kobold this week, so I wrote a small guide.

spacebums.co.uk/post/chatterbo

Jecture (@jecture)
2025-08-08

Text-to-speech isn’t just for accessibility... students, commuters, and auditory learners are embracing it.
techbullion.com/why-more-peopl

Niavy (@niavy@masto.bike)
2025-08-03

A while ago, in a thread about open source apps, I got shot down for mentioning Whisper as an alternative to Google's speech synthesis, on the grounds that it supposedly belongs to OpenAI. Does that ring a bell?

For now I'm happily using SherpaTTS, but still, I'm curious.

What's more, the description of Whisper+ seems to indicate that the app can translate on the fly from the spoken language into English! @Vive_Levant

#SpeechRecognition
#OpenAI
#TTS
#FOSS

2025-08-02

AI Voice Generation Made Easy with Pinokio and OpenAudio

Are you a scientist, a developer, or just a tinkerer like me? Are you fascinated by the power of AI to generate and clone human voices to include in your work? OpenAudio might be what you are looking for. Thanks to Pinokio, it’s easy to download and install OpenAudio on your computer. In this brief introduction I am using an M3 MacBook Air with 16 GB RAM. Follow these instructions to install Pinokio on your computer and discover how easy AI-generated speech can become. Pinokio is a browser that enables you to install, run, and automate any AI on your computer.

Now that Pinokio is installed, I just click the ‘Discover’ button at the top right of the application browser and look for OpenAudio, which is the first application listed in the Apps section. Pinokio is open source under an MIT license, and OpenAudio is open source under an Apache 2.0 license. The project is based on FishSpeech and has recently rebranded itself as OpenAudio.

Screen picture by Don Watkins CC by SA 4.0

The project has seventy-seven contributors and states on their website that: “We are incredibly excited to unveil OpenAudio S1, a cutting-edge text-to-speech (TTS) model that redefines the boundaries of voice generation. Trained on an extensive dataset of over 2 million hours of audio, OpenAudio S1 delivers unparalleled naturalness, expressiveness, and instruction-following capabilities.”

This model was easy to install on Pinokio and you can quickly and easily start producing your own AI generated speech with it. Your experience may vary depending on your processor and RAM.

Screen picture by Don Watkins CC by SA 4.0

Once installed you will be presented with this easy to use interface.

Screen picture by Don Watkins CC by SA 4.0

These four lines of text took 77 seconds to generate in WAV format and resulted in 8 seconds of audio in a 684 KB file. There is a download button at the top right of the playback window.
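
To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is plain arithmetic on the numbers quoted above, not an independent measurement, and the variable names are mine.

```javascript
// Figures quoted in the article.
const generationSeconds = 77;
const audioSeconds = 8;
const fileKilobytes = 684;

// Real-time factor: seconds of compute per second of audio produced.
const realTimeFactor = generationSeconds / audioSeconds;

// Average size of the WAV output per second of audio.
const kilobytesPerSecond = fileKilobytes / audioSeconds;

console.log(realTimeFactor.toFixed(1));     // 9.6
console.log(kilobytesPerSecond.toFixed(1)); // 85.5
```

In other words, on this machine generation ran roughly ten times slower than playback.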

Listen to the audio and judge for yourself.

In addition to text to speech synthesis OpenAudio supports voice cloning. You can use your own voice or upload a sample. Five to ten seconds of reference audio is useful for the generation of the cloned voice. There is a dialogue box at the lower left of the display where this is accomplished along with other controls that override the default settings.

Use of this model is governed by the Creative Commons CC BY-NC-SA 4.0 license. The project also includes a caveat:

“We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.”

The model is a text-to-speech model based on VQ-GAN and Llama developed by Fish Audio. There are links to the source code and models. The project maintains a Discord channel and a presence on X. Visit the OpenAudio blog for up to date information and research.

Have some fun and install Pinokio and OpenAudio on your computer today. Leverage the power of open source and AI in your projects and join their community of developers if you are inclined.

#AI #TTS #VoiceCloning

2025-08-02

Control your #DECtalk-Mini #TTS on #Android via #Network (HTTP or UDP) with the Home24 MediaPlayer App v1.28:

Entité terrestre auto-critiques4mdf0o1@piaille.fr
2025-07-30

Does anyone know a good local French text-to-speech under #GNUGPL?

espeak, even with mbrola, is still a bit hard to understand

There's Festival and PocketSphinx
has anyone tried them?

I had also cobbled together a whole bunch of training scripts with Julius...
(for speech recognition)
#TTS #TextToSpeech

2025-07-26

Spotify isn't working with GrapheneOS.

There seem to be fixes, but I think I'm just going to use this as an excuse to cancel Spotify and use other things.

Anyone have a recommendation for #foss #TTS #ereaders ?

If I can't get my audiobooks reliably from Spotify, then I think a decent TTS Engine with epub capabilities would solve the problem. I have one through Google Play I was using but it wasn't foss. Pretty nice though.

so my tts app ive been using to read fics w out worsening migraines just decided its gonna force an ai centered update on all users and nuke old versions. so i need a new tts app thats ai free guys. any suggestions? #screenreader #tts

2025-07-24

I wrote this blueprint for a web app that would make it easier for people to build voices and languages for different TTS engines. It's vague, but it's a start if anyone wants to contribute to it or eventually create the real thing. Boosts appreciated, as always. github.com/lower-elements/Voic #TTS #Accessibility #AI #ML

Terence Eden’s Blog (@blog@shkspr.mobi)
2021-07-21

Synthetic Poetry

shkspr.mobi/blog/2021/07/synth

I've been experimenting with Amazon's Polly service. It's their fancy text-to-sort-of-human-style-speech system. Think "Alexa" but with a variety of voices, genders, and accents.

Here's "Brian" - their English, male, received pronunciation voice - reading John Betjeman's poem "Slough":

https://shkspr.mobi/blog/wp-content/uploads/2021/07/slough.mp4

The pronunciation of all the words is incredibly lifelike. If you heard it on the radio, it might sound like a half-familiar BBC presenter. It has a calm, even tone which suits the poem splendidly.

The rhythm is also spot on. That's mostly a function of the short lines and helpful punctuation the poem contains. Much like iambic pentameter, or a limerick, the syllables lend themselves to a specific and identifiable cadence.

But the emphasis is all wrong. The poem just... ends. There's no sense of finality in the tone. You'd expect a competent reader to recognise "tinned minds" as being worthy of stressing. Polly does have some capability to mark specific words for emphasis, but it's all very manual.

There's no synthetic emotion. Do you feel the rage, desperation, sadness, hopelessness of the poem? While Polly has some SSML (Speech Synthesis Markup Language) support, the range of emotions it can express is severely limited. And, again, it must be applied manually.
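
As a rough illustration of how manual this is, here is a sketch of a helper that wraps one phrase in the standard SSML `<emphasis>` element. The function names are made up, and whether a given Polly voice honours the tag is something to check against Amazon's documentation.

```javascript
// Escape characters that are special in XML so the text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wrap the first occurrence of `phrase` in <emphasis>.
// If the phrase is not found, the line is returned as plain SSML.
function emphasise(line, phrase, level = "strong") {
  const i = line.indexOf(phrase);
  if (i === -1) return `<speak>${escapeXml(line)}</speak>`;
  const before = escapeXml(line.slice(0, i));
  const after = escapeXml(line.slice(i + phrase.length));
  return `<speak>${before}<emphasis level="${level}">${escapeXml(phrase)}</emphasis>${after}</speak>`;
}

console.log(emphasise("Tinned minds, tinned breath.", "Tinned minds"));
```

Every stressed phrase has to be picked out by hand like this, which is the limitation the post describes.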

"I used to be an adventurer like you, but then I took an arrow in the knee!"

One of the reasons stock phrases pop up so often in video games is that it is expensive to write and record thousands of different lines of dialogue.

We're almost at a stage where a computer can procedurally generate lines for background characters to speak, and then "record" an audio version in an array of styles. No more expensive voice actors, no more memetic references for in-group homophily. Each player of a game will have a completely different dialogue experience.

But the bit that we're still missing is the automation of emphasis and emotion and comic timing and understatement and... all the things which trained actors spend years learning how to do successfully.

In 2011, the film critic Roger Ebert had surgery which eliminated his voice. He proposed the following "Ebert Test" for synthetic voices:

If the computer can successfully tell a joke, and do the timing and delivery, as well as Henny Youngman, then that’s the voice I want.

We're so close, I can taste it. The Turing Test for realistic voices is whether they can move the audience to tears with poetry.

#AI #Amazon #tts #turing

Terence Eden’s Blog (@blog@shkspr.mobi)
2025-07-20

1KB JS Numbers Station

shkspr.mobi/blog/2025/07/1kb-j

Code Golf is the art/science of creating wonderful little demos in an artificially constrained environment. This year the js1024 competition was looking for entries with the theme of "Creepy".

I am not a serious bit-twiddler. I can't create JS shaders which produce intricate 3D worlds in a scrap of code. But I can use slightly obscure JavaScript APIs!

There's something deliciously creepy about Numbers Stations - the weird radio frequencies which broadcast seemingly random numbers and words. Are they spies communicating? Commands for nuclear missiles? Long range radio propagation tests? Who knows!

So I decided to build one. Play with the demo.

Obviously, even the most extreme opus compression can't fit much audio into 1KB. Luckily, JavaScript has you covered! Most modern browsers have a built-in Text-To-Speech (TTS) API.

Here's the most basic example:

m = new SpeechSynthesisUtterance;
m.text = "Hello";
speechSynthesis.speak(m);

Run that JS and your computer will speak to you!

In order to make it creepy, I played about with the rate (how fast or slow it speaks) and the pitch (how high or low).

m.rate=Math.random();
m.pitch=Math.random()*2;

It worked disturbingly well! High pitched drawls, rumbling gabbling, the languid cadence of a chattering friend. All rather creepy.
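
One caveat, going by the Web Speech API's documented ranges (rate 0.1 to 10, pitch 0 to 2): a bare `Math.random()` can land below the minimum rate. Here is a small sketch that keeps random values inside those ranges. The helper names are invented for illustration, and capping the rate at 1 is my choice, not the entry's.

```javascript
// Map a random number in [0, 1) onto the range [min, max).
function randomBetween(min, max, rand = Math.random) {
  return min + rand() * (max - min);
}

// Hypothetical helper: random settings inside the documented ranges.
// Rate is kept between 0.1 and 1 (slow speech), pitch between 0 and 2.
function creepySettings(rand = Math.random) {
  return {
    rate: randomBetween(0.1, 1, rand),
    pitch: randomBetween(0, 2, rand),
  };
}

// In a browser, with m being a SpeechSynthesisUtterance:
// Object.assign(m, creepySettings());
```

Passing `rand` as a parameter makes the helper easy to test with a fixed value, at the cost of a few bytes that a 1KB entry could not spare.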

But what could I make it say? Getting it to read out numbers is pretty easy - this will generate a random integer:

s = Math.ceil( Math.random()*1000 );
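
`Math.ceil( Math.random()*1000 )` gives 1 to 1000 in practice, with 0 possible in the rare case that `Math.random()` returns exactly 0. If an explicit inclusive range is wanted, a common alternative (not what the entry uses, and longer in bytes) is the following sketch, where `randomInt` is a name I made up:

```javascript
// Uniform random integer in [min, max], both ends included.
function randomInt(min, max, rand = Math.random) {
  return min + Math.floor(rand() * (max - min + 1));
}

const s = randomInt(1, 1000);
console.log(s);
```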

But a list of words would be tricky. There's not much space in 1,024 bytes for anything complex. The rules say I can't use any external resources; so are there any internal sources of words? Yes!

Object.getOwnPropertyNames( globalThis );

That gets all the properties of the global object which are available to the browser! Depending on your browser, that's over 1,000 words!

But there's a slight problem. Many of them are quite "computery" words like "ReferenceError", "URIError", "Float16Array". I wanted all the single words - that is, anything which only has one capital letter and that's at the start.

const l = (n) => {
    return ((n.match(/[A-Z]/g) || []).length === 1 && (n.charAt(0).match(/[A-Z]/g) || []).length === 1);
};
//   Get a random result from the filter
s = Object.getOwnPropertyNames( globalThis ).filter( l ).sort( ()=>.5-Math.random() )[0]

Rather pleasingly, that brings back creepy words like "Event", "Atomics", and "Geolocation".
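
For readers who are not golfing, the same filter can be written as a single anchored regular expression, and the random pick can index the array directly instead of sorting with a random comparator (which does not shuffle uniformly). This is a sketch of an equivalent approach, not the entry's code; `isSingleWord` and `pickRandom` are names I made up.

```javascript
// True for names with exactly one capital letter, and that at the start.
const isSingleWord = (name) => /^[A-Z][^A-Z]*$/.test(name);

// Pick one element uniformly at random, without sorting the array.
const pickRandom = (items, rand = Math.random) =>
  items[Math.floor(rand() * items.length)];

const words = Object.getOwnPropertyNames(globalThis).filter(isSingleWord);
const word = pickRandom(words);
console.log(word);
```

The list of words depends on the runtime, so the output differs between browsers and between a browser and Node.js.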

Of course, Numbers Stations don't just broadcast in English. The TTS system can vocalise in multiple languages.

//   Set the language to Russian
m.lang = "ru-RU";

OK, but where do we get all those language strings from? Again, they're built in and can be retrieved randomly.

var e = window.speechSynthesis.getVoices();
m.lang = e[ (Math.random()*e.length) |0 ].lang
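
Two details are easy to trip over with this approach: `lang` expects a language tag string rather than a voice object, and in some browsers `getVoices()` returns an empty list until the `voiceschanged` event has fired. Here is a hedged sketch of handling both. The helper names are hypothetical, and support for `addEventListener` on `speechSynthesis` should be checked for the browsers you care about.

```javascript
// Pick a random language tag (e.g. "ru-RU") from a list of voice-like objects.
function randomVoiceLang(voices, rand = Math.random) {
  if (voices.length === 0) return undefined;
  return voices[Math.floor(rand() * voices.length)].lang;
}

// Resolve with the voice list, waiting for "voiceschanged" if it is still empty.
function whenVoicesReady(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) return resolve(voices);
    synth.addEventListener("voiceschanged", () => resolve(synth.getVoices()), {
      once: true,
    });
  });
}

// Usage in a browser, with m being a SpeechSynthesisUtterance:
// whenVoicesReady(speechSynthesis).then((v) => { m.lang = randomVoiceLang(v); });
```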

If you pass the TTS the number 555 and ask it to speak German, it will read out fünfhundertfünfundfünfzig.

And, if you tell the TTS to speak an English word like "Worker" in a foreign language, it will pronounce it with an accent.

Randomly altering the pitch, speed, and voice to read out numbers and dissociated words produces, I think, a rather creepy effect.

If you want to test it out, you can press this button. I find that it works best in browsers with a good TTS engine - let me know how it sounds on your machine.

🅝🅤🅜🅑🅔🅡🅢 🅢🅣🅐🅣🅘🅞🅝

With the remaining few bytes at my disposal, I produced a quick-and-dirty random pattern using Unicode drawing blocks. It isn't very sophisticated, but it does have a little random animation to it.

You can play with all the js1024 entries - I would be delighted if you voted for mine.

#code #HTML #javascript #tts

