#speechAI

pinage404 (pinage404@mamot.fr)
2025-05-11

As of today, my computer can __nicely__ read aloud for me!

I'm lazy and I read slowly, so I don't like reading and I skip a lot of articles.

I have been looking for a solution for several months

#Accessibility #A11y #Orca #WebBrowser #ZenBrowser #Firefox #Piper #Pied #SpeechAI #AI #Nix #NixOS

Farooq | فاروق (farooqkz@cr8r.gg)
2025-05-06

Yesterday, I ordered food online. However, it went a little wrong, so I contacted support. They called me, and for a moment I thought it was a bot or a recorded voice or something. And I hated it. Then I realized it was a human on the line.

I was planning to build an LLM + TTS + speech recognition pipeline and deploy it on an A311D, to see if I could practice a British accent with it. Now I'm rethinking what I want to do. The way we are going doesn't lead to a good destination. I would hate having to talk to a voice-enabled chatbot as a support agent rather than a human.

And don't get me wrong: voice-enabled chatbots can have tons of good uses. But replacing humans with LLMs is not one of them.
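The LLM + TTS + speech recognition pipeline mentioned above is really just three stages chained together. A minimal sketch of that architecture, with hypothetical stub components standing in for a real ASR model, LLM, and TTS engine:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoicePipeline:
    """Chains ASR -> LLM -> TTS; each stage is swappable."""
    asr: Callable[[bytes], str]   # audio in  -> transcript out
    llm: Callable[[str], str]     # transcript -> reply text
    tts: Callable[[str], bytes]   # reply text -> audio out

    def respond(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.llm(transcript)
        return self.tts(reply)

# Stubs so the sketch runs end to end without any models installed.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")   # pretend the audio *is* its transcript

def fake_llm(text: str) -> str:
    return f"You said: {text}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")    # pretend the text *is* the audio

pipeline = VoicePipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
print(pipeline.respond(b"hello"))
```

In a real deployment, each callable would wrap an actual model (e.g. a local ASR engine, an LLM endpoint, a TTS synthesizer); keeping the stages as plain functions makes them easy to swap or test independently.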

#LLM #AI #TTS #ASR #speechrecognition #speechai #ML #MachineLearning #chatbot #chatbots #artificialintelligence

Dirk Schnelle-Walka (dsw@mastodontech.de)
2025-04-28

SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems. Multi-modal LLM system simulates human communication using speech and generates human-like dialogues with consistent content, rhythm, & emotion.

Funnily enough, they also elaborate on a "think before you speak" design aspect. This might also be applicable to our everyday lives.

doi: 10.48550/arXiv.2401.03945
#LLM #multimodal #speechAI #multiagent #conversationalai

DigiProductz (digiproductz)
2024-10-03

VoizHub AI Review - Clone Any Celebrity Voice 🎤, Dub Any Video or Voice In Any Language 🌍, Record In Real-Time 🎙️, AI Transcript Anything 📝, Turn Any Text Into Speech 🗣️, Turn Any URL Into Speech 🌐, Generate Podcasts 🎧, Narrate Audiobooks 📚, Narrate Short Videos 🎥, And More ✨!

Get Instant Access: digiproductz.com/get/voizhub-ai
Read Full Review: digiproductz.com/voizhub-ai-re

2024-03-21

For the past couple of years, as each new @mozilla #CommonVoice dataset of #voice #data is released, I've been using @observablehq to visualise the #metadata coverage across the 100+ languages in the dataset.

Version 17 was released yesterday (big ups to the team - EM Lewis-Jong, @jessie, Gina Moape, Dmitrij Feller) and there are some super interesting insights from the visualisation:

➡ Catalan (ca) now has more data in Common Voice than English (en) (!)

➡ The language with the highest average audio utterance duration at nearly 7 seconds is Icelandic (is). Perhaps Icelandic words are longer? I suspect so!

➡ Spanish (es), Bangla (Bengali) (bn), Mandarin Chinese (zh-CN) and Japanese (ja) all have a lot of recorded utterances that have not yet been validated. Albanian (sq) has the highest percentage of validated utterances, followed closely by Erzya / Arisa (myv).

➡ Votic (vot) has the highest percentage of invalidated utterances, but with 76% of utterances invalidated, I wonder if this language has been the target of deliberate invalidation activity (invalidating valid sentences, or recording sentences to be deliberately invalid) given the geopolitical instability in Russia currently.
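The per-language metrics above (average clip duration, validated and invalidated percentages) can be computed from per-clip metadata. A minimal sketch, assuming a simplified list of (language, duration in seconds, status) records rather than the real Common Voice TSV layout:

```python
from collections import defaultdict

# Hypothetical per-clip records: (language code, duration in seconds, status).
clips = [
    ("is", 7.1, "validated"),
    ("is", 6.8, "validated"),
    ("vot", 3.0, "invalidated"),
    ("vot", 2.5, "invalidated"),
    ("vot", 3.2, "validated"),
    ("sq", 4.0, "validated"),
]

# Aggregate count, total duration, and validation outcomes per language.
stats = defaultdict(lambda: {"n": 0, "dur": 0.0, "validated": 0, "invalidated": 0})
for lang, dur, status in clips:
    s = stats[lang]
    s["n"] += 1
    s["dur"] += dur
    s[status] += 1

for lang, s in sorted(stats.items()):
    avg = s["dur"] / s["n"]
    pct_invalid = 100 * s["invalidated"] / s["n"]
    print(f"{lang}: avg {avg:.1f}s/clip, {pct_invalid:.0f}% invalidated")
```

The same aggregation over the full dataset is what surfaces outliers like Icelandic's long average clips or Votic's high invalidation rate.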

See the visualisation here and let me know your thoughts below!

observablehq.com/@kathyreid/mo

#linguistics #languages #data #VoiceAI #VoiceData #SpeechAI #SpeechData #DataViz

2023-11-20

Last week, as part of my #PhD program at the #ANU School of #cybernetics, I gave my final presentation, which is a summary of my methods and #research findings. I covered my interview work, the #dataset documentation analysis work I've been doing and my analysis work around #accents in @mozilla's #CommonVoice platform.

There were some insightful and thought-provoking questions from my panel and audience members, and of course - so many ideas for future research inquiry!

A huge thanks to my panel, chaired so well by Professor Alexandra Zafiroglu, to Dr Elizabeth Williams, my meticulous, methodical and always-encouraging Primary Supervisor, and to my co-supervisors Dr Jofish Kaye and Dr Paul Wong 黃仲熙 for their deep expertise in #HCI and #data respectively.

Similarly, a huge thank you to my #PhD cohort - Charlotte Bradley, Tom Chan, Danny Bettay and Sam Backwell - as well as the other cohorts in the School - for your encouragement and intellectual journeying.

#PhD #PhDlife #cybernetics #milestone #ANU #voiceAI #speechAI #ASR #SpeechRecognition

[Images: Kathy Reid presenting her #PhD final presentation; results from Kathy Reid's survey of #ML practitioners; Kathy Reid's work in assessing the Whisper #ASR engine]
Norobiik (@Norobiik@noc.social)
2023-03-23

#Quantiphi is working with #NeMo to build a modular generative AI solution to improve worker productivity. Nvidia also announced four inference GPUs, optimized for a diverse range of emerging LLM and generative AI applications. Each GPU is designed to be optimized for specific #AIInference workloads while also featuring specialized software.

#SpeechAI, #supercomputing in the #cloud, and #GPUs for #LLMs and #GenerativeAI among #Nvidia’s next big moves | #AI
venturebeat.com/ai/speech-ai-s

NeMo functional framework
Jiri Jerabek (jirijerabek@c.im)
2022-11-21

I *really* would love Audible to have speech-to-text recognition, for those hard-to-understand moments.

Also, give me a dwelling that can be generated alongside academic reference.

#Audible #speechtotext #speechAI #speechTechnology

2022-11-12

Hello Mastodon! Here's my belated #introduction. I am an Associate Professor of Language & Technology and the Director of the MSc Voice Technology at the University of Groningen. #OpenAccess Ambassador.

Interests: #SpeechTechnology, #VoiceTechnology, voice #synthesis, #speech #recognition #ASR, #speechAI, #multisensory perception, #audition, #soundscapes, #SituatedCognition, music, Andean languages, #Aymara, #Frisian

2022-11-07

💡 Interesting read on how one of the biggest commercial players out there plans to use Mozilla Open Voice data to make speech AI more inclusive and open to more languages.

💬 Sounds idealistic, but Open Voice datasets are created by unpaid volunteers who donate hours and hours of their speech. Not sure whether I feel comfortable with that, tbh.

💭 Thoughts?

venturebeat.com/ai/nvidia-ente

#ethicalAI #transparentai #voice #speech #voicetech #speechtech #speechAI
