#SpeechRecognition

2025-06-17

Slow amplitude fluctuations in sounds, critical for #SpeechRecognition, seem poorly represented in the #brainstem. This study shows that overlooked intricacies of #SpikeTiming represent these fluctuations, reconciling low-level neural processing with #perception @plosbiology.org 🧪 plos.io/3FJ4adI

A regular spiking neuron (sustained chopper) that exhibits accurate identification of envelope frequency from its spike trains. Top left: peristimulus time histogram of the response to a pure tone at the characteristic frequency of the neuron. Bottom left: Spike train classifier decisions represented as a confusion matrix. Warm colors along the diagonal indicate a large proportion of individual spike trains were assigned to the correct modulation frequency.  Right: Raster plots of the responses to amplitude-modulated tones, for several modulation frequencies (stimulus waveforms displayed above each panel). The number of spikes in each stimulus period decreases as the modulation frequency is increased.
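
The confusion matrix described above comes from decoding individual spike trains back to the modulation frequency that produced them. The study's actual decoder isn't detailed in this post, so the following is only a toy illustration of the idea: synthetic Poisson spike counts whose rate follows the stimulus envelope, a nearest-neighbour classifier, and a normalised confusion matrix. Every number in it is made up for illustration.

```python
# Toy sketch (not the paper's method): classify synthetic "spike trains"
# by modulation frequency and summarise the result as a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
mod_freqs = [8, 16, 32, 64, 128]   # Hz; illustrative modulation frequencies
n_trials, n_bins = 40, 400         # trials per frequency, PSTH-style bins over a 1 s stimulus

X, y = [], []
t = np.linspace(0, 1, n_bins, endpoint=False)
for label, f in enumerate(mod_freqs):
    # Toy response model: firing probability follows the stimulus envelope,
    # so the pattern of binned spike counts carries the modulation frequency.
    rate = 0.5 * (1 + np.sin(2 * np.pi * f * t))
    for _ in range(n_trials):
        X.append(rng.poisson(rate))
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te), normalize="true")
print(np.round(cm, 2))  # a strong diagonal means trains were assigned to the correct frequency
```
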
2025-05-23

🌟 Excited to share Thorsten-Voice's YouTube channel! 🎥 🗣️🔊 ♿ 💬

Thorsten presents innovative TTS solutions and a variety of voice technologies, making it an excellent starting point for anyone interested in open-source text-to-speech. Whether you're a developer, accessibility advocate, or tech enthusiast, his channel offers valuable insights and resources. Don't miss out on this fantastic content! 🎬

Follow him here: @thorstenvoice
or on YouTube: youtube.com/@ThorstenMueller

#Accessibility #FLOSS #TTS #ParlerTTS #OpenSource #VoiceTech #TextToSpeech #AI #CoquiAI #VoiceAssistant #Sprachassistent #MachineLearning #AccessibilityMatters #Inclusivity #FOSS #Coqui #VoiceTechnology #KünstlicheStimme #Python #Rhasspy #STT #SpeechSynthesis #SpeechRecognition #Sprachsynthese #ArtificialVoice #VoiceCloning #Spracherkennung #CoquiTTS #voice #a11y #ScreenReader

Farooq | فاروق farooqkz@cr8r.gg
2025-05-06

Yesterday, I ordered food online. However, the order went a little wrong, and I contacted support. They called me, and for a moment I thought it was a bot or a recorded voice or something, and I hated it. Then I realized it was a human on the line.

I was planning to build an LLM + TTS + speech recognition pipeline and deploy it on an A311D, to see if I could practice a British accent with it. Now I'm rethinking what I want to do. The way we are going doesn't lead to a good destination. I would hate having to talk to a voice-enabled chatbot as a support agent rather than a human.

And don't get me wrong: voice-enabled chatbots can have tons of good uses. But replacing humans with LLMs is not one of them, I don't think.

#LLM #AI #TTS #ASR #speechrecognition #speechai #ML #MachineLearning #chatbot #chatbots #artificialintelligence

Richard Emling (DO9RE) tschapajew@metalhead.club
2025-05-01

I'm exploring ways to improve audio preprocessing for speech recognition for my [midi2hamlib](github.com/DO9RE/midi2hamlib) project. Do any of my followers have expertise with **SoX** or **speech recognition**? Specifically, I’m seeking advice on: 1️⃣ Best practices for audio preparation for speech recognition. 2️⃣ SoX command-line parameters that can optimize audio during recording or playback.
github.com/DO9RE/midi2hamlib/b #SoX #SpeechRecognition #OpenSource #AudioProcessing #ShellScripting #Sphinx #PocketSphinx #Audio Retoot appreciated.
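
One widely used starting point (not advice from this thread, and not midi2hamlib's actual settings) is to hand ASR engines such as PocketSphinx mono, 16 kHz, 16-bit audio with a gentle high-pass and peak normalisation. The sketch below wraps one such SoX invocation in Python; the file names and exact parameter values are illustrative.

```python
# Hedged sketch: a common SoX recipe for ASR front-ends
# (mono, 16 kHz, 16-bit, light high-pass, peak-normalised).
import subprocess

def preprocess_for_asr(src: str, dst: str) -> None:
    subprocess.run(
        [
            "sox", src,        # input file
            "-b", "16", dst,   # write 16-bit PCM output
            "remix", "-",      # mix all channels down to mono
            "rate", "16k",     # resample to 16 kHz (typical for PocketSphinx models)
            "highpass", "100", # cut rumble and handling noise below 100 Hz
            "norm", "-3",      # normalise peaks to -3 dBFS
        ],
        check=True,
    )

# Hypothetical file names, just to show usage.
preprocess_for_asr("mic_capture.wav", "asr_ready.wav")
```
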

Farooq | فاروق farooqkz@cr8r.gg
2025-04-27

Now that my #wake_word_detection #research has borne fruit, I plan to continue working in the voice domain. I would love to train a #TTS model with a #British accent so I could use it to practice.

I was wondering if I could run inference on the #A311D #NPU. However, as I skim the papers for different models, inference on the A311D with reasonable performance seems unlikely. Even training these models on my entry-level #IntelArc #GPU would be painful.

Maybe I could just fine-tune an already existing model. I am also thinking about using #GeneticProgramming for some components of these TTS models to see whether it yields better inference performance.

There are #FastSpeech2 and #SpeedySpeech, which look promising. I wonder how natural their accents will be, but they would be good starting points.
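
If the fine-tuning route is taken, one hedged starting point is to pull a pretrained model from the Coqui TTS zoo as a baseline before training on accent-specific data. The model name below is an assumption rather than a recommendation; `tts --list_models` prints what is actually available (the zoo includes SpeedySpeech- and FastSpeech-family recipes).

```python
# Assumption: Coqui TTS is installed (pip install TTS). The model id is a
# placeholder -- check `tts --list_models` and substitute a real one.
from TTS.api import TTS

tts = TTS(model_name="tts_models/en/ljspeech/speedy-speech")  # assumed model id
tts.tts_to_file(
    text="The quick brown fox jumps over the lazy dog.",
    file_path="baseline_sample.wav",
)
```
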

BTW, if anyone needs open-source models, I would love to work as a freelancer and have an #opensource job. Even if someone can just provide access to compute resources, that would help.

#forhire #opensourcejob #job #hiring

#AI #VoiceAI #opensourceai #ml #speechrecognition #speechsynthesis #texttospeech #machinelearning #artificialintelligence #getfedihired #FediHire #hireme #wakeworddetection

2025-04-05

When I use the voice command to add VIANDOX to the shopping list 🤦‍♂️

#siri #commandevocale #speechrecognition

List item titled "Viandas Oksana"
Doug Holton dougholton
2025-02-10

Vibe is a desktop client (macOS, Windows, Linux) for running Whisper locally to more accurately transcribe or caption videos and audio: thewh1teagle.github.io/vibe/ Source code: github.com/thewh1teagle/vibe/ It's easier to use than what I was using before (WhisperDesktop). The default settings use the medium Whisper model, which has been good enough in my experience.
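
For anyone who prefers a script to a GUI, a rough equivalent using the openai-whisper Python package and the same "medium" model size might look like the sketch below. This is not how Vibe works internally, just the plain Python route; it assumes ffmpeg is on the PATH, and the file name is a placeholder.

```python
import whisper

model = whisper.load_model("medium")      # downloads the weights on first run
result = model.transcribe("lecture.mp4")  # hypothetical file; ffmpeg handles decoding
print(result["text"])

# Simple captions: one segment per line, with start/end timestamps in seconds.
for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
```
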

Farooq | فاروق farooqkz@cr8r.gg
2025-02-07

For learning languages, do you think it's a good idea to practice with an AI speech recognition engine and an AI speech synthesis engine?

I'm specifically interested in British English and German.

#AI #ML #LanguageLearning #Learning #SprachenLernen #British #English #DeutchLernen #EnglishLearning #speechrecognition #speechtotext #speechrecognitionsoftware #speechsynthesis #SpeechSynthesizer

The Conversation U.S. TheConversationUS@newsie.social
2025-02-05

Speech recognition systems struggle with accents and dialects, risking problems in critical fields like healthcare and emergency services. Imagine calling 911 and finding that the AI used to screen out non-emergency calls can't understand you.

A Spanish language professor explains: theconversation.com/sorry-i-di #AI #speechrecognition

2025-02-05

#UnplugTrump - Tip 5:
Say goodbye to Alexa and other voice assistants that listen in on and analyze your conversations. Use a privacy-friendly alternative instead, such as OpenVoiceOS, an open-source voice assistant that is actively developed by its community and runs on a RaspberryPi. That way you keep control of your data.

#Alexa #OpenVoiceOS #Sprachassistent #VoiceControl #SpeechRecognition #datenschutz #privacy

A RaspberryPi running OpenVoiceOS. Next to it is a speaker box that someone is connecting with a USB cable.
MosChip® MosChipTech
2025-01-28

Unlock the power of Speech Recognition with advanced Audio Codecs! Dive into how audio processing enables seamless interaction in IoT, automotive, and smart devices.

Explore the future of voice-driven tech. Read more: moschip.com/blog/speech-recogn

2025-01-06

Using LLMs to clean up the output of speech recognition has been a game changer for me in the past year:

blog.nawaz.org/posts/2023/Dec/

Note: I've improved my workflow compared to that post. I should write a followup.
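
As a rough illustration of the idea (not the workflow from the linked post), the raw transcript can be handed to a chat model with instructions to fix punctuation, casing, and obvious mis-recognitions without changing the content. The model name and prompt below are illustrative choices; the sketch uses the openai Python client.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def clean_transcript(raw_text: str) -> str:
    """Ask a chat model to tidy up a raw ASR transcript without altering content."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not the post's
        messages=[
            {
                "role": "system",
                "content": (
                    "You clean up speech-recognition transcripts. Fix punctuation, "
                    "casing, and obvious mis-recognitions, but do not add, remove, "
                    "or rephrase content."
                ),
            },
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

print(clean_transcript("so uh the meeting is at ten thirty a m on friday right"))
```
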

#gpt #chatgpt #llm #speechrecognition

2024-12-14

University of Copenhagen: Coming soon – offline speech recognition on your phone. “More than one in four people currently integrate speech recognition into their daily lives. A new algorithm developed by a University of Copenhagen researcher and his international colleagues makes it possible to interact with digital assistants like ‘Siri’ without any internet connection. The innovation […]

https://rbfirehose.com/2024/12/14/university-of-copenhagen-coming-soon-offline-speech-recognition-on-your-phone/
