#speechrecognition

Ecologia Digital josemurilo@mato.social
2025-07-08

"#KarenHao only really gets her teeth into this point in the book’s epilogue, “How the Empire Falls.” She takes inspiration from #TeHiku, a #Māori AI #speechrecognition project. Te Hiku seeks to revitalize the #te_reo language by putting archived audio tapes of te reo speakers into an AI model, teaching new generations of Māori.
The tech has been developed with consent and active participation from the Māori community, and it is only licensed to organizations that respect Māori values."

2025-07-03

@thelinuxEXP I really like Speech Note! It's a fantastic tool for quick and local voice transcription in multiple languages, created by @mkiol

It's incredibly handy for capturing thoughts on the go, conducting interviews, or making voice memos without worrying about language barriers. The app runs all of its models strictly locally, and its ease of use makes it a standout choice for anyone needing offline transcription.

I primarily use #WhisperAI for transcription and Piper for voice, but many other models are available as well.

It is available as a Flatpak; source code at github.com/mkiol/dsnote

#TTS #transcription #TextToSpeech #translator #translation #offline #machinetranslation #sailfishos #SpeechSynthesis #SpeechRecognition #speechtotext #nmt #linux-desktop #stt #asr #flatpak-applications #SpeechNote

The image shows a screenshot of the "About" page for Speech Note 4.8.1. The page is structured with a dark gray header and a light gray body. The header includes a title "About" and a version number "4.8.1" with a subtitle "Note taking, reading and translating with Speech to Text, Text to Speech and Machine Translation." Below this, there is a section titled "Changes," followed by "About," which includes links to the project website and bug reporting pages on GitHub and GitLab, along with a support email address. The page also states that Speech Note is developed as an open-source project under the Mozilla Public License 2.0. The "Authors" section lists Michal Kosciessa as the copyright holder for the years 2021-2025. The "Translators" section lists several names, including Heimen Stoffels, Béranger Arnaud, and others. The "Libraries in use" section lists various libraries such as Qt, Coqui STT, Vosk, and others. The page has a "Close" button in the bottom right corner.

Provided by @altbot, generated privately and locally using Ovis2-8B
2025-06-27

Gallaudet News: Gallaudet experts drive accessibility of speech tech for deaf voices . “Some people use their voices to control tech, from cell phones and remote controls to home appliances and in transportation. Voice command capabilities are made possible through training AI and machine learning. The Speech Accessibility Project is creating datasets of more diverse speech patterns, which […]

https://rbfirehose.com/2025/06/27/gallaudet-news-gallaudet-experts-drive-accessibility-of-speech-tech-for-deaf-voices/

2025-06-25

Going live at 4 PM Central to build a real-time speech-to-text app using SwiftUI and iOS 26 APIs.
I’ll walk through everything—mic permissions, live transcription, and Apple’s speech recognition tools.
No delay, no post-processing. Just fast, accurate voice-to-text in SwiftUI.
Watch live: youtube.com/watch?v=vIqZq1UYBO
Hit “Notify Me” to join in.
#SwiftUI #iOSDev #SpeechRecognition #LiveCoding #Accessibility

Sunrise Technologies sunrisetechnologies
2025-06-23

🔊 Sunrise Technologies offers AI-powered ASR with 95%+ accuracy for real-time & offline transcription across 50+ languages. Built for healthcare, finance & more—secure, scalable, and smart.

🎯 Book a free demo
👉 zurl.co/WjgnT

2025-06-17

Slow amplitude fluctuations in sounds, critical for #SpeechRecognition, seem poorly represented in the #brainstem. This study shows that overlooked intricacies of #SpikeTiming represent these fluctuations, reconciling low-level neural processing with #perception @plosbiology.org 🧪 plos.io/3FJ4adI

A regular spiking neuron (sustained chopper) that exhibits accurate identification of envelope frequency from its spike trains. Top left: peristimulus time histogram of the response to a pure tone at the characteristic frequency of the neuron. Bottom left: Spike train classifier decisions represented as a confusion matrix. Warm colors along the diagonal indicate a large proportion of individual spike trains were assigned to the correct modulation frequency.  Right: Raster plots of the responses to amplitude-modulated tones, for several modulation frequencies (stimulus waveforms displayed above each panel). The number of spikes in each stimulus period decreases as the modulation frequency is increased.
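The figure's classifier idea can be sketched in miniature. The toy Python example below is not the study's actual method: it uses vector strength, a standard measure of phase locking, to assign a simulated spike train to the candidate modulation frequency it locks to best. All numbers and spike trains here are illustrative.

```python
import math
import random

def vector_strength(spike_times, mod_freq):
    """Vector strength: 1.0 = spikes perfectly phase-locked to the
    modulation frequency, ~0 = no locking."""
    if not spike_times:
        return 0.0
    x = sum(math.cos(2 * math.pi * mod_freq * t) for t in spike_times)
    y = sum(math.sin(2 * math.pi * mod_freq * t) for t in spike_times)
    return math.hypot(x, y) / len(spike_times)

def classify_mod_freq(spike_times, candidate_freqs):
    """Assign the spike train to the candidate frequency with the
    highest vector strength (a crude stand-in for the paper's
    spike-train classifier)."""
    return max(candidate_freqs, key=lambda f: vector_strength(spike_times, f))

# Simulate a neuron locked to a 50 Hz envelope: one jittered spike
# per modulation cycle, near the envelope peak.
random.seed(0)
true_freq = 50.0
spikes = [n / true_freq + random.gauss(0, 0.001) for n in range(100)]
print(classify_mod_freq(spikes, [20.0, 50.0, 80.0]))  # → 50.0
```

As modulation frequency rises and spikes per cycle drop (as in the raster plots), spike timing relative to the envelope, rather than spike count, carries the frequency information, which is what this measure captures.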
2025-05-23

🌟 Excited to share Thorsten-Voice's YouTube channel! 🎥 🗣️🔊 ♿ 💬

Thorsten presents innovative TTS solutions and a variety of voice technologies, making it an excellent starting point for anyone interested in open-source text-to-speech. Whether you're a developer, accessibility advocate, or tech enthusiast, his channel offers valuable insights and resources. Don't miss out on this fantastic content! 🎬

Follow him here: @thorstenvoice
or on YouTube: youtube.com/@ThorstenMueller

#Accessibility #FLOSS #TTS #ParlerTTS #OpenSource #VoiceTech #TextToSpeech #AI #CoquiAI #VoiceAssistant #Sprachassistent #MachineLearning #AccessibilityMatters #Inclusivity #FOSS #Coqui #VoiceTechnology #KünstlicheStimme #Python #Rhasspy #STT #SpeechSynthesis #SpeechRecognition #Sprachsynthese #ArtificialVoice #VoiceCloning #Spracherkennung #CoquiTTS #voice #a11y #ScreenReader

Farooq | فاروق farooqkz@cr8r.gg
2025-05-06

Yesterday, I ordered food online. However, the order went a little wrong, and I contacted support. They called me, and for a moment I thought it was a bot or a recorded voice or something, and I hated it. Then I realized it was a human on the line.

I was planning to build an LLM + TTS + speech recognition pipeline and deploy it on an A311D, to see if I could practice a British accent with it. Now I'm rethinking what I want to do. The direction we are going doesn't lead to a good destination. I would hate to have to talk to a voice-enabled chatbot as a support agent rather than a human.

And don't get me wrong: voice-enabled chatbots can have tons of good uses. But replacing humans with LLMs is not one of them. I don't think so.

#LLM #AI #TTS #ASR #speechrecognition #speechai #ML #MachineLearning #chatbot #chatbots #artificialintelligence

Richard Emling (DO9RE) tschapajew@metalhead.club
2025-05-01

I'm exploring ways to improve audio preprocessing for speech recognition for my [midi2hamlib](github.com/DO9RE/midi2hamlib) project. Do any of my followers have expertise with **SoX** or **speech recognition**? Specifically, I’m seeking advice on: 1️⃣ Best practices for audio preparation for speech recognition. 2️⃣ SoX command-line parameters that can optimize audio during recording or playback.
github.com/DO9RE/midi2hamlib/b #SoX #SpeechRecognition #OpenSource #AudioProcessing #ShellScripting #Sphinx #PocketSphinx #Audio Retoot appreciated.
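Not an authoritative answer to the question above, but a common starting point for ASR front-ends (PocketSphinx in particular expects 16 kHz, mono, 16-bit PCM) is a SoX conversion like the one sketched below. The filter cutoff and normalization level are illustrative choices, not recommendations from the project:

```python
import shlex

def sox_speech_prep(infile, outfile):
    """Build a SoX command line that converts audio to the 16 kHz,
    mono, 16-bit format typical speech recognizers expect, removing
    low-frequency rumble and leaving normalization headroom."""
    return [
        "sox", infile,
        "-r", "16000",     # resample output to 16 kHz
        "-c", "1",         # downmix to a single channel
        "-b", "16",        # 16-bit samples
        outfile,
        "highpass", "80",  # cut rumble below the speech band
        "norm", "-3",      # normalize to -3 dBFS
    ]

print(shlex.join(sox_speech_prep("mic_capture.wav", "for_asr.wav")))
```

In SoX's grammar, format options placed before the output filename apply to the output, and effects (`highpass`, `norm`) follow it; the same effects chain can also be applied during `rec`/`play` for live use.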

Farooq | فاروقfarooqkz@cr8r.gg
2025-04-27

Now that my #wake_word_detection #research has borne fruit, I have plans to continue working in the voice domain. I would love it if I could train a #TTS model with a #British accent, so I could use it to practice.

I was wondering if I could do the inference on the #A311D #NPU. However, as I skim papers on different models, inference on the A311D with reasonable performance seems unlikely. Even training these models on my entry-level #IntelArc #GPU would be painful.

Maybe I could just fine-tune an already existing model. I am also thinking about using #GeneticProgramming for some components of these TTS models, to see if it yields better inference performance.

There are #FastSpeech2 and #SpeedySpeech, which look promising. I wonder how natural their accents will be, but they would be good starting points.

BTW, if anyone needs opensource models, I would love to work as a freelancer and have an #opensource job. Even if someone can just provide access to computation resources, that would be good.

#forhire #opensourcejob #job #hiring

#AI #VoiceAI #opensourceai #ml #speechrecognition #speechsynthesis #texttospeech #machinelearning #artificialintelligence #getfedihired #FediHire #hireme #wakeworddetection

2025-04-05

When I use the voice command to add VIANDOX to the shopping list 🤦‍♂️

#siri #commandevocale #speechrecognition

List item titled "Viandas Oksana"
Doug Holton dougholton
2025-02-10

Vibe is a desktop client (Mac, Windows, Linux) for running Whisper locally to transcribe or caption videos and audio more accurately: thewh1teagle.github.io/vibe/ Source code: github.com/thewh1teagle/vibe/ It's easier to use than what I was using before (WhisperDesktop). The default settings use the medium Whisper model, which has been good enough in my experience.
