#dataaugmentation

InterData VN @interdatavn
2025-04-29

What is Data Augmentation? The role of data augmentation in machine learning

Effective machine learning depends heavily on input data. When data is scarce or lacks diversity, however, data augmentation becomes an important solution. The article below gives you a comprehensive look at data augmentation: from basic concepts to real-world applications and the difficulties that come with them.

Read the article now: interdata.vn/blog/data-augment

CoListy @colisty
2025-01-16

Generative AI Using SAS | CoListy
Explore Generative AI with SAS, from SMOTE and GANs to LLMs like BERT, enhancing your skills in data generation and AI innovation. Free learning! | CoListy

colisty.netlify.app/courses/ge

GenAINews.co @GenAINews_top
2024-08-07

Exciting news! Check out our latest article on Multimodal TextImage Augmentation for Document Images, a collaboration with Albumentations AI. Enhance your datasets with this new technique!

huggingface.co/blog/doc_aug_hf

SolGuruz LLP @SolGuruzllp
2024-07-10

Learn how data augmentation enriches machine learning models with diverse datasets. Explore its benefits in AI, healthcare, retail, finance, and more!

More details: solguruz.com/generative-ai/wha


AI
AI

2024-04-24

"Good luck, Viv, I know that guy's a total douche."

"Thanks Tay. Have you got the multi-wake word model training?"

"Yeah. Are you sure he'll pick that #WakeWord though?"

"I'm pretty sure. He won't outright use the phrase 'dole bludger' but it's pretty close."

---

A sardonic smile crept over the Prime Minister's face, the hot summer sun reflecting off his near-bald temples.

"So, by choosing a Wake Word that has difficult-to-pronounce sounds in it, it means that it won't work well for people who speak with an accent?"

"Yes, Prime Minister, precisely".

"Like the 'th' sound in 'them', 'there', 'that'?"

"Yes, or words that start or end with hard consonants are also difficult for some accents".

"Do we know which Wake Word would be the hardest for immigrants? Indigenous people?"

Pained, the #linguist had feared this question. She knew exactly what he was trying to do.

The long call centre queues hadn't done the trick - people had added screeners to their phones so that after being on hold for 7 hours, the phone would alert them to a picked up call.

The Assistant was supposed to be the replacement for the call centre. Just load the app on your phone, and ask it a question! So simple! No queuing! The government actually wanted to help people!

You just needed to use the Wake Word to "wake up" the assistant first.

"Well ---", she hesitated.

"I don't have all day, Doctor!"

"Our research shows that a Wake Word like `This Starts With Me` has lots of hard-to-pronounce phonemes - sounds".

"Excellent. And I like the overtone of personal responsibility."

Of course the fucker did.

"Very well Prime Minister, we will implement that Wake Word".

He trotted off, probably to kill some babies or kittens, she thought.

---

Tay was configuring the #DataAugmentation for the #ML training run for the #WakeWord model.

They had just finished downloading every instance of the words "this", "starts", "with" and "me" in every accent of English, from Common Voice.

By augmenting the Wake Word model with accent data, they could make it recognise more accents, more accurately.

Resistance came in many forms.

---
arxiv.org/abs/2104.01454
dl.acm.org/doi/10.1145/3617694
---

#Tootfic #Microfiction

2024-03-04

"Beer, Data & Robots" ⚛️ was a truly explosive evening.. thank you all!! 💥 🙏🏻

Thanks to Simona Mazzarino and Andrea Marchese for showing us just how badly an AI can misinterpret data, and how to explore data with an augmented reality headset 📊

Here is the video of the evening: video.linux.it/w/3cnKaZqmSLtpE

#databeerstorino #ai #AIbias #AIfairness #syntheticdata #XGBoost #DataAugmentation #vr #dataframe #open3d #unity #pythontorino #datascience #python

whitone @whitone
2024-03-02

"Know your Bias: Tackling Data Bias through Synthetic Data" by Simona Mazzarino @torino — Beer, Data & Robots

2024-02-02

Listen to the #InfoQ #podcast featuring Sam Partee, where he shares insights on Redis' vector database offering, different approaches to embeddings, and how to enhance #LLMs by adding a search component for retrieval augmented generation: bit.ly/3ukrEjw

Plus, a peek into the world of hybrid search in Redis!

#AI #ML #DataBase #DataAugmentation

Amy Tabb 🇺🇦 @amytabb@hachyderm.io
2024-01-29

Soft Augmentation for Image Classification
Authors: Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

abs: arxiv.org/abs/2211.04625
code: github.com/youngleox/soft_augm

#arXiv #ComputerVision #DataAugmentation

Modern neural networks are over-parameterized and thus rely on strong regularization such as data augmentation and weight decay to reduce overfitting and improve generalization. The dominant form of data augmentation applies invariant transforms, where the learning target of a sample is invariant to the transform applied to that sample. We draw inspiration from human visual classification studies and propose generalizing augmentation with invariant transforms to soft augmentation where the learning target softens non-linearly as a function of the degree of the transform applied to the sample: e.g., more aggressive image crop augmentations produce less confident learning targets. We demonstrate that soft targets allow for more aggressive data augmentation, offer more robust performance boosts, work with other augmentation policies, and interestingly, produce better calibrated models (since they are trained to be less confident on aggressively cropped/occluded examples). Comb
2023-11-20

🤖 Ever wondered how to improve your machine learning model's performance? Check out our latest blog on data augmentation! A powerful technique not to be overlooked. 💡 ➡️ ak-codes.com/data-augmentation/

Dan Stowell @danstowell
2023-04-24

People applying to (animal sounds) often ask me about strategies. Here's my answer, on Stack Exchange bioacoustics.stackexchange.com

Marcin Paprzycki @marcinpaprzycki@masto.ai
2023-03-22

Enhancing the quality of a small and unbalanced dataset by use of preprocessing and augmentation methods in: “Treating Dataset Imbalance in Fetal Echocardiography Classification” by Guilherme Gusmão, Alberto Raposo, Renato de Oliveira, Carlos Barbosa. Communication Papers of the 17th Conference on Computer Science and Intelligence Systems; ACSIS, Vol. 32, pages 3–9 (2022).

#fetalechocardiography #dataaugmentation #imageprocessing
Open Access: lnkd.in/dwuYikMf

2023-01-29

tldr; #data #augmentation in #NLProc degrades #textClassification performance in most cases. aclanthology.org/2022.insights

Well that was fun. Just spent last night experimenting with #dataAugmentation (using #flan-T5 for paraphrasing) and in wondering why it seems to degrade #textClassification performance I came across this great paper essentially saying the same thing. I guess I’ll revisit this in a year or so when there are better language models.
@linguistics @sigmoid.social

2022-11-15

What do you think of this?
Are we going to see a new trend of Knowledge Distillation using diffusion models to create customizable datasets?

In the past we have seen some data augmentation using generative models like GANs, so why not?

#GAN #GenerativeModels #DataAugmentation

2/2

heise online (inoffiziell) @heiseonline@squeet.me
2022-01-31
heise+ | Training neural networks for image recognition with Python

Even on microcontrollers with comparatively little RAM and low CPU clock speeds, neural networks handle demanding tasks. We show how it works.
Quasimondo ♾ @Quasimondo
2019-07-05

Trying to make my face marker recognition model loosen up a little.
