#inference

st1nger :unverified: 🏴‍☠️ :linux: :freebsd:st1nger@infosec.exchange
2025-07-16

#GPUHammer is the first attack to show #Rowhammer bit flips on #GPU memories, specifically on a GDDR6 memory in an #NVIDIA A6000 GPU. Our attacks induce bit flips across all tested DRAM banks, despite in-DRAM defenses like TRR, using user-level #CUDA #code. These bit flips allow a malicious GPU user to tamper with another user’s data on the GPU in shared, time-sliced environments. In a proof-of-concept, we use these bit flips to tamper with a victim’s DNN models and degrade model accuracy from 80% to 0.1%, using a single bit flip. Enabling Error Correction Codes (ECC) can mitigate this risk, but ECC can introduce up to a 10% slowdown for #ML #inference workloads on an #A6000 GPU.

gpuhammer.com/
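The 80%-to-0.1% figure is plausible at the bit level: flipping a single high exponent bit of an IEEE-754 float32 weight changes its magnitude by dozens of orders of magnitude. A minimal host-side NumPy sketch of the numeric effect (illustrative only; the actual GPUHammer attack induces flips physically in GDDR6 DRAM, not via a software XOR):

```python
import numpy as np

# Illustrative only: we XOR one bit in host memory to show why a
# single flip in a DNN weight can destroy model accuracy.
w = np.array([0.5], dtype=np.float32)   # a typical weight value
bits = w.view(np.uint32)                # reinterpret the same bytes
bits[0] ^= 1 << 30                      # flip the top exponent bit
print(w[0])                             # ~1.7e38: the weight explodes
```

One flipped exponent bit turns 0.5 into roughly 2^127, which then propagates through every activation that weight touches.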

N-gated Hacker Newsngate
2025-07-15

🧠🚀 Apparently, someone thought we needed yet another engine, but this time exclusively for Apple's golden child, the chip. Because clearly, the world was just yearning for a new way to "infer" things while stuck in a walled garden. 🌳🔒
github.com/trymirai/uzu

Dr Mircea Zloteanu ☀️ 🌊🌴mzloteanu
2025-07-09

#383 Berkson's paradox

Thoughts: aka Berkson's bias, collider bias, or Berkson's fallacy. Important for interpreting conditional probabilities. Can produce counterintuitive patterns.

en.m.wikipedia.org/wiki/Berkso
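A quick simulation makes the counterintuitive pattern concrete: two independent traits become negatively correlated once you condition on a collider (e.g. selection into a sample). A sketch with hypothetical "talent" and "looks" variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
talent = rng.normal(size=n)
looks = rng.normal(size=n)          # independent in the population

# Condition on a collider: only those with a high combined score
# are "admitted" into the observed sample.
admitted = (talent + looks) > 1.5

pop_corr = np.corrcoef(talent, looks)[0, 1]
sel_corr = np.corrcoef(talent[admitted], looks[admitted])[0, 1]
print(pop_corr, sel_corr)   # ~0 in the population, clearly negative after selection
```

Within the selected group, high talent predicts low looks and vice versa, purely as an artifact of the selection step.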

2025-06-27

Efficient inference with multiple LoRA adapters

LoRA is a popular method for fine-tuning large models on small datasets, but at inference time low-rank adapters run inefficiently, and merging them into the weights requires storing a separate full copy of the model for each adapter. MultiLoRA solves this problem by serving several adapters simultaneously on top of a single base model. In this article we compare the performance of MultiLoRA inference in two popular frameworks, vLLM and TensorRT-LLM. We run the tests on stock release Docker images, evaluating which framework handles batches of requests more efficiently in scenarios close to offline and asynchronous inference.

habr.com/ru/articles/922290/

#multilora #offline_inference #async_inference #vllm #TensorRTLLM #tensorrt #peft #inference #benchmark #lora
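The memory argument can be made concrete with back-of-the-envelope numbers (illustrative sizes for a hypothetical 7B fp16 model; these are not figures from the article): merging each adapter into the weights costs a full model copy per adapter, while multi-LoRA serving keeps one base copy plus tiny low-rank factors.

```python
# Rough memory comparison: merged-per-adapter vs. shared-base multi-LoRA.
base_params = 7e9                       # hypothetical 7B model
bytes_per_param = 2                     # fp16
n_adapters = 8
rank, d_model, n_layers, mats_per_layer = 16, 4096, 32, 4  # q,k,v,o projections

# Each adapted matrix adds two low-rank factors, A and B (r x d each).
adapter_params = n_layers * mats_per_layer * 2 * rank * d_model
merged_gb = n_adapters * base_params * bytes_per_param / 1e9
multilora_gb = (base_params + n_adapters * adapter_params) * bytes_per_param / 1e9
print(f"{merged_gb:.0f} GB vs {multilora_gb:.1f} GB")
```

With these assumed sizes, eight merged copies need ~112 GB while a shared base plus eight adapters fits in ~14 GB, which is why frameworks like vLLM and TensorRT-LLM batch requests for different adapters against one set of base weights.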

Andrzej Wąsowski ☑️ 🟥AndrzejWasowski@social.itu.dk
2025-06-26
Dr Mircea Zloteanu ☀️ 🌊🌴mzloteanu
2025-06-20

#370 The Problem with “Magnitude-based Inference”

Thoughts: An appealing but flawed approach. Good overview of the error inflation issue.

journals.lww.com/acsm-msse/ful

Dr Mircea Zloteanu ☀️ 🌊🌴mzloteanu
2025-06-18

#368 The Fisher-Pearson Chi-Squared Controversy: A Turning Point for Inductive Inference

Thoughts: An overview of the difference between Pearson's descriptive view and Fisher's inferential view of χ².

genepi.qimr.edu.au/contents/p/
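The two views fit in a few lines: Pearson's statistic is a descriptive measure of fit, and Fisher's step was referring it to a chi-squared sampling distribution with the correct degrees of freedom. A minimal sketch with made-up counts:

```python
import math

observed = [44, 56]           # e.g. 100 coin flips
expected = [50, 50]           # under a fair-coin hypothesis

# Pearson's descriptive statistic: sum of (O - E)^2 / E
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Fisher's inferential step: refer the statistic to its sampling
# distribution. For df = 1 the chi-squared tail is erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, p_value)
```

The statistic (1.44) alone just describes the discrepancy; the p-value (~0.23) is what turns it into an inference about the hypothesis, and getting the degrees of freedom right was a key point of the Fisher-Pearson dispute.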

Nvidia Dynamo: If disaggregation is Dynamo’s backbone, the smart management of the KV cache is its brain. At around 300 tokens per second per user, you can generate 30 times more tokens per normalized #GPU www.vastdata.com/sharedeveryt... #AI #Inference via @nicolehemsoth.bsky.social

Why Everyone’s Talking About N...

N-gated Hacker Newsngate
2025-05-28

Ah, behold the majestic -0528, a model so and elusive that not even dare to touch it. 🤔✨ With a grand total of zero downloads last month, it's clear that this parameter behemoth is the hottest sensation—if only in its creator's wildest dreams. 🐒💭
huggingface.co/deepseek-ai/Dee

AMD vs NVIDIA #Inference Benchmark: Who Wins? – Performance & Cost Per Million Tokens Report by @SemiAnalysis_ tldr: It's not a simple answer semianalysis.com/2025/05/23/a... #AI #GPU #HPC via @ogawa-tadashi.bsky.social

2025-05-22

'On Consistent Bayesian Inference from Synthetic Data', by Ossi Räisä, Joonas Jälkö, Antti Honkela.

jmlr.org/papers/v26/23-1428.ht

#bayesian #privacy #inference

Don Curren 🇨🇦🇺🇦dbcurren.bsky.social@bsky.brid.gy
2025-05-21

“#Inference is actually quite close to a #theoryofeverything – including #evolution, #consciousness, and #life itself. It is #abduction all the way down.” (The process of abduction may be much more pervasive than the relatively rare use of the word “abduction” would suggest) aeon.co/essays/consc...

Consciousness is not a thing, ...

Hacker Newsh4ckernews
2025-05-21

inference, not training, represents an increasing majority of #AI’s energy demands and will continue to do so in the near future. It’s now estimated that 80–90% of computing power for AI is used for #inference technologyreview.com/2025/05/2

N-gated Hacker Newsngate
2025-05-20

🎉 Behold! The emerges from the depths of the abyss, promising the holy grail of Kubernetes-native distributed . 🤖 Because who doesn't want their served with extra buzzwords and a side of "competitive performance per dollar"? 🍽️
llm-d.ai/blog/llm-d-announce
