#CUDA

Melroy van den Berg (melroy@mastodon.melroy.org)
2025-12-13

Good to see this happening. It was already a pain to configure and install ROCm under Linux. This will also create a more level playing field for us all, so AMD video cards can be used just as easily (instead of only NVIDIA CUDA).

canonical.com/blog/canonical-a

#amd #rocm #nvidia #cuda #linux #Ubuntu #debian

2025-12-12

A quarter of a century ago, a student wired 32 GeForce graphics cards together to play #Quake III. That's where #CUDA came from.

xataka.com/robotica-e-ia/hace-

2025-12-11

GPUs are central to language-model training thanks to parallel processing and fast matrix computation. The article breaks down GPU architecture, contrasts it with the CPU, covers the role of CUDA/Tensor Cores, and explains VRAM management. GPU performance is measured in FLOPS, which determines training speed. #AI #ML #GPU #MôHìnhNgônNgữ #CôngNghệ #ParallelComputing #DeepLearning #CUDA #VRAM #FLOPS #HiểuGPU #MachineLearningVietNam

reddit.com/r/LocalLLaMA/commen
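To make the "thousands of cores doing matrix math in parallel" idea concrete, here is a minimal sketch (mine, not from the linked post) of a naive CUDA matrix-multiply kernel: one thread computes one output element, and thousands run at once. Illustrative only; production libraries use shared-memory tiling and Tensor Cores instead.

```cuda
// Minimal sketch: naive C = A * B for square N x N row-major matrices.
// One thread per output element; no tiling, no Tensor Cores.
__global__ void naive_matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];  // dot product of one row and one column
        C[row * N + col] = acc;
    }
}

// Launch with a 2D grid so every output element gets its own thread, e.g.:
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (N + 15) / 16);
//   naive_matmul<<<grid, block>>>(dA, dB, dC, N);
```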

2025-12-11

Today we explore the GPU: the soul of language models. GPUs are massively parallel, making them ideal for the matrix multiplications in ML thanks to thousands of CUDA and Tensor Cores. Compare the CPU (a few strong cores, sequential processing) with the GPU (many cores, parallel processing). VRAM matters because it holds the weights/activations; running out breaks training. FLOPS measures raw compute speed, but real throughput also depends on memory bandwidth and Tensor Core utilization. Understand the GPU to train models efficiently!

#AI #ML #GPU #DeepLearning #VRAM #CUDA #TensorCore #FLOPS
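To make the VRAM point concrete, a rough back-of-the-envelope estimate (my numbers, not the post's): assume a 7B-parameter model as an example, fp16 weights at 2 bytes each, and the commonly cited ~16 bytes per parameter for full Adam training (fp16 weights and gradients plus fp32 master weights and optimizer moments); activations and sharding are ignored.

```latex
% Rough VRAM estimate; the 7B model size is an assumed example.
\begin{align*}
\text{Inference, fp16 weights} &: 7\times10^{9} \times 2\,\text{bytes} \approx 14\ \text{GB} \\
\text{Training, full Adam}     &: 7\times10^{9} \times 16\,\text{bytes} \approx 112\ \text{GB}
\end{align*}
```

That gap is why a card that serves a model comfortably can still be far too small to train it.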

Lazy Bear Dude (reallylazybear)
2025-12-11

Just catching up after a year of not fiddling around with local image generation.

Yeah, this was made with Z-Image, the model that's been popping up for the past 2 weeks or so because it generates images fast (only about 25 seconds on my RTX 3060 power-capped at 145 W), with good prompt adherence... or something.

Idk about prompt adherence but I'm still "training" myself to make prompts.

A brown bear with brown vest and grey utility pants walking, its back turned to the camera, walking forward

Lazy Bear Dude (reallylazybear)
2025-12-09

Been experimenting with the GGUF quants on Flux dev, and it seems like I get about the same or slightly fewer seconds per iteration on Q8 than on any other bit width, so I'm just gonna stick with Q8. I can't run the fp8 one on my GPU 'cos I'm 4-5 gigs short on VRAM.

As for safetensors vs. GGUF on Flux1 Krea Dev, I get fewer seconds per iteration with the fp8 safetensors than with the GGUF Q8 quant, so I'm gonna stick with safetensors.

BuySellRam.com (jimbsr)
2025-12-08

Is Google making its proprietary TPUs (Ironwood) available to Meta, directly challenging Nvidia’s 90% dominance? The massive cost of AI compute is forcing tech giants to turn from Nvidia's biggest customers into its fiercest competitors. Can Nvidia's software moat (CUDA) hold up against the combined might of hyperscalers?

buysellram.com/blog/the-escala

S.v. N.Sönmez (nsonmez84)
2025-12-08

With CUDA 13.1 and the new CUDA Tile, NVIDIA speeds up AI coding by 35% while improving energy efficiency by 20%. With Blackwell GPUs and deep optimizations, data flow and parallel workload management have become far more intuitive. See the official NVIDIA documentation for details.

🚩
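
For readers wondering what "tile" refers to: below is the classic hand-written shared-memory tiling that a tile-level programming model aims to abstract away. This is plain CUDA C++ showing the traditional technique, a sketch only, not the new CUDA Tile API itself.

```cuda
// Classic shared-memory tiled matmul: each block stages TILE x TILE
// sub-matrices of A and B into fast shared memory, then reuses them.
// Assumes square N x N row-major matrices with N divisible by TILE.
#define TILE 16
__global__ void tiled_matmul(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        // Each thread stages one element of the A tile and one of the B tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();                 // wait until the whole tile is loaded
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // don't overwrite tiles still in use
    }
    C[row * N + col] = acc;
}
```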

2025-12-07

Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis

#PTX #CUDA #Benchmarking #Blackwell #HPC

hgpu.org/?p=30437
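
For a flavor of what GPU microbenchmarking involves, here is a minimal sketch (mine, not from the paper) that estimates FMA instruction latency by timing a dependent chain with the SM's cycle counter; real studies pin clocks and verify the generated SASS.

```cuda
// Sketch: estimate FMA latency by timing a chain of dependent FMAs.
// Launch as fma_latency<<<1, 1>>>(d_out, d_cycles); illustrative only.
__global__ void fma_latency(float* out, long long* cycles) {
    float x = 1.0f;
    long long t0 = clock64();
    #pragma unroll
    for (int i = 0; i < 1024; ++i)
        x = fmaf(x, 1.000001f, 0.5f);  // each FMA depends on the previous one
    long long t1 = clock64();
    *out = x;                          // keep the chain from being optimized away
    *cycles = t1 - t0;                 // latency in cycles ~= (t1 - t0) / 1024
}
```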

GripNews (GripNews)
2025-12-05

🌕 CUDA-L2: Beating cuBLAS matrix-multiplication performance with reinforcement learning
➤ AI-driven, faster matrix operations on the GPU
github.com/deepreinforce-ai/CU
CUDA-L2 is an innovative system that combines large language models (LLMs) with reinforcement learning (RL) to automatically optimize CUDA kernels for half-precision general matrix multiply (HGEMM). In systematic evaluations it outperforms today's mainstream matrix-multiplication baselines, including the widely used torch.matmul as well as NVIDIA's advanced closed-source libraries (cuBLAS, cuBLASLt-heuristic, and cuBLASLt-AutoTuning). The project has released HGEMM kernels for the A100 GPU covering a thousand different (M, N, K) configurations, and plans to support more GPU architectures and higher-precision accumulators in the future.
+ This sounds really promising! I'd love to know how its latency holds up in real applications.
+ For scientific computing that depends on heavy matrix operations, this speedup...
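
For scale, a minimal sketch of timing the cuBLAS HGEMM baseline that CUDA-L2 is compared against; the 4096³ problem size is an arbitrary example, and a real benchmark would warm up, average many runs, and check errors.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: time one cuBLAS half-precision GEMM and report TFLOPS.
int main() {
    const int M = 4096, N = 4096, K = 4096;  // arbitrary example size
    __half *A, *B, *C;                       // contents left uninitialized; timing only
    cudaMalloc(&A, sizeof(__half) * M * K);
    cudaMalloc(&B, sizeof(__half) * K * N);
    cudaMalloc(&C, sizeof(__half) * M * N);

    cublasHandle_t handle;
    cublasCreate(&handle);
    __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K,
                &alpha, A, M, B, K, &beta, C, M);  // cuBLAS column-major convention
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("HGEMM: %.3f ms, %.1f TFLOPS\n", ms,
           2.0 * M * N * K / (ms * 1e9));  // 2*M*N*K FLOPs per GEMM
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```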

2025-12-04

CUDA-L2: Surpassing cuBLAS performance for matrix multiplication through RL

Link: github.com/deepreinforce-ai/CU
Discussion: news.ycombinator.com/item?id=4

#cuda

2025-12-04

I find it quite hilarious that all NVIDIA "CUDA Tile" material has programming misspelled as "programmiing" in the project logo. Surely the logo wasn't generated by some AI model?

developer.nvidia.com/blog/focu

#NVIDIA #CUDA #cudatile

"CUDA Tile programmiing model"
Hacker News (h4ckernews)
2025-12-04

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication Through RL

github.com/deepreinforce-ai/CU

2025-12-03

Absolutely agree with the crashout about #Raytracing. As good as it can look (and that's not the case everywhere), the performance impact is ultimately just too big. RT is also the reason #UnrealEngine5 runs so poorly, since its RT-based Lumen has almost completely replaced the standard lighting engine. I'd much rather have more #CUDA cores or more shader units than even a single RT core.

https://www.youtube.com/watch?v=KMXXcqpFZzo

Miguel Afonso Caetano (remixtures@tldr.nettime.org)
2025-12-03

"If you go back a year or two, you might make the case that Nvidia had three moats relative to TPUs: superior performance, significantly more flexibility due to GPUs being more general purpose than TPUs, and CUDA and the associated developer ecosystem surrounding it. OpenAI, meanwhile, had the best model, extensive usage of their API, and the massive number of consumers using ChatGPT.

The question, then, is what happens if the first differentiator for each company goes away? That, in a nutshell, is the question that has been raised over the last two weeks: does Nvidia preserve its advantages if TPUs are as good as GPUs, and is OpenAI viable in the long run if they don’t have the unquestioned best model?

Nvidia’s flexibility advantage is a real thing; it’s not an accident that the fungibility of GPUs across workloads was focused on as a justification for increased capital expenditures by both Microsoft and Meta. TPUs are more specialized at the hardware level, and more difficult to program for at the software level; to that end, to the extent that customers care about flexibility, then Nvidia remains the obvious choice.

CUDA, meanwhile, has long been a critical source of Nvidia lock-in, both because of the low level access it gives developers, and also because there is a developer network effect: you’re just more likely to be able to hire low level engineers if your stack is on Nvidia. The challenge for Nvidia, however, is that the “big company” effect could play out with CUDA in the opposite way to the flexibility argument. While big companies like the hyperscalers have the diversity of workloads to benefit from the flexibility of GPUs, they also have the wherewithal to build an alternative software stack. That they did not do so for a long time is a function of it simply not being worth the time and trouble..."

stratechery.com/2025/google-nv

#AI #GenerativeAI #Nvidia #Google #ChatGPT #OpenAI #LLMs #Chatbots #CUDA #GPUs #TPUs

AI Daily Post (aidailypost)
2025-12-02

NVIDIA is pouring $2 billion into Synopsys, marrying its CUDA stack with industry‑leading EDA tools. The move could accelerate AI‑driven chip design, digital twins, and semiconductor innovation. What does this mean for developers and the open‑source hardware community? Dive in to find out.

🔗 aidailypost.com/news/nvidia-in

2025-12-02

The paradox of the AI chip war: Google's TPU is faster than Nvidia's, so why doesn't it sell?

Google's TPU is twice as efficient as an Nvidia GPU, so why isn't it selling? The key to the AI chip war turned out to be the software ecosystem, not the hardware. An analysis of Google's ten-year long game.

aisparkup.com/posts/7052
