#CUDA

Melroy van den Berg (melroy@mastodon.melroy.org)
2025-12-13

Good to see this happening. It was already a pain to configure and install ROCm under Linux. This will also create a more level playing field for us all, so AMD video cards can be used just as easily (instead of only NVIDIA CUDA).

canonical.com/blog/canonical-a

#amd #rocm #nvidia #cuda #linux #Ubuntu #debian

2025-12-12

A quarter of a century ago, a student wired 32 GeForce graphics cards together to play #Quake III. That's where #CUDA came from.

xataka.com/robotica-e-ia/hace-

2025-12-11

GPUs are central to language-model training thanks to parallel processing and fast matrix computation. The article breaks down GPU architecture, contrasts it with the CPU, covers the role of CUDA/Tensor Cores, and explains VRAM management. GPU performance is measured in FLOPS, which determines training speed. #AI #ML #GPU #MôHìnhNgônNgữ #CôngNghệ #ParallelComputing #DeepLearning #CUDA #VRAM #FLOPS #HiểuGPU #MachineLearningVietNam

reddit.com/r/LocalLLaMA/commen
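To make the "thousands of cores doing matrix math in parallel" idea concrete, here is a minimal sketch (mine, not from the linked post) of a naive CUDA matrix-multiply kernel: one thread computes one output element, and thousands run at once. Illustrative only; production libraries use shared-memory tiling and Tensor Cores instead.

```cuda
// Minimal sketch: naive C = A * B for square N x N row-major matrices.
// One thread per output element; no tiling, no Tensor Cores.
__global__ void naive_matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];  // dot product of one row and one column
        C[row * N + col] = acc;
    }
}

// Launch with a 2D grid so every output element gets its own thread, e.g.:
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (N + 15) / 16);
//   naive_matmul<<<grid, block>>>(dA, dB, dC, N);
```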

2025-12-11

Today we explore the GPU: the soul of language models. GPUs are massively parallel, making them ideal for the matrix multiplications in ML thanks to thousands of CUDA and Tensor Cores. Compare the CPU (a few strong cores, sequential processing) with the GPU (many cores, parallel processing). VRAM matters because it holds the weights/activations; running out breaks training. FLOPS measures raw compute speed, but real throughput also depends on memory bandwidth and Tensor Core utilization. Understand the GPU to train models efficiently!

#AI #ML #GPU #DeepLearning #VRAM #CUDA #TensorCore #FLOPS
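To make the VRAM point concrete, a rough back-of-the-envelope estimate (my numbers, not the post's): assume a 7B-parameter model as an example, fp16 weights at 2 bytes each, and the commonly cited ~16 bytes per parameter for full Adam training (fp16 weights and gradients plus fp32 master weights and optimizer moments); activations and sharding are ignored.

```latex
% Rough VRAM estimate; the 7B model size is an assumed example.
\begin{align*}
\text{Inference, fp16 weights} &: 7\times10^{9} \times 2\,\text{bytes} \approx 14\ \text{GB} \\
\text{Training, full Adam}     &: 7\times10^{9} \times 16\,\text{bytes} \approx 112\ \text{GB}
\end{align*}
```

That gap is why a card that serves a model comfortably can still be far too small to train it.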

Lazy Bear Dude (reallylazybear)
2025-12-11

Just catching up after a year of not fiddling around with local image generation.

Yeah, this was made with Z-Image, the model that's been popping up for the past 2 weeks or so because it generates images fast (only about 25 seconds on my RTX 3060 power-capped at 145 W), with good prompt adherence... or something.

Idk about prompt adherence but I'm still "training" myself to make prompts.

A brown bear with brown vest and grey utility pants walking, its back turned to the camera, walking forward

Lazy Bear Dude (reallylazybear)
2025-12-09

Been experimenting with the GGUF quants on Flux dev, and it seems like I get about the same or slightly fewer seconds per iteration on Q8 than on any other bit width, so I'm just gonna stick with Q8. I can't run the fp8 one on my GPU 'cos I'm 4-5 gigs short on VRAM.

As for safetensors vs. GGUF on Flux1 Krea Dev, I get fewer seconds per iteration with the fp8 safetensors than with the GGUF Q8 quant, so I'm gonna stick with safetensors.

BuySellRam.com (jimbsr)
2025-12-08

Is Google making its proprietary TPUs (Ironwood) available to Meta, directly challenging Nvidia’s 90% dominance? The massive cost of AI compute is forcing tech giants to turn from Nvidia's biggest customers into its fiercest competitors. Can Nvidia's software moat (CUDA) hold up against the combined might of hyperscalers?

buysellram.com/blog/the-escala

S.v. N.Sönmez (nsonmez84)
2025-12-08

With CUDA 13.1 and the new CUDA Tile, NVIDIA speeds up AI coding by 35% while improving energy efficiency by 20%. With Blackwell GPUs and deep optimizations, data flow and parallel workload management have become far more intuitive. See the official NVIDIA documentation for details.

🚩
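
For readers wondering what "tile" refers to: below is the classic hand-written shared-memory tiling that a tile-level programming model aims to abstract away. This is plain CUDA C++ showing the traditional technique, a sketch only, not the new CUDA Tile API itself.

```cuda
// Classic shared-memory tiled matmul: each block stages TILE x TILE
// sub-matrices of A and B into fast shared memory, then reuses them.
// Assumes square N x N row-major matrices with N divisible by TILE.
#define TILE 16
__global__ void tiled_matmul(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        // Each thread stages one element of the A tile and one of the B tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();                 // wait until the whole tile is loaded
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // don't overwrite tiles still in use
    }
    C[row * N + col] = acc;
}
```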

2025-12-07

Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis

#PTX #CUDA #Benchmarking #Blackwell #HPC

hgpu.org/?p=30437
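
For a flavor of what GPU microbenchmarking involves, here is a minimal sketch (mine, not from the paper) that estimates FMA instruction latency by timing a dependent chain with the SM's cycle counter; real studies pin clocks and verify the generated SASS.

```cuda
// Sketch: estimate FMA latency by timing a chain of dependent FMAs.
// Launch as fma_latency<<<1, 1>>>(d_out, d_cycles); illustrative only.
__global__ void fma_latency(float* out, long long* cycles) {
    float x = 1.0f;
    long long t0 = clock64();
    #pragma unroll
    for (int i = 0; i < 1024; ++i)
        x = fmaf(x, 1.000001f, 0.5f);  // each FMA depends on the previous one
    long long t1 = clock64();
    *out = x;                          // keep the chain from being optimized away
    *cycles = t1 - t0;                 // latency in cycles ~= (t1 - t0) / 1024
}
```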

GripNews (GripNews)
2025-12-05

🌕 CUDA-L2: Beating cuBLAS matrix-multiplication performance with reinforcement learning
➤ AI-driven, faster matrix operations on the GPU
github.com/deepreinforce-ai/CU
CUDA-L2 is an innovative system that combines large language models (LLMs) with reinforcement learning (RL) to automatically optimize CUDA kernels for half-precision general matrix multiply (HGEMM). In systematic evaluations it outperforms today's mainstream matrix-multiplication baselines, including the widely used torch.matmul as well as NVIDIA's advanced closed-source libraries (cuBLAS, cuBLASLt-heuristic, and cuBLASLt-AutoTuning). The project has released HGEMM kernels for the A100 GPU covering a thousand different (M, N, K) configurations, and plans to support more GPU architectures and higher-precision accumulators in the future.
+ This sounds really promising! I'd love to know how its latency holds up in real applications.
+ For scientific computing that depends on heavy matrix operations, this speedup...
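
For scale, a minimal sketch of timing the cuBLAS HGEMM baseline that CUDA-L2 is compared against; the 4096³ problem size is an arbitrary example, and a real benchmark would warm up, average many runs, and check errors.

```cuda
#include <cublas_v2.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: time one cuBLAS half-precision GEMM and report TFLOPS.
int main() {
    const int M = 4096, N = 4096, K = 4096;  // arbitrary example size
    __half *A, *B, *C;                       // contents left uninitialized; timing only
    cudaMalloc(&A, sizeof(__half) * M * K);
    cudaMalloc(&B, sizeof(__half) * K * N);
    cudaMalloc(&C, sizeof(__half) * M * N);

    cublasHandle_t handle;
    cublasCreate(&handle);
    __half alpha = __float2half(1.0f), beta = __float2half(0.0f);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K,
                &alpha, A, M, B, K, &beta, C, M);  // cuBLAS column-major convention
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("HGEMM: %.3f ms, %.1f TFLOPS\n", ms,
           2.0 * M * N * K / (ms * 1e9));  // 2*M*N*K FLOPs per GEMM
    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```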

2025-12-04

CUDA-L2: Surpassing cuBLAS performance for matrix multiplication through RL

Link: github.com/deepreinforce-ai/CU
Discussion: news.ycombinator.com/item?id=4

#cuda

2025-12-04

I find it quite hilarious that all NVIDIA "CUDA Tile" material has programming misspelled as "programmiing" in the project logo. Surely the logo wasn't generated by some AI model?

developer.nvidia.com/blog/focu

#NVIDIA #CUDA #cudatile

"CUDA Tile programmiing model"
Hacker News (h4ckernews)
2025-12-04

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication Through RL

github.com/deepreinforce-ai/CU

2025-12-03

Absolutely agree with the crashout about #Raytracing. As good as it can look (and that's not the case everywhere), the performance impact is ultimately just too big. RT is also the reason #UnrealEngine5 runs so poorly, since its RT-based Lumen has almost completely replaced the standard lighting engine. I'd much rather have more #CUDA cores or more shader units than even a single RT core.

https://www.youtube.com/watch?v=KMXXcqpFZzo

Miguel Afonso Caetano (remixtures@tldr.nettime.org)
2025-12-03

"If you go back a year or two, you might make the case that Nvidia had three moats relative to TPUs: superior performance, significantly more flexibility due to GPUs being more general purpose than TPUs, and CUDA and the associated developer ecosystem surrounding it. OpenAI, meanwhile, had the best model, extensive usage of their API, and the massive number of consumers using ChatGPT.

The question, then, is what happens if the first differentiator for each company goes away? That, in a nutshell, is the question that has been raised over the last two weeks: does Nvidia preserve its advantages if TPUs are as good as GPUs, and is OpenAI viable in the long run if they don’t have the unquestioned best model?

Nvidia’s flexibility advantage is a real thing; it’s not an accident that the fungibility of GPUs across workloads was focused on as a justification for increased capital expenditures by both Microsoft and Meta. TPUs are more specialized at the hardware level, and more difficult to program for at the software level; to that end, to the extent that customers care about flexibility, then Nvidia remains the obvious choice.

CUDA, meanwhile, has long been a critical source of Nvidia lock-in, both because of the low level access it gives developers, and also because there is a developer network effect: you’re just more likely to be able to hire low level engineers if your stack is on Nvidia. The challenge for Nvidia, however, is that the “big company” effect could play out with CUDA in the opposite way to the flexibility argument. While big companies like the hyperscalers have the diversity of workloads to benefit from the flexibility of GPUs, they also have the wherewithal to build an alternative software stack. That they did not do so for a long time is a function of it simply not being worth the time and trouble..."

stratechery.com/2025/google-nv

#AI #GenerativeAI #Nvidia #Google #ChatGPT #OpenAI #LLMs #Chatbots #CUDA #GPUs #TPUs

AI Daily Post (aidailypost)
2025-12-02

NVIDIA is pouring $2 billion into Synopsys, marrying its CUDA stack with industry‑leading EDA tools. The move could accelerate AI‑driven chip design, digital twins, and semiconductor innovation. What does this mean for developers and the open‑source hardware community? Dive in to find out.

🔗 aidailypost.com/news/nvidia-in

2025-12-02

The paradox of the AI chip war: Google's TPU is faster than Nvidia's, so why doesn't it sell?

Google's TPU is twice as efficient as an Nvidia GPU, so why isn't it selling? The key to the AI chip war turned out to be the software ecosystem, not the hardware. An analysis of Google's ten-year long game.

aisparkup.com/posts/7052
