
Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation

#CUDA #PTX #Triton #ProgrammingLanguages #Package

https://hgpu.org/?p=30481

📝 Pics in Weblog Entries https://otaviocc.weblog.lol/2025/12/pics-in-weblog-entries

#statuslog #omglol #triton #somepics #weblog

I rebuilt FlashAttention in Triton to understand the performance archaeology

https://aminediro.com/posts/flash_attn/

#HackerNews #FlashAttention #Triton #Performance #Archaeology #Rebuild #TechInnovation #MachineLearning

Speeding up LLMs to the max: how I built a cross-platform Flash Attention with support for Turing+ architectures and more

Today transformers rule the hype in the machine learning world, especially since the arrival of ChatGPT and similar language models. This was made possible by the attention mechanism at the core of their architecture, yet that same mechanism is also their weak point in terms of performance and memory consumption. Although the elegant Flash Attention concept (Tri Dao) was developed to address this, its existing implementations have a number of limitations. So I present the first and only open-source implementation of Flash Attention 2 in Triton with support for Linux and Windows, Turing through Blackwell architectures (so it now works in Google Colab and Kaggle), homogeneous and heterogeneous clusters, optional determinism, and manual kernel customization for finer tuning to each GPU architecture individually. More on how it works, and more besides, in the article.

https://habr.com/ru/articles/976576/

#машинное_обучение #transformers #трансформеры #внимание #attention #flashattention #triton #большие_языковые_модели #llm #оптимизация_производительности

Saturn and Titan - NASA/ESA JWST Webb Space Telescope 🪐

#astronomy #esa #jwst #nasa #saturn #space #triton

▶️ 1 new picture from @andrealuck https://commons.wikimedia.org/wiki/File:Saturn_and_Titan_-_NASA-ESA_JWST_Webb_Space_Telescope_%2854983814283%29.jpg


Accelerating Molecular Simulations with Triton: Fused GPU Kernels for TensorNet Neural Potentials

#Triton #CUDA #MolecularDynamics #MD #MolecularSimulations #PyTorch #Chemistry #Biology

https://hgpu.org/?p=30453

TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization

#Triton #CUDA #PyTorch #Package

https://hgpu.org/?p=30450

tritonBLAS: Triton-based Analytical Approach for GEMM Kernel Parameter Selection

#Triton #BLAS #GEMM #AMD #ROCm #HPC #Performance #Package

https://hgpu.org/?p=30441

Decoupled Triton: A Block-Level Decoupled Language for Writing and Exploring Efficient Machine-Learning Kernels

#Triton #Compilers #MachineLearning #ML #Thesis

https://hgpu.org/?p=30439

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

#Triton #CUDA #AI #CodeGeneration #LLM

https://hgpu.org/?p=30413

KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit

#Triton #CUDA #LLM #CodeGeneration

https://hgpu.org/?p=30412

Iris: First-Class Multi-GPU Programming Experience in Triton

#Triton #HIP #CUDA #Package

https://hgpu.org/?p=30375

The Anatomy of a Triton Attention Kernel

#Triton #HIP #CUDA #LLM #Performance

https://hgpu.org/?p=30371

Wow, a first for me: a USN MQ-4 squawking 7600 at FL502, 14 hours into first flight off Okinawa. 7600 means a radio issue. #RQ4 #drone #HALE #Triton #USN #MQ4C

Which framework serves NVIDIA ASR most efficiently at scale? vLLM, Triton...? What configuration works well for batching? #NVIDIA #ASR #AI #TríTuệNhânTạo #PhátNgàyÀo #TiếngAnh #ViệtNam #CôngNghệ #Triton #VLLM

https://www.reddit.com/r/LocalLLaMA/comments/1orp997/best_way_to_serve_nvidia_asr_at_scale/

🎹 #TRITON #vstplugin by #KORG on #PluginBoutique, for my fellow #musicproducers, 43% off #sale limited time, buy it to get one of two vsts free!

https://www.pluginboutique.com/product/1-Instruments/4-Synth/11203-TRITON-TRITON-Extreme?a_aid=63375bfdb595f

#music #musicproduction #vstplugins #blackfridaysales

#news ⚡ Triton sees the upheaval in German industry as a buying opportunity: after selling several German holdings and acquiring a Bosch division, the private-equity firm Triton wants to once again more heavily... https://hubu.de/?p=298588 | #industrie #kaufchance #triton #umbruch #hubu

Introducing the most riveting tale of all time: the #love #story between a #GPU and its #graphics, sprinkled with just enough #Triton #jargon to make you nod off faster than a PyTorch Profiler. We've got #parallelism, pheromones, and more tangents than a high school geometry class 💤. Pack your bags, folks, because we're going on an #adventure through a sea of terrifying colors and kernels that nobody asked for! 🚀🌈
https://ut21.github.io/blog/triton.html #HackerNews #ngated

"The G in GPU is for Graphics damnit"

https://ut21.github.io/blog/triton.html

#HackerNews #The #G #in #GPU #is #for #Graphics #damnit #GPU #Graphics #TechTalk #HackerNews #Triton

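🌘 The G in GPU stands for Graphics: Triton kernels, profiling, parallelism, and more
➤ Using the Triton language to unlock the true potential of the GPU (graphics processing unit)
✤ https://ut21.github.io/blog/triton.html
This article takes an in-depth look at developing efficient GPU kernels with the Triton language, in particular for simulating a slime mold (Physarum) growth model. The author shares hands-on experience ranging from background knowledge and understanding the model, through a PyTorch implementation, to kernel optimization with Triton. With Triton, the author turned computations that ran inefficiently in PyTorch into efficient GPU execution, and used PyTorch Profiler for performance analysis, showing Triton's potential and advantages in GPU programming, especially for workloads that need massive parallelism.
+ This is a great introduction to Triton; it gave me a new understanding of how to write high-performance programs on the GPU.
+ I really like how the author combines slime mold simulation with GPU optimization; the concept is fun and the technical details are solid.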
#GPU 計算 #Triton #PyTorch #效能優化 #物理模擬

#TRITON
