#TensorRT

2025-05-20

Computex 2025: NVIDIA and Microsoft strengthen AI capabilities on RTX AI PCs and Azure
At Computex 2025 in Taipei and at Microsoft Build 2025, NVIDIA and Microsoft presented a series of technical…
xboxdev.com/computex-2025-nvid
#COMPUTEX2025 #Development #Event #AzureAIFoundry #BXDXO #DLSS4 #FLux1 #MicrosoftBuild2025 #NIMMicroservices #RTXAIPCs #TensorRT #WindowsML

2025-01-14

How to easily add AI to Rust applications: a universal open-source tool

A systems developer at the IT company «Криптонит» wrote an article about a new Rust tool that simplifies running machine-learning models and integrating them into applications. Below, the text is published in the first person. The article is based on Mikhail's talk at RustCon 2024; a video recording of the talk is available on VK Видео.

habr.com/ru/companies/kryptoni

#rust #library #machine_learning #ml #models #triton #deepstream #tensorrt #cuda #ai

PPC Land (ppcland)
2024-12-20

Bing optimizes search speed with TensorRT-LLM, cutting model latency by 36 percent: Microsoft's Bing search engine implements TensorRT-LLM optimization, reducing inference time and operational costs for language models. ppc.land/bing-optimizes-search

Judith van Stegeren (jd7h@fosstodon.org)
2024-10-19

Fitting an LLM on a GPU is a bit like photography. Model weights = film sensitivity, activation size = shutter speed, I/O tensors = aperture. These 3 dials control your model's memory footprint, just as they shape a photo's exposure.

Just realised this while trying to fit Llama 3.1 on my 24GB GPU with TRT-LLM: nvidia.github.io/TensorRT-LLM/.
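The weight "dial" in this analogy is simple arithmetic: parameter count times bytes per weight, before the KV cache and activations are even counted. A back-of-envelope sketch (the function and the fp16 example numbers are my own, not from the TRT-LLM docs):

```python
# Rough VRAM estimate for serving an LLM.
# Weights dominate; KV cache and activations are extra dials on top.
def estimate_vram_gb(params_billions, bytes_per_weight,
                     kv_cache_gb=0.0, activation_gb=0.0):
    # params in billions * bytes per parameter gives GB directly
    weights_gb = params_billions * bytes_per_weight
    return weights_gb + kv_cache_gb + activation_gb

# Llama 3.1 8B in fp16 (2 bytes/weight): ~16 GB of weights alone,
# which is why it is tight on a 24 GB GPU once the cache is added.
print(estimate_vram_gb(8, 2))  # → 16
```

Quantizing the same model to int8 (1 byte/weight) halves the weight term, which is the usual first move when a model does not fit.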

#llms #genai #llama #gpu #nvidia #trtllm #tensorrt

Judith van Stegeren (jd7h@fosstodon.org)
2024-10-17

Many companies are currently scrambling for ML infra engineers. They need people who know how to manage AI infrastructure and who can seriously speed up training and inference with specialized tooling like vLLM, Triton, TensorRT, Torchtune, etc.

#inference #training #genai #triton #vllm #pytorch #torchtune #tensorrt #nvidia

GenAINews.co (GenAINews_top)
2024-08-16

Check out the latest release of NVIDIA TensorRT Model Optimizer v0.15! This toolkit includes techniques like quantization and sparsity to optimize inference speed for generative AI models.

developer.nvidia.com/blog/nvid

2023-12-12

Note to self: #NVIDIA have an open-source inference server for machine learning models. (They mostly sell SaaS on top of it)

Supports #TensorFlow, #PyTorch, #ONNX, #TensorRT, #mxnet.

Runs on #k8s. Features include request queueing and monitoring.

Triton Inference Server github.com/triton-inference-se
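Each model in a Triton model repository is described by a config.pbtxt. A minimal sketch for a hypothetical ONNX image classifier (the model name, tensor names, and shapes here are invented for illustration):

```
name: "my_classifier"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Swapping `platform` to a TensorRT or PyTorch backend is the usual way the same server fronts all the frameworks listed above.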

KINEWS24 (KiNews)
2023-09-09
Alistair Buxton (ali1234)
2023-03-27

Why does TensorRT have four different installation methods? Two of them will mess up your system in different, extremely hard-to-fix ways, one won't work on any SRU distribution, and only one works at all. It's like they are trying to make it as hard as possible to install.

stackoverflow.com/questions/75

github.com/NVIDIA/TensorRT/iss
