Henry Saputra

Working on the intersection of systems and machine learning

github.com/hsaputra

Henry Saputra boosted:
2025-05-30

Microsandbox: Virtual Machines that feel and perform like containers
github.com/microsandbox/micros
#ycombinator

Henry Saputra boosted:
Ars Technica
2025-05-30

RFK Jr.’s fluoride ban would ruin 25 million kids’ teeth, cost $9.8 billion
The modeling estimates don't account for other costs, like parents' lost work.
arstechnica.com/health/2025/05

2025-05-28

Efficiently Scaling Transformer Inference

arxiv.org/abs/2211.05102
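
A quick sense of why serving large transformers is hard (my own back-of-the-envelope, not a formula from the paper): at long sequence lengths and realistic batch sizes, the KV cache alone eats a large slice of accelerator memory.

```python
# KV-cache sizing for a decoder-only transformer. Per token, each layer
# stores a key and a value vector of n_kv_heads * head_dim values.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes  # K and V
    return per_token * seq_len * batch

# Roughly Llama-2-70B-shaped (GQA, 8 KV heads) -- illustrative numbers only.
gib = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                     seq_len=2048, batch=32) / 2**30
print(f"KV cache ~ {gib:.0f} GiB in fp16")  # ~20 GiB before counting weights
```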

2025-05-28

Data splitting to avoid information leakage with DataSAIL | Nature Communications
nature.com/articles/s41467-025
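
The core idea in miniature (a generic sketch of leakage-aware splitting, not DataSAIL's API): samples that share an identity, e.g. a sequence cluster, must land on the same side of the split, so near-duplicates cannot sit in both train and test.

```python
# Group-aware splitting: every cluster ends up entirely in train or test.
from sklearn.model_selection import GroupShuffleSplit

samples = ["seqA1", "seqA2", "seqB1", "seqC1", "seqC2", "seqC3"]
clusters = ["A", "A", "B", "C", "C", "C"]  # e.g. from sequence clustering

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, test_idx = next(splitter.split(samples, groups=clusters))
print("train:", [samples[i] for i in train_idx])
print("test: ", [samples[i] for i in test_idx])
```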

2025-05-27

Double Descent Demystified: Identifying, Interpreting and Ablating the Sources of a Deep Learning Puzzle

arxiv.org/abs/2303.14151v1
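
The phenomenon is easy to reproduce (a minimal sketch of my own, not code from the paper): minimum-norm least squares on random features. Test error typically spikes near the interpolation threshold (k ≈ n_train) and drops again as k grows past it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train = 40
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, 500)
y_train = np.cos(3 * x_train) + 0.1 * rng.standard_normal(n_train)
y_test = np.cos(3 * x_test)

def features(x, k):
    # Random cosine feature map; fixed seed so train/test use the same map.
    w = np.random.default_rng(1).normal(scale=3.0, size=k)
    return np.cos(np.outer(x, w))

for k in [5, 10, 20, 40, 60, 120, 400]:
    # lstsq returns the minimum-norm solution once the system is underdetermined.
    beta = np.linalg.lstsq(features(x_train, k), y_train, rcond=None)[0]
    mse = np.mean((features(x_test, k) @ beta - y_test) ** 2)
    print(f"k={k:4d}  test MSE={mse:.4f}")
```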

Henry Saputra boosted:
2025-05-26

Agent Distillation transfers reasoning and task-solving capabilities from large language models to smaller models using enhanced prompts and self-consistent actions, matching the performance of larger models on various reasoning tasks.

Paper page - Distilling LLM Agent into Small Models with Retrieval and Code Tools
huggingface.co/papers/2505.176
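
The recipe in schematic form (my own sketch; all names here are hypothetical, not from the paper's code): sample several teacher traces per task, keep the ones whose answers agree under a self-consistency vote, and use them as fine-tuning targets for the small model.

```python
from collections import Counter
from dataclasses import dataclass
import random

@dataclass
class Trace:
    text: str          # full reasoning + tool-call trace
    final_answer: str

def teacher_solve(prompt: str) -> Trace:
    # Stand-in for a large-model agent run (tool calls, code execution, ...).
    answer = random.choice(["42", "42", "41"])  # noisy teacher, for the demo
    return Trace(text=f"{prompt} -> ...reasoning... -> {answer}", final_answer=answer)

def build_sft_pairs(prompts, n_samples=5):
    pairs = []
    for p in prompts:
        traces = [teacher_solve(p) for _ in range(n_samples)]
        majority, _ = Counter(t.final_answer for t in traces).most_common(1)[0]
        pairs += [(p, t.text) for t in traces if t.final_answer == majority]
    return pairs  # (prompt, target trace) pairs for supervised fine-tuning

print(build_sft_pairs(["What is 6 * 7?"]))
```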

2025-05-26

Oh, Hello, what is this ...

Red Hat Launches the llm-d Community, Powering Distributed Gen AI Inference at Scale
redhat.com/en/about/press-rele

2025-05-26

NVIDIA Dynamo Accelerates llm-d Community Initiatives for Advancing Large-Scale Distributed Inference | NVIDIA Technical Blog
developer.nvidia.com/blog/nvid

2025-05-26

Announcing the llm-d community! | llm-d
llm-d.ai/blog/llm-d-announce

2025-05-26

llm-d is a Kubernetes-native high-performance distributed LLM inference framework

llm-d · GitHub
github.com/llm-d
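
Client side, a deployment like this is typically queried like any OpenAI-compatible endpoint, since llm-d builds on vLLM (a sketch: the gateway URL and model name below are placeholders, not values from the llm-d docs).

```python
import requests

resp = requests.post(
    "http://llm-d-gateway.example.svc:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```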

2025-05-25

TIL - Amazon SageMaker HyperPod recipes help customers get started with training and fine-tuning popular publicly available foundation models in just minutes, with state-of-the-art performance

github.com/aws/sagemaker-hyper

2025-05-25

KernelLLM is a large language model based on Llama 3.1 Instruct, trained specifically for authoring GPU kernels in Triton. It translates PyTorch modules into Triton kernels and was evaluated on KernelBench-Triton.

huggingface.co/facebook/Kernel
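
The kind of translation this targets, in miniature (hand-written here, not KernelLLM output): the Triton equivalent of an elementwise x + y in PyTorch.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized tile of the flat arrays.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y must be contiguous CUDA tensors of the same shape.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```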

Henry Saputra boosted:
2025-05-25

Performance of Confidential Computing GPUs

#CUDA #Security #LLM #Performance

hgpu.org/?p=29912

2025-05-25

Gemma 3n model overview | Google AI for Developers
ai.google.dev/gemma/docs/gemma
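
A minimal way to try it (a sketch assuming the Hugging Face checkpoint id google/gemma-3n-E2B-it; check the linked docs for the current names and license terms).

```python
from transformers import pipeline

# Chat-style generation; recent transformers accept a messages list directly.
pipe = pipeline("text-generation", model="google/gemma-3n-E2B-it")
out = pipe([{"role": "user", "content": "One sentence on what Gemma 3n is."}],
           max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```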

2025-05-25

AI Hallucination Cases Database – Damien Charlotin
damiencharlotin.com/hallucinat
