#KServe

Excited to share that I am speaking at KubeCon Europe in London next month! Looking forward to catching up with friends and collaborators! You can find me at the following sessions 🧵 #KubeCon #CloudNativeCon #CloudNative #Kubernetes #DevOps #MLOps #AI #K8s @kubernetes.io #KServe #Kubeflow @cncf.io

Adam :redhat: :ansible: :bash: @maxamillion@fosstodon.org
2024-12-04

Achieve better large language model inference with fewer GPUs

"we achieved approximately 55-65% of the throughput on a server config that is approximately 15% of the cost"

redhat.com/en/blog/achieve-bet

#OpenShiftAI #RedHat #OpenShift #AI #Kubernetes #vllm #kubeflow #kserve
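
A serving stack like the one in that post typically exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming a vLLM-backed KServe InferenceService at a placeholder URL (the host and model id below are illustrative, not from the blog post):

```python
# Query a vLLM-backed KServe InferenceService via its OpenAI-compatible
# completions endpoint. BASE_URL and the model id are placeholders.
import requests

BASE_URL = "http://llama-vllm.example.com"  # hypothetical service URL

resp = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-hf",  # placeholder model id
        "prompt": "Summarize KServe in one sentence.",
        "max_tokens": 64,
        "temperature": 0.2,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```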

Get ready for KubeCon next week! Below are the three talks I'll be presenting! See you there and let's catch up! #KubeCon #CloudNativeCon #CloudNative #Kubernetes #DevOps #MLOps #AI #K8s @CloudNativeFdn @kubernetesio @kubeflow #KServe

🎄 Happy Holidays! KServe v0.12 release candidate is available! Try it out! https://github.com/kserve/kserve/releases/tag/v0.12.0-rc0 #KServe #kubernetes #MLOps #DevOps #CloudNative #Kubeflow #ModelServing #AI #MachineLearning @KnativeProject @LFAIDataFdn @CloudNativeFdn

Release v0.12.0-rc0 · kserve/k...
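
One way to try the release candidate is through the KServe Python SDK. A minimal sketch, assuming kubeconfig access to a cluster with the v0.12 RC installed (the name and storage URI follow the sklearn iris example from the KServe docs):

```python
# Deploy a simple InferenceService with the KServe Python SDK.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1SKLearnSpec,
    constants,
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            sklearn=V1beta1SKLearnSpec(
                storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
            )
        )
    ),
)

KServeClient().create(isvc)  # submits the InferenceService to the cluster
```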

udo m. rader ☕ 🇪🇺 🇺🇦 🐧 @riaschissl@sigmoid.social
2024-11-14

An interesting talk here at #KubeCon by @terrytangyuan from #RedHat and Adam Tetelman from #NVIDIA on the many pitfalls of using LLMs in production.

#KServe and #Knative come to the rescue of many Day 2 problems, but there's still a lot to do.

And, as Adam Tetelman said so well, this year's KubeCon could easily be called RAGCon, given how many talks there are about #RAG 😀

#KubeCon24 #AI #LLM

Image description: Adam Tetelman, wearing an NVIDIA t-shirt, glasses, and a conference lanyard, stands in front of a projection screen during a presentation. The slide shown is titled 'Inference Optimizations' and displays two categories: 'Inference Platform Features' (including Response Caching, Context Caching, and Inflight Batching) and 'Model Features' (including Multi-LoRA Loading, Just-in-time Compilation, and Ahead-of-time Compilation). The slide also includes a diagram showing sequence batching workflows. Adam is gesturing with both hands while explaining the content.
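
The 'Response Caching' item on that slide is easy to picture: identical requests can be answered from a cache instead of re-running the model, which is only safe for deterministic sampling settings. A toy illustration of the idea (not code from the talk; expensive_model_call is a stand-in for real inference):

```python
# Toy response cache for LLM inference: repeated prompts with the same
# sampling settings reuse a prior result instead of re-running the model.
from functools import lru_cache

def expensive_model_call(prompt: str, temperature: float) -> str:
    # Stand-in for a real model invocation.
    return f"generated answer for: {prompt}"

@lru_cache(maxsize=1024)
def cached_generate(prompt: str, temperature: float = 0.0) -> str:
    # Only cache deterministic requests; sampled outputs should bypass this.
    return expensive_model_call(prompt, temperature)

print(cached_generate("What is KServe?"))  # computed once
print(cached_generate("What is KServe?"))  # served from cache
```
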
Yuan Tang :redhat: @terrytangyuan@fosstodon.org
2024-11-10

Get ready for KubeCon next week! Below are the three talks I'll be presenting! See you there! github.com/terrytangyuan/publi

- Cloud Native AI Day Keynote: Advancing Cloud Native AI Innovation Through Open Collaboration, sponsored by Red Hat

- Unlocking Potential of Large Models in Production with Adam Tetelman

- WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes with Eduardo Arango

#KubeCon #CloudNativeCon #CloudNative #Kubernetes #DevOps #MLOps #AI #K8s #KServe #Kubeflow

2024-05-13

Say Serve to LLM on OpenShift AI - OpenShift's Multi-GPU Marvel with KServe | by faisal shah medium.com/@fassha08/say-serve
#OpenShift #aiml #llm #genai #kserve
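
In the same spirit as that article, a hedged sketch of what a multi-GPU LLM predictor can look like with the KServe Python SDK (the image, arguments, and names are placeholders, not taken from the article):

```python
# Sketch: an InferenceService whose predictor container requests two GPUs
# and shards the model across them with tensor parallelism.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    constants,
)

predictor = V1beta1PredictorSpec(
    containers=[
        client.V1Container(
            name="kserve-container",
            image="vllm/vllm-openai:latest",  # placeholder serving image
            args=["--model", "/mnt/models", "--tensor-parallel-size", "2"],
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "2"},  # two GPUs for one replica
            ),
        )
    ]
)

isvc = V1beta1InferenceService(
    api_version=constants.KSERVE_V1BETA1,
    kind=constants.KSERVE_KIND,
    metadata=client.V1ObjectMeta(name="llm-multi-gpu", namespace="default"),
    spec=V1beta1InferenceServiceSpec(predictor=predictor),
)

KServeClient().create(isvc)
```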
