#quantized

Greg Donald 🤘 gregdonald
2025-12-08

@OpenForum A small language model is one that uses 8-bit floats. Did I win?

aaron ~# :blinkingcursor: neuroexception@infosec.exchange
2025-09-10

Making the most out of a small LLM

Yesterday I finally built my own #AI #server. I had a spare #Nvidia RTX 2070 with 8GB of #VRAM lying around and had wanted to do this for a long time.

The problem is that most #LLMs need a lot of VRAM and I don't want to buy another #GPU just to host my own AI. Then I came across #gemma3 and #qwen3. Both are amazing #quantized models with stunning reasoning given how few resources they need.

I chose huihui_ai/qwen3-abliterated:14b since it supports #deepthinking and #toolcalling and is pretty unrestricted. After some testing I noticed that the 8b variant actually produces better output than the 14b one, and runs drastically faster. I honestly can't make out any quality loss. The 14b model very often sneaked Chinese characters into its responses; the 8b model doesn't.

Now I've got a very fast model with amazing reasoning (even in German) and tool-calling support. The only thing left to improve was knowledge. #Firecrawl is a great tool for #webscraping, and as soon as I implemented web search, the setup was complete. At least I thought it was.
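For the curious, the tool hookup looks roughly like this. It's a minimal sketch, not my exact script: FIRECRAWL_URL, the /v1/search response shape, and the web_search helper are placeholders for my setup, and it assumes the ollama Python client's tool-calling interface.

```python
# Minimal sketch: wiring a web-search tool into an Ollama chat loop.
# Assumes a self-hosted Firecrawl instance (URL and response shape are
# placeholders) and the `ollama` Python client with tool-calling support.
import requests
import ollama

FIRECRAWL_URL = "http://localhost:3002"  # placeholder for my instance
MODEL = "huihui_ai/qwen3-abliterated:8b"

def web_search(query: str) -> str:
    """Search the web via Firecrawl and return concatenated result text."""
    resp = requests.post(f"{FIRECRAWL_URL}/v1/search", json={"query": query})
    resp.raise_for_status()
    results = resp.json().get("data", [])
    return "\n\n".join(r.get("description", "") for r in results[:3])

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for up-to-date information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What's new in the homelab world?"}]
response = ollama.chat(model=MODEL, messages=messages, tools=tools)

# If the model asked for the tool, run it and feed the result back.
for call in response["message"].get("tool_calls") or []:
    messages.append(response["message"])
    messages.append({"role": "tool",
                     "content": web_search(**call["function"]["arguments"])})
    response = ollama.chat(model=MODEL, messages=messages)

print(response["message"]["content"])
```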

I want to make the most of this LLM, so my next step is to implement a basic #webserver that exposes the same #API #endpoints as #ollama: anywhere Ollama is supported, I can point it at my Python script instead. This way the model feels far more capable than it actually is, and I can use these advanced features everywhere without being bound to its built-in knowledge.
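The shim itself is just a thin proxy. Here's a hedged sketch with FastAPI (my real script may differ); OLLAMA_URL, the port, and the web_search stub are illustrative assumptions:

```python
# Sketch of a shim that speaks Ollama's /api/chat shape but augments the
# request before forwarding it to the real Ollama server.
import requests
from fastapi import FastAPI, Request

OLLAMA_URL = "http://localhost:11434"  # the real Ollama instance
app = FastAPI()

def web_search(query: str) -> str:
    # stub: in the real script this is the Firecrawl helper sketched above
    return ""

@app.post("/api/chat")
async def chat(request: Request):
    body = await request.json()
    # The "extra capability" lives here: run a web search on the last user
    # message and prepend the results as context before forwarding.
    last_user = next(m for m in reversed(body["messages"]) if m["role"] == "user")
    body["messages"].insert(0, {
        "role": "system",
        "content": f"Web search results:\n{web_search(last_user['content'])}",
    })
    body["stream"] = False  # keep the sketch simple: no streaming
    upstream = requests.post(f"{OLLAMA_URL}/api/chat", json=body)
    return upstream.json()

# Run with `uvicorn shim:app --port 8080`, then point any Ollama-aware
# client at http://localhost:8080 instead of the real server.
```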

To improve this setup even more, I will likely switch to a #mixture_of_experts architecture soon. This project is a lot of fun and I can't wait to integrate it into my homelab.

#homelab #selfhosting #privacy #ai #llm #largelanguagemodels #coding #development

➴➴➴Æ🜔Ɲ.Ƈꭚ⍴𝔥єɼ👩🏻‍💻 AeonCypher@lgbtqia.space
2025-05-18

Okay, back-of-the-napkin math:
- There are probably 100 million sites and 1.5 billion pages worth indexing in a #search engine
- It takes about 1TB to #index 30 million pages.
- We only care about text on a page.

I define a page as worth indexing if:
- It is not a FAANG site
- It has at least one referrer (no DD Web)
- It's active

So, at 1 TB per 30 million pages, this means we need roughly 40 TB of fast storage to make a good index of the internet. That's not "runs locally" sized, but it is nonprofit sized.

My size assumptions are basically as follows:
- #URL
- #TFIDF information
- Text #Embeddings
- Snippet

We can store an index entry in about 30 kB per page. So in 40 TB we can store a full internet index. That's about $500 in storage.
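Spelled out (the $/TB figure is an assumption):

```python
# Back-of-napkin check of the index size and cost.
pages = 1.5e9                # pages worth indexing
entry = 30e3                 # bytes/page: URL + TF-IDF stats + embedding + snippet
print(pages * entry / 1e12)  # -> 45.0 TB, i.e. the ~40 TB ballpark above
print(45 * 12)               # -> 540, so ~$500 at an assumed ~$12/TB of HDD
```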

Access time becomes a problem. TF-IDF for the whole internet can easily fit in RAM. Even with #quantized embeddings, you can only fit about 2 million per GB of RAM.
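Where the 2 million per GB figure roughly comes from, assuming 512-dimensional vectors quantized to one byte per dimension (the width is my assumption):

```python
# Why ~2 million quantized embeddings per GB of RAM.
dims = 512                      # assumed embedding width
bytes_per_vector = dims * 1     # int8 quantization: 1 byte per dimension
print(1e9 // bytes_per_vector)  # -> 1,953,125 vectors per GB
```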

Assuming you had enough RAM, it could be fast: TF-IDF to get 100 million candidates, #FAISS to rank those, load snippets dynamically, potentially adjust rank by referrers, etc.
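As a toy sketch of that two-stage pipeline (scikit-learn for TF-IDF, FAISS for the rerank; the corpus, query, and random embeddings are stand-ins, not real data):

```python
# Toy two-stage retrieval: TF-IDF narrows the corpus to candidates,
# then FAISS ranks those candidates by embedding similarity.
import numpy as np
import faiss
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["quantized models on small GPUs", "indexing the whole web",
        "homelab GPU servers", "federated learning on phones"]
query = "building a web index"

# Stage 1: TF-IDF candidate generation (top-k by cosine similarity).
vec = TfidfVectorizer()
doc_tfidf = vec.fit_transform(docs)
scores = cosine_similarity(vec.transform([query]), doc_tfidf)[0]
candidates = np.argsort(scores)[::-1][:2]   # pretend: 1.5B pages -> top-k

# Stage 2: FAISS over the candidates' embeddings (random stand-ins here).
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(docs), 512)).astype("float32")
index = faiss.IndexFlatIP(512)
index.add(embeddings[candidates])
query_emb = rng.standard_normal((1, 512)).astype("float32")
dist, idx = index.search(query_emb, 2)
print([docs[candidates[i]] for i in idx[0]])  # final ranked results
```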

Six 128 GB #Framework #desktops, each with 5 TB HDs (plus one Raspberry Pi to sort the final candidates from the six machines), is enough to replace #Google: six machines at 128 GB is 768 GB of RAM, which at 2 million quantized embeddings per GB covers the 1.5 billion pages. That's about $15k.

In two to three years this will be doable on a single machine for around $3k.

By the end of the decade it should run as an app on a powerful desktop.

Three years after that it can run on a #laptop.

Three years after that it can run on a #cellphone.

By #2040 it's a background process on your cellphone.

2023-08-30

'Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity', by Ali Kara, Naci Saldi, Serdar Yüksel.

jmlr.org/papers/v24/21-1457.ht

#quantization #quantized #mdps

New Submissions to TMLR tmlrsub@sigmoid.social
2023-04-27

Quantization Robust Federated Learning for Efficient Inference on Heterogeneous Devices

openreview.net/forum?id=lvevdX

#quantization #quantized #federated
