Today at #DIFF @lk108 is trying to build @deltatouch on my Pocket
On my way to #DIFF in Freiburg for a week of hacker time!
Tomorrow our ten-day #DIFF gathering in Freiburg in the Black Forest starts. Around 40 people from various projects, contexts, and countries are joining ... And would you believe it? There is no date/clock scheduling to speak of, no hour slots, etc.
On Friday the 13th we might do live-streaming sessions with talks and demos, but the focus is generally on in-person hanging out and work, in a relaxed manner. It's still possible to join if you are of the spontaneous kind: https://delta.chat/en/2025-05-12-diff-invitation
Recording Census Data with Wikidata: The Nigerian Community's Contribution Experience
The Igbo Wikimedians User Group edited Nigerian population data, adding the 1991 and 2006 census figures to Wikidata so they can be compared and analyzed.
In Wikidata, population data can be recorded with three properties: population (P1082), male population (P1540), and female population (P1539). Because Wikidata is machine-readable, this turns census figures into data that is much easier to analyze and research.
Participants in the Taiwanese community also regularly update the census data for each administrative district, and fellow data-loving community members are welcome to help update and maintain it.
Original Diff article, "Documenting Nigerian Census Data on Wikidata: My Contributor Experience": https://diff.wikimedia.org/2025/06/03/documenting-nigerian-census-data-on-wikidata-my-contributor-experience/
DiffX – Next-Generation Extensible Diff Format
#HackerNews #DiffX #Extensible #Diff #Format #NextGen #Diff #Tools #Tech #Innovation
SQL Workbench – Republicans not welcome
https://www.sql-workbench.eu/
#ycombinator #sql #query #tool #analyzer #gui #jdbc #database #isql #viewer #frontend #java #dbms #oracle #postgres #h2database #firebirdsql #hsql #hsqldb #sqlplus #replacement #import #export #csv #unload #convert #insert #blob #clob #xml #etl #migrate #compare #diff #structure #table
@petarov I like Eskil - https://eskil.tcl-lang.org/home/doc/trunk/htdocs/index.html - source and binaries for windows/linux/macos available at https://eskil.tcl-lang.org/home/doc/trunk/htdocs/download.html .
#diff #merge #gui #sourcecontrol #programming
Compare Files Line by Line Using diff Command in Linux: A Beginner's Tutorial #diff #linuxcommands #linuxbasics #linuxhowto #linux
https://ostechnix.com/linux-diff-command-tutorial/
(As a side note on this topic: I'd like anyone who is very into the idea that "GNU/Linux/POSIX ecosystem software is so great because of the philosophy of the tool doing one thing only and doing it very well" to explain to me why diff is shot through with options that do things like output C preprocessor macros. 😉 )
#WinMerge 2.16.48.2 has been released (#ComparisonSoftware / #ComparisonTools / #DataComparison / #DiffViewer / #Diff / #DiffTool) https://winmerge.org/
New Git diff alternative just dropped! 🔥
🎸 **riff** — A diff filter highlighting which line parts have changed.
💯 Supports highlighting conflict markers, merge commits & more!
🦀 Written in Rust!
⭐ GitHub: https://github.com/walles/riff
#rustlang #git #diff #highlight #commandline #vcs #development #programming #terminal
@lemming The warning on your hotel bill means there was a small calculation difference (-0.67 euros) in the net sum of a tax, presumably due to rounding or something similar. Nothing serious, as far as you're concerned.
In any case, the warning should express exactly that.
Warning w-#DIFF
: This is a hint that a discrepancy was detected.
TAX_A
: The tax category used in the till booking. I assume A is the standard category.
On my bill from the #Intercity hotel, which I had booked for #Easterhegg, the following line appears. It belongs to the TSE information, like start, stop, signature, etc.:
Warning w-#DIFF -0.67 TAX[TAX_A].Net-computed
Can anyone explain what this warning means?
Just added difftastic to #guix on the #rust team branch. It's a really good 'structural diff' which can provide a more human-readable diff.
Thanks to the contributor for sending it. I've found it really helpful for looking at package changes as I've been updating it.
The smartest AI in the room? #diff #chatgpt #openwebui #rag
Retrieval-Augmented Generation (RAG) & OpenWebUI
If you're looking to build an always-on AI pipeline that spiders/scrapes, processes federated feeds, and performs market sentiment analysis, RAG and OpenWebUI are great tools to integrate into your workflow.
🔹 Retrieval-Augmented Generation (RAG)
RAG is a technique that enhances AI responses by retrieving relevant external data in real-time. Instead of relying solely on a trained model, RAG dynamically pulls the most up-to-date, contextually relevant info before generating a response.
✅ Why Use RAG?
Keeps AI responses fresh (e.g., integrating real-time web data, document stores, APIs).
Uses vector databases (e.g., FAISS, Chroma, Weaviate, Milvus) to store and retrieve relevant content.
Efficiently handles long-term memory without needing massive model retraining.
✅ How RAG Helps Your Project:
Web Scraping + RAG: Store indexed pages and retrieve the most relevant insights.
Federated Feeds + RAG: Organize decentralized data from multiple sources (RSS, Mastodon, real-time news).
Market Sentiment Analysis + RAG: Combine scraped financial news, social media posts, and reports to extract insights.
🚀 Tools to Build RAG Pipelines:
LangChain – Framework for LLM + RAG automation.
LlamaIndex (GPT Index) – Connects LLMs to external knowledge bases.
FAISS / ChromaDB – Vector DBs to store and retrieve relevant chunks of data.
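The retrieval step these tools automate can be sketched in a few lines. This is a toy illustration with made-up documents: a bag-of-words term-frequency vector stands in for a real embedding model, and a Python list stands in for a vector DB like FAISS or ChromaDB.

```python
from collections import Counter
import math

# Toy document store standing in for a vector DB such as FAISS or ChromaDB.
DOCS = [
    "The central bank raised interest rates by 25 basis points.",
    "A new open-source LLM was released under the Apache license.",
    "Retail sales fell sharply amid weak consumer sentiment.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank stored documents by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Prepend the retrieved context to the query — the 'augmented' in RAG."""
    context = "\n".join(retrieve(query, k=1))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What did the central bank do with interest rates?"))
```

In a real pipeline, `embed` would call an embedding model, `DOCS` would live in a vector index, and the prompt would go to an LLM; the shape of the flow is the same.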
🔹 OpenWebUI
OpenWebUI is a powerful open-source interface for managing AI models locally and remotely.
✅ Why Use OpenWebUI?
Acts as a self-hosted ChatGPT-style UI for managing AI.
Works with open-source models (Llama, Mistral, Mixtral, etc.).
Supports API integration, making it easy to use with your own data sources.
Enables fine-tuned control over how AI processes and responds to incoming queries.
💡 Use Case for Your Setup:
Run OpenWebUI on your VPS to interact with AI models processing real-time web feeds.
Automate RAG-based queries (e.g., “Summarize top financial news in the last hour”).
Host custom fine-tuned LLMs that work with your scraped/federated data.
🔹 Other Open-Source AI Enablers
Here are some key tools to power your continuous AI scraping & analysis pipeline:
✅ Web Scraping & Spidering
Scrapy – Python-based web crawling framework.
BeautifulSoup & Selenium – HTML parsing & JS rendering.
trafilatura – Extracts structured text from raw web pages.
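To show what the extraction step does, here is a minimal sketch using only the standard library's html.parser as a stand-in for BeautifulSoup or trafilatura: it pulls visible text out of a page while skipping script and style contents (the HTML snippet is invented for illustration).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

page = """<html><head><style>body{color:red}</style></head>
<body><h1>Market News</h1><script>alert(1)</script>
<p>Stocks rallied on Friday.</p></body></html>"""
print(extract_text(page))
```

The real libraries add robustness (broken markup, encodings, boilerplate removal, JS rendering via Selenium), but the core job is the same: markup in, clean text out, ready for chunking and embedding.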
✅ Federated Feeds & Decentralized Data
RSSHub – Open-source RSS generator for sites without feeds.
Fediverse APIs – Scrape data from Mastodon, Lemmy, etc.
nitter/rss-bridge – Scrapes Twitter/X, YouTube, and other social platforms.
✅ Market Sentiment Analysis
FinBERT – NLP model for financial text sentiment analysis.
LlamaIndex + Alpha Vantage/NewsAPI – Fetch & analyze financial news.
Twitter/X & Reddit sentiment pipelines (with OpenAI/Mistral fine-tuning).
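As a rough illustration of what a sentiment stage computes, here is a toy lexicon-based scorer; the word lists are invented, and a model like FinBERT does this job far more accurately, but the input/output contract is the same: text in, polarity score out.

```python
# Invented cue-word lexicons; a real pipeline would use a trained model.
POSITIVE = {"rally", "rallied", "gain", "gains", "beat", "upgrade", "surge"}
NEGATIVE = {"fell", "loss", "losses", "miss", "downgrade", "plunge", "default"}

def sentiment(text: str) -> float:
    """Score in [-1, 1]: (positive cues - negative cues) / total cues."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

print(sentiment("Shares rallied after the earnings beat."))   # 1.0
print(sentiment("The stock fell on a credit downgrade."))     # -1.0
```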
✅ Storage & Retrieval
FAISS / ChromaDB – For storing vector embeddings of scraped text.
DuckDB / SQLite – For structured storage of scraped + analyzed data.
Redis / Weaviate – Fast key-value storage for real-time analysis.
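For the structured-storage side, a sketch of what the SQLite/DuckDB layer might look like: a table of scraped pages keyed by a URL hash so repeat fetches are deduplicated (schema and column names are my own invention for illustration).

```python
import sqlite3
import hashlib

# In-memory DB for the sketch; on a storage VPS this would be a file on NVMe.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scraped (
        url_hash TEXT PRIMARY KEY,   -- dedupe key
        url      TEXT NOT NULL,
        fetched  TEXT NOT NULL,      -- ISO timestamp
        body     TEXT NOT NULL
    )
""")

def store(url: str, fetched: str, body: str) -> bool:
    """Insert a scraped page; returns False if the URL was already stored."""
    h = hashlib.sha256(url.encode()).hexdigest()
    try:
        conn.execute("INSERT INTO scraped VALUES (?, ?, ?, ?)",
                     (h, url, fetched, body))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False

first = store("https://example.com/news", "2025-06-03T12:00:00", "Stocks rallied.")
second = store("https://example.com/news", "2025-06-03T13:00:00", "duplicate")
print(first, second)  # True False — the PRIMARY KEY rejects the repeat fetch
```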
✅ LLMs & Inference APIs
Ollama – Locally run models like Mistral, Llama, Gemma, etc.
vLLM / TGI (Text Generation Inference) – Optimized inference engine for LLMs.
LoRA + Fine-tuning – Improve responses based on your scraped dataset.
🔹 How Much Storage Do You Need?
Storage depends on how much data you buffer for RAG and sentiment analysis.
📌 Baseline Estimate:
| Data Type | Storage Estimate (Per Month) |
| --- | --- |
| Text-based Web Scraping | 10 GB – 100 GB |
| Federated Feeds (RSS/Mastodon) | 5 GB – 50 GB |
| Financial News & Social Media Feeds | 50 GB – 500 GB |
| Vector Embeddings (FAISS/ChromaDB) | 20 GB – 200 GB |
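Summing the low and high ends of those ranges shows where the 500 GB – 1 TB starting recommendation comes from:

```python
# Monthly storage ranges (GB) from the baseline estimate table.
ranges = {
    "web_scraping":    (10, 100),
    "federated_feeds": (5, 50),
    "financial_feeds": (50, 500),
    "embeddings":      (20, 200),
}

low = sum(lo for lo, _ in ranges.values())    # 85 GB/month
high = sum(hi for _, hi in ranges.values())   # 850 GB/month
print(f"{low}-{high} GB per month")
```

Even the worst case fits in ~1 TB for the first month; retention policy and cold-storage archiving determine how fast you grow beyond that.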
💡 Optimal Setup:
A minimum of 500GB – 1TB of SSD/NVMe storage for a scalable pipeline.
A distributed setup with multiple VPS handling different tasks (crawling, inference, storage, retrieval).
Cold storage/archive VPS for storing older scraped data efficiently.
🔹 Can You Run This Continuously on a Few VPS?
Yes! A multi-VPS setup is a great way to distribute the workload.
🔹 Recommended Architecture:
1️⃣ Scraping VPS → Runs Scrapy/Selenium to collect web & social media feeds.
2️⃣ Storage VPS → Stores scraped text, embeddings, and structured data (FAISS/Chroma).
3️⃣ AI Inference VPS → Hosts LLMs (OpenWebUI, Ollama) to analyze market sentiment in real-time.
4️⃣ Dashboard VPS → Uses OpenWebUI + custom scripts to visualize and interact with processed data.
⚡ Performance Considerations:
Dedicated GPU VPS (or local GPU) for inference speedup.
High-memory VPS (16GB+ RAM) for running vector DBs efficiently.
Fast NVMe SSDs for indexing and retrieval speeds.
🔹 Summary
🚀 With a few VPS instances, you can run an always-on AI pipeline that scrapes, indexes, and analyzes web data using RAG, OpenWebUI, and federated sources. Storage needs start at ~500GB but scale based on data volume.
💡 Next Steps?
Decide on which LLM to run (e.g., Llama, Mixtral, Mistral fine-tuned for finance/news).
Choose a vector database for RAG-based retrieval.
Set up separate VPS roles for scraping, storage, inference, and dashboarding.
Want help setting up a pipeline? Let me know how deep you want to go! 🔥
The .gitconfig settings recommended by Git's own developers
Saw this interesting thread on Hacker News: "How Core Git Developers Configure Git (gitbutler.com)"; the original article is "How Core Git Developers Configure Git".
I had already tuned most of the things mentioned there myself, but there were a few I hadn't set before, like tag.sort with version:refnam
#Computer #Murmuring #Software #algorithm #config #diff #git #gitconfig #sort #tag
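The post is cut off mid-word, but the setting it names is git's documented version-sort key, version:refname; in a .gitconfig it looks like this:

```ini
# Sort `git tag` output by version number instead of alphabetically,
# so v10.0 comes after v9.0 rather than after v1.0.
[tag]
	sort = version:refname
```

Equivalently from the command line: `git config --global tag.sort version:refname`.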