#DIFF

wakest ⁂liaizon@wake.st
2025-06-15

Today at #DIFF @lk108 is trying to build @deltatouch on my Pocket

A photo of lk108's hands on an MNT Pocket
wakest ⁂liaizon@wake.st
2025-06-10

On my way to #DIFF in Freiburg for a week of hacker time!

2025-06-06

Tomorrow our ten-day #DIFF gathering in Freiburg in the Black Forest starts. Around 40 people from various projects, contexts and countries are joining ... And would you believe it? There is no date/clock scheduling to speak of, no hour slots, etc.

On Friday the 13th we might do live-streaming sessions with talks and demos, but the focus is generally on in-person hanging out and work, in a relaxed manner. It's still possible to join if you are of the spontaneous kind: delta.chat/en/2025-05-12-diff-

Wikidata Taiwan 臺灣維基數據社群 (wikidatatw@liker.social)
2025-06-06

Recording census information with Wikidata: the Nigerian community's contribution experience

The Igbo Wikimedians User Group has been editing Nigerian population data, adding the 1991 and 2006 census figures to Wikidata and comparing and analyzing them.
In Wikidata, population data can be recorded through three properties: population (P1082), male population (P1540), and female population (P1539). Because Wikidata is machine-readable, population data can be turned into data that is much easier to analyze and study.
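As a rough illustration of how these machine-readable properties can be queried, here is a minimal Python sketch (not from the original post) that asks the Wikidata SPARQL endpoint for the population (P1082) of the item Q61994580 mentioned in the image caption below; the endpoint usage and result handling are illustrative assumptions, not part of the community's workflow.

```python
# Minimal sketch: fetch the population (P1082) of Q61994580 from the
# Wikidata SPARQL endpoint. Item and property IDs come from the post;
# everything else is illustrative.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?population WHERE {
  wd:Q61994580 wdt:P1082 ?population .
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "population-example/0.1"},  # WDQS asks for a UA string
    timeout=30,
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print("population:", row["population"]["value"])
```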

Participants in the Taiwanese community also regularly update the census data for each administrative district, and fellow members of data-loving communities are welcome to help update and maintain them.

Diff original article, "Documenting Nigerian Census Data on Wikidata: My Contributor Experience": diff.wikimedia.org/2025/06/03/

#Wikidata #維基資料 #維基數據
#奈及利亞 #Nigerian #Igbo
#人口資料 #diff

Image captions: example of editing Nigerian population data; example of editing population data in Wikidata: Sanxia Village 三峽里 (Q61994580)
Hacker News (h4ckernews)
2025-06-04
2025-05-22

dpkg: warning: 'diff' not found in PATH or not executable #updates #dpkg #diff

askubuntu.com/q/1549226/612

Updated in haikuports for Haiku: KDiff3 1.12.3, available for 32-bit and 64-bit, enjoy! :)

#HaikuOS #KDE #Qt6 #KDiff3 #diff #compare

KDiff3 1.12.3 running on Haiku R1B5 64bit.
2025-05-01

Compare Files Line by Line Using diff Command in Linux: A Beginner's Tutorial #diff #linuxcommands #linuxbasics #linuxhowto #linux
ostechnix.com/linux-diff-comma
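The linked tutorial covers the GNU diff command itself; purely as an illustrative aside (not from the post), the same line-by-line, unified-diff style comparison can be reproduced with Python's standard-library difflib. The file names below are hypothetical placeholders.

```python
# Minimal sketch of a line-by-line comparison, similar in spirit to `diff -u`.
# "old.txt" and "new.txt" are placeholder file names.
import difflib
from pathlib import Path

old_lines = Path("old.txt").read_text().splitlines(keepends=True)
new_lines = Path("new.txt").read_text().splitlines(keepends=True)

for line in difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old.txt", tofile="new.txt"):
    print(line, end="")
```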

2025-04-29

(As a side note on this topic: I'd like anyone who is very into the idea that "GNU/Linux/POSIX ecosystem software is so great because of the philosophy of the tool doing one thing only and doing it very well" to explain to me why diff is shot through with options that do things like output C preprocessor macros. 😉 )

#linux #diff #gnu

Neustradamus :xmpp: :linux: (neustradamus)
2025-04-29
Orhun Parmaksız 👾 (orhun@fosstodon.org)
2025-04-29

New Git diff alternative just dropped! 🔥

🎸 **riff** — A diff filter highlighting which line parts have changed.

💯 Supports highlighting conflict markers, merge commits & more!

🦀 Written in Rust!

⭐ GitHub: github.com/walles/riff

#rustlang #git #diff #highlight #commandline #vcs #development #programming #terminal

Ramses Revengeday :cv_purple: :revengeday: (revengeday@corteximplant.com)
2025-04-26

@lemming The warning on your hotel bill means there was a small calculation difference (-0.67 euros) in the net amount of one tax, presumably due to rounding or something like that. Nothing wild, as far as you are concerned.

In any case, that is exactly what the warning is supposed to express.

Warning w-#DIFF: This is a hint that a difference was detected.

TAX_A: the tax category used in the cash-register booking. I assume A is the default category.

2025-04-26

On my bill from the #Intercity hotel, which I had booked via the #Easterhegg, the following line appears. It belongs to the TSE information, like start, stop, signature, etc.:

Warning w-#DIFF -0.67 TAX[TAX_A].Net-computed

Can anyone explain what this warning means?

2025-04-11

If you were going to send a #patch to me... what command would you use to make the #diff?

2025-03-28

Just added difftastic on the team branch. It's a really good "structural diff" tool that can provide a more human-readable diff.

Thanks to the contributor for sending it. I've found it really helpful for looking at package changes as I've been updating it.

difftastic.wilfred.me.uk/

2025-03-12

The smartest AI in the room? #diff #chatgpt #openwebui #rag

Retrieval-Augmented Generation (RAG) & OpenWebUI

If you're looking to build an always-on AI pipeline that spiders/scrapes, processes federated feeds, and performs market sentiment analysis, RAG and OpenWebUI are great tools to integrate into your workflow.
🔹 Retrieval-Augmented Generation (RAG)

RAG is a technique that enhances AI responses by retrieving relevant external data in real-time. Instead of relying solely on a trained model, RAG dynamically pulls the most up-to-date, contextually relevant info before generating a response.

✅ Why Use RAG?

Keeps AI responses fresh (e.g., integrating real-time web data, document stores, APIs).
Uses vector databases (e.g., FAISS, Chroma, Weaviate, Milvus) to store and retrieve relevant content.
Efficiently handles long-term memory without needing massive model retraining.

✅ How RAG Helps Your Project:

Web Scraping + RAG: Store indexed pages and retrieve the most relevant insights.
Federated Feeds + RAG: Organize decentralized data from multiple sources (RSS, Mastodon, real-time news).
Market Sentiment Analysis + RAG: Combine scraped financial news, social media posts, and reports to extract insights.

🚀 Tools to Build RAG Pipelines:

LangChain – Framework for LLM + RAG automation.
LlamaIndex (GPT Index) – Connects LLMs to external knowledge bases.
FAISS / ChromaDB – Vector DBs to store and retrieve relevant chunks of data.
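To make the retrieve-then-generate pattern concrete, here is a minimal sketch (not from the post). A toy bag-of-words "embedding" and an in-memory list stand in for a real embedding model and vector DB such as FAISS or Chroma; all names and the sample corpus are illustrative.

```python
# Minimal RAG sketch: embed a query, retrieve the closest stored chunks,
# and prepend them to the prompt that would be sent to an LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Vector store": pairs of (embedding, original chunk).
corpus = [
    "ACME shares rose 4% after strong quarterly earnings.",
    "New open-source LLM released under Apache 2.0 license.",
    "Central bank holds interest rates steady this quarter.",
]
store = [(embed(chunk), chunk) for chunk in corpus]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

query = "What happened with interest rates?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # This prompt would then be passed to the LLM.
```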

🔹 OpenWebUI

OpenWebUI is a powerful open-source interface for managing AI models locally and remotely.

✅ Why Use OpenWebUI?

Acts as a self-hosted ChatGPT-style UI for managing AI.
Works with open-source models (LLama, Mistral, Mixtral, etc.).
Supports API integration, making it easy to use with your own data sources.
Enables fine-tuned control over how AI processes and responds to incoming queries.

💡 Use Case for Your Setup:

Run OpenWebUI on your VPS to interact with AI models processing real-time web feeds.
Automate RAG-based queries (e.g., “Summarize top financial news in the last hour”).
Host custom fine-tuned LLMs that work with your scraped/federated data.
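As a sketch of the "automate RAG-based queries" use case above, the snippet below sends a prompt to a self-hosted, OpenAI-compatible chat endpoint. The base URL, port, path, model name, and API key are all assumptions; check how your own OpenWebUI deployment exposes its API before relying on them.

```python
# Sketch: send a RAG-assembled prompt to a self-hosted, OpenAI-compatible
# chat endpoint. URL, path, model, and key are assumptions for illustration.
import requests

BASE_URL = "http://localhost:3000"          # assumed OpenWebUI address
API_KEY = "sk-local-example"                # assumed local API key

payload = {
    "model": "mistral",                     # assumed model name
    "messages": [
        {"role": "system", "content": "You summarize financial news."},
        {"role": "user", "content": "Summarize top financial news in the last hour."},
    ],
}

resp = requests.post(
    f"{BASE_URL}/api/chat/completions",     # verify the path on your install
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```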

🔹 Other Open-Source AI Enablers

Here are some key tools to power your continuous AI scraping & analysis pipeline:

✅ Web Scraping & Spidering

Scrapy – Python-based web crawling framework.
BeautifulSoup & Selenium – HTML parsing & JS rendering.
trafilatura – Extracts structured text from raw web pages.
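A minimal scraping sketch using requests and BeautifulSoup from the list above; the URL is a placeholder, and a real crawler would add robots.txt handling, rate limiting, and error handling.

```python
# Sketch: fetch a page and extract its readable text.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/article"         # placeholder URL
html = requests.get(url, timeout=30).text

soup = BeautifulSoup(html, "html.parser")
title = soup.title.string if soup.title else ""
text = " ".join(soup.get_text(separator=" ").split())

print(title)
print(text[:500])                           # first 500 characters of extracted text
```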

✅ Federated Feeds & Decentralized Data

RSSHub – Open-source RSS generator for sites without feeds.
Fediverse APIs – Scrape data from Mastodon, Lemmy, etc.
nitter/rss-bridge – Scrapes Twitter/X, YouTube, and other social platforms.
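For the Fediverse side, here is a small sketch pulling recent public posts from a Mastodon instance's public timeline endpoint. The instance URL and limit are examples; authenticated endpoints and other Fediverse software (Lemmy, etc.) have their own APIs.

```python
# Sketch: fetch recent public posts from a Mastodon instance.
import requests

instance = "https://mastodon.social"        # example instance
resp = requests.get(f"{instance}/api/v1/timelines/public",
                    params={"limit": 5}, timeout=30)
resp.raise_for_status()

for status in resp.json():
    print(status["created_at"], status["account"]["acct"])
    print(status["content"][:200])          # post content is returned as HTML
    print("---")
```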

✅ Market Sentiment Analysis

FinBERT – NLP model for financial text sentiment analysis.
LlamaIndex + Alpha Vantage/NewsAPI – Fetch & analyze financial news.
Twitter/X & Reddit sentiment pipelines (with OpenAI/Mistral fine-tuning).
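A sentiment-scoring sketch using FinBERT via the transformers pipeline; the model id "ProsusAI/finbert" is the commonly published FinBERT checkpoint, and the headlines are made up for illustration.

```python
# Sketch: score headline sentiment with FinBERT (labels: positive/negative/neutral).
from transformers import pipeline

classifier = pipeline("text-classification", model="ProsusAI/finbert")

headlines = [
    "ACME posts record quarterly profit, raises guidance.",
    "Regulators open investigation into ACME accounting practices.",
]

for headline, result in zip(headlines, classifier(headlines)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {headline}")
```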

✅ Storage & Retrieval

FAISS / ChromaDB – For storing vector embeddings of scraped text.
DuckDB / SQLite – For structured storage of scraped + analyzed data.
Redis / Weaviate – Fast key-value storage for real-time analysis.
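For the structured-storage piece, a minimal SQLite sketch (standard library); table and column names are illustrative, and DuckDB offers a near-identical workflow.

```python
# Sketch: structured storage of scraped items in SQLite.
import sqlite3

conn = sqlite3.connect("scraped.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id        INTEGER PRIMARY KEY,
        source    TEXT,
        fetched   TEXT,
        text      TEXT,
        sentiment TEXT
    )
""")
conn.execute(
    "INSERT INTO documents (source, fetched, text, sentiment) VALUES (?, ?, ?, ?)",
    ("https://example.com/article", "2025-03-12T00:00:00Z", "Example body text", "neutral"),
)
conn.commit()

for row in conn.execute("SELECT source, sentiment FROM documents"):
    print(row)
conn.close()
```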

✅ LLMs & Inference APIs

Ollama – Locally run models like Mistral, Llama, Gemma, etc.
vLLM / TGI (Text Generation Inference) – Optimized inference engine for LLMs.
LoRA + Fine-tuning – Improve responses based on your scraped dataset.
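And for local inference, a sketch that queries a locally running Ollama server over its REST API; host, port, and model name are assumptions to adjust to your own setup.

```python
# Sketch: ask a local Ollama server for a one-off completion.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # default Ollama address (assumed)
    json={
        "model": "mistral",                   # assumed locally pulled model
        "prompt": "Summarize today's market sentiment in two sentences.",
        "stream": False,                      # return a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```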

🔹 How Much Storage Do You Need?

Storage depends on how much data you buffer for RAG and sentiment analysis.

📌 Baseline Estimate (per month):

Text-based web scraping: 10GB – 100GB
Federated feeds (RSS/Mastodon): 5GB – 50GB
Financial news & social media feeds: 50GB – 500GB
Vector embeddings (FAISS/ChromaDB): 20GB – 200GB

💡 Optimal Setup:

A minimum of 500GB – 1TB of SSD/NVMe storage for a scalable pipeline.
A distributed setup with multiple VPS handling different tasks (crawling, inference, storage, retrieval).
Cold storage/archive VPS for storing older scraped data efficiently.

🔹 Can You Run This Continuously on a Few VPS?

Yes! A multi-VPS setup is a great way to distribute the workload.

🔹 Recommended Architecture:
1️⃣ Scraping VPS → Runs Scrapy/Selenium to collect web & social media feeds.
2️⃣ Storage VPS → Stores scraped text, embeddings, and structured data (FAISS/Chroma).
3️⃣ AI Inference VPS → Hosts LLMs (OpenWebUI, Ollama) to analyze market sentiment in real-time.
4️⃣ Dashboard VPS → Uses OpenWebUI + custom scripts to visualize and interact with processed data.

⚡ Performance Considerations:

Dedicated GPU VPS (or local GPU) for inference speedup.
High-memory VPS (16GB+ RAM) for running vector DBs efficiently.
Fast NVMe SSDs for indexing and retrieval speeds.

🔹 Summary

🚀 With a few VPS instances, you can run an always-on AI pipeline that scrapes, indexes, and analyzes web data using RAG, OpenWebUI, and federated sources. Storage needs start at ~500GB but scale based on data volume.

💡 Next Steps?

Decide on which LLM to run (e.g., Llama, Mixtral, Mistral fine-tuned for finance/news).
Choose a vector database for RAG-based retrieval.
Set up separate VPS roles for scraping, storage, inference, and dashboarding.

Want help setting up a pipeline? Let me know how deep you want to go! 🔥

Gea-Suan Lin (gslin@abpe.org)
2025-02-26

.gitconfig settings recommended by the people who develop Git

Saw this interesting topic on Hacker News: "How Core Git Developers Configure Git (gitbutler.com)"; the original article is "How Core Git Developers Configure Git".

I have already tuned most of the things mentioned there myself, but there were still a few I had not set before, such as using version:refname for tag.sort.

blog.gslin.org/archives/2025/0

#Computer #Murmuring #Software #algorithm #config #diff #git #gitconfig #sort #tag
