#duckdb

2025-12-21

"We propose to re-think data management system parser design to create modern, extensible #parsers, which allow a dynamic configuration of the accepted syntax at run-time, for example to allow syntax extensions, new statements, or to add entirely new query languages."

#duckdb

duckdb.org/2024/11/22/runtime-

2025-12-20

New blog post on #tiledb and #snowflake integration and what is possible for #biomedical #research and what I wish I could do in #duckdb

brianrepko.github.io/blog/post

Scott Galloway scottgal@hachyderm.io
2025-12-20

Quick research project: using DuckDB for deterministic data profiling and drift, with LLMs only doing interpretation - not math. All local, no data in context.
If that sounds interesting to any data/ML folks (especially Ollama users), I’m open to feedback.
mostlylucid.net/blog/datasumma
github.com/scottgal/mostlyluci

Worth continuing with, or is it doing what some other tool already does quicker and better? 🤓
#duckdb #ollama #dataengineering #analytics #privacybydesign #opensource

2025-12-20

The reason I made a sample dataset was that querying the GeoPackage file from DuckDB felt a bit sluggish. The query in the image took 2.56 s on the GeoPackage file. I have now saved the entire dataset to a Parquet file (sorted by county and municipality, compressed with ZSTD). The same query takes 0.0140 s.

Also the Parquet file is 141 MiB compared to 1.18 GiB for the GeoPackage file. The Parquet file is smaller than the original zip file with the GeoPackage file.
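
The conversion described above can be sketched in DuckDB SQL roughly as follows. This assumes the spatial extension is available; the file names and the `county` and `municipality` column names are placeholders, not the actual names in the Skogsstyrelsen dataset.

```sql
-- Assumption: the spatial extension can be installed and loaded.
INSTALL spatial;
LOAD spatial;

-- Read the GeoPackage, sort the rows, and write ZSTD-compressed Parquet.
-- 'dataset.gpkg', 'dataset.parquet', county and municipality are hypothetical names.
COPY (
    SELECT *
    FROM ST_Read('dataset.gpkg')
    ORDER BY county, municipality
) TO 'dataset.parquet' (FORMAT parquet, COMPRESSION zstd);
```

Sorting on the columns that queries filter by tends to help, because Parquet row-group statistics then let the reader skip data that cannot match.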

#DuckDB #GeoPackage #Skogsstyrelsen #Parquet

Scott Galloway scottgal@hachyderm.io
2025-12-19

New prototype (v0.1.0): DataSummarizer

A statistics-first data profiling CLI built on DuckDB.

Computes deterministic profiles (nulls, skew, leakage, drivers, outliers) locally, then optionally lets a local LLM narrate or generate SQL — never the other way round.
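
As a rough illustration of the statistics-first idea (not DataSummarizer's actual implementation), DuckDB can compute this kind of profile directly in SQL. The file name `data.csv` and the column `amount` are hypothetical.

```sql
-- Per-column overview: types, min/max, approximate distinct counts, null percentage.
SUMMARIZE SELECT * FROM read_csv('data.csv');

-- Explicit null count and skewness for one numeric column.
SELECT
    count(*) - count(amount) AS null_count,
    skewness(amount)         AS skew
FROM read_csv('data.csv');
```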

Early days, but already useful for first-pass analysis, data contracts, drift checks, and segment comparison.

LLMs reason. Databases compute.

mostlylucid.net/blog/datasumma

github.com/scottgal/mostlyluci

#duckdb #dataengineering #csharp #llm #datascience

2025-12-18

#duckdb spatial folks - am I being an idiot or is it really not possible to save a parquet file with an SRS other than WGS84? I am assuming (hoping) I'm wrong about that because it's quite annoying... maybe #rspatial peeps might have some idea also?

2025-12-18

For the past few days I've been trying to figure out how to use #duckdb COLUMNS(...) to do an ANY reduce on multiple bool columns, instead of the default which is ALL.

I think there's a syntax proposal but not merged in yet? Anyways, in the meantime, I've discovered that `GREATEST(*COLUMNS(...))` works to this effect. Only took me a whole week to figure this out 😅

(Fellow #rstats nerds will recognize these ops as if_any and if_all!)

-- Create test table with mixed conditions
CREATE TABLE test_data AS
SELECT 
  id,
  condition1,
  condition2,
  condition3
FROM (
  VALUES
    (1, true, true, true),      -- All true
    (2, true, false, true),     -- Mixed
    (3, false, true, false),    -- Mixed
    (4, false, false, false),   -- All false
    (5, false, false, false)    -- All false
) AS t(id, condition1, condition2, condition3);

FROM test_data;

-- Across ALL
FROM test_data WHERE COLUMNS('condition\d');

-- Across ANY
FROM test_data WHERE GREATEST(*COLUMNS('condition\d'));

2025-12-17

Big week for FOSS! 🎉 FreeBSD 15.0 dropped Dec 2 with major improvements to pkgbase and security. Meanwhile, NVIDIA open-sourced NeMo Data Designer under Apache 2.0 - awesome for AI development! DuckDB 1.4.3 LTS also landed Dec 9. The community keeps pushing forward! #FreeBSD15 #NeMo #DuckDB #FOSS #opensource

2025-12-17

Learn how to conduct geospatial analysis to determine which UK city has the safest drivers. Thomas Reid's new article uses GeoPandas to process boundary data and #DuckDB to query 5 years of traffic accident statistics.

towardsdatascience.com/geospat

2025-12-16

Deep dive on pruning: Dewey Dunnington’s latest blog post dives deep into how pruning, the selective reading of relevant data, makes #GeoParquet blazing fast in both local and #cloudnative contexts. Featuring hands-on comparisons across #SedonaDB, #DuckDB, #GeoPandas,...
spatialists.ch/posts/2025/12/1 #GIS #GISchat #geospatial #SwissGIS

Scott Galloway scottgal@hachyderm.io
2025-12-16

Ever wonder how tools analyse gigabytes of data, while you’re fighting to squeeze a few KB of code into an Ollama context window?

The trick isn’t bigger models.
It’s not giving the data to the LLM at all.

LLM → SQL → DuckDB → CSV-on-disk
Schema + samples only. Zero data leakage. Sub-100ms queries on 500MB+ files.

Local. Private. Scales past RAM.
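
The pipeline above might look roughly like this on the DuckDB side. This is a minimal sketch, not the code from the linked walkthrough; the file `sales.csv` and the columns `region` and `amount` are hypothetical.

```sql
-- Step 1: only the schema (column names and types) goes into the prompt.
DESCRIBE SELECT * FROM read_csv('sales.csv');

-- Step 2: plus a handful of sample rows.
SELECT * FROM read_csv('sales.csv') LIMIT 5;

-- Step 3: the SQL the model writes runs against the file on disk,
-- so the full data never enters the context window.
SELECT region, sum(amount) AS total
FROM read_csv('sales.csv')
GROUP BY region
ORDER BY total DESC;
```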

Full C# walkthrough:
mostlylucid.net/blog/analysing

#llm #csharp #duckdb #ollama #localllama

2025-12-13

So many to choose from: In his latest post, Mark Litwintschik compares a range of global administrative #boundary datasets, from #OpenStreetMap to #NaturalEarth, assessing geometric #accuracy, data #completeness, and information content. The analysis, powered by #DuckDB...
spatialists.ch/posts/2025/12/1 #GIS #GISchat #geospatial #SwissGIS

Bart Louwers bart@floss.social
2025-12-12

I have published #DuckDB #NodeJS bindings for #FreeBSD. github.com/duckdb/duckdb-node-

You know, in case I ever need it (needed an excuse to play with sourcehut's CI).

2025-12-12

From the @DSLC :rstats:​chives:

:rstats: :python: :julia: DuckDB in Action: Club meetings & An introduction to DuckDB youtu.be/Cdi1lPdMfG8 #RStats #PyData #JuliaLang #DuckDB

:rstats: pharmaverse Examples: Demographic Table youtu.be/Vgpej0_C3lw #RStats

:rstats: R for Data Science: youtu.be/HicrANUfnj0 #RStats

Visit dslc.video for hours of new #DataScience videos every week!

2025-12-11

Recent @DSLC club meetings:

🦆 DuckDB in Action: Executing SQL Queries youtu.be/EHgXpnrJAB4 #RStats #PyData #JuliaLang #duckdb

From the @DSLC :rstats:​chives:

:rstats: Advanced R: Functionals youtu.be/tYqFMtmhmiI #RStats

:rstats: :javascript: JS4R: Managing JavaScript youtu.be/NyBUNAqENhg #RStats #JavaScript

:rstats: SMLTAR: Word Embeddings Part 1 youtu.be/2ZxXAUivrVc #RStats

Visit dslc.video for hours of new #DataScience videos every week!

François Michonneau fmic_@hachyderm.io
2025-12-11

The Advent of SQL has started 🎄

This year it's hosted by "Database School" and requires signing up: databaseschool.com/series/adve

I'll be solving the challenges using #DuckDB and my solutions for the first two days are up on my website: francoismichonneau.net/2025/12

:rss: Qiita - Popular Articles qiita@rss-mstdn.studiofreesia.com
2025-12-11

<Monthly> Building a Qiita article metrics dashboard with S3 + DuckDB + Grafana, vol. 3: Querying and visualizing the data [Final]
qiita.com/melknzw/items/a71b18

#qiita #AWS #S3 #grafana #duckdb #AmazonQ

Mohit Sindhwani onghu@ruby.social
2025-12-11

I'm a tad nervous to be presenting an introductory talk about #DuckDB at the RubySG Dec meetup today (9 Dec)! But I have a good 11-million-record synthetic dataset to show some of the features!

Register: luma.com/0em8ixuy

#Programming #Ruby #Singapore

2025-12-10

Using the #duckdb COLUMNS(<regex>) pattern with capture groups to compute and rename multiple columns all in one line of SELECT is *sooo slick*

Been on a DuckDB learning spree lately and really proud of myself for figuring this out by just reading the docs with no AI help (they all come up with more verbose solutions than this and look gross)

Screenshot showing a query that takes all *_length_mm columns in penguins dataset to derive new *_length_avg_cm columns which take the average values of each column and multiply by 10. With DuckDB, that can be expressed in 1 line of SELECT: `AVG(COLUMNS('(.*_length)_mm')) * 10 AS '\1_avg_cm'`
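
For readers without the screenshot, a minimal sketch of the pattern follows. The table `penguins_demo` and its values are made up for illustration. The sketch divides by 10 because 10 mm is 1 cm; adjust the factor to whatever conversion you need.

```sql
-- Hypothetical stand-in for the penguins dataset.
CREATE TABLE penguins_demo AS
SELECT * FROM (
  VALUES
    (39.1, 181.0),
    (40.3, 195.0)
) AS t(bill_length_mm, flipper_length_mm);

-- COLUMNS(<regex>) selects every column matching the pattern; the capture
-- group is reused in the alias as \1, so each result column is renamed
-- from *_length_mm to *_length_avg_cm.
SELECT AVG(COLUMNS('(.*_length)_mm')) / 10 AS '\1_avg_cm'
FROM penguins_demo;
```

The result has one column per matched input column, here `bill_length_avg_cm` and `flipper_length_avg_cm`.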

2025-12-09

No more sf::st_as_text() or st_geomfromtext() to move geospatial data between {sf} and duckdb spatial 😃 cidree.github.io/duckspatial/ #gisChat #Rstats #DuckDB
