
A data validation library for Polars and pandas dataframes, with an interface for reviewing data consistency problems and helping to fix them: https://posit-dev.github.io/pointblank/

A tutorial by @markpitblado explaining how to use it with Polars: https://www.markpitblado.me/blog/validating-data-with-pointblank-in-python/

#Python #tool #data #validation #dataframe

An introduction to the distinctive features of the C++ DataFrame library for high-performance data analysis. It is a powerful open-source tool for processing tabular data and time series at C++ speed. The library provides many statistical and financial algorithms and flexible data manipulation similar to Pandas, but with better memory efficiency.

#Cpp #DataFrame #OpenSource #Programming #DataAnalysis #LapTrinh #PhanTichDuLieu #MaNguonMo

https://www.reddit.com/r/o

🥁🐼 #pandas version 3.0 is coming soon
Discover the new features and improvements coming in this new release of the #python #dataframe library: https://pandas.pydata.org/docs/dev/whatsnew/v3.0.0.html
- dedicated string type (no longer an "object")
- copy-on-write: selecting a subset of a dataframe now behaves as a copy and leaves the original unchanged
- wider use of the pd.col syntax for column algebra

🌍🔧 "SedonaDB: #The 'revolutionary' #geospatial #DataFrame #library that no one asked for, written in #Rust because why not? 😒 It's like they took a detour through the desert to reinvent the wheel—again. 🚗💨"
https://sedona.apache.org/latest/blog/2025/09/24/introducing-sedonadb-a-single-node-analytical-database-engine-with-geospatial-as-a-first-class-citizen/ #SedonaDB #reinventing #wheel #tech #news #HackerNews #ngated

SedonaDB: A new geospatial DataFrame library written in Rust

https://sedona.apache.org/latest/blog/2025/09/24/introducing-sedonadb-a-single-node-analytical-database-engine-with-geospatial-as-a-first-class-citizen/

#HackerNews #SedonaDB #Geospatial #Rust #DataFrame #Library #DataScience

Spark Connect: Is Change Really Needed?

Hello, Habr! I'm Stanislav Gabdulgaziev, an architect in the presales department at Arenadata. Apache Spark has long been firmly established as one of the key tools in the arsenal of engineers and data scientists working with big data. Its ability to process huge volumes of information quickly, its flexibility through support for multiple languages (Python, Scala, Java, SQL), and its capacity to handle a wide variety of tasks, from complex ETL to machine learning and streaming, make it an indispensable tool in the world of data analysis.

https://habr.com/ru/companies/arenadata/articles/921246/

#spark_connect #apache #datalake #lakehouse #платформа_данных #bigdata #dataframe #интеграция_сервисов #apache_arrow #spark

Polars: the "Pandas killer" cranked to the max

Hi everyone! My name is Alexander Andreev, and I'm a data engineer. Today I want to tell you about Polars, a potential replacement for Pandas, the data library beloved by most data engineers and data scientists. In this article I walk through everything from the history of Polars to code examples and the technical aspects of its performance, and at the end I link to all the benchmarks, tutorials, and additional articles used in writing this overview and tutorial of this wonderful library.

https://habr.com/ru/articles/946788/

#polars #pandas #data_engineering #data_science #data_analysis #dataframe #library #python #rust #dataset

The new version (3.6.0) of C++ DataFrame is out! 🎉 This update includes many improvements to analytics and data processing, and in particular the documentation has been comprehensively reworked in both form and content. Feedback from the community is very welcome!
#cpp #dataframe #datascience #laptrinh #cplusplus #thưviện #dữliệu

https://www.reddit.com/r/programming/comments/1ndeyjx/c_dataframe_new_version_360_is_out/

The new version (3.6.0) of C++ DataFrame has been released with many improvements to analytics and data processing, most notably a significant rework of the user documentation. Feedback from the community is welcome!
#cpp #dataframe #laptrinh #programming

https://www.reddit.com/r/programming/comments/1ndeyjx/c_dataframe_new_version_360_is_out/

How to generate dataframe summaries with python and AI for a type of dataset #datascience #dataframe #pandas #llm #Ollama #mistral #dev (https://fundor333.com/post/2025/generate-dataframe-summaries-with-python/)

I was annoyed that there is no "expand_grid()" function in :python: #Python as in :rstats: #RStats #tidyverse

So I just published a small package on #PyPI !

Introducing polarsgrid
https://pypi.org/project/polarsgrid/

Using the excellent #polars 🐻‍❄️ package, easily create a table with product of factors:

from polarsgrid import expand_grid
expand_grid(a=[1, 2, 3], b=["x", "y"])

Yields all combinations of its inputs as a #DataFrame

It can also produce a #LazyFrame for streaming extra-big tables to disk
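For readers unfamiliar with expand_grid(), here is a minimal standard-library sketch of the idea: the Cartesian product of the named factors, one row per combination. This is not polarsgrid's implementation; the helper name expand_grid_rows is hypothetical, it returns plain dicts rather than a DataFrame, and the row order shown is simply the order itertools.product yields, which may differ from the package's.

```python
from itertools import product


def expand_grid_rows(**factors):
    """Return every combination of the given factors as a list of row dicts.

    Hypothetical helper for illustration only; polarsgrid returns a
    Polars DataFrame (or LazyFrame) instead of plain dicts.
    """
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


rows = expand_grid_rows(a=[1, 2, 3], b=["x", "y"])
for row in rows:
    print(row)
```

With three values for a and two for b, the sketch produces 3 × 2 = 6 rows.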

New post on the #juncotic blog! 💪

We continue with #python, courtesy of @andrea_navarro

Have you used #Pandas to work with data?

Today Andrea explains how to use it to sort the columns of a DataFrame, with practical examples and a downloadable CSV to play with the data 😃

You can read it here: 👇

https://juncotic.com/ordenamiento-de-columnas-con-pandas/

I hope you like it and find it useful! 🙂

#python #pandas #dataframe #datascience #data

I've talked about creating data.frames and tibbles before, but it is an important topic so I have covered it again. This time specifically from the perspective of creating them from vectors. Post: www.spsanderson.com/steveondata/... #R #RStats #tibble #dplyr #tidyverse #dataframe #baseR #blog

Anybody using the narwhals #python package for #dataframe manipulations?

Very comfortable with #pandas and #polars syntax, but although #narwhals is supposed to be very close to polars, it is a subset and I find that there is a lot of basic stuff missing. Trying to figure out how to get the first part of a str.split.

in pandas I just add .str[0] and in polars .list.get(0)

with narwhals neither approach is implemented. Any idea what is supported in narwhals to do this?
#programming

Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java

https://github.com/moustafa-nasr/fahmatrix

#HackerNews #Fahmatrix #Pandas #Java #DataFrame #Lightweight #Library

Computing travel time matrices in r⁵py from @geopandas #DataFrame is two lines of code:

(1) create an r5py.TransportNetwork from @openstreetmap and #GTFS data

(2) turn it into an r5py.TravelTimeMatrix()

Try it out in #binder: https://r5py.readthedocs.io/stable/user-guide/user-manual/quickstart.html

A map of central Helsinki. A transparent overlay shows a grid of cells that are coloured according to the time needed to travel to their centre point from the railway station

[Memo] Trying to implement an Excel-like interface with Gradio
https://qiita.com/Tadataka_Takahashi/items/24c8f519d963d7dc9dae?utm_campaign=popular_items&utm_medium=feed&utm_source=popular_items

#qiita #Python #DataFrame #gradio

OPEN SOURCE 🚀

The Problem❔

There have been many instances where I needed to compare two dataframes and analyze their differences. To address this need, I created a fast Python library called "data_fingerprint" that does exactly that.

Check it out and let me know what you think! 🕵‍♂️
https://github.com/SimpleSimpler/data_fingerprint

#datascience #python #pandas #dataanalytics #dataengineering #dataframe #data

Parsing CSV with units in the header · Issue #166 · hgrecco/pint-pandas

https://github.com/hgrecco/pint-pandas/issues/166

Now we can read a #csv file with a header like `time / s,mass / g` into #pandas and call `.pint.quantify()` to get a #dataframe in which the columns have #units as in #Pints !

Handy for CSV restricted to single-row headers, as in Confluence Databases and Microsoft Lists.
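To make the `name / unit` header convention concrete, here is a rough standard-library sketch that splits such a header into column names and unit strings. It only illustrates the convention and is not how pint-pandas parses headers; the sample data and the split_header helper are assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample data using the `name / unit` header convention from the post.
SAMPLE = "time / s,mass / g\n0.0,1.5\n1.0,2.5\n"


def split_header(cell):
    """Split 'time / s' into ('time', 's'); a cell without ' / ' gets unit None."""
    name, sep, unit = cell.partition(" / ")
    return name.strip(), (unit.strip() if sep else None)


reader = csv.reader(io.StringIO(SAMPLE))
# The first row holds the combined name/unit headers.
columns = [split_header(cell) for cell in next(reader)]
# The remaining rows hold the numeric values.
data = [[float(value) for value in row] for row in reader]

print(columns)
print(data)
```

Keeping the unit in the single header row is what makes this workable for tools that do not allow a second header line.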

Hi fedi 😊
I am struggling with #vegaaltair.

Do you have any resources on faceted density plots?

I am trying to #plot densities of a selection of columns of a #dataframe with mean and median highlighted.

#dataviz #DataScience #Python
