#pyarrow

2025-11-01

Released scrapy-contrib-bigexporter 1.0.0 (codeberg.org/ZuInnoTe/scrapy-c) - additional export formats for the webscraping framework Scrapy.

Migrated parquet export from fastparquet to pyarrow as fastparquet is deprecated (docs.dask.org/en/stable/change)

Migrated orc export from pyorc to pyarrow to reduce the number of dependencies

#scrapy #crawling #python #parquet #orc #pyarrow #webcrawling #scraping

If the purpose of a library is to "process and transport large data sets" but the code base contains an error message like "array cannot contain more than 2147483646 bytes" then there must be a big misunderstanding somewhere. #pyarrow

2025-06-22

Easily obtain OSM and OMF data: #Python and CLI tools #QuackOSM and #OvertureMaestro offer easier access to data from #OpenStreetMap (#OSM) and the Overture Maps Foundation (#OMF) through #PyArrow, #GeoParquet, or #DuckDB. These tools can simplify large-scale geospatial data...
spatialists.ch/posts/2025/05/2 #GIS #GISchat #geospatial #SwissGIS

2025-05-23

Easily obtain OSM and OMF data: #Python and CLI tools #QuackOSM and #OvertureMaestro offer easier access to data from #OpenStreetMap (#OSM) and the Overture Maps Foundation (#OMF) through #PyArrow, #GeoParquet, or #DuckDB. These tools can simplify large-scale geospatial data...
spatialists.ch/posts/2025/05-2 #GIS #GISchat #geospatial #SwissGIS

Nic Crane · nic_crane
2025-05-07

Currently taking a look at refreshing some of the R and Python docs, so if you use Arrow in R or Python and there are any areas you'd like to understand better, give me a shout, and we'll see what we can do!

Stefaan Lippens · soxofaan@fosstodon.org
2024-07-22

#Python hot take: "import ... as ..." is an anti-pattern.

I'm reading up on #Parquet and #pyarrow and all tutorials and even docs start with something like this:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

Seriously?

Jesus Michał "Le Sigh" 🏔 (he) · mgorny@treehouse.systems
2024-01-20

Irony: when you've just added a blocker on #PyArrow in the #Gentoo ebuild for #pandas because it causes #CPython to crash, and then read that pandas are planning on making PyArrow obligatory. Sigh.

github.com/pandas-dev/pandas/i

Jezus Michał "Le Wzdych" (on) · mgorny@pol.social
2024-01-20

Irony of fate: when you've just added a blocker on #PyArrow in the #Gentoo ebuild for #pandas because it makes #CPython crash, and then you read that the PyArrow dependency will become mandatory in the future. Sigh.

github.com/pandas-dev/pandas/i

Antonio One · antonio_one
2023-08-14

pyarrow.serialization and default_serialization_context are used for converting Python objects to Arrow binary format and vice versa.

Antonio One · antonio_one
2023-08-09

pyarrow.parquet.core provides functionality for reading/writing Parquet files in Python. write_to_dataset is used to write a PyArrow table to a Parquet dataset.

GripNews · GripNews
2023-06-01

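🌗 Lance - A modern columnar data format for machine learning and large ML models, implemented in Rust.
➤ Convert from Parquet in just two lines of code to get 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, and Pyarrow, with more integrations on the way.
github.com/lancedb/lance
Lance is a modern columnar data format optimized for machine learning workflows and datasets. It offers high-performance random access, vector indexing, and data versioning, and integrates with ecosystems such as Pandas, DuckDB, Polars, and Pyarrow. Converting from Parquet takes just two lines of code and yields 100x faster random access, vector indexing, and data versioning.
+ This is a very useful tool, especially for machine learning workflows that need to process large datasets. It performs better than Parquet and integrates with several ecosystems, which makes it very convenient to use.
+ Implementing it in Rust is a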

2023-04-20

I was trying out the new version of #Pandas and its integration with #PyArrow.
It's noticeably very fast, but some options seem to be still unimplemented. For example, "converter" when reading a CSV.

ValueError: The 'converters' option is not supported with the 'pyarrow' engine

PyIceberg: Python Development Setup

This video will walk you through the steps required to set up the Python development environment for PyIceberg. We will set up a local instance of Spark, Rest catalog, and MinIO for querying an actual table. This makes it easy to do interactive development and test everything end to end.

#iceberg #python #pyiceberg #tabular #minio #spark #datalake #datalakehouse #pyarrow
youtu.be/D0HJuB0uSio

Fokko Driesprong has written a very interesting new blog post on using the latest version of #PyIceberg with #PyArrow and DuckDB Labs to load data from an #Iceberg table into PyArrow or DuckDB.

#python #spark #minio

tabular.medium.com/pyiceberg-0

With #PyIceberg 0.2.1 now available, we thought a video that illustrates using it with #PyArrow and DuckDB Labs would be in order. Thank you Fokko Driesprong for the content.

youtu.be/rYbSu9wvQmk

#iceberg #apacheiceberg #duckdb #voltrondata #datalake #datalakehouse

sʌǝu ᴧɐɹʞǝʃ 🤘🇪🇪🤘 · varkel@est.social
2022-12-19

Good news: Python's CSV reader supports Unicode characters like 🤘 as CSV field delimiters.

Bad news is that #PyArrow doesn't support it yet :(

Make PyArrow great again!

#Python #developer #unicode #CSV #bigdata

A hearty thank you to the PyIceberg community for Apache PyIceberg release 0.2.0!

This release includes a few major features, such as

* Read support using PyArrow and DuckDB

* Support for AWS Glue

Please check the updated docs (py.iceberg.apache.org/) for the details.

This release can be downloaded from: pypi.org/project/pyiceberg/0.2

And can be installed using: pip3 install pyiceberg==0.2.0

#iceberg #python #pyiceberg #duckdb #pyarrow

Taras Novak 🇺🇦 · dataSamurai@vis.social
2022-11-23

Hey #dataNerds 🤓, good news:

#DuckDB v0.6.0 brings #CSV reading on par with #PyArrow & #Polars: it loads 1.66 GB of #ChicagoCrimes data in 1.9 s with 12 cores/24 threads when the experimental parallel CSV reader and unordered insertion are enabled.

🧐 github.com/RandomFractals/chic

#dataTools 🔬 ...

cmsadler 🏳️‍🌈📊🦽🥁🐈‍⬛ · cmsadler@mastodon.online
2022-11-11

Today I'm doing some #MachineLearning archeology, and once again dabbling in #PyArrow to read and convert parquet to #Pandas so that I can do some minimal data exploration and answer questions on how the models were trained. If you store data in parquet format, PyArrow is a great resource.

arrow.apache.org/docs/python/i
