#apacheSpark

2025-05-25

Last week, the 2025 Edition of our “Current Data Science for Business Students Meet Alumni” Event took place at the Faculty of Economics and Business Administration (Ghent University). #ORMS #DataScience #DataAnalytics #Python #ApacheSpark #SQL

linkedin.com/pulse/current-ds4

Markus Breuer mbreuer@ruhr.social
2025-03-18

🚀 From 24h to 20min – A Small Change, Huge Impact!

A Spark query ran almost a full day on a large dataset. Stats showed 300GB traffic between worker nodes! 🔍 The Explain Plan revealed the culprit: a costly JOIN causing shuffles.

The fix? No JOIN needed! A simple filter replaced it—resulting in a 20-minute runtime instead of 24h.

💡 Lesson: Always check the Explain Plan!

#BigData #ApacheSpark #PerformanceTuning #DataEngineering
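
A minimal PySpark sketch of the kind of change the post describes. It is not the author's actual query: the tables, the column names, and the helpers `has_shuffle` and `plan_of` are invented for illustration. It compares the physical plan of a join against a one-row lookup table with the plan of the equivalent filter, and looks for `Exchange` nodes, which is how Spark's physical plans mark a shuffle. Broadcast joins are switched off so the tiny demo tables are planned like large ones.

```python
import contextlib
import io
import re


def has_shuffle(plan_text: str) -> bool:
    """Return True if a physical plan text contains a shuffle (`Exchange`) node."""
    # The word boundary keeps BroadcastExchange and ReusedExchange from matching.
    return re.search(r"\bExchange\b", plan_text) is not None


def plan_of(df) -> str:
    """Capture the text that DataFrame.explain() prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def main() -> None:
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("join-vs-filter")
        .getOrCreate()
    )
    # Assumption for the demo: disable broadcast joins so that the small
    # tables below are planned the way two large tables would be.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    # Hypothetical data: events tagged with a country code.
    events = spark.createDataFrame(
        [(1, "BE"), (2, "DE"), (3, "BE")], ["event_id", "country"]
    )
    wanted = spark.createDataFrame([("BE",)], ["country"])

    # Join used only to restrict rows; expected to need a shuffle.
    joined = events.join(wanted, on="country", how="inner")
    # The same restriction written as a plain filter.
    filtered = events.filter(F.col("country") == "BE")

    print("join plan has shuffle:  ", has_shuffle(plan_of(joined)))
    print("filter plan has shuffle:", has_shuffle(plan_of(filtered)))
    spark.stop()
```

Calling `main()` on a machine with PySpark and a Java runtime prints whether each plan contains a shuffle. On a real job, `df.explain()` on the slow query is the first thing to read.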

2025-03-17
2025-03-14

Easier to use: DuckDB gets local web user interface

As of version 1.2.1, the in-process database DuckDB can be operated conveniently through a local UI, installed as an extension, as an alternative to the CLI.

heise.de/en/news/Easier-to-use

#ApacheSpark #Datenbanken #SQL #news

2025-03-14

Easier to use: DuckDB gets a local web user interface

As of version 1.2.1, the in-process database DuckDB can be operated conveniently through a local UI, installed as an extension, as an alternative to the CLI.

heise.de/news/Einfacher-bedien

#ApacheSpark #Datenbanken #SQL #news

2025-03-13

TIL: You can get a list of Spark-enabled GATK tools with the command

gatk --list | grep Spark

(The website doesn't seem to have a list anywhere)

#bioinformatics #GATK #ApacheSpark

Microsoft DevBlogs msftdevblogs@dotnet.social
2025-02-21

Overall, leveraging StreamingQueryListener is vital for optimizing streaming workloads. More details and code examples can be found here. #ApacheSpark #OpenTelemetry #StreamingData

For more information check: devblogs.microsoft.com/ise/spa.
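
For readers who have not used the API, here is a rough sketch of a `StreamingQueryListener` in PySpark (the Python class is available from Spark 3.4). It is not taken from the linked article. The helper `progress_to_metrics`, the factory `make_listener`, and the `sink` parameter are hypothetical names; a real setup might hand the resulting dict to a metrics exporter instead of `print`.

```python
from types import SimpleNamespace


def progress_to_metrics(progress) -> dict:
    """Flatten the fields of a StreamingQueryProgress that matter for tuning."""
    return {
        "query": progress.name,
        "batch_id": progress.batchId,
        "input_rows": progress.numInputRows,
        "input_rows_per_s": progress.inputRowsPerSecond,
        "processed_rows_per_s": progress.processedRowsPerSecond,
    }


def make_listener(sink=print):
    """Build a listener that passes one metrics dict per event to `sink`."""
    # Imported here so progress_to_metrics() works without PySpark installed.
    from pyspark.sql.streaming import StreamingQueryListener

    class MetricsListener(StreamingQueryListener):
        def onQueryStarted(self, event):
            sink({"query": event.name, "state": "started"})

        def onQueryProgress(self, event):
            # Called after each micro-batch with fresh progress numbers.
            sink(progress_to_metrics(event.progress))

        def onQueryIdle(self, event):
            # Idle callback (Spark 3.5+); nothing to report in this sketch.
            pass

        def onQueryTerminated(self, event):
            sink({"query_id": str(event.id), "state": "terminated"})

    return MetricsListener()


# Stand-in object with made-up numbers, so the helper can be tried without Spark.
fake_progress = SimpleNamespace(
    name="demo",
    batchId=3,
    numInputRows=10,
    inputRowsPerSecond=5.0,
    processedRowsPerSecond=4.0,
)
print(progress_to_metrics(fake_progress))
```

With a running session, the listener would be registered through `spark.streams.addListener(make_listener())` before the streaming query starts.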

Python Job Support pythonjobsupport
2025-02-08

PySpark Tutorial for Beginners

quadexcel.com/wp/pyspark-tutor

khushnuma khushnuma
2024-12-04

In the world of data science, raw data serves as the foundation for generating actionable insights. However, managing, processing, and transforming this data into a usable format requires specialized tools.

read more: reshukhushi.wordpress.com/2024

Towards Data Science towardsdatascience@me.dm
2024-11-12

Spark Connect is revolutionizing the way we run Spark applications. With version 3.4 and beyond, remote client applications written in Scala or Python can now run on a Spark cluster, offering more flexibility than ever before. Read Sergey Kotlov's latest article now.

#ApacheSpark #DataEngineering

towardsdatascience.com/adoptin
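
As a rough illustration of what a Spark Connect client looks like in Python (a sketch, not code from the linked article): the client builds a `sc://` connection string and passes it to `SparkSession.builder.remote`. The helpers `connect_url` and `remote_session` are hypothetical names, and the example assumes a Spark Connect server is already listening on the default port 15002.

```python
def connect_url(host: str, port: int = 15002, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key=value;..."""
    url = f"sc://{host}:{port}"
    if params:
        # Optional parameters are appended after "/;" and separated by ";".
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def remote_session(url: str):
    """Open a session against a remote Spark Connect server."""
    # Imported here so connect_url() works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


print(connect_url("localhost"))
```

With a server running and the Connect extras installed (`pip install "pyspark[connect]"`), `remote_session(connect_url("localhost")).range(5).show()` would run the query on the cluster while the client process stays a thin Python application.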

rmoff 🏃🏻 🍺 🥓 rmoff@data-folks.masto.host
2024-10-30

🎃The October issue of #CheckpointChronicle is now out 🌟

It covers Ververica's Fluss, #ApacheFlink 2.0, Iggy.rs, Strimzi's support for #ApacheKafka 4.0, tons of OTF material from @vanlightly, Christian Hollinger's write-up of ngrok's data platform, nice detail on how SmartNews uses #ApacheIceberg with Flink and #ApacheSpark, a good write-up from Sudhendu Pandey on #ApachePolaris, notes from Kir Titievsky on Kafka's Avro serialisers, and much more!

dcbl.link/cc-oct242

2024-09-09

▶️ Data Engineering: building and maintaining data infrastructure (#Dateninfrastrukturen), including databases (#Datenbanken) and data pipelines (SQL, #Hadoop, #ApacheSpark, #AWS, #Azure, #Kafka) 🖥

More on this in our #Blog at: vioffice.de/de/blog/data-scien 🇩🇪🇬🇧

2/2

2024-08-08

I'm getting back into #VizierDB development after a lengthy hiatus with an experiment in polyglot IDEs. Although the experiment was not (yet) successful, it's opened up several ideas for Vizier, including ways to improve Vizier's state model, and decouple it from #ApacheSpark to also allow lighter-weight SQL engines like #DuckDB. I'm also inspired to explore #Curses as an alternative frontend to Vizier.

For now, just some maintenance with Vizier's plugin architecture.
github.com/VizierDB/vizier-sca

Coupon Frogg couponfrogg
2024-07-12

Apache Spark 3 - Spark Programming in Python for Beginners

Data Engineering using PySpark

couponfrogg.com/coupons/apache

Doug Whitfield [Minneapolis] musicman
2024-07-09

anybody know if it is ok to run and on the same box? I have 969 processes on this box, which seems like a lot, but not sure if it is actually a problem.

Something is certainly a problem.

2024-06-10

All's well that ends well? DuckDB is a special kind of database

DuckDB has been released in version 1.0. What is it about this database that does several things differently from other databases?

heise.de/blog/Ente-gut-alles-g

#ApacheSpark #Datenbanken #SQL #news

2024-04-09

🛍️ Unlock the power of personalized shopping with Apache Spark! 🌟 Dive into data transformation and machine learning to craft tailored experiences for your customers. Spark revolutionizes retail analytics, predicting preferences with precision.
Read the full article: squads.com/blog/making-shoppin
#ApacheSpark #RetailAnalytics #Personalization 🚀🛒

Elizabeth K. Josephpleia2@floss.social
2024-04-05

The s390x open source team at IBM confirms the latest versions of various software packages run well on #Linux on #IBMZ.

In March 2024 validation was maintained for over 30 projects, including: #ApacheSolr #WildFly & #ApacheSpark

Full report: community.ibm.com/community/us 🐧
