#datalake

N-gated Hacker News @ngate
2025-06-01

Ah, the $10/month Lakehouses: because who wouldn't want a bargain-basement data lake with all the charm of a timeshare in purgatory? 🤔💸 Just add a sprinkle of buzzwords like "DuckLake" and "time travel" and voilà, you've got a tech article that feels like a 2-hour for something you'll never use. 📈🔮
tobilg.com/the-age-of-10-dolla

Python Job Support @pythonjobsupport
2025-05-22

Apache Iceberg Deep Dive | Part 1 | Crash Course

Lakehouse ... source

quadexcel.com/wp/apache-iceber

2025-05-09

#TBT... to an entire week ago at #RSAC where Seth Goldhammer had the chance to demo Graylog's data telemetry pipeline management! 🖥️ ⭐

Join Seth as he talks about data lakes, data lake previews, getting your data back when you need it, and more.

Wanna learn more about this topic? Here you go: graylog.org/post/security-data #RSA #RSAC2025 #datalake #datamanagement #datapipeline

2025-04-24

Spark secrets in Arenadata Hadoop: how we sped up building data marts for ML tasks

Hi, Habr! I'm Dmitry Zhikharev, CPO of the RAISA AI Platform at the RSHB-Intech AI Lab. In this article, our platform architect Alexander Ryndin @aryndin9999 and I explain how we built the integration between the AI Platform and the Data Lake for working with machine learning model data marts using Spark.

habr.com/ru/companies/rshb/art

#spark #arenadata #hadoop #datalake #витрина_данных #ai #платформа #livy

2025-04-18

Shifting Left isn’t just a buzzword - it’s the foundation for efficiency in your organization!

By making clean, reliable, and accessible data available across your organization, you reduce complexity and unlock time to focus on higher-value work.

💡 Data products are the foundation of this #ShiftLeft, enabling healthy, scalable data communication.

📖 Dive into the details in the #InfoQ article: bit.ly/3WHjxsf

#SoftwareArchitecture #DataMesh #DataLake #DataPipelines #ETL

2025-04-09

Attended the Brewing Data with Snowflake event in Vilnius yesterday :blobcatnerd:

Some of the key insights:

  • Medallion Architecture (good or bad) is widespread.
  • Snowflake and Databricks are clear competitors, targeting a similar landscape.
  • Open formats are trending: file formats, table formats, catalogs, etc.; the more of them that are open source, the better.
  • The time travel feature is important; many users have already used it for disaster recovery.
  • Clear separation of storage from compute (the generic cloud approach).
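"Time travel" (as offered by Snowflake, or through snapshots in table formats such as Apache Iceberg and Delta Lake) rests on one idea: a commit never overwrites data in place but records a new snapshot, so earlier versions stay readable and can be restored. The toy Python class below is a minimal sketch of only that idea; `VersionedTable`, `commit` and `read` are hypothetical names, not any product's API, and real formats track snapshots as metadata pointing at data files rather than copying rows in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy table that keeps every committed snapshot (hypothetical API)."""

    # Version 0 is the empty table; each commit appends a snapshot of copied rows.
    snapshots: list[tuple[dict, ...]] = field(default_factory=lambda: [()])

    def commit(self, rows: list[dict]) -> int:
        """Store `rows` as the new current snapshot and return its version number."""
        self.snapshots.append(tuple(dict(row) for row in rows))
        return len(self.snapshots) - 1

    def read(self, version: int | None = None) -> list[dict]:
        """Return the latest snapshot, or an earlier one when `version` is given."""
        snapshot = self.snapshots[-1] if version is None else self.snapshots[version]
        return [dict(row) for row in snapshot]


table = VersionedTable()
v1 = table.commit([{"id": 1, "status": "ok"}, {"id": 2, "status": "ok"}])
v2 = table.commit([{"id": 1, "status": "ok"}])  # a faulty job drops row 2
restored = table.commit(table.read(version=v1))  # disaster recovery via time travel
```

Restoring is itself a new commit, so the faulty version stays in the history for later inspection instead of being erased.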

Full text of one of the slides presented:

Strategic Architecture Outlook

  • Agility & Future-Proofing - Open, portable data means you can adopt new technologies or switch platforms with minimal friction. No single vendor can hold your data hostage, so you can evolve your architecture as needed.
  • Multi-Cloud and Hybrid - An open data layer can span clouds and on-prem seamlessly. You avoid cloud vendor lock-in and leverage best-of-breed services on different clouds using the same data. This flexibility is key for resilience and optimization.
  • Accelerating Innovation - When any team can access data with the tools of their choice, experimentation flourishes. Open data fosters AI/ML and cross-domain analytics since data isn't locked in silos - more innovation and insights from the same data.
  • Vendor Leverage - Strategically, using open standards increases your leverage in vendor negotiations. You can opt in or out of services more freely, pushing vendors to provide value (since you're not irreversibly locked to them).

#data #datalake #datalakehouse #medallion #architecture #snowflake #vilnius #lithuania #bigdata #event #meetup

Brewing Data with Snowflake event in Vilnius: Olli Ek presenting, Data Interoperability slide showing Medallion Architecture with Data Sources on the left and Consumption on the right of Processing in the middle, with Bronze, Silver, Gold layers
Brewing Data with Snowflake in Vilnius: Maris Svilans presenting, slide on screen showing Strategic Architecture Outlook (full text in main post)
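The Data Interoperability slide described above places Bronze, Silver and Gold processing layers between data sources and consumption. A common reading of the Medallion Architecture is: Bronze keeps records exactly as ingested, Silver holds cleaned, typed and deduplicated records, and Gold holds aggregated, consumption-ready tables. The plain-Python sketch below follows that reading; the function names, the sample order records and the cleaning rules are illustrative assumptions, not taken from the presentation.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_events: list[dict]) -> list[dict]:
    """Bronze: keep source records unchanged, adding only ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**event, "_ingested_at": ingested_at} for event in raw_events]


def to_silver(bronze: list[dict]) -> list[dict]:
    """Silver: reject malformed rows, normalize types, deduplicate on order_id."""
    seen: set[str] = set()
    silver = []
    for row in bronze:
        order_id = row.get("order_id")
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # missing or unparseable amount
        if not order_id or order_id in seen:
            continue  # missing key or duplicate record
        seen.add(order_id)
        silver.append(
            {"order_id": order_id, "country": str(row.get("country", "")).upper(), "amount": amount}
        )
    return silver


def to_gold(silver: list[dict]) -> dict[str, float]:
    """Gold: business-level aggregate for consumption (revenue per country)."""
    revenue: defaultdict[str, float] = defaultdict(float)
    for row in silver:
        revenue[row["country"]] += row["amount"]
    return dict(revenue)


raw = [
    {"order_id": "A1", "country": "lt", "amount": "10.5"},
    {"order_id": "A1", "country": "lt", "amount": "10.5"},  # duplicate delivery
    {"order_id": "A2", "country": "lv", "amount": "4"},
    {"order_id": "A3", "country": "lt", "amount": "n/a"},  # bad amount
]
gold = to_gold(to_silver(to_bronze(raw)))
```

Leaving Bronze untouched means Silver and Gold can be rebuilt from it whenever the cleaning or aggregation rules change.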
Justin Buzzard @jdbuzzard
2025-03-20

A Data Lake in the software world is essentially where raw data is taken and turned into something tangible like reports, often using AI/machine learning, and then put into the Data Warehouse.

2025-02-23

🟢 Demo: SAP Business Data Cloud | SAP Business Unleashed youtu.be/OkwQimWDeos?si=UNGdcA via @YouTube
(and find related videos in the SAP channel - see below)

#SAP #SAPBDC #GenAI #LLM #DataCloud #DataLake #SAPChampions #SAPBW #SAPDatasphere @sap

2025-02-08

There is no need to move data. Data latency is minimised. Data can be transformed and analysed within a single platform.

Let me know what you know about Zero-ETL :blobcoffee:

"Why ETL-Zero? Understanding the shift in Data Integration" by Sarah Lea on Medium: medium.com/towards-data-scienc

#python #datalake #cloudcomputing #etl #zeroetl #salesforce #data #tech #technology #datawarehousing #datalakehouse

2025-01-31

A #ShiftLeft approach to #DataProcessing relies on data products, which form the basis of data communication across the business.

This addresses many flaws in traditional data processing and makes data more relevant, complete, and trustworthy.

#InfoQ article: bit.ly/3WHjxsf

#SoftwareArchitecture #DataMesh #DataLake #DataPipelines #ETL
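This post only links to the InfoQ article, so the following is an assumption about what "shifting left" with data products can look like in practice: the producing team publishes its data together with an explicit contract (schema plus quality rules) and validates records before sharing them, instead of every downstream consumer cleaning the same raw data on its own. The `DataProduct` class, its fields and the sample rule are hypothetical illustrations, not the article's design.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DataProduct:
    """Hypothetical data product: a dataset published together with its contract."""

    name: str
    schema: dict[str, type]  # required field -> expected Python type
    rules: list[Callable[[dict[str, Any]], bool]] = field(default_factory=list)
    published: list[dict[str, Any]] = field(default_factory=list)

    def violations(self, record: dict[str, Any]) -> list[str]:
        """List contract violations for one record (empty list means valid)."""
        problems = [
            f"{col}: expected {typ.__name__}"
            for col, typ in self.schema.items()
            if not isinstance(record.get(col), typ)
        ]
        # Quality rules only run once the schema is satisfied.
        if not problems:
            problems += [f"rule {i} failed" for i, rule in enumerate(self.rules) if not rule(record)]
        return problems

    def publish(self, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
        """Validate at the source before sharing; return rejected records to the producer."""
        rejected = []
        for record in records:
            if self.violations(record):
                rejected.append(record)
            else:
                self.published.append(record)
        return rejected


orders = DataProduct(
    name="orders",
    schema={"order_id": str, "amount": float},
    rules=[lambda r: r["amount"] >= 0],  # hypothetical rule: no negative amounts
)
rejected = orders.publish([
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "A2", "amount": -3.0},  # breaks the rule
    {"order_id": None, "amount": 1.0},  # breaks the schema
])
```

The design choice being illustrated is where the check lives: rejected records go back to the team that produced them, so consumers only ever see data that already meets the contract.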

2025-01-20

#ApacheHudi 1.0 is now generally available!

The release introduces new features aimed at transforming data lakehouses into what the project community considers a fully-fledged "Data Lakehouse Management System" (DLMS).

Details on #InfoQ 👉 bit.ly/3E5AXZi

#AI #DataLake #opensource #DataAnalytics

Thilo Dotzel 🤓 (Mr. Storage) @thilodotzel@techhub.social
2025-01-14

All in one.
Massively scalable, software-defined storage (#SDS) for modern workloads, with support for file-, block- and object-based applications:
➡️ ibm.com/products/ceph

👁🐝Ⓜ️
#IBM #RedHat
#IBMStorage
#IBMStorageCeph #DataLake
#IBMtechnology #technology
#IBMStorageRocks🚀

2025-01-05

The house at the lake, Part 1 - Iceberg ahead. Data Lakehouse baby steps. blog.sogeo.services/blog/2025/ #ApacheIceberg #Spark #Pyspark #Datalake #Lakehouse

2024-12-24

One of the most highlighted parts: "There is no need to move data. Data latency is minimised. Data can be transformed and analysed within a single platform."

This is one of the reasons for 'Why ETL-Zero' :blobcoffee:

towardsdatascience.com/why-etl

#data #datascience #dataanalysis #dataanalytics #DataEngineering #sql #salesforce #etl #datawarehouse #datalake #datalakehouse #programming

Marcel SIneM(S)US @simsus@social.tchncs.de
2024-12-15

@kkarhan :mastolol: #DataLake ... very good ... I hadn't been that creative yet. But yes: a #DataLake can quickly turn into a #DataLeak

Kevin Karhan :verified: @kkarhan@infosec.space
2024-12-15

@simsus Isn't that called a "#DataLake"?

  • It's more of a #Stausee (reservoir) just waiting for a big fat #Leak, but that's beside the point!
2meterdba | Reitse Eskens @2meterdba@mastodon.nl
2024-12-12

It's December, and that means lightning talk time for our user group! Join us online for some short, powerful insights from both known and new speakers!
Follow the link to sign up, and see you next Tuesday.
#Meetup
#Community
#LightningTalk
#Microsoft
#DataPlatform
#AI
#DataLake
#Azure
#PowerBI
#SQLServer
#Clarity
#DataDriven
meetup.com/groningen-microsoft

2024-12-12

In a data warehouse, you store structured and organized data. In a data lake, you can additionally store unstructured data. So what is a data lakehouse?

Think of it as a combination of the strengths of both of these data platforms. :blobcoffee:

towardsdatascience.com/sql-and

#data #DataEngineering #datalakehouse #datacenters #datawarehouse #datalake #datascience #sql
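One way to make this contrast concrete is to look at where the schema is enforced. A warehouse enforces it on write; a data lake accepts any file, structured or not, and leaves interpretation to the reader (schema-on-read); a lakehouse keeps data as open files in lake storage but adds a table layer that enforces a schema on the tables it manages. The toy sketch below illustrates only that distinction; the class names, in-memory storage and JSON file layout are assumptions, and it leaves out what real lakehouse table formats (for example Apache Iceberg, Delta Lake or Apache Hudi) add, such as transactions, snapshots and catalogs.

```python
from __future__ import annotations

import json
from typing import Any


def conforms(row: dict[str, Any], columns: dict[str, type]) -> bool:
    """True if the row has exactly the declared columns with the declared types."""
    return set(row) == set(columns) and all(isinstance(row[c], t) for c, t in columns.items())


class Warehouse:
    """Schema-on-write: structured rows only; non-conforming rows are refused."""

    def __init__(self, columns: dict[str, type]) -> None:
        self.columns = columns
        self.rows: list[dict[str, Any]] = []

    def insert(self, row: dict[str, Any]) -> None:
        if not conforms(row, self.columns):
            raise ValueError(f"row does not match schema: {row!r}")
        self.rows.append(row)


class DataLake:
    """Schema-on-read: any bytes (structured or not) are stored as-is."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, path: str, payload: bytes) -> None:
        self.objects[path] = payload

    def read_json(self, path: str) -> Any:
        # Structure is only imposed here, by the reader.
        return json.loads(self.objects[path])


class Lakehouse(DataLake):
    """Lake storage plus a table layer that enforces a schema on managed tables."""

    def __init__(self) -> None:
        super().__init__()
        self.schemas: dict[str, dict[str, type]] = {}

    def create_table(self, table: str, columns: dict[str, type]) -> None:
        self.schemas[table] = columns

    def insert(self, table: str, row: dict[str, Any]) -> None:
        if not conforms(row, self.schemas[table]):
            raise ValueError(f"row does not match schema of {table!r}: {row!r}")
        part = sum(1 for path in self.objects if path.startswith(f"{table}/"))
        # Table rows still land in the lake as plain, open files.
        self.put(f"{table}/part-{part:05d}.json", json.dumps(row).encode())

    def scan(self, table: str) -> list[dict[str, Any]]:
        prefix = f"{table}/"
        return [json.loads(self.objects[p]) for p in sorted(self.objects) if p.startswith(prefix)]


wh = Warehouse({"id": int, "city": str})
wh.insert({"id": 1, "city": "Vilnius"})

lake = DataLake()
lake.put("raw/events.json", b'{"anything": [1, 2, 3]}')
lake.put("raw/photo.jpg", b"\xff\xd8\xff")  # unstructured data is accepted too

lh = Lakehouse()
lh.create_table("cities", {"id": int, "city": str})
lh.insert("cities", {"id": 1, "city": "Vilnius"})
lh.put("raw/photo.jpg", b"\xff\xd8\xff")  # and it is still a lake underneath
```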
