#DataLineage

Digitale Overheid (automated account) digitaleoverheid.nl@www.digitaleoverheid.nl
2025-03-24

Data lineage increases trust in government data


Governments often use data to make policy, improve services and tackle societal issues. But how do you know whether that data is reliable? According to a new report from the Wetenschappelijk Onderzoek- en Documentatiecentrum (WODC), data lineage can help.

What is data lineage?

Data lineage literally means 'the ancestry of data'. It maps the full journey that data takes: from the moment it is collected (for example via a form), through processing and transformation, to its eventual use in, for example, dashboards or reports. With data lineage you can trace:

  • where the data comes from;
  • which operations or transformations have been applied;
  • in which systems or reports the data ultimately ends up.
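The three questions above can be illustrated with a minimal sketch in Python. It is not taken from the WODC report: the class `LineageGraph` and the dataset names (`web_form`, `policy_dashboard` and so on) are hypothetical, chosen only to show how origin, applied transformations and downstream consumers can be read off a recorded graph.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store: each edge is source -> target plus a transformation label."""

    def __init__(self):
        self._upstream = defaultdict(set)    # target -> direct sources
        self._downstream = defaultdict(set)  # source -> direct targets
        self._transform = {}                 # (source, target) -> label

    def record(self, source, target, transformation):
        self._upstream[target].add(source)
        self._downstream[source].add(target)
        self._transform[(source, target)] = transformation

    @staticmethod
    def _walk(start, edges):
        """Return every node reachable from start by following edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def origins(self, dataset):
        """Where does the data come from? Ancestors with no upstream of their own."""
        ancestors = self._walk(dataset, self._upstream)
        return {n for n in ancestors if not self._upstream.get(n)}

    def transformations(self, dataset):
        """Which transformations were applied on the way to this dataset?"""
        on_path = self._walk(dataset, self._upstream) | {dataset}
        return sorted(t for (_, dst), t in self._transform.items() if dst in on_path)

    def consumers(self, dataset):
        """In which systems or reports does the data end up?"""
        return self._walk(dataset, self._downstream)


# Hypothetical pipeline: form -> raw table -> cleaned table -> two outputs.
graph = LineageGraph()
graph.record("web_form", "raw_applications", "ingest")
graph.record("raw_applications", "clean_applications", "deduplicate")
graph.record("clean_applications", "policy_dashboard", "aggregate per municipality")
graph.record("clean_applications", "annual_report", "aggregate per year")

print(graph.origins("policy_dashboard"))
print(graph.transformations("policy_dashboard"))
print(graph.consumers("raw_applications"))
```

Storing both directions of each edge keeps the "where did this come from" and "what depends on this" questions equally cheap to answer. A real lineage tool would also record timestamps, owners and job runs.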

Why does this matter for government?

Data lineage helps to spot errors early, to make risks visible and to increase trust in policy information, both inside and outside the organisation. The WODC stresses that data lineage is not only a technical tool but also a step towards professionalising data management within government.

Read the WODC news item on their website and have a look at the English-language report.

This is an automatically posted message. Questions or comments can be sent to @DigitaleOverheid@social.overheid.nl

#BetrouwbareData #DataLineage #nieuwsbrief62025 #WODC

Datalumen datalumen
2025-03-19

🗺️ What You Should Know Before Implementing A Data Catalog. Implementing a data catalog is a necessity if you want to leverage your data. While the allure of cutting-edge technology is strong, success hinges on a solid foundation of non-technical considerations.

👉 Read our guide & explore what you need to know to avoid common pitfalls and ensure success.
datalumen.eu/should_know_befor

WHAT YOU SHOULD KNOW BEFORE IMPLEMENTING A DATA CATALOG
Miguel Afonso Caetano remixtures@tldr.nettime.org
2025-01-06

"AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here's the problem: AI developers and researchers don't really know much about the sources of the data they are using. AI's data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from.

The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations.

Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI's data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies."

technologyreview.com/2024/12/1

#AI #GenerativeAI #AITraining #DataLineage

Datalumen datalumen
2024-09-11

From Chaos to Clarity? Find out how you can make data lineage simple. Data moving through complex architectures doesn't have to be a mystery. 🔆 Check out our latest blog to learn how OpenLineage brings order to your data stack!

👉 Read more to be informed:
datalumen.eu/openlineage/

Coach Pāṇini ® paninid@mastodon.world
2024-06-02

#ModelExplainability, #DataLineage, and editing the #TrainingData set are topics that will be in the news next year… assuming we make it.
social.lol/@rom/11254367474974

Cher Fox (The Datanista) CDMP TheDatanista
2024-04-19

Understanding the Spectrum of Data Lineage Analysis

Data lineage analysis is the backbone of […]; it's the journey of data from origin to consumption. It not only ensures […] & […] but also aids in decision-making processes & enhances data-driven strategies. Within the realm of data lineage analysis, various methodologies & approaches exist, each tailored to specific needs & objectives: foxconsulting.co/post/understa

2023-08-14

"[#DataAnalysts]..should know how the data was born, with all details of measurement... Few things have more devastating consequences ... than someone in the audience pointing out...measurement issues the analyst didn't consider." Békés and Kézdi, 2021: Data Analysis for Business, Economics, and Policy

If you're having trouble helping your org understand the value of #datalineage and #metadata, share this with them and ask if they know how all the data they're using was gathered and measured.

I wrote about the Lineage Diff for dbt projects feature of PipeRider:

You can compare the lineage DAG from before and after making code changes in dbt. It's really useful for debugging issues, seeing the impact of a change, etc.:

medium.com/inthepipeline/dbt-d

#DataOps #DataLineage #DataViz #DataQuality #DataTesting #DataEngineering
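To make the idea of a lineage diff concrete, here is a rough sketch in Python. It does not use PipeRider's or dbt's API and is not how either tool is implemented: the function `lineage_diff` and the model names are hypothetical, and each DAG is assumed to be a plain dict that maps a model to the set of its direct parents.

```python
def lineage_diff(before, after):
    """Compare two lineage DAGs given as {node: set of direct parents}."""

    def nodes(dag):
        # A node may appear only as a parent (e.g. a raw source), so collect both sides.
        return set(dag) | {p for parents in dag.values() for p in parents}

    def edges(dag):
        return {(parent, node) for node, parents in dag.items() for parent in parents}

    return {
        "added_nodes": nodes(after) - nodes(before),
        "removed_nodes": nodes(before) - nodes(after),
        "added_edges": edges(after) - edges(before),
        "removed_edges": edges(before) - edges(after),
    }


# Hypothetical project state before and after a code change.
before = {
    "stg_orders": {"raw_orders"},
    "orders_summary": {"stg_orders"},
}
after = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "orders_summary": {"stg_orders", "stg_customers"},
}

diff = lineage_diff(before, after)
for kind, items in diff.items():
    print(kind, sorted(items))
```

Set differences over nodes and edges are enough to show what a change added or removed. Detecting a model whose SQL changed while its dependencies stayed the same would need extra information, such as a content hash per node.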

2022-11-23

Looking for options to track #datalineage on #AWS while processing data via MWAA DAGs. Other than Airflow's own lineage feature and solutions like #openlineage, what else does the community use?
