#DataTalk

A Trustworthy AI Assistant for Investigative Journalists | Stanford HAI


Date: December 1, 2025 | Topics: Communications, Media


Gathering and analyzing data require time and expertise — two resources that cash-strapped newspapers often don’t have. Can AI help?

In 2023, an average of 2.5 local newspapers shut down every week. More than half of U.S. counties now have little or no reliable local news coverage, and the trend is accelerating.

This is a business problem. It is also, arguably, a democracy problem. For centuries, local journalism has kept voters engaged in local politics and politicians accountable to those voters. Small papers with investigative tenacity have also routinely broken stories of national importance — the Patriot-News uncovering Penn State’s Jerry Sandusky scandal, for instance.

The answer to this crisis? “Everybody says, ‘Let’s use AI to help,’ ” replies Monica Lam, a professor of computer science at Stanford University. The problem with this, she adds, is that most AI tools aren’t reliable. She cites a 2025 study conducted by the BBC in which the media outlet used major AI models to analyze news content on its website. Over half of answers from the AI had “significant issues,” according to the BBC, including factual errors and fabricated quotations.

“It’s not so easy,” says Lam.

Now, Lam is working with technologists and journalists to develop a more useful tool for the news industry. With Cheryl Phillips, the founder of Stanford’s Big Local News, along with seed funding from the Stanford Institute for Human-Centered AI and a grant from the Brown Institute for Media Innovation at Stanford and Columbia, Lam created DataTalk, a chatbot specifically designed to help investigative journalists and cash-strapped newsrooms do their work more efficiently without sacrificing factual accuracy. DataTalk is built on top of a large language model and designed to retrieve and analyze information kept in big, sometimes unruly, public databases.

“Journalism is losing a lot of people and deep investigative work is harder than ever,” Lam says. “If more people know about the tool we’re building, and if we can keep improving it and keep generating success stories, then our hope is to bolster this type of journalism into the future.”

What is DataTalk?

Investigative journalists often rely on knowledge of database languages like SQL and the expertise of data scientists to unearth important stories. With DataTalk, they could instead simply type their question into a chat window and get an answer within a few seconds.
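The underlying pattern — translate a natural-language question into a database query, run it, and return the result — can be sketched in a few lines. This is a hypothetical illustration only, not DataTalk's actual implementation: the schema, the toy data, and the hardcoded `generate_sql` stub are all invented here, and a real system would prompt an LLM with the database schema and the user's question at that step.

```python
import sqlite3

# Toy table standing in for a large public database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributions (donor TEXT, candidate TEXT, amount REAL)")
conn.executemany("INSERT INTO contributions VALUES (?, ?, ?)", [
    ("Acme PAC", "Smith", 5000.0),
    ("Acme PAC", "Jones", 2500.0),
    ("Beta LLC", "Smith", 1000.0),
])

def generate_sql(question: str) -> str:
    """Stand-in for the LLM step: map a natural-language question to SQL.
    Hardcoded here for illustration; a real system would call a model."""
    return (
        "SELECT candidate, SUM(amount) AS total "
        "FROM contributions GROUP BY candidate ORDER BY total DESC"
    )

question = "Which candidate raised the most money?"
rows = conn.execute(generate_sql(question)).fetchall()
print(rows[0])  # the top fund-raiser in the toy data
```

The journalist sees only the question and the answer; the generated SQL in between is what tools like this automate.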

Read the original article here: A Trustworthy AI Assistant for Investigative Journalists | Stanford HAI

#AI #AIAssistant #artificialIntelligence #DataTalk #HAI #InvestigativeJournalists #Journalism #Journalists #LLM #SQL #StanfordUniversity #Technology

The Datanista
2025-03-07

🍬𝐃𝐚𝐭𝐚-𝐛𝐞𝐭𝐞𝐬: 𝐒𝐮𝐠𝐚𝐫-𝐂𝐨𝐚𝐭𝐞𝐝 𝐌𝐞𝐭𝐫𝐢𝐜𝐬 𝐆𝐨𝐧𝐞 𝐖𝐫𝐨𝐧𝐠🍫

🎥Where You Can See Me Now

👀Where You Can See Me Next

And please don't forget to take a look at my 𝑺𝒕𝒓𝒂𝒕𝒆𝒈𝒊𝒄 𝑩𝒖𝒔𝒊𝒏𝒆𝒔𝒔 𝑷𝒂𝒓𝒕𝒏𝒆𝒓𝒔 (some AMAZING service providers!)

What in this week's newsletter resonates with you most?

Let's continue the conversation in the comments.💡

linkedin.com/pulse/march-2025-

DataTalk: A Campaign Finance Agent (Beta)

This is an extremely useful tool just introduced by Stanford OVAL. It uses an LLM to dynamically generate the appropriate SQL queries for you and fetch your data. It currently covers only money raised and spent in the 2024 U.S. presidential and congressional campaigns.

datatalk.genie.stanford.edu/

#research #digitalHumanities #FEC #opensecrets #DataTalk #campaignFinance

Screenshots: the DataTalk start screen, and the tool displaying the steps it takes to process a query.
The Datanista
2024-10-17

𝐂𝐡𝐞𝐫 𝐅𝐨𝐱, 𝐏𝐫𝐞𝐬𝐢𝐝𝐞𝐧𝐭 𝐅𝐨𝐱 𝐂𝐨𝐧𝐬𝐮𝐥𝐭𝐢𝐧𝐠 | 𝐓𝐡𝐞 𝐑𝐢𝐝𝐞𝐫𝐟𝐥𝐞𝐱 𝐏𝐨𝐝𝐜𝐚𝐬𝐭

Catch up with my conversation with host Steve Urban. We talk about family, why it's important for next-generation technologies, bodybuilding, and some of the projects I've rescued for clients here: youtube.com/watch?v=3WAy0yOJ1Yw

2024-01-26

@nfdi4culture Back in December 2022 there was already a similar #HeFDI Datatalk, "Anforderungen der @dfg_public an das #Forschungsdatenmanagement" (the DFG's requirements for research data management): doi.org/10.5281/zenodo.6567185

More presentations, soon including the one from today's #Datatalk, can be found on #Zenodo: t1p.de/npyf
Here are three slides as a #Sneakpreview

2. Framework: DFG-Guidelines and checklist

Data reuse

How do I find a suitable repository for my data?

* The DFG recommends contacting a suitable research data repository as early as possible, ideally during the planning phase / while writing your proposal
** Only on this basis can the metadata and costs in your proposal be explained!

* Have a look at 'your' NFDI consortium

* or contact your local RDM service center

* or see re3data.org

2. Framework: DFG-Guidelines and checklist

Three types of repositories:
* Generic
** See e.g. https://zenodo.org/
* Subject-specific
** See https://www.re3data.org/
** See NFDI consortia

* Institutional
** See e.g. the HeFDI repositories based on DSpace https://t1p.de/hefdi-repos

3. Your DFG proposal: What to look out for?

Data description - metadata

* Metadata help to understand research data
* Structured description of research data
* Machine-readable → research data become Findable in databases
* Without metadata, research data won't be understandable
* Rich and correct metadata are a strong contribution to good scientific practice
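To make "machine-readable" concrete, here is a minimal, hypothetical metadata record. The field names are loosely modeled on common repository schemas such as DataCite; this is not a complete or official schema, and every value is invented.

```python
import json

# A minimal, hypothetical machine-readable metadata record.
# Field names loosely follow DataCite conventions; values are placeholders.
record = {
    "title": "Example survey dataset on research data management",
    "creators": [{"name": "Doe, Jane"}],
    "publicationYear": 2023,
    "resourceType": "Dataset",
    "subjects": ["research data management"],
}

# Serializing to JSON is what makes the record indexable by databases,
# which is how metadata make research data Findable.
serialized = json.dumps(record, indent=2)
print(serialized)
```

A repository ingests a record like this, and search services can then index it without ever opening the data files themselves.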

METADATA IS A LOVE NOTE TO THE FUTURE
2023-09-06

Event announcement: On Friday, 8 September 2023, from 11 a.m. to 12 p.m., the next #HeFDI #DataTalk will take place. The speaker is Robert Werth, Frankfurt University of Applied Sciences, who will present results from the project "Entwicklung und Verbreitung von #Forschungsdatenmanagement an #Fachhochschulen und #Hochschulen für Angewandte Wissenschaften" (development and dissemination of research data management at universities of applied sciences). The event takes place #online and is free of charge. Register here: t1p.de/3oghd.

2023-08-28

I'm extremely proficient in R. I can analyze data in my sleep in the tidyverse.

But I'm a total n00b in Python. When I try to write some simple data transformations, I find that even my reasoning about data in general gets muddier, as if the language barrier translates to a larger block in my logic.

Is there a name for this phenomenon? How do you make this easier without resorting to writing in R and then translating?
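For what it's worth, pandas method chaining can mirror a dplyr pipeline fairly closely, which may ease the translation. A minimal sketch with invented toy data, assuming pandas is installed:

```python
import pandas as pd

# dplyr: df |> filter(score > 50) |> group_by(team) |> summarise(avg = mean(score))
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [60, 40, 80, 90],
})

result = (
    df.query("score > 50")              # filter()
      .groupby("team", as_index=False)  # group_by()
      .agg(avg=("score", "mean"))       # summarise()
)
print(result)
```

Here `.query()`, `.groupby()`, and named aggregation in `.agg()` play the roles of `filter()`, `group_by()`, and `summarise()`, so the chain reads in the same order you would write it in the tidyverse.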

#rstats #python #datatalk
