#TPDL2023

2023-09-29

A lot of discussion at #tpdl2023 about OCR, Tesseract, and the post-processing steps.

2023-09-29

@shawnmjones overheard at #tpdl2023 "be more like perma.cc"

2023-09-29

In his #tpdl2023 talk "Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives" @shawnmjones summarizes the status quo: web archives lack interoperability, especially for large-scale use, aka "think about the robots!".

2023-09-29

At #tpdl2023 Shawn Jones (@shawnmjones) is giving a talk on „Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives“.

He talks about the challenges of mining data from web archives. The authors reviewed 22 web archives and discuss the methods needed to re-synthesize a memento into something close to its original capture, without augmentations.

He especially cares about robots.

Paper is available at: doi.org/10.1007/978-3-031-4384

Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-29

At #TPDL2023 right now, @martinklein is presenting “It's Not Just GitHub: Identifying Data and Software Sources Included in Publications”

The authors trained a classifier to label open-access data and software (OADS) URLs from research papers as either dataset or code. Archivists can then take these URLs and preserve the referenced datasets and code for reproducibility.

Paper: doi.org/10.1007/978-3-031-4384
Preprint: arxiv.org/abs/2307.14469

Martin Klein is presenting “It's Not Just GitHub: Identifying Data and Software Sources Included in Publications” at TPDL2023. Behind Martin, one slide contains a stacked bar chart with grey and yellow bars. Another slide shows a screenshot of a single paper that has 896 links to GitHub.
2023-09-29

Martin Klein (@martinklein) is giving a presentation at #tpdl2023. The title is: „It’s Not Just GitHub: Identifying Data and Software Sources Included in Publications“. He talks about the repositories in which researchers publish their data and code.
Paper is available at doi.org/10.1007/978-3-031-4384

Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-28

Beatrice Alex is giving the second #TPDL2023 keynote “AI language technologies and digital collections: the need for interdisciplinary communication and co-design and training”

* How can we invite #AI into the #archive?
* AI can provide a lot of positive opportunities.
* To improve its application, we need #interdisciplinary collaborations going forward.
* AI #literacy needs to be taught early in education.

Ref:
* ed.ac.uk/profile/dr-beatrice-a
* ltg.ed.ac.uk
* ed.ac.uk/usher/clinical-natura

Beatrice Alex is giving the second keynote at TPDL 2023.
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-28

Yesterday at #TPDL2023 David Pride presented “CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering”

Rather than #ZeroShot question answering, Pride’s team combines the #CORE #OpenAccess dataset with #ElasticSearch to create #FewShot prompts. These prompts pair #search results with the #LLM’s (#GPT) #summarization abilities to produce an answer to a user’s question, including citations.

Ref: doi.org/10.1007/978-3-031-4384

David Pride at TPDL2023 is presenting “CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering”. The current slide is titled “Do LLMs produce accurate citations?”
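
As a rough illustration of the idea in the post above (search results folded into the prompt so the model can cite them), here is a minimal Python sketch. It is not the CORE-GPT implementation: the `SearchHit` fields, the `build_prompt` helper, and the prompt wording are all assumptions made for this example, and the search and LLM calls themselves are left out.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """One search result. The fields are illustrative, not CORE's schema."""
    title: str
    url: str
    abstract: str


def build_prompt(question: str, hits: list[SearchHit], max_hits: int = 3) -> str:
    """Combine a user question with numbered search results in one prompt."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources by number, e.g. [1].",
        "",
    ]
    # Number each source so the answer can point back to it.
    for number, hit in enumerate(hits[:max_hits], start=1):
        lines.append(f"[{number}] {hit.title} ({hit.url})")
        lines.append(hit.abstract)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical search results; in practice these would come from a search index.
example_hits = [
    SearchHit("Paper A", "https://example.org/a", "Abstract of paper A."),
    SearchHit("Paper B", "https://example.org/b", "Abstract of paper B."),
]
example_prompt = build_prompt("What do papers A and B report?", example_hits)
```

The resulting string would then be sent to whichever language model is in use. Numbering the sources is one simple way to make the citations in the answer checkable against the search results.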
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-28

Yesterday at #TPDL2023 Gianmaria Silvello presented “How to Cite a Web Ranking and Make it #FAIR”

Researchers often need to cite #SearchEngine results. Unfortunately, search engines change their algorithms and their index all the time. Alessandro Lotta and Gianmaria Silvello presented a #prototype that captures this ranking in a human- and machine-readable #format and posts it to #Zenodo for citing with a #DOI.

My suggestion: include #webarchiving and #webarchives

Ref: doi.org/10.1007/978-3-031-4384

Gianmaria Silvello stands in front of a slide from “How to Cite a Web Ranking and Make it #FAIR”. The slide is titled “Data Citation: What is it?” and includes 3 categories: datasets, databases, data papers with examples.
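
To make "capturing a ranking in a machine-readable format" concrete, here is a minimal Python sketch. It is not the format used by the prototype in the post above: the `ranking_record` helper, the field names, and the checksum step are assumptions for illustration, and the deposit to Zenodo is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def ranking_record(query: str, engine: str, urls: list[str],
                   captured_at: datetime) -> dict:
    """Build a JSON-serializable snapshot of a search engine ranking."""
    record = {
        "query": query,
        "engine": engine,
        "captured_at": captured_at.isoformat(),
        # Keep the rank explicit so the order survives any later re-sorting.
        "results": [{"rank": i, "url": url}
                    for i, url in enumerate(urls, start=1)],
    }
    # A checksum over a canonical serialization lets readers verify the snapshot.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


# Hypothetical query and results, for illustration only.
example_record = ranking_record(
    query="web archives",
    engine="example-engine",
    urls=["https://a.example/", "https://b.example/"],
    captured_at=datetime(2023, 9, 28, tzinfo=timezone.utc),
)
example_json = json.dumps(example_record, indent=2)
```

Recording the query, the engine, and the capture time next to the ranked URLs is what would let a reader see exactly which ranking was cited, even after the search engine's index has changed.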
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-28

#TPDL2023 @hkroll and Mirjam Cuper from the Institute for Information Systems presented “Aspect-Driven Structuring of Historical Dutch Newspaper Archives”

The authors discussed the challenges of automatically organizing and structuring content in a corpus when the #OCR is unreliable, the #metadata might be inconsistent, and the #licensing restrictions dictate who can see the content.

Ref: doi.org/10.1007/978-3-031-4384

Hermann Kroll and Mirjam Cuper begin their presentation “Aspect-Driven Structuring of Historical Dutch Newspaper Archives”.
2023-09-27

Just learned about Publication Retriever (github.com/LSmyrnaios/Publicat), which scrapes the PDF URL given an article URL.
This would have been very useful to me a year ago!
#tpdl2023

Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-27

Laura Hollink is giving the first #TPDL2023 keynote “Responsible AI & GLAM: challenges and opportunities”:
* defining “diversity” and “fairness” for #GLAM
* producer fairness vs. user-fairness
* treatment equality vs. counterfactual fairness
* popularity bias in recommender systems
* publication country bias in datasets
* identifying contentious terminology

More information:
* cwi.nl/en/groups/human-centere
* cultural-ai.nl
* aim4dem.nl

Laura Hollink is giving the first #TPDL2023 keynote “Responsible AI & GLAM: challenges and opportunities”
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-27

Zvjezdan Penezić, Marijana Tomić, and Gianmaria Silvello are kicking off #TPDL2023!

64 submissions
* 39 full papers
* 25 short papers

Acceptance:
* 33% of full papers were accepted for oral presentation
* 18% of full papers were accepted as short papers
* 10 short papers were accepted for oral presentation (40%)

The authors of 3 best papers from the International Journal of Digital Libraries were also invited to present.

Proceedings are available now: doi.org/10.1007/978-3-031-4384

Gianmaria Silvello is kicking off TPDL 2023. Zvjezdan Penezić is kicking off TPDL 2023. Marijana Tomić is kicking off TPDL 2023.
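
The percentages in the opening post above can be turned back into approximate paper counts. This short Python sketch does the arithmetic. The counts it derives are estimates only, because the post reports rounded percentages, not exact numbers; the helper name `estimated_count` is made up for this example.

```python
FULL_SUBMISSIONS = 39   # from the post
SHORT_SUBMISSIONS = 25  # from the post


def estimated_count(total: int, percent: float) -> int:
    """Estimate a paper count from a rounded percentage (approximate)."""
    return round(total * percent / 100)


total_submissions = FULL_SUBMISSIONS + SHORT_SUBMISSIONS  # 64, as stated

# Estimates only: the post gives rounded percentages, not exact counts.
full_as_oral = estimated_count(FULL_SUBMISSIONS, 33)   # about 13 papers
full_as_short = estimated_count(FULL_SUBMISSIONS, 18)  # about 7 papers

# This figure is exact: 10 of the 25 short papers is 40%.
short_oral_rate = 10 * 100 / SHORT_SUBMISSIONS
```

Under these estimates, roughly 20 of the 39 full-paper submissions were accepted in some form, alongside the 10 short papers accepted for oral presentation.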
2023-09-27

Laura Hollink kicks off #TPDL2023 with her keynote on responsible AI in GLAM

Laura Hollink standing in front of presentation.
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-26

Zadar, Croatia is pretty. The #TPDL2023 conference starts later today. I cannot wait.

(I also have jet lag, but I’m fighting through it.)

Conference website: tpdl2023.dei.unipd.it/index.ht

Pedestrians walk by awnings and cafe tables amidst a variety of architectural eras on a busy street in Zadar, Croatia. Cars sit parked outside of the arch providing entry at Porta di Terraferma in Zadar, Croatia. A line of broken columns, a circular building, and a steeple are some of the ancient ruins at the Roman Forum in Zadar, Croatia. Many boats float along the waterfront of Zadar, Croatia.
Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-23

@kentborg @Dianora (I’m on a plane & the WiFi is spotty.)

I’m going to #TPDL2023 and will not live-tweet it like past conferences. I’ll post to #Mastodon and #Bluesky, but not #Twitter.

But I ask myself, why didn’t I quit when #Musk:
* made using the word #cisgender grounds for suspension?
* was promoting #antivaxx, #antisemitism, #racism, #sexism & hurting #BlackTwitter?
* suspended #journalists?

I stayed for the people who stayed. I stayed to help promote & support them.

#TwitterMigration

Shawn M. Jones, PhD (shawnmjones@hachyderm.io)
2023-09-22

Hi #Mastodon! Tomorrow I'm going to #Zadar, #Croatia for #TPDL2023! Any suggestions on cool things to do or places to eat outside of the #DigitalLibraries #conference?

Conference site: tpdl2023.dei.unipd.it

An aerial view of the Greeting to the Sun and Sea Organ in Zadar, Croatia.

Image by dronepicr used with permission via CC BY 2.0 and courtesy of Wikimedia Commons.

https://commons.wikimedia.org/wiki/File:Aerial_view_of_The_Greeting_to_the_Sun_and_the_Sea_Organ_in_Zadar,_Croatia_%2848607771252%29.jpg
2023-07-20

Our pre-print is out! Very happy to share our (@CuperMirjam and @kreutz) @tpdl2023 #TPDL2023 full paper
„Aspect-Driven Structuring of Historical Dutch Newspaper Archives“

Check it out:
arxiv.org/abs/2307.09203

2023-07-06

Very happy to share that our work has been accepted as a full paper at @tpdl2023 #TPDL2023
„Aspect-Driven Structuring of Historical Dutch Newspaper Archives“

Stay tuned for our upcoming pre-print next week!

EUPublicationsOffice 🇪🇺 (EULawDataPubs@respublicae.eu)
2023-05-23

RT @Europeanaeu: Exciting opportunity for researchers and practitioners in #DigitalLibraries!📚 Join @tpdl2023 and explore all about bridging Research & Information Science with Digital Libraries. Submit your work! 💡tpdl2023.dei.unipd.it/index.ht
#TPDL2023

🐦🔗: n.respublicae.eu/EULawDataPubs
