A lot of discussions at #tpdl2023 about OCR, tesseract and the post-processing steps
@shawnmjones overheard at #tpdl2023 "be more like perma.cc"
In his #tpdl2023 talk "Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives" @shawnmjones summarizes the status quo: a lack of interoperability among web archives, especially for large-scale use, aka "think about the robots!".
At #tpdl2023 Shawn Jones (@shawnmjones) is giving a talk on „Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives“.
He talks about challenges when mining data from web archives. The authors reviewed 22 web archives and discuss methods needed to re-synthesize a memento to something close to its original capture without augmentations.
He especially cares about robots.
Paper is available at: https://doi.org/10.1007/978-3-031-43849-3_19
At #TPDL2023 right now, @martinklein is presenting “It's Not Just GitHub: Identifying Data and Software Sources Included in Publications”
The authors trained a classifier to label open-access data and software (OADS) URLs from research papers as either dataset or code. Archivists can then take these URLs and preserve the referenced datasets and code for reproducibility.
Paper: https://doi.org/10.1007/978-3-031-43849-3_17
Preprint: https://arxiv.org/abs/2307.14469
Martin Klein (@martinklein) is giving a presentation at #tpdl2023. The title is: „It’s Not Just GitHub: Identifying Data and Software Sources Included in Publications“. He talks about which repositories researchers use to publish their data and code.
Paper is available at https://doi.org/10.1007/978-3-031-43849-3_17
Beatrice Alex is giving the second #TPDL2023 keynote “AI language technologies and digital collections: the need for interdisciplinary communication and co-design and training”
* How can we invite #AI into the #archive?
* AI can provide a lot of positive opportunities.
* To improve its application, we need #interdisciplinary collaborations going forward.
* AI #literacy needs to be taught early in education.
Ref:
* https://www.ed.ac.uk/profile/dr-beatrice-alex
* https://www.ltg.ed.ac.uk
* https://www.ed.ac.uk/usher/clinical-natural-language-processing/people
Yesterday at #TPDL2023 David Pride presented “CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering”
Rather than relying on #ZeroShot question answering, Pride’s team uses #ElasticSearch over the #CORE #OpenAccess dataset to create #FewShot prompts. These prompts pair #search results with the #LLM’s (#GPT) #summarization abilities to answer a user’s question with citations.
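To make the idea in this post concrete, here is a minimal sketch of turning search results into a prompt that asks for a cited answer. It is not the CORE-GPT implementation: `SearchHit` and `build_prompt` are hypothetical names, the search step and the LLM call are left out, and the prompt wording is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    """One search result; a hypothetical stand-in for a record from a search index."""
    title: str
    url: str
    abstract: str


def build_prompt(question: str, hits: list[SearchHit]) -> str:
    """Build a prompt that asks the model to answer only from numbered sources."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources by number, for example [1].",
        "",
    ]
    # Number each source so the model can refer back to it in its answer.
    for number, hit in enumerate(hits, start=1):
        lines.append(f"[{number}] {hit.title} ({hit.url})")
        lines.append(hit.abstract)
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Calling `build_prompt(question, hits)` with a handful of hits returns a single string that could be sent to a language model. Numbering the sources is one simple way to let the answer point back at the documents it drew on.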
Yesterday at #TPDL2023 Gianmaria Silvello presented “How to Cite a Web Ranking and Make it #FAIR”
Researchers often need to cite #SearchEngine results. Unfortunately, search engines change their algorithms and their index all the time. Alessandro Lotta and Gianmaria Silvello presented a #prototype that captures this ranking in a human- and machine-readable #format and posts it to #Zenodo for citing with a #DOI.
My suggestion: include #webarchiving and #webarchives
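As a rough illustration of what capturing a ranking in a machine-readable format might involve, the sketch below serializes a query and its ordered results to JSON. The field names and the `ranking_record` function are assumptions made for this example, not the format used by the prototype, and the upload to Zenodo is omitted.

```python
import json
from datetime import datetime


def ranking_record(query: str, engine: str, urls: list[str], captured_at: datetime) -> str:
    """Serialize a search ranking snapshot as JSON (hypothetical field names)."""
    record = {
        "query": query,
        "engine": engine,
        # The capture time matters because rankings change over time.
        "captured_at": captured_at.isoformat(),
        # Store the rank explicitly so the order survives any reprocessing.
        "results": [
            {"rank": rank, "url": url} for rank, url in enumerate(urls, start=1)
        ],
    }
    return json.dumps(record, indent=2)
```

A record like this could sit next to a human-readable rendering of the same ranking, so that both people and programs can check what was cited.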
#TPDL2023 @hkroll and Mirjam Cuper from the Institute for Information Systems presented “Aspect-Driven Structuring of Historical Dutch Newspaper Archives”
The authors discussed the challenges of automatically organizing and structuring content in a corpus when the #OCR is unreliable, the #metadata might be inconsistent, and the #licensing restrictions dictate who can see the content.
Just learned about PublicationsRetriever (https://github.com/LSmyrnaios/PublicationsRetriever), which scrapes the PDF URL based on an article URL.
This would have been very useful to me a year ago!
#tpdl2023
Laura Hollink is giving the first #TPDL2023 keynote “Responsible AI & GLAM: challenges and opportunities”:
* defining “diversity” and “fairness” for #GLAM
* producer fairness vs. user-fairness
* treatment equality vs. counterfactional fairness
* popularity bias in recommender systems
* publication country bias in datasets
* identifying contentious terminology
More information:
* https://www.cwi.nl/en/groups/human-centered-data-analytics/
* https://www.cultural-ai.nl
* https://www.aim4dem.nl
Zvjezdan Penezić, Marijana Tomić, and Gianmaria Silvello are kicking off #TPDL2023!
64 submissions
* 39 full papers
* 25 short papers
Acceptance:
* 33% of full papers were accepted for oral presentation
* 18% of full papers were accepted as short papers
* 10 short papers were accepted for oral presentation (40%)
The authors of 3 best papers from the International Journal of Digital Libraries were also invited to present.
Proceedings are available now: https://doi.org/10.1007/978-3-031-43849-3
Laura Hollink kicks off #TPDL2023 with her keynote on responsible AI in GLAM
Zadar, Croatia is pretty. The #TPDL2023 conference starts later today. I cannot wait.
(I also have jet lag, but I’m fighting through it.)
Conference website: https://tpdl2023.dei.unipd.it/index.html
@kentborg @Dianora (I’m on a plane & the WiFi is spotty.)
I’m going to #TPDL2023 and will not live-tweet it like past conferences. I’ll post to #Mastodon and #Bluesky, but not #Twitter.
But I ask myself, why didn’t I quit when #Musk:
* made using the word #cisgender grounds for suspension?
* was promoting #antivaxx, #antisemitism, #racism, #sexism & hurting #BlackTwitter?
* suspended #journalists?
I stayed for the people who stayed. I stayed to help promote & support them.
Hi #Mastodon! Tomorrow I'm going to #Zadar, #Croatia for #TPDL2023! Any suggestions on cool things to do or places to eat at outside of the #DigitalLibraries #conference?
Conference site: https://tpdl2023.dei.unipd.it
Our pre-print is out! Very happy to share our (@CuperMirjam and @kreutz) @tpdl2023 #TPDL2023 full paper
„Aspect-Driven Structuring of Historical Dutch Newspaper Archives“
Check it out:
https://arxiv.org/abs/2307.09203
Very happy to share that our paper has been accepted as a full paper at @tpdl2023 #TPDL2023
„Aspect-Driven Structuring of Historical Dutch Newspaper Archives“
Stay tuned for our upcoming pre-print next week!
RT @Europeanaeu: Exciting opportunity for researchers and practitioners in #DigitalLibraries!📚 Join @tpdl2023 and explore all about bridging Research & Information Science with Digital Libraries. Submit your work! 💡https://tpdl2023.dei.unipd.it/index.html
#TPDL2023
🐦🔗: https://n.respublicae.eu/EULawDataPubs/status/1660672353511120896