The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
https://listserv.loc.gov/cgi-bin/wa?A2=BIBFRAME;c9f7a556.2506&S=
@stuartyeates https://www.loc.gov/cds/products/marcDist.php If you are looking for MARC XML, it's under “MARC Open-Access”.
New blog post: Communicating Ontology – Technical approaches for facilitating use of our Wikibase data
https://semlab.io/blog/communicating-ontology
A look at some tools made to help communicate research data stored in our Wikibase, including property usage visualizations and JSON-LD bulk data downloads.
New blog post: three interfaces to explore the 50K HathiTrust resources from 1929 that entered the public domain last month:
https://thisismattmiller.com/post/hathi-pd-2025/
Including this one, which lets you find literature/fiction books by genre and LCSH.
New publication: “Knowledge Graphing Art Archives: Methods and Tools from the Semantic Lab’s E.A.T. Project”
Highlighting work creating a knowledge graph for archival materials from the avant-garde movement, Experiments in Art and Technology (E.A.T.).
https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.268
@platypus nice, glad it's working!
@emrys thanks! I only tested it with the default profile, so it's good to know how to get it working with your own profile.
With TikTok probably shutting down, I made some scripts to download and build a local web interface for your TikTok liked and favorited videos:
https://github.com/thisismattmiller/tiktok-shutdown
It downloads the videos locally; my 2,200 videos take up about 20GB.
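(Not the repo's actual code, just a minimal sketch of the download step it automates, assuming you already have a plain list of liked/favorited video URLs; yt-dlp does the heavy lifting.)

```python
# Minimal sketch: download a list of TikTok video URLs locally with yt-dlp.
# liked_urls is a stand-in for whatever your TikTok data export or the
# repo's collection step produces.
from yt_dlp import YoutubeDL

liked_urls = [
    "https://www.tiktok.com/@someuser/video/1234567890",  # hypothetical URL
]

opts = {
    "outtmpl": "videos/%(id)s.%(ext)s",  # save each video by its ID
    "ignoreerrors": True,                # skip videos that have been removed
}
with YoutubeDL(opts) as ydl:
    ydl.download(liked_urls)
```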
A new post on using models like Segment Anything 2 and LLaVA on 14,000 woodcut images from the Plantin-Moretus Museum: https://thisismattmiller.com/post/woodblockshop/
I used the results to make a little toy that lets you mashup elements from the woodcuts into new images: https://woodblockshop.glitch.me/
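For a sense of what the LLaVA side of that kind of workflow looks like (not the post's actual pipeline), here's a minimal captioning sketch using a llava-hf checkpoint via Hugging Face transformers; the filename and prompt are made up.

```python
# Minimal sketch: ask LLaVA to describe a single woodcut scan.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("woodcut_0001.jpg")  # hypothetical filename
prompt = "USER: <image>\nDescribe the figures and objects in this woodcut. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```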
For Banned Books Week I took a look at the metadata for 1,500 titles on PEN America’s banned and challenged books list, analyzing the subject headings used and other data.
@electricarchaeo thanks for checking it out
New post on using the Whisper speech-to-text model on 400+ folk songs collected by Alan Lomax in 1938.
I look at transcription quality, building a lyric-focused web component player, a search interface, and LLM enrichment.
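As a rough illustration of the transcription step (not the post's exact code), here's a minimal sketch using the open-source openai-whisper package; the audio filename is hypothetical.

```python
# Minimal sketch: transcribe one recording and print timestamped segments.
import whisper

model = whisper.load_model("medium")               # larger models trade speed for accuracy
result = model.transcribe("lomax_recording_01.mp3")  # hypothetical filename

for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} - {seg['end']:7.2f}] {seg['text'].strip()}")
```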
I am not, no. There is a short blog post about the initial work in 2019: https://blogs.loc.gov/thesignal/2019/05/integrating-wikidata-at-the-library-of-congress/
In the research group I'm part of (outside of LC), https://semlab.io/, we do this in our own local Wikibase. For example, we maintain a local identifier for an entity and then link to the Wikidata item as well when appropriate (e.g. https://base.semlab.io/wiki/Item:Q314).
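As a rough illustration (not our actual tooling), you could pull that item and its Wikidata link out of the standard Wikibase API like this; the property ID holding the Wikidata QID is a placeholder and will differ per install.

```python
# Minimal sketch: fetch a local Wikibase item and read its Wikidata mapping.
import requests

resp = requests.get(
    "https://base.semlab.io/w/api.php",
    params={"action": "wbgetentities", "ids": "Q314", "format": "json"},
)
entity = resp.json()["entities"]["Q314"]
print(entity["labels"]["en"]["value"])

WIKIDATA_PROP = "P1"  # hypothetical: whichever local property stores the Wikidata QID
for claim in entity.get("claims", {}).get(WIKIDATA_PROP, []):
    print("wikidata link:", claim["mainsnak"]["datavalue"]["value"])
```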
Also reminds me of "cluster drift" in resources like VIAF, where rebuilding the identity clusters can produce changes between versions.
Played a small part in this new Atlantic article looking at diversity in publishing:
https://www.theatlantic.com/books/archive/2024/06/diversity-publishing-backlash-study/678734/
(my part being supplying the book metadata)
I had some nice examples I wrote of using the new WorldCat /v2/ API endpoints, but I guess I'd better keep those off GitHub; wouldn't want them to be used as evidence of some imaginary offense in the future. Talk about a stupid chilling effect.
If you have 11+ million names, like in the LC Name Authority File, how many of them are anagrams of each other? A lot: https://thisismattmiller.com/post/lcnaf-anagrams/
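The core trick is simple enough to sketch in a few lines: normalize each name and use its sorted letters as a grouping key (the sample names below are stand-ins, not LCNAF data).

```python
# Minimal sketch: group names whose letters sort to the same key (anagrams).
import re
from collections import defaultdict

names = ["Amy Adler", "Mary Leda"]  # stand-in data, not from LCNAF

groups = defaultdict(list)
for name in names:
    key = "".join(sorted(re.sub(r"[^a-z]", "", name.lower())))
    groups[key].append(name)

anagram_sets = [g for g in groups.values() if len(g) > 1]
print(anagram_sets)
```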
Wrote a tutorial on how to migrate your data to a new server if you use Dockerized Wikibase:
https://thisismattmiller.com/post/migrating-your-docker-wikibase/
Very niche, but it would have saved me a ton of time if it had existed.
@edsu yeah possibly, will need to look at the outputs and the current process.
@edsu
All the docs are in the DB, yes. I think the easiest solution is to modify the existing conversion to produce "nicer" JSON-LD, which I think would be great, and I can definitely mention it to the team.
@edsu @thatandromeda @hochstenbach @acka47
To go from an XML doc to a JSON representation it probably can, but to combine a doc plus the semantic triple store into a valid JSON-LD serialization, there is no native way of doing it that I'm aware of.
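Outside of MarkLogic, the kind of step being discussed might look something like this rdflib sketch, turning a handful of triples into a JSON-LD serialization; the URIs and predicates are made up for illustration.

```python
# Minimal sketch: serialize triples pulled from a store as JSON-LD with rdflib.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")
g = Graph()
doc = URIRef("http://example.org/doc/1")
g.add((doc, EX.title, Literal("Sample record")))
g.add((doc, EX.relatedTo, URIRef("http://example.org/doc/2")))

print(g.serialize(format="json-ld"))
```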
Yep, MarkLogic is a doc DB/triple store with an application layer built in. It's all XQuery code running everything.