
The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
https://listserv.loc.gov/cgi-bin/wa?A2=BIBFRAME;c9f7a556.2506&S=

@stuartyeates https://www.loc.gov/cds/products/marcDist.php If you are looking for MARC XML, it's under “MARC Open-Access”

New blog post: Communicating Ontology – Technical approaches for facilitating use of our Wikibase data

https://semlab.io/blog/communicating-ontology

A look at some tools made to help communicate research data stored in our Wikibase including property usage visualizations and JSON-LD bulk data downloads.

New blog post, three interfaces to explore the 50K 1929 HathiTrust resources that entered the public domain last month:

https://thisismattmiller.com/post/hathi-pd-2025/

Including this one which lets you find literature/fiction books by genre and lcsh.

New publication: “Knowledge Graphing Art Archives: Methods and Tools from the Semantic Lab’s E.A.T. Project”

Highlighting work creating a knowledge graph for archival materials from the avant-garde movement, Experiments in Art and Technology (E.A.T.).

https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.268

@platypus nice, glad it's working!

@emrys thanks! I only tested it with the default profile, so good to know how to get it working with your own profile.

With TikTok probably shutting down I made some scripts to download and build a local web interface for your TikTok liked and favorited videos:

https://github.com/thisismattmiller/tiktok-shutdown

It downloads the videos locally. I had 2200 videos, which take up about 20GB.

A new post on using models like Segment Anything 2 and LLaVA on 14,000 woodcut images from Plantin-Moretus Museum: https://thisismattmiller.com/post/woodblockshop/

I used the results to make a little toy that lets you mashup elements from the woodcuts into new images: https://woodblockshop.glitch.me/

For Banned Books Week I took a look at the metadata for 1500 titles on PEN America’s banned and challenged book list, analyzing the subject headings used and other data.

https://thisismattmiller.com/post/banned-metadata/

@electricarchaeo thanks for checking it out

New post looking at using the Whisper speech-to-text model on 400+ folk songs collected by Alan Lomax in 1938.

I look at quality, building a lyric-focused web component player, a search interface and LLM enrichment:

https://thisismattmiller.com/post/lomax-whisper/

@edsu @trc

I am not, no. There is a short blog post about the initial work in 2019: https://blogs.loc.gov/thesignal/2019/05/integrating-wikidata-at-the-library-of-congress/

In the research group I'm part of (outside of LC) https://semlab.io/ we do this in our own local Wikibase. For example, we maintain a local identifier for an entity and then link to Wikidata as well when appropriate (e.g. https://base.semlab.io/wiki/Item:Q314)

Also reminds me of "cluster drift" in resources like VIAF, where rebuilding the identity clusters can change them between versions.

Played a small part in this new Atlantic article looking at diversity in publishing:

https://www.theatlantic.com/books/archive/2024/06/diversity-publishing-backlash-study/678734/

(my part being supplying the book metadata)

I had some nice examples I wrote of using the new Worldcat /v2/ API endpoints but I guess I better keep those off github, wouldn't want it to be used as evidence of some imaginary offense in the future. Talk about a stupid chilling effect.

If you have 11+ million names, like in the LC Name Authority File, how many of them anagram to each other? A lot: https://thisismattmiller.com/post/lcnaf-anagrams/

A list of LC NAF Names that all anagram to each other, screenshot from the website linked:

Kley, Martin, 1975-
Klein, Marty
Markle, Tiny
Lantry, Mike
Martin, Kyle
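
A minimal sketch of how names like these could be grouped, not necessarily how the linked post does it: it assumes a name's "anagram key" is its letters lowercased and sorted, with punctuation, spaces and dates ignored. The function names and the "Doe, Jane" entry are made up for illustration.

```python
from collections import defaultdict


def anagram_key(name: str) -> str:
    """Keep only the letters, lowercase them and sort them.

    Two names with the same key are anagrams of each other
    (assumption: commas, spaces and dates like "1975-" are ignored).
    """
    return "".join(sorted(ch for ch in name.lower() if ch.isalpha()))


def anagram_groups(names):
    """Return lists of names that share an anagram key (groups of 2+ only)."""
    groups = defaultdict(list)
    for name in names:
        groups[anagram_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Klein, Marty",
    "Markle, Tiny",
    "Lantry, Mike",
    "Martin, Kyle",
    "Doe, Jane",  # hypothetical entry with no anagram partner
]

for group in anagram_groups(names):
    print(group)
```

Bucketing by key keeps this to a single pass over the list, which matters at 11+ million names since comparing every pair would be impractical.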

Wrote a tutorial on how to migrate your data if you use Dockerized Wikibase to a new server:

https://thisismattmiller.com/post/migrating-your-docker-wikibase/

Very niche, but it would have saved me a ton of time if it had existed.

@edsu yeah possibly, will need to look at the outputs and the current process.

@edsu
All the docs are in the DB, yes. I think the easiest solution is to modify the existing conversion to produce "nicer" JSON-LD, which I think would be great, and I can definitely mention it to the team.

@edsu @thatandromeda @hochstenbach @acka47
To go from an XML doc to a JSON representation it probably can, but to turn doc + sem triples store into a valid JSON-LD serialization there is no native way of doing it that I’m aware of.

Yep, MarkLogic is a doc DB/triple store with an application layer built in. It’s all XQuery code running everything.
