Matt Miller

Libraries/Data

2025-06-30

The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
listserv.loc.gov/cgi-bin/wa?A2

2025-03-07

@stuartyeates loc.gov/cds/products/marcDist. If you are looking for MARC XML, it's under “MARC Open-Access”

2025-03-05

New blog post: Communicating Ontology – Technical approaches for facilitating use of our Wikibase data

semlab.io/blog/communicating-o

A look at some tools made to help communicate research data stored in our Wikibase including property usage visualizations and JSON-LD bulk data downloads.

2025-02-06

New blog post, three interfaces to explore the 50K 1929 HathiTrust resources that entered the public domain last month:

thisismattmiller.com/post/hath

Including this one, which lets you find literature/fiction books by genre and LCSH.

2025-01-28

New publication: “Knowledge Graphing Art Archives: Methods and Tools from the Semantic Lab’s E.A.T. Project”

Highlighting work creating a knowledge graph for archival materials from the avant-garde movement, Experiments in Art and Technology (E.A.T.).

openhumanitiesdata.metajnl.com

2025-01-14

@platypus nice, glad it's working!

2025-01-13

@emrys thanks! I only tested it with the default profile, so good to know how to get it working with your own profile.

2025-01-13

With TikTok probably shutting down I made some scripts to download and build a local web interface for your TikTok liked and favorited videos:

github.com/thisismattmiller/ti

It downloads the videos locally. I had 2200 videos, which take up about 20GB.

2024-12-17

A new post on using models like Segment Anything 2 and LLaVA on 14,000 woodcut images from the Plantin-Moretus Museum: thisismattmiller.com/post/wood

I used the results to make a little toy that lets you mash up elements from the woodcuts into new images: woodblockshop.glitch.me/

2024-09-27

For Banned Books Week I took a look at the metadata for 1500 titles identified by PEN America’s banned and challenged book list, analyzing the subject headings used and other data.

thisismattmiller.com/post/bann

2024-09-13

@electricarchaeo thanks for checking it out

2024-09-13

New post looking at using the Whisper speech-to-text model on 400+ 1938 folk songs collected by Alan Lomax.

I look at quality, building a lyric-focused web component player, a search interface and LLM enrichment:

thisismattmiller.com/post/loma

2024-08-09

@edsu @trc

I am not, no. There is a small blog post about the initial work in 2019 blogs.loc.gov/thesignal/2019/0

In the research group I'm part of (outside of LC) semlab.io/ we do this in our own local Wikibase. For example, we maintain a local identifier for an entity and then link to Wikidata as well when appropriate (e.g. base.semlab.io/wiki/Item:Q314)

Also reminds me of "cluster drift" in resources like VIAF, where rebuilding the identity clusters can change them between versions.

2024-06-20

Played a small part in this new Atlantic article looking at diversity in publishing:

theatlantic.com/books/archive/

(my part being supplying the book metadata)

2024-04-16

I wrote some nice examples of using the new WorldCat /v2/ API endpoints, but I guess I'd better keep those off GitHub; wouldn't want them to be used as evidence of some imaginary offense in the future. Talk about a stupid chilling effect.

2024-04-01

If you have +11 million names, like in the LC Name Authority File, how many of them anagram to each other? A lot: thisismattmiller.com/post/lcna
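A minimal sketch of how such an anagram check could work, not necessarily how the post does it: reduce each name to a key made of its sorted letters, then group names that share a key. Ignoring case, punctuation, spaces and life dates is an assumption here, and `anagram_key` and `anagram_groups` are hypothetical helper names. "Miller, Matt" is added only as a non-matching example.

```python
from collections import defaultdict


def anagram_key(name: str) -> str:
    """Return the name's letters, lowercased and sorted.

    Assumption: only alphabetic characters count, so commas,
    spaces and dates such as "1975-" are ignored.
    """
    return "".join(sorted(ch for ch in name.lower() if ch.isalpha()))


def anagram_groups(names):
    """Group names that share the same sorted-letter key."""
    groups = defaultdict(list)
    for name in names:
        groups[anagram_key(name)].append(name)
    # Keep only keys that more than one name maps to.
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Klein, Marty",
    "Markle, Tiny",
    "Lantry, Mike",
    "Martin, Kyle",
    "Miller, Matt",  # not an anagram of the others
]

for group in anagram_groups(names):
    print(group)
```

Because each name is keyed once and looked up in a dictionary, this stays a single pass over the names rather than a comparison of every pair, which matters at the scale of millions of names.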

A list of LC NAF Names that all anagram to each other, screenshot from the website linked:

Kley, Mortin, 1975-
Klein, Marty
Markle, Tiny
Lantry, Mike
Martin, Kyle

2024-03-21

Wrote a tutorial on how to migrate your data to a new server if you use Dockerized Wikibase:

thisismattmiller.com/post/migr

Very niche, but it would have saved me a ton of time if it had existed.

2024-02-15

@edsu yeah possibly, will need to look at the outputs and the current process.

2024-02-15

@edsu
All the docs are in the DB, yes. I think the easiest solution is to modify the existing conversion to produce "nicer" JSON-LD, which I think would be great, and I can definitely mention it to the team.

2024-02-15

@edsu @thatandromeda @hochstenbach @acka47
To go from an XML doc to a JSON representation it probably can, but to go from doc + sem triple store to a valid JSON-LD serialization there is no native way of doing it that I’m aware of.

Yep, MarkLogic is a doc DB/triple store with an application layer built in. It’s all XQuery code running everything.
