Matthew Honnibal

Co-founder and CTO of @explosion

Matthew Honnibal boosted:
2025-02-14

Just published part 3 of my blog post series on making beautiful slides for your talks 🎨✨

This one is about presenting technical content and making dry and abstract topics more interesting. Featuring many examples, including talks by Vitaly Meursault and @sofie!

ines.io/blog/beautiful-slides-

Matthew Honnibal @honnibal@sigmoid.social
2024-07-17

spaCy and Prodigy started as indie projects, but in 2021 we decided to raise capital and have a larger team. We couldn’t make that configuration work, so we’re back to how we were before. I’ll be spending most of my time hands-on with spaCy again, and we have a lot of updates and improvements planned for Prodigy.

I hate how vaguely these things are usually discussed, so I also wrote a long post about it all: honnibal.dev/blog/back-to-our-

Matthew Honnibal boosted:
2024-07-17

Company update: We're going back to our roots!

We're back to running Explosion as a smaller, independent-minded and self-sufficient company. spaCy and Prodigy will stay stable and sustainable and we'll keep updating our stack with the latest technologies, without changing its core identity or purpose 💙

explosion.ai/blog/back-to-our-

Matthew Honnibal boosted:
2023-06-03

We are really excited to share that we have just released the alpha version of Prodigy v1.12! This includes LLM-assisted workflows for data annotation and prompt engineering as well as extended, fully customizable support for multi-annotator workflows.

support.prodi.gy/t/prodigy-1-1

Matthew Honnibal boosted:
2023-06-03

We present a brand new workflow for prompt engineering that allows you to compare the quality of several prompts in a tournament. The algorithm uses the Glicko rating system [en.wikipedia.org/wiki/Glicko_r] to select the best prompt.

future--prodi-gy.netlify.app/d
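The linked article has the Glicko details; as a simpler illustration of the same tournament idea, here is an Elo-style rating update applied after each pairwise prompt comparison (Glicko extends this by also tracking a rating deviation per prompt). A hypothetical sketch, not the recipe's actual implementation:

```python
def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Update two prompt ratings after one head-to-head comparison."""
    # Expected score of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Two prompts start equal; the winner of the comparison gains rating.
a, b = elo_update(1500.0, 1500.0, a_won=True)
```

Repeating this over many comparisons lets the best prompt rise to the top without every prompt having to face every other one.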

Matthew Honnibal boosted:
2023-06-03

Here are the slides for my #PyDataLondon keynote on LLMs from prototype to production ✨

Including:
◾ visions for NLP in the age of LLMs
◾ a case for LLM pragmatism
◾ solutions for structured data
◾ spaCy LLM + prodi.gy

speakerdeck.com/inesmontani/la

Matthew Honnibal @honnibal@sigmoid.social
2023-05-18

What will production NLP look like, once the dust settles around LLMs? One view is basically “prompts are all you need”. I disagree. I wrote a bit about this when we released #spaCy LLM last week, but the topic deserves its own post, so here it is.

explosion.ai/blog/against-llm-

Matthew Honnibal boosted:
2022-12-21

Hi #MastoCats! Let me introduce Rizhik and Alaska, our guest cats from Ukraine.

[Image: Ginger cat next to a tattoo of himself]
[Image: Don sphynx kitten wearing a red sweater with a fluffy white collar]

Matthew Honnibal @honnibal@sigmoid.social
2022-12-20

@kjr Good! We have several users using it with r2l and bidirectional text happily. Here's the config setting: prodi.gy/docs/install#config

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

If you don't have Prodigy, you can get a copy here: prodi.gy/buy

We sell Prodigy in a very old-school way, with a once-off fee for software you run yourself. There's no free download, but we're happy to issue refunds, and we can host trials for companies.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

We didn't have to make any changes to Prodigy itself for this workflow — everything happens in the "recipe" script. You can build other things at least this complex for yourself, or you can start from one of our scripts and modify it according to your requirements.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

The key to iteration speed is letting a small group of people — ideally just you! — annotate faster. That's where the scriptability comes in. Every problem is different, and we can't guess exactly what tool assistance or interface will be best. So we let you control that.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

Modern neural networks are very sample efficient, because they use transfer learning to acquire most of their knowledge. You just need enough examples to define your problem. If annotation is mostly about problem definition, iteration is much more important than scaling.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

I especially like this zero-shot learning workflow because it's a great example of what we've always set out to achieve with Prodigy. Two distinct features of Prodigy are its scriptability and the ease with which you can scale down to a single-person workflow.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

This workflow looks pretty promising from initial testing. The model provides useful suggestions for categories like "ingredient", "dish" and "equipment" just from the labels, with no examples. And the precision isn't bad — I was impressed that it avoided marking "Goose" here.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

So, let's compromise. We'll pipe our data through the OpenAI API, prompting it to suggest entities for us. But instead of just shipping whatever it suggested, we're going to go through and correct its annotations. Then we'll save those out and train a much smaller supervised model.
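As a rough sketch of the middle step, here is how the model's entity suggestions might be turned back into character-offset spans that an annotator can accept or correct. The response format and the helper below are hypothetical stand-ins, not the actual prodigy-openai recipe code:

```python
# Hypothetical sketch: turn a model response like
#   "ingredient: butter, flour\nequipment: oven"
# into character-offset spans for review in an annotation UI.
def parse_response(text: str, response: str) -> list[dict]:
    spans = []
    for line in response.splitlines():
        if ":" not in line:
            continue  # skip chatter the model adds around the answer
        label, _, names = line.partition(":")
        for name in names.split(","):
            name = name.strip()
            start = text.find(name)  # naive: first occurrence only
            if name and start != -1:
                spans.append(
                    {"start": start, "end": start + len(name), "label": label.strip()}
                )
    return spans
```

The corrected spans are what you save out as gold data for training the smaller supervised model.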

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

Machine learning is basically programming by example: instead of specifying a system's behaviour with code, you (imperfectly) specify the desired behaviour with training data.

Well, zero-shot learning is like that, but without the training data. That does have some advantages — you don't have to tell it much about what you want it to do. But it's also pretty limiting. You can't tell it much about what you want it to do.

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

So how can models like GPT3 help? One answer is zero- or few-shot learning: you prompt the model with something like "Annotate this text for these entities", and you append your text to the prompt. This works surprisingly well! It was one of the headline results in the original paper.

However, zero-shot classifiers really aren't good enough for most applications. The prompt just doesn't give you enough control over the model's behaviour.
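A minimal sketch of such a zero-shot prompt, assuming a made-up template (the actual template used by the recipe differs):

```python
# Illustrative only: build a zero-shot NER prompt from a label list.
def build_ner_prompt(text: str, labels: list[str]) -> str:
    label_list = ", ".join(labels)
    return (
        f"Annotate the following text for these entity types: {label_list}.\n"
        "Answer with one line per entity type, formatted as 'Label: span1, span2'.\n\n"
        f"Text: {text}"
    )

prompt = build_ner_prompt(
    "Roast the goose with butter.", ["ingredient", "dish", "equipment"]
)
```

Everything the model knows about your task has to fit in that one template, which is exactly why the prompt gives you so little control.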

Matthew Honnibal @honnibal@sigmoid.social
2022-12-19

We've been working on new prodi.gy workflows that let you use the OpenAI API to kickstart your annotations, via zero- or few-shot learning. We've just published the first recipe, for NER annotation 🎉 github.com/explosion/prodigy-o

Here's what, why and how. 🧵

Let's say you want to do some 'traditional' NLP thing, like extracting information from text. The information you want to extract isn't on the public web — it's in this pile of documents you have sitting in front of you.
