Jack Vanlightly

Messaging systems builder (RabbitMQ, Apache Pulsar, Apache BookKeeper, Apache Kafka). Also TLA+ nerd. Currently a researcher/adviser at Confluent.

2025-01-14

Another issue of Humans of the Data Sphere is out: issue #6! In this issue we also dive into the world of AI agents, including the promises and challenges ahead.

hotds.dev/p/humans-of-the-data

2024-12-22

It's the holidays, but I still carve out a little time for Kafka transactions. Diary entry 5 - adding the AddPartitionsToTxn request/response, TC elections and producer fencing (TLA+).

jack-vanlightly.com/analyses/2

2024-12-10

Another issue of Humans of the Data Sphere is out! In issue #5 we also look at an interesting set of takes on the recent Cloudflare outage, and a review of data warehouse modeling methodologies.

hotds.dev/p/humans-of-the-data

2024-12-06

Every time I think I'm done exploring Fizzbee, it pulls me back in.

jack-vanlightly.com/blog/2024/

2024-12-05

Diary entry 4 of my Kafka transactions formal verification work. This entry is all about the initial Fizzbee spec and some benchmarks comparing it to TLC (the TLA+ model checker).

jack-vanlightly.com/analyses/2

2024-12-05

@asynchronaut Yes, I suppose they have the same goal. It's been a few years since I last studied Boolean algebra, though. But it is a fascinating and fun subject.

2024-12-05

Symmetry in TLA+ reduces the state space by treating interchangeable components like servers, nodes, or processes as equivalent. This post explores examples of how equivalent states get collapsed, to help build a mental model of symmetry when model checking.

jack-vanlightly.com/blog/2024/
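As a rough analogy for what symmetry reduction does (this is an illustrative Python sketch, not TLA+ or TLC itself), we can canonicalize states by treating permutations of interchangeable nodes as the same state:

```python
from itertools import product

# Three interchangeable nodes, each either "up" or "down".
# Raw state space: every assignment of a status to each node.
raw_states = set(product(["up", "down"], repeat=3))

# Symmetry reduction: the nodes are interchangeable, so two states
# that are permutations of each other are equivalent. Canonicalize
# each state by sorting it, i.e. keep only the multiset of statuses.
canonical_states = {tuple(sorted(s)) for s in raw_states}

print(len(raw_states))        # 8 raw states
print(len(canonical_states))  # 4 equivalence classes: 0..3 nodes down
```

Here 8 raw states collapse to 4 equivalence classes, which is the same effect a symmetry set has on the states TLC must explore.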

2024-12-04

Missed one aspect of writing the TLA+ spec yesterday: how to gain confidence that it is correct.

jack-vanlightly.com/analyses/2

2024-12-03

Diary entry 2 of formally verifying Kafka transactions. This entry covers my initial design decisions and the nuances of liveness properties in this first iteration of the TLA+ specification. Only read if you want to get down and dirty with TLA+!

jack-vanlightly.com/analyses/2

2024-12-02

I'm trying something new: a diary of my formal verification of Apache Kafka transactions, to track progress and discuss various aspects of the formal verification process (in both TLA+ and Fizzbee). This is the first entry.

jack-vanlightly.com/analyses/2

2024-11-22

Another Humans of the Data Sphere is out!

In this issue, we also look at the topics of simulation and Incremental View Maintenance.

hotds.dev/p/humans-of-the-data

2024-11-21

Just had my first discussion to plan my work on formally verifying the Kafka transaction protocol in TLA+, as well as a bunch of planned improvements. It would be nice to have it done by the holidays, but we'll see.

2024-11-21

As a follow-up to my post on modeling and simulation, this post explores some additional concepts and reasons for using simulation to obtain statistical properties.

jack-vanlightly.com/blog/2024/

2024-11-19

New blog post on using modeling and simulation to obtain statistical properties of distributed system designs.

Shows a simple Python script, and its equivalent written in TLA+. TLA+ is usually used for safety properties but it can also be used for statistical properties.

jack-vanlightly.com/blog/2024/
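The flavor of the Python side of this (a minimal illustrative sketch, not the script from the post; the quorum scenario and parameters are my own assumptions) is a Monte Carlo loop that estimates a statistical property rather than checking a safety property:

```python
import random

def quorum_available(n_replicas=5, p_up=0.9):
    """One trial: is a majority quorum (3 of 5 replicas) available,
    when each replica is independently up with probability p_up?"""
    up = sum(random.random() < p_up for _ in range(n_replicas))
    return up > n_replicas // 2

def estimate(trials=100_000):
    """Monte Carlo estimate of quorum availability."""
    random.seed(42)  # fixed seed for reproducibility
    hits = sum(quorum_available() for _ in range(trials))
    return hits / trials

print(round(estimate(), 3))  # close to the exact value of ~0.991
```

The same question can be posed in TLA+ by model checking with randomized simulation runs instead of exhaustive exploration, counting how often the behavior of interest occurs.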

2024-11-12

Another issue of Humans of the Data Sphere is out! In this issue we also explore compute-compute separation in modern distributed data systems.

hotds.dev/p/humans-of-the-data

2024-11-07

I'd love to see a paper or a blog post that explores wide events from an infrastructure cost perspective, discussing the design space, factors such as scale, cardinality and query patterns, and all the associated trade-offs.

2024-11-07

What I haven't seen yet is a good discussion of infrastructure costs. How can this be done without exploding costs? Have we designed observability data systems capable of storing huge volumes of structured wide events, with low-latency queries and reasonable costs?

2024-11-07

Yes, wide events and canonical log lines are obviously a great user experience for the on-call engineer under pressure. I worked at Splunk for a while, and I've seen what a powerful slice-and-dice machine over structured logs can do. But it is expensive!

2024-11-07

I've been an on-call engineer in both complex, sprawling enterprise IT settings and much narrower settings, such as running Apache Pulsar and BookKeeper at scale. What I'm reading about wide events, canonical log lines and what Charity Majors calls Observability 2.0 has me saying YES! But...

2024-11-05

I did this yesterday with my Apache Iceberg spec after being asked about a particular data conflict case, mostly to remind myself of which bad scenarios the conflict check in question prevented. I disabled it and got a counterexample of an undelete. Then I disabled delete operations and got a lost update from two conflicting updates.
