It’s HACK FRIDAY, and BsidesPDX is dropping two must-watch talks:

A Decade of Low-Hanging Fruit in the Linux Kernel by Kees Cook
Now streaming: https://youtu.be/orO8czP5Bxw

Bastardo Finale by Bryan Hance
Premieres at 2:00 PM PT: https://youtu.be/znEzDrh2I4o

Subscribe to Bsides Portland on YouTube so you don’t miss out.

@corbden Interested in that, as well. I saw a great talk on how misinformation combatants were using #MISP to share threat actor info. I thought it was at either #pancakescon or #bsidespdx, but I can't find it online. This was 2-3 years ago, I believe.

Mirroring known vulnerability data globally, for free. https://curiousity.ca/2024/mirroring-known-vulnerability-data-globally-for-free/ If you just want the answer to “where do I find a reliable global mirror of NVD vulnerability data?” the answer is https://cveb.in/ . It is being mirrored on the same servers used for major open source projects so you’re probably already trusting them. If you want to hear the story of how we got there, read on… … #BSidesPDX #infosec #OpenSource

If you just want the answer to “where do I find a reliable global mirror of NVD vulnerability data?” or “Where should I get a list of CVEs if NVD is down?” the answer is https://cveb.in/ . It is being mirrored on the same servers used for major open source projects so you’re probably already trusting them, and they should be fast and may be very close to you. Please go ahead and use it and let us know how it works for you!

I co-presented a talk about this work at BSidesPDX on Saturday, October 26, 2024:

The slides are here: https://docs.google.com/presentation/d/1IFqSrhfasoXsLKydJQPg_220XoU3eq8G7Kk3woJOn3Y/edit?usp=sharing
And I don’t know when/if they’ll be splitting out the recordings, but you can find us mid-way through the live stream for Saturday. Our talk starts a bit after the 3:24 mark. https://www.youtube.com/live/gPYBXaua6xc?si=MhDh9-3nC86i-bcw&t=12249

Often when I write about talks I’ve given, I try to recreate them in blog posts as a bit of a director’s cut, where I add some extra material that didn’t make it into the talk but otherwise stay pretty close to what I said on stage. This time, though, since I didn’t give the second half of the talk and John and I have very different ways of telling a story, I’m just gonna tell a story in this blog post and maybe toss in a few slides. If you want to watch us both tell the story from our own perspectives, check out the video. Although we collaborate on a lot of stuff, it’s surprisingly rare for us to share a stage.

Still here and not going with the recording? Okay, let me tell you a story…

The US government DDOSed itself

Once upon a time, not so long ago, the US government decided it wanted to raise the bar for software security in their supply chain, and they wrote up an executive order on cybersecurity explaining how they wanted suppliers to do better, including a section on not shipping software with known vulnerabilities. Many other groups followed suit with similar recommendations or requirements.

As a result, a lot of organizations’ security plans started to look a lot like this:

Image Description: A diagram from my talk about NVD mirroring. The top of the slide is labelled “2024 Corporate Security Policy_final_FINAL.doc” (which is a joke about filenames for things that undergo a lot of revisions). There are then three columns. The first is labelled Step 1 and there is text in a red box that reads “Scan components for vulnerabilities.” Step 2 has an orange box which contains the text, “???” and Step 3 has two green boxes, one of which says “EVERYTHING IS SECURE” and has a picture of a closed lock. The second reads, “$$ Profit $$”

There is a lot to say about steps 2 and 3 here, but our problem starts at the beginning of Step 1. To scan for vulnerabilities, you need a list of software you’re providing (which is a whole talk in and of itself) and a list of known software vulnerabilities.

One of the biggest sources of vulnerability data actually comes from the US government: the NVD (National Vulnerability Database) provided by NIST (National Institute of Standards and Technology). It’s pretty great — they provide it fully free, publicly licensed. This is usually where you go to get information about CVEs (Common Vulnerabilities and Exposures).

But what do you think happens if every single US government supplier and indeed, many other software companies around the world, all try to grab this data at once? And more than that, many of them start enabling regular scanning so they’re grabbing it multiple times per day, or per hour?

Image Description: A slide from my BSidesPDX 2024 talk which reads “Distributed Denial of Service” and has a photo I took of some street signs near the train tracks. The relevant one is a large yellow caution sign that shows a person with a bike getting a wheel stuck in the train tracks, with the rider being launched off the bike over the tracks.

So, yeah, the US government kind of started a denial of service attack against its own agency. And in case that wasn’t bad enough, we started seeing headlines like “NIST Struggles with NVD Backlog as 93% of Flaws Remain Unanalyzed”, where the stories talked about funding cuts at NIST.

The fine folk at NIST have been doing a hard job with not enough resources and some really unfortunate timing, so they’d already been working on keeping things from being overwhelmed. They had introduced rate limits per IP address/API key to keep rogue scanning jobs from ruining things for everyone, and they had started providing an API that allowed people to get just the newest data instead of having to download everything every time. Unfortunately, the API combined with rate limits was pretty slow, so getting the full database the first time using the API was onerous, when it worked at all. Several of my colleagues in the UK and in India had such long delays that they had to give up and bootstrap the “old” way to get started. And a lot of people were running their scanning within ephemeral containers and just didn’t cache the copy of the database at all, so they wanted to get all the data fresh with each new scan. When neither the rate limits nor the API was enough to address demand longer-term, and with budget cuts on the horizon, NIST started looking for industry partnerships and additional funding.
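The incremental API is what lets a scanner avoid re-downloading everything, at the cost of some bookkeeping. As a rough sketch, the Python below splits the time since a scanner’s last sync into windows suitable for the NVD API 2.0 `lastModStartDate`/`lastModEndDate` parameters. The 120-day maximum range is an assumption based on NVD’s documentation at the time of writing, the function names are hypothetical, and pagination and the actual HTTP requests are left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the NVD CVE API 2.0 limits a lastModStartDate/lastModEndDate
# range to 120 consecutive days. Check the current NVD documentation.
MAX_RANGE = timedelta(days=120)


def update_windows(last_sync, now, max_range=MAX_RANGE):
    """Split the interval (last_sync, now] into consecutive windows of at most max_range."""
    windows = []
    start = last_sync
    while start < now:
        end = min(start + max_range, now)
        windows.append((start, end))
        start = end
    return windows


def to_query_params(start, end):
    """Query parameters for one incremental "what changed since?" request."""
    return {
        "lastModStartDate": start.isoformat(),
        "lastModEndDate": end.isoformat(),
    }


if __name__ == "__main__":
    last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 10, 26, tzinfo=timezone.utc)
    for start, end in update_windows(last_sync, now):
        print(to_query_params(start, end))
```

Each window becomes one (paginated) request, so a scanner that syncs often only ever asks for a small slice of changes.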

It was clear that this wasn’t a problem that was getting solved quickly.

That sounds bad, but how is that YOUR problem, Terri?

Why did I care? I mean, obviously I’m a security professional and things that stand in the way of good security choices are a problem for me in general. But in this case, my open source project at work involves building a vulnerability scanner called cve-bin-tool: https://pypi.org/project/cve-bin-tool . It’s a free, open source software vulnerability scanner for binary files, git repos, and SBOMs.

(Quick reminder: This is my personal blog and as such, all opinions here are my own and do not necessarily reflect those of my employer.)

In the course of developing software to scan for vulnerabilities, we’d gotten a front row seat to all of the NVD changes: we’d had to start using API keys and explaining them to our users, we’d had to handle new timeout messages and implement appropriate backoffs and rate limiting, and we’d started getting reports from users that updates were slow or not working. Many users and contributors located outside of the US were experiencing extensive delays.
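Backoff itself is a standard pattern. Here is a minimal, generic Python sketch of exponential backoff with jitter; it is not cve-bin-tool’s actual code, `fetch` is a hypothetical callable, and the 6-second base and 60-second cap are arbitrary assumptions to tune for whatever rate limits you are facing.

```python
import random
import time


def backoff_delays(retries, base=6.0, cap=60.0):
    """Exponential schedule: base, 2*base, 4*base, ... never longer than cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch, retries=5, base=6.0, cap=60.0, jitter=True, sleep=time.sleep):
    """Call fetch() until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical zero-argument callable that raises an exception
    on a timeout or rate-limit error; `sleep` is injectable for testing.
    """
    delays = backoff_delays(retries, base, cap)
    last_error = None
    for attempt, delay in enumerate(delays):
        try:
            return fetch()
        except Exception as err:  # narrow this to your HTTP client's error types
            last_error = err
            if attempt < retries - 1:
                # "Full jitter": wait a random time up to `delay`, so many clients
                # that failed together don't all retry at the same moment.
                sleep(random.uniform(0, delay) if jitter else delay)
    raise RuntimeError(f"giving up after {retries} attempts") from last_error
```

The jitter matters for exactly the situation described here: thousands of scanners hitting the same server and getting throttled at the same moment.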

Following NVD best practice had been making our code more complex, our software harder to use, and our users unhappy. It’s hard enough to get software developers to care about vulnerabilities, and a tool that had previously been pretty easy to install and try was getting uncomfortably hard to use. And while we supported other sources of vulnerability data, NVD was still the biggest one and the one people wanted the most.

How do we make vulnerability data available to everyone?

We probably could have solved the problem for cve-bin-tool the way commercial entities have handled it: make our own copy, query that, and keep it updated separately. They often add proprietary data (such as the missing triage of new vulnerabilities) and then sell access to that data as part of their solution. We were already keeping a local copy of the data in GitHub so our CI jobs would quit timing out at inopportune moments. But my goal has long been to make software more secure for everyone. What if we thought bigger than one Python application? What if I built a solution that would help the whole world?

Image description: A slide from my BSidesPDX 2024 talk. On one side, it reads “what if we helped the *world* get vulnerability data?” and on the other side it has a screenshot of a tumblr post. The first post is from writing-prompt-s and reads “In a game with no consequences, why are you still playing the ‘Good’ side?”. The next post is from raphaeliscoolbutrude and says “Because being mean makes me feel bad.” The final post is from user deflare and reads, “Because my no-consequences power fantasy is *being able to help everyone*”

It might have been easy to lay a lot of the blame on people using “ephemeral” continuous integration jobs. They typically grab a mostly empty Linux image, install/update some software, download the thing they want to scan, download the vulnerability data, store a report somewhere, then throw the rest away to start fresh next time. If they just cached the data instead of grabbing it every single time, we wouldn’t be in this mess.
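For a CI job, the usual fix is to keep the vulnerability data in a directory the CI system persists between runs (many CI services offer a cache feature for this) and only refresh it when it is stale. A minimal Python sketch, assuming a 24-hour freshness window and a hypothetical `download()` callable that returns the raw bytes:

```python
import os
import time
from pathlib import Path


def cache_is_fresh(path, max_age_hours=24.0, now=None):
    """True if `path` exists and was modified less than max_age_hours ago."""
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < max_age_hours * 3600


def get_vuln_data(cache_path, download, max_age_hours=24.0):
    """Return cached bytes, calling the hypothetical `download()` only when stale."""
    if cache_is_fresh(cache_path, max_age_hours):
        return cache_path.read_bytes()
    data = download()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    tmp = cache_path.with_name(cache_path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, cache_path)  # atomic swap: a killed job never leaves a half-written cache
    return data
```

Writing to a temporary file and renaming it means a job that gets cancelled mid-download leaves the previous good copy in place.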

But we could learn from what they were doing too: it was perfectly viable for them to download entire software binaries every single time, and no one batted an eye at that. Why was it easier and faster to get the software than to get metadata about the software? The answer, of course, is that we weren’t all trying to download the software from a single underfunded government agency. Instead we were downloading from… a bunch of underfunded open source hippies? How was that working when the government servers weren’t?

I am old enough that I knew the answer. Open source had solved their distribution problem by asking people to store a “mirror” (a copy of all the files) on their own servers, then building infrastructure to help people find the one closest to them. It all happened long before anyone had coined the term “cloud service provider” and it had happened on shoestring budgets with people donating a bit of space in a server rack and a bit of bandwidth. A lot of early mirrors were in universities or small internet service providers who had an open source enthusiast on staff. Get enough of them, and suddenly everyone gets software and no one gets stuck with a giant bill or an overloaded server.
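Mirror networks often handle the “find the closest one” part on the server side by redirecting you, but the idea is simple enough to sketch from the client’s point of view: rank mirrors by a rough latency measurement and fall back down the list when one fails. This Python sketch is illustrative only; the hostnames and the `fetch` callable are hypothetical.

```python
import socket
import time


def measure_tcp_latency(host, port=443, timeout=2.0):
    """Time a TCP connect as a rough proximity signal; None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def rank_mirrors(latencies):
    """Order mirrors fastest-first; unreachable ones (latency None) are dropped."""
    reachable = {host: rtt for host, rtt in latencies.items() if rtt is not None}
    return sorted(reachable, key=reachable.get)


def first_working(mirrors, fetch):
    """Try mirrors in order, falling back to the next one on a network error."""
    errors = {}
    for host in mirrors:
        try:
            return fetch(host)
        except OSError as err:
            errors[host] = err
    raise RuntimeError(f"all mirrors failed: {errors}")
```

With enough mirrors in enough places, any one of them being slow or offline just means the next one on the list gets the request.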

It looked like neither government nor industry was going to solve this problem on the timeline I wanted and maybe never on the global scale that would make my life easier. But I have access to resources that a government agency maybe doesn’t: I know where one of the world’s leading experts on open source mirroring lives. It’s in my house. Because I married him. As well as having years of experience in multiple roles, he’s actively involved in running one of the larger open source content distribution networks in the world. So I had access to exactly what I needed to help everyone. I walked upstairs and said, “Hey John, if I wanted to mirror the NVD data on the micro mirrors, could we do that?” and then we figured out how to make it happen.

FCIX Micro Mirrors

This is the point at which I handed the talk over to John. But here’s my truncated version of his half of the story.

John builds infrastructure the way I knit: compulsively and constantly. And when he’s not actually doing something with his infrastructure there’s a good chance he’s thinking about it or talking about it. He hosts people’s websites and emails and mastodon accounts, he accidentally got involved in founding a whole internet exchange, and he’s forever automating and building backends for things in the house that I really wish weren’t internet-enabled. (Look, I’m a security professional, I’m allergic to too much internet.)

One day his friend Kenneth decided it would be fun to run a software mirror for their internet exchange, and he roped John into it, and then into this hare-brained idea of maybe running a lot of mirrors on cheap hardware. John had previously run kernel.org and the associated Linux mirrors there, and he had done so on big beefy servers with big beefy bandwidth, so he was skeptical that this would work. Still, not only was it cheap to try and see, but thanks to some donations they didn’t even have to lay out much of their own money to get it going. And long story short: it turns out it works incredibly well.

The deal is that they build up these cheap “thin client” boxes with a hard drive in them that have a copy of the data and are managed remotely by John and Kenneth. Then they offer them up for free to data centres that are willing to provide power and internet. It’s kind of a fully managed appliance, so the data centre gets blazing fast downloads of open source software for their customers and anyone else “nearby”, and Kenneth and John get a dot on their map and the knowledge that they’re helping distribute open source software. (Also they get to run globally load-bearing infrastructure for funsies. Which it really is for them.)

Here’s my favourite picture: since one of our contributors is based in the UK, we turned on the UK-based mirrors first, and one of them is a data center in a box in a field:

Image Description: A dark green utility box sitting in a beautiful field with yellow summer grass, green bushes, and green trees along the edges. There is a wedge of blue sky with clouds visible. One of the software mirrors is inside the green box.

What’s been amazing is that this little network of devices is now a major powerhouse of Linux mirroring. They estimate that they’re providing 90% of the bandwidth used for VLC, so if you’ve downloaded that or anything else they serve, there’s a good chance you’ve already used one of these mirrors without knowing it. If you want to see the list of projects, check out https://mirror.fcix.net/ . Kenneth is giving a talk at SeaGL in November if you want to hear more about the micro mirror story.

Serving the right data: files are better than APIs

The key to using these tiny servers is basically “Linux people optimized sending files in order to make mirrors work.” And they did that quite a while ago, so it’s really stable and fast now. You might think “oh, couldn’t you use BitTorrent?” but that adds a lot of overhead. (That paper is older, but the numbers haven’t made it look more appealing in the time since then.)

If we want to go with what works, then, we can’t mirror the NVD API: that would require processing, and these mirrors are not that smart. But it turns out… people didn’t really love the NVD API. It definitely filled a need for some folk, but when NIST tried to turn off the old file-based data, so many people protested that the original deadline for removing the files got pushed out, and then pushed out again. So we can probably guess that many users would like the files as much as or better than the API, assuming they could get them faster and without rate limits.

So here’s what it looks like:

We are running our own API crawler
Generating json files compatible with the original ones
Signing those files with possibly the sketchiest gpg key on the planet
Mirroring these files to a worldwide CDN we created
Literally solving the entire API / DDOS problem for… free?
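To make the “compatible files” step a little more concrete, here is a rough Python sketch that groups API-style records into one file per year, wrapped in an envelope imitating the legacy NVD 1.1 JSON feeds. The envelope field names and the `nvdcve-1.1-<year>.json` filenames are assumptions modelled on the old feeds, so verify them against a real file; `convert` stands in for the per-record translation, which is the genuinely fiddly part and is not shown. Signing could then be a separate step, for example running `gpg --detach-sign --armor` on each file.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def cve_year(cve_id):
    """'CVE-2024-12345' -> '2024' (assumes files are split by the year in the ID)."""
    prefix, year, _ = cve_id.split("-", 2)
    if prefix != "CVE" or not year.isdigit():
        raise ValueError(f"unexpected CVE id: {cve_id!r}")
    return year


def build_yearly_feeds(records, convert=lambda rec: rec):
    """Group API-style records ({'cve': {'id': ...}, ...}) into one feed per year."""
    by_year = defaultdict(list)
    for rec in records:
        by_year[cve_year(rec["cve"]["id"])].append(convert(rec))
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    return {
        f"nvdcve-1.1-{year}.json": {
            "CVE_data_type": "CVE",
            "CVE_data_format": "MITRE",
            "CVE_data_version": "4.0",
            "CVE_data_numberOfCVEs": str(len(items)),
            "CVE_data_timestamp": stamp,
            "CVE_Items": items,
        }
        for year, items in sorted(by_year.items())
    }


def write_feeds(feeds, out_dir):
    """Write each feed as a JSON file; signing and mirroring happen afterwards."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, feed in feeds.items():
        (out / name).write_text(json.dumps(feed), encoding="utf-8")
```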

Since cve-bin-tool has to speak the API already, we can have cve-bin-tool output valid json files when needed. That said, since NVD is still providing the json files at the time of this writing, we can (and do) get their files directly.

I should note that the technical implementation and testing in a live environment took a few months once we decided to do it. Much faster than waiting for funding!

Why should you trust us?

First: we are not affiliated with NIST, and they were not involved in any of this. I did email them so they’d know who was behind it in case it came up, and I got a nice reply saying, effectively, that they don’t officially endorse anything, which is fine. I want to joke that I’m the pirate radio of vuln data, but recall that the data is publicly licensed, so there’s no piracy involved. Just fast and efficient transmission of perfectly allowed data.

So why should you trust some internet randos to get you vulnerability data? After all, the software security industry tries to tell you to stop downloading files served by random people on the internet! But these are the same servers that you’re probably using to get security updates, so… you probably already do trust them?

For a lot of the software on these mirrors, it’s a trust-but-verify solution: the packages are signed and package managers validate those signatures, so even if one of the data centres wanted to serve up malicious code, it wouldn’t get auto-installed unless they also compromised some build and signing servers. So you’re trusting not just the mirror, but the whole process that makes sure the mirror serves up the right data.

If you’re going to build similar verification into your tool that uses NVD data, you can verify our (sketchy) gpg signatures so you know the files came from us, but you can also validate the data against NVD itself. For the json files, NVD provides some metadata you can use. If we’re generating our own json (as we expect to do when they turn off theirs), it might get a bit more complicated, but you can probably figure something out. For example, if validating all the data is impractical, you could have something that uses the API to double-check only the CVEs you care about. You can also always use us as your “seed” source and then update against NVD directly, overwriting as needed.
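As an example of checking against NVD’s own metadata: the legacy json feeds were each accompanied by a small `.meta` text file of `key:value` lines, including a `sha256` of the uncompressed JSON. The Python sketch below assumes that format and compares digests, then shows the “seed, then overwrite” idea as a merge keyed by CVE ID in which the record with the later `lastModified` timestamp wins. The record shape, and the assumption that all timestamps share one ISO-8601 format so they compare correctly as strings, are illustrative rather than taken from any particular tool.

```python
import hashlib


def parse_meta(text):
    """Parse 'key:value' lines, splitting on the first colon (timestamps contain colons)."""
    meta = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta


def matches_meta(json_bytes, meta_text):
    """True if the SHA-256 of the uncompressed JSON matches the .meta digest."""
    expected = parse_meta(meta_text).get("sha256", "")
    return hashlib.sha256(json_bytes).hexdigest() == expected.lower()


def merge_updates(seed, updates):
    """Start from mirror data (`seed`), then let newer records from NVD win.

    Both arguments map CVE ID -> record dict with an ISO-8601 'lastModified'
    string in a single consistent format, so string comparison orders them.
    """
    merged = dict(seed)
    for cve_id, record in updates.items():
        current = merged.get(cve_id)
        if current is None or record["lastModified"] >= current["lastModified"]:
            merged[cve_id] = record
    return merged
```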

(Incidentally, don’t bother trying to run a json schema check on the data as part of your checks unless you like noise. We did this in cve-bin-tool and had to turn it to just warn instead of halting because NVD themselves produce invalid json files frequently enough that it was a problem. Turns out keeping a giant database full of user-submitted data valid is hard.)
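The warn-instead-of-halt approach is easy to build in. In this minimal Python sketch, a short list of expected keys stands in for a real JSON schema (a proper check would use NVD’s published schema and a validator library); the key names are assumptions based on the legacy feed items, and the hypothetical `strict` flag lets a caller opt back into failing hard.

```python
import logging

log = logging.getLogger("nvd-check")

# Hypothetical minimal "schema": keys expected on every legacy-style feed item.
REQUIRED_KEYS = ("cve", "publishedDate", "lastModifiedDate")


def check_items(items, strict=False):
    """Return a list of problems; log them as warnings, or raise if strict."""
    problems = []
    for i, item in enumerate(items):
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            problems.append(f"item {i}: missing {', '.join(missing)}")
    if problems and strict:
        raise ValueError("; ".join(problems))
    for problem in problems:
        log.warning(problem)
    return problems
```

Returning the problem list (rather than only logging it) keeps the option open to report data quality upstream without blocking a scan.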

Using the mirror

The instructions are here: https://cveb.in/

Basically, go nuts. Those little thin clients can handle full Fedora releases and don’t even max out on release day any more. Please use them! They should be fast, they are probably significantly less overloaded than the main NVD servers, and there are no rate limits or API keys needed. Plus, you’ll make pretty marks on John’s graphs.
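Because the mirror serves plain files, fetching the data needs nothing more than an ordinary HTTP download: no API key, no rate-limit handling. A minimal Python sketch using only the standard library, assuming a gzipped JSON feed; take the real file URLs from the instructions at https://cveb.in/, since the path in the comment below is a placeholder.

```python
import gzip
import json
import shutil
import urllib.request
from pathlib import Path


def download_feed(url, dest, timeout=60.0):
    """Download a gzipped JSON feed to `dest`, then parse and return it."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk rather than holding it in memory
    with gzip.open(dest, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# Example (placeholder path; use a real URL listed on https://cveb.in/):
# feed = download_feed("https://cveb.in/<path-to-feed>.json.gz", "cache/feed.json.gz")
```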

You can also use the mirror data as part of cve-bin-tool so you don’t have to build your own scanning service!

Get the package on PyPI using pip install cve-bin-tool
Source code
Documentation

Conclusion

I noticed a problem where software vulnerability data about CVEs was getting harder and harder to access, and roped the fine folk of the FCIX Micro Mirror project into hosting a copy of this publicly available data on https://cveb.in/ , which they are doing for free thanks to donations of time, money, and server rack space from a variety of folk. These mirrors are fast, available worldwide, not rate limited, and we would love it if you used them.

Contacting us

The comments for this post will turn off after a few weeks because I don’t feel like dealing with spam, but feel free to hit me or John up with questions on the fediverse anytime! We’d also love to hear how you use https://cveb.in/

Terri: https://social.afront.org/@terri
John: https://social.afront.org/@warthog9

Future work

I’m not actively working on mirroring anything else at the moment, but I *do* think it would be super cool if we could get the micro mirror system to help provide files for PyPI / pip. So if you’ve got a lead there and global distribution of Python packages sounds like a good idea, let us know! And if you’ve got any other way we could make the world a better place for free, that’s cool too.

https://curiousity.ca/2024/mirroring-known-vulnerability-data-globally-for-free/

#BSidesPDX #infosec #OpenSource

I really appreciate that the @BSidesPDX crew found a way to get hacker hoodies for Sasquatch. (The f-bomb is a dog toy that just happened to be nearby 💣)

Had to bail on the conference early today because my kid decided he'd gotten enough badge candy and he was done, but we had a great time!

#infosec #BSidesPDX

#BSidesPDX team did a great job on all the logos, stickers, badge, and media this year! Love the fall timing and themes too.

@kees how dare you use Fred Rogers to manipulate me 😭
Good talk today #BSidesPDX

#BSidesPDX Yesterday's EDF workshop was awesome, thanks @Kwestin

My very first Linux malware reverse engineering class with @pinkflawd at #BSidesPDX! ❤️ https://blackhoodie.re

Bsides PDX is in full swing thanks to our sponsors, like #OrcaSecurity . If you missed the first day, there's still tomorrow and you can catch it streaming here: https://bit.ly/4e00GOT
#BsidesPDX #Cybersecurity #HackerCommunity

Made it to #BSidesPDX for today. I'm talking tomorrow but today I get to relax and knit and listen. Check out the great logo/badge! Looks beautiful even if I haven't turned it on yet.

#infosec

A pile of knitting and a bsidespdx badge on my lap. The badge has a sasquatch carrying a jack-o'-lantern full of candy.

Anyone else at #BSidesPDX today?

BSides has the vibe of record store day with attendees who collect/fix old bicycles and have probably volunteered for something in their lives.

SEA, VI and now happy to be back for #BSidesPDX !

Bsides PDX starts TOMORROW at Portland State University. We hope to see you all there! Shoutout to our sponsors for making this event possible. Register: https://bit.ly/3zXOG2t #ISSAPortland #NoStarchPress #IdentityTechnologies #BsidesPDX #Cybersecurity #TechCommunity

The clock is ticking—Bsides PDX kicks off this Friday. Thanks to our sponsors for their support. Their contributions allow us to bring together students, hackers, and security professionals.
@Hacker0x01

#BsidesPDX #InfosecCommunity #Cybersecurity

A special thank you to the sponsors behind Bsides PDX. Your support enables us to offer high-quality content and opportunities for everyone in the security community. We couldn’t do it without you. Thank you ISACA Portland and Eclypsium!
#Cybersecurity #BsidesPDX #TechCommunity

#BsidesPDX
