
2024-11-27

Beyond Bcrypt

In 2010, Coda Hale wrote How To Safely Store A Password, which began with the phrase “Use bcrypt” repeated over and over, each instance linked to a bcrypt implementation for a different programming language.

This had two effects on the technology blogosphere at the time:

  1. It convinced a lot of people that bcrypt was the right answer for storing a password.
  2. It created a meme for how technology bloggers recommend specific cryptographic algorithms when they want attention from Hacker News.

At the time, it was great advice!

Credit: CMYKat

In 2010, bcrypt was the only clearly good answer for password hashing in most programming languages.

In the nearly fifteen years since, we’ve learned a lot more about passwords, password cracking, authentication mechanisms beyond passwords, and password-based cryptography.

If you haven’t already read my previous post about password-based cryptography, you may want to give that one a once-over before you continue.

But we’ve also learned a lot more about bcrypt, its limitations, the various footguns involved with using it in practice, and even some cool shit you can build with it.

In light of a recent discussion about switching WordPress’s password hashing algorithm from PHPass (which is based on MD5) to bcrypt, I feel now is the perfect time to dive into this algorithm and its implications on real-world cryptography.

Understanding Bcrypt in 2024

Bcrypt is a password hashing function, but not a password KDF or general-purpose cryptographic hash function.

If you’re using a sane password storage API, such as PHP’s password API, you don’t even need to think about salting your passwords, securely verifying passwords, or handling weird error conditions. Instead, you only need concern yourself with the “cost” factor, which exponentially increases the runtime of the algorithm.

There’s just one problem: bcrypt silently truncates after 72 characters (or rather, 72 bytes, which matters if your passwords contain non-ASCII characters, such as emoji).

Here’s a quick script you can run yourself to test this:

<?php
$example1 = str_repeat('A', 72);
$example2 = $example1 . 'B';
$hash = password_hash($example1, PASSWORD_BCRYPT);
var_dump(password_verify($example2, $hash)); // bool(true) -- the 73rd byte was ignored

This may sound ludicrous (“who uses 72 character passwords anyway?”) until you see security advisories like this recent one from Okta.

The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. Under a specific set of conditions, listed below, this could allow users to authenticate by providing the username with the stored cache key of a previous successful authentication.

(…)

  • The username is 52 characters or longer

The other thing to consider is that many people use passphrases, such as those generated from Diceware, which produce longer strings with less entropy per character.

If you use bcrypt as-is, you will inevitably run into this truncation at some point.

“Let’s pre-hash passwords!”

In response to this limitation, many developers will suggest pre-hashing the password with a general purpose cryptographic hash function, such as SHA-256.

And so, in pursuit of a way to avoid one footgun, developers introduced two more.


Truncation on NUL Bytes

If you use the raw binary output of a hash function as your password hash, be aware that bcrypt will truncate on NUL (0x00) bytes.

With respect to the WordPress issue linked above, the default for PHP’s hashing API is to output hexadecimal characters.

This is a bit wasteful. Base64 is preferable, although any isomorphism of the raw hash output that doesn’t include a 0x00 byte is safe from truncation.
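To illustrate the size difference (a quick sketch; the input password here is just a stand-in):

```php
<?php
// Raw SHA-512 output is 64 bytes of binary, which may contain "\0" bytes
// that bcrypt would truncate on.
$raw = hash('sha512', 'correct horse battery staple', true);

// Hex encoding is NUL-free, but doubles the length (128 characters).
$hex = bin2hex($raw);

// Base64 is also NUL-free, and denser (88 characters with padding).
$b64 = base64_encode($raw);

var_dump(strlen($raw), strlen($hex), strlen($b64)); // int(64), int(128), int(88)
```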

Hash Shucking

When a system migrates from a fast cryptographic hash function (e.g., MD5) to bcrypt, its developers typically choose to re-hash the existing hash with bcrypt.

Because users typically reuse passwords, you can often take the fast, unsalted hashes from another breach and use them as your password dictionary for bcrypt.

If one of these fast hashes then verifies as the bcrypt password, you can plug the fast hash into software like Hashcat and crack the actual password at a much faster rate (tens of billions of candidates per second, versus thousands per second).

This technique is called hash shucking (YouTube link).

You can avoid hash shucking by using HMAC with a static key: either universal for all deployments of your software, or unique per application.

It doesn’t really matter which you choose; all you really need from it is domain separation from naked hashes.
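As a sketch of what this looks like during a legacy migration (the `$staticKey` value and the MD5 starting point are illustrative assumptions, not a recommendation to keep using MD5):

```php
<?php
// The legacy system stored md5($password); we migrate without knowing the password.
$legacyHash = md5('hunter2');

// Wrapping the fast hash in HMAC with a static key provides domain separation:
// a bare MD5 digest lifted from another breach will no longer verify against
// our bcrypt hashes, which defeats hash shucking.
$staticKey = 'replace-with-a-random-application-secret';
$wrapped = hash_hmac('sha256', $legacyHash, $staticKey);

$stored = password_hash($wrapped, PASSWORD_BCRYPT, ['cost' => 10]);

// Verification follows the same path:
var_dump(password_verify(hash_hmac('sha256', md5('hunter2'), $staticKey), $stored)); // bool(true)
```

Note that the HMAC-SHA-256 output is 64 hex characters, so it stays under bcrypt's 72-byte limit.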

I frequently see this referred to as “peppering”, but the term “pepper” isn’t rigidly defined anywhere.

One benefit of using a per-application HMAC secret is that it makes your hashes harder to crack for anyone who doesn’t know the secret.

For balance, one downside is that your hashes are no longer portable across applications without managing this static key.

Disarming Bcrypt’s Footguns

Altogether, it’s quite straightforward to avoid bcrypt’s footguns, as I had recommended to WordPress last week.

  1. Pre-hash with HMAC-SHA512.
  2. Ensure the output of step 1 is base64-encoded.
  3. Pass the output of step 2 to PHP’s password API.

Easy, straightforward, and uncontroversial. Right?

Objections to Bcrypt Disarmament

The linked discussion was tedious, so I will briefly describe the objections raised to my suggestion.

  1. This is “rolling our own crypto”.
    • Answer: No, it’s a well-understood pattern that’s been discussed in the PHP community for well over a decade.
  2. Passwords over 72 characters are rare and not worthy of our consideration.
    • Answer: No, this has bit people in unexpected ways before (see: Okta).

      When you develop a popular CMS, library, or framework, you cannot possibly know all the ways that your software will be used by others. It’s almost always better to be misuse-resistant.

  3. Pre-hashing introduces a Denial-of-Service attack risk.
    • Answer: No. Bcrypt with a cost factor of 10 is about 100,000 times as expensive as SHA2.
  4. This introduces a risk of hash shucking.
    • Answer: No; as demonstrated above, HMAC doesn’t suffer this problem (assuming the key is reasonably selected).
  5. Base64 encoding reduces entropy.
    • Answer: No, it’s isomorphic.
  6. Base64 with the 72 character truncation reduces entropy.
    • Answer: Even after truncation, we retain more than 256 bits of SHA-512’s output, so this doesn’t actually matter for any practical security reason.
  7. This would necessitate a special prefix (e.g. $2w$) to distinguish disarmed bcrypt from vanilla bcrypt that PHP’s password API wouldn’t know what to do with.
    • This is a trivial concern, for which the fix is also trivial:
      After password_hash(), modify the prefix with a marker to indicate pre-hashing.
      Before password_verify(), swap the original prefix back in.
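A sketch of that prefix dance (the `$2w$` marker comes from the discussion; PHP emits `$2y$` bcrypt hashes):

```php
<?php
$prehashed = base64_encode(hash_hmac('sha512', 'example password', 'static key', true));

// After password_hash(): tag the hash as pre-hashed.
$hash = password_hash($prehashed, PASSWORD_BCRYPT);
$stored = '$2w$' . substr($hash, 4);

// Before password_verify(): restore the standard prefix.
$restored = '$2y$' . substr($stored, 4);
var_dump(password_verify($prehashed, $restored)); // bool(true)
```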

There were some other weird arguments (such as “Bcrypt is approved by NIST for FIPS”, which is just plain false).

Why Bcrypt Truncating SHA-512 Doesn’t Matter

If you have a random secret key, HMAC-SHA-512 is a secure pseudorandom function that you can treat as a Random Oracle.

Because it’s HMAC, you don’t have to worry about Length Extension Attacks at all. Therefore, the best known attack strategy is to produce a collision.

The raw binary output of SHA-512 is 64 bytes, but may contain NUL bytes (which would truncate the hash). To avoid this, we base64-encode the output.

When you base64-encode a SHA-512 hash, the output is 88 characters (due to base64 padding). This is longer than the 72 characters supported by bcrypt, so it will truncate silently after 72 characters.

This is still secure, but to prove this, I need to use math.

First, let’s assume you’re working with an extremely secure, high-entropy password, and might be negatively impacted by this truncation. How bad is the damage in this extreme case?

There are 64 possible characters in the base64 alphabet. (It’s in the name, after all.)

If you have a string of length 72, for which each character can be one of 64 values, you can represent the total probability space of possible strings as 64^72.

If you know that 64 = 2^6, you can do a little bit of arithmetic and discover this quantity is equal to 2^432.

As I discussed in my deep dive on the birthday bound, you can take the cube root of this number to find what I call the Optimal Birthday Bound.

This works out to 2^144 samples in order to find a 2^-144 probability of a single collision.

This simply isn’t going to happen in our lifetimes.

2^-144 is about 17 trillion times less likely than 2^-100.

The real concern is the entropy of the actual password, not losing a few bits from a truncated hash.

After all, even though the outputs of HMAC-SHA512 are indistinguishable from random when you don’t know the HMAC key, the input selection is entirely based on the (probably relatively easy-to-guess) password.

“Why not just use Argon2 or Scrypt?”

Argon2 and scrypt don’t have the bcrypt footguns. You can hash passwords of arbitrary length and not care about NUL characters. They’re great algorithms.

Several people involved in the Password Hashing Competition (that selected Argon2 as its winner) have since lamented the emphasis on memory-hardness over cache-hardness. Cache-hardness is more important for short run-times (i.e., password-based authentication), while memory-hardness is more important for longer run-times (i.e., key derivation).

As Sc00bz explains in the GitHub readme for his bscrypt project:

Cache hard algorithms are better than memory hard algorithms at shorter run times. Basically cache hard algorithms forces GPUs to use 1/4 to 1/16 of the memory bandwidth because of the large bus width (commonly 256 to 1024 bits). Another way to look at it is memory transactions vs bandwidth. Also the low latency of L2 cache on CPUs and the 8 parallel look ups let’s us make a lot of random reads. With memory hard algorithms, there is a point where doubling the memory quarters a GPU attacker’s speed. There then is a point at which a memory hard algorithm will overtake a cache hard algorithm. Cache hard algorithms don’t care that GPUs will get ~100% utilization of memory transactions because it’s already very limiting.

Ironically, bcrypt is cache-hard, while scrypt and the flavors of Argon2 that most people use are not.

Most of my peers just care that you use a password hashing algorithm at all. They don’t particularly care which. The bigger, and more common, vulnerability is not using one of them in the first place.

I’m mostly in agreement with them, but I would prefer that anyone who chooses bcrypt takes steps to disarm its footguns.

Turning Bcrypt Into a KDF

Earlier, I noted that bcrypt is not a password KDF. That doesn’t mean you can’t make one out of bcrypt. Ryan Castellucci is an amazing hacker; they managed to do just that.

To understand why this is difficult, and why Ryan’s hack works, you need to understand what bcrypt actually is.

Bcrypt is a relatively simple algorithm at its heart:

  1. Run the Blowfish key schedule, several times, over both the password and salt.
  2. Encrypt the string "OrpheanBeholderScryDoubt" 64 times in ECB mode using the expanded key from step 1.

Most of the heavy work in bcrypt is actually done in the key schedule; the encryption of three blocks (remember, Blowfish is a 64-bit block cipher) just ensures you need the correct resultant key from the key schedule.
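The original Provos–Mazières paper expresses this as (roughly):

```
bcrypt(cost, salt, pwd):
    state ← EksBlowfishSetup(cost, salt, pwd)
    ctext ← "OrpheanBeholderScryDoubt"     // three 64-bit Blowfish blocks
    repeat (64):
        ctext ← EncryptECB(state, ctext)
    return Concatenate(cost, salt, ctext)
```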

“So how do you get an encryption key out of bcrypt?”

It’s simple: we, uh, hash the S-box.

static void BF_kwk(struct BF_data *data, uint8_t kwk[BLAKE2B_KEYBYTES])
{
    BF_word *S = (BF_word *)data->ctx.S;
    BF_htobe(S, 4*256);
    // it should not be possible for this to fail...
    int ret = blake2b_simple(kwk, BLAKE2B_KEYBYTES, S, sizeof(BF_word)*4*256);
    assert(ret == 0);
    BF_betoh(S, 4*256);
}

Using BLAKE2b to hash the S-box from the final Blowfish key expansion yields a key-wrapping key that can be used to encrypt whatever data is being protected.

The only feasible way to recover this key is to provide the correct password and salt to arrive at the same key schedule.

Any attack against the selection of S implies a cryptographic weakness in bcrypt, too. (I’ve already recommended domain separation in a GitHub issue.)


It’s worth remembering that Ryan’s design is a proof-of-concept, not a peer-reviewed design ready for production. Still, it’s a cool hack.

It’s also not the first of its kind (thanks, Damien Miller).

If anyone was actually considering using this design, first, they should wait until it’s been adequately studied. Do not pass Go, do not collect $200.

Additionally, the output of the BLAKE2b hash should be used as the input keying material for, e.g., HKDF. This lets you split the password-based key into multiple application-specific sub-keys without running the password KDF again for each derived key.
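A sketch of that split using PHP’s built-in HKDF (the `$kwk` value here is a random stand-in for the BLAKE2b output described above, and the info strings are arbitrary labels):

```php
<?php
// Stand-in for the key-wrapping key produced by hashing the bcrypt S-box.
$kwk = random_bytes(32);

// Derive independent sub-keys for different purposes without re-running
// the expensive password KDF for each one.
$encKey = hash_hkdf('sha512', $kwk, 32, 'example:encryption');
$macKey = hash_hkdf('sha512', $kwk, 32, 'example:authentication');

var_dump($encKey !== $macKey); // bool(true): distinct info labels yield distinct keys
```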

Wrapping Up

Although bcrypt is still an excellent cache-hard password hashing function suitable for interactive logins, it does have corner cases that sometimes cause vulnerabilities in applications that misuse it.

If you’re going to use bcrypt, make sure you use bcrypt in line with my recommendations to WordPress: HMAC-SHA-512, base64 encode, then bcrypt.

Here’s a quick proof-of-concept for PHP software:

<?php
declare(strict_types=1);

class SafeBcryptWrapperPoC
{
    private $staticKey;
    private $cost = 12;

    public function __construct(
        #[\SensitiveParameter]
        string $staticKey,
        int $cost = 12
    ) {
        $this->staticKey = $staticKey;
        $this->cost = $cost;
    }

    /**
     * Generate password hashes here
     */
    public function hash(
        #[\SensitiveParameter]
        string $password
    ): string {
        return \password_hash(
            $this->prehash($password),
            PASSWORD_BCRYPT,
            ['cost' => $this->cost]
        );
    }

    /**
     * Verify password here
     */
    public function verify(
        #[\SensitiveParameter]
        string $password,
        #[\SensitiveParameter]
        string $hash
    ): bool {
        return \password_verify(
            $this->prehash($password),
            $hash
        );
    }

    /**
     * Pre-hashing with HMAC-SHA-512 here
     *
     * Note that this prefers the libsodium base64 code, since
     * it's implemented in constant-time
     */
    private function prehash(
        #[\SensitiveParameter]
        string $password
    ): string {
        return \sodium_bin2base64(
            \hash_hmac('sha512', $password, $this->staticKey, true),
            \SODIUM_BASE64_VARIANT_ORIGINAL_NO_PADDING
        );
    }
}

You can see a modified version of this proof-of-concept on 3v4l, which includes the same demo from the top of this blog post to demonstrate the 72-character truncation bug.

If you’re already using bcrypt in production, you should be cautious with adding this pre-hashing alternative. Having vanilla bcrypt and non-vanilla bcrypt side-by-side could introduce problems that need to be thoroughly considered.

I can safely recommend it to WordPress because they weren’t using bcrypt before. Most of the people reading this are probably not working on the WordPress core.

Addendum (2024-11-28)

More of the WordPress team has chimed in to signal support for vanilla bcrypt, rather than disarming the bcrypt footgun.

The reason?

That would result in maximum compatibility for existing WordPress users who use the Password hashes outside of WordPress, while also not introducing yet-another-custom-hash into the web where it’s not overly obviously necessary, but while still gaining the bcrypt advantages for where it’s possible.

dd32

The hesitance to introduce a custom hash construction is understandable, but the goal emphasized above (keeping password hashes usable outside of WordPress) is weird and not a reasonable goal for any password storage system.

It’s true that the overwhelming majority of non-WordPress code written in PHP just uses the password hashing API. But that means it isn’t compatible with WordPress today; PHP’s password hashing API doesn’t implement phpass, after all.

In addition to being scope creep for a secure password storage strategy, it’s kind of a bonkers design constraint to expect password hashes to be portable. Why are you intentionally exposing hashes unnecessarily?


At this point, it’s overwhelmingly likely that WordPress will choose to not disarm the bcrypt footguns, and will just ship it.

That’s certainly not the worst outcome, but I do object to arriving there for stupid reasons, and that GitHub thread is full of stupid reasons and misinformation.

The most potent source of misinformation also barked orders at me and then tried to dismiss my technical arguments as the concerns of “the hobbyist community”, which was a great addition to my LinkedIn profile.

If WordPress’s choice turns out to be a mistake (that is to say, their decision for vanilla bcrypt introduces a vulnerability in a plugin or theme that uses their password hashing API for, I dunno, API keys?), at least I can say I tried.

Additionally, WordPress cannot say they didn’t know the risk existed, especially in a courtroom, since me informing them of it is so thoroughly documented (and archived).


Here’s to hoping the risk never actually manifests. Saying “I told you so” is more bitter than sweet in security. Happy Thanksgiving.

Header image: Art by Jim and CMYKat; a collage of some DEFCON photos, as well as Creative Commons photos of Bruce Schneier (inventor of the Blowfish block cipher) and Niels Provos (co-designer of bcrypt, which is based on Blowfish).

#bcrypt #cryptography #passwordHashing #passwords #SecurityGuidance

2024-11-21

Key Transparency and the Right to be Forgotten

This post is the first in a new series covering some of the reasoning behind decisions made in my project to build end-to-end encryption for direct messages on the Fediverse.

(Collectively, Fedi-E2EE.)

Although the reasons for specific design decisions should be immediately obvious from reading the relevant specification (and if not, I consider that a bug in the specification), I believe writing about it less formally will improve the clarity behind the specific design decisions taken.

In the inaugural post for this series, I’d like to focus on how the Fedi-E2EE Public Key Directory specification aims to provide Key Transparency and an Authority-free PKI for the Fediverse without making GDPR compliance logically impossible.

CMYKat‘s art, edited by me.

Background

Key Transparency

For a clearer background, I recommend reading my blog post announcing the focused effort on a Public Key Directory, and then my update from August 2024.

If you’re in a hurry, I’ll be brief:

The goal of Key Transparency is to ensure everyone in a network sees the same view of who has which public key.

How it accomplishes this is a little complicated: It involves Merkle trees, digital signatures, and a higher-level protocol of distinct actions that affect the state machine.

If you’re thinking “blockchain”, you’re in the right ballpark, but we aren’t propping up a cryptocurrency. Instead, we’re using a centralized publisher model (per Public Key Directory instance) with decentralized verification.

Add a bit of cross-signing and replication, and you can stitch together a robust network of Public Key Directories that can be queried to obtain the currently-trusted list of public keys (or other auxiliary data) for a given Fediverse user. This can then be used to build application-layer protocols (e.g., end-to-end encryption with an identity key more robust than “trust on first use”, thanks to the audit trail built into Merkle trees).

I’m handwaving a lot of details here. The Architecture and Specification documents are both worth a read if you’re curious to learn more.


Right To Be Forgotten

I am not a lawyer, nor do I play one on TV. This is not legal advice. Other standard disclaimers go here.

Okay, now that we’ve got that out of the way, Article 17 of the GDPR establishes a “Right to erasure” for Personal Data.

What this actually means in practice has not been consistently decided by the courts yet. However, a publicly readable, immutable ledger that maps public keys (which may be considered Personal Data) with Actor IDs (which includes usernames, which are definitely Personal Data) goes against the grain when it comes to GDPR.

It remains an open question whether there is a public interest in this data persisting in a read-only ledger ad infinitum, which could override the right to be forgotten. If there is, that’s for the courts to decide, not furry tech bloggers.

I know it can be tempting, especially as an American with no presence in the European Union, to shrug and say, “That seems like a them problem.” However, if other folks want to be able to use my designs within the EU, I would be remiss not to at least consider this potential pitfall and try to mitigate it in my designs.

So that’s exactly what I did.


Almost Contradictory

At first glance, the privacy goals of both Key Transparency and the GDPR’s Right To Erasure are at odds.

  • One creates an immutable, append-only history.
  • The other establishes a right for EU citizens’ history to be selectively censored, which means history has to be mutable.

However, they’re not totally impossible to reconcile.

An untested legal theory circulating around large American tech companies is that “crypto shredding” is legally equivalent to erasure.

Crypto shredding is the act of storing encrypted data, and then when given a legal takedown request from an EU citizen, deleting the key instead of the data.


This works from a purely technical perspective: if the data is encrypted and you don’t know the key, the ciphertext is indistinguishable from an encryption of the same number of NUL bytes.

In fact, many security proofs for encryption schemes are satisfied by reaching this conclusion, so this isn’t a crazy notion.
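A minimal sketch of crypto shredding with libsodium (the record contents are made up for illustration; the point is that only the key is ever destroyed):

```php
<?php
// Encrypt a record under a fresh random key.
$key = sodium_crypto_aead_xchacha20poly1305_ietf_keygen();
$nonce = random_bytes(SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES);
$ciphertext = sodium_crypto_aead_xchacha20poly1305_ietf_encrypt(
    'alice@example.social => some public key material',
    '',      // no additional data
    $nonce,
    $key
);

// Store ($nonce, $ciphertext) in the ledger; keep $key somewhere else.

// Upon an erasure request: destroy the key, not the ledger entry.
sodium_memzero($key);
// The ciphertext remains, but is now unrecoverable noise.
```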

Is Crypto Shredding Plausible?

In 2019, the European Parliamentary Research Service published a lengthy report titled Blockchain and the General Data Protection Regulation which states the following:

Before any examination of whether blockchain technology is capable of complying with Article 17 GDPR; it must be underscored that the precise meaning of the term ‘erasure’ remains unclear.

Article 17 GDPR does not define erasure, and the Regulation’s recitals are equally mum on how this term should be understood. It might be assumed that a common-sense understanding of this terminology ought to be embraced. According to the Oxford English Dictionary, erasure means ‘the removal of writing, recorded material, or data’ or ‘the removal of all traces of something: obliteration’.

From this perspective, erasure could be taken to equal destruction. It has, however, already been stressed that the destruction of data on blockchains, particularly those of a public and permissionless nature, is far from straightforward.

There are, however, indications that the obligation inherent to Article 17 GDPR does not have to be interpreted as requiring the outright destruction of data. In Google Spain, the delisting of information from search results was considered to amount to erasure. It is important to note, however, that in this case, this is all that was requested of Google by the claimant, who did not have control over the original data source (an online newspaper publication). Had the claimant wished to obtain the outright destruction of the relevant data it would have had to address the newspaper, not Google. This may be taken as an indication that what the GDPR requires is that the obligation resting on data controllers is to do all they can to secure a result as close as possible to the destruction of their data within the limits of [their] own factual possibilities.

Dr Michèle Finck, Blockchain and the General Data Protection Regulation, pp. 75-76

From this, we can kind of intuit that the courts aren’t pedantic: The cited Google Spain case was satisfied by merely delisting the content, not the erasure of the newspaper’s archives.

The report goes on to say:

As awareness regarding the tricky reconciliation between Article 17 GDPR and distributed ledgers grows, a number of technical alternatives to the outright destruction of data have been considered by various actors. An often-mentioned solution is that of the destruction of the private key, which would have the effect of making data encrypted with a public key inaccessible. This is indeed the solution that has been put forward by the French data protection authority CNIL in its guidance on blockchains and the GDPR. The CNIL has suggested that erasure could be obtained where the keyed hash function’s secret key is deleted together with information from other systems where it was stored for processing.

Dr Michèle Finck, Blockchain and the General Data Protection Regulation, pp. 76-77

That said, I cannot locate a specific court decision that affirms that crypto erasure is legally sufficient for complying with data erasure requests (nor any that affirm that it’s necessary).

I don’t have a crystal ball that can read the future on what government compliance will decide, nor am I an expert in legal matters.

Given the absence of a clear legal framework, I do think it’s totally reasonable to consider crypto-shredding equivalent to data erasure. Most experts would probably agree with this. But it’s also possible that the courts could rule totally stupidly on this one day.

Therefore, I must caution anyone that follows a similar path: Do not claim GDPR compliance just because you implement crypto-shredding in a distributed ledger. All you can realistically promise is that you’re not going out of your way to make compliance logically impossible. All we have to go by are untested legal hypotheses, and very little clarity (even if the technologists are near-unanimous on the topic!).

Towards A Solution

With all that in mind, let’s start with “crypto shredding” as the answer to the GDPR + transparency log conundrum.

This is only the start of our complications.


Protocol Risks Introduced by Crypto Shredding

Before the introduction of crypto shredding, the job of the Public Key Directory was simple:

  1. Receive a protocol message.
  2. Validate the protocol message.
  3. Commit the protocol message to a transparency log (in this case, Sigsum).
  4. Retrieve the protocol message whenever someone requests it to independently verify its inclusion.
  5. Miscellaneous other protocol things (cross-directory checkpoint commitment, replication, etc.).

Point being: there was very little that the directory could do to be dishonest. If they lied about the contents of a record, it would invalidate the inclusion proofs of every successive record in the ledger.

In order to make a given record crypto-shreddable without breaking the inclusion proofs for every record that follows, we need to commit to the ciphertext, not the plaintext. (And then, when a takedown request comes in, wipe the key.)

Now, things are quite a bit more interesting.

Do you…

  • …Distribute the encryption key alongside the ciphertext and let independent third parties decrypt it on demand?

    …OR…

  • Decrypt the ciphertext and serve plaintext through the public API, keeping the encryption key private so that it may be shredded later?

The first option seems simple, but runs into governance issues: How do you claim the data was crypto-shredded if countless individuals have a copy of the encryption key, and can therefore recover the plaintext from the ciphertext?

I don’t think that would stand up in court.


Clearly, your best option is the second one.

Okay, so how does an end user know that the ciphertext that was committed to the transparency ledger decrypts to the specific plaintext value served by the Public Key Directory? How do users know it’s not lying?

Quick aside: This question is also relevant if you went with the first option and used a non-committing AEAD mode for the actual encryption scheme.

In that scenario, a hostile nation state adversary could pressure a Public Key Directory to selectively give one decryption key to targeted users, and another to the rest of the Internet, in order to perform a targeted attack against citizens they’d rather didn’t have civil rights.

My entire goal with introducing key transparency to my end-to-end encryption proposal is to prevent these sorts of attacks, not enable them.

There are a lot of avenues we could explore here, but it’s always worth outlining the specific assumptions and security goals of any design before you start perusing the literature.


Assumptions

This is just a list of things we assume are true, and do not need to prove for the sake of our discussion here today. The first two are legal assumptions; the remainder are cryptographic.

Ask your lawyer if you want advice about the first two assumptions. Ask your cryptographer if you suspect any of the remaining assumptions are false.

  1. Crypto-shredding is a legally valid way to provide data erasure (as discussed above).
  2. EU courts will consider public keys to be Personal Data.
  3. The SHA-2 family of hash functions is secure (ignoring length-extension attacks, which won’t matter for how we’re using them).
  4. HMAC is a secure way to build a MAC algorithm out of a secure hash function.
  5. HKDF is a secure KDF if used correctly.
  6. AES is a secure 128-bit block cipher.
  7. Counter Mode (CTR) is a secure way to turn a block cipher into a stream cipher.
  8. AES-CTR + HMAC-SHA2 can be turned into a secure AEAD mode, if done carefully.
  9. Ed25519 is a digital signature algorithm that provides strong existential unforgeability under chosen-message attacks (SUF-CMA).
  10. Argon2id is a secure, memory-hard password KDF, when used with reasonable parameters. (You’ll see why in a moment.)
  11. Sigsum is a secure mechanism for building a transparency log.

This list isn’t exhaustive or formal, but should be sufficient for our purposes.

Security Goals

  1. The protocol messages stored in the Public Key Directory are accompanied by a Merkle tree proof of inclusion. This makes it append-only with an immutable history.
  2. The Public Key Directory cannot behave dishonestly about the decrypted plaintext for a given ciphertext without clients detecting the deception.
  3. Whatever strategy we use to solve this should be resistant to economic precomputation and brute-force attacks.

Can We Use Zero-Knowledge Proofs?

At first, this seems like an ideal situation for a succinct, non-interactive zero-knowledge proof.

After all, you’ve got some secret data that you hold, and you want to prove that a calculation is correct without revealing the data to the end user. This seems like the ideal setup for Schnorr’s identification protocol.


Unfortunately, the second assumption (public keys being considered Personal Data by courts, even though they’re derived from random secret keys) makes implementing a Zero-Knowledge Proof here very challenging.

First, if you look at Ed25519 carefully, you’ll realize that it’s just a digital signature algorithm built atop a Schnorr proof, which requires some sort of public key (even an ephemeral one) to be managed.

Worse, if you try to derive this value solely from public inputs (rather than creating a key management catch-22), the secret scalar your system derives will have been calculated from the user’s Personal Data, which only strengthens a court’s argument that the public key is personally identifiable.


There may be a more exotic zero-knowledge proof scheme that might be appropriate for our needs, but I’m generally wary of fancy new cryptography.

Here are two rules I live by in this context:

  1. If I can’t get the algorithms out of the crypto module for whatever programming language I find myself working with, it may as well not even exist.
    • Corollary: If libsodium bindings are available, that counts as “the crypto module” too.
  2. If a developer needs to reach for a generic Big Integer library (e.g., GMP) for any reason in the course of implementing a protocol, I do not trust their implementation.

Unfortunately, a lot of zero-knowledge proof designs fail one or both of these rules in practice.

(Sorry not sorry, homomorphic encryption enthusiasts! The real world hasn’t caught up to your ideas yet.)

What About Verifiable Random Functions (VRFs)?

It may be tempting to use VRFs (i.e., RFC 9381), but this runs into the same problem as zero-knowledge proofs: we’re assuming that an EU court would deem public keys Personal Data.

But even if that assumption turns out false, the lifecycle of a protocol message looks like this:

  1. User wants to perform an action (e.g., AddKey).
  2. Their client software creates a plaintext protocol message.
  3. Their client software generates a random 256-bit key for each potentially-sensitive attribute, so it can be shredded later.
  4. Their client software encrypts each attribute of the protocol message.
  5. The ciphertext and keys are sent to the Public Key Directory.
  6. For each attribute, the Public Key Directory decrypts the ciphertext with the key, verifies the contents, and then stores both. The ciphertext is used to generate a commitment on Sigsum (signed by the Public Key Directory’s keypair).
  7. The Public Key Directory serves plaintext to requestors, but does not disclose the key.
  8. In the future, the end user can demand a legal takedown, which just wipes the key.

Let’s assume I wanted to build a VRF out of Ed25519 (similar to what Signal does with VXEdDSA). Now I have a key management problem, which is pretty much what this project was meant to address in the first place.

VRFs are really cool, and more projects should use them, but I don’t think they will help me.

CMYKat

Soatok’s Proposed Solution

If you want to fully understand the nitty-gritty implementation details, I encourage you to read the current draft specification, plus the section describing the encryption algorithm, and finally the plaintext commitment algorithm.

Now that we’ve established all that, I can begin to describe my approach to solving this problem.

First, we will encrypt each attribute of a protocol message, as follows:

  • For subkey derivation, we use HKDF-HMAC-SHA512.
  • For encrypting the actual plaintext, we use AES-256-CTR.
  • For message authentication, we use HMAC-SHA512.
  • Additional associated data (AAD) is accepted and handled securely; i.e., we don’t use YOLO as a hash construction.

This prevents an Invisible Salamander attack from being possible.
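The shape of this construction can be sketched with Python’s standard library alone. The AES-256-CTR step is omitted (the standard library ships no AES), and the HKDF info labels and the little-endian length prefixes are my own placeholder choices, not values taken from the specification:

```python
import hashlib
import hmac
import os

def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA512: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]

def le64(n: int) -> bytes:
    return n.to_bytes(8, "little")

def auth_tag(mac_key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    # Length-prefix every field so two different (aad, ciphertext) pairs can
    # never serialize to the same MAC input -- the "no YOLO hash construction"
    # requirement from the text.
    return hmac.new(
        mac_key,
        le64(len(aad)) + aad + le64(len(ciphertext)) + ciphertext,
        hashlib.sha512,
    ).digest()

attribute_key = os.urandom(32)   # random per-attribute key, shreddable later
kdf_salt = os.urandom(32)        # stored alongside the ciphertext

# Independent subkeys for encryption and authentication (placeholder labels):
enc_key = hkdf_sha512(attribute_key, kdf_salt, b"attribute-encryption", 32)
mac_key = hkdf_sha512(attribute_key, kdf_salt, b"attribute-authentication", 64)

# enc_key would feed AES-256-CTR; here we just authenticate some ciphertext.
tag = auth_tag(mac_key, aad=b"attribute-name+commitment", ciphertext=b"\x00" * 16)
```

Because the tag covers the AAD (which will include the plaintext commitment), tampering with either the ciphertext or the commitment invalidates the record.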

This encryption is performed client-side, by each user, and the symmetric key for each attribute is shared with the Public Key Directory when publishing protocol messages.

If they later issue a legal request for erasure, they can be sure that the key used to encrypt the data they previously published isn’t secretly the same key used by every other user’s records.

They always know this because they selected the key, not the server. Furthermore, everyone can verify that the hash published to the Merkle tree matches a locally generated hash of the ciphertext they just emitted.

This provides a mechanism to keep everyone honest. If anything goes wrong, it will be detected.

Next, to prevent the server from being dishonest, we include a plaintext commitment hash, which is included as part of the AAD (alongside the attribute name).

(Implementing crypto-shredding is straightforward: simply wipe the encryption keys for the attributes of the records in scope for the request.)

If you’ve read this far, you’re probably wondering, “What exactly do you mean by plaintext commitment?”

Art by Scruff.

Plaintext Commitments

The security of a plaintext commitment is attained by the Argon2id password hashing function.

By using the Argon2id KDF, you can make an effective trapdoor that is easy to calculate if you know the plaintext, but economically infeasible to brute-force attack if you do not.

However, you need to do a little more work to make it safe.

Harubaki

The details here matter a lot, so this section is unavoidably going to be a little dense.

Pass the Salt?

Argon2id expects both a password and a salt.

If you eschew the salt (i.e., zero it out), you open the door to precomputation attacks (see also: rainbow tables) that would greatly weaken the security of this plaintext commitment scheme.

You need a salt.

If you generate the salt randomly, this commitment property isn’t guaranteed by the algorithm. It would be difficult, but probably not impossible, to find two salts (s1, s2) and two distinct plaintexts (p1, p2) such that Argon2id(p1, s1) = Argon2id(p2, s2).

Deriving the salt from public inputs eliminates this flexibility.

By itself, though, this makes salts totally deterministic, which reintroduces the risk of precomputation attacks (the very problem that motivated the salt in the first place).

If you include the plaintext in this calculation, it could also create a crib that gives attackers a shortcut for bypassing the cost of password hashing.

Furthermore, any two encryption operations that act over the same plaintext would, without any additional design considerations, produce an identical value for the plaintext commitment.

CMYKat

Public Inputs for Salt Derivation

The initial proposal included the plaintext value for Argon2 salt derivation, and published the salt and Argon2 output next to each other.

Hacker News user comex pointed out a flaw with this technique, so I’ve since revised how salts are selected to make them independent of the plaintext.

The public inputs for the Argon2 salt are now:

  1. The version identifier prefix for the ciphertext blob.
  2. The 256-bit random value used as a KDF salt (also stored in the ciphertext blob).
  3. A recent Merkle tree root.
  4. The attribute name (prefixed by its length).

These values are all hashed together with SHA-512, and then truncated to 128 bits (the length required by libsodium for Argon2 salts).

This salt is not stored, but can deterministically be calculated from public information.

Crisis Averted?

This sure sounds like we’ve arrived at a solution, but let’s also consider another situation before we declare our job done.

High-traffic Public Key Directories may have multiple users push a protocol message with the same recent Merkle root.

This may happen if two or more users query the directory to obtain the latest Merkle root before either of them publish their updates.

Later, if both of these users issue a legal takedown, someone might observe that the recent-merkle-root is the same for two messages, but their commitments differ.

Is this enough leakage to distinguish plaintext records?

In my earlier design, we needed to truncate the salt and rely on understanding the birthday bound to reason about its security. This is no longer the case, since each salt is randomized by the same random value used in key derivation.

Choosing Other Parameters

As mentioned a second ago, we set the output length of the Argon2id KDF to 32 bytes (256 bits). We expect the security of this KDF to exceed 2^128, which to most users might as well be infinity.

With apologies to Filippo.

The other Argon2id parameters are a bit hand-wavey. Although the general recommendation for Argon2id is to use as much memory as possible, this code will inevitably run in some low-memory environments, so asking for several gigabytes isn’t reasonable.

For the first draft, I settled on 16 MiB of memory, 3 iterations, and a parallelism degree of 1 (for widespread platform support).

Plaintext Commitment Algorithm

With all that figured out, our plaintext commitment algorithm looks something like this:

  1. Calculate the SHA512 hash of:
    • A domain separation constant
    • The header prefix (stored in the ciphertext)
    • The randomness used for key-splitting in encryption (stored in the ciphertext)
    • Recent Merkle Root
    • Attribute Name Length (64-bit unsigned integer)
    • Attribute Name
  2. Truncate this hash to the rightmost 16 bytes (128 bits). This is the salt.
  3. Calculate Argon2id over the following inputs concatenated in this order, with an output length of 32 bytes (256 bits), using the salt from step 2:
    • Recent Merkle Root Length (64-bit unsigned integer)
    • Recent Merkle Root
    • Attribute Name Length (64-bit unsigned integer)
    • Attribute Name
    • Plaintext Length (64-bit unsigned integer)
    • Plaintext

The output (step 3) is included as the AAD in the attribute encryption step, so the authentication tag is calculated over both the randomness and the commitment.
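The whole commitment algorithm fits in a short Python sketch. One loud caveat: Python’s standard library has no Argon2id, so scrypt (also ~16 MiB here) stands in for it purely for illustration, and the domain-separation constant and little-endian length encoding are my assumptions rather than values from the specification:

```python
import hashlib
import hmac

DOMAIN = b"example.plaintext-commitment.v1"  # hypothetical domain-separation constant

def le64(n: int) -> bytes:
    return n.to_bytes(8, "little")

def derive_salt(header: bytes, randomness: bytes, merkle_root: bytes, attr_name: bytes) -> bytes:
    # Step 1: hash the public inputs; step 2: keep the rightmost 16 bytes
    # (the 128-bit salt length libsodium requires for Argon2).
    digest = hashlib.sha512(
        DOMAIN + header + randomness + merkle_root + le64(len(attr_name)) + attr_name
    ).digest()
    return digest[-16:]

def commit(header: bytes, randomness: bytes, merkle_root: bytes,
           attr_name: bytes, plaintext: bytes) -> bytes:
    salt = derive_salt(header, randomness, merkle_root, attr_name)
    message = (
        le64(len(merkle_root)) + merkle_root
        + le64(len(attr_name)) + attr_name
        + le64(len(plaintext)) + plaintext
    )
    # Step 3 calls for Argon2id (16 MiB, 3 iterations, parallelism 1) with a
    # 32-byte output; scrypt is only a stdlib stand-in for it here.
    return hashlib.scrypt(message, salt=salt, n=2**14, r=8, p=1, dklen=32)

def verify(commitment: bytes, header: bytes, randomness: bytes,
           merkle_root: bytes, attr_name: bytes, plaintext: bytes) -> bool:
    expected = commit(header, randomness, merkle_root, attr_name, plaintext)
    return hmac.compare_digest(commitment, expected)  # constant-time compare
```

Note that `verify` recomputes the commitment from scratch, so anyone holding the plaintext and the public inputs can check it without trusting the directory.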

To verify a commitment (which is extractable from the ciphertext), simply recalculate the commitment you expect (using the recent Merkle root specified by the record), and compare the two in constant-time.

If they match, then you know the plaintext you’re seeing is the correct value for the ciphertext value that was committed to the Merkle tree.

If the encryption key is shredded in the future, an attacker without knowledge of the plaintext will have an enormous uphill battle recovering it from the KDF output (and the salt will prove to be somewhat useless as a crib).

AJ

Caveats and Limitations

Although this design does satisfy the specific criteria we’ve established, an attacker that already knows the correct plaintext can confirm that a specific record matches it via the plaintext commitment.

This cannot be avoided: If we are to publish a commitment of the plaintext, someone with the plaintext can always confirm the commitment after the fact.

CMYKat

Whether this matters at all to the courts is a question for which I cannot offer any insight.

Remember, we don’t even know if any of this is actually necessary, or if “moderation and platform safety” is a sufficient reason to sidestep the right to erasure.

If the courts ever clarify this adequately, we can simply publish the mapping of Actor IDs to public keys and auxiliary data without any crypto-shredding at all.

Trying to attack it from the other direction (downloading a crypto-shredded record and trying to recover the plaintext without knowing it ahead of time) is the attack angle we’re interested in.

Herd Immunity for the Forgotten

Another interesting implication that might not be obvious: The more Fediverse servers and users publish to a single Public Key Directory, the greater the anonymity pool available to each of them.

Consider the case where a user has erased their previous Fediverse account and used the GDPR to also crypto-shred the Public Key Directory entries containing their old Actor ID.

To guess the correct plaintext, you must not only brute-force possible usernames, but also permute your guesses across all of the instances in scope.

The more instances there are, the higher the cost of the attack.
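A back-of-the-envelope illustration of that scaling, where every figure (guess-list size, per-guess Argon2id runtime) is invented for the sake of the example:

```python
# All figures below are made up for illustration.
USERNAME_GUESSES = 50_000_000   # assumed guess list of plausible usernames
SECONDS_PER_GUESS = 0.05        # assumed Argon2id (16 MiB, 3 passes) time per guess

def attack_years(instances_in_scope: int) -> float:
    """Wall-clock years to try every (username, instance) pair once."""
    total_guesses = USERNAME_GUESSES * instances_in_scope
    return total_guesses * SECONDS_PER_GUESS / (365 * 24 * 3600)

# The attack cost grows linearly with the number of instances in the directory:
one_instance = attack_years(1)
thousand_instances = attack_years(1000)
```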

CMYKat

Recap

I tasked myself with designing a Key Transparency solution that doesn’t make complying with Article 17 of the GDPR nigh-impossible. To that end, crypto-shredding seemed like the only viable way forward.

A serialized record containing ciphertext for each sensitive attribute would be committed to the Merkle tree. The directory would store the key locally and serve plaintext until a legal takedown was requested by the user who owns the data. Afterwards, the stored ciphertext committed to the Merkle tree is indistinguishable from random for any party that doesn’t already know the plaintext value.

I didn’t want to allow Public Key Directories to lie about the plaintext for a given ciphertext, given that they know the key and the requestor doesn’t.

After considering zero-knowledge proofs and finding them to not be a perfect fit, I settled on designing a plaintext commitment scheme based on the Argon2id password KDF. The KDF salts can be calculated from public inputs.

Altogether, this meets the requirements of enabling crypto-shredding while keeping the Public Key Directory honest. All known attacks for this design are prohibitively expensive for any terrestrial threat actors.

As an added bonus, I didn’t introduce anything fancy. You can build all of this with the cryptography available to your favorite programming language today.

CMYKat

Closing Thoughts

If you’ve made it this far without being horribly confused, you’ve successfully followed my thought process for developing message attribute shreddability in my Public Key Directory specification.

This is just one component of the overall design proposal, but one that I thought my readers would enjoy exploring in greater detail than the specification needed to capture.

Header art: Harubaki, CMYKat.

(This post was updated on 2024-11-22 to replace the incorrect term “PII” with “personal data”. Apologies for the confusion!)

#Argon2 #crypto #cryptography #E2EE #encryption #FederatedPKI #fediverse #passwordHashing #symmetricCryptography

Key Transparency and the Right to be Forgotten

“Crypto means cryptography,” the dhole insists.

Brute force password cracking takes longer than ever, according to Hive Systems' latest audit. #PasswordCracking #BruteForceAttacks #HiveSystems #PasswordHashing #CyberSecurity #bcrypt #MD5

2023-09-21

@valorin

Indeed! I know I'm preaching to the choir, but for those playing along at home:

Selecting a bcrypt cost falls into two classes of use case:

  1. individual UX (personal/per-user), and

  2. aggregate UX (for non-trivial numbers of concurrent auths and/or thundering herds / reauth storms).

For self-contained / standalone, and smaller user populations, the first case is primary. Admins can maximize bcrypt cost based solely on whether their (small) group of users can tolerate X milliseconds of delay - without regard to how users will impact each other. Bcrypt cost 13 - or even higher - can be feasible here.

In the second case (non-trivial user populations) - the game is to balance both use cases - individual UX, but adding in auth-per-second statistics, thundering herd scenarios, sufficient hardware budget to support robust user password protection, etc.

And if you're maintaining a general framework that needs to support either use case ... the best thing for the ecosystem may be to guide the downstream admin towards the best choice for them (explanatory comment in a config file, interactive + informed selection during an install, etc.).

The nice thing about most bcrypt implementations is that they're forward- and backward-compatible, supporting multiple costs simultaneously (with all new users, and all password resets, following the new default). So for larger user populations, the admin can shape their aggregate load over time, to manage performance "forward" to increase bcrypt cost apace with CPU speed increases, to keep per-auth speed as close to that "500ms to 1 second" UX window as possible over time.

#bcrypt #passwords #passwordhashing

@timwolla

2023-08-10

Bitwarden’s been throwing warnings on my phone telling me to scale back my hashing parameters because they might fail on this device.
Of course, now that I post about it, it’s not doing it so I can’t screenshot it…
#Bitwarden #PasswordHashing #Infosec

2022-12-29

Ever since the famous “Open Sesame” line from One Thousand and One Nights, humanity was doomed to suffer from the scourge of passwords.

Courtesy of SwiftOnSecurity

Even in a world where we use hardware tokens with asymmetric cryptography to obviate the need for passwords in modern authentication protocols, we’ll still need to include “something you know” for legal and political reasons.

In the United States, we have the Fifth Amendment to the US Constitution, which prevents anyone from being forced to testify against oneself. This sometimes includes revealing a password, so it is imperative that passwords remain a necessary component of some encryption technologies to prevent prosecutors from side-stepping one’s Fifth Amendment rights. (Other countries have different laws about passwords. I’m not a lawyer.)

If you’ve ever read anything from the EFF, you’re probably keenly aware of this, but the US government loves to side-step citizens’ privacy rights in a court of law. They do this not out of direct malice towards civil rights (generally), but rather because it makes their job easier. Incentives rule everything around me.

Thus, even in a passwordless future, anyone who truly cares about civil liberties will not want to dispense with them entirely. Furthermore, we’ll want these password-based technologies to pass the Mud Puddle test, so that we aren’t relying on large tech companies to protect us from government backdoors.

Given all this, most security professionals will agree that passwords suck. Not only are humans bad sources of randomness, but we’re awful at memorizing a sequence of characters or words for every application, website, or operating system that demands a password.

None of what I’ve written so far is the least bit novel or interesting. But here’s a spicy take:

The only thing that sucks worse than passwords is the messaging around password-based cryptographic functions.

Password-Based Cryptographic Functions

That isn’t a term of the art, by the way. I’m kind of coining it as a superset that includes both Password-Based Key Derivation Functions and Password Hashing Functions.

You might be wondering, “Aren’t those two the same thing?” No, they are not. I will explain in a moment.

The intersection of human-memorable secrets (passwords) and cryptographic protocols manifests in a lot of needless complexity and edge-cases that, in order to sufficiently explain anything conversationally, will require me to sound either drunk or like a blue deck player in Magic: The Gathering.

Counterspell!
Art: CMYKat

To that end, what I’m calling Password-Based Cryptographic Functions (PBCFs) includes all of the following:

  • Password-Hashing Functions
    • Bcrypt
  • Password-Based Key Derivation Functions
  • Balanced Password Authenticated Key Exchanges
    • CPace
  • Augmented Password Authenticated Key Exchanges
  • Doubly-Augmented Password Authenticated Key Exchanges

If you tried to approach categorization starting with simply “Key Derivation Functions”, you’d quickly run into a problem: What about HKDF? Or any of the NIST SP 800-108 KDFs for that matter?

If you tighten it to “Password-Based KDFs” or “Key-stretching functions”, that doesn’t include bcrypt. Bcrypt is a password hashing function, but it’s not suitable for deriving secret keys. I’ve discussed these nuances previously.

And then you have some password KDF and/or hashing algorithms that are memory-hard, some that are cache-hard.

And then some Password Authenticated Key Exchanges (PAKEs) are quantum-annoying, while others are not.

Art: CMYKat

To make heads or tails of the different algorithms, their properties, and when and where you should use them, you’d either need to secure postdoc funding for cryptography research or spend a lot of time reading and watching passwords-focused conference talks.

It helps if you’re friends with a Password Hashing Competition judge.
(Selfie taken by Sc00bz (left) at DEFCON 30 in the Quantum Village.)

So let’s talk about these different algorithms, why they exist to begin with, and some of their technical details.

Art: Harubaki

Why Does Any of This Matter?

The intersection of passwords and cryptography is one of the foundations of modern society, and one that most Internet users experience everywhere they go.

You’ll know you have succeeded in designing a cryptographic algorithm when the best way to attack it is to try every possible key. This is called an exhaustive search, or a brute-force attack, depending on the disposition of the author.

Remember earlier when I said passwords suck because humans are bad at creating or remembering strong, unique passwords for every website they visit?

Well, it turns out, that some passwords are extremely common. And even worse, people who choose common passwords tend to reuse them everywhere.

This presented a challenge to early web applications: In order to have authenticated users, you need to collect a password from the user and check it on subsequent visits. If you ever get hacked, then it’s likely that all of your users will be impacted on other websites they used the same passwords for.

Including their bank accounts.

So the challenge for any password-based scheme is simply stated: How can you safely authenticate users based on a secret they know?

Enter: Hashing

Storing a hash of a password is an obvious solution to this predicament. Early solutions involved using the user’s password as an encryption key, rather than building atop cryptographic hash functions.

However, it’s important not to call it “password encryption”.

The reason isn’t merely pedantic; it’s meant to prevent a disaster from repeating: Adobe famously encrypted passwords with Triple-DES instead of hashing them.

In early UNIX-based operating systems, your password hashes were stored in a file called /etc/passwd, along with other user account data. These days, your actual password hashes are kept in a second file, /etc/shadow, which is only readable by root.

Unfortunately, the tool used to create password hashes was called crypt, so the “encryption” lingo stuck.

Art: CMYKat

When Hashing Isn’t Enough

Although encrypting passwords is bad, using a fast cryptographic hash function (e.g. SHA256 or BLAKE2) is also bad.

LinkedIn learned this lesson the hard way in 2012, when an attacker leaked 117 million users’ hashed passwords. It turns out, they used SHA1 as their password hashing function.

Hash functions are deterministic: If you feed them the same input, you will always get the same output.

When your password hashing strategy is simply “just use SHA1”, two users with the same password will have an identical password hash.

People are bad at creating, or remembering, unpredictable passwords.

When you combine these observations, there are some extremely obvious candidates (123456, password, etc.) to prioritize when guessing.

Additionally, you could precompute these hashes once and build a reverse index of the passwords that hash to a specific output. This is called a rainbow table.
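Both problems are easy to demonstrate with Python’s standard library:

```python
import hashlib

# Determinism: two users with the same password share a SHA-1 digest,
# so cracking one account cracks both.
alice = hashlib.sha1(b"123456").hexdigest()
bob = hashlib.sha1(b"123456").hexdigest()
assert alice == bob

# Precomputation: a (toy) rainbow table maps digests back to passwords.
rainbow = {hashlib.sha1(pw).hexdigest(): pw for pw in [b"123456", b"password", b"qwerty"]}
recovered = rainbow.get(alice)  # -> b"123456", with no hashing work at lookup time
```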

Unlike most of symmetric cryptography, where the goal ends at forcing the attacker to perform an exhaustive brute-force search of every possible key, password-based cryptography has to go the extra mile to prevent weak secrets from being compromised; a very tall order for anyone to contend with.

Towards Modern Password Hashing

None of this was a surprise to security experts in 2012. There were a couple generally known best practices since I first learned programming in the 2000s:

  • Salt your passwords (to mitigate rainbow tables)
  • Use an iterated hashing scheme, rather than a fast hash (to make each step of an exhaustive search slower)

PBKDF2, the first widely used password hashing and key derivation function, did exactly that:

  • PBKDF2 was designed to support randomly-generated per-user salts.
  • It used HMAC in a loop to force attackers to spend extra CPU cycles per guess. If an attacker could guess 10 million passwords per second against SHA-256, then using 100,000 iterations of PBKDF2 slowed them down to less than 100 guesses per second (due to HMAC calling the hash function twice).

And that might sound like the end of the story (“PBKDF2 wins”), but the password cracking community is extremely clever, so it’s only just begun.

Parallel Attacks and GPUs

Since your CPU can only guess a few hashes per second, PBKDF2 is a thorny problem to contend with.

However, there’s an easy way for attackers to use commodity hardware to bypass this limitation: Graphics cards are specially designed to perform a lot of low-memory calculations in parallel.

If you can write your password cracking code to leverage the benefits of GPUs instead of CPUs, then PBKDF2 becomes easy potatoes.

The other secure password hashing algorithm in use at the time, bcrypt, had only a linear improvement over PBKDF2’s security against GPU attacks.

Memory-Hard Password Hashing Algorithms

In 2009, Colin Percival proposed scrypt to mitigate this risk. Scrypt (pronounced “ess crypt”) builds atop PBKDF2-SHA256 by hashing pseudorandom regions of memory with Salsa20/8 (hence the S in Scrypt) in a way that makes it difficult to precompute.

Scrypt hashes require large amounts of memory to compute (at least 16 megabytes, in the recommended minimum configuration, versus the few kilobytes of incumbent password hashes).

While GPUs (and FPGAs, and ASICs, and …) are great for cracking password hashes that use CPU-bounded algorithms, algorithms that access large amounts of memory are difficult to attack effectively.

It’s still possible for clever programmers to work around this memory/bandwidth limitation, but to do so, you must compute more operations, which makes it not worthwhile.

Cryptographers and the password-cracking community call this expensive time-memory trade-off memory hardness. In 2016, it was determined that scrypt is maximally memory-hard.

Password Hashing Competition

The Password Hashing Competition (PHC) was kicked off by JP Aumasson in 2012 to develop a new standard for password hashing and password-based key derivation.

In 2015, the PHC selected Argon2 as its recommendation, with four finalists receiving special recognition (Catena, Lyra2, Makwa, and yescrypt–which is based on scrypt).

Cache-Hardness

In the years since the PHC concluded, the advances in password cracking techniques have not slowed down.

One of the PHC judges recently proposed a new algorithm called bscrypt, which purports to be “cache-hard” rather than “memory-hard”. The reasoning is that, for shorter run times, memory bandwidth and small CPU caches are a tighter bottleneck than overall memory requirements.

In fact, there’s an Argon2 variant (Argon2ds) which offers cache-hardness.

Two Hard Choices

So which of the two should you care about, as a defender?

If you’re validating password hashes in an online service, a cache-hard algorithm might make more sense. You’re incentivized to have shorter server-side run times to avoid DoS attacks, so the benefits of cache-hardness seem more impactful than memory-hardness.

If you’re deriving a sensitive cryptography key from a password and can afford to take several seconds of wall-clock time, a memory-hard algorithm is likely to be a better fit for your threat model.

(Or, like, run both a cache-hard and memory-hard algorithm and use a fast KDF, such as HKDF, to mix the two into your final cryptography key.)
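A sketch of that mixing idea using only the standard library. Since no cache-hard algorithm ships with Python, PBKDF2 merely stands in for one here, and the HKDF-style labels are placeholders; this only illustrates combining two independent derivations with a fast HMAC-based KDF step:

```python
import hashlib
import hmac
import os

password = b"correct horse battery staple"
salt = os.urandom(16)

# Memory-hard leg: scrypt (16 MiB with these parameters).
memory_hard = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

# Cache-hard leg: stood in by PBKDF2, since bscrypt/Argon2ds aren't in stdlib.
cache_hard_stand_in = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

# Fast HKDF-style extract-then-expand over both outputs:
prk = hmac.new(salt, memory_hard + cache_hard_stand_in, hashlib.sha512).digest()
final_key = hmac.new(prk, b"final-key" + b"\x01", hashlib.sha512).digest()[:32]
```

An attacker now has to pay both cost functions for every guess, since neither output alone determines the final key.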

Of course, the story doesn’t end here. Password Hashing and Password-Based Key Derivation continues to mature as the password cracking community discovers new attacks and engineers defenses against them.

Art: ScruffKerfluff

A Strange Game; The Only Winning Move is to PAKE

Password hashing is a defense-in-depth against a system compromise that enables attackers to perform offline attacks against a cryptographic output to determine a user’s password.

However, for many applications, the bigger question is, “Why even play this game in the first place?”

Password-Authenticated Key Exchanges

A password authenticated key exchange (PAKE) algorithm is an interactive protocol to establish keys based on at least one party’s knowledge of a password.

PAKEs are what enable you to log into encrypted WiFi connections and encrypt your traffic. PAKEs prevent you from having to reveal the password (or a password-equivalent value, such as a password hash) to the other party.

Although there are a lot of proposed PAKE algorithms, the one that most people implemented was SRP (“Secure Remote Password”), which was intuitively a lot like Diffie-Hellman but also not Diffie-Hellman (since SRP used addition).

For a good teardown on SRP, Matthew Green’s blog is highly recommended.

There are a few real-world cryptography problems with using SRP:

  1. You need to use a special kind of prime number for your protocol. The standard Diffie-Hellman moduli are not suitable for SRP; you want a safe prime (i.e. a number of the form N = 2q + 1, where q is also prime).
  2. One of the steps in SRP, client-side, is to compute A = g^a (mod N). Here, A is a public value while a is a secret (usually derived from the SHA1 hash of the username and password).

    If your software implementation of modular exponentiation (g^a mod N) isn’t constant-time, you leak a, which is a password-equivalent value.

Additionally, it’s not possible to use SRP with elliptic curve groups. (Matthew Green’s post I linked above covers this in detail!)

Thus, SRP is generally considered to be deprecated by security experts, in favor of other PAKEs.

IETF, CFRG, PAKE Selection

As early as IETF 104, the Crypto Forum Research Group (CFRG) began exploring different PAKE algorithms to select for standardization.

One of the motivators for this work is that the WiFi alliance had shoehorned a new PAKE called Dragonfly into their WPA3, which turned out to be badly designed.

The CFRG’s candidate algorithms and reviews were publicly tracked on GitHub, if you’re curious.

TL;DR – They settled on recommending OPAQUE as an augmented PAKE and CPace as a balanced PAKE.

Sc00bz has given multiple talks about PAKEs at security conferences over the years.

Quantum Annoyance

The PAKEs we’ve discussed involved a cyclic group (and the newer ones involved an elliptic curve group). The security of these schemes is dependent on the hardness of a discrete logarithm problem.

If a cryptographically relevant quantum computer (CRQC) is developed, discrete logarithm problems will cease to be hard.

Some PAKEs fall apart the moment a discrete log is solvable. This is par for the course for Diffie-Hellman and elliptic curve cryptography.

Others require an attacker to solve a discrete log for every guess in an offline attack (after capturing a successful exchange). This makes them annoying for quantum attackers.

While they don’t provide absolute security like post-quantum cryptography, they do make attackers armed with a CRQC work harder.

OPAQUE isn’t quantum annoying. Simply observe all of the packets from an online guess, solve the DLP offline, and you’re back to the realm of classical offline guessing.

Art: Swizz

The State of the Art

At this point, you may feel somewhat overwhelmed by all of this information. It’s very tempting to simplify all of this historical context and technical detail.

You might be telling yourself, “Okay, Scrypt, Argon2, CPace, OPAQUE. Got it. That’s all I need to remember. Everything else is questionable at best. Ask a cryptographer. I’m bailing out, yo!”

But the story gets worse on two fronts: Real-world operational requirements, and compliance.

Your Hands Are Tied, Now Swim

If you’re selling a product or service to the US government that uses cryptography, you need to first clear a bare minimum bar called FIPS 140.

FIPS 140 is opinionated about which algorithms you use. For password hashing and password-based key derivation, FIPS defers to NIST SP 800-132, which means you’re only permitted to use PBKDF2 in any FIPS modules (with a SHA2-family hash function). (Update: See below.)

So all of the information about memory-hard and cache-hard password hashing algorithms? This is useless for you if you ever try to sell to the US government.

An open question is whether Scrypt is FIPSable due to it building atop PBKDF2. To be blunt: I’m not sure. Ask a NIST Cryptographic Module Validation Program lab for their opinion.

Update (2022-01-07): A Brief Correction

Thanks to FreakLegion for pointing this out.

According to NIST SP 800-63B, memory-hard hash functions are recommended.

Examples of suitable key derivation functions include Password-based Key Derivation Function 2 (PBKDF2) [SP 800-132] and Balloon [BALLOON]. A memory-hard function SHOULD be used because it increases the cost of an attack.

NIST SP 800-63B

Don’t use Balloon, though. It’s under-specified.

FreakLegion indicates in their Hacker News comment that scrypt and yescrypt are both acceptable.

End Update

In the realm of PAKEs, both CPace and OPAQUE are frameworks that can be used with multiple elliptic curve groups.

Both frameworks support NIST P-256, but CPace supports X25519 and OPAQUE supports Ristretto255; these are currently not FIPSable curves.

“I don’t care about FIPS. Argon2 all the things!”

JavaScript runtimes don’t provide a built-in Argon2 implementation. This poses several challenges.

  • Do you care about iOS users? Lockdown mode prevents you from using WebAssembly, even in a browser extension.
  • Trying to polyfill scrypt or Argon2 in a scripting language is a miserable experience. We’re talking quadratic performance penalties here: several minutes of CPU time to calculate a hash that is still far too weak to be acceptable in any context.

Consequently, not a single password manager I’ve evaluated that provides a browser extension uses a memory-hard algorithm for deriving encryption keys.

Update: I’ve been told Dashlane uses Argon2.

Art: CMYKat

This Is Frustrating

When I write about cryptography, my goal is always to be clear, concise, and helpful. I want whoever reads my writing to, regardless of their level of expertise, walk away more comfortable with the subject matter.

At minimum, I want you to be able to intuit when you need to ask an expert for help, and have a slightly better understanding of how to word your questions to get the answer you need. In an ideal world, you’d even be able to spot the same kind of weaknesses I do in cryptography designs and implementations after reading enough of this blog.

I can’t do that with password-based cryptography. The use-cases and threat models are too varied to make a clear-cut recommendation, and the subject is too complicated to parcel out without feeling reductive or slightly self-contradictory: there are simply too many caveats and corner cases.

Passwords suck.

If you ask a password cracking expert and describe your use case, you’ll likely get an opinionated and definitive answer. But every time I’ve asked generally about this subject, without a specific use case in mind, I got an opinionated answer followed by a long chain of caveats and alternatives.

With that in mind, here are a few vague guidelines, up-to-date as of the end of 2022.

Art: Harubaki

Recommendations for Password-Based Cryptography in 2023

Password Hashing

If you’re authenticating users in an online service (storing the hashes in a database), use any of the following algorithms:

  • bcrypt with a cost factor appropriate to take about 100 milliseconds to compute on your server hardware (typically at least 12)
  • scrypt with N = 32768, r = 8, p = 1 and a random salt at least 128 bits in length (256 preferred)
  • Argon2id with a memory cost of 64 MiB, ops cost of 3, and parallelism of 1
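In PHP, the bcrypt and Argon2id options map directly onto the built-in password API, which handles salting and verification for you. A minimal sketch (the Argon2id branch assumes your PHP build was compiled with Argon2 support):

```php
<?php
// Bcrypt via PHP's built-in password API. Cost 12 is a floor;
// benchmark until hashing takes roughly 100 ms on your server hardware.
$hash = password_hash('correct horse battery staple', PASSWORD_BCRYPT, ['cost' => 12]);
var_dump(password_verify('correct horse battery staple', $hash)); // bool(true)

// Argon2id with 64 MiB of memory, 3 iterations, parallelism 1.
// Requires PHP built with Argon2 support.
$hash2 = password_hash('correct horse battery staple', PASSWORD_ARGON2ID, [
    'memory_cost' => 64 * 1024, // in KiB
    'time_cost'   => 3,
    'threads'     => 1,
]);
var_dump(password_verify('correct horse battery staple', $hash2)); // bool(true)
```

Remember that bcrypt silently truncates passwords after 72 bytes, so enforce a sane maximum length at your input layer.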

If you’re forced to use PBKDF2 for whatever dumb reason, the parameters you want are at least:

  • SHA-512 with 210,000 rounds (preferred)
  • SHA-256 with 600,000 rounds
  • SHA-1 (if you must) with 1,300,000 rounds

These numbers are based on constraining attackers to less than 10,000 hashes per second, per GPU.
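If you do end up stuck with PBKDF2, PHP’s built-in hash_pbkdf2() accepts those parameters directly. A sketch using the preferred SHA-512 numbers above:

```php
<?php
// PBKDF2-HMAC-SHA512, 210,000 rounds, 32-byte raw output, random 256-bit salt.
$salt = random_bytes(32);
$stored = hash_pbkdf2('sha512', 'correct horse battery staple', $salt, 210000, 32, true);

// Verification: recompute with the stored salt and compare in constant time.
$candidate = hash_pbkdf2('sha512', 'correct horse battery staple', $salt, 210000, 32, true);
var_dump(hash_equals($stored, $candidate)); // bool(true)
```

Unlike the password API, you’re responsible for storing the salt (and iteration count) alongside the hash yourself.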

I’m not currently recommending any algorithms deliberately designed for cache-hardness because they need further analysis from other cryptographers. (Bcrypt is minimally cache-hard, but that wasn’t a stated design goal.)

I didn’t evaluate the other PHC finalists nor other designs (Balloon hashing, Ball and Chain, etc.). Ask your cryptographer if those are appropriate instead.

Password-Based Key Derivation

If you’re deriving an encryption key from a password, use any of the following algorithms:

  • scrypt with N = 1048576, r = 8, p = 1 and a random salt at least 128 bits in length (256 preferred)
  • Argon2id with a memory cost of 1024 MiB, ops cost of 4, and parallelism of 1
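With ext/sodium, the Argon2id parameters above correspond to libsodium’s SENSITIVE presets (opslimit 4, memlimit 1 GiB). A minimal key-derivation sketch; the salt must be stored alongside the ciphertext so the key can be re-derived later:

```php
<?php
// Derive a 256-bit encryption key from a password with Argon2id.
// SENSITIVE presets = opslimit 4, memlimit 1 GiB, matching the numbers above.
$salt = random_bytes(SODIUM_CRYPTO_PWHASH_SALTBYTES); // exactly 16 bytes
$key = sodium_crypto_pwhash(
    32, // output length in bytes
    'correct horse battery staple',
    $salt,
    SODIUM_CRYPTO_PWHASH_OPSLIMIT_SENSITIVE,
    SODIUM_CRYPTO_PWHASH_MEMLIMIT_SENSITIVE,
    SODIUM_CRYPTO_PWHASH_ALG_ARGON2ID13
);
// The same password + salt + parameters always reproduce the same key.
```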

If you’re forced to use PBKDF2 for whatever dumb reason, the parameters you want should target between 1 and 10 seconds on the defender’s hardware. If this is somehow slower than the numbers given for password hashing above, go with the password hashing benchmarks instead.

Here’s a dumb PHP script you can use to quickly find some target numbers.

<?php
/* Dumb PBKDF2 benchmarking script by Soatok. 2022-12-29 */
$salt = random_bytes(32);
$password = 'correct horse battery staple';
$c = '';
$start = $end = microtime(true);
foreach (['sha1', 'sha256', 'sha512'] as $alg) {
    foreach ([
        1 << 14, 1 << 15, 1 << 16, 1 << 17, 1 << 18, 1 << 19, 1 << 20,
        1200000, 1 << 21, 2500000, 3000000, 3500000, 1 << 22, 5000000,
        1 << 23, 1 << 24, 1 << 25,
    ] as $n) {
        $start = microtime(true);
        $c ^= hash_pbkdf2($alg, $password, $salt, $n, 32, true);
        $end = microtime(true);
        $time = $end - $start;
        echo str_pad($n, 12, ' ', STR_PAD_LEFT),
            " iterations ({$alg}) -> \t",
            $time,
            PHP_EOL;
        if ($time > 5.5) {
            echo PHP_EOL;
            break;
        }
    }
}

On my laptop, targeting 5.0 seconds means at least 4,000,000 rounds of PBKDF2 with SHA1, SHA256, or SHA512 (with SHA256 coming closer to 5,000,000).

If you’re aiming for less than 1,000 hashes per second per GPU, against an attacker armed with an RTX 4090:

  • SHA-512 with 3,120,900 iterations (preferred)
  • SHA-256 with 8,900,000 iterations
  • SHA-1 with 20,000,000 iterations

Sc00bz tells me that this generation of GPU has better performance/cost than previous generations (which had seen diminishing returns), but requires 1.5x the power, 1.78x the price, and 2x the physical space. Sc00bz refers to this as “1.5 GPUs”.

If you adjust for these factors, your final PBKDF2 target becomes:

  • SHA-512 with 2,100,000 iterations (preferred)
  • SHA-256 with 6,000,000 iterations
  • SHA-1 with 13,000,000 iterations

Password Authenticated Key Exchanges

In a client-server model, where the server isn’t meant to see password-equivalent data, you want an augmented PAKE (or perhaps a doubly-augmented PAKE).

You’re overwhelmingly more likely to find OPAQUE support in the immediate future, due to the direction the IETF CFRG is moving today.

In a more peer-to-peer model, you want a balanced PAKE. CPace is the best game in town.

Don’t use SRP if you can help it.

If you can’t help it, make sure you use one of the groups defined in RFC 5054, rather than a generic Diffie-Hellman group. You’ll also want an expert to review your implementation to make sure your BigInt library isn’t leaking your users’ private exponents.

Also, feel free to deviate from SHA1 for calculating the private exponent from their username and password. SHA1 sucks. Nothing is stopping you from using a password KDF (or, at worst, SHA256) here, as long as you do so consistently.

(But as always, ask your cryptography expert first.)
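To make that concrete: RFC 2945 defines the private exponent as x = SHA1(salt | SHA1(username ":" password)). A hypothetical sketch of the same construction with SHA-256 swapped in (the function name is mine; every client and server in the deployment must make the identical substitution):

```php
<?php
// Hypothetical: RFC 2945's x = SHA1(s | SHA1(I ":" P)) with SHA-256 swapped in.
// Both sides of the protocol must agree on this substitution, consistently.
function srp_private_exponent(string $salt, string $username, string $password): string
{
    $inner = hash('sha256', $username . ':' . $password, true);
    return hash('sha256', $salt . $inner, true);
}
```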

Art: CMYKat

TL;DR

Password-Based Cryptography is a mess.

If you’re not aware of the history of the field and the technical details that went into the various cryptographic algorithms that experts recommend today, it’s very easy to say something wrong that sounds correct to an untrained ear.

At the end of the day, the biggest risk you face with password-based cryptography is not using a suitable algorithm in the first place, rather than using a less optimal algorithm.

Experts have good reasons to hate PBKDF2 (slow for defenders, fast for attackers) and SRP (we have better options today, which also come with actual security proofs), but they certainly hate unsalted SHA1 and Triple-DES more.

My only real contribution to this topic is an effort to make it easier to talk about to non-experts by offering a new term for the superset of all of these password-based algorithms: Password-Based Cryptographic Functions.

Finally, the password cracking community is great. There are a lot of smart and friendly people involved, and they’re all working to make this problem easier to get right. If you’re not sure where to start, check out PasswordsCon.

Art: CMYKat

https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/

#cryptographicHashFunction #cryptographicProtocols #OPAQUE #passwordHashing #PBKDF2 #SecureRemotePasswordProtocol #SecurityGuidance
