
One of the lessons I learned during my time at AWS Cryptography (and particularly as an AWS Crypto Bar Raiser) is that the threat model for Encryption At Rest is often undefined.

Prior to consulting cryptography experts, most software developers do not have a clear and concise understanding of the risks they’re facing, let alone how or why encrypting data at rest would help protect their customers.

Unsurprisingly, over the years I’ve heard a few infosec thought leader types insist that encryption-at-rest is security theater. I disagree with this assessment in absolute terms, but there is a nugget of truth in that assertion.

The million dollar question.

Let’s explore this subject in a little more detail.

Why should we listen to you about this topic?

(If you don’t need any convincing, feel free to skip this section.)

Encryption at rest is a particular hobby horse of mine. I previously wrote on this blog about the under-celebrated design decisions in the AWS Database Encryption SDK and the need for key-committing AEAD modes in multi-tenant data lakes.

Before my time at Amazon, I had also designed a PHP library called CipherSweet that offers a limited type of Searchable Encryption. The goal of CipherSweet was to improve the cryptography used by SuiteCRM. (The library name is, of course, a pun.)

I’ve also spent a ton of time making cryptography easy to use and hard to misuse outside of the narrow use case that is at-rest data encryption. To that end, I designed PASETO as a secure-by-default alternative to JSON Web Tokens.

I also have a lot of skin in the game when it comes to developer comprehension: I was the first Stack Overflow user with a gold badge for both [security] and [encryption], largely due to the effort I put into cleaning up the bad cryptography advice for the PHP ecosystem.

I have spent the past decade or so trying to help teams avoid security disasters in one form or another.

Why should we not listen to you about this topic?

If you happen to know a cryptography expert you trust more than some Internet stranger with a blog, I implore you to listen to them if we disagree on any point. They may know something I don’t. (That said, I’m always happy to learn something new!)

I also do not have a college degree in Cryptography, nor have I published any papers in prestigious academic journals. If you care very much about this sort of pedigree, you will likely find my words easily discarded. If this describes your situation, no hard feelings.

Why and How to use Encryption At Rest to Protect Sensitive Data

Important: I’m chiefly interested in discussing one use-case, and not focusing on other use cases. Namely, I’m focusing on encryption-at-rest in the narrow context of web applications and/or cloud services.
This is not a comprehensive blog post covering every possible use case or threat model relating to encryption at rest. Those other use cases are certainly interesting, but this post is already long enough with a narrower focus.

If you’re only interested in compliance requirements, you can probably just enable Full Disk Encryption and call it a day. Then, if your server’s hard drive grows legs and walks out of the data center, your users’ most sensitive data will remain confidential.

Unfortunately, for the server-side encryption at rest use case, that’s basically all that Disk Encryption protects against.

If your application or database software is online and an attacker gains access to it (e.g., through SQL injection), the data might as well be plaintext to the attacker, because a running system with the disk unlocked transparently decrypts whatever that software reads.

It do be like that.

Therefore, if you find yourself reaching for Encryption At Rest to mitigate the impact of the kind of vulnerability that would leak the contents of your database or filesystem to an attacker, you’re probably unwittingly engaging in security theater.

Disk Encryption is important for disk disposal and mitigating hardware theft, not preventing data leakage to online attackers.

So the next logical thing to do is draw a box around the system or component that stores a lot of data and never let plaintext cross that boundary.

Client-Side Encryption

Note: The naming here is a little imprecise. It is client-side encryption with respect to your data warehouse (i.e. SQL database), but not with respect to the user experience of a web application. In those cases, client-side would mean on the actual end user’s device.

Instead, client-side encryption is the generic buzz-word to mean that you’re encrypting data outside of the box you drew in your system architecture. Generally, this means that you have an application server that’s acting as the “client” for the purpose of bulk data encryption.

There are a lot of software projects that aim to provide client-side encryption for data stored in a database or filesystem, or in object storage such as Amazon S3 buckets.

This is a step in the right direction, but implementation details matter a lot.

Quick aside: For the remainder of this blog post, I’m going to assume an architecture that looks like a traditional web application, for simplicity.
The assumed architecture looks vaguely like this:
User Agents (e.g., web browsers), which communicate with the application server.
Application Server(s), which respond to HTTP requests from user agents, manage key material using KMS, and encrypt / decrypt records stored in the database.
Database Server(s), which store ciphertext on behalf of the application server.
This is an abstract design, so the actual implementation details you encounter in the real world may be simpler or more complex in different respects.
There are other interesting design considerations for OS-level end-user device encryption that I’m not going to explore today. For example: Adiantum is extremely cool.
I’m also not going to dive deep into laptop theft or the importance of Full Disk Encryption as a mechanism for ensuring data is erased from solid state hard drives, or the activities of hostile nation states. That’s a separate discussion entirely.

Security Considerations for Client-Side Encryption

The first question to answer when data is being encrypted is, “How are the keys being managed?” This is a very deep rabbit hole of complexity, but one good answer for a centralized service is, “Cloud-based key management service with audit logging”; i.e. AWS KMS, Google CloudKMS, etc.

Next, you have to understand how the data is being encrypted in the first place.

Bad answer: AES in CBC mode without HMAC.

Worse answer: AES in ECB mode.

Generally, you’re going to want to use an AEAD construction, such as AES-GCM or XChaCha20-Poly1305.
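To make that concrete, here is a minimal sketch of AEAD encryption in PHP using the XChaCha20-Poly1305 functions from the bundled sodium extension. The function names aeadEncrypt and aeadDecrypt, and the "nonce followed by ciphertext" layout, are assumptions made for illustration; they are not taken from any particular library.

```php
<?php
declare(strict_types=1);

/**
 * Encrypt with XChaCha20-Poly1305.
 * Returns nonce || ciphertext || tag (hypothetical storage layout).
 */
function aeadEncrypt(string $plaintext, string $key): string
{
    // The 24-byte nonce is large enough to pick at random for every message.
    $nonce = random_bytes(SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES);
    $ciphertext = sodium_crypto_aead_xchacha20poly1305_ietf_encrypt(
        $plaintext,
        '',      // no additional authenticated data in this basic example
        $nonce,
        $key
    );
    return $nonce . $ciphertext;
}

/**
 * Decrypt a message produced by aeadEncrypt().
 * Throws if the message was truncated, tampered with, or the key is wrong.
 */
function aeadDecrypt(string $message, string $key): string
{
    $nonceLen = SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES;
    $tagLen = SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_ABYTES;
    if (strlen($message) < $nonceLen + $tagLen) {
        throw new RuntimeException('Message too short');
    }
    $plaintext = sodium_crypto_aead_xchacha20poly1305_ietf_decrypt(
        substr($message, $nonceLen),
        '',
        substr($message, 0, $nonceLen),
        $key
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plaintext;
}
```

Unlike CBC without a MAC, any modification to the stored bytes causes decryption to fail outright rather than yield attacker-influenced plaintext.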

You’ll also want key-commitment if you’re storing data for multiple customers on the same hardware. You can get this property by stapling HKDF onto your protocol (once for key derivation, again for commitment). See also: PASETO v3 and v4.
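A rough sketch of the "HKDF twice" idea follows. It is not PASETO's actual construction: the info labels, the 32-byte salt, and the message layout (salt, commitment, nonce, ciphertext) are all assumptions chosen for illustration. The point is only that a second HKDF output acts as a commitment to the key, and it is checked before any decryption is attempted.

```php
<?php
declare(strict_types=1);

/** Derive an encryption key and a key commitment from one master key. */
function deriveKeyAndCommitment(string $masterKey, string $salt): array
{
    // Two HKDF calls with distinct (made-up) info labels.
    $encKey = hash_hkdf('sha512', $masterKey, 32, 'example-encryption-key', $salt);
    $commitment = hash_hkdf('sha512', $masterKey, 32, 'example-key-commitment', $salt);
    return [$encKey, $commitment];
}

function committingEncrypt(string $plaintext, string $masterKey): string
{
    $salt = random_bytes(32);
    [$encKey, $commitment] = deriveKeyAndCommitment($masterKey, $salt);
    $nonce = random_bytes(SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES);
    // Authenticate the salt and commitment alongside the ciphertext.
    $ciphertext = sodium_crypto_aead_xchacha20poly1305_ietf_encrypt(
        $plaintext, $salt . $commitment, $nonce, $encKey
    );
    return $salt . $commitment . $nonce . $ciphertext;
}

function committingDecrypt(string $message, string $masterKey): string
{
    $nonceLen = SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES;
    $tagLen = SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_ABYTES;
    if (strlen($message) < 64 + $nonceLen + $tagLen) {
        throw new RuntimeException('Message too short');
    }
    $salt = substr($message, 0, 32);
    $storedCommitment = substr($message, 32, 32);
    $nonce = substr($message, 64, $nonceLen);
    $ciphertext = substr($message, 64 + $nonceLen);

    [$encKey, $commitment] = deriveKeyAndCommitment($masterKey, $salt);
    // Constant-time check that this message was created under this key.
    if (!hash_equals($commitment, $storedCommitment)) {
        throw new RuntimeException('Key commitment mismatch');
    }
    $plaintext = sodium_crypto_aead_xchacha20poly1305_ietf_decrypt(
        $ciphertext, $salt . $storedCommitment, $nonce, $encKey
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plaintext;
}
```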

It may be tempting to build a committing AEAD scheme out of, e.g., AES-CTR and HMAC, but take care that you don’t introduce canonicalization risks in your MAC.
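Here is a small illustration of what a canonicalization risk looks like, and one common way to avoid it. The helper names encodeForMac and macFields are hypothetical; the length-prefixing approach is in the same spirit as PASETO's pre-authentication encoding, but this is a simplified sketch and not that exact format.

```php
<?php
declare(strict_types=1);

/** Unambiguous encoding: piece count, then each piece prefixed by its length. */
function encodeForMac(string ...$pieces): string
{
    $out = pack('P', count($pieces)); // 64-bit little-endian count
    foreach ($pieces as $piece) {
        $out .= pack('P', strlen($piece)) . $piece;
    }
    return $out;
}

function macFields(string $key, string ...$pieces): string
{
    return hash_hmac('sha256', encodeForMac(...$pieces), $key, true);
}

$key = random_bytes(32);

// Naive concatenation: ("ab", "c") and ("a", "bc") feed identical bytes to HMAC.
$naiveA = hash_hmac('sha256', 'ab' . 'c', $key, true);
$naiveB = hash_hmac('sha256', 'a' . 'bc', $key, true);
var_dump(hash_equals($naiveA, $naiveB)); // bool(true)

// Length-prefixed encoding keeps the field boundaries inside the MAC input.
var_dump(hash_equals(macFields($key, 'ab', 'c'), macFields($key, 'a', 'bc'))); // bool(false)
```

If the MAC covers a nonce, a ciphertext, and some context, the naive version lets an attacker shift bytes between those fields without invalidating the tag.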

Is Your Deputy Confused?

Even if you’re using IND-CCA secure encryption and managing your keys securely, there is still a very stupid attack against many data-at-rest encryption schemes.

To understand the attack, first consider this sort of scenario:

Alice and Bob use the same health insurance provider, which is storing sensitive medical records for both parties. Bob works as a database administrator for the insurance company he and Alice both use. One day, he decides to snoop on her private medical history.
Fortunately, the data is encrypted at the web application, so all of the data Bob can access is indistinguishable from random. He can access his own account and see his data through the application, but he cannot see Alice’s data from his vantage point on the database server.

Here’s the stupid simple attack that works in far too many cases: Bob copies Alice’s encrypted data, overwrites his own records in the database with it, then accesses the insurance provider’s web app as himself.

Bam! Alice’s plaintext recovered.

What’s happening here is simple: The web application has the ability to decrypt different records encrypted with different keys. If you pass records that were encrypted for Alice to the application to decrypt them for Bob, and you’re not authenticating your access patterns, Bob can read Alice’s data by performing this attack.

In this setup, the application is the Deputy, and you can easily confuse it by replaying an encrypted blob in the incorrect context.

The mitigation is simple: Use the AAD mechanism (part of the standard AEAD interface) to bind a ciphertext to its context. This can be a customer ID, each row’s value for the primary key of the database table, or something else entirely.
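The following sketch shows the replay being rejected once the row's context is passed as AAD. The helper names (contextAad, encryptField, decryptField), the table and column names, and the row IDs are all made up for this example. For simplicity it uses one key for every row, which makes the point starker: the only thing stopping the swap is the AAD.

```php
<?php
declare(strict_types=1);

/** Length-prefixed encoding of the row's context (hypothetical helper). */
function contextAad(string $table, string $column, string $rowId): string
{
    $aad = '';
    foreach ([$table, $column, $rowId] as $part) {
        $aad .= pack('P', strlen($part)) . $part;
    }
    return $aad;
}

function encryptField(string $plaintext, string $key, string $aad): string
{
    $nonce = random_bytes(SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES);
    return $nonce . sodium_crypto_aead_xchacha20poly1305_ietf_encrypt(
        $plaintext, $aad, $nonce, $key
    );
}

function decryptField(string $stored, string $key, string $aad): string
{
    $nonceLen = SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES;
    if (strlen($stored) < $nonceLen + SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_ABYTES) {
        throw new RuntimeException('Stored value too short');
    }
    $plaintext = sodium_crypto_aead_xchacha20poly1305_ietf_decrypt(
        substr($stored, $nonceLen), $aad, substr($stored, 0, $nonceLen), $key
    );
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed: wrong key, wrong context, or tampered data');
    }
    return $plaintext;
}

$key = sodium_crypto_aead_xchacha20poly1305_ietf_keygen();

// Alice's record lives in row 1001.
$aliceBlob = encryptField('Alice: private history', $key, contextAad('records', 'history', '1001'));

// Bob pastes Alice's blob into his own row (1002). The application decrypts
// using the context of the row it is actually reading, so the tag check fails.
try {
    decryptField($aliceBlob, $key, contextAad('records', 'history', '1002'));
} catch (RuntimeException $e) {
    echo "Replay rejected\n";
}
```

The important design choice is that the application computes the AAD from where it read the ciphertext, never from anything stored next to the ciphertext that a database administrator could also overwrite.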

If you’re using AWS KMS, you can also use Encryption Context for this exact purpose.

The Curious Case of CipherSweet

The first release of CipherSweet mitigated most of this risk by construction: Each field uses a different encryption key, through a key derivation scheme.
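As a sketch of the general idea (and explicitly not CipherSweet's actual key schedule), a per-field key can be derived from a root key with HKDF by putting the table and field names into the info parameter. The function name deriveFieldKey and the table and field names below are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/** Derive a distinct 256-bit key for each (table, field) pair. */
function deriveFieldKey(string $rootKey, string $table, string $field): string
{
    // Length-prefix the names so ("ab", "c") and ("a", "bc") cannot collide.
    $info = pack('P', strlen($table)) . $table
          . pack('P', strlen($field)) . $field;
    return hash_hkdf('sha256', $rootKey, 32, $info);
}

$rootKey = random_bytes(32);
$nonce = random_bytes(SODIUM_CRYPTO_AEAD_XCHACHA20POLY1305_IETF_NPUBBYTES);

// Encrypt a zip code under the zip_code field's key...
$zipCiphertext = sodium_crypto_aead_xchacha20poly1305_ietf_encrypt(
    '90210', '', $nonce, deriveFieldKey($rootKey, 'users', 'zip_code')
);

// ...then try to read it back as if it were stored in the ssn field.
$result = sodium_crypto_aead_xchacha20poly1305_ietf_decrypt(
    $zipCiphertext, '', $nonce, deriveFieldKey($rootKey, 'users', 'ssn')
);
var_dump($result); // bool(false)
```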

Since CipherSweet’s inception, if you try to replace Alice’s encrypted zip code with Alice’s encrypted social security number, the keys would be wrong, so it would lead to a decryption failure.

Or so I thought!

As I mentioned in my blog post about multi-tenancy and confused deputy attacks, if your AEAD mode doesn’t commit to the key used, it’s possible to craft a single (ciphertext, tag) that decrypts to two different plaintext values under two different keys.

This violated the Principle of Least Astonishment and motivated the development of a new algorithm suite called BoringCrypto, which used BLAKE2b-MAC instead of Poly1305. This change was released in version 3.0.0 in June 2021.

However, even as of 3.0.0, this only mitigated most of the issue by construction. The last mile of complexity here is that each field must also be bound to a primary key or foreign key.

Encrypting with AAD has been possible since a very early release of CipherSweet, but being possible to use securely is not sufficient. It should be easy to use securely.

CipherSweet Version 4.7.0, which was released last month, now only requires a code change that looks like this in order to mitigate confused deputies in an application:

  $multiRowEncryptor = new EncryptedMultiRows($engine);
  $multiRowEncryptor
+     ->setAutoBindContext(true)
+     ->setPrimaryKeyColumn('table2', 'id')
      ->addTextField('table1', 'field1')

This is in addition to the new Enhanced AAD feature, which allows for flexible and powerful context binding based on other fields and/or string literals.

(In fact, this new convenience feature actually uses Enhanced AAD under-the-hood.)

As you can see, mitigating confused deputies in an encryption library (without making it unwieldy) requires a painstaking attention to detail to get right.

As Avi Douglen says, “Security at the cost of usability comes at the cost of security.”

Given the prevalence of client-side encryption projects that just phone it in with insecure block cipher modes (or ECB, which is the absence of a block cipher mode entirely), it’s highly doubtful that most of them will ever address confused deputy attacks. Even I didn’t get it right at first when I made CipherSweet back in 2018.

What about non-databases?

Everything I mentioned in the previous section was focused on confused deputy attacks against client-side encryption for information that is stored in a database, but it’s a general problem with encrypting data at rest.

If you’re storing encrypted data in an S3 bucket, you still need some form of context-binding to stop the dumb and obvious attack from working against a deputy that reads data from said S3 bucket.

Why aren’t things better already?

As with most things in software security, the problem is either not widely known, or is not widely understood.

Unknown unknowns tend to fester, untreated, across the entire ecosystem.

Misunderstood issues often lead to an incorrect solution.

In this case, at-rest encryption is mostly in Column B (misunderstood), and confused deputy attacks are mostly in Column A (not widely known).

The most pronounced consequence of this is, when tasked with building at-rest data encryption in an application, most software developers do not have a cohesive threat model in mind (let alone a formal one).

This leads to disagreement between stakeholders about what the security requirements actually are.

How can I help improve things somewhat?

Most importantly, spread awareness of the nuances of encryption at-rest.

This blog post is intended to be a good conversation starter, but there are other resources to consider, too. I’ve linked to many of them throughout this post already.

If you’re paying for software to encrypt data at rest, ask your vendor how they mitigate the risk of confused deputy attacks. Link them to this blog post if they’re not sure what you mean.

If said vendor responds, “this risk is outside of our threat model,” ask to see their formal threat model document. If it exists and doesn’t align with your application’s threat model, maybe consider alternative solutions that provide protection against more attack classes than Full Disk Encryption would.

Finally, gaining experience with threat modeling is a good use of every developer’s time. Adam Caudill has an excellent introductory blog post on the subject.

Closing Thoughts

Despite everything I’ve written here today, I do not claim to have all the answers for encryption at rest.

However, you can unlock a lot of value just by asking the right questions. My hope is that anyone who reads this post is now capable of asking those questions.

Addendum (2024-06-03)

After I published this, the r/netsec subreddit expressed disappointment that this blog post had “no mention of” consumer device theft or countries experiencing civil unrest and pulling hard drives from data centers.

You could make a congruent complaint that it also had no mention of Batman.

To be clear, I’m not saying that the use cases and risks Reddit cares about are off-topic to any discussion of full-disk encryption. They matter.

Rather, it’s that they’re not relevant to the specific point I am making: Even in the simplest use case, far from the annoying details of end user hardware or the whims of nation states, encryption-at-rest is poorly understood by most developers, and should be thought through carefully.

Your threat model is not my threat model, and vice versa.

I never advertised this blog post as a comprehensive and complete guide to the entire subject of encryption-at-rest. If you too felt under-served by this blog post for not addressing the corner cases that really matter to you, I hope this addendum makes it clearer why I didn’t cover them.

Finally, if you feel that there’s an aspect of the encryption-at-rest topic that really warrants further examination, I invite you to blog about it.

If your blog post is interesting enough, I’ll revise this post and link to it here.

https://scottarc.blog/2024/06/02/encryption-at-rest-whose-threat-model-is-it-anyway/

#Cryptography #cybersecurity #encryption #encryptionAtRest #security #symmetricCryptography #technology
