The Vectors of Database Encryption
One of the most requested features by Crunchy Data customers using modern enterprise database environments is some form of data encryption. However, nailing down exactly what someone means when they say "We need our data encrypted" is often a challenge due to the actual requirements not being fully clarified or even understood. So, before anyone tries to implement database encryption it is critically important to understand what needs to be encrypted and what benefit is actually gained by the methods that are employed. This blog post is not going to discuss any deep technical implementations of encryption. Instead, let's discuss what vectors of attack any given encryption method will mitigate since that will greatly influence which method is effective before you even reach any sort of development or deployment phases.
The application of encryption to a database environment can be broken down into three different methods:
- Data-At-Rest
- Data-In-Transit
- Data-In-Use
Data-At-Rest
Data-At-Rest is probably one of the most talked about and requested methods of encryption when talking about databases, so let's talk about that one first. What vectors of attack is this method effective for?
- Attack vector of concern is when data is not in use
- Data must remain encrypted at all times while not in use
- ALL data must be encrypted
- Physical access to hardware is a perceived threat
When is this method of encryption ineffective?
- Attack vector of concern is a fully compromised host
- Attack vector of concern is not physical access
- Attack vector of concern is data transmission
The solutions to Data-At-Rest encryption can shed some light on why it is either effective or ineffective based on the above statements. The most common solution to this is full disk encryption, which is completely independent of any RDBMS or application in use. This can be done either at the hardware or software level and the client accessing that data typically never knows it was encrypted in the first place and rarely has to do anything on their end to either encrypt or decrypt the data. This is also why a fully compromised host, be it the server where the data is stored or a client accessing that data, completely compromises all the protection that many people want that encryption for in the first place.
Another common method to implement Data-At-Rest is Transparent Data Encryption (TDE). Similar to full disk encryption, this handles it at either the filesystem, hardware, or database level and, again, the client is completely unaware of encryption being in use. This has similar benefits and issues as full disk encryption mentioned above, however, TDE at the database level can provide some additional protections where general filesystem level encryption cannot. Depending on the database level TDE in play, the data may only be available directly through the database but not from the system level.
As of version 14, the community release of PostgreSQL does not have TDE built in, but it is currently under development for a future release. TDE is currently available through Crunchy Hardened Postgres.
If your main concern is someone walking into your office or data center and walking out with your hard drives or servers, this method can be effective against that. However, there are additional issues with Data-At-Rest encryption, and solutions to those issues, so lets move on to the discussing the other two methods mentioned above.
Data-In-Transit
One of the easiest methods of encryption to implement, but unfortunately one that is not brought up as often as it should be during the discussion of database encryption, is securing the data while it is being transferred to or away from the database. When is this method effective?
- Attack vector of concern is data visibility during transit
- Attack vector of concern is local network compromise
- Attack vector of concern is data transmission over the Internet
When is this method ineffective?
- Attack vector of concern is a fully compromised host
- Attack vector of concern is physical access
- Attack vector of concern is data visibility in storage
Note these vectors are ineffective when Data-In-Transit is the only encryption method put into place. When combined with Data-At-Rest or Data-In-Use solutions, the physical access and data visibility in storage concerns can be addressed.
The most common solution to Data-In-Transit is securing your network traffic with TLS certificate management. And thankfully almost all modern database servers, including PostgreSQL, and clients have methods for implementing that certificate management.
Data-in-Use
This final method is frequently not the one mentioned often, but is actually the most effective method for securing your data in most situations.
- Attack vector of concern is data visibility inside the database
- Access to unencrypted data is privilege based
- Application controls the encryption/decryption process
- Only specific items need encryption (Ex. per-column)
- Can possibly address both Data-At-Rest and Data-In-Transit concerns
It can still be ineffective when:
- Attack vector of concern is a fully compromised client host
- Attack vector of concern is physical access to the application server
Data-In-Use encryption is best handled at the application layer because then the private key is never, at any point, anywhere on the database system. The most common method for doing this is some sort of vaulted credentials system where the application requests access to the decryption key at the time it needs the data. The data is then decrypted on the application server, or some server between the database and client.
This can completely mitigate one of the biggest vulnerabilities with Data-At-Rest encryption since the very nature of that encryption method requires that either the private key or password exist somewhere on the database server to allow the transparent decryption of that data. While in-memory attacks are rare and difficult, they are most certainly still possible. So if that vector of attack is a concern, it can be completely eliminated if the data is never decrypted on the database server itself. And for that reason, it also solves many Data-At-Rest concerns. And while it technically can solve many Data-In-Transit concerns as far as the data itself, it's still best to implement some sort of end-to-end TLS solution, especially if that data is traveling across the Internet.
The main areas of concern left for Data-In-Use encryption are compromise of the client or application server. However, these are also still concerns with Data-At-Rest and Data-In-Transit as well. So solving that problem is typically outside the scope of solving the actual process of encryption itself.
While Data-In-Use solutions can take more planning and development time to implement, if the security of your data is paramount, this is by far one of the most effective means of actually keeping your data secure.
Backups
And lastly, one of the other places that database encryption can often be overlooked is the backups. All three above methods need to be addressed for your backups as well.
- Data-At-Rest
- Filesystem backups could backup unencrypted versions of files
- Encrypt the backups independent of the Data-At-Rest system
- pgBackRest supports S3/Azure/GCS client-side encryption
- Data-In-Transit
- Use ssh to transmit both backups and archived WAL files
- Data-In-Use
- Filesystem backups of the database server should generally not compromise this encryption method since the data itself is never decrypted on the database server.
- Logical backups could potentially backup unencrypted data if they are using the decryption system to dump out unencrypted versions of the data.
Conclusion
There are many different solutions out there for actually implementing these different methods of encryption. Crunchy Data products on Cloud, Kubernetes, or VMs all offer encryption at-rest and in-transit. We also offer Crunchy Hardened Postgres with Transparent Data Encryption (TDE).
We hope this overview of encryption methods for your database helps you when planning out which of those solutions works best for your environment and actually provides the security you are looking to achieve.
Related Articles
- Postgres Tuning & Performance for Analytics Data
19 min read
- Running an Async Web Query Queue with Procedures and pg_cron
6 min read
- Name Collision of the Year: Vector
9 min read
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read