Hashing vs. Encryption
Introduction
In an era of constant cyber threats and data breaches, robust application security relies on cryptographic techniques to protect the confidentiality and integrity of data and to meet regulatory requirements. People often mix up the terms "hashing" and "encryption" when talking about cryptography, but they are fundamentally different and serve distinct purposes. Choosing the wrong one can lead to a major security vulnerability.
What is Data Encryption?
Data encryption is a security process that converts readable data (plaintext) into an unreadable form, known as ciphertext, using a specific algorithm and a secret key. This ensures that only parties holding the correct key can decipher the information, protecting sensitive data from unauthorized access both when it is stored and when it is transmitted.
How does it work?
Data encryption works by applying a mathematical algorithm, along with a secret key, to transform the original plaintext into unreadable ciphertext. Decryption reverses the process: the correct key (the same key in symmetric encryption, the matching private key in asymmetric encryption) turns the ciphertext back into the original plaintext. Without the correct key, the ciphertext remains unintelligible and thus inaccessible to unauthorized individuals.
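To make the encrypt/decrypt round trip concrete, here is a minimal sketch using the JDK's built-in AES in GCM mode. The class name `AesGcmSketch`, the method names, and the choice to prepend the IV to the ciphertext are illustrative assumptions, not something this lesson prescribes. Key storage is deliberately left out.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical helper for illustration only.
final class AesGcmSketch {
    private static final int IV_BYTES = 12;   // commonly used IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh random 256-bit AES key.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the text and returns IV || ciphertext (the IV is not secret).
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // a new IV for every message
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] message = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, message, 0, iv.length);
            System.arraycopy(ciphertext, 0, message, iv.length, ciphertext.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(); fails if the key is wrong or the message was altered.
    static String decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] message = encrypt(key, "account=12345");
        System.out.println("Decrypted: " + decrypt(key, message));
    }
}
```

Because a random IV is generated per message, encrypting the same plaintext twice yields different ciphertexts. Decrypting with a different key throws instead of returning garbage, because GCM authenticates the ciphertext.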
Types of Data Encryption
There are two broad categories of data encryption:
- Symmetric Encryption: Symmetric encryption uses the same key for both encryption and decryption processes. It’s quicker and less demanding on resources, ideal for encrypting large data volumes.
- Asymmetric Encryption: Also known as public-key encryption, asymmetric encryption uses a pair of keys: a public key for encryption and a private key for decryption. Anyone can encrypt data with the public key, but only the holder of the private key can decrypt it. This eliminates the need to share a secret key in advance, thereby increasing security, but it is far more demanding on resources than symmetric encryption, so it is typically used for small payloads such as exchanging a symmetric key.
The type of encryption chosen depends on the specific needs and resources of the situation.
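The asymmetric case can be sketched with the JDK's RSA support. The class name `RsaOaepSketch`, the 2048-bit key size, and the OAEP padding choice are assumptions made for this example. It encrypts with the public key and decrypts with the private key.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;

// Hypothetical helper for illustration only.
final class RsaOaepSketch {
    // Standard JDK transformation name: RSA with OAEP padding.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Generates a 2048-bit RSA key pair (public + private key).
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Anyone holding the public key can encrypt.
    static byte[] encrypt(PublicKey publicKey, String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can decrypt.
    static String decrypt(PrivateKey privateKey, byte[] ciphertext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] ciphertext = encrypt(pair.getPublic(), "short secret");
        System.out.println("Ciphertext bytes: " + ciphertext.length);
        System.out.println("Decrypted: " + decrypt(pair.getPrivate(), ciphertext));
    }
}
```

RSA can only encrypt a payload smaller than its key size, which is one reason real systems usually combine both categories: the asymmetric algorithm protects a small symmetric key, and the symmetric algorithm encrypts the bulk data.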
Common Data Encryption Algorithms
Data encryption algorithms convert plaintext into ciphertext to keep data confidential. Each algorithm has use cases where it fits well and cases where it falls short, so before adopting one, consider how it supports or hinders your data protection goals.
- Data Encryption Standard (DES): DES is a symmetric key algorithm that was widely used in the past. Its short key length (56 bits) makes it vulnerable to brute-force attacks on modern hardware, so it is no longer considered secure.
- Rivest-Shamir-Adleman (RSA): RSA is one of the most popular asymmetric (public-key) encryption algorithms. Its security rests on the difficulty of factoring the product of two large prime numbers, and it is commonly used for secure data transmission and key exchange.
- Advanced Encryption Standard (AES): AES is a symmetric key algorithm. It is approved by the U.S. government and is widely used in various applications due to its strong security, performance, and flexibility (with key lengths of 128, 192, or 256 bits). To many, AES is the gold standard in encryption.
Benefits of Data Encryption
- Ensures Data Security: Data encryption is the fundamental mechanism that secures data from unauthorized access, eavesdropping, and cyber-attacks. By transforming plaintext data into an unintelligible ciphertext format, encryption makes it extremely difficult for attackers to gain access to sensitive information, even if they manage to intercept the encrypted data.
- Maintains Integrity of Data: Authenticated encryption modes such as AES-GCM detect unauthorized modification of data during transmission or storage. Any alteration to the encrypted data causes decryption to fail, which exposes corruption or tampering attempts. Encryption without authentication does not provide this guarantee on its own.
- Ensures Compliance with Regulations: Many industries and sectors have strict regulations and compliance requirements mandating the use of encryption for sensitive data, such as personally identifiable information (PII), financial data, and healthcare records. Implementing robust encryption measures helps organizations remain compliant and avoid hefty fines or legal consequences.
- Protects Data in Transit: Encryption is crucial for protecting data as it moves across untrusted networks, such as the internet. Protocols like TLS/SSL and VPNs use encryption to secure data in transit, preventing eavesdropping and ensuring the confidentiality of online transactions, communications, and data transfers.
Challenges in Implementing Data Encryption
Implementing data encryption involves three main challenges. First, complex key management is required to securely generate, store, and rotate cryptographic keys, as improper handling can make the encryption worthless. Second, encryption can create a performance impact, slowing down systems and forcing a trade-off between security and application speed. Finally, organizations face the difficulty of ensuring compliance with diverse and evolving industry and regional data protection regulations.
Hashing
Hashing is a one-way cryptographic process that transforms any input data into a fixed-size string of characters, known as a hash value or digest. Unlike encryption, which is a two-way process designed to be reversed (decrypted), hashing is irreversible. You cannot retrieve the original data from its hash.
The primary purpose of hashing is not to hide data, but to verify its integrity (and, when combined with a key or a digital signature, its authenticity). By comparing the hash of a piece of data at two different points in time, you can tell whether it has been altered in any way.
How does it work?
Hashing works by feeding data of any size, from a single word to a large file, into a mathematical hash function. This function processes the input and produces a fixed-length output. For example, the SHA-256 algorithm always produces a 256-bit hash (64 hexadecimal characters), regardless of whether the input is "hello" or the entire text of a book. Because the output size is fixed, two different inputs can in theory share a hash, but for a secure function finding such a pair is computationally infeasible.
The process is deterministic, meaning the same input will always generate the exact same hash value. However, a tiny change in the input (like changing a single letter) will produce a completely different hash.
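The fixed length, determinism, and sensitivity to small changes described above can be checked with a short sketch using the JDK's `MessageDigest`. The class name `Sha256Sketch` and the helper `sha256Hex` are illustrative names, not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
final class Sha256Sketch {
    // Returns the SHA-256 digest of the input as lowercase hexadecimal.
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Same input twice: identical digests (deterministic).
        System.out.println(sha256Hex("hello"));
        System.out.println(sha256Hex("hello"));
        // One letter changed: an unrelated-looking digest of the same length.
        System.out.println(sha256Hex("hellp"));
    }
}
```

Every line printed is 64 hexadecimal characters long, whatever the size of the input.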
Key Properties of a Good Hash Function
A secure hashing algorithm is built on several key principles:
- One-Way: It is computationally infeasible to reverse the process and derive the original input from its hash. This is the most fundamental difference from encryption.
- Deterministic: The same input data will consistently produce the same hash output. This is essential for verification.
- Fast to Compute: A general-purpose hash function should be able to generate a hash value quickly for any given input. Password hashing is the deliberate exception, where slowness is a feature.
- Avalanche Effect: A small change to the input data (e.g., changing one bit) should result in a completely different hash. This prevents attackers from guessing inputs by looking for similar hashes.
- Collision Resistant: It should be extremely difficult to find two different inputs that produce the same hash value. A "collision" would undermine the integrity-checking capability of the hash.
Common Hashing Algorithms
Just like with encryption, different hashing algorithms have been developed over the years, with varying levels of security.
- MD5 (Message Digest 5): An older algorithm that was once widely used. However, it is now considered broken and insecure because "collisions" can be easily found, making it unsuitable for security purposes.
- SHA (Secure Hash Algorithm) Family: A family of algorithms published by NIST. SHA-1 and SHA-2 were designed by the NSA.
  - SHA-1: More secure than MD5, but practical collisions have been demonstrated, so it is now considered weak and is being phased out.
  - SHA-2 (SHA-256, SHA-512): Currently the industry standard for many applications like data integrity verification and blockchain technology. They are considered secure and reliable for these purposes.
- Password Hashing Functions (bcrypt, scrypt, Argon2): While algorithms like SHA-256 are excellent for data integrity, they are too fast for storing passwords. Specialized, deliberately slow algorithms were created to solve this problem, as we will see next.
Where is Hashing Used?
Hashing is a workhorse in modern computing and is used in many scenarios:
- Secure Password Storage: To store user credentials without ever holding the actual password, preventing a data breach from exposing plain-text passwords.
- Data Integrity Verification: To ensure a file has not been altered during download or transfer. You often see this with software downloads, where the provider publishes a checksum (hash) for you to verify.
- Digital Signatures: Hashing is a key component of digital signatures. A document is hashed, and the hash is then signed with the sender's private key. Anyone holding the matching public key can verify the signature, which proves authenticity and integrity.
- Database Indexing: Hashing is used in data structures like hash tables to allow for fast data lookup and retrieval (a non-security application).
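As an illustration of the digital signature use case, the following sketch relies on the JDK's `Signature` class with the `SHA256withRSA` algorithm, which hashes the document with SHA-256 and signs that hash with an RSA private key. The class name `SignatureSketch`, the method names, and the 2048-bit key size are assumptions for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper for illustration only.
final class SignatureSketch {
    private static final String ALGORITHM = "SHA256withRSA";

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The sender hashes the document and signs the hash with the private key.
    static byte[] sign(PrivateKey privateKey, String document) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(privateKey);
            signer.update(document.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The receiver re-hashes the document and checks it against the signature.
    static boolean verify(PublicKey publicKey, String document, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(publicKey);
            verifier.update(document.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newKeyPair();
        byte[] signature = sign(pair.getPrivate(), "Pay Alice 10 EUR");
        System.out.println("Original verifies: "
                + verify(pair.getPublic(), "Pay Alice 10 EUR", signature));
        System.out.println("Altered verifies:  "
                + verify(pair.getPublic(), "Pay Alice 99 EUR", signature));
    }
}
```

Changing even one character of the document changes its hash, so the signature no longer matches and verification returns false.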
Encryption vs. Hashing: A Quick Summary
| Feature | Encryption | Hashing |
| --- | --- | --- |
| Purpose | Confidentiality (to keep data secret) | Integrity (to verify data hasn't changed) |
| Function Type | Two-way (Encrypt & Decrypt) | One-way (Irreversible) |
| Output | Variable length (related to input) | Fixed length |
| Key | Requires a secret key to reverse | No key is used |
| Primary Use Case | Securing data in transit and at rest | Storing passwords, verifying file integrity |
Now that we understand the fundamental difference and know that hashing is the correct tool for passwords, let's explore how to do it correctly.
Secure Password Storage with bcrypt
We now know that we must hash passwords, not encrypt them. The next question is: how do we hash them securely?
You might think: let's just use a standard hashing algorithm like SHA-256 to hash the password and store the result in the database. It's one-way, so it's secure, right?
Well, not so fast. Lists of the 10 million most common passwords are freely available on the internet. Attackers can calculate the SHA-256 hash for all of them and store the results in a giant lookup table, commonly called a rainbow table. Then they take the hashes from a compromised database, look them up in their table, and instantly find the original passwords of thousands of users.
Simple Hashing is Not Enough
A simple hash is deterministic: the same input always produces the same output. hash("password123") will always result in the same hash value. This is what allows attackers to use pre-computed rainbow tables to crack passwords in seconds. So what can we do?
Salting and Slowing Down
To defeat rainbow tables, we introduce randomness.
- Salting: A salt is a random string of data that is unique to each user. We append this salt to the password before hashing it.
  - User A's password: hash("password123" + "random_salt_A") -> hash_A
  - User B's password: hash("password123" + "random_salt_B") -> hash_B

  Now, even though both users have the same weak password, their stored hashes are completely different. The attacker's rainbow table is useless! The salt is stored in the database alongside the hashed password.
- Slowing Down: Attackers can still try to crack one password at a time (a "brute-force" attack). To make this impractical, we should use a hashing algorithm that is intentionally slow. If it takes a fraction of a second to check one password, trying billions of combinations becomes impossibly expensive for the attacker.
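Both ideas can be sketched with PBKDF2, a salted, iterated key derivation function that ships with the JDK. It is used here only as a stand-in that needs no extra dependency, not as a recommendation over dedicated password hashing functions such as bcrypt, scrypt, or Argon2. The class name `SaltedHashSketch`, the salt size, and the iteration count are illustrative assumptions, not tuned values.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;

// Hypothetical helper for illustration only.
final class SaltedHashSketch {
    private static final int SALT_BYTES = 16;
    private static final int HASH_BITS = 256;

    // Salting: a fresh random salt per user, stored next to the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Slowing down: the iteration count is the work factor;
    // a higher count makes every single guess more expensive.
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, HASH_BITS);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-hashes the candidate with the stored salt and compares the results.
    static boolean matches(char[] candidate, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(candidate, salt, iterations), expected);
    }

    public static void main(String[] args) {
        int iterations = 100_000; // illustrative value, tune for your hardware
        char[] password = "password123".toCharArray();
        byte[] saltA = newSalt();
        byte[] saltB = newSalt();
        byte[] hashA = hash(password, saltA, iterations);
        byte[] hashB = hash(password, saltB, iterations);
        System.out.println("Same password, same hash? " + MessageDigest.isEqual(hashA, hashB));
        System.out.println("Login check for user A:   " + matches(password, saltA, iterations, hashA));
    }
}
```

The salt and iteration count are not secret. They only need to be stored with the hash so that the same computation can be repeated at login.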
The Standard: bcrypt
Fortunately, we don't have to build this ourselves. We use algorithms designed for this purpose. The industry standard is bcrypt.
Why bcrypt is the right tool:
- It's slow by design: It has a configurable "work factor" (or cost) that determines how slow it is. You can increase the cost as computers get faster.
- It includes a salt automatically: You don't have to generate or manage the salt yourself. bcrypt handles it for you.
Practical Implementation in Spring Boot