Data Exfiltration Detection

What Data Exfiltration Is and Why It Is the Hardest Phase to Detect

Data exfiltration is the unauthorized transfer of data from within the organization to an external destination controlled by the attacker. MITRE ATT&CK tracks it as the Exfiltration tactic (TA0010), whose techniques include T1041 (Exfiltration Over C2 Channel), T1048 (Exfiltration Over Alternative Protocol), T1052 (Exfiltration Over Physical Medium), and T1567 (Exfiltration Over Web Service).

The detection challenge: Legitimate data also leaves the network. Employees email files to partners, upload documents to cloud storage, push code to GitHub, and stream video. Distinguishing malicious exfiltration from legitimate business use requires understanding normal behavioral baselines — volume, timing, protocol, and destination.

Most ransomware attacks now include data exfiltration before encryption (double extortion), and many intrusions never deploy ransomware — the data itself is the objective (intellectual property theft, espionage, credential harvesting).

Exfiltration Method 1: HTTP/HTTPS POST to External Server

How it works: The attacker sends stolen data in the body of a POST request to a server they control. This is among the most common exfiltration methods because HTTPS encrypts the payload: without TLS inspection, the analyst sees a TLS connection but cannot read the contents.

Detection logic:

Large POST request bodies (multiple MB) from a host that doesn’t normally make large uploads
POST requests to a newly registered domain or an IP address not associated with any known service
Consistent request size across multiple POSTs, which suggests a file split into fixed-size chunks (ordinary web traffic varies in size)
POST requests occurring after hours or from a user who is not normally active at that time

Logs to check:

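Proxy logs: Look for POST requests with large Content-Length headers, especially to destinations that are not known CDNs or cloud services. For HTTPS, the method and headers are visible only if the proxy performs TLS inspection; otherwise rely on bytes sent per connection
Firewall logs: Large outbound TCP connections to a single external IP
Sysmon Event ID 3: Identifies which process opened the connection. Sysmon does not record the HTTP method, so look for a non-browser process making outbound connections on port 443, which is a strong indicator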

SIEM query (SPL):

index=proxy sourcetype=access_combined
method=POST status=200
| eval size_mb = bytes/1048576
| where size_mb > 5
| stats sum(bytes) as total_bytes by src_ip, dest_host, user
| where total_bytes > 100000000
| sort - total_bytes

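Shows source, destination, and user combinations that transferred more than 100 MB in total across POST requests larger than 5 MB each. These are candidates for exfiltration. Note that in access_combined data the bytes field records the response size, so substitute your proxy's request-size field (often named something like bytes_out) if one is available. Uploads split into chunks smaller than 5 MB will not appear here.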
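The detection logic above lists consistent request size across multiple POSTs as an indicator. A minimal Python sketch of that check follows, assuming proxy records have already been parsed into dictionaries. The key names (src_ip, dest_host, method, request_bytes) and the thresholds are illustrative assumptions, not fields from any particular product.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_uniform_uploads(records, min_requests=5, max_cv=0.05,
                         min_total_bytes=50_000_000):
    """Flag (src_ip, dest_host) pairs whose POST sizes are nearly identical.

    records: iterable of dicts with the hypothetical keys
             'src_ip', 'dest_host', 'method', 'request_bytes'.
    max_cv:  maximum coefficient of variation (stdev / mean) to count as uniform.
    """
    sizes = defaultdict(list)
    for rec in records:
        if rec["method"] == "POST":
            sizes[(rec["src_ip"], rec["dest_host"])].append(rec["request_bytes"])

    findings = []
    for (src_ip, dest_host), values in sizes.items():
        total = sum(values)
        # Ignore pairs with too few requests or too little data to matter.
        if len(values) < min_requests or total < min_total_bytes:
            continue
        avg = mean(values)
        cv = pstdev(values) / avg if avg else 0.0
        if cv <= max_cv:
            findings.append({
                "src_ip": src_ip,
                "dest_host": dest_host,
                "requests": len(values),
                "total_bytes": total,
                "cv": round(cv, 4),
            })
    # Largest transfers first, matching the sort order of the SPL query.
    return sorted(findings, key=lambda f: f["total_bytes"], reverse=True)
```

Because it looks at the spread of sizes rather than the size of any single request, this check can surface uploads that were deliberately split into small chunks.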

Exfiltration Method 2: DNS Tunneling

How it works: The attacker encodes stolen data into DNS queries — typically as subdomains of a domain they control. Since DNS is almost never blocked at firewalls, this method can bypass network controls entirely. The attacker’s DNS server receives the query, decodes the subdomain to extract the data, and sends back a DNS response (possibly containing further commands).
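During an investigation it can help to confirm what a suspicious query name actually carried. The sketch below assumes the data was Base32-encoded and spread across the labels to the left of the attacker's domain, which is one common scheme but not the only one. The function name and the example domain are hypothetical.

```python
import base64


def decode_tunnel_labels(query_name, parent_domain):
    """Try to Base32-decode the labels to the left of parent_domain.

    Returns the decoded bytes, or None if the name does not belong to
    parent_domain or the labels are not valid Base32.
    """
    name = query_name.rstrip(".").lower()
    suffix = "." + parent_domain.rstrip(".").lower()
    if not name.endswith(suffix):
        return None
    # Join the data labels back into one encoded string.
    encoded = name[: -len(suffix)].replace(".", "").upper()
    if not encoded:
        return None
    # Tunneling tools usually strip '=' padding; restore it to a multiple of 8.
    encoded += "=" * (-len(encoded) % 8)
    try:
        return base64.b32decode(encoded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
```

A None result does not clear the query: the tool may use hex, a custom alphabet, or encryption before encoding.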

Detection logic:

High TXT query volume. TXT records carry large free-form payloads and are a common carrier for DNS tunneling, although some tools use NULL or CNAME records instead.
Unusually long subdomains. A typical subdomain is under 20 characters. DNS tunneling subdomains are often 30-50+ characters of random-looking text, up to the 63-character limit per label. Because hostnames are case-insensitive and allow a limited character set, the data is usually Base32 or hex encoded rather than standard Base64.
High query rate from a single host. A single machine making hundreds of DNS queries per minute to the same domain is suspicious.
Aberrant record types. Any machine making many TXT, NULL, CNAME, or ANY queries is worth investigating, because normal clients mostly make A and AAAA queries (plus PTR and SRV lookups in Active Directory environments).
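The length and record-type indicators above can be combined with a randomness check on the first label. The following Python sketch uses Shannon entropy as that check. The thresholds (30 characters, 3.5 bits per character, a 16-character minimum before entropy is considered) are assumptions to tune against your own DNS baseline.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(text).values())


def score_query(query_name, query_type, max_label_len=30, min_entropy=3.5):
    """Return the list of tunneling indicators that a DNS query matches."""
    first_label = query_name.rstrip(".").split(".")[0]
    reasons = []
    if len(first_label) > max_label_len:
        reasons.append("long_label")
    # Entropy is unreliable on very short strings, so require some length.
    if len(first_label) >= 16 and shannon_entropy(first_label) >= min_entropy:
        reasons.append("high_entropy")
    if query_type.upper() in {"TXT", "NULL", "ANY"}:
        reasons.append("unusual_type")
    return reasons
```

A single indicator is weak evidence on its own; scoring per query and then aggregating per host and parent domain keeps the false-positive rate manageable.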

Logs to check:

DNS query logs: Analyze by QueryName length, QueryType distribution, and query rate per client
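Sysmon Event ID 22: Shows DNS queries per process. Browsers and many applications resolve names routinely, so focus on unexpected processes (for example a script interpreter or an unsigned binary) sending many queries to a single domain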

SIEM query (SPL):

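index=dns sourcetype=dns
| eval labels = split(query, ".")
| eval subdomain_len = len(mvindex(labels, 0))
| eval parent_domain = mvjoin(mvindex(labels, -2, -1), ".")
| where subdomain_len > 30
| stats count, dc(query) as unique_queries by src_ip, parent_domain, query_type
| where count > 10
| sort - count

Shows hosts that sent more than 10 queries with a long first label (>30 chars) to the same parent domain, which is a DNS tunneling indicator. Grouping by parent domain matters because each tunneled query name is usually unique, so grouping by the full query would never reach the count threshold. The parent domain here is approximated as the last two labels, which is inaccurate for suffixes such as co.uk.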

Triage decision:

Is the destination domain registered to your organization or to a known service provider? Legitimate CDN and DNS-based services may use long subdomains
Does the same host also show large TXT query volume?
Is the process making these queries a known browser or system service?

Exfiltration Method 3: Cloud API Uploads

How it works: The attacker uses compromised credentials to upload data to cloud storage services (AWS S3, Azure Blob, Google Drive, SharePoint, Dropbox) via API calls. Since these are legitimate services used for business purposes, the traffic blends in.

Detection logic:

API calls from an unexpected source. A user who never accesses cloud storage via API suddenly making PutObject or Upload API calls.
Volume anomaly. A single user uploading more data in one hour than the entire department does in a day.
New API keys. Cloud API calls that use an access key created shortly beforehand (CreateAccessKey followed by data upload) are a strong indicator.
After-hours uploads. Bulk uploads starting at 2 AM from a user who never works late.

Logs to check:

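AWS CloudTrail: PutObject, CopyObject, GetObject (mass reads indicate collection or staging ahead of exfiltration). These are S3 data events, which CloudTrail does not log unless data event logging is enabled
Azure Storage resource logs (StorageBlobLogs): blob write operations such as PutBlob. The Azure Activity Log covers control-plane operations only and does not record blob uploads
GCP Cloud Audit Logs: storage.objects.create, recorded in Data Access audit logs, which must be enabled
SaaS audit logs: Google Workspace audit (Drive downloads), Microsoft 365 audit log (SharePoint file access)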

SIEM query (SPL) — AWS example:

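index=cloudtrail eventName=PutObject
| stats count, sum(additionalEventData.bytesTransferredIn) as total_bytes by sourceIPAddress, userIdentity.arn, eventSource
| where total_bytes > 50000000
| sort - total_bytes

Shows source IP and identity combinations whose PutObject calls total more than 50 MB, which is potential cloud exfiltration. CloudTrail has no top-level bytes field; for S3 data events the upload size is reported in additionalEventData.bytesTransferredIn. Adjust the field name if your ingestion pipeline flattens it differently.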

Triage decision:

Is the API call from a production automation account or a user account? Bulk uploads from automation are expected; bulk uploads from a user account are not.
Was the access key recently created? Check CloudTrail for CreateAccessKey events from the same user in the past 24 hours.
Is the bucket/container public or private? Upload to a public bucket is especially concerning.
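The access-key check above (CreateAccessKey followed by uploads within 24 hours) can be sketched as a small correlation over exported events. This example assumes each event has been flattened into a dictionary; the keys accessKeyId and createdAccessKeyId are hypothetical names for the signing key and the key returned by CreateAccessKey, and would need to be mapped from your own export format.

```python
from datetime import datetime, timedelta


def parse_time(value):
    """Parse a CloudTrail-style UTC timestamp such as 2024-05-01T02:13:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def uploads_with_new_keys(events, window=timedelta(hours=24)):
    """Return PutObject events signed with a key created within `window`.

    events: list of dicts with 'eventTime' and 'eventName', plus the
            hypothetical flattened keys 'accessKeyId' and 'createdAccessKeyId'.
    """
    # First pass: remember when each access key was created.
    created = {}
    for ev in events:
        if ev["eventName"] == "CreateAccessKey":
            created[ev["createdAccessKeyId"]] = parse_time(ev["eventTime"])

    # Second pass: find uploads that used one of those keys soon afterwards.
    hits = []
    for ev in events:
        if ev["eventName"] != "PutObject":
            continue
        key_created = created.get(ev.get("accessKeyId"))
        if key_created is None:
            continue
        age = parse_time(ev["eventTime"]) - key_created
        if timedelta(0) <= age <= window:
            hits.append((ev["accessKeyId"], ev["eventTime"], age))
    return hits
```

Keys created before the start of the exported time range will not be matched, so pull at least one extra day of CreateAccessKey events ahead of the upload window.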

Exfiltration Method 4: Email Attachments

How it works: The attacker sends data to an external email address via attachment. This is a common exfiltration method because email is expected and rarely blocked outright.

Detection logic:

Large outbound attachments. Emails with attachments > 10MB sent to external recipients.
New external recipients. A user sending data to an external domain they have never emailed before.
After-hours email sends. Bulk email activity at unusual times.
Encrypted archives. DLP cannot inspect the contents of password-protected zip files, so an encrypted attachment sent to an external recipient deserves review.

Logs to check:

Email gateway logs: Attachment size, recipient domain, sender-user, timestamp
DLP alerts: Data classification match, sensitive content detection
Proxy logs: User accessing webmail services (Gmail, Outlook.com) from corporate device — potential unauthorized data transfer

SIEM query (SPL):

index=email sender_domain=yourcompany.com recipient_domain!=yourcompany.com
attachment_size > 10000000
| stats count, sum(attachment_size) as total_bytes by sender, recipient_domain
| sort - total_bytes

Shows, per sender and external recipient domain, the count and total size of emails carrying attachments larger than 10 MB.
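The "new external recipients" indicator needs history that a single search window does not provide. A minimal sketch, assuming mail gateway records have been reduced to (sender, recipient address, attachment bytes) tuples; the tuple layout, function name, and 10 MB default are illustrative assumptions.

```python
def new_external_recipients(history, recent, internal_domains,
                            min_bytes=10_000_000):
    """Return large external sends to domains the sender has not emailed before.

    history, recent: iterables of (sender, recipient_address, attachment_bytes).
    internal_domains: domains that count as internal, e.g. {"yourcompany.com"}.
    """
    internal = {d.lower() for d in internal_domains}

    def domain_of(address):
        return address.rsplit("@", 1)[-1].lower()

    # Build the set of (sender, domain) pairs seen during the baseline period.
    seen = {(sender.lower(), domain_of(rcpt)) for sender, rcpt, _ in history}

    alerts = []
    for sender, rcpt, size in recent:
        domain = domain_of(rcpt)
        if domain in internal or size < min_bytes:
            continue
        if (sender.lower(), domain) not in seen:
            alerts.append({"sender": sender, "recipient_domain": domain,
                           "attachment_bytes": size})
    return alerts
```

The baseline period should be long enough to cover infrequent but legitimate contacts, such as quarterly reporting to an auditor.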

Exfiltration Method 5: SSH/SFTP/SCP

How it works: The attacker uses SSH or related protocols (SFTP, SCP) to transfer data to an external server. Since SSH is commonly used by IT teams for legitimate administration, detecting malicious use requires contextual analysis.

Detection logic:

SSH from a server to the internet. SSH outbound from a server that should never initiate SSH connections to external hosts.
Data volume via SSH. SFTP transfers of unusually large files.
Long-lived SSH sessions. SSH sessions from internal hosts to external IPs lasting hours.
SSH to known-bad IPs. Destination IP associated with threat intelligence feeds.

Logs to check:

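Sysmon Event ID 3: Process making network connections, such as ssh.exe, scp.exe, sftp.exe, or third-party clients like plink.exe, pscp.exe, and WinSCP.exe, connecting to an external IP (sshd.exe is the server daemon and normally accepts inbound connections rather than initiating outbound ones)
Firewall logs: Outbound SSH (port 22) traffic to external IPs. Attackers may run SSH on other ports, so protocol-aware inspection helps where available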
Windows Event 5156: Connection permitted by Windows Filtering Platform
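The SSH indicators above can be applied to flow records with a few lines of Python. This sketch assumes flows are dictionaries with the hypothetical keys src_ip, dest_ip, dest_port, duration_s, and bytes_out, and treats only RFC 1918 ranges as internal; extend INTERNAL_NETS for your own address space.

```python
import ipaddress

# Assumption: only RFC 1918 space is internal. Add your own ranges as needed.
INTERNAL_NETS = [ipaddress.ip_network(n)
                 for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_internal(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def suspicious_ssh_flows(flows, allowed_sources=(), max_seconds=3600,
                         max_bytes_out=100_000_000):
    """Return outbound SSH flows that match at least one indicator."""
    findings = []
    for flow in flows:
        if flow["dest_port"] != 22:
            continue
        # Only internal-to-external flows are of interest here.
        if not is_internal(flow["src_ip"]) or is_internal(flow["dest_ip"]):
            continue
        reasons = []
        if flow["src_ip"] not in allowed_sources:
            reasons.append("unapproved_source")
        if flow["duration_s"] > max_seconds:
            reasons.append("long_lived")
        if flow["bytes_out"] > max_bytes_out:
            reasons.append("large_transfer")
        if reasons:
            findings.append((flow["src_ip"], flow["dest_ip"], reasons))
    return findings
```

The allowed_sources list is where the contextual knowledge lives: jump hosts and deployment servers that legitimately connect outward belong there, and everything else is flagged.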

Exfiltration Method 6: ICMP and Other Protocol Tunneling

How it works: The attacker hides data in protocol fields that are not normally inspected, such as ICMP echo request payloads, HTTP headers, or custom protocols. Tools like pingtunnel, ptunnel, and icmpsh implement ICMP tunneling; iodine does the equivalent over DNS.

Detection logic:

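Large ICMP packets. Default ping payloads are 32 bytes on Windows and 56 bytes on Linux. Tunneling tools often fill payloads toward 1472 bytes, the largest that fits a 1500-byte MTU without fragmentation, or exceed it and fragment.
High ICMP traffic volume. A host sending hundreds of ping packets to the same external IP.
Unusual ICMP payload content. Normal echo payloads follow a fixed pattern and the reply mirrors the request. High-entropy payloads, payloads that change with every packet, or replies that differ from their requests suggest tunneling; packet capture or an IDS is needed to see this.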

Logs to check:

Firewall logs: ICMP traffic with unusually large packet sizes
Zeek conn.log: ICMP protocol connections — high count from a single host
NetFlow/IPFIX: ICMP traffic volume analysis
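Where packet capture is available, the payload checks described above reduce to two comparisons per echo exchange. A minimal sketch, assuming the request and reply payloads have already been extracted as bytes; the 64-byte ceiling is an assumption based on common default ping sizes.

```python
def check_icmp_echo(request_payload, reply_payload=None, max_normal_len=64):
    """Return the tunneling indicators matched by one ICMP echo exchange.

    request_payload: bytes carried in the echo request.
    reply_payload:   bytes carried in the echo reply, or None if unseen.
    """
    reasons = []
    if len(request_payload) > max_normal_len:
        reasons.append("oversized")
    # A genuine echo reply returns the request payload unchanged.
    if reply_payload is not None and reply_payload != request_payload:
        reasons.append("reply_mismatch")
    return reasons
```

Some monitoring tools legitimately send large pings for MTU testing, so an oversized payload alone is a prompt to check the source rather than proof of tunneling.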

Exfiltration Detection — Triaging the Finding

When you detect a potential exfiltration event, follow this decision tree:

Potential exfiltration detected
    │
    ├─ What type of data is at risk?
    │   ├─ Regulated (PII, PHI, PCI) → Mandatory breach notification check. Escalate to legal.
    │   ├─ Proprietary (source code, trade secrets) → IP theft. Involve executive leadership.
    │   └─ Operational (credentials, configs) → Account compromise. Reset affected credentials.
    │
    ├─ What is the volume?
    │   ├─ > 1GB → Significant exfiltration. Assume data lost. Begin breach notification process.
    │   ├─ 100MB-1GB → Moderate. Deep investigation required. Check historical baseline.
    │   └─ < 100MB → Investigate further. May be test exfiltration or false positive.
    │
    ├─ What is the timing?
    │   ├─ After hours/weekend → Higher confidence. Off-hours transfers are harder to explain as normal business activity.
    │   └─ Business hours → Check if the user was actually working. Phish-compromised accounts may exfiltrate during normal hours to blend in.
    │
    └─ Containment actions:
        ├─ Block the destination IP/domain at the firewall
        ├─ Isolate the source host
        ├─ Disable the compromised account
        └─ Begin forensic collection on the source system
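The decision tree above can be encoded so that every analyst applies the same thresholds. This is a sketch of one possible encoding, not a replacement for judgment; the function name, category keys, and return structure are illustrative assumptions, while the thresholds and actions are taken from the tree.

```python
GB = 1_000_000_000
MB = 1_000_000

ESCALATION = {
    "regulated": "Check mandatory breach notification; escalate to legal",
    "proprietary": "Treat as IP theft; involve executive leadership",
    "operational": "Treat as account compromise; reset affected credentials",
}

CONTAINMENT = [
    "Block the destination IP/domain at the firewall",
    "Isolate the source host",
    "Disable the compromised account",
    "Begin forensic collection on the source system",
]


def triage(data_type, bytes_transferred, off_hours):
    """Map a finding onto the decision tree's branches."""
    if data_type not in ESCALATION:
        raise ValueError(f"unknown data type: {data_type!r}")

    if bytes_transferred > GB:
        volume = "significant: assume data lost, begin breach notification"
    elif bytes_transferred >= 100 * MB:
        volume = "moderate: deep investigation, check historical baseline"
    else:
        volume = "low: investigate further, possible test or false positive"

    timing = ("higher confidence" if off_hours
              else "verify the user was actually working")

    return {
        "escalation": ESCALATION[data_type],
        "volume": volume,
        "timing": timing,
        "containment": list(CONTAINMENT),
    }
```

Volume alone does not set severity: a few kilobytes of credentials falls in the lowest volume branch but still triggers the operational escalation path.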

Prevention — Controls That Stop Exfiltration

Control	What It Blocks	Effectiveness
Egress filtering	Blocks outbound traffic to unauthorized ports/protocols. Allow SSH (22) and DNS (53) only to authorized servers.	High for SSH exfiltration. For DNS, it forces queries through resolvers you can log and filter, but tunneling through the resolver remains possible
DNS sinkhole	Blocks resolution of known-malicious domains. Prevents exfiltration to C2 domains.	Medium: only works against known-bad domains
DLP (Data Loss Prevention)	Scans outbound traffic for sensitive data patterns (credit cards, SSN, source code).	Medium: bypassed by encryption, compression, and encoding
CASB (Cloud Access Security Broker)	Monitors cloud API calls. Alerts on abnormal cloud access patterns.	High for cloud exfiltration specifically
UEBA (User and Entity Behavior Analytics)	Baselines normal behavior and flags anomalies such as unusual data volume, off-hours access, and new destinations.	High: catches novel exfiltration methods
Network baselining	Know what normal network traffic looks like so anomalies stand out.	Foundational: required for all other controls

Related

Kill Chain: kill chain concepts
MITRE ATT&CK for Triage: using MITRE ATT&CK during triage
Insider Threat: detection and response for T1078 techniques
Cobalt Strike, Detection and Beacon Analysis: detection and response for T1055, T1572, T1071 techniques
Active Directory Compromise Response: detection and response for T1558 techniques
