SIEM Log Management
A practical guide to SIEM log management for SOC analysts — syslog setup, Windows Event Forwarding, log rotation, parsing, normalization, correlation rules, and retention strategies.
What SIEM Log Management Covers and Why It Matters
- A SIEM (Security Information and Event Management) system is only as useful as the logs it ingests — garbage in, garbage out applies more to SIEM than almost any other security tool.
- Log management encompasses the full pipeline: generation, collection, transport, parsing, normalization, correlation, storage, and retirement.
- The three most common transport mechanisms are syslog (for network devices, Linux hosts, and appliances), Windows Event Forwarding (WEF, for Windows endpoints), and API pulls (for cloud services like AWS CloudTrail, Azure AD, and SaaS platforms).
- MITRE ATT&CK lists log collection and analysis as a detection measure across many techniques; consistent logging is the foundation for detecting T1562.001 (Impair Defenses: Disable or Modify Tools).
Syslog — The Universal Log Transport
Syslog Formats
Syslog comes in two major RFC standards. Most devices support one or the other:
| RFC | Name | Header Format | Example |
|---|---|---|---|
| RFC 3164 | BSD syslog | <PRI>TIMESTAMP HOSTNAME MSG | <134>Oct 11 22:14:15 myhost Failed password for root |
| RFC 5424 | Structured syslog | <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD-ELEMENTS MSG | <134>1 2026-05-23T19:00:00Z myhost sshd 1234 ID47 [example@0 severity="error"] Failed password |
Key difference: RFC 5424 adds structured data elements (SD-ELEMENTS in square brackets), which allows key-value pairs without relying on regex parsing. RFC 3164 is simpler but harder to parse reliably.
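To make the difference concrete, here is a minimal Python sketch that tells the two formats apart and pulls out the header fields. The regular expressions and the `parse_syslog` name are illustrative assumptions, not a complete RFC parser; in particular, escaped `]` or `"` characters inside structured data are not handled.

```python
import re

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD [MSG]
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)(?: (?P<msg>.*))?$"
)
# RFC 3164: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<msg>.*)$"
)
# key="value" pairs inside a structured-data element
SD_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_syslog(line: str) -> dict:
    """Return the header fields plus a 'format' key, or raise ValueError."""
    match = RFC5424.match(line)
    if match:
        fields = match.groupdict()
        fields["format"] = "rfc5424"
        # Structured data yields key-value pairs without message regexes.
        fields["sd_params"] = dict(SD_PARAM.findall(fields["sd"]))
        return fields
    match = RFC3164.match(line)
    if match:
        fields = match.groupdict()
        fields["format"] = "rfc3164"
        return fields
    raise ValueError("not a recognizable syslog line")
```

With the RFC 5424 example from the table, `sd_params` comes back as a ready-made dictionary, while the RFC 3164 line leaves everything after the hostname as free text that still needs per-source parsing.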
Syslog Facilities and Severities
Syslog messages include a facility code (what generated it) and a severity level:
| Severity | Code | Meaning |
|---|---|---|
| Emergency | 0 | System is unusable |
| Alert | 1 | Immediate action required |
| Critical | 2 | Critical condition |
| Error | 3 | Error condition |
| Warning | 4 | Warning condition |
| Notice | 5 | Normal but significant |
| Informational | 6 | Informational message |
| Debug | 7 | Debug-level message |
Priority formula: PRI = (facility * 8) + severity. For example, facility 4 (auth) with severity 3 (error) gives PRI = 4 * 8 + 3 = 35.
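The priority arithmetic can be sketched in a few lines of Python. The function names `encode_pri` and `decode_pri` are hypothetical helpers for illustration; the ranges (facility 0 to 23, severity 0 to 7) follow the syslog RFCs.

```python
from typing import Tuple

SEVERITY_NAMES = (
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
)


def encode_pri(facility: int, severity: int) -> int:
    """PRI = facility * 8 + severity."""
    if not 0 <= facility <= 23:
        raise ValueError(f"facility out of range: {facility}")
    if not 0 <= severity <= 7:
        raise ValueError(f"severity out of range: {severity}")
    return facility * 8 + severity


def decode_pri(pri: int) -> Tuple[int, int, str]:
    """Split a PRI value into (facility, severity, severity name)."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITY_NAMES[severity]
```

Decoding works in reverse: `decode_pri(134)` splits the PRI from the format examples into facility 16 (local0) and severity 6 (Informational).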
Configuring rsyslog Forwarding
# On the log source — /etc/rsyslog.conf or /etc/rsyslog.d/forward.conf
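# Forward all auth messages to the SIEM over UDP (a single @ means UDP)
auth.* @siem.internal.example.com:514
# Alternative: forward over TCP (@@ means TCP). Port 6514 is the usual syslog-over-TLS port,
# but @@ alone does not encrypt; TLS also needs the stream driver settings configured.
# Use either the UDP rule or the TCP rule, not both, to avoid duplicate events.
auth.* @@siem.internal.example.com:6514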
# Forward specific facility/severity combos
*.info;mail.none;authpriv.none;cron.none @siem.internal.example.com:514
# Template for RFC 5424 format
$template ForwardFormat,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% [example@1 logsource=\"%HOSTNAME%\" facility=\"%syslogfacility-text%\"] %msg%\n"
*.* @@siem.internal.example.com:6514;ForwardFormat
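Before relying on the forwarding rules above, it can help to confirm that the UDP path to the collector is open. The following is a minimal sketch using Python's standard `logging.handlers.SysLogHandler`; the `send_test_message` helper and the logger name are assumptions for illustration. It bypasses rsyslog entirely and sends a bare `<PRI>message` datagram (auth facility, info severity), so it checks the network path, not the rsyslog rules.

```python
import logging
import logging.handlers
import socket


def send_test_message(host: str, port: int, text: str) -> None:
    """Send one auth-facility, info-severity syslog message over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,
        socktype=socket.SOCK_DGRAM,
    )
    logger = logging.getLogger("siem_forward_test")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test message out of other handlers
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        logger.removeHandler(handler)
        handler.close()
```

Calling `send_test_message("siem.internal.example.com", 514, "forwarding test")` from a log source and then searching the SIEM for the text is a quick end-to-end check. UDP gives no delivery confirmation, so a missing event may mean either a blocked path or a parsing problem on the SIEM side.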
Windows Event Forwarding (WEF)
WEF uses the built-in Windows Remote Management (WinRM) service to forward events from endpoints to a central Windows Event Collector (WEC) server.
WEF Subscription Types
| Type | Description | Use Case |
|---|---|---|
| Source-initiated | Endpoints push events to the collector | Best for domain-joined machines at scale; endpoints are pointed at the collector through Group Policy, so the collector needs no per-host list |
| Collector-initiated | Collector pulls events from endpoints | For workgroups or non-domain environments |
Essential Event IDs to Forward
| Event ID | Log Name | What It Detects |
|---|---|---|
| 4624 | Security | Logon success |
| 4625 | Security | Logon failure |
| 4634 | Security | Logoff |
| 4648 | Security | Logon with explicit credentials (RunAs) |
| 4663 | Security | Object access attempt |
| 4672 | Security | Admin logon (special privileges assigned) |
| 4688 | Security | Process creation |
| 4698 | Security | Scheduled task creation |
| 4719 | Security | Audit policy change |
| 4720 | Security | User account created |
| 4722 | Security | User account enabled |
| 4728 | Security | Security-enabled global group member added |
| 4732 | Security | Security-enabled local group member added |
| 4740 | Security | User account locked out |
| 4776 | Security | Credential validation (NTLM) |
| 4799 | Security | Security-enabled local group membership enumerated |
| 5156 | Security | Windows Filtering Platform connection |
| 5157 | Security | Windows Filtering Platform connection blocked |
| 7045 | System | Service installed |
SPL — Verify WEF Health
index=windows sourcetype=WinEventLog:Security
| stats count, dc(Computer) as Endpoints by date_mday
| eval expected_endpoints = 500
| eval coverage_pct = (Endpoints / expected_endpoints) * 100
| table date_mday, count, Endpoints, coverage_pct
| where coverage_pct < 90
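The query above reports a coverage percentage but not which endpoints are missing. A small Python sketch of the same idea, assuming you can export the expected hostnames from an asset inventory and the reporting hostnames from the SIEM (both inputs and the `wef_coverage` name are hypothetical):

```python
from typing import Dict, Iterable, List, Union


def wef_coverage(
    expected: Iterable[str], reporting: Iterable[str]
) -> Dict[str, Union[float, List[str]]]:
    """Compare the inventory against hosts that actually sent events."""
    # Hostnames are compared case-insensitively.
    exp = {host.strip().lower() for host in expected}
    rep = {host.strip().lower() for host in reporting}
    if not exp:
        raise ValueError("expected inventory is empty")
    return {
        "coverage_pct": round(100 * len(exp & rep) / len(exp), 1),
        "missing": sorted(exp - rep),      # in inventory, not reporting
        "unexpected": sorted(rep - exp),   # reporting, not in inventory
    }
```

The `missing` list is the actionable part: those hosts need their subscription, WinRM service, or network path checked. The `unexpected` list often points at a stale inventory.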
Log Rotation — Protect Against Disk Full and Data Loss
Log rotation prevents disks from filling up and ensures old logs are archived or purged. A failed log rotation can cause a gap in visibility during an incident.
Logrotate Configuration
# /etc/logrotate.d/syslog
/var/log/syslog
/var/log/auth.log
/var/log/kern.log
{
rotate 7
daily
compress
delaycompress
missingok
notifempty
postrotate
/usr/lib/rsyslog/rsyslog-rotate
endscript
}
| Directive | What It Does | Setting |
|---|---|---|
| rotate N | Keep N archives before deleting | 7 (one week of daily) |
| daily / weekly / monthly | Rotation interval | Match retention requirements |
| compress / delaycompress | Compress old logs, delay by one cycle | Usually enabled |
| maxage N | Delete logs older than N days | Hard cutoff, overrides rotate count |
| size N | Rotate when file exceeds N bytes | Use with caution — may rotate too frequently |
| missingok | Don’t error if log file is missing | Always enable for resilience |
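As a rough check that a rotation policy keeps enough local history, the arithmetic behind `rotate`, the interval, and `maxage` can be sketched as follows. This is an approximation under stated assumptions: time-based rotation only (not `size`), a month counted as 30 days, and hypothetical helper names.

```python
from typing import Optional

# Approximate length of each rotation interval in days (monthly is rounded).
INTERVAL_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def archive_coverage_days(
    rotate: int, interval: str, maxage: Optional[int] = None
) -> int:
    """Days of rotated archives kept on disk under time-based rotation."""
    days = rotate * INTERVAL_DAYS[interval]
    if maxage is not None:
        # maxage is a hard cutoff and wins over the rotate count.
        days = min(days, maxage)
    return days


def meets_retention(
    required_days: int, rotate: int, interval: str, maxage: Optional[int] = None
) -> bool:
    """True if the local archives cover the required number of days."""
    return archive_coverage_days(rotate, interval, maxage) >= required_days
```

For the configuration shown earlier (`rotate 7`, `daily`), this gives 7 days of local archives, which is only adequate if the SIEM already holds a copy of the same logs.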
Parsing and Normalization — Making Logs Queryable
Raw logs are unstructured text. Parsing extracts fields (timestamp, source IP, dest IP, user, action), and normalization maps those fields to a common schema.
Parsing Challenges
| Challenge | Example | Solution |
|---|---|---|
| Vendor variations | Cisco ASA vs PAN firewall — different fields, different formats | Vendor-specific parsing rules, then normalize to common |
| Inconsistent timestamps | RFC 3164 (no year), UTC vs local, epoch vs human-readable | TIMESTAMP normalizer in SIEM pipeline |
| Multiline events | Stack traces, Windows events with descriptions | Multiline parser with header-detect |
| Encoding issues | UTF-8, UTF-16, ASCII, EBCDIC from legacy systems | Charset detector or forced UTF-8 conversion |
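The timestamp problem is the one most easily shown in code. Below is a minimal sketch that converts an RFC 3164 timestamp (no year, no timezone) to ISO 8601 UTC. The function name, the year-inference heuristic, and the assumption that month names are in English are all illustrative choices; `now` must be timezone-aware, and the source timezone has to be known from configuration because the message does not carry it.

```python
from datetime import datetime, timedelta, timezone


def rfc3164_to_utc(
    stamp: str, now: datetime, source_tz: timezone = timezone.utc
) -> str:
    """Convert 'Oct 11 22:14:15' to 'YYYY-MM-DDTHH:MM:SSZ' in UTC."""
    now_local = now.astimezone(source_tz)
    # RFC 3164 omits the year, so start from the current year.
    parsed = datetime.strptime(
        f"{now_local.year} {stamp}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=source_tz)
    # A timestamp well in the future most likely belongs to last year
    # (for example a December event processed in January).
    if parsed - now_local > timedelta(days=1):
        parsed = parsed.replace(year=parsed.year - 1)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Doing this once at the collector means every downstream rule can compare timestamps directly, whatever the original source format was.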
Normalization Schema Example (ECS — Elastic Common Schema)
| Raw Field (Windows) | Normalized Field (ECS) | Raw Field (Linux) |
|---|---|---|
| EventID | event.code | N/A (Syslog PRI or specific) |
| Computer | host.name | $hostname |
| IpAddress | source.ip | src_ip (custom) |
| Account Name | user.name | $username |
| Process ID | process.pid | PID |
| CommandLine | process.command_line | CMDLINE (auditd) |
Why normalization matters:
- Correlation rules work across sources: “detect failed logins” works the same whether the source is Windows Security (4625), Linux auth.log, or VPN logs, because they all map to the same normalized fields (in ECS, event.category, event.outcome, and user.name, while event.code keeps the source-specific ID).
- Dashboards work across data sources without per-vendor field references.
- EDR-to-SIEM field mapping works when endpoints and network logs use the same namespace
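The mapping table can be expressed as a small lookup-driven normalizer. This is a minimal sketch, not how any particular SIEM implements it: the Linux raw field names are written without the `$` prefix, unmapped fields are simply dropped, and process IDs are assumed to be decimal.

```python
from typing import Any, Dict

# Raw field name -> ECS field name, per source type (from the table above).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "windows": {
        "EventID": "event.code",
        "Computer": "host.name",
        "IpAddress": "source.ip",
        "Account Name": "user.name",
        "Process ID": "process.pid",
        "CommandLine": "process.command_line",
    },
    "linux": {
        "hostname": "host.name",
        "src_ip": "source.ip",
        "username": "user.name",
        "PID": "process.pid",
        "CMDLINE": "process.command_line",
    },
}


def normalize(event: Dict[str, Any], source: str) -> Dict[str, Any]:
    """Rename raw fields to ECS names; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    normalized: Dict[str, Any] = {}
    for raw_key, value in event.items():
        ecs_key = mapping.get(raw_key)
        if ecs_key is None:
            continue
        # process.pid is numeric in ECS, so cast it.
        normalized[ecs_key] = int(value) if ecs_key == "process.pid" else value
    return normalized
```

After normalization, a Windows event and a Linux event for the same account both expose `user.name`, so a single rule or dashboard filter covers both.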
Correlation Rules — Turning Logs into Alerts
Correlation rules are the core value of a SIEM — they aggregate and analyze logs to find patterns that indicate threats.
Rule Design Patterns
| Pattern | Description | Example |
|---|---|---|
| Threshold | Alert when count exceeds a threshold in a time window | 10+ failed logins in 5 minutes |
| Temporal sequence | Alert when event A happens followed by event B within a timeframe | Process creation (4688) followed by outbound connection (5156) in 60s |
| Geo-anomaly | Alert on out-of-pattern geographical access | User logs in from US, then China within 2 hours |
| Statistical baseline | Alert on deviation from learned behavior | Outbound data volume > 3 standard deviations from baseline |
| Reference set | Alert on match against known-bad indicators | Source IP matches threat intel feed |
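As an illustration of the threshold pattern, here is a minimal sliding-window counter in Python. It is a sketch of the logic, not a SIEM rule engine: the `ThresholdRule` class is hypothetical, timestamps are plain epoch seconds, and events are assumed to arrive in time order per key.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ThresholdRule:
    """Fire when a key has `threshold` events within `window_seconds`."""

    def __init__(self, threshold: int = 10, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, key: str, timestamp: float) -> bool:
        """Record one event (e.g. a failed login) and return True on alert."""
        events = self._seen[key]
        events.append(timestamp)
        # Drop events that have fallen out of the time window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

Keying on the user name catches brute force against one account; keying on the source IP instead catches password spraying from one host. The same structure serves both.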
SPL — Correlation Rule for Impossible Travel
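index=* sourcetype=WinEventLog:Security EventCode=4624
| search Account_Name!="SYSTEM" AND Account_Name!="ANONYMOUS LOGON"
| iplocation Client_IP
| where isnotnull(Country) AND isnotnull(lat)
| sort 0 Account_Name, _time
| streamstats current=f global=f window=1 last(Country) as prev_country, last(City) as prev_city, last(lat) as prev_lat, last(lon) as prev_lon, last(_time) as prev_time by Account_Name
| where isnotnull(prev_country) AND Country != prev_country
| eval travel_time = _time - prev_time
| where travel_time > 0 AND travel_time <= 7200
| eval dlat = (lat - prev_lat) * pi() / 180, dlon = (lon - prev_lon) * pi() / 180
| eval a = pow(sin(dlat / 2), 2) + cos(prev_lat * pi() / 180) * cos(lat * pi() / 180) * pow(sin(dlon / 2), 2)
| eval distance_km = 6371 * 2 * asin(min(1, sqrt(a)))
| eval travel_speed_kmh = distance_km / (travel_time / 3600)
| where travel_speed_kmh > 900
| eval alert = "IMPOSSIBLE TRAVEL: " . Account_Name . " logged in from " . prev_city . ", " . prev_country . " then " . City . ", " . Country . " in " . round(travel_time/60, 1) . " min (" . round(travel_speed_kmh, 0) . " km/h)"
| table _time, Account_Name, Workstation_Name, Client_IP, Country, City, prev_country, prev_city, travel_speed_kmh, alert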
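The speed check in this rule depends on the great-circle distance between two logins. A minimal Python sketch of that calculation, using the haversine formula with a mean Earth radius of 6371 km; the function names and the `(epoch_seconds, lat, lon)` tuple layout are assumptions for illustration.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # min() guards against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_impossible_travel(
    prev: Tuple[float, float, float],
    curr: Tuple[float, float, float],
    max_kmh: float = 900.0,
) -> bool:
    """prev and curr are (epoch_seconds, lat, lon) for two logins."""
    elapsed = curr[0] - prev[0]
    if elapsed <= 0:
        return False  # out-of-order or simultaneous events: no verdict
    distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
    return distance / (elapsed / 3600) > max_kmh
```

The 900 km/h threshold matches the rule above and roughly corresponds to commercial flight speed. IP geolocation is coarse, and VPN or mobile carrier egress points produce false positives, so alerts from this pattern usually need an allowlist.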
Log Retention — Storage vs Compliance vs Investigation
Every log has a cost — storage, indexing, query performance. Retention policies balance investigation needs against operational costs.
| Log Type | Active Retention (Hot) | Archive Retention (Cold) | Typical Reason |
|---|---|---|---|
| Windows Security Events | 90 days | 1 year | Compliance (PCI, SOC 2), forensic investigation |
| Linux auth.log | 90 days | 1 year | Login forensics, breach timeline |
| DNS queries | 30 days | 6 months | C2 detection, DNS tunneling investigation |
| HTTP/Proxy logs | 30 days | 6 months | Malware download investigation |
| Firewall/Netflow | 30 days | 90 days | Threat hunting, lateral movement analysis |
| Cloud audit logs | 90 days | 1 year | Cloud incident response, compliance |
| EDR telemetry | 30 days | 90 days | Process execution, file creation forensics |
| Authentication logs | 90 days | 1 year | Account compromise investigation |
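The retention table translates directly into a lookup that decides which tier an event belongs in. In this sketch the dictionary keys are hypothetical short names for the log types, 6 months is approximated as 180 days, and both limits are assumed to be measured from the event's own timestamp.

```python
from typing import Dict, Tuple

# log type -> (hot days, total days including cold archive)
RETENTION_DAYS: Dict[str, Tuple[int, int]] = {
    "windows_security": (90, 365),
    "linux_auth": (90, 365),
    "dns": (30, 180),
    "http_proxy": (30, 180),
    "firewall_netflow": (30, 90),
    "cloud_audit": (90, 365),
    "edr": (30, 90),
    "authentication": (90, 365),
}


def tier_for(log_type: str, age_days: int) -> str:
    """Return 'hot', 'cold', or 'expired' for an event of the given age."""
    hot_days, cold_days = RETENTION_DAYS[log_type]
    if age_days <= hot_days:
        return "hot"
    if age_days <= cold_days:
        return "cold"
    return "expired"
```

Keeping the policy in one table like this makes it easy to review against compliance requirements and to apply the same values in the SIEM's index lifecycle settings.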
SPL — Monitor SIEM log ingestion volume:
index=_internal source=*license_usage.log* type=Usage
| stats sum(b) as total_bytes by s, idx
| eval total_gb = round(total_bytes / 1073741824, 2)
| rename s as source, idx as index
| table source, index, total_gb
| addcoltotals labelfield=source label="TOTAL"
Common Log Management Mistakes
| Mistake | Why It Hurts | Fix |
|---|---|---|
| Collecting everything with no retention tiering | Storage costs explode, hot indexes become unsearchable | Tier logs: hot (30d), warm (90d), cold (1yr), frozen (delete) |
| No timestamp normalization | Correlation rules break when timestamps are in different timezones or formats | Always convert to UTC at the collector |
| Skipping WEF for workgroup machines | Critical endpoints (kiosks, lab machines) are dark | Use collector-initiated subscriptions for non-domain machines |
| No log rotation on log hosts | Disk fills up, SIEM pipeline breaks on the most important source | Monitor /var/log and %SystemRoot%\System32\winevt\Logs disk usage |
| Ignoring log source health | A non-reporting endpoint looks like a quiet endpoint, not a compromised one | Build a heartbeat dashboard — every source should report at least once per hour |
| Over-indexing verbose fields | Full-text indexing of verbose event data doubles storage | Selective indexing: index key fields, store raw event for re-extraction |
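The heartbeat check from the table can be sketched as a comparison of each source's last-seen time against a silence limit. The `silent_sources` function and its input (a mapping of source name to last event time, for example exported from the SIEM) are assumptions for illustration; the one-hour default mirrors the recommendation above.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def silent_sources(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta = timedelta(hours=1),
) -> List[str]:
    """Return sources whose most recent event is older than max_silence."""
    return sorted(
        source
        for source, seen in last_seen.items()
        if now - seen > max_silence
    )
```

A source that appears in this list is either down, misconfigured, or tampered with. Treating silence as an alert condition removes the ambiguity between a quiet endpoint and a blind spot.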
