SIEM Log Management

What SIEM Log Management Covers and Why It Matters

A SIEM (Security Information and Event Management) system is only as useful as the logs it ingests — garbage in, garbage out applies more to SIEM than almost any other security tool.
Log management encompasses the full pipeline: generation, collection, transport, parsing, normalization, correlation, storage, and retirement.
The three most common transport mechanisms are syslog (for network devices, Linux hosts, and appliances), Windows Event Forwarding (WEF, for Windows endpoints), and API pulls (for cloud services like AWS CloudTrail, Azure AD, and SaaS platforms).
MITRE ATT&CK maps log collection and analysis as a detection control spanning multiple techniques — consistent logging is the foundation of T1562.001 (Impair Defenses: Disable or Modify Tools) detection.

Syslog — The Universal Log Transport

Syslog Formats

Syslog has two main message formats, defined in RFC 3164 and RFC 5424. Most devices support one or the other:

RFC	Name	Header Format	Example
RFC 3164	BSD syslog	`<PRI>TIMESTAMP HOSTNAME MSG`	`<134>Oct 11 22:14:15 myhost Failed password for root`
RFC 5424	Structured syslog	`<PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD-ELEMENTS MSG`	`<134>1 2026-05-23T19:00:00Z myhost sshd 1234 ID47 [example@0 severity="error"] Failed password`

Key difference: RFC 5424 adds structured data elements (SD-ELEMENTS in square brackets), which allows key-value pairs without relying on regex parsing. RFC 3164 is simpler but harder to parse reliably.
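
To make the difference concrete, here is a minimal Python sketch that tells the two formats apart and pulls out the header fields. The regular expressions and the `parse_syslog` name are illustrative assumptions, not a complete RFC implementation: they ignore escaped `]` characters inside structured data, and real devices often deviate from both RFCs.

```python
import re

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD MSG
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) "
    r"(?P<hostname>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[.*?\])+) ?(?P<msg>.*)$"
)
# RFC 3164: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<msg>.*)$"
)
# key="value" pairs inside an SD-ELEMENT
SD_PARAM = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_syslog(line: str) -> dict:
    """Return header fields plus a 'format' key; SD params become a dict."""
    match = RFC5424.match(line)
    if match:
        fields = match.groupdict()
        fields["format"] = "rfc5424"
        fields["sd_params"] = dict(SD_PARAM.findall(fields["sd"]))
        return fields
    match = RFC3164.match(line)
    if match:
        return {**match.groupdict(), "format": "rfc3164"}
    # Neither pattern matched: keep the raw line rather than dropping it.
    return {"format": "unknown", "msg": line}
```

With RFC 5424 the `severity` value arrives as a named parameter in `sd_params`, while the RFC 3164 line leaves everything after the hostname as free text that still needs per-vendor parsing.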

Syslog Facilities and Severities

Syslog messages include a facility code (what generated it) and a severity level:

Severity	Code	Meaning
Emergency	0	System is unusable
Alert	1	Immediate action required
Critical	2	Critical condition
Error	3	Error condition
Warning	4	Warning condition
Notice	5	Normal but significant
Informational	6	Informational message
Debug	7	Debug-level message

Priority formula: PRI = (facility * 8) + severity. So facility 4 (auth) with severity 3 (error) gives PRI = (4 * 8) + 3 = 35.
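
The formula is easy to invert, which is handy when reading raw packets. A small sketch follows; the function names are illustrative, and the severity labels follow the table above.

```python
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def encode_pri(facility: int, severity: int) -> int:
    """PRI = (facility * 8) + severity."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def decode_pri(pri: int) -> tuple[int, str]:
    """Split a PRI value back into (facility code, severity name)."""
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]
```

For example, the `<134>` used in the format examples decodes to facility 16 (local0) at informational severity.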

Configuring rsyslog Forwarding

# On the log source — /etc/rsyslog.conf or /etc/rsyslog.d/forward.conf

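# Forward all auth messages to SIEM over UDP (single @)
auth.* @siem.internal.example.com:514
# Or over TCP (double @@). Port 6514 is the conventional syslog-over-TLS port,
# but @@ alone is plain TCP: TLS also requires a configured stream driver (e.g. gtls).
auth.* @@siem.internal.example.com:6514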

# Forward specific facility/severity combos
*.info;mail.none;authpriv.none;cron.none @siem.internal.example.com:514

# Template for RFC 5424 format
$template ForwardFormat,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% [example@1 logsource=\"%HOSTNAME%\" facility=\"%syslogfacility-text%\"] %msg%\n"
*.* @@siem.internal.example.com:6514;ForwardFormat

Windows Event Forwarding (WEF)

WEF uses the built-in Windows Remote Management (WinRM) service to forward events from endpoints to a central Windows Event Collector (WEC) server.

WEF Subscription Types

Type	Description	Use Case
Source-initiated	Endpoints push events to the collector	Best for domain-joined machines at scale: sources are enrolled through Group Policy, so the collector needs no per-host list
Collector-initiated	Collector pulls events from endpoints	For workgroups or non-domain environments

Essential Event IDs to Forward

Event ID	Log Name	What It Detects
4624	Security	Logon success
4625	Security	Logon failure
4634	Security	Logoff
4648	Security	Logon with explicit credentials (RunAs)
4663	Security	Object access attempt
4672	Security	Admin logon (special privileges assigned)
4688	Security	Process creation
4698	Security	Scheduled task creation
4719	Security	Audit policy change
4720	Security	User account created
4722	Security	User account enabled
4728	Security	Security-enabled global group member added
4732	Security	Security-enabled local group member added
4740	Security	User account locked out
4776	Security	Credential validation (NTLM)
4799	Security	Security-enabled local group membership enumerated
5156	Security	Windows Filtering Platform connection permitted
5157	Security	Windows Filtering Platform connection blocked
7045	System	Service installed
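
WEF subscriptions select events with an XML query list containing XPath filters. As a rough sketch, the helper below builds such a query from a mapping of log name to event IDs. The function name and input shape are assumptions for illustration; large ID lists may need to be split across several `Select` elements, so validate the generated query on the collector before relying on it.

```python
from xml.sax.saxutils import escape


def build_query_list(events_by_log: dict[str, list[int]]) -> str:
    """Build a <QueryList> with one <Query> per log (Security, System, ...)."""
    queries = []
    for query_id, (log_name, event_ids) in enumerate(sorted(events_by_log.items())):
        # De-duplicate and sort so the output is stable between runs.
        condition = " or ".join(f"EventID={eid}" for eid in sorted(set(event_ids)))
        path = escape(log_name, {'"': "&quot;"})
        queries.append(
            f'  <Query Id="{query_id}" Path="{path}">\n'
            f'    <Select Path="{path}">*[System[({condition})]]</Select>\n'
            f"  </Query>"
        )
    return "<QueryList>\n" + "\n".join(queries) + "\n</QueryList>"


if __name__ == "__main__":
    print(build_query_list({"Security": [4624, 4625, 4688], "System": [7045]}))
```

Keeping the ID list in code or version control makes it easier to review which events each subscription actually forwards.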

SPL — Verify WEF Health

index=windows sourcetype=WinEventLog:Security
| bin _time span=1d
| stats count, dc(Computer) as Endpoints by _time
| eval expected_endpoints = 500
| eval coverage_pct = round((Endpoints / expected_endpoints) * 100, 1)
| where coverage_pct < 90
| table _time, count, Endpoints, coverage_pct

Log Rotation — Protect Against Disk Full and Data Loss

Log rotation prevents disks from filling up and ensures old logs are archived or purged. A failed log rotation can cause a gap in visibility during an incident.

Logrotate Configuration

# /etc/logrotate.d/syslog
/var/log/syslog
/var/log/auth.log
/var/log/kern.log
{
    rotate 7
    daily
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        /usr/lib/rsyslog/rsyslog-rotate
    endscript
}

Directive	What It Does	Setting
`rotate N`	Keep N archives before deleting	7 (one week of daily)
`daily` / `weekly` / `monthly`	Rotation interval	Match retention requirements
`compress` / `delaycompress`	Compress old logs, delay by one cycle	Usually enabled
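`maxage N`	Delete rotated logs older than N days	Hard cutoff applied at rotation time, regardless of rotate count
`size N`	Rotate when file exceeds N bytes (k, M, G suffixes accepted)	Use with caution: it ignores the daily/weekly interval, so rotation timing becomes unpredictable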
`missingok`	Don’t error if log file is missing	Always enable for resilience
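
A failed rotation usually shows up first as a filling disk. The sketch below is one minimal way to check for that using only the standard library; the 85% threshold and function names are assumptions to adapt, and in practice the result would feed an existing monitoring agent rather than a standalone script.

```python
import shutil


def disk_usage_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_attention(path: str, threshold: float = 0.85) -> bool:
    """True when usage is at or above the (assumed) alert threshold."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    log_dir = "/var/log"  # adjust for the host being checked
    print(f"{log_dir}: {disk_usage_fraction(log_dir):.0%} used, "
          f"alert={needs_attention(log_dir)}")
```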

Parsing and Normalization — Making Logs Queryable

Raw logs are unstructured text. Parsing extracts fields (timestamp, source IP, dest IP, user, action), and normalization maps those fields to a common schema.

Parsing Challenges

Challenge	Example	Solution
Vendor variations	Cisco ASA vs PAN firewall — different fields, different formats	Vendor-specific parsing rules, then normalize to common
Inconsistent timestamps	RFC 3164 (no year), UTC vs local, epoch vs human-readable	TIMESTAMP normalizer in SIEM pipeline
Multiline events	Stack traces, Windows events with descriptions	Multiline parser with header-detect
Encoding issues	UTF-8, UTF-16, ASCII, EBCDIC from legacy systems	Charset detector or forced UTF-8 conversion
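
The timestamp problem is the easiest of these to illustrate. RFC 3164 timestamps carry neither a year nor a timezone, so the collector has to supply both. A minimal sketch, assuming the caller knows the source's year and fixed UTC offset (the function and parameter names are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def rfc3164_to_utc(stamp: str, year: int, utc_offset_hours: float = 0.0) -> str:
    """Convert e.g. 'Oct 11 22:14:15' to an ISO 8601 UTC timestamp.

    `year` and `utc_offset_hours` are supplied by the caller because the
    RFC 3164 header does not contain them.
    """
    # Prepend the assumed year so strptime never has to guess it.
    local = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).isoformat()


if __name__ == "__main__":
    # A host logging in UTC-5 local time
    print(rfc3164_to_utc("Oct 11 22:14:15", year=2025, utc_offset_hours=-5))
```

A production pipeline also has to handle the year rollover (a December event received in January) and daylight saving changes, which a fixed offset does not cover.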

Normalization Schema Example (ECS — Elastic Common Schema)

Raw Field (Windows)	Normalized Field (ECS)	Raw Field (Linux)
`EventID`	`event.code`	N/A (syslog PRI or an application-specific ID)
`Computer`	`host.name`	`$hostname`
`IpAddress`	`source.ip`	`src_ip` (custom)
`Account Name`	`user.name`	`$username`
`Process ID`	`process.pid`	`PID`
`CommandLine`	`process.command_line`	`CMDLINE` (auditd)

Why normalization matters:

Correlation rules that work across sources: “detect failed logins” works the same whether the source is Windows Security (4625), Linux auth.log, or VPN logs, because they all map to the same event.category, event.outcome, and user.name fields (the source-specific ID, such as 4625, stays in event.code)
Dashboards work across data sources without per-vendor field references
EDR-to-SIEM field mapping works when endpoints and network logs use the same namespace
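
As a rough illustration of the first point, the sketch below normalizes a Windows event and a Linux auth event into the same ECS-style fields so that one rule matches both. The Windows keys come from the mapping table above; the Linux keys (`hostname`, `username`, `message`) and the `"Failed password"` match are assumptions standing in for whatever a real parser emits.

```python
WINDOWS_TO_ECS = {
    "EventID": "event.code",
    "Computer": "host.name",
    "IpAddress": "source.ip",
    "Account Name": "user.name",
}
# Hypothetical raw keys for an already-parsed Linux auth.log line.
LINUX_TO_ECS = {
    "hostname": "host.name",
    "src_ip": "source.ip",
    "username": "user.name",
}
FAILED_AUTH = {"event.category": ["authentication"], "event.outcome": "failure"}


def rename_fields(raw: dict, mapping: dict) -> dict:
    """Copy the mapped fields from `raw` under their ECS names."""
    return {ecs: raw[src] for src, ecs in mapping.items() if src in raw}


def normalize_windows(raw: dict) -> dict:
    doc = rename_fields(raw, WINDOWS_TO_ECS)
    if str(raw.get("EventID")) == "4625":
        doc.update(FAILED_AUTH)
    return doc


def normalize_linux_auth(raw: dict) -> dict:
    doc = rename_fields(raw, LINUX_TO_ECS)
    if "Failed password" in raw.get("message", ""):
        doc.update(FAILED_AUTH)
    return doc


def is_failed_login(doc: dict) -> bool:
    """One source-agnostic rule that works on any normalized document."""
    return ("authentication" in doc.get("event.category", [])
            and doc.get("event.outcome") == "failure")
```

The rule in `is_failed_login` never mentions 4625 or sshd; the source-specific knowledge lives in the normalizers.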

Correlation Rules — Turning Logs into Alerts

Correlation rules are the core value of a SIEM — they aggregate and analyze logs to find patterns that indicate threats.

Rule Design Patterns

Pattern	Description	Example
Threshold	Alert when count exceeds a threshold in a time window	10+ failed logins in 5 minutes
Temporal sequence	Alert when event A happens followed by event B within a timeframe	Process creation (4688) followed by outbound connection (5156) in 60s
Geo-anomaly	Alert on out-of-pattern geographical access	User logs in from US, then China within 2 hours
Statistical baseline	Alert on deviation from learned behavior	Outbound data volume > 3 standard deviations from baseline
Reference set	Alert on match against known-bad indicators	Source IP matches threat intel feed
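
The threshold pattern is simple enough to sketch outside a SIEM. The class below is an illustrative in-memory sliding window keyed by an arbitrary string such as a username; it assumes events arrive in timestamp order. A real SIEM implements this inside its rule engine.

```python
from collections import defaultdict, deque


class ThresholdRule:
    """Fire when `threshold` events for one key fall inside `window_seconds`."""

    def __init__(self, threshold: int = 10, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps still in window

    def observe(self, key: str, timestamp: float) -> bool:
        window = self._events[key]
        window.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


if __name__ == "__main__":
    rule = ThresholdRule(threshold=10, window_seconds=300)
    for second in range(0, 100, 10):  # ten failed logins in 90 seconds
        fired = rule.observe("alice", float(second))
    print("alert" if fired else "quiet")
```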

SPL — Correlation Rule for Impossible Travel

index=* sourcetype=WinEventLog:Security EventCode=4624
| search Account_Name!="SYSTEM" AND Account_Name!="ANONYMOUS LOGON"
| iplocation Client_IP
| where isnotnull(lat) AND isnotnull(lon)
| sort 0 Account_Name, _time
| streamstats current=f window=1 last(_time) as prev_time, last(Country) as prev_country, last(City) as prev_city, last(lat) as prev_lat, last(lon) as prev_lon by Account_Name
| where isnotnull(prev_country) AND Country != prev_country
| eval travel_time = _time - prev_time
| where travel_time > 0 AND travel_time <= 7200
| eval dlat = (lat - prev_lat) * pi() / 180, dlon = (lon - prev_lon) * pi() / 180
| eval a = pow(sin(dlat / 2), 2) + cos(prev_lat * pi() / 180) * cos(lat * pi() / 180) * pow(sin(dlon / 2), 2)
| eval distance_km = 2 * 6371 * asin(min(1, sqrt(a)))
| eval travel_speed_kmh = distance_km / (travel_time / 3600)
| where travel_speed_kmh > 900
| eval alert = "IMPOSSIBLE TRAVEL: " . Account_Name . " logged in from " . prev_city . "," . prev_country . " then " . City . "," . Country . " in " . round(travel_time/60, 1) . " min (" . round(travel_speed_kmh, 0) . " km/h)"
| table _time, Account_Name, Workstation_Name, Client_IP, Country, City, prev_country, prev_city, travel_speed_kmh, alert
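
The speed check in the rule above comes down to great-circle distance divided by elapsed time. For testing the logic outside the SIEM, here is a small Python sketch of the same calculation using the haversine formula. The function names and the `(epoch_seconds, lat, lon)` tuple layout are illustrative choices; the 900 km/h threshold matches the rule.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def is_impossible_travel(prev: tuple, curr: tuple,
                         max_speed_kmh: float = 900.0) -> bool:
    """prev and curr are (epoch_seconds, lat, lon) for consecutive logins."""
    elapsed_hours = (curr[0] - prev[0]) / 3600
    if elapsed_hours <= 0:
        return False  # out-of-order or simultaneous events: no speed to compute
    distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
    return distance / elapsed_hours > max_speed_kmh
```

GeoIP data is often imprecise for VPN exits and mobile carriers, so a rule like this usually needs an allowlist for known egress points before it is quiet enough to alert on.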

Log Retention — Storage vs Compliance vs Investigation

Every log has a cost — storage, indexing, query performance. Retention policies balance investigation needs against operational costs.

Log Type	Active Retention (Hot)	Archive Retention (Cold)	Typical Reason
Windows Security Events	90 days	1 year	Compliance (PCI, SOC 2), forensic investigation
Linux auth.log	90 days	1 year	Login forensics, breach timeline
DNS queries	30 days	6 months	C2 detection, DNS tunneling investigation
HTTP/Proxy logs	30 days	6 months	Malware download investigation
Firewall/Netflow	30 days	90 days	Threat hunting, lateral movement analysis
Cloud audit logs	90 days	1 year	Cloud incident response, compliance
EDR telemetry	30 days	90 days	Process execution, file creation forensics
Authentication logs	90 days	1 year	Account compromise investigation
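
A retention table like this maps directly onto a lookup that a housekeeping job can apply. A minimal sketch, using a few rows from the table and assuming 6 months means 180 days and 1 year means 365 days (the keys and function name are illustrative):

```python
# (hot_days, cold_days): cold_days is the total age at which data expires.
RETENTION_DAYS = {
    "windows_security": (90, 365),
    "dns": (30, 180),
    "firewall": (30, 90),
    "edr": (30, 90),
}


def storage_tier(log_type: str, age_days: int) -> str:
    """Return 'hot', 'cold', or 'expired' for a log of the given age."""
    hot_days, cold_days = RETENTION_DAYS[log_type]
    if age_days <= hot_days:
        return "hot"
    if age_days <= cold_days:
        return "cold"
    return "expired"
```

Keeping the policy in one table makes it easy to check against compliance requirements before any data is deleted.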

SPL — Monitor SIEM log ingestion volume:

index=_internal source=*license_usage.log* type=Usage
| stats sum(b) as total_bytes by s, idx
| eval total_gb = round(total_bytes / 1073741824, 2)
| rename s as source, idx as index
| table source, index, total_gb
| addcoltotals labelfield=source label="TOTAL" total_gb

Common Log Management Mistakes

Mistake	Why It Hurts	Fix
Collecting everything with no retention tiering	Storage costs explode, hot indexes become slow to search	Tier logs: hot (30d), warm (90d), cold (1yr), frozen (archive or delete)
No timestamp normalization	Correlation rules break when timestamps are in different timezones or formats	Always convert to UTC at the collector
Skipping WEF for workgroup machines	Critical endpoints (kiosks, lab machines) are dark	Use collector-initiated subscriptions for non-domain machines
No log rotation on log hosts	Disk fills up, SIEM pipeline breaks on the most important source	Configure rotation (logrotate, Windows maximum log size) and monitor /var/log and %SystemRoot%\System32\winevt\Logs disk usage
Ignoring log source health	A non-reporting endpoint looks like a quiet endpoint, not a compromised one	Build a heartbeat dashboard — every source should report at least once per hour
Over-indexing verbose fields	Full-text indexing of verbose event data doubles storage	Selective indexing: index key fields, store raw event for re-extraction
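
The heartbeat check from the table can be prototyped in a few lines. This sketch assumes a mapping of source name to the epoch time it was last seen, which in practice would come from a SIEM query; the one-hour limit follows the table, and the names are illustrative.

```python
def silent_sources(last_seen: dict[str, float], now: float,
                   max_silence_seconds: float = 3600.0) -> list[str]:
    """Return sources whose most recent event is older than the allowed gap."""
    return sorted(
        source for source, seen_at in last_seen.items()
        if now - seen_at > max_silence_seconds
    )


if __name__ == "__main__":
    import time

    now = time.time()
    observed = {"dc01": now - 120, "fw-edge": now - 7200, "kiosk-03": now - 4000}
    print(silent_sources(observed, now))
```

Sources that never reported at all will not appear in `last_seen`, so the list should be compared against an asset inventory as well.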
