SIEM Log Management

A practical guide to SIEM log management for SOC analysts — syslog setup, Windows Event Forwarding, log rotation, parsing, normalization, correlation rules, and retention strategies.

What SIEM Log Management Covers and Why It Matters

  • A SIEM (Security Information and Event Management) system is only as useful as the logs it ingests — garbage in, garbage out applies more to SIEM than almost any other security tool.
  • Log management encompasses the full pipeline: generation, collection, transport, parsing, normalization, correlation, storage, and retirement.
  • The three most common transport mechanisms are syslog (for network devices, Linux hosts, and appliances), Windows Event Forwarding (WEF, for Windows endpoints), and API pulls (for cloud services like AWS CloudTrail, Azure AD, and SaaS platforms).
  • MITRE ATT&CK maps log collection and analysis as a detection control spanning multiple techniques — consistent logging is the foundation of T1562.001 (Impair Defenses: Disable or Modify Tools) detection.

Syslog — The Universal Log Transport

Syslog Formats

Syslog comes in two major RFC standards. Most devices support one or the other:

| RFC | Name | Header Format | Example |
|---|---|---|---|
| RFC 3164 | BSD syslog | <PRI>TIMESTAMP HOSTNAME MSG | <134>Oct 11 22:14:15 myhost Failed password for root |
| RFC 5424 | Structured syslog | <PRI>VERSION TIMESTAMP HOSTNAME APP PROCID MSGID SD-ELEMENTS MSG | <134>1 2026-05-23T19:00:00Z myhost sshd 1234 ID47 [example@0 severity="error"] Failed password |

Key difference: RFC 5424 adds structured data elements (SD-ELEMENTS in square brackets), which allows key-value pairs without relying on regex parsing. RFC 3164 is simpler but harder to parse reliably.
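
The difference between the two formats can be sketched with a small parser. This is a minimal illustration, not a complete RFC implementation: the function name parse_syslog and the regular expressions are assumptions made for this example, and real collectors handle many more edge cases (escaped brackets, missing hostnames, vendor quirks). The header is still split with a regex in both cases; the point is that the RFC 5424 structured data comes out as key-value pairs without a message-specific pattern.

```python
import re

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[.*?\])+)(?: (?P<msg>.*))?$"
)

# RFC 3164: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG (no year, no time zone)
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) (?P<msg>.*)$"
)

# name="value" pairs inside a structured-data element
SD_PARAM = re.compile(r'(\w[\w.-]*)="((?:[^"\\]|\\.)*)"')


def parse_syslog(line):
    """Return a dict of header fields, or None if neither format matches."""
    match = RFC5424.match(line)
    if match:
        event = match.groupdict()
        event["format"] = "rfc5424"
        # Structured data yields key-value pairs directly.
        event["sd_params"] = dict(SD_PARAM.findall(event["sd"]))
        return event
    match = RFC3164.match(line)
    if match:
        event = match.groupdict()
        event["format"] = "rfc3164"
        return event
    return None


modern = parse_syslog(
    '<134>1 2026-05-23T19:00:00Z myhost sshd 1234 ID47 '
    '[example@0 severity="error"] Failed password'
)
legacy = parse_syslog("<134>Oct 11 22:14:15 myhost Failed password for root")
```

With the RFC 3164 line, anything beyond the hostname has to be dug out of the free-text message, which is where per-vendor regexes come in.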

Syslog Facilities and Severities

Syslog messages include a facility code (what generated it) and a severity level:

| Severity | Code | Meaning |
|---|---|---|
| Emergency | 0 | System is unusable |
| Alert | 1 | Immediate action required |
| Critical | 2 | Critical condition |
| Error | 3 | Error condition |
| Warning | 4 | Warning condition |
| Notice | 5 | Normal but significant |
| Informational | 6 | Informational message |
| Debug | 7 | Debug-level message |

Priority formula: PRI = (facility * 8) + severity. For example, facility 4 (auth) with severity 3 (error) gives PRI = (4 * 8) + 3 = 35.
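
The priority arithmetic works in both directions, which is handy when reading raw packets. The helper names encode_pri and decode_pri below are made up for this sketch; the value ranges (facility 0 to 23, severity 0 to 7) follow the syslog RFCs.

```python
def encode_pri(facility, severity):
    """Combine a facility code (0-23) and severity code (0-7) into a PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be between 0 and 23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be between 0 and 7")
    return facility * 8 + severity


def decode_pri(pri):
    """Split a PRI value back into (facility, severity)."""
    if not 0 <= pri <= 191:
        raise ValueError("PRI must be between 0 and 191")
    return divmod(pri, 8)


print(encode_pri(4, 3))   # 35: auth.error
print(decode_pri(134))    # (16, 6): local0.info, the PRI used in the format examples
```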

Configuring rsyslog Forwarding

# On the log source: /etc/rsyslog.conf or /etc/rsyslog.d/forward.conf

# Forward all auth messages to the SIEM. Pick one transport per destination:
# a single @ sends over UDP, a double @@ sends over TCP.
auth.* @siem.internal.example.com:514
# @@ by itself is plain TCP. TLS on port 6514 additionally requires the
# netstream driver settings (gtls driver, CA file, driver mode) to be configured.
auth.* @@siem.internal.example.com:6514

# Forward everything at info or above, except mail, authpriv, and cron
*.info;mail.none;authpriv.none;cron.none @siem.internal.example.com:514

# Template for RFC 5424 format
$template ForwardFormat,"<%PRI%>1 %TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %MSGID% [example@1 logsource=\"%HOSTNAME%\" facility=\"%syslogfacility-text%\"] %msg%\n"
*.* @@siem.internal.example.com:6514;ForwardFormat

Windows Event Forwarding (WEF)

WEF uses the built-in Windows Remote Management (WinRM) service to forward events from endpoints to a central Windows Event Collector (WEC) server.

WEF Subscription Types

| Type | Description | Use Case |
|---|---|---|
| Source-initiated | Endpoints push events to the collector | Best for domain-joined machines at scale; Group Policy points endpoints at the collector, so the collector needs no per-host list |
| Collector-initiated | Collector pulls events from endpoints | For workgroups or non-domain environments |

Essential Event IDs to Forward

| Event ID | Log Name | What It Detects |
|---|---|---|
| 4624 | Security | Logon success |
| 4625 | Security | Logon failure |
| 4634 | Security | Logoff |
| 4648 | Security | Logon with explicit credentials (RunAs) |
| 4663 | Security | Object access attempt |
| 4672 | Security | Admin logon (special privileges assigned) |
| 4688 | Security | Process creation |
| 4698 | Security | Scheduled task creation |
| 4719 | Security | Audit policy change |
| 4720 | Security | User account created |
| 4722 | Security | User account enabled |
| 4728 | Security | Security-enabled global group member added |
| 4732 | Security | Security-enabled local group member added |
| 4740 | Security | User account locked out |
| 4776 | Security | Credential validation (NTLM) |
| 4799 | Security | Security-enabled local group membership enumerated |
| 5156 | Security | Windows Filtering Platform connection allowed |
| 5157 | Security | Windows Filtering Platform connection blocked |
| 7045 | System | Service installed |

SPL — Verify WEF Health

index=windows sourcetype=WinEventLog:Security
| bin _time span=1d
| stats count, dc(Computer) as Endpoints by _time
| eval expected_endpoints = 500
| eval coverage_pct = round((Endpoints / expected_endpoints) * 100, 1)
| where coverage_pct < 90
| table _time, count, Endpoints, coverage_pct

Log Rotation — Protect Against Disk Full and Data Loss

Log rotation prevents disks from filling up and ensures old logs are archived or purged. A failed log rotation can cause a gap in visibility during an incident.

Logrotate Configuration

# /etc/logrotate.d/syslog
/var/log/syslog
/var/log/auth.log
/var/log/kern.log
{
    rotate 7
    daily
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        /usr/lib/rsyslog/rsyslog-rotate
    endscript
}

| Directive | What It Does | Setting |
|---|---|---|
| rotate N | Keep N archives before deleting | 7 (one week of daily) |
| daily / weekly / monthly | Rotation interval | Match retention requirements |
| compress / delaycompress | Compress old logs, delay by one cycle | Usually enabled |
| maxage N | Delete logs older than N days | Hard cutoff, overrides rotate count |
| size N | Rotate when file exceeds N bytes | Use with caution; may rotate too frequently |
| missingok | Don’t error if log file is missing | Always enable for resilience |

Parsing and Normalization — Making Logs Queryable

Raw logs are unstructured text. Parsing extracts fields (timestamp, source IP, dest IP, user, action), and normalization maps those fields to a common schema.

Parsing Challenges

| Challenge | Example | Solution |
|---|---|---|
| Vendor variations | Cisco ASA vs PAN firewall: different fields, different formats | Vendor-specific parsing rules, then normalize to common |
| Inconsistent timestamps | RFC 3164 (no year), UTC vs local, epoch vs human-readable | TIMESTAMP normalizer in SIEM pipeline |
| Multiline events | Stack traces, Windows events with descriptions | Multiline parser with header-detect |
| Encoding issues | UTF-8, UTF-16, ASCII, EBCDIC from legacy systems | Charset detector or forced UTF-8 conversion |
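
The timestamp problem is the easiest of these to illustrate. The sketch below converts a year-less RFC 3164 timestamp to UTC ISO 8601. It assumes the collector knows when the message arrived and what fixed UTC offset the source uses; the function name and the source_utc_offset_hours parameter are hypothetical, and a production pipeline would more likely use named time zones to handle daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def normalize_rfc3164_timestamp(raw, received_at, source_utc_offset_hours=0):
    """Convert e.g. 'Oct 11 22:14:15' to a UTC ISO 8601 string.

    raw: RFC 3164 timestamp (no year, no zone).
    received_at: timezone-aware datetime of when the collector got the message.
    Raises ValueError if the text cannot be a real date.
    """
    source_tz = timezone(timedelta(hours=source_utc_offset_hours))
    local_now = received_at.astimezone(source_tz)

    # Borrow the year from the receive time, since the message carries none.
    parsed = datetime.strptime(
        f"{local_now.year} {raw}", "%Y %b %d %H:%M:%S"
    ).replace(tzinfo=source_tz)

    # A timestamp well in the future means the event was logged late in the
    # previous year and received after the new year started.
    if parsed - local_now > timedelta(days=1):
        parsed = parsed.replace(year=parsed.year - 1)

    return parsed.astimezone(timezone.utc).isoformat()


received = datetime(2025, 10, 11, 20, 30, tzinfo=timezone.utc)
print(normalize_rfc3164_timestamp("Oct 11 22:14:15", received, source_utc_offset_hours=2))
# 2025-10-11T20:14:15+00:00
```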

Normalization Schema Example (ECS — Elastic Common Schema)

| Raw Field (Windows) | Normalized Field (ECS) | Raw Field (Linux) |
|---|---|---|
| EventID | event.code | N/A (Syslog PRI or specific) |
| Computer | host.name | $hostname |
| IpAddress | source.ip | src_ip (custom) |
| Account Name | user.name | $username |
| Process ID | process.pid | PID |
| CommandLine | process.command_line | CMDLINE (auditd) |

Why normalization matters:

  • Correlation rules work across sources: “detect failed logins” works the same whether the source is Windows Security (4625), Linux auth.log, or VPN logs, because they all map to the same normalized fields (for example event.category, event.outcome, and user.name) rather than to source-specific event codes
  • Dashboards work across data sources without per-vendor field references
  • EDR-to-SIEM field mapping works when endpoints and network logs use the same namespace
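
A field mapping like the one in the table can be expressed as a lookup per source type. This is a minimal sketch rather than how any particular SIEM implements it: the Linux raw field names (hostname, username) stand in for the $hostname and $username placeholders above, and the "unmapped" key is a convention invented for this example so that unknown fields are not silently dropped.

```python
WINDOWS_TO_ECS = {
    "EventID": "event.code",
    "Computer": "host.name",
    "IpAddress": "source.ip",
    "Account Name": "user.name",
    "Process ID": "process.pid",
    "CommandLine": "process.command_line",
}

LINUX_TO_ECS = {
    "hostname": "host.name",          # assumed raw name for $hostname
    "src_ip": "source.ip",
    "username": "user.name",          # assumed raw name for $username
    "PID": "process.pid",
    "CMDLINE": "process.command_line",
}


def normalize(raw_event, field_map):
    """Rename raw fields to their ECS names; keep the rest under 'unmapped'."""
    normalized = {}
    unmapped = {}
    for key, value in raw_event.items():
        target = field_map.get(key)
        if target is None:
            unmapped[key] = value
        elif target == "process.pid":
            normalized[target] = int(value)  # ECS stores the PID as a number
        else:
            normalized[target] = value
    if unmapped:
        normalized["unmapped"] = unmapped
    return normalized


win = normalize({"EventID": "4625", "Computer": "WS01", "Account Name": "alice"}, WINDOWS_TO_ECS)
lin = normalize({"hostname": "web01", "username": "alice", "PID": "812"}, LINUX_TO_ECS)
# Both events now expose the account as "user.name", so one rule covers both.
```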

Correlation Rules — Turning Logs into Alerts

Correlation rules are the core value of a SIEM — they aggregate and analyze logs to find patterns that indicate threats.

Rule Design Patterns

| Pattern | Description | Example |
|---|---|---|
| Threshold | Alert when count exceeds a threshold in a time window | 10+ failed logins in 5 minutes |
| Temporal sequence | Alert when event A happens followed by event B within a timeframe | Process creation (4688) followed by outbound connection (5156) in 60s |
| Geo-anomaly | Alert on out-of-pattern geographical access | User logs in from US, then China within 2 hours |
| Statistical baseline | Alert on deviation from learned behavior | Outbound data volume > 3 standard deviations from baseline |
| Reference set | Alert on match against known-bad indicators | Source IP matches threat intel feed |
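
The threshold pattern is the simplest to reason about, and a sliding window per key captures it. The following is an illustrative sketch, not the logic of any specific SIEM engine: the function name threshold_alerts and the (timestamp, key) input shape are assumptions, and events are expected to arrive sorted by time.

```python
from collections import defaultdict, deque


def threshold_alerts(events, threshold=10, window_seconds=300):
    """Return (timestamp, key, count) for each key that reaches the threshold.

    events: iterable of (epoch_seconds, key) pairs sorted by time, for example
    one pair per failed login with the user name as the key.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    alerts = []
    for ts, key in events:
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have aged out of the window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerts.append((ts, key, len(window)))
            window.clear()  # reset so the same burst does not alert repeatedly
    return alerts


# Ten failures in three minutes trips the rule; ten failures spread over
# nine minutes never has more than six inside any five-minute window.
burst = [(t, "alice") for t in range(0, 200, 20)]
slow = [(t, "bob") for t in range(0, 600, 60)]
alerts = threshold_alerts(sorted(burst + slow))
```

Clearing the window after an alert is one of several possible suppression choices; a real rule might instead suppress for a fixed period.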

SPL — Correlation Rule for Impossible Travel

index=* sourcetype=WinEventLog:Security EventCode=4624
| search Account_Name!="SYSTEM" AND Account_Name!="ANONYMOUS LOGON"
| iplocation Client_IP
| where isnotnull(lat) AND isnotnull(lon)
| sort 0 _time
| streamstats current=f time_window=2h last(_time) as prev_time, last(Country) as prev_country, last(City) as prev_city, last(lat) as prev_lat, last(lon) as prev_lon by Account_Name
| where isnotnull(prev_country) AND Country != prev_country
| eval travel_time = _time - prev_time
| where travel_time > 0
| eval dlat = (lat - prev_lat) * pi() / 180, dlon = (lon - prev_lon) * pi() / 180
| eval a = pow(sin(dlat / 2), 2) + cos(prev_lat * pi() / 180) * cos(lat * pi() / 180) * pow(sin(dlon / 2), 2)
| eval distance_km = 2 * 6371 * asin(sqrt(a))
| eval travel_speed_kmh = distance_km / (travel_time / 3600)
| where travel_speed_kmh > 900
| eval alert = "IMPOSSIBLE TRAVEL: " . Account_Name . " logged in from " . prev_city . "," . prev_country . " then " . City . "," . Country . " in " . round(travel_time/60, 1) . " min (" . round(travel_speed_kmh, 0) . " km/h)"
| table _time, Account_Name, Workstation_Name, Client_IP, Country, City, prev_country, prev_city, travel_speed_kmh, alert
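
The speed check behind an impossible-travel rule is a great-circle distance divided by elapsed time. The sketch below shows that calculation outside the SIEM, using the haversine formula with a mean Earth radius of 6371 km. The function names and the dictionary keys (time, lat, lon) are chosen for this example only, and the 900 km/h default mirrors the threshold used in the search above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=900):
    """Compare two logins, each a dict with 'time' (epoch seconds), 'lat', 'lon'."""
    elapsed_hours = (curr["time"] - prev["time"]) / 3600
    if elapsed_hours <= 0:
        # Out-of-order or simultaneous events: skip rather than divide by zero.
        return False
    distance = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    return distance / elapsed_hours > max_speed_kmh


new_york = {"time": 0, "lat": 40.7128, "lon": -74.0060}
beijing_2h = {"time": 2 * 3600, "lat": 39.9042, "lon": 116.4074}
beijing_24h = {"time": 24 * 3600, "lat": 39.9042, "lon": 116.4074}

print(is_impossible_travel(new_york, beijing_2h))    # True
print(is_impossible_travel(new_york, beijing_24h))   # False
```

GeoIP accuracy and VPN egress points are the usual sources of false positives, so the threshold is a tuning knob rather than a fixed constant.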

Log Retention — Storage vs Compliance vs Investigation

Every log has a cost — storage, indexing, query performance. Retention policies balance investigation needs against operational costs.

| Log Type | Active Retention (Hot) | Archive Retention (Cold) | Typical Reason |
|---|---|---|---|
| Windows Security Events | 90 days | 1 year | Compliance (PCI, SOC 2), forensic investigation |
| Linux auth.log | 90 days | 1 year | Login forensics, breach timeline |
| DNS queries | 30 days | 6 months | C2 detection, DNS tunneling investigation |
| HTTP/Proxy logs | 30 days | 6 months | Malware download investigation |
| Firewall/Netflow | 30 days | 90 days | Threat hunting, lateral movement analysis |
| Cloud audit logs | 90 days | 1 year | Cloud incident response, compliance |
| EDR telemetry | 30 days | 90 days | Process execution, file creation forensics |
| Authentication logs | 90 days | 1 year | Account compromise investigation |
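
A retention table like this one translates naturally into a lookup that a lifecycle job could consult. The sketch below rests on several assumptions: the short log-type keys are invented for the example, "6 months" and "1 year" are taken as 180 and 365 days, and the cold figure is read as total age since the event rather than time spent in the archive.

```python
# (hot_days, cold_days) per log type, transcribed from the table above.
RETENTION_DAYS = {
    "windows_security": (90, 365),
    "linux_auth": (90, 365),
    "dns": (30, 180),
    "proxy": (30, 180),
    "firewall": (30, 90),
    "cloud_audit": (90, 365),
    "edr": (30, 90),
    "authentication": (90, 365),
}


def retention_tier(log_type, age_days):
    """Return 'hot', 'cold', or 'expired' for a log of the given age."""
    hot_days, cold_days = RETENTION_DAYS[log_type]
    if age_days <= hot_days:
        return "hot"
    if age_days <= cold_days:
        return "cold"
    return "expired"


print(retention_tier("dns", 45))        # cold
print(retention_tier("firewall", 120))  # expired
```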

SPL — Monitor SIEM log ingestion volume:

index=_internal source=*license_usage.log type=Usage
| stats sum(b) as total_bytes by s, idx
| eval total_gb = round(total_bytes / 1073741824, 2)
| rename s as source, idx as index
| table source, index, total_gb
| addcoltotals total_gb

Common Log Management Mistakes

| Mistake | Why It Hurts | Fix |
|---|---|---|
| Collecting everything with no retention tiering | Storage costs explode, hot indexes become unsearchable | Tier logs: hot (30d), warm (90d), cold (1yr), frozen (delete) |
| No timestamp normalization | Correlation rules break when timestamps are in different timezones or formats | Always convert to UTC at the collector |
| Skipping WEF for workgroup machines | Critical endpoints (kiosks, lab machines) are dark | Use collector-initiated subscriptions for non-domain machines |
| No log rotation on log hosts | Disk fills up, SIEM pipeline breaks on the most important source | Monitor /var/log and %SystemRoot%\System32\winevt\Logs disk usage |
| Ignoring log source health | A non-reporting endpoint looks like a quiet endpoint, not a compromised one | Build a heartbeat dashboard; every source should report at least once per hour |
| Over-indexing verbose fields | Full-text indexing of verbose event data doubles storage | Selective indexing: index key fields, store raw event for re-extraction |
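
The heartbeat check from the table reduces to comparing each source's last-seen time against a silence limit. This is a minimal sketch under the assumption that something upstream already tracks the last event time per source; the function name silent_sources and the input dictionary are hypothetical, and the one-hour default comes from the guidance above.

```python
def silent_sources(last_seen, now, max_silence_seconds=3600):
    """Return (source, seconds_silent) for every source quiet for too long.

    last_seen: dict mapping source name to the epoch time of its latest event.
    now: current epoch time.
    """
    return sorted(
        (source, now - seen)
        for source, seen in last_seen.items()
        if now - seen > max_silence_seconds
    )


now = 1_700_000_000
last_seen = {
    "fw-edge-01": now - 120,      # reported two minutes ago
    "dc-01": now - 5400,          # silent for 90 minutes
    "kiosk-lobby": now - 86400,   # silent for a day
}
print(silent_sources(last_seen, now))
# [('dc-01', 5400), ('kiosk-lobby', 86400)]
```

Sources that legitimately log rarely would need their own limit, so a per-source threshold is a natural extension.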
