
SOC Analyst Skills

A practical playbook for SOC analysts covering triage frameworks, note-taking methodology, escalation procedures, preventing alert fatigue, writing incident reports, and building a sustainable investigation workflow.


What SOC Analyst Skills Cover and Why They Matter

  • Technical tools (SIEM, EDR, network monitoring) are only as effective as the analyst using them. The soft skills of triage, note-taking, escalation, and report writing determine whether an investigation is thorough or sloppy.
  • MITRE ATT&CK catalogs adversary tactics and techniques; it does not describe how a SOC should operate. Within the detection and response lifecycle, the framework an analyst uses to process alerts and make decisions is as important as the detection rules themselves.
  • The most common SOC failures are not technical — they are triage mistakes (misprioritizing alerts), knowledge loss (shift handoff gaps), escalation failures (not calling the right person at the right time), burnout (alert fatigue), and documentation gaps (reports that don’t support the incident narrative).

Triage Frameworks — Processing Alerts Efficiently

The 5-5-5 Triage Framework

When a new alert arrives, spend no more than 5 minutes per phase:

| Phase | Time | Question | Action |
| --- | --- | --- | --- |
| Phase 1: Initial Triage | 5 minutes | Is this a confirmed threat or a probable false positive? | Check the alert. Review the rule that fired. Check source/destination reputation. |
| Phase 2: Scope Assessment | 5 minutes | What is affected and how serious is it? | Check affected host(s), user(s), data classification, and network segment. |
| Phase 3: Action Decision | 5 minutes | Do I escalate, contain, or close? | Apply the decision matrix below. |
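
The time boxes above can be checked mechanically. The following is a minimal sketch, assuming the analyst (or some tooling) records how many seconds each phase took. The function name `over_budget` and the phase labels used as dictionary keys are illustrative choices for this example, not part of any SIEM API.

```python
from __future__ import annotations

# 5 minutes per phase, as stated in the 5-5-5 framework.
PHASE_BUDGET_SECONDS = 5 * 60

# Phase labels are illustrative keys chosen for this sketch.
PHASES = ("Initial Triage", "Scope Assessment", "Action Decision")


def over_budget(phase_seconds: dict[str, float]) -> list[str]:
    """Return the phases whose recorded duration exceeded the 5-minute budget."""
    return [
        phase
        for phase in PHASES
        if phase_seconds.get(phase, 0.0) > PHASE_BUDGET_SECONDS
    ]
```

A phase that takes exactly 5 minutes is treated as within budget; only durations above 300 seconds are flagged.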

Alert Disposition Matrix

| Indicator | Likely Disposition | Action |
| --- | --- | --- |
| Known-bad IOC + EDR detects execution | Confirmed threat | Escalate immediately. Contain host. |
| Known-bad IOC but no execution evidence | Suspicious: investigate | Deep dive on alert context. Check host logs around the IOC timestamp. |
| Behavioral alert + unusual process tree | Suspicious: investigate | Check parent-child process relationships. Correlate with network logs. |
| Behavioral alert + normal process + known admin user | Likely false positive | Verify with the user/system admin. Document and close. |
| Threshold-based alert (X failed logins) | Investigative | Does the account exist? Is it a service account? Was it a brute-force attempt or user error? |
| Threat intel match + no log context | Low priority | Review threat intel source quality. Check for log gaps. Document. |
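
The matrix can be read as a small decision function. The sketch below is one possible encoding, not a definitive implementation: the alert type strings (`"ioc"`, `"behavioral"`, `"threshold"`, `"threat_intel"`) and the keyword arguments are hypothetical names invented for this example. Two combinations the matrix does not list (a behavioral alert with a normal process tree but no known admin user, and a threat intel match that does have log context) are assumed here to fall back to "suspicious" as the conservative choice.

```python
from enum import Enum


class Disposition(Enum):
    CONFIRMED = "Confirmed threat"
    SUSPICIOUS = "Suspicious, investigate"
    LIKELY_FP = "Likely false positive"
    INVESTIGATIVE = "Investigative"
    LOW_PRIORITY = "Low priority"


def classify(
    alert_type: str,
    *,
    execution_seen: bool = False,
    unusual_process_tree: bool = False,
    known_admin: bool = False,
    has_log_context: bool = True,
) -> Disposition:
    """Map the matrix rows to a disposition. Input names are illustrative."""
    if alert_type == "ioc":
        # Known-bad IOC: execution evidence from EDR decides the outcome.
        return Disposition.CONFIRMED if execution_seen else Disposition.SUSPICIOUS
    if alert_type == "behavioral":
        if unusual_process_tree:
            return Disposition.SUSPICIOUS
        if known_admin:
            return Disposition.LIKELY_FP
        # Not covered by the matrix; assumed conservative default.
        return Disposition.SUSPICIOUS
    if alert_type == "threshold":
        return Disposition.INVESTIGATIVE
    if alert_type == "threat_intel":
        if not has_log_context:
            return Disposition.LOW_PRIORITY
        # Not covered by the matrix; assumed conservative default.
        return Disposition.SUSPICIOUS
    raise ValueError(f"unknown alert type: {alert_type!r}")
```

For example, `classify("ioc", execution_seen=True)` returns `Disposition.CONFIRMED`, which corresponds to the "Escalate immediately. Contain host." row.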

The Pyramid of Triage

When investigating any alert (e.g., a Cobalt Strike beacon detected by RITA), work through these layers:

  1. User context: Who is the user? Admin or standard? Service or human? Known incident history?
  2. Host context: Is this host critical? Internet-facing? DMZ? Internal only?
  3. Process context: What process triggered the alert? What was its parent? What did it spawn?
  4. Network context: Where was it connecting? Was the connection successful? Encrypted?
  5. Timeline context: When did it start? Are there related events before/after?
  6. Correlation context: Does any other data source confirm or contradict this alert?
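
The six layers above work well as a checklist that an investigation must complete before it is closed. A minimal sketch, assuming findings are kept as free-text strings keyed by layer name (the keys and the function name `missing_layers` are assumptions made for this example):

```python
from __future__ import annotations

# The six context layers, in the order listed above.
LAYERS = ("user", "host", "process", "network", "timeline", "correlation")


def missing_layers(findings: dict[str, str]) -> list[str]:
    """Return the layers that have no recorded finding yet."""
    return [layer for layer in LAYERS if not findings.get(layer, "").strip()]
```

Calling `missing_layers({"user": "standard user", "host": "internal only", "process": ""})` returns `["process", "network", "timeline", "correlation"]`, which tells the analyst which questions are still open.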

Note-Taking — Structured Documentation That Survives Shift Handoff

The IR Note Template

Every investigation gets a structured note document:

# ALERT INVESTIGATION NOTE
# ========================
Alert ID:        SOC-2026-05-23-001
Trigger Rule:    Beacon Detection — High Score
Source SIEM:     Splunk ES
Status:          [Open | Escalated | Closed | False Positive]

## TIMELINE (all times UTC)
2026-05-23 14:00:00 — Alert fired (beacon score 0.93 from HOST-01 to 185.220.101.45:443)
2026-05-23 14:02:00 — Initial triage began
2026-05-23 14:05:00 — Checked EDR on HOST-01 — discovered svchost.exe spawning cmd.exe
2026-05-23 14:08:00 — Confirmed destination IP: risk level HIGH on AbuseIPDB
2026-05-23 14:12:00 — Escalated to incident lead (John Doe)
2026-05-23 14:15:00 — Host isolated

## ARTIFACTS
- SHA256 of parent.exe: [record the full hash here]
- C2 IP: 185.220.101.45
- C2 port: 443
- JA3: 6734f37431670b3ab4292b8f60f29984

## ANALYSIS NOTES
- Beacon interval: ~60s, jitter ~2.3s (classic Cobalt Strike)
- EDR confirms: suspicious svchost.exe (not running from System32)
- Parent process of svchost.exe: winword.exe (phishing delivery likely)
- No other affected hosts found via RITA correlation

## DECISION
Escalated to Incident Response Lead at 14:12 UTC.
System isolated. IR case INC-2026-05-23-001 opened.

## NEXT STEPS (for next analyst / IR lead)
1. Full memory capture of HOST-01
2. Network log analysis for C2 timeline
3. Email logs for phishing email identification
4. User interview to confirm phishing click
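
If notes are kept in tooling rather than a text file, the timeline section of the template can be generated so that entries are always ordered and carry an explicit time zone. The following is a sketch under stated assumptions: the `InvestigationNote` class and its methods are hypothetical, and only the timeline portion of the template is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InvestigationNote:
    alert_id: str
    trigger_rule: str
    status: str = "Open"
    timeline: list[tuple[datetime, str]] = field(default_factory=list)

    def log(self, when: datetime, what: str) -> None:
        """Record an event; naive timestamps are rejected to avoid ambiguity."""
        if when.tzinfo is None:
            raise ValueError("use timezone-aware timestamps")
        self.timeline.append((when.astimezone(timezone.utc), what))

    def render_timeline(self) -> str:
        """Render events in chronological order, regardless of entry order."""
        lines = ["## TIMELINE"]
        for when, what in sorted(self.timeline, key=lambda entry: entry[0]):
            lines.append(f"{when:%Y-%m-%d %H:%M:%S} UTC - {what}")
        return "\n".join(lines)
```

Sorting at render time means an analyst can add a late discovery (for example, an earlier log entry found during the deep dive) without reordering the note by hand.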

Why Structured Notes Matter

| Without Structured Notes | With Structured Notes |
| --- | --- |
| “I looked at the server. It was suspicious.” | Records: timeline, artifacts found, data sources checked, decision rationale |
| Next analyst must redo the investigation | Next analyst picks up from the Decision section and can see exactly what was found |
| Cannot be used for incident reporting | Notes feed directly into incident report |

Escalation Procedure — When and How to Escalate

Escalation Criteria

Escalate immediately when any of these conditions are met:

| Condition | What It Means | Escalate To |
| --- | --- | --- |
| Confirmed C2 beacon | Host is communicating with known C2 infrastructure | Incident Lead |
| Ransomware encryption detected | Active destruction of data | IR Lead + CISO (potential crisis) |
| PII exfiltration confirmed | Data leaving the environment | Privacy Officer + IR Lead |
| Domain admin credential theft | Attacker has, or is pivoting toward, domain-level access | IR Lead + AD team |
| Rootkit/bootkit | Persistence at kernel or boot level | IR Lead + Forensics team |
| Zero-day exploitation | Unknown vulnerability used | IR Lead + Threat Intel |
| Physical access breach | Attacker has physical access to systems | Security + Legal |
| Critical system impact | Production, revenue-generating, or safety-critical system | Operations + IR Lead |
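
When several conditions are met at once, the recipients should be merged rather than notified piecemeal. A minimal sketch of the routing table as a lookup, where the snake_case condition keys and the function name `escalation_targets` are hypothetical identifiers chosen for this example:

```python
from __future__ import annotations

# Recipients per condition, taken from the escalation criteria above.
ESCALATION_ROUTES: dict[str, list[str]] = {
    "confirmed_c2_beacon": ["Incident Lead"],
    "ransomware_encryption": ["IR Lead", "CISO"],
    "pii_exfiltration": ["Privacy Officer", "IR Lead"],
    "domain_admin_credential_theft": ["IR Lead", "AD team"],
    "rootkit_bootkit": ["IR Lead", "Forensics team"],
    "zero_day_exploitation": ["IR Lead", "Threat Intel"],
    "physical_access_breach": ["Security", "Legal"],
    "critical_system_impact": ["Operations", "IR Lead"],
}


def escalation_targets(conditions: list[str]) -> list[str]:
    """Merge recipients for all met conditions, keeping first-seen order."""
    targets: list[str] = []
    for condition in conditions:
        # An unknown condition raises KeyError so that typos are not silently ignored.
        for recipient in ESCALATION_ROUTES[condition]:
            if recipient not in targets:
                targets.append(recipient)
    return targets
```

For ransomware that also involves confirmed PII exfiltration, `escalation_targets(["ransomware_encryption", "pii_exfiltration"])` returns `["IR Lead", "CISO", "Privacy Officer"]`.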

Escalation Communication Template

## ESCALATION NOTIFICATION
Priority: [P1/P2/P3]
Alert ID: SOC-2026-05-23-001
Analyst: [Name]

### What happened (30-second summary)
[Write what a non-technical manager needs to know in 2 sentences.]

### Evidence summary
[List the key findings with data source — not speculation.]

### Affected assets
[List hosts, users, data classifications.]

### Actions taken so far
[List triage steps already completed.]

### Actions needed
[What do you need from the person you are escalating to?]

### Current risk assessment
[Your best assessment: is this contained or escalating?]

Preventing Alert Fatigue

Alert fatigue occurs when analysts are overwhelmed by alerts, leading to burnout, missed critical alerts, and high turnover. It is a systemic problem, not an individual weakness.

Signs of Alert Fatigue

  • “The SIEM is just noise” — analysts stop taking alerts seriously
  • High number of alerts closed as false positive without investigation
  • Response time increases: analysts take longer to act on alerts
  • Analysts stop checking correlated data sources
  • Increase in missed alerts (alerts that were clearly actionable but ignored)
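
Some of these signs can be measured rather than guessed at. As one example, the share of false-positive closures that carry no investigation notes can be tracked over time. This is a sketch under assumptions: closure records are modeled as dictionaries with hypothetical `disposition` and `notes` keys, and what counts as a worrying value is left to the team.

```python
from __future__ import annotations


def uninvestigated_fp_share(closures: list[dict]) -> float:
    """Fraction of false-positive closures that have no notes attached."""
    fps = [c for c in closures if c.get("disposition") == "false_positive"]
    if not fps:
        return 0.0
    # Empty, whitespace-only, or missing notes all count as "no investigation".
    bare = sum(1 for c in fps if not (c.get("notes") or "").strip())
    return bare / len(fps)
```

A rising value across shifts suggests alerts are being closed on reflex, which is the pattern described in the second bullet above.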

Mitigations — What Analysts Can Do

| Technique | How It Helps | How to Do It |
| --- | --- | --- |
| Batch processing | Reduce context-switching overhead | Set aside dedicated blocks (e.g., 45 min triage, 15 min documentation) |
| Known-false-positive playbook | Reduce decision fatigue | Create a pinned doc of common FPs and their reasons |
| Triage timer | Prevent spending too long on one alert | Use a physical timer: 5 minutes, then decide |
| Alert quality feedback | Improve rule quality over time | When you close an FP, note why. Submit rule improvement requests to detection engineering. |
| Team huddles | Share cognitive load | End-of-shift 5-minute standup to surface tricky alerts |

Mitigations — What SOC Management Can Do

| Technique | Description |
| --- | --- |
| Tune rules quarterly | Review false positive rate for every rule. Suppress or rewrite rules with > 20% FP. |
| Alert priority tiers | Tier 1 (auto-close known FPs), Tier 2 (investigate within 30 min), Tier 3 (investigate within 2h) |
| Alert volume caps | No analyst should see more than 100 actionable alerts per shift |
| Automated enrichment | Auto-tag alerts with host criticality, user role, and IOC reputation before analyst sees them |
| Burnout indicators | Track average alert processing time, missed alerts, and sick days per analyst |
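
The 20% false positive threshold and the 100-alert cap are both simple to compute from closed-alert data. The sketch below assumes closed alerts are available as `(rule_name, disposition)` pairs with dispositions `"fp"` or `"tp"`, and that per-analyst counts are already aggregated; these shapes and the function names are assumptions for illustration.

```python
from __future__ import annotations

from collections import Counter

FP_RATE_THRESHOLD = 0.20      # rules above this rate are candidates for tuning
ALERTS_PER_SHIFT_CAP = 100    # maximum actionable alerts per analyst per shift


def rules_to_tune(closed_alerts: list[tuple[str, str]]) -> dict[str, float]:
    """Return {rule: fp_rate} for rules whose FP rate exceeds the threshold."""
    totals: Counter = Counter()
    fps: Counter = Counter()
    for rule, disposition in closed_alerts:
        totals[rule] += 1
        if disposition == "fp":
            fps[rule] += 1
    rates = {rule: fps[rule] / totals[rule] for rule in totals}
    return {rule: rate for rule, rate in rates.items() if rate > FP_RATE_THRESHOLD}


def analysts_over_cap(actionable_per_analyst: dict[str, int]) -> list[str]:
    """Return analysts (sorted by name) who saw more alerts than the cap."""
    return sorted(
        name
        for name, count in actionable_per_analyst.items()
        if count > ALERTS_PER_SHIFT_CAP
    )
```

A rule with 3 false positives out of 10 closures has a rate of 0.3 and is flagged; a rule with 1 out of 10 is not.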

Writing Incident Reports

The 5-Part Incident Report

Every incident that requires escalation gets a written report. Use this structure:

## 1. Executive Summary (1 paragraph)
What happened, what was affected, whether it was contained, and the current risk level.
Write this for C-level readers who need a decision without reading the full report.

## 2. Timeline
Chronological sequence of events: initial detection, analyst actions, escalation, containment, eradication, recovery.

## 3. Investigation Details
Full technical narrative: what data sources were checked, what was found, what was ruled out.
This is the analyst's story of the investigation.

## 4. Indicators of Compromise
Bulleted list of all IOCs with types: IPs, domains, hashes, registry keys, filenames, mutexes.

## 5. Recommendations
What should change to prevent recurrence: rule tuning, additional logging, configuration changes, process improvements.

Report Writing Principles

| Principle | Bad Example | Good Example |
| --- | --- | --- |
| Be specific | “The system was compromised” | “HOST-01 established 142 connections to 185.220.101.45:443 at ~60s intervals between 12:00 and 14:30 UTC” |
| Distinguish evidence from speculation | “The attacker likely used phishing” | “Evidence: winword.exe was the parent of svchost.exe (phishing likely, but email logs are being retrieved for confirmation)” |
| Date/time everything | “This happened yesterday” | “The first beacon was observed at 2026-05-23T12:00:00 UTC” |
| Write for multiple audiences | Full technical detail only | Use executive summary + technical appendix |
| No editorializing | “The attacker was clever” | “The attacker used process hollowing to inject shellcode into a legitimate svchost.exe process” |
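
The "be specific" and "date/time everything" principles are easier to follow when the sentence is generated from the data rather than written from memory. A minimal sketch, assuming connection timestamps have been exported from network logs as timezone-aware `datetime` objects; the function name `summarize_connections` is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from statistics import mean


def summarize_connections(host: str, dest: str, timestamps: list[datetime]) -> str:
    """Build a specific, timestamped sentence from observed connections."""
    if len(timestamps) < 2:
        raise ValueError("need at least two connections to compute an interval")
    if any(t.tzinfo is None for t in timestamps):
        raise ValueError("timestamps must be timezone-aware")
    # Normalize to UTC so the trailing 'Z' in the output is accurate.
    ts = sorted(t.astimezone(timezone.utc) for t in timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(ts, ts[1:])]
    return (
        f"{host} established {len(ts)} connections to {dest} "
        f"at ~{mean(gaps):.0f}s intervals between "
        f"{ts[0]:%Y-%m-%dT%H:%M:%S}Z and {ts[-1]:%Y-%m-%dT%H:%M:%S}Z"
    )
```

The output states a count, a destination, an interval, and a bounded time window, all of which a reader can verify against the logs.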

Shift Handoff Best Practices

| Do | Don’t |
| --- | --- |
| Write handoff notes as you investigate | Leave voice notes or mental notes |
| Flag alerts that need follow-up | Assume the next analyst will check everything |
| Note which data sources are down | Only mention what you checked |
| Include the current status of every open case | Say “nothing much happened” |
| Note pending escalations and replies you are still waiting on | Close out before verifying resolution |

Handoff Template

# SHIFT HANDOFF — [Shift Name, e.g., Day/Mid/Overnight]
Date: 2026-05-23
Outgoing analyst: [Name] → Incoming analyst: [Name]

## OPEN ALERTS
1. SOC-2026-05-23-001 — C2 beacon — ESCALATED to IR Lead at 14:12 UTC
   Status: Host isolated, memory capture pending
   Next step: IR lead will handle from here

2. SOC-2026-05-23-002 — Failed logins on LEGACY-SRV — INVESTIGATING
   Status: Account exists, appears to be user error (stale password)
   Next step: Check with user, close if confirmed. ~5 min work.

## DATA SOURCE ISSUES
- Elastic cluster re-indexing until 01:00 UTC (slow queries expected)
- AWS CloudTrail is 30 minutes behind

## TO WATCH
- CVE-2026-NNNNN active exploitation reports — no hits in our env yet
- Phishing campaign targeting finance team — yesterday's lures were reported

## PRIORITIES FOR NEXT SHIFT
1. Confirm closure of SOC-2026-05-23-002
2. Check for CVE-2026-NNNNN exploitation attempts
3. Verify Elastic re-index completed
