SOC Analyst Skills
A practical playbook for SOC analysts covering triage frameworks, note-taking methodology, escalation procedures, preventing alert fatigue, writing incident reports, and building a sustainable investigation workflow.
What SOC Analyst Skills Cover and Why They Matter
- Technical tools (SIEM, EDR, network monitoring) are only as effective as the analyst using them. The soft skills of triage, note-taking, escalation, and report writing determine whether an investigation is thorough or sloppy.
- MITRE ATT&CK catalogs adversary tactics and techniques and so informs detection rules, but it does not prescribe how a SOC operates. The process an analyst follows to work alerts and make decisions is as important as the detection rules themselves.
- The most common SOC failures are not technical — they are triage mistakes (misprioritizing alerts), knowledge loss (shift handoff gaps), escalation failures (not calling the right person at the right time), burnout (alert fatigue), and documentation gaps (reports that don’t support the incident narrative).
Triage Frameworks — Processing Alerts Efficiently
The 5-5-5 Triage Framework
When a new alert arrives, spend no more than 5 minutes per phase:
| Phase | Time | Question | Action |
|---|---|---|---|
| Phase 1: Initial Triage | 5 minutes | Is this a confirmed threat or a probable false positive? | Check the alert. Review the rule that fired. Check source/destination reputation. |
| Phase 2: Scope Assessment | 5 minutes | What is affected and how serious is it? | Check affected host(s), user(s), data classification, and network segment. |
| Phase 3: Action Decision | 5 minutes | Do I escalate, contain, or close? | Apply the decision matrix below. |
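The five-minute budget per phase can be checked after the fact from the timestamps an analyst records. The following is a minimal sketch, not part of the playbook itself: the function name `phase_overruns` and the four-checkpoint convention (one start time per phase plus the decision time) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Five-minute budget per phase, taken from the 5-5-5 table above.
PHASE_BUDGET = timedelta(minutes=5)
PHASES = ("Initial Triage", "Scope Assessment", "Action Decision")


def phase_overruns(checkpoints: list[datetime]) -> list[str]:
    """Return the names of phases that ran past the five-minute budget.

    `checkpoints` holds four timestamps (a hypothetical convention): the
    start of each of the three phases, then the time the decision was made.
    """
    if len(checkpoints) != len(PHASES) + 1:
        raise ValueError("expected one start per phase plus a final timestamp")
    overruns = []
    # Pair each phase with its start and the next checkpoint (its end).
    for name, start, end in zip(PHASES, checkpoints, checkpoints[1:]):
        if end - start > PHASE_BUDGET:
            overruns.append(name)
    return overruns


start = datetime(2026, 5, 23, 14, 2)
checkpoints = [
    start,
    start + timedelta(minutes=4),   # initial triage took 4 minutes
    start + timedelta(minutes=11),  # scope assessment took 7 minutes
    start + timedelta(minutes=15),  # action decision took 4 minutes
]
print(phase_overruns(checkpoints))  # ['Scope Assessment']
```

A recurring overrun in one phase is a hint that the phase needs better enrichment or tooling, rather than a reason to rush the analyst.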
Alert Disposition Matrix
| Indicator | Likely Disposition | Action |
|---|---|---|
| Known-bad IOC + EDR detects execution | Confirmed threat | Escalate immediately. Contain host. |
| Known-bad IOC but no execution evidence | Suspicious — investigate | Deep dive on alert context. Check host logs around the IOC timestamp. |
| Behavioral alert + unusual process tree | Suspicious — investigate | Check parent-child process relationships. Correlate with network logs. |
| Behavioral alert + normal process + known admin user | Likely false positive | Verify with the user/system admin. Document and close. |
| Threshold-based alert (X failed logins) | Needs investigation | Check whether the account exists, whether it is a service account, and whether the pattern looks like brute force or user error. |
| Threat intel match + no log context | Low priority | Review threat intel source quality. Check for log gaps. Document. |
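The matrix above can be read as an ordered set of rules, with the most severe row checked first. The sketch below encodes it that way. The `AlertSignals` fields and the `disposition` function are hypothetical names chosen for illustration, the labels and actions are shortened paraphrases of the table rows, and the final fallback is an assumption for alerts that match no row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertSignals:
    """Hypothetical flags an analyst (or enrichment) sets for one alert."""
    known_bad_ioc: bool = False
    execution_evidence: bool = False
    behavioral: bool = False
    unusual_process_tree: bool = False
    known_admin_user: bool = False
    threshold_based: bool = False
    intel_match: bool = False
    has_log_context: bool = True


def disposition(s: AlertSignals) -> tuple[str, str]:
    """Return (likely disposition, action), checking the worst case first."""
    if s.known_bad_ioc and s.execution_evidence:
        return ("Confirmed threat", "Escalate immediately. Contain host.")
    if s.known_bad_ioc:
        return ("Suspicious", "Check host logs around the IOC timestamp.")
    if s.behavioral and s.unusual_process_tree:
        return ("Suspicious", "Check process tree. Correlate with network logs.")
    if s.behavioral and s.known_admin_user:
        return ("Likely false positive", "Verify with the admin. Document and close.")
    if s.threshold_based:
        return ("Investigate", "Check the account type and the failure pattern.")
    if s.intel_match and not s.has_log_context:
        return ("Low priority", "Review intel source. Check for log gaps. Document.")
    # Assumption: anything the matrix does not cover goes to manual review.
    return ("Unclassified", "No matrix row matched. Investigate manually.")


label, action = disposition(AlertSignals(known_bad_ioc=True, execution_evidence=True))
print(label)  # Confirmed threat
```

Ordering matters here: an alert with a known-bad IOC and a known admin user is still treated as suspicious, because the IOC rows are evaluated before the false-positive row.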
The Pyramid of Triage
When investigating any alert (e.g., a Cobalt Strike beacon detected by RITA), work through these layers:
- User context: Who is the user? Admin or standard? Service or human? Known incident history?
- Host context: Is this host critical? Internet-facing? DMZ? Internal only?
- Process context: What process triggered the alert? What was its parent? What did it spawn?
- Network context: Where was it connecting? Was the connection successful? Encrypted?
- Timeline context: When did it start? Are there related events before/after?
- Correlation context: Does any other data source confirm or contradict this alert?
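One way to make sure no layer is skipped is to treat the six layers as a checklist and list the ones that still have no finding. This is a small sketch under assumed names (`LAYERS`, `missing_layers`); the findings text is illustrative and loosely based on the beacon example used elsewhere in this playbook.

```python
# The six layers from the list above, in the order they are worked.
LAYERS = ("user", "host", "process", "network", "timeline", "correlation")


def missing_layers(findings: dict[str, str]) -> list[str]:
    """Return the layers that have no recorded finding yet."""
    return [layer for layer in LAYERS if not findings.get(layer, "").strip()]


findings = {
    "user": "Standard user, no incident history",
    "host": "HOST-01, internal workstation",
    "process": "svchost.exe outside System32, parent winword.exe",
}
print(missing_layers(findings))  # ['network', 'timeline', 'correlation']
```

An empty result does not mean the alert is benign, only that every layer has been looked at and written down.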
Note-Taking — Structured Documentation That Survives Shift Handoff
The IR Note Template
Every investigation gets a structured note document:
# ALERT INVESTIGATION NOTE
# ========================
Alert ID: SOC-2026-05-23-001
Trigger Rule: Beacon Detection — High Score
Source SIEM: Splunk ES
Status: [Open | Escalated | Closed | False Positive]
## TIMELINE
2026-05-23 14:00:00 — Alert fired (beacon score 0.93 from HOST-01 to 185.220.101.45:443)
2026-05-23 14:02:00 — Initial triage began
2026-05-23 14:05:00 — Checked EDR on HOST-01 — discovered svchost.exe spawning cmd.exe
2026-05-23 14:08:00 — Confirmed destination IP: risk level HIGH on AbuseIPDB
2026-05-23 14:12:00 — Escalated to incident lead (John Doe)
2026-05-23 14:15:00 — Host isolated
## ARTIFACTS
- SHA256 (parent.exe): [full hash value]
- C2 IP: 185.220.101.45
- C2 port: 443
- JA3: 6734f37431670b3ab4292b8f60f29984
## ANALYSIS NOTES
- Beacon interval: ~60s, jitter ~2.3s (consistent with a Cobalt Strike beacon near its default 60-second sleep; not conclusive on its own)
- EDR confirms: suspicious svchost.exe (not running from System32)
- Parent process: winword.exe (phishing delivery likely)
- No other affected hosts found via RITA correlation
## DECISION
Escalated to Incident Response Lead at 14:12 UTC.
System isolated. IR case INC-2026-05-23-001 opened.
## NEXT STEPS (for next analyst / IR lead)
1. Full memory capture of HOST-01
2. Network log analysis for C2 timeline
3. Email logs for phishing email identification
4. User interview to confirm phishing click
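The beacon interval and jitter recorded in the analysis notes can be derived from connection timestamps in the network logs. The sketch below is one simple way to do that, not the method RITA uses: it takes the mean gap between consecutive connections as the interval and the mean absolute deviation of those gaps as the jitter. The function name `beacon_stats` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def beacon_stats(timestamps: list[datetime]) -> tuple[float, float]:
    """Return (mean interval, jitter) in seconds for a series of connections.

    Jitter is defined here, as an assumption, as the mean absolute
    deviation of the gaps from the mean interval.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two connections to measure an interval")
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    interval = mean(gaps)
    jitter = mean(abs(g - interval) for g in gaps)
    return interval, jitter


first = datetime(2026, 5, 23, 12, 0, 0)
# Illustrative connection times: gaps of 60, 58, 63 and 59 seconds.
connections = [first + timedelta(seconds=s) for s in (0, 60, 118, 181, 240)]
interval, jitter = beacon_stats(connections)
print(interval, jitter)  # 60.0 1.5
```

A steady interval with low jitter is a reason to look closer, but legitimate software such as update checkers and monitoring agents also polls on a fixed schedule, so the numbers belong in the note alongside the other evidence.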
Why Structured Notes Matter
| Without Structured Notes | With Structured Notes |
|---|---|
| "I looked at the server. It was suspicious." | Records: timeline, artifacts found, data sources checked, decision rationale |
| Next analyst must redo the investigation | Next analyst picks up from the Decision and Next Steps sections and can see exactly what was found |
| Cannot be used for incident reporting | Notes feed directly into incident report |
Escalation Procedure — When and How to Escalate
Escalation Criteria
Escalate immediately when any of these conditions are met:
| Condition | What It Means | Escalate To |
|---|---|---|
| Confirmed C2 beacon | Host is communicating with known C2 infrastructure | Incident Lead |
| Ransomware encryption detected | Active destruction of data | IR Lead + CISO (potential crisis) |
| PII exposure or data exfiltration confirmed | Data leaving the environment | Privacy Officer + IR Lead |
| Domain admin credential theft | Attacker has, or is pivoting toward, domain-level access | IR Lead + AD team |
| Rootkit/bootkit | Persistence at kernel or boot level | IR Lead + Forensics team |
| Zero-day exploitation | Unknown vulnerability used | IR Lead + Threat Intel |
| Physical access breach | Attacker has physical access to systems | Security + Legal |
| Critical system impact | Production, revenue-generating, or safety-critical system | Operations + IR Lead |
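When more than one condition applies to the same incident, the recipients from each row need to be combined without paging anyone twice. A minimal sketch of that lookup follows. The condition keys and the function name `escalation_targets` are hypothetical, and the recipients are copied from the table above.

```python
# Recipients per condition, copied from the escalation criteria table.
# The snake_case keys are made-up identifiers for illustration.
ESCALATION_ROUTES: dict[str, list[str]] = {
    "confirmed_c2_beacon": ["Incident Lead"],
    "ransomware_encryption": ["IR Lead", "CISO"],
    "pii_exfiltration": ["Privacy Officer", "IR Lead"],
    "domain_admin_credential_theft": ["IR Lead", "AD team"],
    "rootkit_bootkit": ["IR Lead", "Forensics team"],
    "zero_day_exploitation": ["IR Lead", "Threat Intel"],
    "physical_access_breach": ["Security", "Legal"],
    "critical_system_impact": ["Operations", "IR Lead"],
}


def escalation_targets(conditions: list[str]) -> list[str]:
    """Return the de-duplicated recipients for all matched conditions."""
    targets: list[str] = []
    for condition in conditions:
        # An unknown condition raises KeyError rather than being dropped.
        for recipient in ESCALATION_ROUTES[condition]:
            if recipient not in targets:
                targets.append(recipient)
    return targets


print(escalation_targets(["ransomware_encryption", "pii_exfiltration"]))
# ['IR Lead', 'CISO', 'Privacy Officer']
```

Failing loudly on an unrecognized condition is a deliberate choice in this sketch: a silently ignored condition is an escalation that never happens.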
Escalation Communication Template
## ESCALATION NOTIFICATION
Priority: [P1/P2/P3]
Alert ID: SOC-2026-05-23-001
Analyst: [Name]
### What happened (30-second summary)
[Write what a non-technical manager needs to know in 2 sentences.]
### Evidence summary
[List the key findings with data source — not speculation.]
### Affected assets
[List hosts, users, data classifications.]
### Actions taken so far
[List triage steps already completed.]
### Actions needed
[What do you need from the person you are escalating to?]
### Current risk assessment
[Your best assessment: is this contained or escalating?]
Preventing Alert Fatigue
Alert fatigue occurs when analysts are overwhelmed by alerts, leading to burnout, missed critical alerts, and high turnover. It is a systemic problem, not an individual weakness.
Signs of Alert Fatigue
- “The SIEM is just noise” — analysts stop taking alerts seriously
- High number of alerts closed as false positive without investigation
- Response time increases: analysts take longer to respond, which in turn lengthens attacker dwell time
- Analysts stop checking correlated data sources
- Increase in missed alerts (alerts that were clearly actionable but ignored)
Mitigations — What Analysts Can Do
| Technique | How It Helps | How to Do It |
|---|---|---|
| Batch processing | Reduce context-switching overhead | Set aside dedicated blocks (e.g., 45 min triage, 15 min documentation) |
| Known-false-positive playbook | Reduce decision fatigue | Create a pinned doc of common FPs and their reasons |
| Triage timer | Prevent spending too long on one alert | Use a physical timer — 5 minutes, then decide |
| Alert quality feedback | Improve rule quality over time | When you close an FP, note why. Submit rule improvement requests to detection engineering. |
| Team huddles | Share cognitive load | End-of-shift 5-minute standup to surface tricky alerts |
Mitigations — What SOC Management Can Do
| Technique | Description |
|---|---|
| Tune rules quarterly | Review the false positive rate for every rule. Suppress or rewrite rules with a false positive rate above 20%. |
| Alert priority tiers | Tier 1 (auto-close known FPs), Tier 2 (investigate within 30 min), Tier 3 (investigate within 2h) |
| Alert volume caps | No analyst should see more than 100 actionable alerts per shift |
| Automated enrichment | Auto-tag alerts with host criticality, user role, and IOC reputation before analyst sees them |
| Burnout indicators | Track average alert processing time, missed alerts, and sick days per analyst |
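The quarterly tuning review and the per-shift volume cap both reduce to simple counts. The sketch below flags rules whose false positive rate exceeds 20% and analysts who exceeded 100 actionable alerts in a shift. The function names, input shapes, rule names, and sample numbers are assumptions for illustration; in practice the counts would come from the SIEM or case management system.

```python
FP_RATE_THRESHOLD = 0.20   # "above 20%" from the tuning row above
ALERTS_PER_SHIFT_CAP = 100  # cap from the alert volume row above


def rules_to_tune(closed: dict[str, tuple[int, int]]) -> list[str]:
    """Return rules whose false positive rate is above the threshold.

    `closed` maps a rule name to (false_positives, total_closed_alerts).
    """
    flagged = []
    for rule, (false_positives, total) in closed.items():
        if total > 0 and false_positives / total > FP_RATE_THRESHOLD:
            flagged.append(rule)
    return flagged


def analysts_over_cap(actionable: dict[str, int]) -> list[str]:
    """Return analysts who saw more actionable alerts than the shift cap."""
    return [name for name, count in actionable.items() if count > ALERTS_PER_SHIFT_CAP]


closed_last_quarter = {
    "Beacon Detection": (3, 40),   # 7.5% false positives
    "Failed Logins": (55, 120),    # about 46% false positives
    "Rare Process": (0, 0),        # never fired, so nothing to judge
}
print(rules_to_tune(closed_last_quarter))                       # ['Failed Logins']
print(analysts_over_cap({"analyst_a": 84, "analyst_b": 131}))  # ['analyst_b']
```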
Writing Incident Reports
The 5-Part Incident Report
Every incident that requires escalation gets a written report. Use this structure:
## 1. Executive Summary (1 paragraph)
What happened, what was affected, whether it was contained, and the current risk level.
Write this for C-level readers who need a decision without reading the full report.
## 2. Timeline
Chronological sequence of events: initial detection, analyst actions, escalation, containment, eradication, recovery.
## 3. Investigation Details
Full technical narrative: what data sources were checked, what was found, what was ruled out.
This is the analyst's story of the investigation.
## 4. Indicators of Compromise
Bulleted list of all IOCs with types: IPs, domains, hashes, registry keys, filenames, mutexes.
## 5. Recommendations
What should change to prevent recurrence: rule tuning, additional logging, configuration changes, process improvements.
Report Writing Principles
| Principle | Bad Example | Good Example |
|---|---|---|
| Be specific | "The system was compromised" | "HOST-01 established 142 connections to 185.220.101.45:443 at ~60s intervals between 12:00-14:30 UTC" |
| Distinguish evidence from speculation | "The attacker likely used phishing" | "Evidence: winword.exe was the parent of svchost.exe (phishing likely, but email logs are being retrieved for confirmation)" |
| Date/time everything | "This happened yesterday" | "The first beacon was observed at 2026-05-23T12:00:00 UTC" |
| Write for multiple audiences | Full technical detail only | Use executive summary + technical appendix |
| No editorializing | "The attacker was clever" | "The attacker used process hollowing to inject shellcode into a legitimate svchost.exe process" |
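The "Date/time everything" principle is easier to follow when every log timestamp is converted to UTC in one consistent format before it goes into a report. A minimal sketch using the Python standard library is shown here. The function name `to_report_timestamp` is hypothetical, and the output format mirrors the good example in the table above.

```python
from datetime import datetime, timedelta, timezone


def to_report_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as 'YYYY-MM-DDTHH:MM:SS UTC'."""
    if moment.tzinfo is None:
        # A naive timestamp has no known offset, so refuse to guess.
        raise ValueError("timestamp has no timezone; record the source offset")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S UTC")


# A log entry recorded in a UTC+2 local time zone (illustrative value).
local = datetime(2026, 5, 23, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_report_timestamp(local))  # 2026-05-23T12:00:00 UTC
```

Rejecting naive timestamps forces the analyst to note which time zone a data source logs in, which is often where timeline errors in reports originate.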
Shift Handoff Best Practices
| Do | Don’t |
|---|---|
| Write handoff notes as you investigate | Leave voice notes or mental notes |
| Flag alerts that need follow-up | Assume the next analyst will check everything |
| Note which data sources are down | Only mention what you checked |
| Include the current status of every open case | Say “nothing much happened” |
| Note pending escalations or awaiting replies | Close out before verifying resolution |
Handoff Template
# SHIFT HANDOFF — [Shift Name, e.g., Day/Mid/Overnight]
Date: 2026-05-23
Outgoing analyst: [Name] → Incoming analyst: [Name]
## OPEN ALERTS
1. SOC-2026-05-23-001 — C2 beacon — ESCALATED to IR Lead at 14:12 UTC
   Status: Host isolated, memory capture pending
   Next step: IR lead will handle from here
2. SOC-2026-05-23-002 — Failed logins on LEGACY-SRV — INVESTIGATING
   Status: Account exists, appears to be user error (stale password)
   Next step: Check with user, close if confirmed. ~5 min work.
## DATA SOURCE ISSUES
- Elastic cluster re-indexing until 01:00 UTC (slow queries expected)
- AWS CloudTrail is 30 minutes behind
## TO WATCH
- CVE-2026-NNNNN active exploitation reports — no hits in our env yet
- Phishing campaign targeting finance team — yesterday's lures were reported
## PRIORITIES FOR NEXT SHIFT
1. Confirm closure of SOC-2026-05-23-002
2. Check for CVE-2026-NNNNN exploitation attempts
3. Verify Elastic re-index completed
Related
- SOC Shift Handoff: step-by-step SOC shift handoff procedures
- Active Directory Compromise Response: detection and response for T1558 techniques
- Business Email Compromise Response: detection and response for T1566, T1114, T1098, T1586 techniques
