Kubernetes Manifest Hardening Checklist: 25 Fixes
Kubernetes manifest hardening checklist - 25 enumerated fixes mapping each risk to the exact YAML and the scanner that flags it before prod.
You can spend an afternoon reading the CIS Kubernetes Benchmark, or you can run a manifest through a scanner and watch it light up red. Both are useful, but neither hands your team a single artifact they can act on before a release. That’s what this is: a 25-point Kubernetes manifest hardening checklist grouped into five categories, where every item pairs the risk, the exact YAML fix, and the scanner that flags it.
The format is deliberate. An enumerated, sourced list is the thing you hand a developer the day before a compliance audit, and it’s the thing you wire into CI so the fixes stay fixed.
How to use this checklist
Every item below follows the same shape so you can quote or copy any single one:
- Risk - what goes wrong if the manifest ships as-is.
- Fix - the exact YAML to add or change.
- Tool - the scanner that flags it, usually kube-score, Trivy, Checkov, kubeaudit, or kube-bench, with Polaris, cosign, and Kyverno where they fit better.
The 25 items are grouped into five categories so you can audit by area: securityContext, resource limits and reliability, RBAC and service accounts, network policy and secrets, and supply chain and admission control. Most map to a control area in the CIS Kubernetes Benchmark, which is the authoritative source when an auditor asks “says who?”
The most important thing you can do with this list is stop running it by hand. Turn it into an automated CI gate. A manual review catches issues the week someone remembers to do it; a pipeline step catches them on every pull request. We cover the CI wiring in the last section, and if you want the deeper scanner trade-offs, see our kube-score vs kubeaudit vs Checkov vs Trivy comparison.
Pod and container securityContext (items 1-7)
This is where many container escapes start. A container running as root with a writable filesystem and full capabilities is one application bug away from a node compromise. CIS Benchmark section 5.2 covers nearly all of these.
1. Run as a non-root user
Risk: A root process inside the container maps to root-equivalent power on the node if it escapes.
Fix:
securityContext:
  runAsNonRoot: true
  runAsUser: 10001 # any non-zero UID
Tool: kube-score, kubeaudit (kubeaudit nonroot), Trivy (KSV012).
2. Make the root filesystem read-only
Risk: An attacker who lands code execution can write tools, binaries, or persistence to disk.
Fix:
securityContext:
  readOnlyRootFilesystem: true
Mount an emptyDir for any path the app genuinely needs to write.
Tool: kube-score, kubeaudit (kubeaudit rootfs), Trivy (KSV014).
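To make the emptyDir advice concrete, here is a minimal sketch of a read-only root filesystem with one writable scratch path. The pod name, image tag, and the /tmp mount path are illustrative assumptions; swap in whatever path your app actually writes to.

```yaml
# Sketch only: names, image tag, and mount path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.4.2   # hypothetical tag
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp        # the only writable path in the container
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 64Mi          # cap scratch space so it cannot fill the node
```

The sizeLimit is optional, but it keeps a misbehaving process from using the scratch volume to exhaust node disk.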
3. Disable privilege escalation
Risk: setuid binaries or kernel calls let a process gain more privileges than it started with.
Fix:
securityContext:
  allowPrivilegeEscalation: false
Tool: kubeaudit (kubeaudit privesc; older releases called it allowpe), Trivy (KSV001), Checkov (CKV_K8S_20).
4. Drop ALL capabilities, add back only what’s needed
Risk: The default capability set gives containers far more kernel access than a typical workload uses.
Fix:
securityContext:
  capabilities:
    drop: ["ALL"]
    # add: ["NET_BIND_SERVICE"] # only if you truly need it
Tool: kubeaudit (kubeaudit capabilities; older releases called it caps), Trivy (KSV003), Checkov (CKV_K8S_28).
5. Never run privileged containers
Risk: privileged: true disables almost every container isolation boundary - it’s effectively root on the host.
Fix: Remove the flag entirely, or set it explicitly false.
securityContext:
  privileged: false
Tool: kube-bench (CIS 5.2.x), Trivy (KSV017), Checkov (CKV_K8S_16).
6. No hostNetwork, hostPID, or hostIPC
Risk: Sharing host namespaces lets a pod sniff node traffic, see host processes, or read host shared memory.
Fix: Leave these unset, or pin them to false at the pod spec:
spec:
  hostNetwork: false
  hostPID: false
  hostIPC: false
Tool: kubeaudit (kubeaudit hostns), Trivy (KSV008/KSV009/KSV010), Checkov (CKV_K8S_17/18/19).
7. Set the seccomp profile to RuntimeDefault
Risk: Without a seccomp profile, the container can make any syscall, widening the kernel attack surface.
Fix:
securityContext:
  seccompProfile:
    type: RuntimeDefault
Tool: kube-score, Trivy (KSV030), Checkov (CKV_K8S_31).
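The seven fragments above sit at different levels of the spec, which is easy to get wrong when copying them one at a time. As a rough sketch, this is how they could combine in a single Pod; the pod name and image tag are hypothetical.

```yaml
# Sketch: items 1-7 in one manifest. Names and image tag are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  hostNetwork: false
  hostPID: false
  hostIPC: false
  securityContext:                 # pod-level: applies to every container
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: web
      image: registry.example.com/web:1.4.2
      securityContext:             # these fields exist only at container level
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        privileged: false
        capabilities:
          drop: ["ALL"]
```

runAsNonRoot, runAsUser, and seccompProfile can also be set per container if one container needs a different value; the container-level setting wins.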
Resource limits and reliability (items 8-12)
Hardening isn’t only about attackers. A pod with no limits can starve its neighbors; a pod with no probes can sit dead in the rotation. These are the reliability controls that keep one bad workload from taking down the node.
8. Set CPU and memory requests and limits on every container
Risk: An unbounded container can consume the whole node, triggering OOM kills and noisy-neighbor outages.
Fix:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
Tool: kube-score, Polaris, Trivy (KSV011/KSV015 for CPU, KSV016/KSV018 for memory), Checkov (CKV_K8S_10-13).
9. Define a liveness probe
Risk: A hung process keeps serving traffic until something restarts it manually.
Fix:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
Tool: kube-score, Polaris.
10. Define a readiness probe
Risk: Kubernetes routes traffic to a pod that isn’t ready, causing failed requests during rollout.
Fix:
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
Tool: kube-score, Polaris.
11. Set a replica count and a PodDisruptionBudget
Risk: A single replica means any node drain, upgrade, or eviction is a full outage.
Fix: Run replicas: 3 or more, and protect availability during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
Tool: kube-score (pod topology / PDB checks), Polaris.
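A PodDisruptionBudget only protects pods its selector matches. A minimal sketch of the Deployment side, assuming the same app: web label as the budget; the Deployment name and image tag are hypothetical.

```yaml
# Sketch: Deployment whose pod labels match a PDB selecting app: web.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # with minAvailable: 2, one pod can be drained at a time
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web             # must match the PDB's matchLabels
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2   # hypothetical tag
```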
12. Pin images by tag or digest and set imagePullPolicy
Risk: :latest is a moving target - the image you tested isn’t guaranteed to be the image that runs.
Fix:
image: registry.example.com/web@sha256:9b2c... # digest is strongest
imagePullPolicy: IfNotPresent
Tool: kube-score, Trivy (KSV013), Checkov (CKV_K8S_14/15).
RBAC and service accounts (items 13-17)
Most clusters leak privilege through RBAC that was scoped wide “just to get it working.” These five items pull workloads back to least privilege. CIS Benchmark section 5.1 is the reference.
13. No wildcard verbs or resources in Roles
Risk: A Role granting * on * is a standing escalation path if the service account is ever compromised.
Fix: Enumerate exactly what’s needed:
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
Tool: kubeaudit, Checkov (CKV_K8S_49), Trivy (KSV045/KSV046).
14. Disable service account token auto-mounting where it isn’t needed
Risk: Every pod that auto-mounts a token hands an attacker an API credential by default.
Fix:
spec:
  automountServiceAccountToken: false
Set it on the pod spec or the service account, and only enable it for workloads that call the API.
Tool: kubeaudit (kubeaudit asat), Checkov (CKV_K8S_38), Trivy (KSV036).
15. No cluster-admin bindings to workloads
Risk: Binding a workload to cluster-admin means a single compromised pod owns the entire cluster.
Fix: Bind to a scoped Role or ClusterRole, never the built-in cluster-admin. Audit existing bindings:
kubectl get clusterrolebindings -o json | \
  jq '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name'
Tool: kubeaudit, kube-bench (CIS 5.1.1).
16. Use a dedicated service account per workload
Risk: Sharing the default service account means every app inherits the same permissions, so least privilege is impossible.
Fix:
spec:
  serviceAccountName: web-app-sa
Tool: Checkov (CKV_K8S_41), kubeaudit.
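The pod spec in item 16 references a service account that has to exist. A minimal sketch of that object, with token auto-mounting switched off as in item 14; the production namespace is an assumption borrowed from the NetworkPolicy example elsewhere in this checklist.

```yaml
# Sketch: dedicated service account with token auto-mount disabled.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production      # assumed namespace
automountServiceAccountToken: false
```

If one workload using this account does need to call the API, setting automountServiceAccountToken: true on that pod spec overrides the service account default.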
17. Don’t grant secret-wide read access broadly
Risk: A Role with get/list on all secrets lets one workload read every credential in the namespace.
Fix: Scope secret access with resourceNames to the specific secrets a workload needs.
Tool: Checkov, Trivy (KSV041/KSV113).
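As a sketch of what scoped secret access can look like, the Role below allows reading one named secret and nothing else. The Role name and namespace are hypothetical; db-credentials is the secret name used in item 19.

```yaml
# Sketch: read access to a single named secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-credentials  # hypothetical name
  namespace: production      # assumed namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["db-credentials"]
    verbs: ["get"]           # no list: listing secrets returns their contents
```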
Network policy and secrets (items 18-22)
By default, every pod in a cluster can reach every other pod. That flat network is how a single compromised front end reaches your database. These items close it and clean up how secrets are handled. See our compliance automation guide for mapping these to frameworks.
18. Apply a default-deny NetworkPolicy, then explicit allows
Risk: Flat networking lets lateral movement spread unchecked after an initial compromise.
Fix: Start every namespace with a default-deny baseline:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
Then add narrow allow rules for the traffic each workload actually needs.
Tool: kube-score (NetworkPolicy targets a pod), Checkov (CKV2_K8S_6).
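A narrow allow rule layered on the default-deny baseline might look like the sketch below. The app: web and app: db labels and port 5432 are assumptions for illustration.

```yaml
# Sketch: only web pods may reach the database pods, and only on one port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db      # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: db                # the pods being protected
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web       # the only permitted client
      ports:
        - protocol: TCP
          port: 5432
```

Because the baseline also denies egress, the web pods need a matching egress rule toward the database before this connection works.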
19. No secrets as plaintext env values or in ConfigMaps
Risk: Plaintext secrets in the manifest end up in Git history, logs, and kubectl describe output.
Fix: Reference a Secret instead of inlining the value:
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: password
Better still, pull from an external store via the Secrets Store CSI driver or External Secrets Operator.
Tool: Trivy (secret detection + KSV109), Checkov (CKV_K8S_35, which goes a step further and prefers secrets mounted as files over environment variables).
20. Enforce TLS on ingress; remove insecure annotations
Risk: Plaintext ingress exposes credentials and session tokens on the wire.
Fix: Define a TLS block and drop backend-protocol overrides that disable verification:
spec:
  tls:
    - hosts: ["app.example.com"]
      secretName: app-tls
Tool: Checkov (ingress TLS checks), Trivy.
21. Restrict egress for sensitive namespaces
Risk: Unrestricted egress lets a compromised pod exfiltrate data or call out to a command-and-control host.
Fix: Add an egress NetworkPolicy that allows DNS plus only the specific destinations the workload needs, and deny the rest.
Tool: Checkov, kube-score.
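A minimal sketch of such an egress policy follows. It assumes cluster DNS runs in kube-system with the conventional k8s-app: kube-dns label, and uses a documentation-range IP as a stand-in for the one external destination; check both against your own cluster.

```yaml
# Sketch: DNS plus one external destination, everything else denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-restricted-egress    # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web                   # assumed workload label
  policyTypes: ["Egress"]
  egress:
    # DNS lookups to the cluster DNS pods (labels are assumptions)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # One specific external endpoint (placeholder address)
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32
      ports:
        - protocol: TCP
          port: 443
```

Once a pod is selected by any policy with the Egress type, traffic not matched by a rule is dropped, so there is no separate deny rule to write.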
22. Encrypt secrets at rest
Risk: Without EncryptionConfiguration, secrets sit base64-encoded (not encrypted) in etcd.
Fix: Enable an EncryptionConfiguration on the API server with a KMS or aescbc provider.
Tool: kube-bench (CIS 1.2.x / 3.x control areas).
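For reference, a minimal EncryptionConfiguration using the aescbc provider could look like this sketch. The key name is arbitrary and the secret value is a placeholder you generate yourself; managed clusters usually expose this as a provider setting instead of a file.

```yaml
# Sketch: encrypt Secret objects in etcd with a local aescbc key.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1                          # arbitrary key name
              secret: <BASE64_ENCODED_32_BYTE_KEY>  # placeholder, never commit a real key
      - identity: {}     # fallback so existing unencrypted secrets stay readable
```

The API server picks this file up through its --encryption-provider-config flag. Secrets written before the change stay unencrypted until they are rewritten.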
Supply chain and admission control (items 23-25)
The last three items stop you from shipping a known-vulnerable or untrusted image, and turn the whole checklist into policy that’s enforced at the cluster boundary. This is also where deployment gates live - see deployment gates that block bad releases.
23. Gate on image vulnerability scans with a severity threshold
Risk: Shipping an image with known critical CVEs hands an attacker a ready-made exploit.
Fix: Run a scan in CI that fails the build above a threshold:
trivy image --severity CRITICAL,HIGH --exit-code 1 \
  registry.example.com/web@sha256:9b2c...
Tool: Trivy (image scanning), Grype as an alternative.
24. Verify image signatures and provenance at admission
Risk: Without signature verification, a tampered or substituted image can run unnoticed.
Fix: Sign images with cosign and verify at admission so only signed, provenance-attested images are admitted:
cosign sign --key cosign.key registry.example.com/web@sha256:9b2c...
cosign verify --key cosign.pub registry.example.com/web@sha256:9b2c...
Tool: cosign + a Kyverno verifyImages policy.
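The admission side of this can be sketched as a Kyverno verifyImages rule. This assumes key-based cosign signing and the registry.example.com registry used above; the policy name is hypothetical and the public key is a placeholder for the contents of your own cosign.pub.

```yaml
# Sketch: reject pods whose images lack a valid cosign signature.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures    # hypothetical name
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <contents of cosign.pub>
                      -----END PUBLIC KEY-----
```

Field names in this area have shifted between Kyverno releases, so validate the policy against the version you run before enforcing it.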
25. Add an admission-time policy backstop enforcing the whole checklist as code
Risk: CI checks can be skipped; a manual kubectl apply bypasses every gate above.
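Fix: Run Kyverno or OPA Gatekeeper in the cluster so non-compliant resources are rejected at apply time. This is what keeps the manifest-level items enforced even when CI is skipped; cluster-level settings such as encryption at rest (item 22) still need their own audit.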
# Kyverno: require runAsNonRoot on every pod
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-run-as-nonroot
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-nonroot
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "runAsNonRoot must be true"
        pattern:
          spec:
            containers:
              - securityContext:
                  runAsNonRoot: true
Tool: Kyverno, OPA Gatekeeper.
Turn the checklist into an enforced CI gate
A checklist you run by hand decays. Wire it into the pipeline and it stays true on every change. The two-layer pattern:
- CI scanner step - run kube-score, trivy config, or Checkov against your manifests on every pull request, with a non-zero exit on critical findings so bad YAML fails the build before merge.
- Admission backstop - run Kyverno or Gatekeeper so anything that skips CI is rejected at apply time.
A minimal CI step looks like this:
kube-score score k8s/*.yaml --output-format ci || exit 1
trivy config --severity HIGH,CRITICAL --exit-code 1 k8s/
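Wrapped in a pipeline definition, those two commands might look like the sketch below. GitHub Actions is an assumption here, as are the workflow name and the k8s/ path filter, and the steps presume kube-score and trivy are already installed on the runner.

```yaml
# Sketch: pull-request gate, assuming GitHub Actions and preinstalled scanners.
name: manifest-scan
on:
  pull_request:
    paths:
      - "k8s/**"
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumption: kube-score and trivy are on the runner's PATH
      - name: kube-score
        run: kube-score score k8s/*.yaml --output-format ci
      - name: trivy config
        run: trivy config --severity HIGH,CRITICAL --exit-code 1 k8s/
```

Each step fails the job on a non-zero exit, so the explicit || exit 1 is not needed inside the workflow. Mark the job as a required status check or a failing scan can still be merged.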
If your team is figuring out who owns this in the QA workflow, our piece on the QA engineer’s Kubernetes role and tooling covers where manifest scanning fits.
Get an external manifest audit
This 25-point checklist is the artifact - the hard part is running it against real clusters and keeping it enforced after the audit is over. Book a hardening engagement and we’ll run all 25 checks against your actual manifests, map each finding to its CIS control, and wire the whole checklist into CI as enforced Kyverno policy so it can’t drift back.
Frequently Asked Questions
How do you harden a Kubernetes manifest before production?
Work through a structured checklist by category: securityContext (runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities), resource limits and probes for reliability, least-privilege RBAC and service accounts, a default-deny NetworkPolicy plus secret hygiene, and a supply-chain gate. Then wire each check into CI with kube-score, Trivy, or Checkov so the fixes stay enforced instead of drifting after a one-time review.
What should be in a Kubernetes securityContext?
A hardened securityContext sets runAsNonRoot: true with a non-zero runAsUser, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and drops ALL Linux capabilities (adding back only what the workload needs). Add seccompProfile.type: RuntimeDefault and never set privileged: true. These map directly to CIS Kubernetes Benchmark pod-security controls and are flagged by kube-score, kubeaudit, and Trivy.
Which Kubernetes misconfigurations are most common?
The most frequent issues are missing resource limits, containers running as root, no liveness or readiness probes, images pinned to :latest, wildcard RBAC verbs, and the absence of a NetworkPolicy (so every pod can talk to every other pod). Secrets stored as plaintext env values and auto-mounted service account tokens round out the list. Scanners like kube-score and Trivy catch all of these in seconds.
What tool flags missing resource limits in Kubernetes?
kube-score and Polaris both flag containers without CPU and memory requests and limits, and both grade probes and image-tag hygiene at the same time. Trivy and Checkov include equivalent misconfiguration checks (for example Trivy's KSV011 and KSV015) so you can enforce the same rule from your existing security scanner inside CI.
How do you enforce a Kubernetes hardening checklist in CI?
Run a manifest scanner (kube-score, Trivy config, or Checkov) as a pipeline step with a non-zero exit on critical findings, so bad manifests fail the build. Then back it with an admission controller like Kyverno or Gatekeeper that rejects non-compliant resources at apply time. That two-layer gate - CI plus admission - turns a manual checklist into enforced policy that still holds when someone skips the pipeline.