Debugging and Troubleshooting

Debugging and Troubleshooting Overview

Effective debugging and troubleshooting are crucial skills for managing Kubernetes environments. Kubernetes provides a variety of tools and commands through kubectl to help you diagnose and resolve issues within your cluster. This section will cover the essential commands and techniques for debugging and troubleshooting common problems in Kubernetes, including inspecting logs, viewing events, accessing container terminals, and understanding resource statuses.


Understanding the Basics of Troubleshooting in Kubernetes

Kubernetes environments can be complex, with many moving parts, including pods, containers, services, and network configurations. When something goes wrong, it's important to systematically approach the problem to identify the root cause. The general steps for troubleshooting include:

  1. Identify the Problem Area: Determine which component (pod, node, service, etc.) is experiencing issues.

  2. Gather Information: Use kubectl commands to collect logs, events, and resource statuses.

  3. Analyze the Data: Review the collected information to identify errors, failures, or misconfigurations.

  4. Take Corrective Actions: Apply fixes or adjustments to resolve the issue.
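
The information-gathering step above can be sketched as a small shell helper. This is a minimal sketch, not a standard tool: the function name collect_pod_diagnostics, the default namespace, and the 100-line log limit are assumptions chosen for illustration.

```shell
# Hypothetical helper: gather first-pass diagnostics for one pod.
# Usage: collect_pod_diagnostics <pod-name> [namespace]
collect_pod_diagnostics() {
  pod="$1"
  ns="${2:-default}"   # assumption: fall back to the "default" namespace

  echo "== status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== describe =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== logs (last 100 lines) =="
  kubectl logs "$pod" -n "$ns" --tail=100

  echo "== events =="
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

Redirecting the output to a file (for example, collect_pod_diagnostics my-pod my-namespace > my-pod.txt) keeps a snapshot you can analyze after the pod has changed state.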

Inspecting Logs

Logs are often the first place to look when debugging an issue, as they provide a detailed record of what is happening inside your containers.

  • View logs for a specific pod:

    kubectl logs <pod-name>

    This command retrieves the logs for the default container in the specified pod.

  • View logs for a specific container in a pod:

    kubectl logs <pod-name> -c <container-name>

    This is useful when a pod has multiple containers, and you need to focus on one specific container.

  • Stream logs in real-time:

    kubectl logs -f <pod-name>

    The -f (follow) flag streams the logs in real-time, which is helpful for observing ongoing processes or debugging issues as they occur.
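
On a busy pod the full log can be long, and after a crash the useful output usually sits in the previous container instance. The sketch below wraps the --tail, --timestamps, and --previous flags of kubectl logs; the function name recent_logs and its argument order are assumptions made for this example.

```shell
# Hypothetical helper: print recent log lines for a pod, optionally from
# the previous (terminated) container instance.
# Usage: recent_logs <pod-name> [lines] [previous]
recent_logs() {
  pod="$1"
  lines="${2:-50}"   # assumption: 50 lines is enough for a first look
  if [ "${3:-}" = "previous" ]; then
    # --previous reads the log of the last terminated container instance
    kubectl logs "$pod" --previous --tail="$lines" --timestamps
  else
    kubectl logs "$pod" --tail="$lines" --timestamps
  fi
}
```

For example, recent_logs my-pod 200 previous would show the last 200 lines written before the most recent restart.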

Viewing Events

Kubernetes records events that provide insights into what is happening in your cluster. Events can help identify issues such as pod scheduling failures, container crashes, or configuration errors.

  • List all events in a namespace:

    kubectl get events

    This command lists events in the current namespace. The output is not guaranteed to be in chronological order; add --sort-by=.metadata.creationTimestamp to sort by creation time, oldest first.

  • Filter events related to a specific resource:

    kubectl get events --field-selector involvedObject.name=<resource-name>

    This command filters events to show only those related to a specific resource, such as a particular pod or deployment.

  • Describe a resource to see related events:

    kubectl describe <resource-type> <resource-name>

    The output includes a section listing recent events related to the resource, such as failed deployments, restarts, or other issues.
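
When a namespace produces many events, narrowing the list to warnings and sorting it by time often makes failures easier to spot. The following is a minimal sketch; the function name warning_events is hypothetical, and it assumes you want events from a single namespace.

```shell
# Hypothetical helper: list Warning events in a namespace, oldest first.
# Usage: warning_events [namespace]
warning_events() {
  ns="${1:-default}"   # assumption: fall back to the "default" namespace
  kubectl get events -n "$ns" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}
```

Because the sort is ascending, the most recent warnings appear at the bottom of the output.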

Accessing Container Terminals

Sometimes you need to interact directly with a running container to inspect its environment, run diagnostic commands, or investigate issues.

  • Access a shell inside a container:

    kubectl exec -it <pod-name> -- /bin/sh

    This command opens an interactive terminal session inside the container. Replace /bin/sh with /bin/bash if the container has Bash installed.

  • Run a specific command inside a container:

    kubectl exec <pod-name> -- <command>

    This is useful for running one-off diagnostic commands without entering an interactive shell.
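
Since not every image ships Bash, one option is to let the container decide which shell to start. The sketch below assumes the image contains at least sh; the function name pod_shell and the optional container argument are illustrative choices, not part of kubectl.

```shell
# Hypothetical helper: open an interactive shell in a pod, preferring bash
# and falling back to sh. Assumes the container image provides sh.
# Usage: pod_shell <pod-name> [container-name]
pod_shell() {
  pod="$1"
  container="${2:-}"
  # This script runs inside the container: use bash if present, else sh.
  inner='if command -v bash >/dev/null 2>&1; then exec bash; else exec sh; fi'
  if [ -n "$container" ]; then
    kubectl exec -it "$pod" -c "$container" -- sh -c "$inner"
  else
    kubectl exec -it "$pod" -- sh -c "$inner"
  fi
}
```

Images built without any shell will reject this command, so treat a failure here as information about the image rather than about the cluster.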

Checking Resource Status

Understanding the status of your Kubernetes resources is key to identifying where issues might be occurring.

  • Get the status of all pods in a namespace:

    kubectl get pods

    This command lists all pods, showing their current status (e.g., Running, Pending, CrashLoopBackOff).

  • Describe a specific pod:

    kubectl describe pod <pod-name>

    This provides detailed information about the pod, including its state, events, and the status of each container.

  • Check node status:

    kubectl get nodes

    This command lists all nodes in your cluster and their status (e.g., Ready, NotReady).

  • Describe a specific node:

    kubectl describe node <node-name>

    This command provides detailed information about the node, including its conditions, capacity, allocated resources (the requests and limits of the pods scheduled on it), running pods, and taints.
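
In a namespace with many pods, it can help to filter the status listing down to the pods that need attention. The sketch below parses the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE); the function name list_unhealthy_pods and the choice to treat Running and Completed as healthy are assumptions for this example.

```shell
# Hypothetical helper: print pods whose STATUS is neither Running nor
# Completed. Relies on the default columns: NAME READY STATUS RESTARTS AGE.
# Usage: list_unhealthy_pods [namespace]
list_unhealthy_pods() {
  ns="${1:-default}"   # assumption: fall back to the "default" namespace
  kubectl get pods -n "$ns" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

A pod can report Running while its containers are not yet ready, so this filter is a starting point and does not replace checking the READY column.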

Monitoring Resource Usage

Monitoring CPU and memory usage can help you identify performance issues or resource constraints that may be affecting your applications.

  • Monitor CPU and memory usage of pods:

    kubectl top pod

    This command displays the current CPU and memory usage of all pods in the current namespace. It requires a Metrics API provider, such as Metrics Server, to be running in the cluster.

  • Monitor CPU and memory usage of nodes:

    kubectl top node

    This command displays the current CPU and memory usage for each node in the cluster.
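
To find the heaviest consumers quickly, kubectl top can sort its own output. The following is a minimal sketch using the --sort-by and --no-headers flags; the function name top_memory_pods and the default of five results are assumptions.

```shell
# Hypothetical helper: show the N pods using the most memory in a namespace.
# Usage: top_memory_pods [count] [namespace]
top_memory_pods() {
  count="${1:-5}"      # assumption: five rows is a useful default
  ns="${2:-default}"
  kubectl top pod -n "$ns" --sort-by=memory --no-headers | head -n "$count"
}
```

Replacing memory with cpu in the --sort-by value ranks pods by CPU usage instead.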

Common Troubleshooting Scenarios

  1. Pods in CrashLoopBackOff State:

    • Steps:

      • Check the logs of the affected pod using kubectl logs. Add --previous to read the output of the last crashed container instance.

      • Describe the pod to view events that might indicate why the pod is crashing.

      • Access the container terminal to inspect the environment or run diagnostic commands. This only works while the container is running, so it may fail between restarts.

  2. Pods Stuck in Pending State:

    • Steps:

      • Describe the pod to identify why it’s pending (e.g., insufficient resources, unsatisfied node affinity).

      • Check the status of the nodes to ensure they are ready and have sufficient resources.

  3. Service Not Accessible:

    • Steps:

      • Ensure the service is correctly configured by describing it.

      • Check the endpoints associated with the service using kubectl get endpoints.

      • Verify that the pods selected by the service are running and healthy.

  4. High Resource Usage:

    • Steps:

      • Use kubectl top pod and kubectl top node to monitor resource usage.

      • Adjust resource requests and limits in your pod specs to ensure pods get the resources they need without overwhelming the cluster.
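
The scenarios above amount to a lookup from a pod's status to a first diagnostic command. The sketch below encodes that lookup; the function name suggest_next_step is hypothetical, and the ImagePullBackOff and ErrImagePull cases are additions that this section does not cover.

```shell
# Hypothetical helper: map a pod STATUS value to a suggested first command.
# Usage: suggest_next_step <status> <pod-name>
suggest_next_step() {
  status="$1"
  pod="$2"
  case "$status" in
    CrashLoopBackOff|Error)
      # The crashed instance's output is usually the fastest clue.
      echo "kubectl logs $pod --previous" ;;
    Pending|ImagePullBackOff|ErrImagePull)
      # Scheduling and image pull problems show up in the pod's events.
      echo "kubectl describe pod $pod" ;;
    Running)
      # A running pod with problems is often resource constrained.
      echo "kubectl top pod $pod" ;;
    *)
      echo "kubectl describe pod $pod" ;;
  esac
}
```

The function only prints a suggestion; it does not run anything against the cluster, which keeps it safe to use in notes or runbooks.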

Best Practices for Debugging and Troubleshooting

  • Automate Monitoring: Use monitoring tools such as Prometheus and Grafana, alongside Kubernetes-native components such as Metrics Server, to automate resource usage tracking and alerting.

  • Regularly Review Logs and Events: Keep an eye on logs and events to catch issues early, before they escalate into bigger problems.

  • Document Issues and Resolutions: Maintain a log of common issues and their resolutions to speed up troubleshooting in the future.


Kubectl Commands

Inspecting Logs

  • View logs for a specific pod:

    kubectl logs <pod-name>

    Retrieves the logs for the default container in the specified pod.

  • View logs for a specific container in a pod:

    kubectl logs <pod-name> -c <container-name>

    Retrieves logs for a specific container within a pod that has multiple containers.

  • Stream logs in real-time (follow mode):

    kubectl logs -f <pod-name>

    Streams the logs in real-time, useful for observing ongoing processes.

Viewing Events

  • List all events in a namespace:

    kubectl get events

    Lists events in the current namespace. The default order is not guaranteed to be chronological; add --sort-by=.metadata.creationTimestamp to sort by creation time.

  • Filter events related to a specific resource:

    kubectl get events --field-selector involvedObject.name=<resource-name>

    Shows only events related to a specified resource, such as a particular pod.

  • Describe a resource to see related events:

    kubectl describe <resource-type> <resource-name>

    Provides detailed information about the resource, including events related to it.

Accessing Container Terminals

  • Access a shell inside a container:

    kubectl exec -it <pod-name> -- /bin/sh

    Opens an interactive shell session inside the container. Replace /bin/sh with /bin/bash if Bash is installed in the container.

  • Run a specific command inside a container:

    kubectl exec <pod-name> -- <command>

    Executes a command inside the specified container.

Checking Resource Status

  • Get the status of all pods in a namespace:

    kubectl get pods

    Lists all pods and their current status (Running, Pending, CrashLoopBackOff, etc.).

  • Describe a specific pod:

    kubectl describe pod <pod-name>

    Provides detailed information about the specified pod, including its current state and events.

  • Check node status:

    kubectl get nodes

    Lists all nodes in the cluster and their status (Ready, NotReady, etc.).

  • Describe a specific node:

    kubectl describe node <node-name>

    Provides detailed information about the specified node, including its conditions, capacity, allocated resources, running pods, and taints.

Monitoring Resource Usage

  • Monitor CPU and memory usage of pods:

    kubectl top pod

    Displays the current CPU and memory usage of all pods in the current namespace (requires a Metrics API provider such as Metrics Server).

  • Monitor CPU and memory usage of nodes:

    kubectl top node

    Displays the current CPU and memory usage for each node in the cluster.

Troubleshooting Specific Scenarios

  • Investigate a pod in CrashLoopBackOff state:

    1. View logs:

      kubectl logs <pod-name>
    2. Describe the pod:

      kubectl describe pod <pod-name>
    3. Access the container:

      kubectl exec -it <pod-name> -- /bin/sh
  • Check why a pod is stuck in Pending state:

    1. Describe the pod:

      kubectl describe pod <pod-name>
    2. Check node status:

      kubectl get nodes
  • Investigate why a service is not accessible:

    1. Describe the service:

      kubectl describe service <service-name>
    2. Check associated endpoints:

      kubectl get endpoints <service-name>
    3. Check the status of related pods:

      kubectl get pods -l <label-selector>
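
For the service scenario, an empty endpoints list is the most common sign that the selector matches no ready pods. The sketch below checks for that condition with a JSONPath query; the function name service_has_endpoints and its messages are assumptions made for illustration.

```shell
# Hypothetical helper: succeed if the service has at least one ready
# endpoint address, otherwise print a hint and return non-zero.
# Usage: service_has_endpoints <service-name> [namespace]
service_has_endpoints() {
  svc="$1"
  ns="${2:-default}"   # assumption: fall back to the "default" namespace
  # Ready pod IPs are listed under .subsets[].addresses[] of the Endpoints object.
  ips="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "no ready endpoints for $svc; check the selector and pod readiness"
    return 1
  fi
}
```

If the function reports no ready endpoints, comparing the selector shown by kubectl describe service with the labels on the pods is usually the next step.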

General Debugging Tools

  • Run a probe manually to check pod health:

    kubectl exec <pod-name> -- curl -f http://localhost:<port>/<path>

    Sends the same kind of HTTP request that an HTTP readiness or liveness probe would, from inside the container. This requires curl to be present in the container image.

  • Check the cluster's DNS resolution:

    kubectl exec -it <pod-name> -- nslookup <service-name>

    Verify DNS resolution within the cluster.
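
A short service name only resolves from pods in the same namespace, so testing with the fully qualified name can rule out search-path issues. The sketch below assumes the default cluster domain cluster.local and an image that includes nslookup; the function names service_fqdn and check_service_dns are hypothetical.

```shell
# Hypothetical helper: build the fully qualified DNS name for a service.
# Assumes the default cluster domain "cluster.local" unless overridden.
# Usage: service_fqdn <service-name> [namespace] [cluster-domain]
service_fqdn() {
  echo "$1.${2:-default}.svc.${3:-cluster.local}"
}

# Hypothetical helper: resolve a service by FQDN from inside a pod.
# Usage: check_service_dns <pod-name> <service-name> [namespace]
check_service_dns() {
  kubectl exec "$1" -- nslookup "$(service_fqdn "$2" "${3:-default}")"
}
```

For example, check_service_dns my-pod my-service my-namespace looks up my-service.my-namespace.svc.cluster.local from inside my-pod.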


Summary

These commands provide you with a comprehensive toolkit for debugging and troubleshooting issues in a Kubernetes environment, enabling you to quickly diagnose and resolve problems.

By mastering these debugging and troubleshooting techniques, you'll be well-equipped to maintain the health and performance of your Kubernetes clusters, ensuring that your applications run smoothly and reliably.
