Optimizing Resource Limits and Pod Distribution for ArgoCD
Overview
This guide details the process of optimizing an ArgoCD deployment in Kubernetes by setting resource requests and limits for each component and distributing pods across the worker nodes. These optimizations improve the reliability, performance, and high availability of the ArgoCD installation.
Initial Environment Analysis
Cluster Configuration
- Kubernetes cluster (K3s v1.31.6+k3s1)
- Architecture:
  - 1 master node (control plane)
  - 2 worker nodes
- Initial state: pods distributed across all nodes, including the master
Master Node Protection
Implementing Control Plane Isolation
# Add the control-plane taint to the master node (K3s does not taint its server node by default)
kubectl taint nodes <master-node> node-role.kubernetes.io/control-plane=:NoSchedule
Purpose of Master Node Taint
- Prevents regular workloads from scheduling on the control plane
- Reserves master node resources for critical cluster operations
- Enforces best practices for Kubernetes architecture
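With the taint applied, verify it before rescheduling anything; the output should include the node-role.kubernetes.io/control-plane=:NoSchedule entry added above. Note that NoSchedule only affects new scheduling decisions: pods already running on the master stay there until they are deleted or rescheduled.
# Confirm the taint is in place (replace <master-node> with the node's name)
kubectl describe node <master-node> | grep Taints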
Resource Limits Configuration
Resource Allocation by Component
Application Controller
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
API Server
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 128Mi
Repository Server
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi
Dex Server
resources:
  requests:
    cpu: 50m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
Redis
resources:
  requests:
    cpu: 50m
    memory: 32Mi
  limits:
    cpu: 100m
    memory: 64Mi
ApplicationSet Controller
resources:
  requests:
    cpu: 50m
    memory: 32Mi
  limits:
    cpu: 200m
    memory: 128Mi
Notifications Controller
resources:
  requests:
    cpu: 50m
    memory: 32Mi
  limits:
    cpu: 100m
    memory: 128Mi
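How these values get applied depends on the installation method (Helm values, Kustomize patches, or direct edits). As one minimal sketch, kubectl set resources can apply them in place; the workload names below assume the upstream ArgoCD manifests, where the application controller runs as a StatefulSet:
# Repo server (Deployment)
kubectl -n argocd set resources deployment argocd-repo-server \
  --requests=cpu=50m,memory=64Mi --limits=cpu=200m,memory=256Mi
# Application controller (StatefulSet in recent ArgoCD versions)
kubectl -n argocd set resources statefulset argocd-application-controller \
  --requests=cpu=100m,memory=128Mi --limits=cpu=500m,memory=512Mi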
Pod Distribution Strategy
Anti-Affinity Rules
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - <component-name>
        topologyKey: kubernetes.io/hostname
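This block belongs under spec.template.spec of each component's Deployment (or StatefulSet), with <component-name> replaced by that component's app.kubernetes.io/name label value. An abridged example for the repo server (only the relevant fields shown):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - argocd-repo-server
              topologyKey: kubernetes.io/hostname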
Final Pod Distribution
Worker Node 1:
- argocd-repo-server
- argocd-server
Worker Node 2:
- argocd-application-controller
- argocd-applicationset-controller
- argocd-dex-server
- argocd-notifications-controller
- argocd-redis
Optimization Results
Resource Management Benefits
- Predictable resource usage
- Prevention of resource contention
- Efficient resource allocation
- Protection against memory/CPU exhaustion
High Availability Improvements
- Proper workload distribution
- Enhanced fault tolerance
- Better resource utilization
- Reduced single point of failure risk
Performance Metrics
- CPU usage stays within the defined limits
- Memory consumption is bounded instead of growing unchecked
- More consistent response times under load
- Better overall cluster stability
Monitoring and Maintenance
Health Checks
# View pod distribution
kubectl get pods -n argocd -o wide
# Check resource usage
kubectl top pods -n argocd
Resource Adjustment
Monitor resource usage and adjust limits based on:
- Actual usage patterns
- Application load
- Cluster capacity
- Performance requirements
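When monitoring shows a component consistently running near (or far below) its limits, adjust it in place; a sketch, with illustrative values rather than recommendations:
# Example: raise the repo server's memory limit after observing sustained usage near 256Mi
kubectl -n argocd set resources deployment argocd-repo-server --limits=memory=512Mi
# Watch the rollout and confirm the new limit
kubectl -n argocd rollout status deployment argocd-repo-server
kubectl -n argocd get pod -l app.kubernetes.io/name=argocd-repo-server \
  -o jsonpath='{.items[0].spec.containers[0].resources.limits.memory}'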
Best Practices Summary
1. Control Plane Protection
   - Master node tainted
   - Workloads on worker nodes only
   - Critical services protected
2. Resource Management
   - Appropriate limits set
   - Resources allocated based on component needs
   - Buffer for spike handling
3. High Availability
   - Pod anti-affinity rules
   - Distributed workload
   - Failure domain isolation
Advanced Troubleshooting
Resource Management Issues
Pod Eviction Problems
# Check node resources
kubectl describe node <node-name>
# View pod resource usage
kubectl top pods -n argocd --containers
# Check eviction events
kubectl get events -n argocd --field-selector reason=Evicted
Common causes:
- Node memory pressure (the most frequent eviction trigger)
- Disk/storage pressure on the node
- PID pressure
Note that CPU is a compressible resource: CPU contention causes throttling, not eviction.
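To see whether a node is actually reporting pressure, inspect its conditions directly:
# Show node conditions (MemoryPressure, DiskPressure, PIDPressure, Ready)
kubectl describe node <node-name> | grep -A 8 "Conditions:"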
Resource Limit Violations
# Check pod QoS class
kubectl get pods -n argocd -o custom-columns="NAME:.metadata.name,QOS:.status.qosClass"
# View resource quotas
kubectl describe resourcequota -n argocd
# Monitor resource usage trends
kubectl top pods -n argocd --containers --sort-by=cpu
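QoS class matters here because it determines eviction order under node pressure: BestEffort pods go first, then Burstable, and Guaranteed last. With the values configured above, the ArgoCD pods are Burstable (requests lower than limits). Setting requests equal to limits on every container would make a pod Guaranteed, for example:
# Illustrative only: requests == limits on every container yields the Guaranteed QoS class
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi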
Pod Distribution Issues
Node Affinity Problems
# Check node labels
kubectl get nodes --show-labels
# Verify pod placement
kubectl get pods -n argocd -o wide
# Debug scheduler decisions
kubectl describe pod <pod-name> -n argocd | grep -A 10 Events
Taint and Toleration Issues
# List node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Check pod tolerations
kubectl get pod <pod-name> -n argocd -o jsonpath='{.spec.tolerations}'
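If a pod unexpectedly lands on the tainted master node, it usually carries a matching toleration. For reference, a toleration that would match the taint added earlier looks like this (shown for diagnosis, not something to add to ArgoCD workloads in this setup):
tolerations:
- key: node-role.kubernetes.io/control-plane
  operator: Exists
  effect: NoSchedule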
K3s-Specific Optimizations
Worker Node Management
# Check K3s agent status (worker nodes; the server node runs the "k3s" unit instead)
sudo systemctl status k3s-agent
# View K3s agent logs
sudo journalctl -u k3s-agent
# Monitor node readiness
kubectl get nodes -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status'
Storage Issues
# Check default storage class
kubectl get sc
# View persistent volumes
kubectl get pv,pvc -n argocd
# Monitor storage and other allocated resources per node
kubectl describe nodes | grep -A 8 "Allocated resources"
High Availability Troubleshooting
Pod Distribution Check
# View pod spread
kubectl get pods -n argocd -o wide --sort-by=.spec.nodeName
# Check anti-affinity rules
kubectl get pods -n argocd -o yaml | grep -A 10 affinity
Load Balancing Issues
# Check service endpoints
kubectl get endpoints -n argocd
# View service distribution
kubectl describe svc argocd-server -n argocd
Performance Optimization
CPU Optimization
# Compare configured CPU requests and limits per pod
kubectl get pods -n argocd -o custom-columns="NAME:.metadata.name,CPU_REQUESTS:.spec.containers[*].resources.requests.cpu,CPU_LIMITS:.spec.containers[*].resources.limits.cpu"
# Check CPU pressure
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
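The commands above show configured values and aggregate allocation; actual throttling shows up in cAdvisor metrics. If Prometheus is scraping the cluster, a PromQL sketch using the standard cAdvisor metric names:
# Fraction of CPU periods in which argocd containers were throttled, over 5 minutes
rate(container_cpu_cfs_throttled_periods_total{namespace="argocd"}[5m])
  / rate(container_cpu_cfs_periods_total{namespace="argocd"}[5m])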
Memory Management
# View memory usage
kubectl top pods -n argocd --containers
# Check memory limits
kubectl get pods -n argocd -o custom-columns="NAME:.metadata.name,MEMORY_REQUESTS:.spec.containers[*].resources.requests.memory,MEMORY_LIMITS:.spec.containers[*].resources.limits.memory"
Network Optimization
Network Policy Verification
# List network policies
kubectl get networkpolicies -n argocd
# Test network connectivity
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
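From inside the netshoot shell, DNS resolution and service reachability can be checked directly; the service name below assumes the default ArgoCD install:
# Resolve and reach the ArgoCD API server service from inside the cluster
nslookup argocd-server.argocd.svc.cluster.local
curl -k https://argocd-server.argocd.svc.cluster.local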
Service Mesh Integration
# Check proxy sidecars
kubectl get pods -n argocd -o jsonpath='{.items[*].spec.containers[*].name}'
# Verify mesh configuration
kubectl describe pod -n argocd -l app.kubernetes.io/name=argocd-server
Recovery and Rollback
Configuration Backup
# Backup resource configurations
kubectl get deployment,statefulset -n argocd -o yaml > argocd-workloads.yaml
# Export current settings
kubectl get configmap -n argocd -o yaml > argocd-config.yaml
Rollback Procedures
# Revert to previous configuration
kubectl apply -f argocd-workloads.yaml
# Verify rollback
kubectl get pods -n argocd
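Before re-applying a backup, kubectl diff previews what would change, which confirms the rollback touches only the expected objects:
# Preview differences between the backup and the live cluster state
kubectl diff -f argocd-workloads.yaml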
Monitoring and Alerts
Prometheus Integration
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
spec:
  endpoints:
  - port: metrics
    interval: 30s
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
Alert Configuration
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
spec:
  groups:
  - name: argocd
    rules:
    - alert: HighResourceUsage
      # PromQL has no "1Gi" literal; use the byte value (1 GiB = 1073741824 bytes)
      expr: container_memory_usage_bytes{namespace="argocd"} > 1073741824
      for: 5m
      labels:
        severity: warning
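After applying these manifests, confirm the objects exist and were picked up (assumes the Prometheus Operator CRDs are installed):
# Verify the monitoring objects
kubectl get servicemonitors,prometheusrules -n argocd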
Conclusion
These optimizations ensure a robust, reliable, and efficient ArgoCD deployment. The configuration provides:
- Protected control plane
- Efficient resource utilization
- High availability
- Proper workload distribution
- Sustainable scaling capability
Regular monitoring and adjustment of these settings ensure continued optimal performance as your GitOps implementation grows.
Return to the main guide for complete ArgoCD implementation documentation.