Introduction
Running PostgreSQL on Kubernetes sounds simple until you need high availability, connection pooling, automated backups, and Prometheus metrics — all working together from day one. Vanilla PostgreSQL StatefulSets leave you stitching together PgBouncer, WAL-G, and custom exporters by hand.
StackGres is an open-source Kubernetes operator from OnGres that packages all of this into a single, declarative API. It wraps PostgreSQL in a sidecar-based architecture where every pod runs the database alongside PgBouncer, Envoy, and a Prometheus exporter — all managed through custom resources.
This guide walks through a production-ready StackGres deployment: from operator installation to cluster creation, connection pooling, object-storage backups, monitoring, and zero-downtime operator upgrades.
Why StackGres Over Vanilla PostgreSQL on Kubernetes
A plain PostgreSQL StatefulSet gives you a running database. StackGres gives you a running database plus:
- Automatic failover via Patroni — primary election happens without manual intervention
- Connection pooling via PgBouncer, configured as a first-class SGPoolingConfig resource
- WAL archiving and PITR backups to S3, GCS, or Azure Blob via SGObjectStorage
- Prometheus metrics exposed out of the box on every pod
- Declarative configuration — PostgreSQL postgresql.conf parameters live in SGPostgresConfig CRs
- Minor and major version upgrades handled by the operator with rolling restarts
The trade-off is complexity: StackGres adds several CRDs and a more opinionated pod structure. For a single-instance dev database, it's overkill. For anything you'd page someone at 3am over, it's worth it.
Prerequisites
Infrastructure:
- Kubernetes 1.25+ (K3s, kubeadm, EKS, GKE, AKS all work)
- kubectl with cluster-admin access
- helm v3.12+
- A default StorageClass that supports ReadWriteOnce PVCs
- An S3-compatible bucket (AWS S3, MinIO, Backblaze B2) for backups
Knowledge:
- Basic Kubernetes concepts (StatefulSets, Services, Secrets, CRDs)
- Familiarity with PostgreSQL configuration parameters
- Basic understanding of WAL archiving
Naming convention used in this guide:
Operator namespace: stackgres
Cluster namespace: postgres
Cluster name: prod-pg
PostgreSQL version: 16
Installing the StackGres Operator
StackGres ships as a Helm chart. The operator runs in its own namespace and watches all namespaces by default.
helm repo add stackgres-charts https://stackgres.io/downloads/stackgres-k8s/stackgres/helm/
helm repo update
helm install stackgres-operator stackgres-charts/stackgres-operator \
--namespace stackgres \
--create-namespace \
--version 1.12.0 \
--set adminui.service.type=ClusterIP
The adminui.service.type=ClusterIP flag keeps the web UI internal — you can port-forward to it later. Avoid LoadBalancer unless you've secured it with authentication.
Verify the Operator
kubectl -n stackgres get pods
# NAME READY STATUS RESTARTS AGE
# stackgres-operator-xxxxxxxxx-xxxxx 1/1 Running 0 90s
# stackgres-restapi-xxxxxxxxx-xxxxx 1/1 Running 0 90s
kubectl get crd | grep stackgres
# sgclusters.stackgres.io
# sginstanceprofiles.stackgres.io
# sgpoolconfigs.stackgres.io
# sgpostgresconfigs.stackgres.io
# sgbackupconfigs.stackgres.io
# sgbackups.stackgres.io
# sgdbops.stackgres.io
# ... (several more)
Creating an SGCluster
The SGCluster resource is the core of StackGres. It declares the PostgreSQL version, instance count, storage, and references to configuration objects.
Create the Namespace
kubectl create namespace postgres
Define an Instance Profile
SGInstanceProfile maps to CPU and memory requests/limits for each pod:
# sg-instance-profile.yaml
apiVersion: stackgres.io/v1
kind: SGInstanceProfile
metadata:
  name: size-small
  namespace: postgres
spec:
  cpu: "2"
  memory: "4Gi"
kubectl apply -f sg-instance-profile.yaml
Define a PostgreSQL Configuration
SGPostgresConfig holds postgresql.conf parameters. StackGres merges these with its own required settings:
# sg-postgres-config.yaml
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: pg16-config
  namespace: postgres
spec:
  postgresVersion: "16"
  postgresql.conf:
    max_connections: "200"
    shared_buffers: "1GB"
    effective_cache_size: "3GB"
    maintenance_work_mem: "256MB"
    checkpoint_completion_target: "0.9"
    wal_buffers: "16MB"
    default_statistics_target: "100"
    random_page_cost: "1.1"
    effective_io_concurrency: "200"
    work_mem: "5242kB"
    min_wal_size: "1GB"
    max_wal_size: "4GB"
    max_worker_processes: "4"
    max_parallel_workers_per_gather: "2"
    max_parallel_workers: "4"
    max_parallel_maintenance_workers: "2"
    log_min_duration_statement: "1000"
    log_checkpoints: "on"
    log_connections: "on"
    log_disconnections: "on"
    log_lock_waits: "on"
kubectl apply -f sg-postgres-config.yaml
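The values in pg16-config follow common pgtune-style heuristics for the 4Gi size-small profile. Here's a quick sketch of those rules of thumb — the formulas are community conventions for a mixed workload, not something StackGres computes for you:

```python
# Rough pgtune-style heuristics; these formulas are common community rules
# of thumb, not values StackGres derives automatically.
def pg_memory_settings(ram_gb: int, max_connections: int = 200) -> dict:
    ram_mb = ram_gb * 1024
    shared_buffers_mb = ram_mb // 4          # ~25% of RAM
    effective_cache_mb = ram_mb * 3 // 4     # ~75% of RAM
    # work_mem: spread the remainder across connections with headroom
    work_mem_kb = (ram_mb - shared_buffers_mb) * 1024 // (max_connections * 3)
    return {
        "shared_buffers": f"{shared_buffers_mb // 1024}GB",
        "effective_cache_size": f"{effective_cache_mb // 1024}GB",
        "work_mem": f"{work_mem_kb}kB",
    }

print(pg_memory_settings(4))
# {'shared_buffers': '1GB', 'effective_cache_size': '3GB', 'work_mem': '5242kB'}
```

Plugging in the 4Gi profile reproduces the shared_buffers, effective_cache_size, and work_mem values used above; rerun the heuristics whenever you resize the SGInstanceProfile.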
Deploy the SGCluster
# sg-cluster.yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: prod-pg
  namespace: postgres
spec:
  postgres:
    version: "16"
  instances: 3
  sgInstanceProfile: size-small
  configurations:
    sgPostgresConfig: pg16-config
  pods:
    persistentVolume:
      size: "50Gi"
      storageClass: standard
  replication:
    mode: async
  prometheusAutobind: true
kubectl apply -f sg-cluster.yaml
Watch the cluster come up — StackGres initialises the primary first, then streams replicas:
kubectl -n postgres get pods -w
# NAME READY STATUS RESTARTS AGE
# prod-pg-0 5/5 Running 0 3m
# prod-pg-1 5/5 Running 0 4m
# prod-pg-2 5/5 Running 0 5m
Each pod runs five containers: postgres, pgbouncer, envoy, postgres-util, and fluent-bit. That's the StackGres sidecar model.
StackGres creates two Services automatically:
kubectl -n postgres get svc
# NAME TYPE CLUSTER-IP PORT(S) AGE
# prod-pg ClusterIP 10.96.10.20 5432/TCP,5433/TCP 5m
# prod-pg-replicas ClusterIP 10.96.10.21 5432/TCP,5433/TCP 5m
- prod-pg — always points to the current primary (port 5432 direct, 5433 via PgBouncer)
- prod-pg-replicas — load-balances across all replicas
Configuring Connection Pooling
StackGres runs PgBouncer as a sidecar on every pod. You configure it via SGPoolingConfig:
# sg-pooling-config.yaml
apiVersion: stackgres.io/v1
kind: SGPoolingConfig
metadata:
  name: pgbouncer-config
  namespace: postgres
spec:
  pgBouncer:
    pgbouncer.ini:
      pgbouncer:
        pool_mode: transaction
        max_client_conn: "1000"
        default_pool_size: "25"
        reserve_pool_size: "5"
        reserve_pool_timeout: "3"
        server_idle_timeout: "600"
        client_idle_timeout: "0"
        log_connections: "0"
        log_disconnections: "0"
        log_pooler_errors: "1"
        stats_period: "60"
        ignore_startup_parameters: extra_float_digits
Reference it in the cluster:
# Add to sg-cluster.yaml under spec.configurations
configurations:
  sgPostgresConfig: pg16-config
  sgPoolingConfig: pgbouncer-config
kubectl apply -f sg-pooling-config.yaml
kubectl apply -f sg-cluster.yaml
transaction pool mode is the right default for most web applications: connections are returned to the pool after each transaction, not each session. Avoid it if your application depends on session state — advisory locks held across transactions, session-level SET parameters, LISTEN/NOTIFY, or named prepared statements (PgBouncer only supports those in transaction mode since 1.21, via max_prepared_statements).
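To see why session state and transaction pooling don't mix, here's a toy simulation — not PgBouncer code, just an illustration of how per-transaction backend hand-off loses session-level SET:

```python
# Toy model of transaction pooling: consecutive transactions from one client
# can be served by different backend connections, so session-level SET is lost.
class Backend:
    def __init__(self, bid: int):
        self.bid = bid
        self.session_params: dict = {}  # lives on THIS backend only

class TransactionPool:
    """Round-robin stand-in for 'whichever backend is free right now'."""
    def __init__(self, size: int):
        self.backends = [Backend(i) for i in range(size)]
        self._next = 0

    def run_transaction(self, fn):
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return fn(backend)

pool = TransactionPool(size=4)
# Transaction 1: application issues SET search_path = app
pool.run_transaction(lambda b: b.session_params.update({"search_path": "app"}))
# Transaction 2: lands on a different backend where the SET never happened
seen = pool.run_transaction(lambda b: b.session_params.get("search_path"))
print(seen)  # None
```

Session pool mode avoids this by pinning one backend to the client for its whole session, at the cost of far fewer clients per backend.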
Backup Configuration
StackGres uses WAL-G under the hood for continuous WAL archiving and base backups. You configure the storage target via SGObjectStorage and attach a backup schedule to the cluster.
Create Object Storage Credentials
Store your S3 credentials as a Kubernetes Secret:
kubectl -n postgres create secret generic s3-backup-credentials \
--from-literal=accessKeyId=AKIAIOSFODNN7EXAMPLE \
--from-literal=secretAccessKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Define SGObjectStorage
# sg-object-storage.yaml
apiVersion: stackgres.io/v1beta1
kind: SGObjectStorage
metadata:
  name: s3-backup-storage
  namespace: postgres
spec:
  type: s3Compatible
  s3Compatible:
    bucket: my-postgres-backups
    region: us-east-1
    endpoint: https://s3.amazonaws.com
    enablePathStyleAddressing: false
    awsCredentials:
      secretKeySelectors:
        accessKeyId:
          name: s3-backup-credentials
          key: accessKeyId
        secretAccessKey:
          name: s3-backup-credentials
          key: secretAccessKey
kubectl apply -f sg-object-storage.yaml
Configure Cluster Backups
In recent StackGres releases the SGBackupConfig CRD is deprecated; the backup schedule lives directly on the cluster under spec.configurations.backups, referencing the SGObjectStorage defined above:
# Add to sg-cluster.yaml under spec.configurations
configurations:
  sgPostgresConfig: pg16-config
  sgPoolingConfig: pgbouncer-config
  backups:
  - sgObjectStorage: s3-backup-storage
    cronSchedule: "0 2 * * *"
    retention: 7
    compression: lz4
    performance:
      maxNetworkBandwidth: 0
      maxDiskBandwidth: 0
      uploadDiskConcurrency: 1
kubectl apply -f sg-cluster.yaml
This schedules a full base backup at 02:00 UTC daily, retaining the last 7. WAL archiving runs continuously between base backups, enabling point-in-time recovery.
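The retention policy bounds your PITR window: you can only restore to a point covered by a retained base backup plus its WAL. A sketch of the selection logic WAL-G applies — restore the newest base backup at or before the target, then replay WAL forward:

```python
from datetime import datetime, timedelta

# Pick the newest base backup taken at or before the PITR target time.
def pick_base_backup(backup_times, target):
    candidates = [b for b in backup_times if b <= target]
    if not candidates:
        raise ValueError("target predates the oldest retained base backup")
    return max(candidates)

# Daily 02:00 UTC base backups, retention 7 (as configured above)
newest = datetime(2025, 3, 8, 2, 0)
backups = [newest - timedelta(days=i) for i in range(7)]

target = datetime(2025, 3, 5, 14, 30)
print(pick_base_backup(backups, target))  # 2025-03-05 02:00:00
```

With retention 7 and a daily schedule, any target older than roughly a week falls outside the restorable window.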
Monitoring Integration
StackGres exposes Prometheus metrics via a postgres_exporter sidecar on every pod. The prometheusAutobind: true field in the SGCluster spec automatically creates PodMonitor resources if the Prometheus Operator is installed.
Verify Metrics Exposure
# Port-forward to the metrics endpoint on the primary pod
kubectl -n postgres port-forward pod/prod-pg-0 9187:9187
# In another terminal
curl -s http://localhost:9187/metrics | grep pg_up
# pg_up 1
Grafana Dashboard
StackGres maintains an official Grafana dashboard. Import it by ID:
# Dashboard ID: 14114 (StackGres Cluster Overview)
# Import via Grafana UI: Dashboards → Import → Enter ID 14114
Or deploy it as a GrafanaDashboard resource if you're using the Grafana Operator:
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: stackgres-overview
  namespace: monitoring
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  url: "https://grafana.com/api/dashboards/14114/revisions/latest/download"
Key Metrics to Alert On
| Metric | Alert Threshold | Description |
|--------|----------------|-------------|
| pg_up | < 1 | PostgreSQL is unreachable |
| pg_replication_lag | > 30s | Replica falling behind |
| pgbouncer_pools_cl_waiting | > 10 | Clients queued waiting for a connection |
| pg_stat_bgwriter_checkpoint_write_time | trending up | I/O pressure during checkpoints |
| pg_database_size_bytes | > 80% of PVC | Disk space warning |
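If you run the Prometheus Operator, the thresholds above can be encoded as a PrometheusRule. This is a sketch: verify the exact metric names and units against what your exporter version actually exposes (pg_replication_lag is assumed to be in seconds here).

```yaml
# stackgres-alerts.yaml — sketch; adjust metric names to your exporter version
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: stackgres-postgres-alerts
  namespace: monitoring
spec:
  groups:
  - name: postgres
    rules:
    - alert: PostgresDown
      expr: pg_up < 1
      for: 1m
      labels:
        severity: critical
    - alert: ReplicationLagHigh
      expr: pg_replication_lag > 30
      for: 5m
      labels:
        severity: warning
    - alert: PgBouncerClientsWaiting
      expr: pgbouncer_pools_cl_waiting > 10
      for: 5m
      labels:
        severity: warning
```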
Connecting to the Cluster
From Inside the Cluster
Applications connect via the Services StackGres creates. Use port 5433 to go through PgBouncer (recommended), or 5432 to connect directly to PostgreSQL.
# Get the superuser password
kubectl -n postgres get secret prod-pg \
-o jsonpath='{.data.superuser-password}' | base64 -d
# Connect via psql from a debug pod
kubectl -n postgres run psql-debug \
--image=postgres:16 \
--restart=Never \
--rm -it \
-- psql -h prod-pg -p 5433 -U postgres
Connection String Format
# Via PgBouncer (recommended for applications)
postgresql://postgres:<password>@prod-pg.postgres.svc.cluster.local:5433/postgres
# Direct to primary (for admin tasks, migrations)
postgresql://postgres:<password>@prod-pg.postgres.svc.cluster.local:5432/postgres
# Read replica (for read-heavy workloads)
postgresql://postgres:<password>@prod-pg-replicas.postgres.svc.cluster.local:5432/postgres
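One gotcha when substituting the real password into these DSNs: characters like @, :, and / are reserved in URLs and must be percent-encoded. A small helper sketch (build_dsn is a hypothetical name, not a StackGres utility):

```python
from urllib.parse import quote

# Percent-encode the password so reserved URL characters don't break the DSN.
def build_dsn(user, password, host, port, dbname):
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{dbname}"

dsn = build_dsn("app_user", "p@ss:w/rd",
                "prod-pg.postgres.svc.cluster.local", 5433, "app_db")
print(dsn)
# postgresql://app_user:p%40ss%3Aw%2Frd@prod-pg.postgres.svc.cluster.local:5433/app_db
```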
Creating Application Users
Don't use the postgres superuser for application connections. Create a dedicated role:
-- Connect as superuser first
CREATE USER app_user WITH PASSWORD 'strong-password-here';
CREATE DATABASE app_db OWNER app_user;
GRANT CONNECT ON DATABASE app_db TO app_user;
\c app_db
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
Store the credentials as a Kubernetes Secret and reference them in your application Deployment.
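A minimal sketch of that Secret using libpq's standard PG* environment variables — the names app-db-credentials and the stringData values are examples, not anything StackGres creates for you:

```yaml
# app-credentials.yaml — example names; pair with envFrom in your Deployment
apiVersion: v1
kind: Secret
metadata:
  name: app-db-credentials
  namespace: postgres
type: Opaque
stringData:
  PGUSER: app_user
  PGPASSWORD: strong-password-here
  PGHOST: prod-pg.postgres.svc.cluster.local
  PGPORT: "5433"
  PGDATABASE: app_db
```

In the application Deployment, mount it with `envFrom: [{secretRef: {name: app-db-credentials}}]` so libpq-based clients pick the variables up automatically.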
Performing a Manual Backup
Trigger an on-demand backup using an SGBackup resource:
# sg-manual-backup.yaml
apiVersion: stackgres.io/v1
kind: SGBackup
metadata:
  name: manual-backup-20250301
  namespace: postgres
spec:
  sgCluster: prod-pg
  managedLifecycle: false
kubectl apply -f sg-manual-backup.yaml
# Watch the backup progress
kubectl -n postgres get sgbackup manual-backup-20250301 -w
# NAME STATUS BACKUP-PATH AGE
# manual-backup-20250301 Running s3://my-postgres-backups/prod-pg/... 10s
# manual-backup-20250301 Completed s3://my-postgres-backups/prod-pg/... 2m
managedLifecycle: false means this backup won't be deleted when the retention policy runs — useful for pre-migration snapshots.
Restoring from a Backup
Restore by creating a new SGCluster that references the backup as its initial data source:
# sg-cluster-restore.yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: prod-pg-restored
  namespace: postgres
spec:
  postgres:
    version: "16"
  instances: 1
  sgInstanceProfile: size-small
  configurations:
    sgPostgresConfig: pg16-config
  pods:
    persistentVolume:
      size: "50Gi"
  initialData:
    restore:
      fromBackup:
        name: manual-backup-20250301
      downloadDiskConcurrency: 1
kubectl apply -f sg-cluster-restore.yaml
StackGres will restore the base backup and replay WAL to bring the cluster to a consistent state. For point-in-time recovery, add a pointInTimeRecovery block:
initialData:
  restore:
    fromBackup:
      name: manual-backup-20250301
      pointInTimeRecovery:
        restoreToTimestamp: "2025-03-01T14:30:00Z"
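The timestamp must be an ISO 8601 instant; a tiny validator sketch (the strict-UTC requirement here is an assumption — check your StackGres version's CRD reference for the exact accepted format):

```python
from datetime import datetime, timezone

# Parse an ISO 8601 timestamp and require an explicit UTC offset.
def validate_restore_ts(ts: str) -> datetime:
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.utcoffset() != timezone.utc.utcoffset(None):
        raise ValueError("timestamp must carry a UTC offset")
    return dt

print(validate_restore_ts("2025-03-01T14:30:00Z"))
# 2025-03-01 14:30:00+00:00
```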
Upgrading the Operator
StackGres operator upgrades are handled via Helm. Upgrading the operator updates its own Deployment; managed clusters pick up the new components when they are next restarted.
Check the Current Version
helm list -n stackgres
# NAME NAMESPACE REVISION UPDATED STATUS CHART
# stackgres-operator stackgres 1 2025-01-15 10:00:00 UTC deployed stackgres-operator-1.12.0
Upgrade to a New Version
helm repo update
helm upgrade stackgres-operator stackgres-charts/stackgres-operator \
--namespace stackgres \
--version 1.13.0 \
--reuse-values
Watch the operator pod restart:
kubectl -n stackgres get pods -w
# NAME READY STATUS RESTARTS AGE
# stackgres-operator-old-xxxxxxxxx 1/1 Terminating 0 30d
# stackgres-operator-new-xxxxxxxxx 1/1 Running 0 15s
After the operator upgrades, it reconciles all SGCluster resources, but running pods keep the previous sidecar versions until restarted. Trigger the rollout per cluster with an SGDbOps resource using op: securityUpgrade (or op: restart). Major PostgreSQL version upgrades require an SGDbOps resource with op: majorVersionUpgrade.
Major PostgreSQL Version Upgrade
# sg-major-upgrade.yaml
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: pg16-to-pg17-upgrade
  namespace: postgres
spec:
  sgCluster: prod-pg
  op: majorVersionUpgrade
  majorVersionUpgrade:
    postgresVersion: "17"
    link: true
    clone: false
    check: true
kubectl apply -f sg-major-upgrade.yaml
# Monitor the upgrade
kubectl -n postgres get sgdbops pg16-to-pg17-upgrade -w
The check: true flag runs pg_upgrade --check first without making changes — always do this before the real upgrade.
Troubleshooting Common Issues
Pod stuck in Init state:
Check the init container logs. StackGres runs several init containers to set up the data directory and configuration:
kubectl -n postgres logs prod-pg-0 -c setup-scripts --previous
Patroni failover not happening:
Check the Patroni REST API via the postgres-util sidecar:
kubectl -n postgres exec prod-pg-0 -c postgres-util -- \
curl -s http://localhost:8008/cluster | python3 -m json.tool
PgBouncer connection errors: Check PgBouncer stats from inside the pod:
kubectl -n postgres exec prod-pg-0 -c pgbouncer -- \
psql -h /var/run/postgresql -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
Backup failing with S3 permission error:
Verify the IAM policy attached to your access key includes s3:PutObject, s3:GetObject, s3:ListBucket, and s3:DeleteObject on the backup bucket.
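A minimal IAM policy matching that list, scoped to the guide's my-postgres-backups bucket — a sketch; adapt the ARNs to your bucket and add a KMS statement if the bucket uses SSE-KMS:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-postgres-backups"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-postgres-backups/*"
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while the object actions apply to the `/*` object ARN; mixing them in one statement is a common source of AccessDenied errors.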
Production Considerations
Before this setup handles real traffic:
- Resource limits — Set CPU and memory limits in SGInstanceProfile based on your workload. PostgreSQL is memory-hungry; shared_buffers should be around 25% of available RAM.
- Network policies — Restrict which namespaces can reach ports 5432/5433 in the postgres namespace using NetworkPolicy resources.
- Secrets management — Rotate the superuser password via kubectl patch secret and update your application's connection string. Consider the External Secrets Operator to sync credentials from Vault or AWS Secrets Manager.
- PVC expansion — StackGres supports online PVC expansion if your StorageClass allows it. Increase pods.persistentVolume.size in the SGCluster spec and the operator will resize the underlying PVCs.
- Patroni tuning — The default ttl (30s) and loop_wait (10s) values work for most clusters. On high-latency networks, increase ttl to avoid spurious failovers.
- Connection pooling mode — Switch from transaction to session pool mode if your application uses LISTEN/NOTIFY, advisory locks, or session-level SET parameters that must persist across transactions (SET LOCAL is transaction-scoped and works fine under transaction pooling).
Conclusion
StackGres turns PostgreSQL on Kubernetes from a DIY project into a managed service you control. The operator handles the operational complexity — failover, WAL archiving, connection pooling, metrics — while keeping everything declarative and version-controlled in your cluster.
The learning curve is the CRD surface area: SGCluster, SGInstanceProfile, SGPostgresConfig, SGPoolingConfig, SGBackupConfig, SGObjectStorage, SGBackup, SGDbOps. Once you understand what each one does, the system is remarkably coherent.
From here, explore SGDistributedLogs for centralised log aggregation across all your clusters, and SGShardedCluster if you need horizontal sharding. Both follow the same declarative pattern you've already learned.