Introduction
Running PostgreSQL on Kubernetes sounds simple until you need high availability, connection pooling, automated backups, and Prometheus metrics — all working together from day one. Vanilla PostgreSQL StatefulSets leave you stitching together PgBouncer, WAL-G, and custom exporters by hand.
StackGres is an open-source Kubernetes operator from OnGres that packages all of this into a single, declarative API. It wraps PostgreSQL in a sidecar-based architecture where every pod runs the database alongside PgBouncer, Envoy, and a Prometheus exporter — all managed through custom resources.
This guide walks through a production-ready StackGres deployment: from operator installation to cluster creation, connection pooling, object-storage backups, monitoring, and zero-downtime operator upgrades.
Why StackGres Over Vanilla PostgreSQL on Kubernetes
A plain PostgreSQL StatefulSet gives you a running database. StackGres gives you a running database plus:
- Automatic failover via Patroni — primary election happens without manual intervention
- Connection pooling via PgBouncer, configured as a first-class SGPoolingConfig resource
- WAL archiving and PITR backups to S3, GCS, or Azure Blob via SGObjectStorage
- Prometheus metrics exposed out of the box on every pod
- Declarative configuration — PostgreSQL postgresql.conf parameters live in SGPostgresConfig CRs
- Minor and major version upgrades handled by the operator with rolling restarts
The trade-off is complexity: StackGres adds several CRDs and a more opinionated pod structure. For a single-instance dev database, it's overkill. For anything you'd page someone at 3am over, it's worth it.
Prerequisites
Infrastructure:
- Kubernetes 1.25+ (K3s, kubeadm, EKS, GKE, AKS all work)
- kubectl with cluster-admin access
- helm v3.12+
- A default StorageClass that supports ReadWriteOnce PVCs
- An S3-compatible bucket (AWS S3, MinIO, Backblaze B2) for backups
Knowledge:
- Basic Kubernetes concepts (StatefulSets, Services, Secrets, CRDs)
- Familiarity with PostgreSQL configuration parameters
- Basic understanding of WAL archiving
Naming convention used in this guide:
Operator namespace: stackgres
Cluster namespace: postgres
Cluster name: prod-pg
PostgreSQL version: 16
Installing the StackGres Operator
StackGres ships as a Helm chart. The operator runs in its own namespace and watches all namespaces by default.
helm repo add stackgres-charts https://stackgres.io/downloads/stackgres-k8s/stackgres/helm/
helm repo update
helm install stackgres-operator stackgres-charts/stackgres-operator \
--namespace stackgres \
--create-namespace \
--version 1.12.0 \
--set adminui.service.type=ClusterIP
The adminui.service.type=ClusterIP flag keeps the web UI internal — you can port-forward to it later. Avoid LoadBalancer unless you've secured it with authentication.
Verify the Operator
kubectl -n stackgres get pods
# NAME READY STATUS RESTARTS AGE
# stackgres-operator-xxxxxxxxx-xxxxx 1/1 Running 0 90s
# stackgres-restapi-xxxxxxxxx-xxxxx 1/1 Running 0 90s
kubectl get crd | grep stackgres
# sgclusters.stackgres.io
# sginstanceprofiles.stackgres.io
# sgpoolconfigs.stackgres.io
# sgpostgresconfigs.stackgres.io
# sgbackupconfigs.stackgres.io
# sgbackups.stackgres.io
# sgdbops.stackgres.io
# ... (several more)
Creating an SGCluster
The SGCluster resource is the core of StackGres. It declares the PostgreSQL version, instance count, storage, and references to configuration objects.
Create the Namespace
kubectl create namespace postgres
Define an Instance Profile
SGInstanceProfile maps to CPU and memory requests/limits for each pod:
# sg-instance-profile.yaml
apiVersion: stackgres.io/v1
kind: SGInstanceProfile
metadata:
  name: size-small
  namespace: postgres
spec:
  cpu: "2"
  memory: "4Gi"
kubectl apply -f sg-instance-profile.yaml
Define a PostgreSQL Configuration
SGPostgresConfig holds postgresql.conf parameters. StackGres merges these with its own required settings:
# sg-postgres-config.yaml
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: pg16-config
  namespace: postgres
spec:
  postgresVersion: "16"
  postgresql.conf:
    max_connections: "200"
    shared_buffers: "1GB"
    effective_cache_size: "3GB"
    maintenance_work_mem: "256MB"
    checkpoint_completion_target: "0.9"
    wal_buffers: "16MB"
    default_statistics_target: "100"
    random_page_cost: "1.1"
    effective_io_concurrency: "200"
    work_mem: "5242kB"
    min_wal_size: "1GB"
    max_wal_size: "4GB"
    max_worker_processes: "4"
    max_parallel_workers_per_gather: "2"
    max_parallel_workers: "4"
    max_parallel_maintenance_workers: "2"
    log_min_duration_statement: "1000"
    log_checkpoints: "on"
    log_connections: "on"
    log_disconnections: "on"
    log_lock_waits: "on"
kubectl apply -f sg-postgres-config.yaml
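The values in pg16-config follow common pgtune-style heuristics for the 4Gi size-small profile. Here's a quick sketch of those rules of thumb — the formulas are community conventions for a mixed workload, not something StackGres computes for you:

```python
# Rough pgtune-style heuristics; these formulas are common community rules
# of thumb, not values StackGres derives automatically.
def pg_memory_settings(ram_gb: int, max_connections: int = 200) -> dict:
    ram_mb = ram_gb * 1024
    shared_buffers_mb = ram_mb // 4          # ~25% of RAM
    effective_cache_mb = ram_mb * 3 // 4     # ~75% of RAM
    # work_mem: spread the remainder across connections with headroom
    work_mem_kb = (ram_mb - shared_buffers_mb) * 1024 // (max_connections * 3)
    return {
        "shared_buffers": f"{shared_buffers_mb // 1024}GB",
        "effective_cache_size": f"{effective_cache_mb // 1024}GB",
        "work_mem": f"{work_mem_kb}kB",
    }

print(pg_memory_settings(4))
# {'shared_buffers': '1GB', 'effective_cache_size': '3GB', 'work_mem': '5242kB'}
```

Plugging in the 4Gi profile reproduces the shared_buffers, effective_cache_size, and work_mem values used above; rerun the heuristics whenever you resize the SGInstanceProfile.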
Deploy the SGCluster
# sg-cluster.yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: prod-pg
  namespace: postgres
spec:
  postgres:
    version: "16"
  instances: 3
  sgInstanceProfile: size-small
  configurations:
    sgPostgresConfig: pg16-config
  pods:
    persistentVolume:
      size: "50Gi"
      storageClass: standard
  replication:
    mode: async
  prometheusAutobind: true
kubectl apply -f sg-cluster.yaml
Watch the cluster come up — StackGres initialises the primary first, then streams replicas:
kubectl -n postgres get pods -w
# NAME READY STATUS RESTARTS AGE
# prod-pg-0 5/5 Running 0 3m
# prod-pg-1 5/5 Running 0 4m
# prod-pg-2 5/5 Running 0 5m
Each pod runs five containers: postgres, pgbouncer, envoy, postgres-util, and fluent-bit. That's the StackGres sidecar model.
StackGres creates two Services automatically:
kubectl -n postgres get svc
# NAME TYPE CLUSTER-IP PORT(S) AGE
# prod-pg ClusterIP 10.96.10.20 5432/TCP,5433/TCP 5m
# prod-pg-replicas ClusterIP 10.96.10.21 5432/TCP,5433/TCP 5m
- prod-pg — always points to the current primary (port 5432 direct, 5433 via PgBouncer)
- prod-pg-replicas — load-balances across all replicas
Configuring Connection Pooling
StackGres runs PgBouncer as a sidecar on every pod. You configure it via SGPoolingConfig:
# sg-pooling-config.yaml
apiVersion: stackgres.io/v1
kind: SGPoolingConfig
metadata:
  name: pgbouncer-config
  namespace: postgres
spec:
  pgBouncer:
    pgbouncer.ini:
      pgbouncer:
        pool_mode: transaction
        max_client_conn: "1000"
        default_pool_size: "25"
        reserve_pool_size: "5"
        reserve_pool_timeout: "3"
        server_idle_timeout: "600"
        client_idle_timeout: "0"
        log_connections: "0"
        log_disconnections: "0"
        log_pooler_errors: "1"
        stats_period: "60"
        ignore_startup_parameters: extra_float_digits
Reference it in the cluster:
# Add to sg-cluster.yaml under spec.configurations
configurations:
  sgPostgresConfig: pg16-config
  sgPoolingConfig: pgbouncer-config
kubectl apply -f sg-pooling-config.yaml
kubectl apply -f sg-cluster.yaml
transaction pool mode is the right default for most web applications: connections are returned to the pool after each transaction, not each session. Avoid it if your application depends on session state — advisory locks held across transactions, session-level SET parameters, LISTEN/NOTIFY, or named prepared statements (PgBouncer only supports those in transaction mode since 1.21, via max_prepared_statements).
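To see why session state and transaction pooling don't mix, here's a toy simulation — not PgBouncer code, just an illustration of how per-transaction backend hand-off loses session-level SET:

```python
# Toy model of transaction pooling: consecutive transactions from one client
# can be served by different backend connections, so session-level SET is lost.
class Backend:
    def __init__(self, bid: int):
        self.bid = bid
        self.session_params: dict = {}  # lives on THIS backend only

class TransactionPool:
    """Round-robin stand-in for 'whichever backend is free right now'."""
    def __init__(self, size: int):
        self.backends = [Backend(i) for i in range(size)]
        self._next = 0

    def run_transaction(self, fn):
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        return fn(backend)

pool = TransactionPool(size=4)
# Transaction 1: application issues SET search_path = app
pool.run_transaction(lambda b: b.session_params.update({"search_path": "app"}))
# Transaction 2: lands on a different backend where the SET never happened
seen = pool.run_transaction(lambda b: b.session_params.get("search_path"))
print(seen)  # None
```

Session pool mode avoids this by pinning one backend to the client for its whole session, at the cost of far fewer clients per backend.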
Backup Configuration
StackGres uses WAL-G under the hood for continuous WAL archiving and base backups. You configure the storage target via SGObjectStorage and attach a backup schedule to the cluster.
Create Object Storage Credentials
Store your S3 credentials as a Kubernetes Secret:
kubectl -n postgres create secret generic s3-backup-credentials \
--from-literal=accessKeyId=AKIAIOSFODNN7EXAMPLE \
--from-literal=secretAccessKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Define SGObjectStorage
# sg-object-storage.yaml
apiVersion: stackgres.io/v1beta1
kind: SGObjectStorage
metadata:
  name: s3-backup-storage
  namespace: postgres
spec:
  type: s3Compatible
  s3Compatible:
    bucket: my-postgres-backups
    region: us-east-1
    endpoint: https://s3.amazonaws.com
    enablePathStyleAddressing: false
    awsCredentials:
      secretKeySelectors:
        accessKeyId:
          name: s3-backup-credentials
          key: accessKeyId
        secretAccessKey:
          name: s3-backup-credentials
          key: secretAccessKey
kubectl apply -f sg-object-storage.yaml
Configure Cluster Backups
In recent StackGres releases the SGBackupConfig CRD is deprecated; the backup schedule lives directly on the cluster under spec.configurations.backups, referencing the SGObjectStorage defined above:
# Add to sg-cluster.yaml under spec.configurations
configurations:
  sgPostgresConfig: pg16-config
  sgPoolingConfig: pgbouncer-config
  backups:
  - sgObjectStorage: s3-backup-storage
    cronSchedule: "0 2 * * *"
    retention: 7
    compression: lz4
    performance:
      maxNetworkBandwidth: 0
      maxDiskBandwidth: 0
      uploadDiskConcurrency: 1
kubectl apply -f sg-cluster.yaml
This schedules a full base backup at 02:00 UTC daily, retaining the last 7. WAL archiving runs continuously between base backups, enabling point-in-time recovery.
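The retention policy bounds your PITR window: you can only restore to a point covered by a retained base backup plus its WAL. A sketch of the selection logic WAL-G applies — restore the newest base backup at or before the target, then replay WAL forward:

```python
from datetime import datetime, timedelta

# Pick the newest base backup taken at or before the PITR target time.
def pick_base_backup(backup_times, target):
    candidates = [b for b in backup_times if b <= target]
    if not candidates:
        raise ValueError("target predates the oldest retained base backup")
    return max(candidates)

# Daily 02:00 UTC base backups, retention 7 (as configured above)
newest = datetime(2025, 3, 8, 2, 0)
backups = [newest - timedelta(days=i) for i in range(7)]

target = datetime(2025, 3, 5, 14, 30)
print(pick_base_backup(backups, target))  # 2025-03-05 02:00:00
```

With retention 7 and a daily schedule, any target older than roughly a week falls outside the restorable window.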
Monitoring Integration
StackGres exposes Prometheus metrics via a postgres_exporter sidecar on every pod. The prometheusAutobind: true field in the SGCluster spec automatically creates PodMonitor resources if the Prometheus Operator is installed.
Verify Metrics Exposure
# Port-forward to the metrics endpoint on the primary pod
kubectl -n postgres port-forward pod/prod-pg-0 9187:9187
# In another terminal
curl -s http://localhost:9187/metrics | grep pg_up
# pg_up 1
Grafana Dashboard
StackGres maintains an official Grafana dashboard. Import it by ID:
# Dashboard ID: 14114 (StackGres Cluster Overview)
# Import via Grafana UI: Dashboards → Import → Enter ID 14114
Or deploy it as a GrafanaDashboard resource if you're using the Grafana Operator:
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: stackgres-overview
  namespace: monitoring
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  url: "https://grafana.com/api/dashboards/14114/revisions/latest/download"
Key Metrics to Alert On
| Metric | Alert Threshold | Description |
|--------|----------------|-------------|
| pg_up | < 1 | PostgreSQL is unreachable |
| pg_replication_lag | > 30s | Replica falling behind |
| pgbouncer_pools_cl_waiting | > 10 | Clients queued waiting for a connection |
| pg_stat_bgwriter_checkpoint_write_time | trending up | I/O pressure during checkpoints |
| pg_database_size_bytes | > 80% of PVC | Disk space warning |
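If you run the Prometheus Operator, the thresholds above can be encoded as a PrometheusRule. This is a sketch: verify the exact metric names and units against what your exporter version actually exposes (pg_replication_lag is assumed to be in seconds here).

```yaml
# stackgres-alerts.yaml — sketch; adjust metric names to your exporter version
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: stackgres-postgres-alerts
  namespace: monitoring
spec:
  groups:
  - name: postgres
    rules:
    - alert: PostgresDown
      expr: pg_up < 1
      for: 1m
      labels:
        severity: critical
    - alert: ReplicationLagHigh
      expr: pg_replication_lag > 30
      for: 5m
      labels:
        severity: warning
    - alert: PgBouncerClientsWaiting
      expr: pgbouncer_pools_cl_waiting > 10
      for: 5m
      labels:
        severity: warning
```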
Connecting to the Cluster
From Inside the Cluster
Applications connect via the Services StackGres creates. Use port 5433 to go through PgBouncer (recommended), or 5432 to connect directly to PostgreSQL.
# Get the superuser password
kubectl -n postgres get secret prod-pg \
-o jsonpath='{.data.superuser-password}' | base64 -d
# Connect via psql from a debug pod
kubectl -n postgres run psql-debug \
--image=postgres:16 \
--restart=Never \
--rm -it \
-- psql -h prod-pg -p 5433 -U postgres
Connection String Format
# Via PgBouncer (recommended for applications)
postgresql://postgres:<password>@prod-pg.postgres.svc.cluster.local:5433/postgres
# Direct to primary (for admin tasks, migrations)
postgresql://postgres:<password>@prod-pg.postgres.svc.cluster.local:5432/postgres
# Read replica (for read-heavy workloads)
postgresql://postgres:<password>@prod-pg-replicas.postgres.svc.cluster.local:5432/postgres
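One gotcha when substituting the real password into these DSNs: characters like @, :, and / are reserved in URLs and must be percent-encoded. A small helper sketch (build_dsn is a hypothetical name, not a StackGres utility):

```python
from urllib.parse import quote

# Percent-encode the password so reserved URL characters don't break the DSN.
def build_dsn(user, password, host, port, dbname):
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{dbname}"

dsn = build_dsn("app_user", "p@ss:w/rd",
                "prod-pg.postgres.svc.cluster.local", 5433, "app_db")
print(dsn)
# postgresql://app_user:p%40ss%3Aw%2Frd@prod-pg.postgres.svc.cluster.local:5433/app_db
```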
Creating Application Users
Don't use the postgres superuser for application connections. Create a dedicated role:
-- Connect as superuser first
CREATE USER app_user WITH PASSWORD 'strong-password-here';
CREATE DATABASE app_db OWNER app_user;
GRANT CONNECT ON DATABASE app_db TO app_user;
\c app_db
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
Store the credentials as a Kubernetes Secret and reference them in your application Deployment.
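A minimal sketch of that Secret using libpq's standard PG* environment variables — the names app-db-credentials and the stringData values are examples, not anything StackGres creates for you:

```yaml
# app-credentials.yaml — example names; pair with envFrom in your Deployment
apiVersion: v1
kind: Secret
metadata:
  name: app-db-credentials
  namespace: postgres
type: Opaque
stringData:
  PGUSER: app_user
  PGPASSWORD: strong-password-here
  PGHOST: prod-pg.postgres.svc.cluster.local
  PGPORT: "5433"
  PGDATABASE: app_db
```

In the application Deployment, mount it with `envFrom: [{secretRef: {name: app-db-credentials}}]` so libpq-based clients pick the variables up automatically.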
Performing a Manual Backup
Trigger an on-demand backup using an SGBackup resource:
# sg-manual-backup.yaml
apiVersion: stackgres.io/v1
kind: SGBackup
metadata:
  name: manual-backup-20250301
  namespace: postgres
spec:
  sgCluster: prod-pg
  managedLifecycle: false
kubectl apply -f sg-manual-backup.yaml
# Watch the backup progress
kubectl -n postgres get sgbackup manual-backup-20250301 -w
# NAME STATUS BACKUP-PATH AGE
# manual-backup-20250301 Running s3://my-postgres-backups/prod-pg/... 10s
# manual-backup-20250301 Completed s3://my-postgres-backups/prod-pg/... 2m
managedLifecycle: false means this backup won't be deleted when the retention policy runs — useful for pre-migration snapshots.
Restoring from a Backup
Restore by creating a new SGCluster that references the backup as its initial data source:
# sg-cluster-restore.yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: prod-pg-restored
  namespace: postgres
spec:
  postgres:
    version: "16"
  instances: 1
  sgInstanceProfile: size-small
  configurations:
    sgPostgresConfig: pg16-config
  pods:
    persistentVolume:
      size: "50Gi"
  initialData:
    restore:
      fromBackup:
        name: manual-backup-20250301
      downloadDiskConcurrency: 1
kubectl apply -f sg-cluster-restore.yaml
StackGres will restore the base backup and replay WAL to bring the cluster to a consistent state. For point-in-time recovery, add a pointInTimeRecovery block:
initialData:
  restore:
    fromBackup:
      name: manual-backup-20250301
      pointInTimeRecovery:
        restoreToTimestamp: "2025-03-01T14:30:00Z"
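The timestamp must be an ISO 8601 instant; a tiny validator sketch (the strict-UTC requirement here is an assumption — check your StackGres version's CRD reference for the exact accepted format):

```python
from datetime import datetime, timezone

# Parse an ISO 8601 timestamp and require an explicit UTC offset.
def validate_restore_ts(ts: str) -> datetime:
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.utcoffset() != timezone.utc.utcoffset(None):
        raise ValueError("timestamp must carry a UTC offset")
    return dt

print(validate_restore_ts("2025-03-01T14:30:00Z"))
# 2025-03-01 14:30:00+00:00
```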
Upgrading the Operator
StackGres operator upgrades are handled via Helm. Upgrading the operator updates its own Deployment; managed clusters pick up the new components when they are next restarted.
Check the Current Version
helm list -n stackgres
# NAME NAMESPACE REVISION UPDATED STATUS CHART
# stackgres-operator stackgres 1 2025-01-15 10:00:00 UTC deployed stackgres-operator-1.12.0
Upgrade to a New Version
helm repo update
helm upgrade stackgres-operator stackgres-charts/stackgres-operator \
--namespace stackgres \
--version 1.13.0 \
--reuse-values
Watch the operator pod restart:
kubectl -n stackgres get pods -w
# NAME READY STATUS RESTARTS AGE
# stackgres-operator-old-xxxxxxxxx 1/1 Terminating 0 30d
# stackgres-operator-new-xxxxxxxxx 1/1 Running 0 15s
After the operator upgrades, it reconciles all SGCluster resources, but running pods keep the previous sidecar versions until restarted. Trigger the rollout per cluster with an SGDbOps resource using op: securityUpgrade (or op: restart). Major PostgreSQL version upgrades require an SGDbOps resource with op: majorVersionUpgrade.
Major PostgreSQL Version Upgrade
# sg-major-upgrade.yaml
apiVersion: stackgres.io/v1
kind: SGDbOps
metadata:
  name: pg16-to-pg17-upgrade
  namespace: postgres
spec:
  sgCluster: prod-pg
  op: majorVersionUpgrade
  majorVersionUpgrade:
    postgresVersion: "17"
    link: true
    clone: false
    check: true
kubectl apply -f sg-major-upgrade.yaml
# Monitor the upgrade
kubectl -n postgres get sgdbops pg16-to-pg17-upgrade -w
The check: true flag runs pg_upgrade --check first without making changes — always do this before the real upgrade.
Troubleshooting Common Issues
Pod stuck in Init state:
Check the init container logs. StackGres runs several init containers to set up the data directory and configuration:
kubectl -n postgres logs prod-pg-0 -c setup-scripts --previous
Patroni failover not happening:
Check the Patroni REST API via the postgres-util sidecar:
kubectl -n postgres exec prod-pg-0 -c postgres-util -- \
curl -s http://localhost:8008/cluster | python3 -m json.tool
PgBouncer connection errors: Check PgBouncer stats from inside the pod:
kubectl -n postgres exec prod-pg-0 -c pgbouncer -- \
psql -h /var/run/postgresql -p 6432 -U pgbouncer pgbouncer -c "SHOW POOLS;"
Backup failing with S3 permission error:
Verify the IAM policy attached to your access key includes s3:PutObject, s3:GetObject, s3:ListBucket, and s3:DeleteObject on the backup bucket.
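A minimal IAM policy matching that list, scoped to the guide's my-postgres-backups bucket — a sketch; adapt the ARNs to your bucket and add a KMS statement if the bucket uses SSE-KMS:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-postgres-backups"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-postgres-backups/*"
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while the object actions apply to the `/*` object ARN; mixing them in one statement is a common source of AccessDenied errors.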
Production Considerations
Before this setup handles real traffic:
- Resource limits — Set CPU and memory limits in SGInstanceProfile based on your workload. PostgreSQL is memory-hungry; shared_buffers should be around 25% of available RAM.
- Network policies — Restrict which namespaces can reach ports 5432/5433 in the postgres namespace using NetworkPolicy resources.
- Secrets management — Rotate the superuser password via kubectl patch secret and update your application's connection string. Consider the External Secrets Operator to sync credentials from Vault or AWS Secrets Manager.
- PVC expansion — StackGres supports online PVC expansion if your StorageClass allows it. Increase pods.persistentVolume.size in the SGCluster spec and the operator will resize the underlying PVCs.
- Patroni tuning — The default ttl (30s) and loop_wait (10s) values work for most clusters. On high-latency networks, increase ttl to avoid spurious failovers.
- Connection pooling mode — Switch from transaction to session pool mode if your application uses LISTEN/NOTIFY, advisory locks, or session-level SET parameters that must persist across transactions (SET LOCAL is transaction-scoped and works fine under transaction pooling).
Conclusion
StackGres turns PostgreSQL on Kubernetes from a DIY project into a managed service you control. The operator handles the operational complexity — failover, WAL archiving, connection pooling, metrics — while keeping everything declarative and version-controlled in your cluster.
The learning curve is the CRD surface area: SGCluster, SGInstanceProfile, SGPostgresConfig, SGPoolingConfig, SGBackupConfig, SGObjectStorage, SGBackup, SGDbOps. Once you understand what each one does, the system is remarkably coherent.
From here, explore SGDistributedLogs for centralised log aggregation across all your clusters, and SGShardedCluster if you need horizontal sharding. Both follow the same declarative pattern you've already learned.