= k8s MariaDb Galera Cluster =
 * Links
   * [[k8s/MariaDbGaleraInitDb]]
   * [[https://severalnines.com/blog/galera-cluster-recovery-101-deep-dive-network-partitioning/]]
   * [[https://bobcares.com/blog/mysql-cluster-vs-galera/|Mysql-vs-Galera]]
   * [[https://proxysql.com/services/support/|Commercial Support]]
   * [[https://releem.com|Releem SaaS mysql tuning with agent]]
   * [[https://medium.com/dba-jungle/make-mariadb-galera-cluster-auto-recovery-fb1ce1d89f09]]
 * Safe to bootstrap
   * https://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/
   * After a sudden crash of the entire cluster, all nodes are considered unsafe to bootstrap from, so operator action is always required to force the use of a particular node as the bootstrap node.

== Restore huge db to Galera/Mariadb - using single node ==
 * https://severalnines.com/blog/guide-mysql-galera-cluster-restoration-using-mysqldump/
 * https://galeracluster.com/library/training/tutorials/galera-backup.html
 * https://github.com/mydumper/mydumper - multi-threaded db dump

== Restart - after orderly shutdown ==
 * Check for "safe_to_bootstrap: 1" in grastate.dat
 * See https://github.com/bitnami/charts/tree/main/bitnami/mariadb-galera#user-content-bootstraping-a-node-other-than-0

== Restart - after hard crash of all nodes ==
 * All grastate.dat files will now have "safe_to_bootstrap: 0" :(
 * Find the node with the last committed transaction:
{{{
mysqld --wsrep-recover
# Look in the logs for the highest "WSREP: Recovered position: 37bb-addd-xxx"
# Pick the node with the highest seqno and change its grastate.dat
# from "safe_to_bootstrap: 0" to "safe_to_bootstrap: 1"
}}}
 * k8s: recover from a hard restart by mounting the PVC volume into a temp container, then manually editing /mnt/data/grastate.dat
{{{
#!/usr/bin/env bash
export k8s_claimName=mariadb-galera-0
kubectl get pvc ${k8s_claimName} | grep "${k8s_claimName}\s\+Bound\s" \
  || echo "# Didn't find Bound pvc ${k8s_claimName} in namespace"
kubectl run -i --tty --rm volpodcontainer --overrides='
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": { "name": "volpod" },
  "spec": {
    "containers": [{
      "command": ["bash"],
      "image": "docker.io/diepes/debug:latest",
      "name": "volpod",
      "stdin": true,
      "tty": true,
      "volumeMounts": [{ "mountPath": "/mnt", "name": "galeradata" }]
    }],
    "restartPolicy": "Never",
    "volumes": [{
      "name": "galeradata",
      "persistentVolumeClaim": { "claimName": "'${k8s_claimName}'" }
    }],
    "tolerations": [{
      "effect": "NoSchedule",
      "key": "kubernetes.azure.com/scalesetpriority",
      "operator": "Equal",
      "value": "spot"
    }]
  }
}' --image="docker.io/diepes/debug:latest"
}}}

== HAPROXY liveness script for MariaDB Galera ==
 * https://github.com/olafz/percona-clustercheck

== MySQL (MariaDB) ram tuning ==
 * https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool-resize.html

== Error messages Mariadb/Galera ==
 1. "[Warning] WSREP: no nodes coming from prim view, prim not possible" or "[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster ..."
   * This means no cluster primary node exists, and the node can't determine whether it should become primary.
   * Recovery:
     * We could try starting the DBs in parallel, or putting each to sleep and making the HealthCheck pass, while we manually follow the recovery steps.
 * '''Bootstrapping / recovery'''
 1. Delay restarts
   * Update the StatefulSet readinessProbe parameter initialDelaySeconds from the default 30 to 300 (5 minutes) to allow sufficient time to edit the impacted file.
 1. Find the latest db
{{{
mysqld --wsrep-recover
}}}
 1. Select the pod to boot first
   * Update grastate.dat:
{{{
cat /bitnami/mariadb/data/grastate.dat
# uuid: 2a651c5d-139e-11ee-8733-0eab9be77c14
# seqno: -1
# safe_to_bootstrap: 0
cd /bitnami/mariadb/data
sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/" grastate.dat
# Now delete / recreate the pod to bootstrap
}}}
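The "find the node with the last committed transaction" step above can be sketched as a small helper that compares the recovered seqno of each node and names the one to bootstrap from. The node names and log lines below are illustrative sample data, not real output; in practice you would collect the "WSREP: Recovered position" line from each pod's logs after running mysqld --wsrep-recover.

{{{
#!/usr/bin/env bash
# Sketch: pick the Galera node with the highest recovered seqno.
# Assumes one "<node> WSREP: Recovered position: <uuid>:<seqno>" line per node.
set -euo pipefail

# Sample input (hypothetical nodes/seqnos); replace with real log lines.
recovered_positions="mariadb-galera-0 WSREP: Recovered position: 37bb-addd:1052
mariadb-galera-1 WSREP: Recovered position: 37bb-addd:1060
mariadb-galera-2 WSREP: Recovered position: 37bb-addd:1057"

# Split on spaces/colons, sort numerically by the trailing seqno,
# and keep the node name from the highest line.
best_node=$(printf '%s\n' "$recovered_positions" \
  | awk -F'[ :]' '{print $NF, $1}' \
  | sort -n | tail -1 | cut -d' ' -f2)

echo "Bootstrap from: ${best_node}"
# For the sample data this prints: Bootstrap from: mariadb-galera-1
# Next: set "safe_to_bootstrap: 1" in that node's grastate.dat and restart it.
}}}

Numeric sort also handles a seqno of -1 (an unclean shutdown) correctly, so such nodes naturally lose the comparison.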