Backup and Restore etcd in Kubernetes Cluster for CKA v1.19

The final module of the Cluster Architecture, Installation, and Configuration domain is "Implement etcd backup and restore." Let's quickly perform the actions we need to complete this step for the exam.

Perform a Backup of etcd

While it's still early and details of the CKA v1.19 environment aren't known yet, I'm anticipating a small change to how etcd backup and restore is performed. If you've been preparing for the CKA before the September 2020 change to Kubernetes v1.19, you may be familiar with exporting the environment variable ETCDCTL_API=3 to ensure you're using version 3 of etcd's API, which has the backup and restore capability. However, Kubernetes v1.19 ships with etcd 3.4.9, and in etcd 3.4.x the default API version is 3, so this step is no longer necessary! If etcdctl version returns a version lower than 3.4.x, you will still need to set the API version to 3 before performing backup and restore operations.
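You can confirm which build you have before starting; the export is harmless on 3.4.x, so setting it either way is a safe habit on the exam:

```shell
# Check the etcdctl build; etcd 3.4+ defaults to API version 3
etcdctl version

# On older builds (or just to be safe), force the v3 API for this shell session
export ETCDCTL_API=3
```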

Get The Info You Need First

When you type the etcd backup command, you’re going to need to specify the location of a few certificates and a key. Let’s grab that really quick! Get the name of our etcd pod:

kubectl get pods -A

Get the details of our etcd pod:

kubectl describe pods etcd-controlplane -n kube-system

The output that we’re interested in is under the Command section. You will need to copy the locations of:

  • cert-file
  • key-file
  • trusted-ca-file
  • listen-client-urls
Command: 
etcd
--advertise-client-urls=https://172.17.0.54:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true --data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://172.17.0.54:2380
--initial-cluster=master=https://172.17.0.54:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://172.17.0.54:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://172.17.0.54:2380
--name=master
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
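If you'd rather not scan the whole listing by eye, a quick grep over the describe output pulls out just those flags (the pod name matches the one from kubectl get pods -A above):

```shell
# Filter the describe output down to the flags the backup command needs.
# Note: this pattern also matches the --peer-* variants; ignore those lines.
kubectl describe pod etcd-controlplane -n kube-system \
  | grep -E 'cert-file|key-file|trusted-ca-file|listen-client-urls'
```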

Now here's the fun part: the option names that etcd uses aren't the same as the ones etcdctl uses for the backup, but they're close enough to match up. Here's how they map:

etcd option             etcdctl option
--cert-file             --cert
--key-file              --key
--trusted-ca-file       --cacert
--listen-client-urls    --endpoints

Your backup command should look like this:

etcdctl snapshot save etcd.db \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--endpoints=https://127.0.0.1:2379 \
--key=/etc/kubernetes/pki/etcd/server.key

Output:

{"level":"info","ts":1603021662.1152575,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"etcd.db.part"}
{"level":"info","ts":"2020-10-18T11:47:42.129Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1603021662.1302097,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2020-10-18T11:47:42.173Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1603021662.198739,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"1.8 MB","took":0.083223978}
{"level":"info","ts":1603021662.199425,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"etcd.db"}
Snapshot saved at etcd.db
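Before moving on, you can sanity-check the snapshot file; etcdctl 3.4 can report its hash, revision, total key count, and size without contacting the cluster:

```shell
# Verify the snapshot we just saved (reads the local file only)
ETCDCTL_API=3 etcdctl snapshot status etcd.db --write-out=table
```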

That’s it! The etcd database is backed up and we’re ready to restore!

Perform a Restore of etcd

With a restore, we're going to specify those same four parameters from the backup operation but add a few more that are needed to initialize the restored data as a new etcd cluster:

etcdctl snapshot restore etcd.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--name=controlplane \
--data-dir /var/lib/etcd-from-backup \
--initial-cluster=controlplane=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380

What are these extra parameters doing?

  • Giving the restored etcd member a new name
  • Restoring the snapshot into the /var/lib/etcd-from-backup directory
  • Re-initializing the cluster token, since we are creating a new cluster
  • Specifying the IP:port used for etcd-to-etcd (peer) communication

Output:

{"level":"info","ts":1603021679.8156757,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"etcd.db","wal-dir":"/var/lib/etcd-from-backup/member/wal","data-dir":"/var/lib/etcd-from-backup","snap-dir":"/var/lib/etcd-from-backup/member/snap"}
{"level":"info","ts":1603021679.8793259,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"7581d6eb2d25405b","local-member-id":"0","added-peer-id":"e92d66acd89ecf29","added-peer-peer-urls":["https://127.0.0.1:2380"]}
{"level":"info","ts":1603021679.9166896,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"etcd.db","wal-dir":"/var/lib/etcd-from-backup/member/wal","data-dir":"/var/lib/etcd-from-backup","snap-dir":"/var/lib/etcd-from-backup/member/snap"}

For the CKA exam, this is all that's necessary to complete the task! However, in production you wouldn't want to stop here, since this process doesn't modify the existing etcd pod in any way. The restore process expands the snapshot into the specified directory and makes some changes to the data to reflect the new name and cluster token, but that's it!

In production, you would modify the etcd pod's manifest in /etc/kubernetes/manifests/etcd.yaml to use the new data directory (including the hostPath volume that mounts it) and the new initial cluster token. Upon saving the file, the etcd pod will be destroyed and recreated, and after a minute or so the pod will be running and ready. You can also check it by viewing the logs with kubectl logs <etcd-pod-name>.
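As a rough sketch of that production step (assuming the kubeadm default manifest path used throughout this post), a single substitution covers both the --data-dir flag and the hostPath volume that mounts it:

```shell
# Back up the manifest first so you can roll back if needed
sudo cp /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml.bak

# Point the static pod at the restored data directory; this rewrites both
# --data-dir and the matching hostPath volume path in one pass
sudo sed -i 's|/var/lib/etcd|/var/lib/etcd-from-backup|g' /etc/kubernetes/manifests/etcd.yaml

# The kubelet notices the manifest change and recreates the pod automatically
kubectl get pods -n kube-system --watch
```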

11 thoughts on "Backup and Restore etcd in Kubernetes Cluster for CKA v1.19"


  3. Satya

    Hi
    Can we execute the restore from the edge node if I have the "db" file there, instead of SSHing to the master node?

    Reply
    1. Brandon Willmott Post author

      Hi Satya, while I think it's technically possible, I don't think that will pass for the exam. Since etcd should only run on control plane nodes, they will likely be looking to see whether the file was restored to the control plane node.

      Reply
  4. Biswanath Roy

    I have a similar question regarding the etcd restore. I went to the master node, edited the etcd.yaml file under /etc/kubernetes/manifests, and then moved the file to a different location to stop the static pod. Then I ran kubectl get pods -n kube-system but could not find any pods in the exam. Am I doing something wrong?

    Reply
    1. Brandon Willmott Post author

      Hi Biswanath, the etcd restore step on the exam can be performed without modifying the existing etcd pod. I know it seems counterintuitive to a restore operation in production, but on the exam they’re just looking to see the backup file restored to a new directory.

      Reply
  5. Gabriel

    Hi Brandon, in the CKA exam, what is the best and easiest way to perform a restore?
    I am confused: do I need to restore to another data-dir location, or can I use the same one?

    Reply
    1. Brandon Willmott Post author

      Hi Gabriel, yes you will need to restore the backup to a different directory with --data-dir. You don't need to create the directory first. In fact, etcdctl won't restore to a directory that already exists. Use the full etcdctl snapshot restore command in the post and you'll be perfect!

      Reply
      1. Deepak

        Hi Brandon ,

        Regarding restore: do we have to run the backup and restore commands from the edge node/jump server, or do we have to SSH to the master node of that cluster in order to take the backup?
        And also, how will the test script verify that the restore was done properly if we have not restarted the etcd service with the restored data dir?

      2. Brandon Willmott Post author

        Hi Deepak, you will need to ssh to a master/control plane node. From my understanding of the etcdctl restore process, it reports that it successfully expanded the snapshot to a data directory. In production you would need to tell etcd about the new data-dir but the exam doesn’t ask you to do that.

  6. Jose

    Hi Brandon,

    Please excuse me if this is obvious, but I'm new to Kubernetes. Where is this snapshot actually saved? I ran find on my system and found this location, but I'm not sure if this is the right place: /var/lib/docker/overlay2/c166fe1d0b3eee9275f84845c7ae3c0014648ebbddf27e37303358058a45a5a5/diff/etcd.db

    Is it safe to move this file or off-load it?

    For restore, suppose I've created a snapshot, located the file, and off-loaded it to external storage. Then disaster happens and I've lost all etcd members and the dbs. When I want to restore, where do I put the backup file so that it can be restored, or how do I tell etcdctl to restore from a specific path?

    Thanks

    Reply
