Backup and Restore etcd in Kubernetes Cluster for CKA v1.19

The final module of the Cluster Architecture, Installation, and Configuration domain is “Implement etcd backup and restore.” Let’s quickly walk through the actions needed to complete this step for the exam.

Perform a Backup of etcd

While it’s still early and details of the CKA v1.19 environment aren’t known yet, I’m anticipating a small change to how etcd backup and restore is performed. If you’ve been preparing for the CKA before the September 2020 change to Kubernetes v1.19, you may be familiar with exporting the environment variable ETCDCTL_API=3 to ensure you’re using version 3 of etcd’s API, which has the backup and restore capability. However, Kubernetes v1.19 ships with etcd 3.4.9, and in etcd 3.4.x the default API version is 3, so this step is no longer necessary! If etcdctl version returns a version lower than 3.4.x, you will still need to set the API version to 3 before performing backup and restore operations.
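
To see which version you’re working with (and to cover yourself on an older cluster), a quick check like the one below does the trick; the exact output varies a little between etcd builds:

etcdctl version        # on older builds where the v2 API is the default, use: etcdctl --version
export ETCDCTL_API=3   # only needed if the reported API version is older than 3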

Get The Info You Need First

When you type the etcd backup command, you’re going to need to specify the location of a few certificates, a key, and the client endpoint. Let’s grab those really quick! First, get the name of our etcd pod:

kubectl get pods -A

Get the details of our etcd pod:

kubectl describe pods etcd-controlplane -n kube-system

The output that we’re interested in is under the Command section. You will need to copy the locations of:

  • cert-file
  • key-file
  • trusted-ca-file
  • listen-client-urls
Command: 
etcd
--advertise-client-urls=https://172.17.0.54:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://172.17.0.54:2380
--initial-cluster=master=https://172.17.0.54:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379,https://172.17.0.54:2379
--listen-metrics-urls=http://127.0.0.1:2381
--listen-peer-urls=https://172.17.0.54:2380
--name=master
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
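
If you’d rather not scan the whole describe output, a quick grep pulls out just those flags (assuming the pod is named etcd-controlplane as above):

kubectl describe pod etcd-controlplane -n kube-system | grep -E -- '--(cert-file|key-file|trusted-ca-file|listen-client-urls)'
# the leading -- stops grep from treating the pattern as an option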

Now here’s the fun part: the names of the options that etcd uses aren’t the same as the ones etcdctl uses for the backup, but they’re close enough to match up. Here’s how they map:

etcd option             etcdctl option
--cert-file             --cert
--key-file              --key
--trusted-ca-file       --cacert
--listen-client-urls    --endpoints

Your backup command should look like this:

etcdctl snapshot save etcd.db \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--endpoints=https://127.0.0.1:2379 \
--key=/etc/kubernetes/pki/etcd/server.key

Output:

{"level":"info","ts":1603021662.1152575,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"etcd.db.part"}{"level":"info","ts":"2020-10-18T11:47:42.129Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1603021662.1302097,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2020-10-18T11:47:42.173Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1603021662.198739,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"1.8 MB","took":0.083223978}
{"level":"info","ts":1603021662.199425,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"etcd.db"}
Snapshot saved at etcd.db

That’s it! The etcd database is backed up and we’re ready to restore!
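
If you want to sanity-check the snapshot before moving on, etcdctl can report its status:

etcdctl snapshot status etcd.db --write-out=table   # prints hash, revision, total keys, and size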

Perform a Restore of etcd

With a restore, we’re going to specify those same 4 parameters from the backup operation but add a few more that are needed to initialize the restore as a new etcd store:

etcdctl snapshot restore etcd.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--name=controlplane \
--data-dir /var/lib/etcd-from-backup \
--initial-cluster=controlplane=https://127.0.0.1:2380 \
--initial-cluster-token=etcd-cluster-1 \
--initial-advertise-peer-urls=https://127.0.0.1:2380

What are these extra parameters doing?

  • Giving the etcd cluster a new name
  • Restoring the etcd snapshot to the /var/lib/etcd-from-backup directory
  • Re-initializing the etcd cluster token since we are creating a new cluster
  • Specifying the IP:Port for etcd-to-etcd communication

Output:

{"level":"info","ts":1603021679.8156757,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"etcd.db","wal-dir":"/var/lib/etcd-from-backup/member/wal","data-dir":"/var/lib/etcd-from-backup","snap-dir":"/var/lib/etcd-from-backup/member/snap"}
{"level":"info","ts":1603021679.8793259,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"7581d6eb2d25405b","local-member-id":"0","added-peer-id":"e92d66acd89ecf29","added-peer-peer-urls":["https://127.0.0.1:2380"]}
{"level":"info","ts":1603021679.9166896,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"etcd.db","wal-dir":"/var/lib/etcd-from-backup/member/wal","data-dir":"/var/lib/etcd-from-backup","snap-dir":"/var/lib/etcd-from-backup/member/snap"}

For the CKA exam, this is all that’s necessary to complete the task! However, in production you wouldn’t want to stop here, since this process doesn’t modify the existing etcd pod in any way. The restore process expands the snapshot into the specified directory and makes some changes to the data to reflect the new name and cluster token, but that’s it!

In production, you would modify the etcd pod’s manifest in /etc/kubernetes/manifests/etcd.yaml to use the new data directory and the initial cluster token. Upon saving the file, the etcd pod will be destroyed and recreated, and after a minute or so the pod will be running and ready. You can also check on it by viewing the logs with kubectl logs <etcd-pod-name> -n kube-system.
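
As a rough sketch of what that production step might look like (not required on the exam, and assuming the restore above went to /var/lib/etcd-from-backup), the changes boil down to pointing the manifest at the new directory:

# Sketch only: keep a copy of the original manifest outside the manifests
# directory (the /root path here is arbitrary), then point every
# /var/lib/etcd reference (--data-dir, the volumeMounts path, and the
# hostPath volume) at the restored directory.
cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
sed -i 's#/var/lib/etcd#/var/lib/etcd-from-backup#g' /etc/kubernetes/manifests/etcd.yaml
# You would also add --initial-cluster-token=etcd-cluster-1 to the etcd command
# in the manifest, then wait for kubelet to recreate the pod and verify with:
# kubectl get pods -n kube-system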

21 thoughts on “Backup and Restore etcd in Kubernetes Cluster for CKA v1.19”

  1. Satya

    Hi
    Can we execute the restore from the edge node if I have the “db” file on the edge node, instead of SSHing to the master node?

    1. Brandon Willmott (post author)

      Hi Satya, while I think it’s technically possible, I don’t think that will pass for the exam. Since etcd should only run on control plane nodes, they will likely be looking to see if the file was restored to the control plane node.

      1. Thomas

        Hello Satya,

        I remember that there was no context given, so I don’t know which node is the master. How can I solve the task without a context?

        Thank you,
        Thomas

  2. Biswanath Roy

    I have a similar question regarding this etcd restore. I went to the master node, edited the etcd.yaml file under /etc/kubernetes/manifests, and then moved this file to a different location to stop the static pod. Then I ran the command kubectl get pods -n kube-system but could not find any pods in the exam. Am I doing something wrong?

    1. Brandon Willmott (post author)

      Hi Biswanath, the etcd restore step on the exam can be performed without modifying the existing etcd pod. I know it seems counterintuitive compared to a restore operation in production, but on the exam they’re just looking to see the backup file restored to a new directory.

  3. Gabriel

    Hi Brandon, in the CKA exam what is the best and easiest way to do a restore?
    I am confused: do I need to restore to another data-dir location, or can I use the same one?

    1. Brandon Willmott (post author)

      Hi Gabriel, yes you will need to restore the backup to a different directory with --data-dir. You don’t need to create the directory first. In fact, etcdctl won’t restore to a directory that already exists. Use the full etcdctl snapshot restore command in the post and you’ll be perfect!

      1. Deepak

        Hi Brandon,

        Regarding the restore, do we have to run the backup and restore commands from the edge node/jump server, or do we have to SSH to the master node of that cluster in order to take the backup?
        Also, how will the test script verify that the restore was done properly if we have not restarted the etcd service with the restored data dir?

      2. Brandon Willmott (post author)

        Hi Deepak, you will need to ssh to a master/control plane node. From my understanding of the etcdctl restore process, it reports that it successfully expanded the snapshot to a data directory. In production you would need to tell etcd about the new data-dir but the exam doesn’t ask you to do that.

  4. Jose

    Hi Brandon,

    Please excuse me if this is obvious, but I’m new to Kubernetes. Where is this snapshot actually saved? I ran find on my system and found this location, but I’m not sure if this is the right place: /var/lib/docker/overlay2/c166fe1d0b3eee9275f84845c7ae3c0014648ebbddf27e37303358058a45a5a5/diff/etcd.db

    Is it safe to move this file or off-load it?

    For restore, suppose I’ve created a snapshot, located the file, and off-loaded it to external storage. Then disaster happens and I’ve lost all etcd members and the dbs. When I want to restore, where do I put the backup file so that it can be restored, or how do I tell etcdctl to restore from a specific path?

    Thanks

      1. Emil

        Hi,
        I had the exam today and the situation was:
        1. controlplane does not have etcdctl installed and there is no folder for the backup
        2. node-1 contains the backup data and corresponding certs
        3. the certs located on node-1 are not the proper ones for backing up the active etcd

        I don’t know how to resolve this question in a proper way 😦

  5. Srinivas

    They are asking to back up and restore on the student node, not on the control plane node. They started asking this question recently. Quite confusing.

    1. Halil

      Exactly, you are right, guys. They asked me to do it on the student node too and I was very confused. Also, the endpoint IP in the question is a localhost IP, so I don’t understand: does the student node have an etcd? I opened a case with LF but they say there isn’t any mistake in the question.

      1. Bilal

        Exactly, I just took the exam and had the same scenario. The certs are provided in the question, but the certs within the existing etcd pod definition are quite different. I proceeded with the backup and restore process as normal on the student node itself. But after restoring with the certs provided, the etcd pod on the control plane started crashing. And between the two nodes it’s difficult to understand which endpoint to use: the control plane node IP or the student node?

  6. samrj

    Can you please explain more about these parameters: the names used (controlplane, etcd-cluster-1) and the port (2380)?
    --initial-cluster=controlplane=https://127.0.0.1:2380 \
    --initial-cluster-token=etcd-cluster-1 \
    --initial-advertise-peer-urls=https://127.0.0.1:2380

  7. CNCF

    The etcd manifest must be changed after restoring to another folder. Also, if etcdctl is missing on the host, feel free to use the one available in the etcd container.
