Wednesday, 30 September 2020

Kubernetes (k8s) Disaster Recovery with Heptio Velero on AWS

The following post lists the steps to perform manual disaster recovery of a Kubernetes cluster on AWS using Velero.

Pre-requisites: A Kubernetes cluster is up and running with one master node and one worker node. Please refer to the following post to set up a Kubernetes cluster with kubeadm on the AWS free tier.

The AWS CLI is installed.

Kubernetes Cluster with kubeadm

1) Create an S3 bucket to be used for backups.

export BUCKET=velero-backup-bkt

export REGION=ap-southeast-2

aws s3api create-bucket --bucket $BUCKET --region $REGION --create-bucket-configuration LocationConstraint=$REGION
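
The command above works for ap-southeast-2, but the AWS CLI rejects an explicit LocationConstraint when the region is us-east-1. A minimal sketch that handles both cases follows; create_backup_bucket is a hypothetical helper name, not part of the AWS CLI.

```shell
# Hypothetical helper: create the backup bucket in any region.
# $1 = bucket name, $2 = region
create_backup_bucket() {
  if [ "$2" = "us-east-1" ]; then
    # us-east-1 is the default location and must not be passed as a constraint
    aws s3api create-bucket --bucket "$1" --region "$2"
  else
    aws s3api create-bucket --bucket "$1" --region "$2" \
      --create-bucket-configuration "LocationConstraint=$2"
  fi
}
```

It could then be called as: create_backup_bucket "$BUCKET" "$REGION"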
 

2) Create an IAM role that Velero will use to access the bucket

cat > assume-role-policy-document.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<ACCOUNT_ID>:role/InstanceRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

aws iam create-role --role-name velero \
  --assume-role-policy-document \
  file://assume-role-policy-document.json

Here, InstanceRole is the role assigned to the worker node EC2 instance. If no role is assigned, create a new role and attach it to the instance.
 
3) Create a policy that grants access to S3 and attach it to the above role

cat > velero-trust-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF

aws iam put-role-policy \
  --role-name velero \
  --policy-name s3 \
  --policy-document file://velero-trust-policy.json

4) Download the Velero client

wget https://github.com/vmware-tanzu/velero/releases/download/v1.3.2/velero-v1.3.2-linux-amd64.tar.gz

tar -xvf velero-v1.3.2-linux-amd64.tar.gz -C /tmp

sudo mv /tmp/velero-v1.3.2-linux-amd64/velero /usr/local/bin

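5) Install Velero on the cluster. The iam.amazonaws.com/role pod annotation lets the Velero pod assume the role created in step 2; it only takes effect if an IAM role controller such as kube2iam is running in the cluster.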

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.1 \
  --bucket ${BUCKET} \
  --backup-location-config region=${REGION} \
  --snapshot-location-config region=${REGION} \
  --pod-annotations iam.amazonaws.com/role=arn:aws:iam::<ACCOUNT_ID>:role/velero \
  --no-secret

kubectl get all -n velero





6) Install a sample NGINX application on the worker node. It will serve as the test application for disaster recovery.


cat > nginx-deployment.yaml <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.17.6
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: my-nginx
  namespace: nginx-example
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer
EOF

kubectl apply -f nginx-deployment.yaml

Verify all the resources are up and running

kubectl get all -n nginx-example









7) Create a backup, selecting resources either by app label or by namespace

velero backup create nginx-backup --selector app=nginx

OR

velero backup create nginx-backup --include-namespaces nginx-example

Check the status. The output should show a completion time once the backup has completed successfully.

velero backup describe nginx-backup


To create scheduled backups (the cron expression in this example runs daily at 7 am):

velero create schedule daily-backup-at-7am --schedule="0 7 * * *" --include-namespaces nginx-example

8) Check S3 bucket to confirm backup is created:
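
Besides the S3 console, the bucket can be checked from the command line. The sketch below assumes Velero's default layout, where each backup is stored under the backups/<backup-name>/ prefix and no custom prefix was configured at install time; list_backup_objects is a hypothetical helper name.

```shell
# Hypothetical helper: list the objects Velero wrote for one backup.
# $1 = bucket name, $2 = backup name
# Assumes the default "backups/<backup-name>/" prefix (no custom prefix set).
list_backup_objects() {
  aws s3 ls "s3://$1/backups/$2/"
}
```

For the backup created in step 7 this would be: list_backup_objects "$BUCKET" nginx-backup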








9) Now, to simulate a disaster, let's delete the nginx-example namespace. This will delete the deployment, the service, and all running NGINX pods.

kubectl delete namespace nginx-example


10) Restore all resources from the backup

velero restore create --from-backup nginx-backup



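Check the restore status. By default the restore is named after the backup plus a timestamp, so substitute the name printed by the restore command: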

velero restore describe nginx-backup-20200930060821

Once the restore completes, exactly the same NGINX resources should be up and running.
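
Because the default restore name contains a timestamp, it differs on every run. One way around that, sketched here, is to pass an explicit restore name together with the --wait flag so the command blocks until the restore finishes. The helper name restore_and_check and the restore name nginx-restore are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: restore from a backup under a known name, then describe it.
# $1 = backup name, $2 = restore name to assign
restore_and_check() {
  # --wait blocks until the restore has finished
  velero restore create "$2" --from-backup "$1" --wait &&
    velero restore describe "$2"
}
```

For example: restore_and_check nginx-backup nginx-restore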







To restore from a scheduled backup, use either a specific backup or the latest backup created by the schedule:

# Specific backup

velero restore create --from-backup <BACKUP_NAME>

# Latest backup from schedule

velero restore create --from-schedule <SCHEDULE-NAME>

