tencent cloud


Guide on Migrating Resources from a TKE Managed Cluster to a TKE Serverless Cluster

Last updated: 2022-12-13 18:23:37

    Prerequisites

    • You have a TKE managed cluster (cluster A) running Kubernetes v1.18 or later.
    • You have created the migration target, a TKE Serverless cluster (cluster B) running Kubernetes v1.20 or later. For how to create a TKE Serverless cluster, see Connecting to a Cluster.
    • Clusters A and B use the same COS bucket as the Velero backend storage. For how to configure the COS bucket, see Configuring COS.
    • We recommend placing clusters A and B in the same VPC so that data in PVCs can be backed up and restored.
    • Make sure that images can still be pulled after migration. For how to configure an image repository for a TKE Serverless cluster, see Image Repository FAQs.
    • Make sure that the Kubernetes versions of the two clusters are compatible; we recommend using the same version. If cluster A runs an earlier version, upgrade it before migration.
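    The version requirements above (v1.18 or later for cluster A, v1.20 or later for cluster B) can be checked before you start. The following is a minimal sketch, assuming GNU coreutils for `sort -V`; the helper name `version_at_least` is hypothetical and not part of any TKE tool.

```shell
#!/bin/sh
# Sketch: check that a Kubernetes version meets a minimum version.
# Assumes GNU `sort -V` (version-aware sorting) is available.

# version_at_least ACTUAL MINIMUM
# Succeeds (exit 0) if ACTUAL >= MINIMUM. A leading "v" and any suffix
# after the first "-" (for example "-tke.1") are ignored.
version_at_least() {
  local actual="${1#v}" minimum="${2#v}"
  actual="${actual%%-*}"
  minimum="${minimum%%-*}"
  # The smaller version sorts first; ACTUAL passes if MINIMUM is the smaller one
  # (or both are equal).
  [ "$(printf '%s\n%s\n' "$actual" "$minimum" | sort -V | head -n 1)" = "$minimum" ]
}
```

    For example, assuming `jq` is installed and your current kubeconfig context points at cluster B: `version_at_least "$(kubectl version -o json | jq -r .serverVersion.gitVersion)" 1.20 && echo "cluster B meets the requirement"`.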

    Limitations

    • Workloads that use fixed Pod IPs in a TKE cluster get new IPs after they are migrated to a TKE Serverless cluster. To assign a specific IP, add an annotation to the Pod template, for example eks.tke.cloud.tencent.com/pod-ip: "xx.xx.xx.xx".
    • TKE Serverless clusters that use containerd as the container runtime are not compatible with images from Docker Registry v2.5 or earlier, or Harbor v1.10 or earlier.
    • In a TKE Serverless cluster, each Pod has 20 GiB of temporary disk space for image storage by default. This space is created and released together with the Pod. If you need more space, mount other types of volumes, such as PVC volumes, to store data.
    • To run DaemonSet workloads on a TKE Serverless cluster, deploy them as sidecar containers in your business Pods.
    • NodePort services on a TKE Serverless cluster cannot be accessed through NodeIP:Port. Use ClusterIP:Port instead.
    • Pods on a TKE Serverless cluster expose monitoring data on port 9100 by default. If your business Pod needs to listen on port 9100, avoid the conflict by specifying a different port for monitoring data when you create the Pod, for example eks.tke.cloud.tencent.com/metrics-port: "9110".
    • For other considerations that apply to TKE Serverless clusters, see here.
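    Both annotations named in the limitations above are set in the Pod template's metadata. The following minimal sketch prints a Deployment that carries them; only the two annotation keys come from this guide, while the helper name `render_deployment`, the Deployment name, image, IP address, and container port are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print a Deployment whose Pod template sets the TKE Serverless
# fixed-IP and metrics-port annotations. All arguments are placeholders.

# render_deployment NAME IMAGE POD_IP METRICS_PORT
render_deployment() {
  local name="$1" image="$2" pod_ip="$3" metrics_port="$4"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  # One replica, because a single fixed IP can belong to only one Pod.
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
      annotations:
        # Request a specific Pod IP.
        eks.tke.cloud.tencent.com/pod-ip: "${pod_ip}"
        # Move monitoring off port 9100, which the app itself uses below.
        eks.tke.cloud.tencent.com/metrics-port: "${metrics_port}"
    spec:
      containers:
        - name: ${name}
          image: ${image}
          ports:
            - containerPort: 9100
EOF
}
```

    After replacing the placeholders with values valid for your VPC, the output can be applied with, for example, `render_deployment demo nginx:1.25 10.0.0.8 9110 | kubectl apply -f -`.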

    Migration Directions

    The following describes how to migrate resources from TKE cluster A to TKE Serverless cluster B.

    Configuring COS

    For operation details, see Creating a bucket.

    Downloading Velero

    1. Download the latest version of Velero to the cluster environment. Velero v1.8.1 is used as an example in this document.

      wget https://github.com/vmware-tanzu/velero/releases/download/v1.8.1/velero-v1.8.1-linux-amd64.tar.gz
      
    2. Run the following command to decompress the installation package, which contains the Velero CLI and some sample files.

      tar -xvf velero-v1.8.1-linux-amd64.tar.gz
      
    3. Run the following command to copy the Velero executable from the decompressed directory to a directory in the system PATH (/usr/bin in this document):

      cp velero-v1.8.1-linux-amd64/velero /usr/bin/
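    The three download steps above can be combined into one script. This is a minimal sketch, assuming that the release asset naming shown in the v1.8.1 example also applies to the version you choose; `velero_release_url` and `install_velero` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: download, unpack, and install a Velero release, following the
# naming pattern of the v1.8.1 example (velero-<version>-<platform>.tar.gz).

# velero_release_url VERSION [PLATFORM]: print the release tarball URL.
velero_release_url() {
  local version="$1" platform="${2:-linux-amd64}"
  printf 'https://github.com/vmware-tanzu/velero/releases/download/%s/velero-%s-%s.tar.gz\n' \
    "$version" "$version" "$platform"
}

# install_velero VERSION [PLATFORM]: wget + tar + cp, as in steps 1 to 3.
# Needs write access to /usr/bin.
install_velero() {
  local version="$1" platform="${2:-linux-amd64}"
  local dir="velero-${version}-${platform}"
  wget "$(velero_release_url "$version" "$platform")" &&
    tar -xvf "${dir}.tar.gz" &&
    cp "${dir}/velero" /usr/bin/
}
```

    Run `install_velero v1.8.1` as a user who can write to /usr/bin, then confirm the client with `velero version --client-only`.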
      

    Installing Velero in clusters A and B

    1. Configure the Velero client and enable CSI.

      velero client config set features=EnableCSI
      
    2. Run the following command to install Velero in clusters A and B and create Velero workloads as well as other necessary resource objects.

    • Below is an example of using CSI for PVC backup:
       velero install --provider aws \
         --plugins velero/velero-plugin-for-aws:v1.1.0,velero/velero-plugin-for-csi:v0.2.0 \
         --features=EnableCSI,EnableAPIGroupVersions \
         --bucket <BucketName> \
         --secret-file ./credentials-velero \
         --use-volume-snapshots=false \
         --backup-location-config region=ap-guangzhou,s3ForcePathStyle="true",s3Url=https://cos.ap-guangzhou.myqcloud.com
      
    Note:

    TKE Serverless clusters do not support DaemonSet deployment, so the samples in this document do not use the restic add-on.

    • If you don't need to back up PVCs, use the following installation sample instead:
       velero install --provider aws \
         --plugins velero/velero-plugin-for-aws:v1.1.0 \
         --bucket <BucketName> \
         --secret-file ./credentials-velero \
         --use-volume-snapshots=false \
         --backup-location-config region=ap-guangzhou,s3ForcePathStyle="true",s3Url=https://cos.ap-guangzhou.myqcloud.com
      

    For details on installation parameters, see Using COS as Velero Storage to Implement Backup and Restoration of Cluster Resources, or run velero install --help.
    The other parameters used above are described below:

    | Parameter | Description |
    | --- | --- |
    | --plugins | Installs `velero-plugin-for-aws`, the AWS S3 API-compatible add-on, and `velero-plugin-for-csi`, the CSI add-on used to back up `csi-pv`. We recommend enabling the CSI add-on. |
    | --features | Enables optional features. EnableAPIGroupVersions provides compatibility between different API group versions; EnableCSI enables CSI snapshot backup of CSI-supported PVCs. We recommend enabling both, passed as a single comma-separated value. |
    | --use-restic | Velero can use the open-source tool restic to back up and restore Kubernetes volume data (hostPath volumes are not supported; for details, see here). It supplements the Velero backup feature. Do not enable this parameter when migrating to a TKE Serverless cluster; otherwise, the backup fails. |
    | --use-volume-snapshots=false | Disables the default snapshot backup of storage volumes. |
    3. After the installation is complete, wait for the Velero workload to be ready. Run the following command to check whether the configured storage location is available. If `Available` is displayed, the cluster can access the COS bucket.
    velero backup-location get
    NAME      PROVIDER   BUCKET/PREFIX      PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
    default   aws        <BucketName>   Available     2022-03-24 21:00:05 +0800 CST      ReadWrite     true
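    When you script the installation, the `Available` check in step 3 can be automated by parsing the same table. This is a minimal sketch, assuming the column layout shown above (PHASE is the fourth column); `location_available` and `wait_for_location` are hypothetical helper names, and the 5-second interval and 24 attempts are arbitrary defaults.

```shell
#!/bin/sh
# Sketch: wait until a Velero backup storage location reports "Available".

# location_available NAME: read `velero backup-location get` output on stdin
# and succeed if the row for NAME has PHASE (column 4) equal to "Available".
location_available() {
  awk -v name="$1" 'NR > 1 && $1 == name && $4 == "Available" { found = 1 }
    END { exit(found ? 0 : 1) }'
}

# wait_for_location NAME [TRIES]: poll every 5 seconds, up to TRIES times.
wait_for_location() {
  local name="$1" tries="${2:-24}" i=0
  while [ "$i" -lt "$tries" ]; do
    if velero backup-location get | location_available "$name"; then
      return 0
    fi
    sleep 5
    i=$((i + 1))
  done
  return 1
}
```

    For example, run `wait_for_location default && echo "COS bucket reachable"` in each cluster after installation.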
    

    At this point, you have completed the Velero installation. For more information, see Velero Documentation.
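    The install commands above hard-code the ap-guangzhou endpoint and expect a ./credentials-velero file. If your bucket is in another region, or you want to script the installation for both clusters, these inputs can be generated. This is a minimal sketch under two assumptions: COS endpoints follow the https://cos.<region>.myqcloud.com pattern of the ap-guangzhou example, and the credentials file uses the AWS shared-credentials layout read by velero-plugin-for-aws, filled with your Tencent Cloud SecretId and SecretKey. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: build the --backup-location-config value and the credentials file
# for a COS bucket in any region.

# cos_location_config REGION: print the value for --backup-location-config.
cos_location_config() {
  local region="$1"
  printf 'region=%s,s3ForcePathStyle="true",s3Url=https://cos.%s.myqcloud.com\n' \
    "$region" "$region"
}

# write_velero_credentials PATH SECRET_ID SECRET_KEY
# Writes the file readable only by the current user (umask set in a subshell).
write_velero_credentials() {
  local path="$1" secret_id="$2" secret_key="$3"
  (
    umask 077
    cat > "$path" <<EOF
[default]
aws_access_key_id=${secret_id}
aws_secret_access_key=${secret_key}
EOF
  )
}
```

    For example, after `write_velero_credentials ./credentials-velero "$SECRET_ID" "$SECRET_KEY"`, pass `--backup-location-config "$(cos_location_config ap-shanghai)"` to velero install in place of the ap-guangzhou value.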

    (Optional) Installing VolumeSnapshotClass in clusters A and B

    Note:

    1. Make sure that the CBS-CSI add-on is installed.

    2. Make sure that you have granted TKE_QCSRole the CBS snapshot permissions on the Access Management page of the console. For details, see CBS-CSI.

    3. Use the following YAML to create a VolumeSnapshotClass object, as shown below:

      apiVersion: snapshot.storage.k8s.io/v1beta1
      kind: VolumeSnapshotClass
      metadata:
        labels:
          velero.io/csi-volumesnapshot-class: "true"
        name: cbs-snapclass
      driver: com.tencent.cloud.csi.cbs
      deletionPolicy: Delete
      
    4. Run the following command to check whether the VolumeSnapshotClass has been created successfully, as shown below:

      $ kubectl get volumesnapshotclass
      NAME            DRIVER                      DELETIONPOLICY   AGE
      cbs-snapclass   com.tencent.cloud.csi.cbs   Delete           17m
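    The plain `kubectl get volumesnapshotclass` output does not show whether the class carries the velero.io/csi-volumesnapshot-class label, which Velero's CSI plugin uses to pick a snapshot class for a driver. A minimal sketch that checks the label-filtered listing follows; `snapclass_ready` is a hypothetical helper, and it assumes the NAME, DRIVER, DELETIONPOLICY, AGE column layout shown above.

```shell
#!/bin/sh
# Sketch: confirm that a Velero-labelled VolumeSnapshotClass exists for the
# CBS CSI driver. Reads the output of
#   kubectl get volumesnapshotclass -l velero.io/csi-volumesnapshot-class=true
# on stdin.

# snapclass_ready [DRIVER]: succeed if a listed class uses DRIVER (column 2).
snapclass_ready() {
  awk -v driver="${1:-com.tencent.cloud.csi.cbs}" \
    'NR > 1 && $2 == driver { found = 1 } END { exit(found ? 0 : 1) }'
}
```

    For example: `kubectl get volumesnapshotclass -l velero.io/csi-volumesnapshot-class=true | snapclass_ready && echo "CBS snapshot class ready for Velero"`. If no labelled class exists, kubectl prints nothing on stdout and the check fails.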
      

    (Optional) Creating sample resources in cluster A

    Note:

    Skip this step if you don't need to back up the PVC.

    Deploy a MinIO workload that uses a PVC in cluster A, where Velero is installed. Here, the cbs-csi dynamic storage class is used to create the PVC and PV.

    1. Create a PVC that uses the cbs-csi storage class; the com.tencent.cloud.csi.cbs provisioner in the cluster dynamically creates the corresponding PV. A sample PVC is as follows:

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          volume.beta.kubernetes.io/storage-provisioner: com.tencent.cloud.csi.cbs
        name: minio
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: cbs-csi
        volumeMode: Filesystem
      
    2. Use Helm to deploy a MinIO test service that uses the preceding PVC. For details on installing MinIO, see here. In this sample, a load balancer is bound to the MinIO service so that you can access its management page through a public network address.

    3. Log in to the MinIO web management page and upload some images for testing.

    Backup and restoration

    1. To create a backup in cluster A, see Creating a backup in cluster A in the Cluster Migration directions.
    2. To perform a restoration in cluster B, see Performing a restoration in cluster B in the Cluster Migration directions.
    3. Verify the migration result:
      • If you don't need to back up the PVC, see Verifying migration result in the Cluster Migration directions.
      • If you need to back up the PVC, perform a verification as follows:
        a. Check the resources in cluster B after migration and confirm that the Pods, PVC, and Service have been migrated successfully.

        b. Log in to the MinIO service in cluster B and confirm that the uploaded images are still present. This indicates that the persistent volume data has been migrated as expected.
    4. The resource migration from the TKE cluster to the TKE Serverless cluster is now complete.
      After the migration, run the following command in clusters A and B to set their backup storage locations back to read/write mode, so that later backup tasks can run normally:
      kubectl patch backupstoragelocation default --namespace velero \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadWrite"}}'
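    The backup and restoration steps above refer to the Cluster Migration directions for the actual commands. As a rough orientation, the sequence can be printed for review before you run it. This is a minimal sketch, not the procedure from those directions: the kubeconfig context names, namespace, and backup name are hypothetical placeholders, and setting cluster B's location to read-only during the restore follows Velero's general cluster-migration guidance rather than this guide.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for backup in cluster A, restore in
# cluster B, and resetting both backup storage locations to ReadWrite.

# migration_plan CONTEXT_A CONTEXT_B NAMESPACE [BACKUP_NAME]
migration_plan() {
  local ctx_a="$1" ctx_b="$2" ns="$3"
  local backup="${4:-migrate-$ns}"
  local ro='{"spec":{"accessMode":"ReadOnly"}}'
  local rw='{"spec":{"accessMode":"ReadWrite"}}'
  local ctx
  # 1. Back up the namespace in cluster A and wait for completion.
  printf '%s\n' "velero --kubecontext $ctx_a backup create $backup --include-namespaces $ns --wait"
  # 2. Make cluster B's location read-only so the restore cannot write to
  #    or delete from the shared bucket.
  printf '%s\n' "kubectl --context $ctx_b patch backupstoragelocation default --namespace velero --type merge --patch '$ro'"
  # 3. Restore the backup in cluster B and wait for completion.
  printf '%s\n' "velero --kubecontext $ctx_b restore create --from-backup $backup --wait"
  # 4. Set both locations back to read/write, as described in step 4 above.
  for ctx in "$ctx_a" "$ctx_b"; do
    printf '%s\n' "kubectl --context $ctx patch backupstoragelocation default --namespace velero --type merge --patch '$rw'"
  done
}
```

    Printing instead of executing lets you check each line first; for example, `migration_plan cluster-a cluster-b default > plan.sh`, review plan.sh, then run `sh plan.sh`.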

    Serverless Cluster FAQs

    • Failed to pull an image: See Image Repository.
    • DNS resolution failure: This usually shows up as a failure to pull a Pod image or to deliver logs to a self-built Kafka cluster. For more information, see Customized DNS Service of Serverless Cluster.
    • Failed to deliver logs to CLS: When you use a TKE Serverless cluster to deliver logs to CLS for the first time, you need to authorize the service as instructed in Enabling Log Collection.
    • By default, up to 100 Pods can be created in each cluster. If you need more, see Default Quota.
    • Pods are frequently terminated and recreated, and the Timeout to ensure pod sandbox error is reported: Add-ons in TKE Serverless cluster Pods communicate with the control plane for health checks. If the network is still unreachable six minutes after a Pod is created, the control plane terminates and recreates the Pod. In this case, check whether the security group associated with the Pod allows access to the 169.254 route.
    • A Pod port cannot be accessed or the Pod is not ready:
      • Check whether the service container port conflicts with a TKE Serverless cluster control plane port, as described in Port Limits.
      • If the Pod responds to ping but telnet to the port fails, check the security group.
    • To speed up image pulls when creating an instance, use the Mirror cache and Mirror reuse features.
    • Failed to dump business logs: After a Job in a TKE Serverless cluster exits, its underlying resources are reclaimed and its container logs can no longer be viewed with the kubectl logs command, which makes debugging difficult. To keep the logs, delay the termination or set the terminationMessage field as instructed in How to set container's termination message?.
    • The Pod restarts frequently and the ImageGCFailed error is reported: A TKE Serverless cluster Pod has 20 GiB of disk space by default. When disk usage reaches 80%, the TKE Serverless cluster control plane triggers container image garbage collection to reclaim unused images and free up space. If no space can be freed, ImageGCFailed: failed to garbage collect required amount of images is reported to indicate that disk space is insufficient. Common causes of insufficient disk space include:
      • The business writes a large amount of temporary output.
      • The business holds file descriptors of deleted files, so their space is not freed.
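    With the default 20 GiB disk, the 80% threshold is reached at 16 GiB. The two causes above can be checked from inside the container. This is a minimal sketch, assuming the container provides `sh`, `awk`, `df -P`, and `readlink`; `over_threshold` and `deleted_but_open` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: diagnose the common causes of ImageGCFailed described above.

# over_threshold [PERCENT]: read `df -P` output on stdin and print
# "<mount point> <use%>" for each filesystem at or above PERCENT (default 80).
over_threshold() {
  awk -v limit="${1:-80}" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)            # "83%" -> "83"
    if (pct + 0 >= limit + 0) print $6, $5
  }'
}

# deleted_but_open: list open file descriptors that point to deleted files.
# Their space is only released when the owning process closes them.
deleted_but_open() {
  local fd target
  for fd in /proc/[0-9]*/fd/*; do
    target="$(readlink "$fd" 2>/dev/null)" || continue
    case "$target" in
      *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}
```

    For example, `kubectl exec <pod> -- df -P | over_threshold` runs the usage check from your workstation, while `deleted_but_open` must run inside the container, where the PID in each printed path identifies the process holding the space.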
