Scenarios
During a microservice release, service changes may cause traffic errors or interruptions. Spring Cloud Tencent provides plugins for lossless deployment and decommissioning, which work as follows:
Graceful deployment: After the service starts, it waits until it is fully ready before registering with the service registry and serving traffic (if no health check endpoint is configured, registration is delayed by a fixed interval instead). Working with the Kubernetes lifecycle, the rolling update then proceeds to the next node.
Graceful deactivation: Before the service stops, it deregisters from the service registry and rejects new requests, then waits for in-flight requests to complete before going offline.
Lossless Deployment
Option 1: Service Readiness/Delayed Registration
Some services load resources lazily and asynchronously after startup. For example, a service may need to fetch data or files from Cloud Object Storage (COS) and can serve requests only after the fetch completes. If such a service registers immediately after the application starts, calls routed to it fail because it is not actually ready. Registering the service only once it is ready ensures a smooth, lossless launch.
TSE Polaris supports the following two service readiness/delayed registration scenarios:
Scenario 1: The service exposes a health check endpoint. Service registration occurs only after the endpoint is successfully probed.
Scenario 2: If the service does not expose a health check endpoint, service registration is delayed by a fixed period. The default delay is 30 seconds and can be changed with spring.cloud.polaris.lossless.delay-register-interval.
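The two scenarios can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the plugin's implementation: the class `DelayedRegistration`, its methods, and the rule that an HTTP 200 from the health check URL means "ready" are all hypothetical. The sleeper is injected so the waiting logic can be exercised without real delays.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical sketch of the two readiness/delayed-registration scenarios (not the plugin's code). */
final class DelayedRegistration {

    /**
     * Waits until the service is considered ready, then calls register.
     * Scenario 1 (healthProbe != null): probe every checkInterval until the probe succeeds.
     * Scenario 2 (healthProbe == null): wait delayInterval once.
     * Returns the number of probe attempts (0 for Scenario 2).
     */
    static int awaitReadyThenRegister(BooleanSupplier healthProbe, Duration checkInterval,
                                      Duration delayInterval, Runnable register,
                                      Consumer<Duration> sleeper) {
        int attempts = 0;
        if (healthProbe == null) {
            sleeper.accept(delayInterval);          // Scenario 2: fixed registration delay
        } else {
            attempts = 1;
            while (!healthProbe.getAsBoolean()) {   // Scenario 1: poll the health check endpoint
                sleeper.accept(checkInterval);
                attempts++;
            }
        }
        register.run();                             // register only after readiness is established
        return attempts;
    }

    /** Assumption: HTTP 200 from the health check URL means healthy; any I/O error means not ready. */
    static BooleanSupplier httpProbe(URI healthCheckUrl) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        HttpRequest request = HttpRequest.newBuilder(healthCheckUrl).timeout(Duration.ofSeconds(2)).GET().build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode() == 200;
            } catch (IOException e) {
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        };
    }

    /** Real sleeper; aborts the wait (restoring the interrupt flag) if the thread is interrupted. */
    static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to register", e);
        }
    }
}
```

For example, `DelayedRegistration.awaitReadyThenRegister(DelayedRegistration.httpProbe(URI.create("http://localhost:8080/actuator/health")), Duration.ofMillis(5000), Duration.ofMillis(30000), () -> { /* register */ }, DelayedRegistration::sleep)` would poll a hypothetical health URL every 5 seconds; passing `null` as the probe falls back to a single 30-second delay.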
Operation Steps
| Parameter | Default Value | Required | Description |
| --- | --- | --- | --- |
| spring.cloud.polaris.lossless.enabled | false | No | Switch for zero-downtime deployment and graceful shutdown. |
| spring.cloud.polaris.lossless.health-check-path | None | No | Health check API path of the business application. |
| spring.cloud.polaris.lossless.delay-register-interval | 30000 | No | Registration delay, in ms, used when no health check API is configured for the business application. |
| spring.cloud.polaris.lossless.health-check-interval | 5000 | No | Health check interval, in ms, used when a health check API is configured for the business application. |
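To make the defaults in the table concrete, here is a hypothetical plain-Java view of these properties. The record `LosslessProperties` and its `from` factory are illustrative assumptions, not how Spring Cloud Tencent binds its configuration; the sketch only shows which value applies when a property is omitted and how the presence of `health-check-path` selects between the two readiness scenarios.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plain-Java view of the spring.cloud.polaris.lossless.* properties with the documented defaults. */
record LosslessProperties(boolean enabled,
                          Optional<String> healthCheckPath,
                          Duration delayRegisterInterval,
                          Duration healthCheckInterval) {

    static final String PREFIX = "spring.cloud.polaris.lossless.";

    /** Reads raw string properties, falling back to the defaults from the table (intervals in ms). */
    static LosslessProperties from(Map<String, String> props) {
        return new LosslessProperties(
                Boolean.parseBoolean(props.getOrDefault(PREFIX + "enabled", "false")),
                Optional.ofNullable(props.get(PREFIX + "health-check-path")).filter(p -> !p.isBlank()),
                Duration.ofMillis(Long.parseLong(props.getOrDefault(PREFIX + "delay-register-interval", "30000"))),
                Duration.ofMillis(Long.parseLong(props.getOrDefault(PREFIX + "health-check-interval", "5000"))));
    }

    /** Scenario 1 (probe the health check API) applies only when a path is set; otherwise Scenario 2 (delay). */
    boolean usesHealthCheck() {
        return healthCheckPath.isPresent();
    }
}
```

Note that `enabled` defaults to `false`, so neither scenario takes effect until the switch is turned on explicitly.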
Configuration example:
```yaml
apiVersion: apps/v1
kind: Deployment
......
spec:
  ......
  template:
    metadata:
      annotations:
        # Inject the javaagent into this Pod
        polarismesh.cn/javaagent: "true"
        # Application framework type; for Spring Cloud applications, use spring-cloud
        polarismesh.cn/javaagentFrameworkName: spring-cloud
        # Application framework version; currently supported values: hoxton, 2023
        polarismesh.cn/javaagentFrameworkVersion: hoxton
        # Image version of the java-agent package. Available versions: https://github.com/polarismesh/polaris-java-agent/releases
        polarismesh.cn/javaagentVersion: 1.7.0-RC3
        # Custom JavaAgent configuration in JSON format; if omitted, the default configuration is used. For available settings, see: https://github.com/polarismesh/polaris-controller/blob/main/deploy/kubernetes_v1.22/kubernetes/javaagent-configmap.yaml
        # Lossless launch configuration example:
        # polarismesh.cn/javaagentConfig: "{\"spring.cloud.polaris.lossless.enabled\": \"true\", \"spring.cloud.polaris.lossless.delay-register-interval\": \"30000\"}"
    ......
```
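The `polarismesh.cn/javaagentConfig` annotation value is a JSON object embedded in a YAML double-quoted string, so every double quote inside the JSON must be escaped exactly once. The hypothetical helper below builds such a line from a property map. Its class and method names are assumptions, and it escapes only backslashes and double quotes, which is enough for plain property names and values like these but not for control characters.

```java
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that renders JavaAgent properties as a javaagentConfig annotation line. */
final class JavaagentConfigAnnotation {

    /** Escapes backslashes and double quotes (valid for both JSON strings and YAML double-quoted scalars). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Serializes flat string properties as a JSON object, preserving the map's iteration order. */
    static String toJson(Map<String, String> props) {
        return props.entrySet().stream()
                .map(e -> "\"" + escape(e.getKey()) + "\": \"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(", ", "{", "}"));
    }

    /** Wraps a string as a YAML double-quoted scalar. */
    static String yamlDoubleQuoted(String s) {
        return "\"" + escape(s) + "\"";
    }

    /** Full annotation line, ready to paste under metadata.annotations. */
    static String annotationLine(Map<String, String> props) {
        return "polarismesh.cn/javaagentConfig: " + yamlDoubleQuoted(toJson(props));
    }
}
```

Using a `LinkedHashMap` keeps the keys in insertion order, so the generated line is stable across runs.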
Option 2: Service Registration Readiness Check
Kubernetes provides readiness probes to check an instance before it is marked ready. A typical probe considers the application ready as soon as its port is listening. However, there is a gap between the port becoming active and the service completing registration. If Kubernetes marks the new instance ready before registration finishes, the rolling update may take the old instance offline and start deploying the next node while the new instance is not yet discoverable, causing exceptions in consumer-side invocations.
TSE Polaris provides an endpoint and port that report service registration status for use with Kubernetes readiness probes. The endpoint returns status code 200 once the application has finished registering, so Kubernetes treats it as ready, and 500 while registration is incomplete, so Kubernetes treats it as not ready. During a rolling update, this makes Kubernetes wait until each new instance is registered before it updates the next node.
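The 200/500 contract can be illustrated with a small stand-in server built on the JDK's `com.sun.net.httpserver.HttpServer`. This is a hypothetical sketch, not the TSE Polaris implementation: the class name, the `markRegistered()` hook, and the in-memory flag are assumptions made only to show how a readiness endpoint reflects registration state.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a registration-status endpoint: /online answers 200 once registered, 500 before. */
final class RegistrationStatusServer {
    private final AtomicBoolean registered = new AtomicBoolean(false);
    private final HttpServer server;

    private RegistrationStatusServer(HttpServer server) {
        this.server = server;
    }

    /** Starts the server on the given port (0 picks a free port). */
    static RegistrationStatusServer start(int port) {
        try {
            HttpServer http = HttpServer.create(new InetSocketAddress(port), 0);
            RegistrationStatusServer status = new RegistrationStatusServer(http);
            http.createContext("/online", exchange -> {
                int code = status.registered.get() ? 200 : 500;
                exchange.sendResponseHeaders(code, -1);  // -1: no response body
                exchange.close();
            });
            http.start();
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called once registration with the registry has succeeded. */
    void markRegistered() {
        registered.set(true);
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** Issues GET /online on localhost and returns the HTTP status code, as a readiness probe would. */
    static int statusOf(int port) {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + "/online")).GET().build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```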
Operation Steps:
Step 3: Configure a readiness probe in your Kubernetes application deployment platform, such as TKE, with the following settings:
Path: /online
Port: 28080
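How Kubernetes interprets the `/online` response can be modeled in a few lines. Kubernetes treats any HTTP status from 200 up to (but not including) 400 as probe success, sends `httpGet` probes to the Pod IP by default, and marks a Pod Ready after `successThreshold` consecutive successes (1 by default). The class and method names below are hypothetical, and the model ignores timeouts and `initialDelaySeconds`.

```java
/** Hypothetical model of a Kubernetes httpGet readiness probe against the registration-status endpoint. */
final class ReadinessProbeModel {
    static final String PATH = "/online";
    static final int PORT = 28080;

    /** Kubernetes counts any HTTP status in [200, 400) as a successful probe. */
    static boolean probeSucceeded(int status) {
        return status >= 200 && status < 400;
    }

    /** URL the kubelet would call for a given Pod IP (the probe host defaults to the Pod IP). */
    static String probeUrl(String podIp) {
        return "http://" + podIp + ":" + PORT + PATH;
    }

    /**
     * Index of the probe after which a not-yet-Ready Pod becomes Ready, given successive status
     * codes and successThreshold (consecutive successes required), or -1 if it never does.
     */
    static int readyAfterProbe(int[] statuses, int successThreshold) {
        int consecutive = 0;
        for (int i = 0; i < statuses.length; i++) {
            consecutive = probeSucceeded(statuses[i]) ? consecutive + 1 : 0;  // a failure resets the streak
            if (consecutive >= successThreshold) {
                return i;
            }
        }
        return -1;
    }
}
```

Because the endpoint answers 500 until registration completes, a new Pod stays out of the Ready state in this model, and the Deployment controller does not count it as available while the rolling update is in progress.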
Lossless Decommissioning
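During a rolling release or decommissioning, there is a time gap between the moment an instance of the called service deregisters from the registry and the moment the calling service refreshes its instance IPs from the registry. During this gap, callers may still route requests to the deactivated instance, causing request failures.
TSE Polaris provides a graceful shutdown endpoint, /offline, which integrates with the Kubernetes lifecycle to achieve lossless service deactivation. When the instance is being stopped, the preStop hook calls /offline so that the instance deregisters from the registry and rejects new requests, then waits for in-flight requests to complete before the container is terminated.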
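The deregister, reject, and drain sequence can be sketched as an in-process gate. The class `GracefulShutdownGate` and its methods are hypothetical and do not describe how `/offline` is implemented; the sketch only shows the ordering: stop admitting new requests and deregister, then wait, with a timeout, for in-flight requests to finish.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical in-process model of lossless decommissioning (not the Polaris /offline implementation). */
final class GracefulShutdownGate {
    private final AtomicBoolean offline = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();
    private final Runnable deregister;

    GracefulShutdownGate(Runnable deregister) {
        this.deregister = deregister;
    }

    /** Call before handling a request; returns false if the request must be rejected. */
    boolean tryEnter() {
        inFlight.incrementAndGet();        // count first, then check, so goOffline() cannot miss this request
        if (offline.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Call when an admitted request completes. */
    void exit() {
        inFlight.decrementAndGet();
    }

    /**
     * Stops admitting requests and deregisters (only on the first call), then waits up to
     * timeout for in-flight requests to finish. Returns true if they all drained in time.
     */
    boolean goOffline(Duration timeout) {
        if (offline.compareAndSet(false, true)) {
            deregister.run();
        }
        long deadline = System.nanoTime() + timeout.toNanos();
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            LockSupport.parkNanos(1_000_000);  // poll roughly every millisecond
        }
        return true;
    }
}
```

A request handler would call `tryEnter()` before doing work and `exit()` in a `finally` block afterwards; counting a request before checking the offline flag means a request that arrives at the moment of shutdown is either rejected or waited for, never lost.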
Operation Steps:
Step 3: Configure the preStop lifecycle hook in your Kubernetes application deployment platform, such as TKE, with the following command:
curl -X PUT http://localhost:28080/offline && sleep 20
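The preStop command chains two steps with `&&`. The hypothetical class below renders those semantics in Java (all names are assumptions). curl without `--fail` exits with status 0 for any HTTP response, including 4xx/5xx, so the 20-second wait runs whenever `/offline` answered and is skipped only if no response was received. The wait gives consumers time to refresh their instance lists and lets in-flight requests finish.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Consumer;

/** Hypothetical Java rendering of: curl -X PUT http://localhost:28080/offline && sleep 20 */
final class PreStopSketch {
    /** The PUT call; throws IOException when no HTTP response is received (curl would exit non-zero). */
    interface OfflineCall {
        int put() throws IOException, InterruptedException;
    }

    /**
     * Mirrors the shell semantics: any HTTP response lets "&& sleep" run; a connection error skips it.
     * Returns the HTTP status, or -1 if no response was received.
     */
    static int run(OfflineCall call, Duration wait, Consumer<Duration> sleeper) {
        int status;
        try {
            status = call.put();
        } catch (IOException e) {
            return -1;                     // like curl exiting non-zero: the sleep is skipped
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        sleeper.accept(wait);              // window for consumers to refresh and in-flight requests to finish
        return status;
    }

    /** Real call: HTTP PUT with an empty body, as curl -X PUT sends without data. */
    static OfflineCall httpPut(URI uri) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).PUT(HttpRequest.BodyPublishers.noBody()).build();
        return () -> client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /** Real sleeper; restores the interrupt flag if interrupted. */
    static void sleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A real invocation would be `PreStopSketch.run(PreStopSketch.httpPut(URI.create("http://localhost:28080/offline")), Duration.ofSeconds(20), PreStopSketch::sleep)`. Two Kubernetes details are worth checking when configuring the hook itself: an `exec` preStop command is not run through a shell unless you invoke one (for example `/bin/sh -c`), so `&&` needs a shell; and the hook's running time counts against the Pod's `terminationGracePeriodSeconds` (30 seconds by default), so the sleep must fit within it.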