Tencent Kubernetes Engine

CVE-2026-31431 vulnerability remediation documentation

Last updated: 2026-05-05 22:38:56

Background

CVE-2026-31431 (also known as Copy Fail) is a high-risk local privilege escalation vulnerability in the Linux kernel crypto subsystem. The issue is in the algif_aead module, which implements the AEAD interface of the kernel's userspace crypto API (AF_ALG sockets). By combining AF_ALG with splice(), an attacker can affect the kernel page cache. An attacker who can already run user-space code on a node may use this to escalate to root on that node, or to affect other containers, because all processes on the same host share the kernel page cache. The upstream Linux kernel fix is titled "crypto: algif_aead - Revert to operating out-of-place".

Impact Scope

Affected Versions:
Linux kernel: versions built with CONFIG_CRYPTO_USER_API_AEAD=m or CONFIG_CRYPTO_USER_API_AEAD=y that do not yet include the CVE-2026-31431 fix.
TKE nodes: regular nodes, native nodes, and super nodes running an affected kernel version.
Register nodes: the operating system is managed by the user, who must assess the kernel version and the associated risk.
Note:
If the kernel is built with CONFIG_CRYPTO_USER_API_AEAD=n (the module is not built at all), the node is not affected and no action is required.
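To see which case applies to a node, you can read the kernel build configuration directly. The sketch below assumes the configuration is available at /boot/config-$(uname -r) (common on many distributions) or at /proc/config.gz (only present when the kernel enables CONFIG_IKCONFIG_PROC). The function name aead_config_state is hypothetical. It reports only the build option; whether the fix patch is included must still be confirmed from the image or kernel release notes.

```shell
#!/bin/sh
# Hypothetical helper: report how CONFIG_CRYPTO_USER_API_AEAD is set for a kernel.
# Prints "y" (built-in), "m" (loadable module), "n" (not built), or "unknown".
# Assumption: the config is at the given path, /boot/config-<release>, or /proc/config.gz.
aead_config_state() {
  cfg="${1:-/boot/config-$(uname -r)}"
  pattern='^(# )?CONFIG_CRYPTO_USER_API_AEAD([= ]|$)'
  if [ -r "$cfg" ]; then
    line=$(grep -E "$pattern" "$cfg" || true)
  elif [ -r /proc/config.gz ]; then
    line=$(gzip -dc /proc/config.gz | grep -E "$pattern" || true)
  else
    echo "unknown"
    return 0
  fi
  case "$line" in
    "CONFIG_CRYPTO_USER_API_AEAD=y") echo "y" ;;
    "CONFIG_CRYPTO_USER_API_AEAD=m") echo "m" ;;
    # "is not set", or the option missing entirely, both mean it was not built
    *) echo "n" ;;
  esac
}

# Example: check the running kernel on the current node
echo "CONFIG_CRYPTO_USER_API_AEAD: $(aead_config_state)"
```

A result of "y" or "m" means the node is in scope until the patched kernel is confirmed; "n" corresponds to the unaffected case in the note above.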
Attack Prerequisites:
An attacker must already be able to run code in a container or user-space process on the target node (for example, through exec permissions, a malicious image, or a container escape) before attempting local privilege escalation with this vulnerability. It cannot be exploited remotely on its own, but the risk is higher in multi-tenant clusters, CI/CD Runner clusters, and clusters where exec/debug permissions are widely granted.
The following clusters require special attention:
| Cluster type | Risk description |
|---|---|
| Multi-tenant cluster | Untrusted users can submit Pods, posing a high risk. |
| CI/CD Runner cluster | Build tasks typically execute untrusted code, posing a high risk. |
| Online service co-located cluster | If a single business container is compromised, the entire node may be affected. |
| Cluster with exec/debug permissions enabled | Attackers can more easily gain the ability to execute code within containers. |
| Cluster running high-privilege Pods | The risk of lateral movement is greater after the vulnerability is exploited. |
| GPU / MaaS inference cluster | Typically hosts third-party models, code, plugins, or user tasks; requires careful evaluation. |

Remediation Recommendations

1. Complete Fix (Recommended)

The remediation differs by node type:
Regular nodes: the operating system is provided by CVM images. Monitor the CVM public image release notes. After an image version containing the CVE-2026-31431 fix is released:
Newly created nodes use the patched image by default and are not affected by this vulnerability.
Existing nodes can be remediated by replacing them: first, drain the node to migrate business Pods to other nodes; then, remove the old node from the cluster; finally, add a new node to the cluster (new nodes use the patched image by default).
Native nodes: the operating system is managed by TKE. Monitor TKE announcements and updates. After TKE releases a node image containing the fix:
Newly created nodes use the patched image by default and are not affected by this vulnerability.
Existing nodes can be remediated by replacing them: first, drain the node to migrate business Pods to other nodes; then, remove the old node from the cluster; finally, add a new node to the cluster (new nodes use the patched image by default).
Super nodes: the underlying node resources are managed by TKE. Monitor TKE announcements and updates. After TKE releases the fix:
Newly created Pods are scheduled onto patched nodes by default and are not affected by this vulnerability.
Existing Pods can be remediated by rebuilding them: delete the Pods and let their controllers recreate them on patched nodes.
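The Kubernetes side of the node replacement described above can be scripted. This is a hedged sketch only: it assumes kubectl access to the cluster, and it uses a hypothetical DRY_RUN switch (on by default) that only prints the commands. It covers cordon and drain; removing the old node and adding a node from the patched image are still done through TKE as described above, since `kubectl delete node` alone does not release or reinstall the underlying instance. For super nodes, `kubectl rollout restart` is shown as one way to have a controller recreate Pods gradually instead of deleting them by hand; the workload name is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper for the Kubernetes side of node replacement.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Regular / native nodes: stop new scheduling, then evict business Pods.
evacuate_node() {
  node="$1"
  run kubectl cordon "$node"
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=15m
  echo "next: remove $node from the cluster and add a node from the patched image"
}

# Super nodes: let the owning controller recreate Pods in a rolling fashion.
rebuild_workload() {
  ns="$1"
  workload="$2"   # for example deployment/<name> (placeholder)
  run kubectl -n "$ns" rollout restart "$workload"
  run kubectl -n "$ns" rollout status "$workload" --timeout=10m
}

# Example (dry run): evacuate_node <node-name>
# Example (dry run): rebuild_workload <namespace> deployment/<name>
```

Keeping DRY_RUN on by default lets you review the exact commands for each node before running anything against a production cluster.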

2. Temporary Mitigation (Before Kernel Upgrade)

Temporary mitigation does not replace a kernel upgrade. Complete the full fix as soon as possible.
Until node OS upgrades are complete, you can apply temporary mitigation to reduce the attack surface. Regular, native, and super nodes do not load algif_aead by default, so the overall risk is relatively low. The mitigation uses a DaemonSet that writes a blocklist file to /etc/modprobe.d on each node and unloads the module. The DaemonSet requires high privileges (hostPID, hostNetwork, and privileged) and makes node-level changes, so it is a high-risk operation. Roll it out to a canary group first, then to all nodes.
Step 1: Select 1 to 3 low-risk nodes, pre-check them, and label them.
Pre-check commands (run them on each selected node first):
uname -r
lsmod | grep '^algif_aead' || echo "algif_aead not loaded"
modinfo algif_aead 2>/dev/null || echo "algif_aead module not found or built-in"
if [ -f /etc/modprobe.d/blacklist-algif_aead.conf ]; then
  cat /etc/modprobe.d/blacklist-algif_aead.conf
else
  echo "blacklist config not found"
fi
| Pre-check result | Conclusion |
|---|---|
| algif_aead is listed by lsmod | The module is loaded at runtime and poses a risk; unload it. |
| The blocklist configuration file does not exist | The module may be loaded again automatically after a restart; write the configuration. |
| The configuration file contains only blacklist algif_aead | Add install algif_aead /bin/false as well. |
| modinfo finds no module information, but the feature is still present | The module may be built in, or the module path may differ; confirm further (a built-in implementation cannot be blocked through /etc/modprobe.d). |
| The module is in use and cannot be unloaded | Evaluate against business needs; the blocklist may take effect after a node restart. |
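The table above can be applied mechanically to the same data. The following sketch (the function name classify_precheck is hypothetical) reads /proc/modules, whose third column is the module's reference count, and the blocklist file; the paths are parameters so the logic can be tried on copies. It cannot detect the built-in case, which still needs the manual confirmation described in the table.

```shell
#!/bin/sh
# Hypothetical helper mapping the Step 1 pre-check results to the table above.
# Args: [modules_file] [conf_file]; defaults are the real host paths.
classify_precheck() {
  modules="${1:-/proc/modules}"
  conf="${2:-/etc/modprobe.d/blacklist-algif_aead.conf}"
  found=0

  # /proc/modules columns: name size refcount users state address
  refcnt=$(awk '$1 == "algif_aead" { print $3 }' "$modules")
  if [ -n "$refcnt" ]; then
    found=1
    echo "LOADED: runtime risk, unload required"
    case "$refcnt" in
      0) ;;
      *) echo "IN_USE: refcount=$refcnt, unloading may fail; evaluate or restart off-peak" ;;
    esac
  fi

  if [ ! -f "$conf" ]; then
    found=1
    echo "NO_CONF: may be auto-loaded again after restart, write the config"
  elif ! grep -q '^install algif_aead /bin/false' "$conf"; then
    found=1
    echo "NO_INSTALL_LINE: add 'install algif_aead /bin/false'"
  fi

  [ "$found" -eq 1 ] || echo "OK: not loaded and fully blocked"
}

# Usage on a node: classify_precheck
```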
After the pre-check is completed, label the nodes:
kubectl label node <node-1> algif-aead-fix=canary
kubectl label node <node-2> algif-aead-fix=canary
Step 2: Save the following manifest as disable-algif-aead-canary.yaml and deploy the canary DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: disable-algif-aead
  namespace: kube-system
  labels:
    app: disable-algif-aead
spec:
  selector:
    matchLabels:
      app: disable-algif-aead
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: disable-algif-aead
    spec:
      hostPID: true
      hostNetwork: true
      tolerations:
        - operator: Exists
      nodeSelector:
        algif-aead-fix: canary
      restartPolicy: Always
      containers:
        - name: disable-algif-aead
          image: busybox:1.36
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: false
          command:
            - /bin/sh
            - -c
            - |
              set -eu

              MODULE="algif_aead"
              CONF="/host/etc/modprobe.d/blacklist-algif_aead.conf"
              STATE_DIR="/host/var/lib/cve-2026-31431"
              STATE_FILE="${STATE_DIR}/pre-fix-state.json"
              CONF_BACKUP="${STATE_DIR}/blacklist-algif_aead.conf.orig"
              ROLLBACK_DONE="${STATE_DIR}/rollback-done"

              echo "[INFO] node=$(hostname) start precheck"

              mkdir -p /host/etc/modprobe.d
              mkdir -p "${STATE_DIR}"

              # ========== Clear a previous rollback marker (supports repeated disable/rollback cycles) ==========
              if [ -f "${ROLLBACK_DONE}" ]; then
                echo "[FIX] clearing previous rollback marker"
                rm -f "${ROLLBACK_DONE}"
              fi

              # ========== Record the pre-disable state (for rollback) ==========
              MODULE_LOADED="false"
              MODULE_LINE=""
              if grep -q "^${MODULE} " /proc/modules; then
                MODULE_LOADED="true"
                MODULE_LINE=$(grep "^${MODULE} " /proc/modules)
              fi

              CONF_EXISTED="false"
              if [ -f "${CONF}" ]; then
                CONF_EXISTED="true"
              fi

              # Write the state only on the first run so that repeated runs keep the original state.
              if [ ! -f "${STATE_FILE}" ]; then
                if [ "${CONF_EXISTED}" = "true" ]; then
                  # Back up the original file verbatim; embedding multi-line text in JSON would break quoting.
                  cp "${CONF}" "${CONF_BACKUP}"
                fi
                cat > "${STATE_FILE}" <<SNAP
              {
                "timestamp": "$(date -u '+%Y-%m-%dT%H:%M:%SZ')",
                "hostname": "$(hostname)",
                "kernel": "$(uname -r)",
                "module_was_loaded": ${MODULE_LOADED},
                "module_proc_line": "${MODULE_LINE}",
                "blacklist_conf_existed": ${CONF_EXISTED},
                "blacklist_conf_backup_file": "${CONF_BACKUP#/host}",
                "cve": "CVE-2026-31431"
              }
              SNAP
                chmod 0644 "${STATE_FILE}"
                echo "[STATE] pre-fix state saved to ${STATE_FILE}"
              else
                echo "[STATE] state file already exists, skip (idempotent)"
              fi

              echo "[STATE] recorded state:"
              cat "${STATE_FILE}"

              # ========== Precheck ==========
              if [ "${MODULE_LOADED}" = "true" ]; then
                echo "[PRECHECK] ${MODULE} is loaded: ${MODULE_LINE}"
              else
                echo "[PRECHECK] ${MODULE} is not loaded"
              fi

              if [ "${CONF_EXISTED}" = "true" ]; then
                echo "[PRECHECK] existing config:"
                cat "${CONF}"
              else
                echo "[PRECHECK] config not found"
              fi

              # ========== Fix ==========
              echo "[FIX] writing blacklist config"
              cat > "${CONF}" <<EOF
              # Managed by TKE DaemonSet disable-algif-aead
              # CVE: CVE-2026-31431
              blacklist algif_aead
              install algif_aead /bin/false
              EOF
              chmod 0644 "${CONF}"
              sync

              echo "[FIX] try to unload ${MODULE}"
              if grep -q "^${MODULE} " /proc/modules; then
                if chroot /host /sbin/modprobe -r "${MODULE}" 2>/tmp/modprobe-r.err; then
                  echo "[FIX] modprobe -r ${MODULE} succeeded"
                else
                  echo "[WARN] modprobe -r failed:"
                  cat /tmp/modprobe-r.err || true
                  # Fall back to rmmod if modprobe -r fails.
                  if chroot /host /sbin/rmmod "${MODULE}" 2>/tmp/rmmod.err; then
                    echo "[FIX] rmmod ${MODULE} succeeded"
                  else
                    echo "[ERROR] failed to unload ${MODULE}:"
                    cat /tmp/rmmod.err || true
                  fi
                fi
              else
                echo "[FIX] ${MODULE} is not loaded, skip unload"
              fi

              # ========== Postcheck ==========
              echo "[POSTCHECK] config:"
              cat "${CONF}"

              if grep -q "^${MODULE} " /proc/modules; then
                echo "[POSTCHECK][FAIL] ${MODULE} is still loaded"
              else
                echo "[POSTCHECK][OK] ${MODULE} is not loaded"
              fi

              # Keep the container running; otherwise the DaemonSet would restart it.
              while true; do
                sleep 3600
              done
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory
kubectl apply -f disable-algif-aead-canary.yaml

# Observation (Success criterion: The log shows [POSTCHECK][OK] algif_aead is not loaded)
kubectl -n kube-system get pod -l app=disable-algif-aead -o wide
kubectl -n kube-system logs -l app=disable-algif-aead --tail=50
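Step 3: After the canary passes, copy disable-algif-aead-canary.yaml to disable-algif-aead.yaml, delete the nodeSelector section (nodeSelector: and algif-aead-fix: canary) from the copy, and apply it to all nodes. Because the DaemonSet name is unchanged, this updates the canary DaemonSet in place: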
kubectl apply -f disable-algif-aead.yaml
kubectl -n kube-system rollout status ds/disable-algif-aead --timeout=10m
kubectl -n kube-system get ds disable-algif-aead
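Instead of editing the copy by hand, the full-rollout manifest can be derived from the canary file. The sketch below assumes the nodeSelector block consists of exactly the two lines nodeSelector: and algif-aead-fix: canary, as in the Step 2 manifest; the function name strip_canary_selector is hypothetical. Running kubectl diff before applying lets you confirm that the selector is the only change.

```shell
#!/bin/sh
# Hypothetical helper: print a manifest with the canary nodeSelector block removed.
# Assumes the block is "nodeSelector:" immediately followed by "algif-aead-fix: canary".
strip_canary_selector() {
  awk '
    /^[ \t]*nodeSelector:[ \t]*$/ { pending = $0; next }
    pending != "" && /^[ \t]*algif-aead-fix:[ \t]*canary[ \t]*$/ { pending = ""; next }
    pending != "" { print pending; pending = "" }
    { print }
    END { if (pending != "") print pending }
  ' "$1"
}

# Usage:
#   strip_canary_selector disable-algif-aead-canary.yaml > disable-algif-aead.yaml
#   kubectl diff -f disable-algif-aead.yaml    # expect only the nodeSelector removal
#   kubectl apply -f disable-algif-aead.yaml
```

A held nodeSelector: line that is not followed by the canary label is printed back unchanged, so other selectors are left alone.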
Post-Fix Verification
# DaemonSet Status (Expected: DESIRED = READY)
kubectl -n kube-system get ds disable-algif-aead

# Log check (Success criterion: [POSTCHECK][OK] algif_aead is not loaded)
kubectl -n kube-system logs -l app=disable-algif-aead --tail=300 | grep -E "POSTCHECK|ERROR|WARN|FAIL|OK"

# Sample check of host configurations
POD=$(kubectl -n kube-system get pod -l app=disable-algif-aead -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system exec "$POD" -- cat /host/etc/modprobe.d/blacklist-algif_aead.conf
kubectl -n kube-system exec "$POD" -- grep '^algif_aead ' /proc/modules || echo "not loaded - OK"

# Node and Service Status
kubectl get nodes
kubectl get events -A --sort-by='.lastTimestamp' | tail -100
Confirm that no new NodeNotReady, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable, FailedCreatePodSandBox, or FailedMount events or conditions appear, and that service-level indicators are normal: Pod restarts, error rates, CNI/DNS connectivity, storage mounts, and GPU scheduling (including the NVIDIA device plugin). If your workloads include self-developed encryption components, IPsec, or other dependencies on the kernel crypto API, verify them separately.
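Part of this check can be narrowed with kubectl output formatting. As a hedged sketch: the jsonpath expression in the usage comment lists, for each node, the condition types whose status is True, and the hypothetical filter flag_abnormal_nodes reports nodes that are not Ready or that report a pressure or NetworkUnavailable condition. Warning events such as FailedMount and FailedCreatePodSandBox can be listed with a field selector. Service-level indicators still need to be checked in your own monitoring.

```shell
#!/bin/sh
# Hypothetical filter. Input lines: "<node><TAB><TrueCondition> <TrueCondition> ..."
# Prints "<node>: <reason>" for nodes that look unhealthy.
flag_abnormal_nodes() {
  while IFS="$(printf '\t')" read -r node conds; do
    [ -n "$node" ] || continue
    case " $conds " in
      *" Ready "*) ;;
      *) echo "$node: not Ready" ;;
    esac
    for c in MemoryPressure DiskPressure PIDPressure NetworkUnavailable; do
      case " $conds " in
        *" $c "*) echo "$node: $c" ;;
      esac
    done
  done
}

# Usage against a live cluster:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.conditions[?(@.status=="True")]}{.type}{" "}{end}{"\n"}{end}' | flag_abnormal_nodes
#   kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp' | tail -50
```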

Exception Handling and Rollback

If module unloading fails (rmmod: ERROR: Module algif_aead is in use), do not forcibly terminate business processes. After confirming that the configuration file has been written, record the node as "configured but not unloaded". Restart or replace the node during off-peak business hours. If the log shows [POSTCHECK][FAIL] algif_aead is still loaded, it indicates that the configuration has been persisted but runtime unloading failed. Common causes include: the module is in use, the module is built-in, the node lacks modprobe/rmmod, or a security policy prevents unloading.
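To build the list of nodes that are configured but not unloaded, you can scan the DaemonSet logs instead of reading them one by one. The sketch below assumes the Pod label app=disable-algif-aead and the [POSTCHECK][FAIL] marker printed by the disable DaemonSet; the function name list_configured_not_unloaded is hypothetical. It only reads Pod lists and logs, so it is safe to run at any time.

```shell
#!/bin/sh
# Hypothetical helper: print nodes whose disable-algif-aead Pod logged [POSTCHECK][FAIL].
list_configured_not_unloaded() {
  ns="${1:-kube-system}"
  kubectl -n "$ns" get pod -l app=disable-algif-aead \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r pod node; do
    [ -n "$pod" ] || continue
    # </dev/null keeps kubectl from consuming the loop's input
    if kubectl -n "$ns" logs "$pod" </dev/null 2>/dev/null | grep -qF '[POSTCHECK][FAIL]'; then
      echo "$node"
    fi
  done
}

# Usage: list_configured_not_unloaded > configured-not-unloaded.txt
```

The resulting list can serve as the schedule for off-peak restarts or node replacement.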
If a DaemonSet Pod fails to start, the common cause is that PSP / OPA / Gatekeeper policies block privileged containers. Temporarily lift the k8sPSPPrivilegedContainer, k8sPSPHostNamespace, and k8sPSPHostNetworkingPorts restrictions for the kube-system namespace. Restore them after the fix is complete.
If a node becomes abnormal, diagnose it first. If the node hosts critical business, immediately drain it, then roll back or replace the node:
kubectl describe node <node-name>
kubectl get events -A --sort-by='.lastTimestamp' | tail -200
# Critical Business Nodes: Drain First
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
If a business exception occurs (e.g., in encryption, networking, or security components), immediately stop expanding the scope and execute the following:
kubectl -n kube-system delete ds disable-algif-aead
Then, execute the rollback DaemonSet below to restore the configuration. The rollback actions include: deleting the host blocklist configuration, optionally reloading algif_aead, and checking business recovery status. Note that deleting the DaemonSet only deletes the Pods and does not automatically delete the configuration files on the host. You must use the following rollback DaemonSet to clean them up.
Save it as rollback-algif-aead.yaml and execute:
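apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rollback-algif-aead
  namespace: kube-system
  labels:
    app: rollback-algif-aead
spec:
  selector:
    matchLabels:
      app: rollback-algif-aead
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: rollback-algif-aead
    spec:
      hostPID: true
      hostNetwork: true
      tolerations:
        - operator: Exists
      restartPolicy: Always
      containers:
        - name: rollback-algif-aead
          image: busybox:1.36
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: false
          command:
            - /bin/sh
            - -c
            - |
              set -eu

              MODULE="algif_aead"
              CONF="/host/etc/modprobe.d/blacklist-algif_aead.conf"
              STATE_DIR="/host/var/lib/cve-2026-31431"
              STATE_FILE="${STATE_DIR}/pre-fix-state.json"
              CONF_BACKUP="${STATE_DIR}/blacklist-algif_aead.conf.orig"
              ROLLBACK_DONE="${STATE_DIR}/rollback-done"

              echo "[ROLLBACK] node=$(hostname) start rollback"

              if [ -f "${CONF}" ]; then
                echo "[ROLLBACK] remove ${CONF}"
                rm -f "${CONF}"
              else
                echo "[ROLLBACK] config not found, skip"
              fi

              # Restore a pre-existing config only if the disable DaemonSet backed one up.
              if [ -f "${CONF_BACKUP}" ]; then
                echo "[ROLLBACK] restore original config from ${CONF_BACKUP}"
                cp "${CONF_BACKUP}" "${CONF}"
              fi
              sync

              # Optional reload: only where the module was loaded before the fix.
              # Loading it on every node would re-expose nodes that never had it loaded.
              if [ -f "${STATE_FILE}" ] && grep -q '"module_was_loaded": true' "${STATE_FILE}"; then
                if grep -q "^${MODULE} " /proc/modules; then
                  echo "[ROLLBACK] ${MODULE} already loaded"
                elif chroot /host /sbin/modprobe "${MODULE}" 2>/tmp/modprobe.err; then
                  echo "[ROLLBACK] modprobe ${MODULE} succeeded"
                else
                  echo "[WARN] modprobe ${MODULE} failed; module may be unavailable or not required"
                  cat /tmp/modprobe.err || true
                fi
              else
                echo "[ROLLBACK] ${MODULE} was not loaded before the fix (or no state recorded), skip reload"
              fi

              # ========== Postcheck ==========
              if [ -f "${CONF_BACKUP}" ]; then
                if cmp -s "${CONF_BACKUP}" "${CONF}"; then
                  echo "[POSTCHECK][OK] original config restored"
                else
                  echo "[POSTCHECK][FAIL] original config not restored"
                fi
              elif [ -f "${CONF}" ]; then
                echo "[POSTCHECK][FAIL] config still exists"
              else
                echo "[POSTCHECK][OK] config removed"
              fi

              if grep -q "^${MODULE} " /proc/modules; then
                echo "[POSTCHECK] ${MODULE} is loaded"
              else
                echo "[POSTCHECK] ${MODULE} is not loaded"
              fi

              # Mark the rollback as done; the disable DaemonSet clears this marker on its next run.
              mkdir -p "${STATE_DIR}"
              touch "${ROLLBACK_DONE}"

              while true; do
                sleep 3600
              done
      volumes:
        - name: host-root
          hostPath:
            path: /
            type: Directory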
kubectl apply -f rollback-algif-aead.yaml

# Observation
kubectl -n kube-system get pod -l app=rollback-algif-aead -o wide
kubectl -n kube-system logs -l app=rollback-algif-aead --tail=200
# Cleanup After Success
kubectl -n kube-system delete ds rollback-algif-aead

FAQs

Are TKE Regular / Native / Super Nodes Vulnerable by Default?

By default, the operating systems of these three node types are managed by TKE and do not load the algif_aead module, so the overall risk is relatively low. For Register nodes, you must perform your own assessment.

Will Disabling algif_aead Affect HTTPS, SSH, and Regular TLS?

Usually not. Common TLS/SSH/OpenSSL implementations use user-space crypto libraries and do not depend on algif_aead. If you have self-developed kernel encryption components or IPsec scenarios, perform a canary deployment first.

Is a Kernel Upgrade Still Required After the DaemonSet Temporary Fix is Applied?

Yes. A DaemonSet is only a temporary mitigation. To completely resolve the issue, upgrade to a kernel or node image that contains the patch for CVE-2026-31431.

Why Write install algif_aead /bin/false Instead of Just blacklist?

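blacklist only stops the module from being loaded through its aliases, which is how automatic loading normally happens; an explicit modprobe algif_aead still loads it. install algif_aead /bin/false makes modprobe run /bin/false instead of loading the module, so explicit modprobe calls fail as well. Configuring both provides stronger hardening.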
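Because other files under /etc/modprobe.d can also affect the result, it is worth checking the merged configuration that modprobe actually uses. `modprobe -c` (`--showconfig`) prints it; the hypothetical filter check_algif_block below reads that output and reports which of the two directives is in effect. Like the DaemonSet, these directives only govern loading through modprobe: they do not unload a module that is already loaded, and they do not affect insmod with an explicit file path.

```shell
#!/bin/sh
# Hypothetical filter over `modprobe -c` output (read from stdin).
# Prints the protection level that modprobe.d gives algif_aead.
check_algif_block() {
  has_blacklist=0
  has_install=0
  while read -r kw mod rest; do
    [ "$mod" = "algif_aead" ] || continue
    case "$kw" in
      blacklist) has_blacklist=1 ;;
      install) [ "$rest" = "/bin/false" ] && has_install=1 ;;
    esac
  done
  if [ "$has_install" -eq 1 ]; then
    echo "blocked: every modprobe load runs /bin/false"
  elif [ "$has_blacklist" -eq 1 ]; then
    echo "partial: alias loading blocked, explicit modprobe still works"
  else
    echo "open: no blacklist or install rule"
  fi
}

# Usage on a node: modprobe -c | check_algif_block
```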

Why Does the DaemonSet Require privileged Permissions?

The DaemonSet modifies the host's /etc/modprobe.d directory and unloads a kernel module. Neither operation is possible without privileged mode and a hostPath mount of the host's root directory.

What to Do If PSP Is Enabled on the Cluster?

If k8sPSPPrivilegedContainer, k8sPSPHostNamespace, and k8sPSPHostNetworkingPorts are enabled and kube-system is not exempted, you must first temporarily lift the restrictions on the kube-system namespace and restore them after the fix is completed. The super node solution is not subject to this restriction.

What to Do If Service Exceptions Occur After the Fix Is Applied?

Immediately stop expanding the scope and perform a rollback by referring to the Exception Handling and Rollback section above.

