Installing CoScheduling for Batch Scheduling

Last updated: 2024-12-24 15:48:37

Background
In AI, big data, and other scenarios where multiple tasks work together, scheduling is often "All-or-Nothing": either all tasks in a group are scheduled at the same time, or none of them are. CoScheduling is an open-source solution that schedules a group of Pods (a PodGroup) together within a Kubernetes cluster. This document explains how to install CoScheduling for batch scheduling on TKE.
Prerequisites
A TKE cluster has been created.
Helm has been installed.
The TKE cluster's kubeconfig has been configured, with permissions to operate the TKE cluster granted. For details, please refer to Connect to the Cluster.
Using Helm for Installation
Installing CoScheduler as the Second Scheduler
Run the following commands to install scheduler-plugins as a second scheduler. Pods that should be scheduled by it must set schedulerName to scheduler-plugins-scheduler in their Pod spec:
$ git clone git@github.com:kubernetes-sigs/scheduler-plugins.git
$ cd scheduler-plugins/manifests/install/charts
$ helm install scheduler-plugins as-a-second-scheduler/ --create-namespace --namespace scheduler-plugins
Verifying the Installation
Run the following command to check the status of the scheduler-plugins Deployments.
$ kubectl get deploy -n scheduler-plugins
Expected output:
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
scheduler-plugins-controller   1/1     1            1           7s
scheduler-plugins-scheduler    1/1     1            1           7s
How to Use?
PodGroup
PodGroup is a custom resource provided by the CoScheduling component. It defines the minimum number of Pods that must be scheduled together. You indicate which PodGroup a Pod belongs to by setting a label on the Pod. Below is an example of a PodGroup resource, followed by the label to add to each member Pod:
# PodGroup CRD spec
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: nginx
spec:
  scheduleTimeoutSeconds: 10
  minMember: 3
---
# Add the label `scheduling.x-k8s.io/pod-group` to a pod to mark that it belongs to the group
labels:
  scheduling.x-k8s.io/pod-group: nginx
The scheduler calculates the sum of the running Pods and the waiting Pods (assumed but not yet bound) in the PodGroup. If the sum is greater than or equal to minMember, the waiting Pods are bound and created. Pods with different priorities within the same PodGroup may cause unexpected behavior, so make sure that all Pods in a PodGroup have the same priority.
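The counting rule described above can be sketched in Python. This is an illustrative model only, not the plugin's actual code: the `Pod` class, the phase names, and the `quorum_met` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    group: str   # value of the scheduling.x-k8s.io/pod-group label
    phase: str   # "Running", "Waiting" (assumed but not bound), or "Pending"


def quorum_met(pods, group, min_member):
    """Return True when running + waiting pods of `group` reach min_member."""
    counted = sum(
        1 for p in pods
        if p.group == group and p.phase in ("Running", "Waiting")
    )
    return counted >= min_member


pods = [
    Pod("nginx-a", "nginx", "Running"),
    Pod("nginx-b", "nginx", "Waiting"),
    Pod("nginx-c", "nginx", "Waiting"),
    Pod("nginx-d", "nginx", "Pending"),
]

print(quorum_met(pods, "nginx", 3))  # True: 1 running + 2 waiting
print(quorum_met(pods, "nginx", 4))  # False: only 3 pods count toward the quorum
```

In this model, a Pod that is still Pending (no node reserved for it) does not count toward minMember.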
Sample
Assume we have a cluster that can accommodate only 3 nginx pods. We create a ReplicaSet with replicas=6 and set the value of minMember to 3.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: nginx
spec:
  scheduleTimeoutSeconds: 10
  minMember: 3
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
        scheduling.x-k8s.io/pod-group: nginx
    spec:
      schedulerName: scheduler-plugins-scheduler  # required when installed as a second scheduler
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            cpu: 3000m
            memory: 500Mi
          requests:
            cpu: 3000m
            memory: 500Mi
Three Pods are scheduled together, and the other three remain in the Pending state:
$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-4jw2m   0/1     Pending   0          55s
nginx-4mn52   1/1     Running   0          55s
nginx-c9gv8   1/1     Running   0          55s
nginx-frm24   0/1     Pending   0          55s
nginx-hsflk   0/1     Pending   0          55s
nginx-qtj5f   1/1     Running   0          55s
If you now change the value of minMember to 4, all nginx Pods stay in the Pending state, because the cluster can accommodate only 3 Pods and the minMember requirement of 4 defined by the PodGroup cannot be met:
$ kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
nginx-4vqrk   0/1     Pending   0          3s
nginx-bw9nn   0/1     Pending   0          3s
nginx-gnjsv   0/1     Pending   0          3s
nginx-hqhhz   0/1     Pending   0          3s
nginx-n47r7   0/1     Pending   0          3s
nginx-n7vtq   0/1     Pending   0          3s
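The two outcomes in this sample follow from simple arithmetic, sketched below in Python. The function `simulate_gang` and its parameters are hypothetical, and the sketch assumes the cluster capacity (how many Pods fit) is known in advance, which a real scheduler works out node by node.

```python
def simulate_gang(capacity, replicas, min_member):
    """Return (running, pending) Pod counts for a single PodGroup.

    capacity:   number of Pods the cluster can accommodate (assumed known)
    replicas:   number of Pods created by the workload
    min_member: minMember of the PodGroup
    """
    placeable = min(capacity, replicas)
    # All-or-nothing: Pods run only if at least min_member of them fit.
    running = placeable if placeable >= min_member else 0
    return running, replicas - running


print(simulate_gang(capacity=3, replicas=6, min_member=3))  # (3, 3)
print(simulate_gang(capacity=3, replicas=6, min_member=4))  # (0, 6)
```

With minMember set to 3, three Pods run and three stay Pending. With minMember set to 4, the group can never reach its minimum on this cluster, so all six Pods stay Pending.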