Custom Training Image Specification
Last updated: 2025-05-09 15:58:08
If the platform's built-in images do not meet your requirements, you can use a custom image to create training tasks and Notebook instances. The sections below describe the requirements a custom image must meet, each with a Dockerfile example.

Basic Image Specification

To enable a custom image to start a training task in task-based modeling, the image must have the openssh-server component installed. An example is provided below:
# Change the base image as needed
FROM ubuntu:20.04

# Install openssh-server
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd
Note:
If the base image is CentOS-based, use yum or dnf for package management and adjust the installation command accordingly.
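
For a CentOS-family base image, the same requirement might be sketched as follows. The `rockylinux:9` tag is only an illustrative assumption, so substitute your own base image. The `ssh-keygen -A` step is also an assumption: RHEL-family packages typically do not generate SSH host keys at install time, and this page does not say whether the platform needs them pre-generated.

```dockerfile
# Hypothetical base image; replace with your own CentOS-family image
FROM rockylinux:9

# Install openssh-server with dnf (use yum on older releases)
RUN dnf install -y openssh-server && dnf clean all && mkdir -p /var/run/sshd

# Assumption: pre-generate SSH host keys, since the package may not create them at install time
RUN ssh-keygen -A
```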

Notebook Image Specification

To launch a Notebook instance from a custom image, the image must meet the basic image specification above, have JupyterLab installed, and provide a suitable /opt/dl/run startup script. An example is as follows:
# Change the base image as needed
FROM ubuntu:20.04

# Install openssh-server
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd

# Install python3, pip3
RUN apt-get update && apt-get install -y python3.8 python3.8-distutils curl && \
    curl https://bootstrap.pypa.io/pip/3.8/get-pip.py -o get-pip.py && \
    python3.8 get-pip.py && rm -f get-pip.py

# Install jupyterlab
RUN pip3 install jupyterlab

# Configure the /opt/dl/run startup script
RUN mkdir -p /opt/dl && echo "cd /home/tione/notebook && jupyter lab --allow-root --no-browser --ip=0.0.0.0 --port=8888 --notebook-dir=/home/tione/notebook --NotebookApp.allow_origin='*' --NotebookApp.token=''" > /opt/dl/run && chmod a+x /opt/dl/run
Note:
If the base image already includes a package, skip the installation command for it.
Here, /home/tione/notebook is the default disk mount path for the Notebook. Whether this path exists in the image does not affect its use on the platform.
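
As a more readable alternative to the single-line `echo`, the startup script could be written with a heredoc. This is a sketch that assumes a BuildKit-enabled builder (Dockerfile syntax 1.4 or later). The `#!/bin/sh` line is an addition not present in the example above, and the JupyterLab installation steps are omitted for brevity.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:20.04

# Write /opt/dl/run from an inline heredoc and mark it executable.
# The quoted delimiter prevents variable expansion at build time.
COPY --chmod=755 <<"EOF" /opt/dl/run
#!/bin/sh
cd /home/tione/notebook && jupyter lab --allow-root --no-browser \
    --ip=0.0.0.0 --port=8888 --notebook-dir=/home/tione/notebook \
    --NotebookApp.allow_origin='*' --NotebookApp.token=''
EOF
```

The JupyterLab flags are the same as in the example above; only the way the file is written differs.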

Complete Custom Image Dockerfile Example

The following complete example is based on NVIDIA's PyTorch image and supports creating both training tasks and Notebook instances. We recommend building your custom image directly from this example:
# [Recommended] Use NVIDIA's PyTorch image as the base image for compatibility with newer open-source libraries and GPU card types.
FROM nvcr.io/nvidia/pytorch:23.07-py3

# [Recommended] Modify the software source (if using in Tencent Cloud, it is recommended to use the private network source).
# [Tencent Public Network Software Source] mirrors.tencent.com
# [Tencent Cloud Private Network Software Source] mirrors.tencentyun.com
ENV TENCENT_MIRRORS="mirrors.tencentyun.com"
RUN sed -i "s/archive.ubuntu.com/${TENCENT_MIRRORS}/g" /etc/apt/sources.list && \
    sed -i "s/security.ubuntu.com/${TENCENT_MIRRORS}/g" /etc/apt/sources.list && \
    pip config set global.index-url http://${TENCENT_MIRRORS}/pypi/simple && \
    pip config set global.no-cache-dir true && \
    pip config set global.trusted-host ${TENCENT_MIRRORS}

# [Recommended] If using NVIDIA's PyTorch image, delete the default NVIDIA source to speed up pip package queries and installation.
RUN rm -f /etc/xdg/pip/pip.conf /etc/pip.conf /root/.pip/pip.conf /root/.config/pip/pip.conf && pip config unset global.extra-index-url

# [Basic Image Specification] Install openssh-server. The SSH login functionality of notebook and task-based modeling both depend on the openssh-server component.
RUN apt-get update && apt-get install -y openssh-server && apt-get clean && mkdir -p /var/run/sshd

# [Notebook Image Specification] Configure the /opt/dl/run startup entry
RUN mkdir -p /opt/dl && echo "cd /home/tione/notebook && jupyter lab --allow-root --no-browser --ip=0.0.0.0 --port=8888 --notebook-dir=/home/tione/notebook --NotebookApp.allow_origin='*' --NotebookApp.token=''" > /opt/dl/run && chmod a+x /opt/dl/run

# [Recommended] Use tini as the entrypoint to facilitate reclaiming zombie processes
RUN apt-get update && apt-get install -y tini && apt-get clean
ENTRYPOINT ["/usr/bin/tini", "-g", "--"]

# [Optional - Recommended installation when using HCC-GPU instances] TCCL RDMA communication optimization
# (If using NVIDIA's PyTorch image, the pre-installed NCCL plugin in /opt/hpcx/nccl_rdma_sharp_plugin/lib must be deleted)
RUN wget https://taco-1251783334.cos.ap-shanghai.myqcloud.com/nccl/nccl-rdma-sharp-plugins_1.2_amd64.deb && \
    dpkg -i nccl-rdma-sharp-plugins_1.2_amd64.deb && rm -f nccl-rdma-sharp-plugins_1.2_amd64.deb && \
    rm -rf /opt/hpcx/nccl_rdma_sharp_plugin/lib/*

# [Optional] Install Tikit (excluding big data components)
RUN pip install tencentcloud-sdk-python==3.0.955 coscmd==1.8.6.31 && \
    pip install --no-dependencies -U tikit

# [Custom] Install required dependency libraries.
RUN pip3 install accelerate==0.21.0 bitsandbytes==0.40.2 datasets==2.14.1 deepspeed==0.10.0 evaluate==0.4.0 peft==0.4.0 protobuf==3.20.3 scipy==1.10.1 sentencepiece==0.1.99 transformers==4.31.0
Note:
We recommend using Tencent Cloud's software sources in custom images for faster installation. The example above already includes this configuration. To configure a different software source, see Tencent Cloud Software Source to Accelerate Software Package Download and Update.
If you use HCC-GPU instances for multi-machine training, we recommend installing the TCCL plug-in in the image to optimize RDMA communication over the Tencent Cloud StarPulse network. The example above already includes this installation. For other installation methods, see Installation Instructions of TCCL for GPU Instances.
We recommend installing Tikit in the Notebook image so that you can easily submit training tasks. The example above uses the simplest installation method. If you need big data components and therefore the full Tikit installation, see Tikit Installation and Initialization.
Training images do not currently support variables declared in bash configuration files such as .bashrc.
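
Because variables set in files such as .bashrc are not picked up, one possible workaround is to declare them with the Dockerfile `ENV` instruction, as sketched below. This assumes the platform preserves environment variables baked into the image, which is standard container behavior but is not confirmed on this page. `MY_DATA_DIR` is a hypothetical variable used only for illustration.

```dockerfile
FROM ubuntu:20.04

# Declared in the image configuration rather than in ~/.bashrc,
# so the values are present in the container environment at startup.
ENV NCCL_DEBUG=INFO \
    MY_DATA_DIR=/opt/data
```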
