Resource Guide for Large Model Training
Last updated: 2025-05-09 15:54:30
This document describes the resource configurations that ensure models run properly during large-scale model training on the TI-ONE platform. It is provided for reference only.
The following are the recommended resources for training the platform's built-in open-source large models.

Both columns assume BatchSize=1 and MaxSequenceLength=2048.

| Model | Recommended Resources (SFT-FULL) | Recommended Resources (SFT-LORA) |
|---|---|---|
| Models below 7B | HCCPNV6 model: 1 GPU for models below 3b; 2 GPUs for 7b/8b models | HCCPNV6 model: 1 GPU |
| 13b model | HCCPNV6 model: 4 GPUs | HCCPNV6 model: 1 GPU |
| 32b model | HCCPNV6 model: 8 GPUs | HCCPNV6 model: 2 GPUs |
| 70b model | HCCPNV6 model: 2 machines (16 GPUs) | HCCPNV6 model: 4 GPUs |
| DeepSeek-R1-671b / DeepSeek-V3-671b | HCCPNV6 model: 32 machines (256 GPUs) | Not supported |
| Hunyuan-large | HCCPNV6 model: 8 machines (64 GPUs) | HCCPNV6 model: 8 GPUs |
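For quick reference, the recommendations above can be captured in a plain lookup. The sketch below is illustrative only; the keys, value format, and helper function are hypothetical and are not part of any TI-ONE API.

```python
# Illustrative sketch only: a plain Python mapping of the recommendations above.
# The keys, value strings, and helper are hypothetical, not a TI-ONE API.
RECOMMENDED_RESOURCES = {
    # model size: (SFT-FULL, SFT-LORA); counts refer to HCCPNV6 GPUs
    "3b_and_below": ("1 GPU", "1 GPU"),
    "7b/8b": ("2 GPUs", "1 GPU"),
    "13b": ("4 GPUs", "1 GPU"),
    "32b": ("8 GPUs", "2 GPUs"),
    "70b": ("2 machines / 16 GPUs", "4 GPUs"),
    "DeepSeek-R1-671b / DeepSeek-V3-671b": ("32 machines / 256 GPUs", "not supported"),
    "Hunyuan-large": ("8 machines / 64 GPUs", "8 GPUs"),
}

def recommended_resources(model_size: str, finetuning_type: str = "LORA") -> str:
    """Return the recommended HCCPNV6 resources for a model size and fine-tuning type."""
    full, lora = RECOMMENDED_RESOURCES[model_size]
    return full if finetuning_type.upper() == "FULL" else lora
```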
The platform's built-in open-source large models use the LORA fine-tuning method by default; this can be changed through the FinetuningType parameter.
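As a minimal sketch of how the parameters mentioned in this guide fit together, assuming a simple key-value representation (the dictionary layout is hypothetical and does not reflect an actual TI-ONE request schema; only the parameter names come from this guide):

```python
# Hypothetical sketch: FinetuningType, BatchSize, and MaxSequenceLength are the
# parameter names referenced in this guide; the surrounding structure is
# illustrative only and is not an actual TI-ONE request schema.
training_params = {
    "FinetuningType": "FULL",   # overrides the default "LORA"
    "BatchSize": 1,             # assumption used by the recommended resources
    "MaxSequenceLength": 2048,  # assumption used by the recommended resources
}
```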
On a single node, the 7b model requires 100 CPU cores and 500 GB of memory; the 13b and 70b models require 150 CPU cores and 1 TB of memory. For larger models, it is recommended to use the full resources of each machine.
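These single-node requirements can also be expressed as a small check. Again, this is purely illustrative; the helper name and keys are hypothetical.

```python
# Illustrative sketch only: single-node CPU/memory requirements listed above.
# The mapping and helper are hypothetical, not part of any TI-ONE API.
NODE_REQUIREMENTS = {
    "7b":  {"cpu_cores": 100, "memory_gb": 500},
    "13b": {"cpu_cores": 150, "memory_gb": 1024},
    "70b": {"cpu_cores": 150, "memory_gb": 1024},
}

def node_meets_requirements(model_size: str, cpu_cores: int, memory_gb: int) -> bool:
    """Check whether a node satisfies the minimum CPU/memory for a model size."""
    req = NODE_REQUIREMENTS[model_size]
    return cpu_cores >= req["cpu_cores"] and memory_gb >= req["memory_gb"]
```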
Some models use tilearn acceleration technology, which can achieve roughly a 30% training speedup when running on the recommended resources.

