Installing nvidia-fabricmanager Service on GPU Instance

Last updated: 2024-08-20 17:04:57

Overview

The Hyper Computing Cluster PNV4h instance is equipped with A100 GPUs and supports NVLink and NVSwitch. To enable interconnection between GPUs, it requires the nvidia-fabricmanager service in the version that matches the installed driver. If you are using this instance type, follow this document to install the nvidia-fabricmanager service. Otherwise, you may not be able to use the GPU instance properly.

Directions

This document uses driver version 470.103.01 as an example. Follow the steps below, replacing the value assigned to the version variable with the driver version installed on your instance.
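If you are unsure which driver version is installed, the value can be read from the driver itself. The following is a minimal sketch, assuming the NVIDIA driver and its nvidia-smi tool are already present on the instance. The helper names get_driver_version and major_version are hypothetical and are not part of any NVIDIA tool.

```shell
# Print the installed driver version, for example 470.103.01.
# Assumes nvidia-smi is available; on multi-GPU instances it prints
# one line per GPU, so only the first line is kept.
get_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Print the major branch of a version string: 470.103.01 -> 470.
major_version() {
    echo "${1%%.*}"
}
```

For example, version=$(get_driver_version) sets the variable to the installed driver version instead of a hard-coded value.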

Installing nvidia-fabricmanager Service

1. Log in to the instance. For details, see Logging in to Linux Instance (Standard Method).
2. The installation commands vary by operating system. Run the commands that correspond to your image.

CentOS 7.x Image
version=470.103.01
yum -y install yum-utils
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1

Ubuntu 18.04 Image
version=470.103.01
main_version=$(echo $version | awk -F '.' '{print $1}')
apt-get update
apt-get -y install nvidia-fabricmanager-${main_version}=${version}-*

TencentOS 2.4 Image
version=470.103.01
yum -y install yum-utils
yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
yum install -y nvidia-fabric-manager-${version}-1
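The package name follows a different pattern on yum-based and apt-based images. As an illustration only, the sketch below builds the package specification from the driver version and detects which package manager is present. The function names detect_manager and fabricmanager_package are hypothetical, and the naming patterns are taken from the commands in this document rather than verified against every repository.

```shell
# Print "yum" or "apt" depending on which package manager is installed.
detect_manager() {
    if command -v yum >/dev/null 2>&1; then
        echo yum
    elif command -v apt-get >/dev/null 2>&1; then
        echo apt
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Print the package specification for a manager ($1) and driver version ($2).
# yum: nvidia-fabric-manager-<version>-1
# apt: nvidia-fabricmanager-<major>=<version>-*
fabricmanager_package() {
    case "$1" in
        yum) echo "nvidia-fabric-manager-${2}-1" ;;
        apt) echo "nvidia-fabricmanager-${2%%.*}=${2}-*" ;;
        *)   echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}
```

Calling fabricmanager_package "$(detect_manager)" 470.103.01 prints the package specification to pass to yum install or apt-get install. The repository setup steps shown for each image are still required.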

Starting nvidia-fabricmanager Service

Run the following commands in sequence to start the service.
systemctl enable nvidia-fabricmanager
systemctl start nvidia-fabricmanager

Viewing nvidia-fabricmanager Service Status

Run the following command to view the service status.
systemctl status nvidia-fabricmanager
If the output shows the service as active (running), the service is installed and started successfully.
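For scripted checks, such as in a provisioning script, the status can also be tested without reading the output by eye. The following is a minimal sketch, assuming a systemd-based image. The function name service_state is hypothetical.

```shell
# Print "running" and return 0 if the given systemd unit is active.
# Otherwise print "not running" and return 1.
service_state() {
    if systemctl is-active --quiet "$1"; then
        echo "running"
    else
        echo "not running"
        return 1
    fi
}
```

For example, service_state nvidia-fabricmanager returns a non-zero exit code when the service is not active, so a script can stop before launching multi-GPU jobs. Running nvidia-smi topo -m afterwards prints the GPU interconnect matrix, which can serve as an additional manual check.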
