
Production Environment Configuration Practice in Big Data Scenarios

Last updated: 2025-07-17 17:42:55

Overview

Data Accelerator Goose FileSystem (GooseFS) provides multiple deployment methods, supporting control plane deployment, TKE cluster deployment, and EMR cluster deployment. In big data scenarios, the EMR cluster mode is usually used for deployment, and a high availability architecture is adopted to meet business continuity requirements. This document focuses on high availability deployment configurations based on Zookeeper and Raft.

The high-availability architecture is an active-standby design with multiple Master nodes. Among them, only one node serves as the primary (Leader) and provides external services, while the remaining Standby nodes keep the same file system state as the primary by replaying its shared journal. If the primary node fails, one of the Standby nodes is automatically elected to take over and continue providing services. This eliminates the system's single point of failure and makes the architecture highly available as a whole. Currently, GooseFS keeps primary and standby state strongly consistent through two mechanisms: Raft embedded journal and Zookeeper.
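The failover behavior described above can be sketched in a few lines of Python. This is a conceptual illustration only, not GooseFS code; the class and method names are invented for the example:

```python
# Conceptual sketch of primary/standby failover; not actual GooseFS code.
class MasterQuorum:
    def __init__(self, masters):
        self.masters = list(masters)   # all Master nodes
        self.leader = self.masters[0]  # only one node serves as Leader

    def fail(self, node):
        """Simulate a Master node going down."""
        self.masters.remove(node)
        if node == self.leader and self.masters:
            # A Standby that has replayed the shared journal takes over.
            self.leader = self.masters[0]

quorum = MasterQuorum(["master1", "master2", "master3"])
quorum.fail("master1")
print(quorum.leader)  # master2 takes over as the new primary
```

In the real system, the new Leader is chosen by Zookeeper election or a Raft vote rather than by list order, but the observable effect is the same: clients are transparently redirected to a Standby that already holds the current file system state.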

High-Availability Architecture Deployment Configuration Based on Zookeeper

Configuring the Zookeeper service to build a high-availability architecture for GooseFS requires that the following conditions be met:
A Zookeeper cluster is available. The GooseFS Master nodes use Zookeeper for Leader election, while GooseFS clients and Worker nodes query Zookeeper to locate the primary Master node.
A highly available, strongly consistent shared storage system is ready and accessible to all GooseFS Master nodes. The primary Master node writes journals to this storage system, while Standby nodes continuously read and replay them to stay consistent with the primary. HDFS or COS is generally recommended for this shared storage, for example hdfs://10.0.0.1:9000/GooseFS/journal or cosn://bucket-1250000000/journal.
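A quick way to catch a mistyped journal location before deployment is to check its URI scheme. The helper below is purely illustrative (not a GooseFS tool), and the accepted schemes follow the recommendation above:

```python
from urllib.parse import urlparse

# Illustrative check, not a GooseFS utility: the shared journal should be
# an HDFS or COS path that every Master node can reach.
def is_supported_journal(uri: str) -> bool:
    scheme = urlparse(uri).scheme.lower()
    return scheme in ("hdfs", "cosn")

print(is_supported_journal("hdfs://10.0.0.1:9000/GooseFS/journal"))  # True
print(is_supported_journal("file:///data/journal"))                  # False
```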

After completing the prerequisites, see the following recommended configuration and copy it into the goosefs-site.properties file to complete your high-availability configuration:
# GooseFS Master HA deployment configuration
goosefs.zookeeper.enabled=true
goosefs.zookeeper.address=<zk_quorum_1>:<zk_client_port>,<zk_quorum_2>:<zk_client_port>,<zk_quorum_3>:<zk_client_port>
goosefs.underfs.hdfs.configuration=${HADOOP_HOME}/etc/hadoop/core-site.xml:${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
goosefs.master.journal.type=UFS
goosefs.master.journal.folder=hdfs://HDFSXXXX/goosefs

# Master metadata storage method. Heap + RocksDB is recommended and supports metadata at the scale of hundreds of millions of entries
goosefs.master.metastore=ROCKS
goosefs.master.metastore.block=ROCKS
goosefs.master.metastore.block.locations=ROCKS
# For the GooseFS metadata storage directory, choose a directory on high-IOPS storage media
goosefs.master.metastore.dir=/data/goosefs/metastore
# Metadata cache eviction policy. RANDOM is used by default; if there is clearly hot recently accessed data, consider setting it to LRU
# goosefs.master.metastore.cache.type=LRU
# Disable orphan block verification at startup to lower leader election time
goosefs.master.startup.block.integrity.check.enabled=false
# You can also disable periodically validating orphan blocks logic depending on the actual situation
# goosefs.master.periodic.block.integrity.check.interval=-1
# If the TTL feature is not used, you can also consider disabling the periodic file expiration check
goosefs.master.ttl.checker.interval.ms=-1
# Consider disabling the replica check to reduce Master overhead
goosefs.master.replication.check.interval=-1

# Worker configuration
goosefs.worker.tieredstore.levels=1
goosefs.worker.tieredstore.level0.alias=SSD
goosefs.worker.tieredstore.level0.dirs.path=/data1/goosefsWorker,/data2/goosefsWorker
# Set the following Quota value according to actual conditions
# goosefs.worker.tieredstore.level0.dirs.quota=2000G,2000G
goosefs.worker.block.heartbeat.interval.ms=10sec
goosefs.worker.tieredstore.free.ahead.bytes=134217728
goosefs.user.block.worker.client.pool.max=512

# Security authentication and user impersonation related
goosefs.security.authorization.permission.enabled=true
goosefs.security.authentication.type=SIMPLE
# goosefs.security.login.username=hadoop
# goosefs.master.security.impersonation.hadoop.users=*
# goosefs.security.login.impersonation.username=_HDFS_USER_

# Client configuration
goosefs.user.client.transparent_acceleration.scope=GFS_UFS
goosefs.user.client.transparent_acceleration.enabled=true
goosefs.user.file.readtype.default=CACHE
goosefs.user.file.writetype.default=CACHE_THROUGH

goosefs.user.metrics.collection.enabled=true
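As a quick sanity check before restarting the cluster, the key Zookeeper HA settings above can be parsed and verified with a short script. This is illustrative only; the required-key list follows this document, and the parser handles the simple `key=value` subset of the properties format, not an official GooseFS validator:

```python
# Illustrative sanity check for goosefs-site.properties; not an official tool.
REQUIRED_HA_KEYS = {
    "goosefs.zookeeper.enabled",
    "goosefs.zookeeper.address",
    "goosefs.master.journal.type",
    "goosefs.master.journal.folder",
}

def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

conf = parse_properties("""
# GooseFS Master HA deployment configuration
goosefs.zookeeper.enabled=true
goosefs.zookeeper.address=zk1:2181,zk2:2181,zk3:2181
goosefs.master.journal.type=UFS
goosefs.master.journal.folder=hdfs://namenode:9000/goosefs/journal
""")

missing = REQUIRED_HA_KEYS - conf.keys()
print(sorted(missing))  # [] when all HA keys are present
```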

High-Availability Architecture Deployment Configuration Based on Raft

The Raft embedded journal deployment relies on the Copycat Leader election mechanism, so the Raft high-availability architecture cannot be combined with Zookeeper. If you plan to build a high-availability architecture based on the Raft embedded journal, see the following recommended configuration and copy it into the goosefs-site.properties file to complete your high-availability configuration:
# GooseFS Master Raft deployment configuration
goosefs.master.rpc.addresses=<master1>:9200,<master2>:9200,<master3>:9200
goosefs.master.embedded.journal.addresses=<master1>:9202,<master2>:9202,<master3>:9202
# Metadata checkpoint interval. Defaults to 2000000; set it according to the metadata production rate in your production environment
goosefs.master.journal.checkpoint.period.entries=xxxx
# GooseFS Journal data storage location
goosefs.master.journal.folder=/data/goosefs/journal

# Master metadata storage method. Heap + RocksDB is recommended and supports metadata at the scale of hundreds of millions of entries
goosefs.master.metastore=ROCKS
goosefs.master.metastore.block=ROCKS
goosefs.master.metastore.block.locations=ROCKS
# For the GooseFS metadata storage directory, choose a directory on high-IOPS disks
goosefs.master.metastore.dir=/data/goosefs/metastore
# Metadata cache eviction policy. RANDOM is used by default; if there is clearly hot recently accessed data, consider setting it to LRU
# goosefs.master.metastore.cache.type=LRU
# Disable orphan block verification at startup to lower leader election time
goosefs.master.startup.block.integrity.check.enabled=false
# You can also disable periodically validating orphan blocks logic depending on the actual situation
# goosefs.master.periodic.block.integrity.check.interval=-1
# If the TTL feature is not used, consider disabling the periodic file expiration check
goosefs.master.ttl.checker.interval.ms=-1
# Can consider disabling replica check to reduce Master overhead
goosefs.master.replication.check.interval=-1

# Worker configuration
goosefs.worker.tieredstore.levels=1
goosefs.worker.tieredstore.level0.alias=SSD
goosefs.worker.tieredstore.level0.dirs.path=/data1/goosefsWorker,/data2/goosefsWorker
# Set the following Quota value according to actual conditions
# goosefs.worker.tieredstore.level0.dirs.quota=2000G,2000G
goosefs.worker.block.heartbeat.interval.ms=10sec
goosefs.worker.tieredstore.free.ahead.bytes=134217728
goosefs.user.block.worker.client.pool.max=512

# Security authentication and user impersonation related
goosefs.security.authorization.permission.enabled=true
goosefs.security.authentication.type=SIMPLE
# goosefs.security.login.username=hadoop
# goosefs.master.security.impersonation.hadoop.users=*
# goosefs.security.login.impersonation.username=_HDFS_USER_

# Client configuration
goosefs.user.client.transparent_acceleration.scope=GFS_UFS
goosefs.user.client.transparent_acceleration.enabled=true
goosefs.user.file.readtype.default=CACHE
goosefs.user.file.writetype.default=CACHE_THROUGH
goosefs.user.metrics.collection.enabled=true
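Since Raft elects a Leader by majority vote, the Master count should be odd, and Zookeeper-based HA settings must not be mixed into the same configuration. The helper below is a hypothetical sketch (not part of GooseFS) that checks both points:

```python
def check_raft_config(props: dict) -> list:
    """Return warnings for a Raft-based HA configuration (illustrative only)."""
    warnings = []
    masters = props.get("goosefs.master.embedded.journal.addresses", "").split(",")
    masters = [m for m in masters if m.strip()]
    # Raft tolerates floor((n - 1) / 2) failures, so an even node count
    # gains no extra fault tolerance over n - 1 nodes.
    if len(masters) % 2 == 0:
        warnings.append("use an odd number of Masters for Raft quorum")
    # The Raft embedded journal cannot be combined with Zookeeper-based HA.
    if props.get("goosefs.zookeeper.enabled") == "true":
        warnings.append("disable goosefs.zookeeper.enabled when using Raft")
    return warnings

print(check_raft_config({
    "goosefs.master.embedded.journal.addresses": "m1:9202,m2:9202,m3:9202",
}))  # [] -> three Masters, no Zookeeper conflict
```

With three Masters the cluster keeps serving through one Master failure; five Masters tolerate two, at the cost of more journal replication traffic.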


