
Production Environment Configuration Practice in Big Data Scenarios

Last updated: 2025-07-17 17:42:55

Overview

Data Accelerator Goose FileSystem (GooseFS) provides multiple deployment methods, supporting control plane deployment, TKE cluster deployment, and EMR cluster deployment. In big data scenarios, the EMR cluster mode is usually used for deployment, and a high availability architecture is adopted to meet business continuity requirements. This document focuses on high availability deployment configurations based on Zookeeper and Raft.

The high-availability architecture is an active-standby design with multiple Master nodes. Among them, only one node serves as the primary (Leader) and provides external services, while the remaining Standby nodes keep the same file system state as the primary by replaying its shared journal. If the primary node fails, one of the Standby nodes is automatically elected to take over and continue providing services. This eliminates the system's single point of failure and makes the architecture highly available as a whole. Currently, GooseFS keeps primary and standby state strongly consistent through two mechanisms: Raft embedded journal and Zookeeper.
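The failover behavior described above can be sketched in a few lines of Python. This is a conceptual illustration only, not GooseFS code; the class and method names are invented for the example:

```python
# Conceptual sketch of primary/standby failover; not actual GooseFS code.
class MasterQuorum:
    def __init__(self, masters):
        self.masters = list(masters)   # all Master nodes
        self.leader = self.masters[0]  # only one node serves as Leader

    def fail(self, node):
        """Simulate a Master node going down."""
        self.masters.remove(node)
        if node == self.leader and self.masters:
            # A Standby that has replayed the shared journal takes over.
            self.leader = self.masters[0]

quorum = MasterQuorum(["master1", "master2", "master3"])
quorum.fail("master1")
print(quorum.leader)  # master2 takes over as the new primary
```

In the real system, the new Leader is chosen by Zookeeper election or a Raft vote rather than by list order, but the observable effect is the same: clients are transparently redirected to a Standby that already holds the current file system state.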

High-Availability Architecture Deployment Configuration Based on Zookeeper

Configuring the Zookeeper service to build a high-availability architecture for GooseFS requires that the following conditions be met:
A Zookeeper cluster is available. The GooseFS Master nodes use Zookeeper for Leader election, while GooseFS clients and Worker nodes query Zookeeper to locate the primary Master node.
A highly available, strongly consistent shared storage system is ready and accessible to all GooseFS Master nodes. The primary Master node writes journals to this storage system, while Standby nodes continuously read and replay them to stay consistent with the primary. HDFS or COS is generally recommended for this shared storage, for example hdfs://10.0.0.1:9000/GooseFS/journal or cosn://bucket-1250000000/journal.
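A quick way to catch a mistyped journal location before deployment is to check its URI scheme. The helper below is purely illustrative (not a GooseFS tool), and the accepted schemes follow the recommendation above:

```python
from urllib.parse import urlparse

# Illustrative check, not a GooseFS utility: the shared journal should be
# an HDFS or COS path that every Master node can reach.
def is_supported_journal(uri: str) -> bool:
    scheme = urlparse(uri).scheme.lower()
    return scheme in ("hdfs", "cosn")

print(is_supported_journal("hdfs://10.0.0.1:9000/GooseFS/journal"))  # True
print(is_supported_journal("file:///data/journal"))                  # False
```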

After completing the prerequisites, see the following recommended configuration and copy it into the goosefs-site.properties file to complete your high-availability configuration:
# GooseFS Master HA deployment configuration
goosefs.zookeeper.enabled=true
goosefs.zookeeper.address=<zk_quorum_1>:<zk_client_port>,<zk_quorum_2>:<zk_client_port>,<zk_quorum_3>:<zk_client_port>
goosefs.underfs.hdfs.configuration=${HADOOP_HOME}/etc/hadoop/core-site.xml:${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
goosefs.master.journal.type=UFS
goosefs.master.journal.folder=hdfs://HDFSXXXX/goosefs

# Master metadata storage method. Heap + RocksDB is recommended and supports metadata at the scale of hundreds of millions of entries
goosefs.master.metastore=ROCKS
goosefs.master.metastore.block=ROCKS
goosefs.master.metastore.block.locations=ROCKS
# For the GooseFS metadata storage directory, choose a directory on high-IOPS storage media
goosefs.master.metastore.dir=/data/goosefs/metastore
# Metadata cache eviction policy. RANDOM is used by default; if there is clearly hot recently accessed data, consider setting it to LRU
# goosefs.master.metastore.cache.type=LRU
# Disable orphan block verification at startup to lower leader election time
goosefs.master.startup.block.integrity.check.enabled=false
# You can also disable periodically validating orphan blocks logic depending on the actual situation
# goosefs.master.periodic.block.integrity.check.interval=-1
# If the TTL feature is not used, you can also consider disabling the periodic file expiration check
goosefs.master.ttl.checker.interval.ms=-1
# Consider disabling the replica check to reduce Master overhead
goosefs.master.replication.check.interval=-1

# Worker configuration
goosefs.worker.tieredstore.levels=1
goosefs.worker.tieredstore.level0.alias=SSD
goosefs.worker.tieredstore.level0.dirs.path=/data1/goosefsWorker,/data2/goosefsWorker
# Set the following Quota value according to actual conditions
# goosefs.worker.tieredstore.level0.dirs.quota=2000G,2000G
goosefs.worker.block.heartbeat.interval.ms=10sec
goosefs.worker.tieredstore.free.ahead.bytes=134217728
goosefs.user.block.worker.client.pool.max=512

# Security authentication and user impersonation related
goosefs.security.authorization.permission.enabled=true
goosefs.security.authentication.type=SIMPLE
# goosefs.security.login.username=hadoop
# goosefs.master.security.impersonation.hadoop.users=*
# goosefs.security.login.impersonation.username=_HDFS_USER_

# Client configuration
goosefs.user.client.transparent_acceleration.scope=GFS_UFS
goosefs.user.client.transparent_acceleration.enabled=true
goosefs.user.file.readtype.default=CACHE
goosefs.user.file.writetype.default=CACHE_THROUGH

goosefs.user.metrics.collection.enabled=true
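As a quick sanity check before restarting the cluster, the key Zookeeper HA settings above can be parsed and verified with a short script. This is illustrative only; the required-key list follows this document, and the parser handles the simple `key=value` subset of the properties format, not an official GooseFS validator:

```python
# Illustrative sanity check for goosefs-site.properties; not an official tool.
REQUIRED_HA_KEYS = {
    "goosefs.zookeeper.enabled",
    "goosefs.zookeeper.address",
    "goosefs.master.journal.type",
    "goosefs.master.journal.folder",
}

def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

conf = parse_properties("""
# GooseFS Master HA deployment configuration
goosefs.zookeeper.enabled=true
goosefs.zookeeper.address=zk1:2181,zk2:2181,zk3:2181
goosefs.master.journal.type=UFS
goosefs.master.journal.folder=hdfs://namenode:9000/goosefs/journal
""")

missing = REQUIRED_HA_KEYS - conf.keys()
print(sorted(missing))  # [] when all HA keys are present
```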

High-Availability Architecture Deployment Configuration Based on Raft

The Raft embedded journal deployment relies on the Copycat Leader election mechanism, so the Raft high-availability architecture cannot be combined with Zookeeper. If you plan to build a high-availability architecture based on the Raft embedded journal, see the following recommended configuration and copy it into the goosefs-site.properties file to complete your high-availability configuration:
# GooseFS Master Raft deployment configuration
goosefs.master.rpc.addresses=<master1>:9200,<master2>:9200,<master3>:9200
goosefs.master.embedded.journal.addresses=<master1>:9202,<master2>:9202,<master3>:9202
# Metadata checkpoint interval. Defaults to 2000000; set it according to the metadata production rate in your production environment
goosefs.master.journal.checkpoint.period.entries=xxxx
# GooseFS Journal data storage location
goosefs.master.journal.folder=/data/goosefs/journal

# Master metadata storage method. Heap + RocksDB is recommended and supports metadata at the scale of hundreds of millions of entries
goosefs.master.metastore=ROCKS
goosefs.master.metastore.block=ROCKS
goosefs.master.metastore.block.locations=ROCKS
# For the GooseFS metadata storage directory, choose a directory on high-IOPS disks
goosefs.master.metastore.dir=/data/goosefs/metastore
# Metadata cache eviction policy. RANDOM is used by default; if there is clearly hot recently accessed data, consider setting it to LRU
# goosefs.master.metastore.cache.type=LRU
# Disable orphan block verification at startup to lower leader election time
goosefs.master.startup.block.integrity.check.enabled=false
# You can also disable periodically validating orphan blocks logic depending on the actual situation
# goosefs.master.periodic.block.integrity.check.interval=-1
# If the TTL feature is not used, consider disabling the periodic file expiration check
goosefs.master.ttl.checker.interval.ms=-1
# Can consider disabling replica check to reduce Master overhead
goosefs.master.replication.check.interval=-1

# Worker configuration
goosefs.worker.tieredstore.levels=1
goosefs.worker.tieredstore.level0.alias=SSD
goosefs.worker.tieredstore.level0.dirs.path=/data1/goosefsWorker,/data2/goosefsWorker
# Set the following Quota value according to actual conditions
# goosefs.worker.tieredstore.level0.dirs.quota=2000G,2000G
goosefs.worker.block.heartbeat.interval.ms=10sec
goosefs.worker.tieredstore.free.ahead.bytes=134217728
goosefs.user.block.worker.client.pool.max=512

# Security authentication and user impersonation related
goosefs.security.authorization.permission.enabled=true
goosefs.security.authentication.type=SIMPLE
# goosefs.security.login.username=hadoop
# goosefs.master.security.impersonation.hadoop.users=*
# goosefs.security.login.impersonation.username=_HDFS_USER_

# Client configuration
goosefs.user.client.transparent_acceleration.scope=GFS_UFS
goosefs.user.client.transparent_acceleration.enabled=true
goosefs.user.file.readtype.default=CACHE
goosefs.user.file.writetype.default=CACHE_THROUGH
goosefs.user.metrics.collection.enabled=true
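Since Raft elects a Leader by majority vote, the Master count should be odd, and Zookeeper-based HA settings must not be mixed into the same configuration. The helper below is a hypothetical sketch (not part of GooseFS) that checks both points:

```python
def check_raft_config(props: dict) -> list:
    """Return warnings for a Raft-based HA configuration (illustrative only)."""
    warnings = []
    masters = props.get("goosefs.master.embedded.journal.addresses", "").split(",")
    masters = [m for m in masters if m.strip()]
    # Raft tolerates floor((n - 1) / 2) failures, so an even node count
    # gains no extra fault tolerance over n - 1 nodes.
    if len(masters) % 2 == 0:
        warnings.append("use an odd number of Masters for Raft quorum")
    # The Raft embedded journal cannot be combined with Zookeeper-based HA.
    if props.get("goosefs.zookeeper.enabled") == "true":
        warnings.append("disable goosefs.zookeeper.enabled when using Raft")
    return warnings

print(check_raft_config({
    "goosefs.master.embedded.journal.addresses": "m1:9202,m2:9202,m3:9202",
}))  # [] -> three Masters, no Zookeeper conflict
```

With three Masters the cluster keeps serving through one Master failure; five Masters tolerate two, at the cost of more journal replication traffic.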


