
Data Lake Accelerator Goose FileSystem

Data Lake Accelerator Goose FileSystem (GooseFS) is Tencent Cloud's high-performance data acceleration service, featuring multi-protocol support and high throughput. It provides a unified namespace and access protocols for computing applications, enabling seamless data management and migration across heterogeneous storage systems.

Features
Diverse Data Source Support

Seamlessly integrate with diverse data sources to store structured, semi-structured, and unstructured data at any scale, while preserving raw data formats.

Elastic Compute Scaling

Decouple compute and storage to enable dynamic scaling of compute resources, empowering flexible resource orchestration tailored to workload demands.

Cost-Optimized Storage

Leverage a centralized storage pool with rapid scaling capabilities, automated cold/hot data tiering, and unified lifecycle management to minimize storage costs for big data analytics and machine learning.

Native Service Integration

Fully compatible with Tencent Cloud’s ecosystem, including Elastic MapReduce (EMR) and real-time stream computing service Oceanus, for end-to-end analytics and AI workflows.

On-Demand Data Orchestration

Automate, manually trigger, or schedule data movement between storage tiers. Access and process COS data in real time through GooseFSx for high-performance computing, then persist results back to COS as needed.
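
A rough sketch of this workflow in Python is shown below: it reads input data through a GooseFSx POSIX mount, writes results back to the mount, and optionally persists them to COS with the COS Python SDK (cos-python-sdk-v5). The mount point /goosefsx, the bucket name, object keys, and credentials are hypothetical placeholders, not values prescribed by GooseFS.

```python
from pathlib import Path

# Hypothetical GooseFSx POSIX mount point; substitute your actual mount path.
MOUNT = Path("/goosefsx")

def process(text: str) -> str:
    # Placeholder transformation standing in for a real compute step.
    return text.upper()

# Read source data that GooseFSx exposes from COS, process it, and write the
# result back under the mount so downstream jobs can pick it up immediately.
source = MOUNT / "input" / "records.txt"
result = MOUNT / "output" / "records.processed.txt"
result.parent.mkdir(parents=True, exist_ok=True)
result.write_text(process(source.read_text()))

# Optionally persist the result to COS directly with the COS Python SDK
# (cos-python-sdk-v5); region, credentials, and bucket below are placeholders.
from qcloud_cos import CosConfig, CosS3Client

config = CosConfig(Region="ap-guangzhou", SecretId="SECRET_ID", SecretKey="SECRET_KEY")
client = CosS3Client(config)
client.upload_file(
    Bucket="example-bucket-1250000000",   # hypothetical bucket name
    LocalFilePath=str(result),
    Key="output/records.processed.txt",
)
```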

High Throughput & Ultra-Low Latency

Powered by a parallel architecture optimized for high-performance workloads, GooseFS scales performance linearly with capacity, delivering hundreds of GB/s of throughput, millions of IOPS, and sub-millisecond latency.

Scenarios
Machine Learning

In classical machine learning workflows, training datasets are typically massive and demand ultra-high intranet bandwidth.

Core Capabilities

  • Ultra-High Bandwidth: Delivers exceptional intranet bandwidth to meet the intensive data throughput requirements of ML training (see the sketch after this list).


  • Diverse Data Source Support: Seamlessly integrates with diverse data repositories, supporting storage of structured, semi-structured, and unstructured data at any scale.


  • Performance Acceleration: Leverages a multi-tier acceleration architecture comprising the GooseFS Data Accelerator, Metadata Accelerator, and Availability Zone Accelerator to deliver superior performance compared with on-premises HDFS.
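
To make the bandwidth point concrete, here is a minimal Python sketch that reads training shards in parallel from a GooseFS POSIX mount. The mount point /goosefs and the shard layout are hypothetical assumptions; a real pipeline would feed the bytes into its framework's data loader.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical GooseFS POSIX mount point containing the training shards.
SHARD_DIR = Path("/goosefs/datasets/train")

def read_shard(path: Path) -> int:
    # Stream one shard; returning its size keeps the example self-contained.
    return len(path.read_bytes())

shards = sorted(SHARD_DIR.glob("*.bin"))

# Issue many reads concurrently so aggregate throughput approaches the
# intranet bandwidth GooseFS exposes, rather than a single stream's limit.
with ThreadPoolExecutor(max_workers=16) as pool:
    total_bytes = sum(pool.map(read_shard, shards))

print(f"Loaded {len(shards)} shards, {total_bytes / 1e9:.1f} GB")
```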

Big Data Analytics

Organizations leveraging open-source Hadoop ecosystems often encounter challenges such as imbalanced scaling between compute and storage resources, and fragmented data silos across heterogeneous storage systems.

Core Capabilities

  • Elastic Compute-Storage Scaling: Decouple compute and storage to enable independent, on-demand scaling of resources, eliminating infrastructure bottlenecks.
  • Multi-Source Data Integration: Unified access to data across diverse storage systems (e.g., on-premises HDFS, cloud object storage, databases) without data migration (see the sketch after this list).
  • High-Performance Architecture: Leverage a multi-tier acceleration architecture comprising the GooseFS Data Accelerator, Metadata Accelerator, and Availability Zone Accelerator to achieve higher throughput and lower latency for compute-intensive workloads.
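
As one hedged illustration of multi-source access, the PySpark sketch below joins a table served through GooseFS with one read directly from COS, without copying either dataset first. The gfs:// and cosn:// URI schemes, the bucket name, and the paths are assumptions that depend on how the GooseFS and COS Hadoop-compatible clients are deployed and configured on the cluster.

```python
from pyspark.sql import SparkSession

# Placeholder URIs; the schemes, authority, and paths depend on your deployment.
GOOSEFS_PATH = "gfs://<master-host>:<port>/warehouse/orders"
COS_PATH = "cosn://example-bucket-1250000000/dim/customers"

spark = SparkSession.builder.appName("multi-source-join").getOrCreate()

orders = spark.read.parquet(GOOSEFS_PATH)      # accelerated via GooseFS
customers = spark.read.parquet(COS_PATH)       # read directly from COS

# Join the two sources in place; no data is migrated between storage systems.
report = (
    orders.join(customers, "customer_id")
          .groupBy("region")
          .count()
)
report.show()
```
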
Interactive Query

Build cloud-native ETL and analytics clusters via container services with open-source frameworks like Flink and TensorFlow, enabling elastic compute scaling. Enhance performance through a multi-tier acceleration architecture while leveraging object storage as a cost-effective data lake for heterogeneous datasets.

Core Capabilities

  • Elastic Scaling: Decouple compute and storage for on-demand resource allocation.


  • High-Performance Architecture: Leverage a multi-tier acceleration architecture comprising the GooseFS Data Accelerator, Metadata Accelerator, and Availability Zone Accelerator to achieve higher throughput and lower latency for compute-intensive workloads.


  • Ecosystem Compatibility: Support Parquet/ORC formats and integrate with Spark, Presto, and Flink (see the sketch after this list).
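
A minimal sketch of the kind of interactive query this enables, assuming Parquet tables are reachable through GooseFS at a placeholder gfs:// path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("interactive-query").getOrCreate()

# Placeholder GooseFS URI; the scheme and authority depend on your deployment.
events = spark.read.parquet("gfs://<master-host>:<port>/lake/events")
events.createOrReplaceTempView("events")

# Ad-hoc SQL over the Parquet data; repeated queries benefit from the
# acceleration tiers keeping hot data close to compute.
spark.sql("""
    SELECT event_type, COUNT(*) AS cnt
    FROM events
    WHERE event_date = '2024-01-01'
    GROUP BY event_type
    ORDER BY cnt DESC
""").show()
```
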
AI Training & Simulation

High-performance storage for AI workloads requiring extreme throughput (hundreds of GB/s), millions of IOPS, and sub-millisecond latency.

Core Capabilities

  • Extreme Performance: Deliver 400+ GB/s throughput, 2M+ IOPS, and sub-ms latency.


  • Seamless Ecosystem Integration: POSIX-compliant with auto-mount to local directories, natively compatible with Kubernetes and containerized environments (see the sketch after this list).


  • Unified AI Pipeline: A single storage solution for multi-platform (Windows/Linux) AI workflows, covering training, simulation, and inference, and supporting mixed I/O patterns (random and sequential, low-latency).
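
Because the mount is POSIX-compliant, standard framework code can read from it as if it were a local directory. The sketch below is a hedged example using PyTorch; the mount point /goosefs and the sample layout are hypothetical.

```python
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset

# Hypothetical POSIX mount point exposed by GooseFS on the training node.
SAMPLE_DIR = Path("/goosefs/train-samples")

class MountedSamples(Dataset):
    """Reads raw sample files straight from the mounted directory."""

    def __init__(self, root: Path):
        self.files = sorted(root.glob("*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # torch.load works on any local path, so the mount needs no special client.
        return torch.load(self.files[idx])

loader = DataLoader(MountedSamples(SAMPLE_DIR), batch_size=32, num_workers=8)
for batch in loader:
    pass  # feed `batch` into the training step
```
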
High-Performance Computing (HPC)

Deliver extreme storage performance for GPU-intensive workloads with linear scalability across cluster sizes.


Core Capabilities

  • High Throughput & Low Latency: Achieve hundreds of GB/s throughput and sub-millisecond latency via on-demand data preheating from COS to the Data Accelerator. Performance scales linearly with capacity (see the preheating sketch at the end of this section).


  • Seamless Compute Integration: POSIX-compliant semantics with batch auto-mounting to local directories, enabling rapid on-demand storage provisioning.


  • Elastic Tiered Storage: Independent scaling of storage layers with automated data orchestration between tiers.
    Hot data: cached in the Data Accelerator for high-speed access.
    Cold data: persisted to COS.
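
As a rough sketch of the preheating idea, the Python below touches a list of input files on the mount before the main job starts, so that the hot working set is already in the Data Accelerator when compute ramps up. The mount point and file list are hypothetical, the read-through caching behavior is an assumption, and GooseFS's own preheat tooling would normally drive this step.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical mount point and working set; in practice GooseFS's own preheat
# tooling (or your job scheduler) would decide what to warm and when.
MOUNT = Path("/goosefsx")
working_set = [MOUNT / "scratch" / f"chunk_{i:04d}.dat" for i in range(64)]

def warm(path: Path) -> None:
    # Assuming read-through caching, a sequential pass pulls the object from
    # COS into the Data Accelerator so the HPC job later hits the hot tier.
    with path.open("rb") as f:
        while f.read(8 << 20):  # 8 MiB reads
            pass

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(warm, working_set))
```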