
How does HDFS achieve distributed storage of data?

Hadoop Distributed File System (HDFS) achieves distributed storage of data through a master-slave architecture. It consists of a NameNode and multiple DataNodes. The NameNode manages the file system namespace and regulates access to files by clients. It records the metadata of files, such as file names, directory structures, and the locations of file blocks. The DataNodes store the actual data in the form of blocks.

Here's how it works:

  1. File Splitting: When a file is written to HDFS, it is split into fixed-size blocks; only the last block may be smaller. The default block size is 128 MB in Hadoop 2.x and later, and it can be changed with the dfs.blocksize property.

  2. Block Replication: Each block is replicated across multiple DataNodes for fault tolerance and availability. The default replication factor is three (the dfs.replication property), meaning each block is stored on three different DataNodes, so the data survives the loss of any single node.

  3. Metadata Management: The NameNode maintains the metadata about the file system, including the locations of the blocks. When a client requests a file, the NameNode provides the locations of the blocks, and the client can directly access these blocks from the DataNodes.

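  4. Data Storage: DataNodes store the actual data blocks and serve read and write requests directly from clients. On instruction from the NameNode, they also create, delete, and replicate blocks, and they report the blocks they hold back to the NameNode.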
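
The four steps above can be sketched as a small in-memory simulation. This is a toy model, not HDFS code: the names ToyNameNode, split_into_blocks, and place_replicas are hypothetical, and the round-robin placement is a stand-in for whatever placement policy a real cluster uses. It only illustrates that the NameNode side holds metadata (which blocks make up a file and where their replicas live) while the DataNode names stand for where the bytes would be stored.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default block size
REPLICATION = 3                  # the default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size of each block; only the last one may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy placement: rotate through the DataNodes (an assumption, not HDFS's policy)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


class ToyNameNode:
    """Holds metadata only: file name -> block sizes and replica locations."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.files = {}

    def add_file(self, name, file_size):
        sizes = split_into_blocks(file_size)
        self.files[name] = {
            "block_sizes": sizes,
            "locations": place_replicas(len(sizes), self.datanodes),
        }

    def get_block_locations(self, name):
        """What a client asks for before reading blocks from DataNodes directly."""
        return self.files[name]["locations"]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
namenode.add_file("/data/example.txt", 300 * 1024 * 1024)   # 300 MB -> 3 blocks
for block_id, nodes in namenode.get_block_locations("/data/example.txt").items():
    print(block_id, nodes)
```

In this sketch a 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and each block is assigned to three distinct DataNodes. A client would use the returned locations to fetch each block from one of its DataNodes without routing the data through the NameNode.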

Example: Suppose you have a file named "example.txt" that is 512 MB in size. When you upload this file to HDFS with the default settings, it is split into four blocks of 128 MB each. Each block is stored three times, so the cluster holds 12 block replicas in total, occupying 1,536 MB of raw disk space across different DataNodes. The NameNode keeps track of these blocks and the DataNodes that hold each replica.
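
The arithmetic in this example can be checked with a few lines of Python. The helper name storage_footprint is hypothetical; the defaults are the 128 MB block size and replication factor of three described above.

```python
MB = 1024 * 1024


def storage_footprint(file_size, block_size=128 * MB, replication=3):
    """Return (number of blocks, number of block replicas, raw bytes stored)."""
    blocks = (file_size + block_size - 1) // block_size   # integer ceiling division
    replicas = blocks * replication
    raw_bytes = file_size * replication   # a partial last block only uses its actual size
    return blocks, replicas, raw_bytes


blocks, replicas, raw_bytes = storage_footprint(512 * MB)
print(blocks, replicas, raw_bytes // MB)   # 4 12 1536
```

Changing the arguments shows how the footprint scales: a smaller block size increases the number of blocks the NameNode must track, while a higher replication factor multiplies the raw storage used.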

For a cloud-based solution built on similar distributed storage principles, you might consider Tencent Cloud's Cloud Object Storage (COS). COS is a scalable object storage service for large volumes of data that provides durability and availability by replicating data across multiple facilities and, in some cases, across regions.