Using DLC (Hive) to Analyze CLS Log

Last updated: 2024-01-20 17:28:40

Overview

This document describes how to ship CLS logs to COS and analyze them with Hive for OLAP computing. You can use the data analysis and computing services provided by Tencent Cloud Data Lake Compute (DLC) to perform offline log computing and analysis.

Directions

Shipping CLS logs to COS

Creating a shipping task

1. Log in to the CLS console and select Shipping Task Management > Ship to COS on the left sidebar.
2. On the Ship to COS page, click Add Shipping Configuration. In the Ship to COS pop-up window, create a shipping task. Pay attention to the following configuration items:
Directory Prefix: Log files are shipped to the corresponding directory in the COS bucket, which is generally the address of the table location in a data warehouse model.
Partition Format: A shipping task can automatically partition data by creation time. We recommend specifying the partition format according to the Hive partitioned table format. For example, to partition by day, you can set /dt=%Y%m%d/test, where dt= is the partitioning field, %Y%m%d is the year, month, and day, and test is the log file prefix. Because the name of a shipped file starts with an underscore (_) by default, and big data computing engines ignore such files (causing the data not to be found), you need to add a prefix. The actual partition directory name will then be, for example, dt=20220424.
Shipping Interval: You can select an interval of 5 to 15 minutes. We recommend 15 minutes and 250 MB; this reduces the number of files and improves query performance.
Shipping Format: The JSON format is recommended.

Viewing the shipping task result

Generally, you can view the log data in the COS console about 15 minutes after the shipping task starts. If you set day-level partitioning for the log_data logset, the directory structure will look like the example below, where the log files are stored in the partition directories.
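With the partition format /dt=%Y%m%d/test from the configuration above, the bucket contents would look roughly as follows (the file names are illustrative; the actual names are generated by the shipping task):

log_data/dt=20220423/test_0001.json
log_data/dt=20220423/test_0002.json
log_data/dt=20220424/test_0001.json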

Data Lake Compute (Hive) analysis

Using Data Lake Compute to create a foreign table and map it to a COS log directory

After log data is shipped to COS, you can use the data exploration feature in the Data Lake Compute console to create a foreign table. For the table creation statement, refer to the following example. Note that the partitioning field and the location field must match the directory structure.
The Data Lake Compute foreign table creation wizard provides advanced options that infer the table structure from the data files and automatically generate the SQL statement. Because the inference is based on sampling, you should verify that the inferred field types are appropriate for your business. For example, in the sample code below, the __TIMESTAMP__ field may be inferred as int, whereas bigint may be required to meet business requirements.
CREATE EXTERNAL TABLE IF NOT EXISTS `DataLakeCatalog`.`test`.`log_data` (
`__FILENAME__` string,
`__SOURCE__` string,
`__TIMESTAMP__` bigint,
`appId` string,
`caller` string,
`consumeTime` string,
`data` string,
`datacontenttype` string,
`deliveryStatus` string,
`errorResponse` string,
`eventRuleId` string,
`eventbusId` string,
`eventbusType` string,
`id` string,
`logTime` string,
`region` string,
`requestId` string,
`retryNum` string,
`source` string,
`sourceType` string,
`specversion` string,
`status` string,
`subject` string,
`tags` string,
`targetId` string,
`targetSource` string,
`time` string,
`type` string,
`uin` string
) PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'cosn://coreywei-1253240642/log_data/';
For shipping by partition, location must point to the cosn://coreywei-1253240642/log_data/ directory rather than the cosn://coreywei-1253240642/log_data/20220423/ directory.
To use the inference feature, point the directory at the subdirectory that contains the data files, i.e., cosn://coreywei-1253240642/log_data/20220423/. After inference is completed, change location in the generated SQL statement back to cosn://coreywei-1253240642/log_data/.
Appropriate partitioning can improve performance. However, we recommend you create no more than 10,000 partitions.
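If you already created the foreign table with location still pointing at the inference subdirectory, you can also repoint it without recreating the table. The following is a sketch using the standard Hive ALTER TABLE ... SET LOCATION statement (the catalog, database, and bucket names are taken from the example above; adjust them to your environment):

alter table `DataLakeCatalog`.`test`.`log_data` set location 'cosn://coreywei-1253240642/log_data/';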

Adding a partition

You can query data from a partitioned table with a SELECT statement only after partitions have been added, which can be done in the following two ways:

Adding historical partitions
This option loads all partition data at once but is slow, so it is suitable for scenarios where many partitions need to be loaded for the first time.
msck repair table DataLakeCatalog.test.log_data;

Adding incremental partitions
After historical partitions are loaded, add incremental partitions periodically. For example, you can add one partition every day:
alter table DataLakeCatalog.test.log_data add partition(dt='20220424');
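To verify that the partitions are registered before querying, you can list them with the standard Hive SHOW PARTITIONS statement (the output depends on which partitions you have added):

show partitions DataLakeCatalog.test.log_data;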

Analyzing data

After adding partitions, you can use Data Lake Compute for data development and analysis.
select dt, count(1) from `DataLakeCatalog`.`test`.`log_data` group by dt;
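Because dt is the partitioning field, adding it to the WHERE clause limits the scan to the matching partition directories instead of the whole table. A minimal sketch using the status field from the table definition above (the date value is illustrative):

select status, count(1) from `DataLakeCatalog`.`test`.`log_data` where dt = '20220424' group by status;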

