Using DLC (Hive) to Analyze CLS Log

Last updated: 2024-01-20 17:28:40

Overview

This document describes how to ship CLS logs to COS and analyze them with Hive for OLAP computing. You can use the data analysis and computing services provided by Tencent Cloud Data Lake Compute (DLC) to perform offline log computing and analysis.

Directions

Shipping CLS logs to COS

Creating a shipping task

1. Log in to the CLS console and select Shipping Task Management > Ship to COS on the left sidebar.
2. On the Ship to COS page, click Add Shipping Configuration. In the Ship to COS pop-up window, create a shipping task. Pay attention to the following configuration items:
Directory Prefix: Log files are shipped to the corresponding directory in the COS bucket, which is generally the address of the table location in a data warehouse model.
Partition Format: A shipping task can automatically partition data by creation time. We recommend specifying the partition format according to the Hive partitioned table format. For example, to partition by day, you can set /dt=%Y%m%d/test, where dt= is the partitioning field, %Y%m%d is the year, month, and day, and test is the log file prefix. Because the name of a shipped file starts with an underscore (_) by default, and big data computing engines ignore such files (causing the data not to be found), you need to add a prefix. The actual partition directory name will then be, for example, dt=20220424.
Shipping Interval: You can select an interval of 5 to 15 minutes. We recommend 15 minutes and 250 MB; this reduces the number of files and improves query performance.
Shipping Format: The JSON format is recommended.

Viewing the shipping task result

Generally, you can view the log data in the COS console about 15 minutes after the shipping task starts. If you set day-level partitioning for the log_data logset, the directory structure will look like the example below, where the log files are stored in the partition directories.
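With the partition format /dt=%Y%m%d/test from the configuration above, the bucket contents would look roughly as follows (the file names are illustrative; the actual names are generated by the shipping task):

log_data/dt=20220423/test_0001.json
log_data/dt=20220423/test_0002.json
log_data/dt=20220424/test_0001.json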

Data Lake Compute (Hive) analysis

Using Data Lake Compute to create a foreign table and map it to a COS log directory

After log data is shipped to COS, you can use the data exploration feature in the Data Lake Compute console to create a foreign table. For the table creation statement, refer to the following example. Note that the partitioning field and the location field must match the directory structure.
The Data Lake Compute foreign table creation wizard provides advanced options that infer the table structure from the data files and automatically generate the SQL statement. Because the inference is based on sampling, you should verify that the inferred field types are appropriate for your business. For example, in the sample code below, the __TIMESTAMP__ field may be inferred as int, whereas bigint may be required to meet business requirements.
CREATE EXTERNAL TABLE IF NOT EXISTS `DataLakeCatalog`.`test`.`log_data` (
`__FILENAME__` string,
`__SOURCE__` string,
`__TIMESTAMP__` bigint,
`appId` string,
`caller` string,
`consumeTime` string,
`data` string,
`datacontenttype` string,
`deliveryStatus` string,
`errorResponse` string,
`eventRuleId` string,
`eventbusId` string,
`eventbusType` string,
`id` string,
`logTime` string,
`region` string,
`requestId` string,
`retryNum` string,
`source` string,
`sourceType` string,
`specversion` string,
`status` string,
`subject` string,
`tags` string,
`targetId` string,
`targetSource` string,
`time` string,
`type` string,
`uin` string
) PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'cosn://coreywei-1253240642/log_data/';
For shipping by partition, location must point to the cosn://coreywei-1253240642/log_data/ directory rather than the cosn://coreywei-1253240642/log_data/20220423/ directory.
To use the inference feature, point the directory at the subdirectory that contains the data files, i.e., cosn://coreywei-1253240642/log_data/20220423/. After inference is completed, change location in the generated SQL statement back to cosn://coreywei-1253240642/log_data/.
Appropriate partitioning can improve performance. However, we recommend you create no more than 10,000 partitions.
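If you already created the foreign table with location still pointing at the inference subdirectory, you can also repoint it without recreating the table. The following is a sketch using the standard Hive ALTER TABLE ... SET LOCATION statement (the catalog, database, and bucket names are taken from the example above; adjust them to your environment):

alter table `DataLakeCatalog`.`test`.`log_data` set location 'cosn://coreywei-1253240642/log_data/';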

Adding a partition

You can query data from a partitioned table with a SELECT statement only after partitions have been added, which can be done in the following two ways:

Adding historical partitions
This option loads all partition data at once but is slow, so it is suitable for scenarios where many partitions need to be loaded for the first time.
msck repair table DataLakeCatalog.test.log_data;

Adding incremental partitions
After historical partitions are loaded, add incremental partitions periodically. For example, you can add one partition every day:
alter table DataLakeCatalog.test.log_data add partition(dt='20220424');
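To verify that the partitions are registered before querying, you can list them with the standard Hive SHOW PARTITIONS statement (the output depends on which partitions you have added):

show partitions DataLakeCatalog.test.log_data;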

Analyzing data

After adding partitions, you can use Data Lake Compute for data development and analysis.
select dt, count(1) from `DataLakeCatalog`.`test`.`log_data` group by dt;
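Because dt is the partitioning field, adding it to the WHERE clause limits the scan to the matching partition directories instead of the whole table. A minimal sketch using the status field from the table definition above (the date value is illustrative):

select status, count(1) from `DataLakeCatalog`.`test`.`log_data` where dt = '20220424' group by status;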

