mysql> set enable_profile=true;
After enabling the Profile and executing a SQL statement (in older versions the variable is named is_report_success), you can see the Report information for the executed SQL statement on the FE Web page:

Query:
  Summary:
    Query ID: 9664061c57e84404-85ae111b8ba7e83a
    Start Time: 2020-05-02 10:34:57
    End Time: 2020-05-02 10:35:08
    Total: 10s323ms
    Query Type: Query
    Query State: EOF
    Doris Version: trunk
    User: root
    Default Db: default_cluster:test
    Sql Statement: select max(Bid_Price) from quotes group by Symbol

Fragment 0:
  Instance 9664061c57e84404-85ae111b8ba7e83d (host=TNetworkAddress(hostname:192.168.0.1, port:9060)):(Active: 10s270ms, % non-child: 0.14%)
     - MemoryLimit: 2.00 GB
     - BytesReceived: 168.08 KB
     - PeakUsedReservation: 0.00
     - SendersBlockedTimer: 0ns
     - DeserializeRowBatchTimer: 501.975us
     - PeakMemoryUsage: 577.04 KB
     - RowsProduced: 8.322K (8322)
    EXCHANGE_NODE (id=4):(Active: 10s256ms, % non-child: 99.35%)
       - ConvertRowBatchTime: 180.171us
       - PeakMemoryUsage: 0.00
       - RowsReturned: 8.322K (8322)
       - MemoryUsed: 0.00
       - RowsReturnedRate: 811
hostname refers to the BE node executing the Fragment.
Active: 10s270ms refers to the total execution time of this node.
% non-child: 0.14% refers to the execution time of the node itself (excluding the execution time of child nodes) as a percentage of the total time.
PeakMemoryUsage refers to the peak memory usage of the EXCHANGE_NODE.
RowsReturned refers to the number of rows returned by the EXCHANGE_NODE.
RowsReturnedRate = RowsReturned / ActiveTime.
These statistics have the same meaning for other NODEs.
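The RowsReturnedRate formula can be checked against the EXCHANGE_NODE values in the sample Profile. The sketch below is only an illustration: parse_duration and rows_returned_rate are hypothetical helper names, and the parser assumes durations are written with the s, ms, us, and ns units seen in the sample.

```python
import re

# Units seen in the sample Profile; other units are not handled (assumption).
_UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_duration(text: str) -> float:
    """Convert a Profile duration such as '10s256ms' or '501.975us' to seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|us|ns|s)", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def rows_returned_rate(rows_returned: int, active: str) -> float:
    """RowsReturnedRate = RowsReturned / ActiveTime, in rows per second."""
    seconds = parse_duration(active)
    if seconds == 0:
        raise ValueError("Active time is zero; the rate is undefined")
    return rows_returned / seconds


# EXCHANGE_NODE (id=4): RowsReturned = 8322, Active = 10s256ms
rate = rows_returned_rate(8322, "10s256ms")
print(f"{rate:.1f} rows/sec")  # roughly 811, matching the sample Profile
```

Dividing 8322 rows by 10.256 seconds gives a little over 811 rows per second, which agrees with the RowsReturnedRate shown for the EXCHANGE_NODE.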
The statistics of the child nodes are printed next. You can distinguish the parent-child relationship between nodes by indentation.

The Profile reports statistics for the following components:
Fragment
BlockMgr
DataStreamSender
ODBC_TABLE_SINK
EXCHANGE_NODE
SORT_NODE
AGGREGATION_NODE
HASH_JOIN_NODE
CROSS_JOIN_NODE
UNION_NODE
ANALYTIC_EVAL_NODE
OLAP_SCAN_NODE

The OLAP_SCAN_NODE carries out the actual data scanning tasks. An OLAP_SCAN_NODE can generate one or more OlapScanners. Each Scanner thread is responsible for scanning part of the data.

Some or all of the predicate conditions in the query are pushed down to the OLAP_SCAN_NODE. Some of these predicate conditions are further pushed to the Storage engine, to utilize the index of the Storage engine for data filtering. Others are retained in the OLAP_SCAN_NODE to filter the data returned from the Storage engine.

The Profile of the OLAP_SCAN_NODE is often used to analyze the efficiency of data scanning. Based on the calling relationship, it is divided into three layers: OLAP_SCAN_NODE, OlapScanner, and SegmentIterator.
Here is a typical Profile for an OLAP_SCAN_NODE. Some metrics have different meanings depending on the storage format (V1 or V2).

OLAP_SCAN_NODE (id=0):(Active: 1.2ms, % non-child: 0.00%)
  - BytesRead: 265.00 B # The amount of data that has been read from the data file. If ten 32-bit integers are read, then the data amount is 10 x 4B = 40 Bytes. This number only represents the memory size when the data is fully expanded and does not represent the actual IO size.
  - NumDiskAccess: 1 # The number of disks involved in this ScanNode.
  - NumScanners: 20 # The number of Scanners generated by this ScanNode.
  - PeakMemoryUsage: 0.00 # The peak memory usage during the query. Currently not used.
  - RowsRead: 7 # The number of rows returned from the Storage engine to the Scanner, excluding the rows filtered by the Scanner.
  - RowsReturned: 7 # The number of rows returned by the ScanNode to the upper node.
  - RowsReturnedRate: 6.979K /sec # RowsReturned/ActiveTime
  - TabletCount : 20 # The number of Tablets involved in this ScanNode.
  - TotalReadThroughput: 74.70 KB/sec # BytesRead divided by the total running time of this node (from Open to Close). For IO-restricted queries, this value approaches the total throughput of the disk.
  - ScannerBatchWaitTime: 426.886us # The time the transfer thread spends waiting for the scanner thread to return a RowBatch.
  - ScannerWorkerWaitTime: 17.745us # The time the scanner thread spends waiting for an available worker thread in the Thread Pool.
  OlapScanner:
    - BlockConvertTime: 8.941us # The time taken to convert the vectorized Block to the row structure's RowBlock. In V1, the vectorized Block is VectorizedRowBatch, and in V2, it's RowBlockV2.
    - BlockFetchTime: 468.974us # The time taken by the Rowset Reader to obtain the Block.
    - ReaderInitTime: 5.475ms # The time taken by the OlapScanner to initialize the Reader. In V1, this includes the time taken to assemble MergeHeap. In V2, it includes the time to generate each level Iterator and read the first set of Blocks.
    - RowsDelFiltered: 0 # The number of rows filtered based on the Delete information in the Tablet, as well as the number of rows marked for deletion that are filtered under the unique key model.
    - RowsPushedCondFiltered: 0 # The number of rows filtered based on the predicates pushed down, such as the conditions passed from BuildTable to ProbeTable in Join computation. This number is not accurate, because the filtering stops if it yields poor results.
    - ScanTime: 39.24us # The time returned by the ScanNode to the upper node.
    - ShowHintsTime_V1: 0ns # Meaningless in V2. In V1, it reads part of the data to split the ScanRange.
    SegmentIterator:
      - BitmapIndexFilterTimer: 779ns # The time taken to filter data using the bitmap index.
      - BlockLoadTime: 415.925us # The time taken by the SegmentReader (V1) or SegmentIterator (V2) to access the block.
      - BlockSeekCount: 12 # The number of block seek operations performed when reading the Segment.
      - BlockSeekTime: 222.556us # The time taken by block seek operations when reading the Segment.
      - BlocksLoad: 6 # The number of Blocks read.
      - CachedPagesNum: 30 # Only in V2, when PageCache is enabled, the number of Pages that hit the Cache.
      - CompressedBytesRead: 0.00 # In V1, the size of the data read from the file before decompression. In V2, the size before compression of the Pages that were read and did not hit the PageCache.
      - DecompressorTimer: 0ns # The time taken to decompress the data.
      - IOTimer: 0ns # The IO time of actual data reading from the operating system.
      - IndexLoadTime_V1: 0ns # Only in V1, the time taken to read the Index Stream.
      - NumSegmentFiltered: 0 # The number of Segments completely filtered out through column statistics and query conditions when generating the Segment Iterator.
      - NumSegmentTotal: 6 # The total number of Segments involved in the query.
      - RawRowsRead: 7 # The number of raw rows read in the Storage engine. See below for details.
      - RowsBitmapIndexFiltered: 0 # Only in V2, the number of rows filtered out using the Bitmap index.
      - RowsBloomFilterFiltered: 0 # Only in V2, the number of rows filtered out using the BloomFilter index.
      - RowsKeyRangeFiltered: 0 # Only in V2, the number of rows filtered out using the SortkeyIndex index.
      - RowsStatsFiltered: 0 # In V2, the number of rows filtered out using the ZoneMap index, including deletion conditions. In V1, it also includes the number of rows filtered out using the BloomFilter.
      - RowsConditionsFiltered: 0 # Only in V2, the number of rows filtered out using various column indexes.
      - RowsVectorPredFiltered: 0 # The number of rows filtered by the vectorized conditional filter operation.
      - TotalPagesNum: 30 # Only in V2, the total number of Pages read.
      - UncompressedBytesRead: 0.00 # In V1, the size of the data file after decompression (if the file does not need to be decompressed, the file size is counted directly). In V2, only the sizes of Pages that did not hit the PageCache are counted, after decompression (if a Page does not need to be decompressed, its size is counted directly).
      - VectorPredEvalTime: 0ns # The time taken for the vectorized conditional filter operation.
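The CachedPagesNum and TotalPagesNum counters in the sample Profile can be combined into a PageCache hit ratio. The following is a minimal sketch, not part of Doris: parse_counters and page_cache_hit_ratio are hypothetical helpers, and the parser assumes counters appear as "- Name: value" lines, optionally followed by a "# comment".

```python
import re

# Two counter lines copied from the SegmentIterator section of the sample Profile.
SAMPLE = """\
- CachedPagesNum: 30 # Pages that hit the Cache
- TotalPagesNum: 30 # Total Pages read
"""


def parse_counters(text: str) -> dict:
    """Collect '- Name: value' lines into a dict, dropping any trailing '# comment'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*-\s*(\w+)\s*:\s*([^#]*)", line)
        if match:
            counters[match.group(1)] = match.group(2).strip()
    return counters


def page_cache_hit_ratio(counters: dict) -> float:
    """Fraction of the Pages read that were served from the PageCache (V2 only)."""
    total = int(counters["TotalPagesNum"])
    cached = int(counters["CachedPagesNum"])
    # Assumption: report 0.0 when no Pages were read at all.
    return cached / total if total else 0.0


counters = parse_counters(SAMPLE)
print(f"PageCache hit ratio: {page_cache_hit_ratio(counters):.0%}")
```

In the sample, all 30 Pages hit the PageCache, which is consistent with CompressedBytesRead, UncompressedBytesRead, and IOTimer all being zero there.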
The row-count indicators relate to each other as follows (described for the V2 storage format):

Rows filtered by the SortkeyIndex index are recorded in RowsKeyRangeFiltered.
Rows filtered by the Bitmap index are recorded in RowsBitmapIndexFiltered.
Rows filtered by the BloomFilter index are recorded in RowsBloomFilterFiltered. The value of RowsBloomFilterFiltered is the difference between the total number of rows in the Segment (rather than the number of rows after the Bitmap index filtering) and the number of rows remaining after the BloomFilter filtering. Therefore, the data filtered by BloomFilter may overlap with the data filtered by Bitmap.
Rows filtered by the ZoneMap index, including deletion conditions, are recorded in RowsStatsFiltered.
RowsConditionsFiltered is the number of rows filtered by various indexes, including the values of RowsBloomFilterFiltered and RowsStatsFiltered.
Rows filtered by the Delete information in the Tablet are recorded in RowsDelFiltered. Therefore, the actual number of rows filtered by the deletion conditions is recorded in both RowsStatsFiltered and RowsDelFiltered.
RawRowsRead is the final number of rows to be read after the above filtering.
RowsRead is the final number of rows returned to the Scanner. RowsRead is generally less than RawRowsRead because the Storage engine may aggregate data before returning it to the Scanner. If the difference between RawRowsRead and RowsRead is large, it indicates that many rows have been aggregated, and the aggregate operation may be time-consuming.
RowsReturned is the final number of rows returned by the ScanNode to the upper node. RowsReturned is usually less than RowsRead, because some predicate conditions on the Scanner were not pushed to the Storage engine, so some additional filtering is carried out there. If the difference between RowsRead and RowsReturned is large, it means many rows were filtered in the Scanner, indicating that many predicates with high selectivity were not pushed to the Storage engine. The filtering efficiency in the Scanner may be lower than that in the Storage engine.
Through the Rows***Filtered set of indicators, you can also analyze whether the query conditions are pushed down to the Storage engine and how effective the different indexes are at filtering.
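The relationship among RawRowsRead, RowsRead, and RowsReturned described above can be expressed as two differences. This is a rough sketch only: scan_row_funnel is a hypothetical helper, and the second set of input values is made up for illustration rather than taken from a real Profile.

```python
def scan_row_funnel(raw_rows_read: int, rows_read: int, rows_returned: int) -> dict:
    """Break down where rows disappear between the Storage engine and the upper node."""
    return {
        # Rows merged away by aggregation before reaching the Scanner.
        "aggregated_in_storage_engine": raw_rows_read - rows_read,
        # Rows dropped by predicates that were not pushed to the Storage engine.
        "filtered_in_scanner": rows_read - rows_returned,
    }


# Values from the sample Profile: RawRowsRead = 7, RowsRead = 7, RowsReturned = 7.
sample = scan_row_funnel(7, 7, 7)
print(sample)

# Made-up values (not from a real Profile) showing a scan with heavy
# aggregation and heavy Scanner-side filtering.
hypothetical = scan_row_funnel(1_000_000, 200_000, 50_000)
print(hypothetical)
```

A large first number suggests that the aggregation in the Storage engine may be time-consuming; a large second number suggests that selective predicates were not pushed down.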
Additionally, some simple analysis can be performed in the following aspects.

Many indicators under OlapScanner, such as IOTimer and BlockFetchTime, are accumulated across all Scanner threads, so the values may be quite large. In addition, because Scanner threads read data asynchronously, these cumulative indicators only reflect the cumulative working time of the Scanners and do not directly represent the processing time of the ScanNode. The time the ScanNode takes within the entire query plan is the value recorded in the Active field. Sometimes, for example, IOTimer is tens of seconds while Active is only several seconds. This is usually because:
  IOTimer is the cumulative time of multiple Scanners, and the number of Scanners is relatively high.
  The upper node spends a long time processing data, so the Active field of the ScanNode may be only several milliseconds. This is because the ScanNode asynchronously scans and prepares the data while the upper node is processing data. When the upper node fetches data from the ScanNode, it gets data that has already been prepared, hence the Active time is short.

NumScanners represents the number of tasks submitted by the Scanner to the Thread Pool, scheduled by the Thread Pool in RuntimeState. The doris_scanner_thread_pool_thread_num and doris_scanner_thread_pool_queue_size parameters control the size of the thread pool and the length of the queue, respectively. Too many or too few threads can affect query efficiency. You can also roughly estimate the time taken per thread by dividing some cumulative indicators by the number of threads.

TabletCount represents the number of tablets to be scanned. A high number could imply a need for a lot of random reads and data Merge operations.

UncompressedBytesRead indirectly reflects the amount of data read. If this value is large, it may indicate a lot of IO operations.

CachedPagesNum and TotalPagesNum show the PageCache hit rate. The higher the hit rate, the less time is spent on IO and decompression operations.

Buffer pool