FOOM(iOS)
FOOM (Foreground Out Of Memory) occurs when an APP running in the foreground reaches the system memory limit and is terminated by a SIGKILL signal. This article describes the metric analyses available for FOOM reports and explains how to view detailed issue reports.
Metric Analysis
Trend analysis
Trend charts show how the different FOOM rates (e.g., device FOOM rate, user FOOM rate) change over time, using line graphs or similar formats, with values expressed as percentages (%). You can adjust the display format via the Display options, or click Add to comparison list to include different data series for comparison.
In the comparison list area, you can view FOOM-related data details under different filter conditions, including index, time range, APP version, and other information. You can also perform operations such as delete, batch delete, trend preview, and more.
After selecting one or more data items to be deleted, click Delete or Batch Delete to complete the deletion.
Click Trend preview to view the overview of selected FOOM rates in the comparison list.
Multidimensional Analysis
Filter condition settings: You can set filter conditions such as time range (default or custom), APP version, system version, SDK version, scenario, report volume, device model, and manufacturer. You can also choose whether to filter out APP versions that account for less than 1% of DAU. After completing the settings, click Inquiry to obtain the filtered data. To collapse the filter conditions, click the Collapse Filter option.
Data display: This area presents FOOM-related metrics across multiple dimensions, including device FOOM rate, user FOOM rate, occurrence FOOM rate, FOOM rate per unit time, report volume, proportion, and cumulative proportion, helping users analyze FOOM metrics from different perspectives.
Problem List
The FOOM problem list is similar to the crash monitoring module. For details, see Crash Problem List.
Case clustering
The platform clusters reported individual cases into issues to facilitate problem tracking and resolution by developers. Currently, the platform clusters cases based on allocation stack information extracted from FOOM reports. Due to limitations in FOOM's stack collection policy, not all reported cases contain stack information. Cases without stack data are grouped into a special issue category: NoStackProblem. For cases with stack information, clustering is performed by extracting features from the stack traces to form characteristic clusters.
Note:
Because stack collection is enabled for only a small proportion of reports, the vast majority of individual cases are grouped into the NoStackProblem category.
Issue Details
Case Details
From FOOM > Problem List, open an issue to see all individual cases under that issue. Besides the basic information, the most important part of an individual case is its stack allocation details.
memory allocation stack
Stack Allocation Details
The data in the stack allocation details is the malloc logger information recorded by the SDK during the APP's runtime.
In the malloc logger records, entries are aggregated by the same allocation type and allocation stack, displaying the corresponding count and total size of stack allocations.
Note:
The stack here does not include all memory allocations, only memory operations that were recorded as allocated but not released under the current policy. Allocations not covered by the recording policy will not be included. Under the default recording policy, only allocations meeting all the following conditions are recorded:
A single allocation exceeds the threshold; the default threshold is 8K, which can be adjusted via configuration.
The cumulative allocation of the same stack exceeds the threshold; the default threshold is 512K, which can be adjusted via configuration.
The allocation is made after stack allocation recording is enabled; memory allocated before the recording logic starts is not included in the records.
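The default recording policy and the aggregation by allocation type and stack can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the function name, the tuple layout, and the choice to apply the per-stack threshold only to allocations that already passed the single-allocation threshold are assumptions made for the example.

```python
from collections import defaultdict

SINGLE_THRESHOLD = 8 * 1024     # default single-allocation threshold (8K)
STACK_THRESHOLD = 512 * 1024    # default cumulative per-stack threshold (512K)


def aggregate_live_allocations(allocations,
                               single_threshold=SINGLE_THRESHOLD,
                               stack_threshold=STACK_THRESHOLD):
    """Aggregate live (allocated but not released) blocks.

    allocations: iterable of (alloc_type, stack, size_in_bytes), where stack
    is a sequence of frame names. This layout is hypothetical.
    Returns {(alloc_type, stack): (count, total_size)} for the entries that
    pass both thresholds.
    """
    groups = defaultdict(lambda: [0, 0])
    for alloc_type, stack, size in allocations:
        if size <= single_threshold:
            continue  # a single allocation must exceed the threshold
        entry = groups[(alloc_type, tuple(stack))]
        entry[0] += 1
        entry[1] += size
    # Keep only stacks whose cumulative size exceeds the stack threshold.
    return {key: (count, total)
            for key, (count, total) in groups.items()
            if total > stack_threshold}


sample = [
    ("malloc", ("main", "loadImage"), 300 * 1024),
    ("malloc", ("main", "loadImage"), 300 * 1024),
    ("malloc", ("main", "parseJson"), 4 * 1024),           # below 8K
    ("vm_allocate", ("main", "makeBuffer"), 100 * 1024),   # stack total below 512K
]
recorded = aggregate_live_allocations(sample)
```

With the sample above, only the `loadImage` stack survives: two allocations totalling 600K.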
Stack Allocation Tree, Flame Graph
Stack allocation trees and flame graphs are data aggregated from the stack data in stack allocation details. They do not carry special meanings but provide a more intuitive way to visually identify the proportion of memory allocations by aggregating data into a tree structure.
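To show how the same stack data can be folded into a tree, here is a small sketch. The node layout (`name`, `size`, `children`) and both function names are assumptions for illustration; the platform's own data format is not described in this article.

```python
def build_allocation_tree(entries):
    """Fold (stack, total_size) entries into a tree.

    stack is a list of frame names ordered from the outermost caller to the
    allocation site. Each node accumulates the size of every stack that
    passes through it.
    """
    root = {"name": "root", "size": 0, "children": {}}
    for stack, size in entries:
        root["size"] += size
        node = root
        for frame in stack:
            child = node["children"].setdefault(
                frame, {"name": frame, "size": 0, "children": {}})
            child["size"] += size
            node = child
    return root


def render(node, total=None, depth=0):
    """Return indented text lines, largest subtree first."""
    total = node["size"] if total is None else total
    share = 100.0 * node["size"] / total if total else 0.0
    lines = [f"{'  ' * depth}{node['name']}  {node['size']} bytes ({share:.1f}%)"]
    for child in sorted(node["children"].values(), key=lambda c: -c["size"]):
        lines.extend(render(child, total, depth + 1))
    return lines


tree = build_allocation_tree([
    (["main", "decode", "allocFrame"], 100),
    (["main", "decode", "allocPalette"], 50),
    (["main", "cache"], 50),
])
```

A flame graph draws the same tree, with each node's width proportional to its accumulated size.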
Stack allocation tree
Flame Graph
VMMAP
VMMAP refers to the memory allocation table of the Virtual Memory Manager (VMM), which records allocation information for all memory regions, including stack memory.
For memory issues outside the above recording policy, stack allocation details may not directly provide effective information. Therefore, VMMAP data was subsequently introduced. VMMAP data, similar to the output of the `vmmap` command, provides usage information of kernel-managed memory, including details such as `user_tag`, `virtual_size`, and `dirty_size`.
user_tag: Corresponds to the "allocation type" in the stack allocation details.
virtual_size: Indicates the logical address size used by this type and does not represent the actual physical memory occupied.
dirty_size: Indicates the memory of this type that cannot be reclaimed by the operating system, occupying actual physical memory.
The SDK collects this data periodically, at a frequency tied to memory metrics. The VMMAP information therefore approximates the peak memory state and covers essentially all memory usage. Because collection is periodic, however, the last snapshot may not exactly match the final peak memory.
Through VMMAP information, you can get a general idea of the main memory usage, and further analysis can be performed by combining other factors.
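One simple way to read VMMAP data is to rank entries by `dirty_size`, since that is the portion that occupies physical memory. The sketch below assumes each entry is available as a dictionary with the three fields named above; the function name and the sample values are made up for illustration.

```python
def top_dirty_regions(regions, limit=5):
    """Rank VMMAP entries by dirty_size.

    regions: list of dicts with user_tag, virtual_size and dirty_size (bytes).
    Returns (user_tag, dirty_size, share_of_total_dirty) tuples, largest first.
    """
    total_dirty = sum(r["dirty_size"] for r in regions)
    ranked = sorted(regions, key=lambda r: r["dirty_size"], reverse=True)
    return [
        (r["user_tag"],
         r["dirty_size"],
         r["dirty_size"] / total_dirty if total_dirty else 0.0)
        for r in ranked[:limit]
    ]


MIB = 1024 * 1024
# Illustrative values only, not taken from a real report.
sample_regions = [
    {"user_tag": "MALLOC_LARGE", "virtual_size": 400 * MIB, "dirty_size": 250 * MIB},
    {"user_tag": "IOSurface", "virtual_size": 350 * MIB, "dirty_size": 300 * MIB},
    {"user_tag": "__TEXT", "virtual_size": 200 * MIB, "dirty_size": 0},
]
top = top_dirty_regions(sample_regions, limit=2)
```

Note that a tag with a large `virtual_size` but little `dirty_size` (such as the `__TEXT` row in the sample) contributes little to physical memory pressure.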
Memory allocation list
The FOOM Memory Allocation List feature is used to query and manage memory allocation issues. It helps developers and Ops personnel quickly locate and analyze abnormal allocations, such as excessively large allocations or allocations that affect a large number of devices, so that they can be handled promptly and the application stays stable.
Query
The feature supports users in querying memory allocations via multiple filter options. For details, see Query.
FOOM Allocation List
After a user query, the results will be displayed in the FOOM Allocation List, including information such as issue characteristics, last reported time, number of affected devices, report count, and total memory allocation size.
Meanwhile, the following operations can be performed:
Issue tag distribution: View the distribution of different issue tags to facilitate statistics and analysis of problem types.
Batch Operation: Perform batch processing for multiple eligible data items to improve work efficiency.
Set: Allows configuring related settings for list display, etc., to meet personalized needs.
OOM(Android)
Background
In Android development, OOM (Out of Memory) issues typically refer to java.lang.OutOfMemoryError exceptions caused by insufficient Java heap memory. However, OOM problems are not limited to Java heap memory and may also involve exceeding resource limits for file descriptors (FD) or native memory address space, as exemplified by the following common java.lang.OutOfMemoryError exception:
java.lang.OutOfMemoryError: Could not allocate JNI Env
java.lang.Thread.nativeCreate(Native Method)
java.lang.Thread.start(Thread.java:1063)
kotlinx.coroutines.scheduling.CoroutineScheduler.int createNewWorker()(CoroutineScheduler.java:485)
kotlinx.coroutines.scheduling.CoroutineScheduler.boolean tryCreateWorker(long)(CoroutineScheduler.java:440)
kotlinx.coroutines.scheduling.CoroutineScheduler.boolean tryCreateWorker$default(kotlinx.coroutines.scheduling.CoroutineScheduler,long,int,java.lang.Object)(CoroutineScheduler.java:431)
kotlinx.coroutines.scheduling.CoroutineScheduler.void signalCpuWork()(CoroutineScheduler.java:427)
kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.void beforeTask(int)(CoroutineScheduler.java:758)
kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.void executeTask(kotlinx.coroutines.scheduling.Task)(CoroutineScheduler.java:749)
kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.void runWorker()(CoroutineScheduler.java:678)
kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.void run()(CoroutineScheduler.java:665)
While it appears to be a virtual machine memory-related issue, from a source code perspective, it actually occurs because there are no available FD resources when creating a thread, resulting in thread creation failure. Consequently, it is reported as a java.lang.OutOfMemoryError exception. This can also be verified from system logs, which show numerous "Too many open files" entries prior to the error:
03-18 13:44:17.426 28516 809 W System.err: java.net.ConnectException: failed to connect to up-hl.3g.qq.com/61.241.53.46 (port 443) after 10000ms: connect failed: EMFILE (Too many open files)
03-18 13:44:17.427 28516 809 W System.err: Caused by: android.system.ErrnoException: connect failed: EMFILE (Too many open files)
03-18 13:44:17.519 28516 765 W System.err: java.io.FileNotFoundException: /data/user/0
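The relationship between open descriptors and the per-process limit can be illustrated with a short sketch. It is written in Python for a generic Linux process rather than for an Android app, and the function names and the 0.9 warning ratio are assumptions chosen for the example.

```python
import os


def fd_usage_ratio(open_fds, soft_limit):
    """Fraction of the descriptor limit in use; 0.0 if the limit is unbounded."""
    if soft_limit <= 0:
        return 0.0
    return open_fds / soft_limit


def is_near_fd_limit(open_fds, soft_limit, warn_ratio=0.9):
    """True when descriptor usage reaches warn_ratio of the soft limit."""
    return fd_usage_ratio(open_fds, soft_limit) >= warn_ratio


def current_fd_usage():
    """Read (open_fds, soft_limit) for this process on Linux.

    Uses /proc/self/fd and RLIMIT_NOFILE; both are Linux/Unix specific.
    """
    import resource  # Unix-only module, imported lazily
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft_limit
```

Once usage reaches the limit, any call that needs a new descriptor fails with EMFILE, which is what the log lines above show.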
In addition to FD resources, insufficient Native memory can also cause thread creation failures. Crash issues caused by exceeding resource limits exhibit diverse stack traces, where the Crash stack is often merely the last straw that breaks the camel's back. Conventional clustering methods based on Crash stacks and exception types thus often yield poor results. For such issues, it is necessary to categorize them from the perspective of resource usage and implement targeted optimizations and solutions. To comprehensively understand and resolve business-related OOM issues, we have reclassified OOM into Java OOM, FD OOM, and Native OOM, while expanding the original definition of the OOM rate metric.
OOM Issue Classification
Java OOM
Java OOM is the most common type of OOM issue, typically caused by insufficient Java heap memory. When an application requests more memory than the available space in the Java heap, it throws a java.lang.OutOfMemoryError exception. This may result from Java memory leaks, large object allocations, large bitmaps, or similar factors.
java.lang.OutOfMemoryError
Failed to allocate a 176 byte allocation with 5025912 free bytes and 4908KB until OOM, target footprint 536870912, growth limit 536870912; giving up on allocation because <1% of heap free after GC.
java.lang.OutOfMemoryError: Failed to allocate a 176 byte allocation with 5025912 free bytes and 4908KB until OOM, target footprint 536870912, growth limit 536870912; giving up on allocation because <1% of heap free after GC.
java.util.Arrays.copyOf(Arrays.java:3766)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:449)
java.lang.StringBuilder.append(StringBuilder.java:137)
......
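The numbers in this message can be extracted mechanically. The sketch below parses the message format shown above with a regular expression; the function name and the returned field names are assumptions for illustration, and messages in other formats simply return `None`.

```python
import re

_ART_OOM_RE = re.compile(
    r"Failed to allocate a (?P<requested>\d+) byte allocation "
    r"with (?P<free>\d+) free bytes and (?P<headroom>\d+)(?P<unit>B|KB|MB|GB) until OOM, "
    r"target footprint (?P<footprint>\d+), growth limit (?P<limit>\d+)"
)
_UNIT = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_art_oom(message):
    """Extract the numeric fields from a Java heap OOM message, or return None."""
    m = _ART_OOM_RE.search(message)
    if m is None:
        return None
    footprint = int(m["footprint"])
    limit = int(m["limit"])
    return {
        "requested_bytes": int(m["requested"]),
        "free_bytes": int(m["free"]),
        "headroom_bytes": int(m["headroom"]) * _UNIT[m["unit"]],
        "target_footprint": footprint,
        "growth_limit": limit,
        # True when the heap has already grown to its configured limit.
        "heap_at_limit": footprint >= limit,
    }


info = parse_art_oom(
    "java.lang.OutOfMemoryError: Failed to allocate a 176 byte allocation "
    "with 5025912 free bytes and 4908KB until OOM, target footprint 536870912, "
    "growth limit 536870912; giving up on allocation because <1% of heap free after GC."
)
```

In the example above, the request is only 176 bytes, but the target footprint already equals the growth limit of 536870912 bytes (512 MiB), so the heap cannot grow any further.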
FD OOM
FD OOM refers to OOM issues caused by exceeding file descriptor resource limits. Each process has a limited number of file descriptors available during runtime, which are used to open, read, and write files.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Crash type: 'native'
Start time: '2024-03-18T12:32:49.406+0800'
Crash time: '2024-03-18T16:09:33.763+0800'
App version: 'x.x.x.x'
Rooted: 'No'
API level: '27'
Build fingerprint: 'OPPO/R11/R11:8.1.0/OPM1.171019.011/1575877917:user/release-keys'
ABI: 'arm64'
pid: 9096, tid: 18434, name: HalleyTempTaskT >>> com.tencent.xxxxx <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'Could not make wake event fd: Too many open files'
x0 0000000000000000 x1 0000000000004802 x2 0000000000000006 x3 0000000000000008
x4 0000007dce36f588 x5 0000007dce36f588 x6 0000007dce36f588 x7 0000007dce36f588
x8 0000000000000083 x9 0000000010000000 x10 0000007dce36e9b0 x11 cd4becaebf0bf9bc
x12 cd4becaebf0bf9bc x13 0000000000000020 x14 ffffffffffffffdf x15 cd4becaebf0bf9bc
x16 0000000a2bfd9fa8 x17 0000007ed34b7540 x18 0000000000000001 x19 0000000000002388
x20 0000000000004802 x21 0000000000000083 x22 000000007018bbc0 x23 0000000000004802
x24 0000000000000001 x25 00000000701d41f0 x26 0000000015f00088 x27 0000000015f000b0
x28 0000000015f00100 x29 0000007dce36e9f0
sp 0000007dce36e9b0 lr 0000007ed3460770 pc 0000007ed3460798
backtrace:
#00 pc 000000000001e798 /system/lib64/libc.so (abort+120)
#01 pc 0000000000008348 /system/lib64/liblog.so (__android_log_assert+296)
#02 pc 000000000001542c /system/lib64/libutils.so (_ZN7android6LooperC1Eb+308)
#03 pc 000000000011b5e0 /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueueC1Ev+160)
#04 pc 000000000011bebc /system/lib64/libandroid_runtime.so (_ZN7androidL34android_os_MessageQueue_nativeInitEP7_JNIEnvP7_jclass+28)
#05 pc 000000000065b420 /system/framework/arm64/boot-framework.oat (android.os.Binder.clearCallingIdentity [DEDUPED]+144)
#06 pc 0000000000c8731c /system/framework/arm64/boot-framework.oat (android.os.HandlerThread.run+332)
#07 pc 000000000054ad88 /system/lib64/libart.so (art_quick_invoke_stub+584)
#08 pc 00000000000dcf74 /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+200)
#09 pc 000000000046d6a0 /system/lib64/libart.so (_ZN3artL18InvokeWithArgArrayERKNS_33ScopedObjectAccessAlreadyRunnableEPNS_9ArtMethodEPNS_8ArgArrayEPNS_6JValueEPKc+100)
#10 pc 000000000046e8cc /system/lib64/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+836)
#11 pc 0000000000496e4c /system/lib64/libart.so (_ZN3art6Thread14CreateCallbackEPv+1120)
#12 pc 0000000000074d74 /system/lib64/libc.so (_ZL15__pthread_startPv+36)
#13 pc 000000000001fce4 /system/lib64/libc.so (__start_thread+68)
Native OOM
Native OOM refers to Out of Memory issues caused by exceeding the native memory address space limit. In Android development, applications may use native code (e.g., C/C++) for high-performance tasks. Memory leaks or excessive native memory allocations in native code can deplete the native memory address space, triggering OOM problems related to native memory allocation. Common manifestations include mmap failures or VRAM exhaustion.
03-18 14:39:31.424 9236 30371 W .tencent.xxx: Large object allocation failed: Failed anonymous mmap(0x0, 2101248, 0x3, 0x22, -1, 0): Out of memory. See process maps in the log.
03-18 14:39:31.437 9236 30371 W .tencent.xxx: Throwing OutOfMemoryError "Failed to allocate a 2097172 byte allocation with 24542160 free bytes and 283MB until OOM, target footprint 264041288, growth limit 536870912" (VmSize 4039036 kB)
03-18 14:39:32.128 9236 4592 D CCodecBufferChannel: [c2.mtk.hevc.decoder#869] DEBUG: elapsed: mInputMetEos 20, hasPendingOutputsInClient 0, n=1 [in=4 pipeline=0 out=16]
03-18 14:39:32.492 9236 30526 E CursorWindow: Failed mmap: Out of memory
03-18 15:42:31.466 18578 18864 W Adreno-GSL: <sharedmem_gpuobj_alloc:2713>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
03-18 15:42:32.351 18578 18864 W Adreno-GSL: <sharedmem_gpuobj_alloc:2713>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
03-18 15:42:33.279 18578 18864 W Adreno-GSL: <sharedmem_gpuobj_alloc:2713>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
03-18 15:42:33.321 18578 18864 E OpenGLRenderer: GL error: Out of memory!
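The three categories can be told apart by the resource named in the crash message and the surrounding logs. The sketch below is a keyword-based illustration built from the log excerpts in this article; the marker lists and the function name are assumptions, and the platform's actual classification rules are not described here.

```python
# Markers taken from the log excerpts above; illustrative, not exhaustive.
FD_MARKERS = ("Too many open files", "EMFILE")
NATIVE_MARKERS = (
    "Failed anonymous mmap",
    "Failed mmap",
    "mmap failed",
    "GL error: Out of memory",
)


def classify_oom(text):
    """Classify crash text (exception message plus nearby logs).

    FD and native markers are checked first because, as described above,
    a java.lang.OutOfMemoryError can be a symptom of either.
    """
    if any(marker in text for marker in FD_MARKERS):
        return "FD OOM"
    if any(marker in text for marker in NATIVE_MARKERS):
        return "Native OOM"
    if "OutOfMemoryError" in text:
        return "Java OOM"
    return "Unknown"
```

For example, `classify_oom("Abort message: 'Could not make wake event fd: Too many open files'")` returns `"FD OOM"`, while an `OutOfMemoryError` with no FD or mmap markers in its context falls back to `"Java OOM"`.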
Metric Analysis
In the OOM rate metric system, java.lang.OutOfMemoryError represents only one category, which we refer to as the Java OOM rate. In addition to the Java OOM rate, we have introduced the FD OOM rate and the Native OOM rate, which correspond to the probability of crashes caused by FD resource exhaustion and by process native memory exhaustion, respectively. The new OOM rate system is accessible under OOM Types and requires no modifications on the business side. It becomes available after upgrading to a version later than 4.3.2.
Trend analysis
Issue List
The OOM issue list is similar to the crash monitoring module. For details, see Crash Problem List.
Note:
The benefit of redefining the OOM rate and categorizing OOM issues into distinct types lies in more accurately locating and resolving OOM problems. By monitoring and analyzing different types of OOM rates, we can better understand an application's resource usage across various dimensions, enabling targeted optimization measures.
Java OOM rate: Monitoring the Java OOM rate can prompt us to proactively identify memory leaks, optimize garbage collection policies, and adjust heap memory size in order to reduce the occurrence of Java OOM issues.
FD OOM rate: Monitoring the FD OOM rate helps identify file descriptor leakage or misuse, enabling timely closure of unused file descriptors to prevent FD OOM issues.
Native OOM rate: Monitoring the Native OOM rate helps identify memory leaks or excessive allocation issues in native code, enabling timely release of unused native memory to prevent Native OOM problems.
By categorizing and monitoring different types of OOM rates, we can more precisely locate and resolve OOM issues, enhancing application stability and performance. Additionally, Platform B has provided dedicated monitoring features for each OOM type. We welcome you to use these capabilities.
Issue Details
The OOM issue details are similar to those in the crash monitoring module. For details, see Crash Issue Details.