Overview
Important events generated while a training task runs are delivered to EventBridge through Tencent Cloud Observability Platform (TCOP). You can configure event rules in EventBridge to send alarm notifications for these events. The following events are currently supported: task queuing (taskScheduling), task running (taskRunning), task completed (taskCompleted), task execution failure (taskExecuteFailed), automatic task restart (taskRestarted), task preempted (taskPreempted), and task stopped (taskStopped).
Directions
1. Go to TCOP > Event Set and click Cloud Service Event Set - default event set. This event set receives cloud service events, including cloud service alarms and audit events. It is not region-specific: cloud service events from all regions are delivered to Guangzhou by default. To configure alarms, bind alarm rules to this event set.
2. Click Manage Event Rules to open the event rule list, where you can configure event matching rules, event targets, and other settings.
3. Click Create Rule, fill in the rule name (for example, TI-ONE Training Task Alarm), and select TI-ONE for the cloud service type.
You can select an event example from the drop-down menu to see the structure of the events that the cloud service delivers to EventBridge. Write your event matching rules against this structure.
Event matching rules can be defined in form mode or as a custom event pattern; the custom event pattern is recommended. A TI-ONE rule can target specific instances (for example, selected training tasks) or all instances (all training tasks; newly created training tasks are automatically covered by the rule). Rules can also match events by cloud product tags or by task creator. Some examples follow:
Match by event type:
The following rule generates an alarm whenever a taskExecuteFailed or taskCompleted event occurs in any training task under the current root account.
{
  "source": "tione.cloud.tencent",
  "type": [
    "tione:ErrorEvent:taskExecuteFailed",
    "tione:ErrorEvent:taskCompleted"
  ]
}
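To sanity-check a type filter before saving the rule, you can model it locally. The Python sketch below assumes a rule matches when the event's `source` equals the rule's `source` and the event's `type` is any one of the strings in the rule's `type` list, which is how the example above is described. It is a simplified model, not EventBridge's matching engine, and the sample events are hypothetical dictionaries containing only the fields used in this document.

```python
import json

# The "match by event type" rule from above.
RULE = json.loads("""
{
  "source": "tione.cloud.tencent",
  "type": [
    "tione:ErrorEvent:taskExecuteFailed",
    "tione:ErrorEvent:taskCompleted"
  ]
}
""")


def matches_type_rule(rule: dict, event: dict) -> bool:
    """Simplified model (assumed semantics): the source must be equal, and
    the event type must appear in the rule's type list (any-of)."""
    if event.get("source") != rule["source"]:
        return False
    # A rule without a "type" key does not filter on type.
    if "type" not in rule:
        return True
    return event.get("type") in rule["type"]


# Hypothetical sample events; the second type string is a placeholder.
failed_event = {"source": "tione.cloud.tencent",
                "type": "tione:ErrorEvent:taskExecuteFailed"}
other_event = {"source": "tione.cloud.tencent",
               "type": "hypothetical:otherEventType"}

print(matches_type_rule(RULE, failed_event))  # True
print(matches_type_rule(RULE, other_event))   # False
```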
Match by sub-account UIN:
The following rule generates an alarm when a taskExecuteFailed, taskCompleted, taskRestarted, or taskPreempted event occurs in a training task created by sub-account 010013819411.
{
  "source": "tione.cloud.tencent",
  "data": {
    "taskSubUin": "010013819411"
  }
}
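This rule has no `type` key, so it filters only on the task creator. A minimal sketch of that check, assuming that each key under `data` in the rule must equal the same key under `data` in the event. The event layout is a hypothetical minimal one built from the field names in this rule, and the second UIN is made up for illustration. Note that the UIN is quoted in the pattern, so the comparison is a string comparison.

```python
def matches_data_fields(rule: dict, event: dict) -> bool:
    """Simplified model (assumed semantics): same source, and every key
    under the rule's "data" must equal the event's value for that key."""
    if event.get("source") != rule["source"]:
        return False
    event_data = event.get("data", {})
    return all(event_data.get(key) == value
               for key, value in rule.get("data", {}).items())


RULE = {"source": "tione.cloud.tencent",
        "data": {"taskSubUin": "010013819411"}}

# Hypothetical events from two different creators.
mine = {"source": "tione.cloud.tencent",
        "type": "tione:ErrorEvent:taskExecuteFailed",
        "data": {"taskSubUin": "010013819411"}}
theirs = {"source": "tione.cloud.tencent",
          "type": "tione:ErrorEvent:taskExecuteFailed",
          "data": {"taskSubUin": "100000000000"}}

print(matches_data_fields(RULE, mine))    # True
print(matches_data_fields(RULE, theirs))  # False
```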
Match by tag:
The following rule generates an alarm when a taskExecuteFailed, taskCompleted, taskRestarted, or taskPreempted event occurs in a training task that has both tags Department: Algorithm Research and Environment: Test.
{
  "source": "tione.cloud.tencent",
  "data": {
    "tags": [
      {
        "contain": ["department: algorithm research;", "environment: test;"]
      }
    ]
  }
}
The following rule generates an alarm when a taskExecuteFailed, taskCompleted, taskRestarted, or taskPreempted event occurs in a training task that has either the tag Department: Algorithm Research or the tag Environment: Test.
{
  "source": "tione.cloud.tencent",
  "data": {
    "tags": [
      {
        "contain": ["department: algorithm research;"]
      },
      {
        "contain": ["environment: test;"]
      }
    ]
  }
}
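The two tag examples differ only in how the conditions are grouped. One reading consistent with their descriptions: strings inside a single `contain` list must all be present (AND), while separate objects in the `tags` array are alternatives (OR). The sketch below models that reading. It assumes, as the `contain` values suggest, that a task's tags can be represented as one string of `key: value;` pairs; the function takes that string directly, and the sample tag strings are hypothetical. Check the event example in the console for the actual tag format before relying on this.

```python
def matches_tag_groups(tag_groups: list, tag_string: str) -> bool:
    """Simplified model of the "tags" condition (assumed semantics):
    the condition holds if ANY object in tag_groups matches, and an object
    matches when ALL of its "contain" substrings occur in tag_string."""
    return any(all(fragment in tag_string for fragment in group["contain"])
               for group in tag_groups)


# "tags" value from the AND example: one object, two required fragments.
both_tags = [{"contain": ["department: algorithm research;",
                          "environment: test;"]}]
# "tags" value modeled on the OR example: two objects, either one suffices.
either_tag = [{"contain": ["department: algorithm research;"]},
              {"contain": ["environment: test;"]}]

# Hypothetical tag strings for two training tasks.
task_a = "department: algorithm research;environment: test;"
task_b = "department: algorithm research;environment: production;"

print(matches_tag_groups(both_tags, task_a))   # True
print(matches_tag_groups(both_tags, task_b))   # False
print(matches_tag_groups(either_tag, task_b))  # True
```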
These conditions can be combined. For example, the following rule generates an alarm when a taskExecuteFailed event occurs in a training task that has both tags Department: Algorithm Research and Environment: Test and was created by sub-account 010013819411.
{
  "source": "tione.cloud.tencent",
  "type": [
    "tione:ErrorEvent:taskExecuteFailed"
  ],
  "data": {
    "tags": [
      {
        "contain": ["department: algorithm research;", "environment: test;"]
      }
    ],
    "taskSubUin": "010013819411"
  }
}
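Because an event pattern is plain JSON, rules like this combined example can also be generated in code, for instance when several teams need similar rules. The helper below is a hypothetical sketch, not part of any Tencent Cloud SDK: it only assembles a dictionary in the shape used by the examples in this section and serializes it with the standard `json` module. The rule itself is still created in the console as described in these steps.

```python
import json
from typing import Optional


def build_tione_pattern(event_types: Optional[list] = None,
                        sub_uin: Optional[str] = None,
                        tag_groups: Optional[list] = None) -> dict:
    """Hypothetical helper: assemble an event pattern in the shape used by
    the examples above. Omitted arguments leave that condition out."""
    pattern = {"source": "tione.cloud.tencent"}
    if event_types:
        pattern["type"] = list(event_types)
    data = {}
    if tag_groups:
        # Each inner list becomes one {"contain": [...]} object.
        data["tags"] = [{"contain": list(group)} for group in tag_groups]
    if sub_uin:
        # Keep the UIN as a string, as in the examples.
        data["taskSubUin"] = sub_uin
    if data:
        pattern["data"] = data
    return pattern


combined = build_tione_pattern(
    event_types=["tione:ErrorEvent:taskExecuteFailed"],
    sub_uin="010013819411",
    tag_groups=[["department: algorithm research;", "environment: test;"]],
)
print(json.dumps(combined, indent=2, ensure_ascii=False))
```

The printed JSON can then be pasted into the custom event pattern editor.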
4. After configuring event matching, click Next to configure the event target (the alarm channel). Set the trigger mode to Message Push and the notification method to Channel Push. Select one or more recipients (alarm reception accounts), then select the receiving channels: Email, SMS, phone, or Message Center. Click Finish to return to the event rule page.
5. Once this configuration is complete, you will receive an event notification (for example, an email), as shown below, whenever an event that meets the alarm conditions occurs.