diff --git a/content/zh/docs/cluster-administration/cluster-wide-alerting-and-notification/alerting-policy.md b/content/zh/docs/cluster-administration/cluster-wide-alerting-and-notification/alerting-policy.md
index 87be272ef..a0735b550 100644
--- a/content/zh/docs/cluster-administration/cluster-wide-alerting-and-notification/alerting-policy.md
+++ b/content/zh/docs/cluster-administration/cluster-wide-alerting-and-notification/alerting-policy.md
@@ -1,100 +1,103 @@
 ---
-title: "Alerting Policy (Node Level)"
+title: "告警策略(节点级别)"
 keywords: 'KubeSphere, Kubernetes, Node, Alerting, Policy, Notification'
-description: 'How to set alerting policies at the node level.'
+description: '如何在节点级设置告警策略。'
-linkTitle: "Alerting Policy (Node Level)"
+linkTitle: "告警策略(节点级别)"
 weight: 4160
 ---
-## Objective
+## 目标
-KubeSphere provides alert policies for nodes and workloads. This guide demonstrates how you can create alert policies for nodes in the cluster and configure mail notifications. See [Alerting Policy (Workload Level)](../../../project-user-guide/alerting/alerting-policy/) to learn how to configure alert policies for workloads.
+KubeSphere为节点和工作负载提供告警策略。 本指南演示了如何为群集中的节点创建告警策略以及如何配置邮件通知。如需了解如何为工作负载配置告警策略请参阅[告警策略(工作负载级别)](../../../project-user-guide/alerting/alerting-policy/) 。
-## Prerequisites
+## 前提条件
-- [KubeSphere Alerting and Notification](../../../pluggable-components/alerting-notification/) needs to be enabled.
-- [Mail Server](../../../cluster-administration/cluster-settings/mail-server/) needs to be configured.
+- [KubeSphere告警和通知](../../../pluggable-components/alerting-notification/)功能需要启用。
+- [邮件服务器](../../../cluster-administration/cluster-settings/mail-server/) 需要配置。
-## Hands-on Lab
+## 动手实验
-### Task 1: Create an Alert Policy
+### 任务 1: 创建一个告警策略
-1. Log in the console with one account granted the role `platform-admin`.
+1. 使用一个被授予`平台管理员`角色的帐户登录控制台。
-2. Click **Platform** in the top left corner and select **Clusters Management**.
+2. 单击左上角的**平台管理**,然后选择**集群管理**。
- ![alerting_policy_node_level_guide](/images/docs/alerting/alerting_policy_node_level_guide.png)
+ ![alerting_policy_node_level_guide](/images/docs/alerting-zh/alerting_policy_node_level_guide.png)
-3. Select a cluster from the list and enter it (If you do not enable the [multi-cluster feature](../../../multicluster-management/), you will directly go to the **Overview** page).
+3. 从列表中选择一个集群并进入(如果您未启用[多集群特性](../../../multicluster-management/),则将直接转到**总览**页面)。
-4. Navigate to **Alerting Policies** under **Monitoring & Alerting**, and click **Create**.
+4. 导航到**监控告警**下的**告警策略**,点击 **创建**.
- ![alerting_policy_node_level_create](/images/docs/alerting/alerting_policy_node_level_create.png)
+ ![alerting_policy_node_level_create](/images/docs/alerting-zh/alerting_policy_node_level_create.png)
-### Task 2: Provide Basic Information
+### 任务 2: 提供基本信息
-In the dialog that appears, fill in the basic information as follows. Click **Next** after you finish.
+在出现的对话框中,填写如下基本信息。 完成后,单击**下一步**。
-- **Name**: a concise and clear name as its unique identifier, such as `alert-demo`.
-- **Alias**: to help you distinguish alert policies better. Chinese is supported.
-- **Description**: a brief introduction to the alert policy.
+- **名称**: 简洁明了的名称作为其唯一标识符,例如`alert-demo`。
+- **别名**: 帮助您更好地区分告警策略。 支持中文。
+- **描述信息**: 告警策略的简要介绍。
-![alerting_policy_node_level_basic_info](/images/docs/alerting/alerting_policy_node_level_basic_info.png)
+![alerting_policy_node_level_basic_info](/images/docs/alerting-zh/alerting_policy_node_level_basic_info.png)
-### Task 3: Select Monitoring Targets
+### 任务 3: 选择监控目标
-Select several nodes in the node list or use Node Selector to choose a group of nodes as the monitoring targets. Here a node is selected for the convenience of demonstration. Click **Next** when you finish.
+在节点列表中选择节点,或使用**节点选择器**选择一组节点作为监控目标。 为了方便演示,此处选择一个节点。 完成后单击“下一步”。
-![alerting_policy_node_level_monitoring_target](/images/docs/alerting/alerting_policy_node_level_monitoring_target.png)
+![alerting_policy_node_level_monitoring_target](/images/docs/alerting-zh/alerting_policy_node_level_monitoring_target.png)
 {{< notice note >}}
-You can sort nodes in the node list from the drop-down menu through the following three ways: `Sort By CPU`, `Sort By Memory`, `Sort By Pod Utilization`.
+您可以通过以下三种方式从下拉菜单中对节点列表中的节点进行排序:
+
+1. CPU使用率
+2. 内存使用率
+3. 容器组用量
 {{</ notice >}}
-### Task 4: Add Alerting Rules
+### 任务 4: 添加告警规则
-1. Click **Add Rule** to begin to create an alerting rule. The rule defines parameters such as metric type, check period, consecutive times, metric threshold and alert level to provide rich configurations. The check period (the second field under **Rule**) means the time interval between 2 consecutive checks of the metric. For example, `2 minutes/period` means the metric is checked every two minutes. The consecutive times (the third field under **Rule**) means the number of consecutive times that the metric meets the threshold when checked. An alert is only triggered when the actual time is equal to or is greater than the number of consecutive times set in the alert policy.
+1. 单击**添加规则**开始创建告警规则。该规则提供丰富的配置,如度量标准类型、检查周期、连续次数、度量阈值和告警级别之类的参数。 检测周期(**规则**下的第二个字段)表示对度量进行两次连续检查之间的时间间隔。 例如,`1分钟/周期`表示每1分钟检查一次指标。 连续次数(**规则**下的第三个字段)表示检查的指标满足阈值的连续次数。 只有当实际次数等于或大于告警策略中设置的连续次数时,才会触发告警。
- ![alerting_policy_node_level_alerting_rule](/images/docs/alerting/alerting_policy_node_level_alerting_rule.png)
+ ![alerting_policy_node_level_alerting_rule](/images/docs/alerting-zh/alerting_policy_node_level_alerting_rule.png)
-2. In this example, set those parameters to `memory utilization rate`, `1 minute/period`, `2 consecutive times`, `>` and `50%`, and `Major Alert` in turn. It means KubeSphere checks the memory utilization rate every minute, and a major alert is triggered if it is larger than 50% for 2 consecutive times.
+2. 在本示例中,将这些参数分别设置为`内存利用率`,`1分钟/周期`,`连续2次`,`>50%`和`重要告警`。这意味着KubeSphere会每分钟检查一次内存利用率,如果连续2次大于50%,则会触发此重要告警。
-3. Click **√** to save the rule when you finish and click **Next** to continue.
+3. 完成后,单击 **√** 保存规则,然后单击**下一步**继续。
 {{< notice note >}}
-You can create node-level alert policies for the following metrics:
+您可以为以下指标创建节点级别的告警策略:
-- CPU: `cpu utilization rate`, `cpu load average 1 minute`, `cpu load average 5 minutes`, `cpu load average 15 minutes`
-- Memory: `memory utilization rate`, `memory available`
-- Disk: `inode utilization rate`, `disk space available`, `local disk space utilization rate`, `disk write throughput`, `disk read throughput`, `disk read iops`, `disk write iops`
-- Network: `network data transmitting rate`, `network data receiving rate`
-- Pod: `pod abnormal ratio`, `pod utilization rate`
+- CPU:`CPU利用率`, `CPU 1分钟平均负载`, `CPU 5分钟平均负载`, `CPU 15分钟平均负载`
+- 内存: `内存利用率`, `可用内存`
+- 磁盘: `inode利用率`, `本地磁盘可用空间`, `本地磁盘空间利用率`, `本地磁盘写入吞吐`, `本地磁盘读吞吐`, `磁盘读iops`, `磁盘写iops`
+- 网络: `网络发送数据速率`, `网络接收数据速率`
+- 容器组: `容器组异常率`, `容器组利用率`
 {{</ notice >}}
-### Task 5: Set Notification Rule
+### 任务 5: 设置通知规则
-1. **Effective Notification Time Range** is used to set sending time of notification emails, such as `09:00 ~ 19:00`. **Notification Channel** currently only supports **Email**. You can add email addresses of members to be notified to **Notification List**.
+1. **有效通知时间范围**用于设置通知电子邮件的发送时间,例如09:00〜19:00。 **通知渠道**目前仅支持电子邮件。 您可以将要通知的成员电子邮件地址添加到**通知列表**。
-2. **Customize Repetition Rules** defines sending period and retransmission times of notification emails. If alerts have not been resolved, the notification will be sent repeatedly after a certain period of time. Different repetition rules can also be set for different levels of alerts. Since the alert level set in the previous step is `Major Alert`, select `Alert once every 5 miniutes` (sending period) in the second field for **Major Alert** and `Resend up to 3 times` in the third field (retransmission times). Refer to the following image to set notification rules:
+2. **自定义重复规则**定义了通知电子邮件的发送频率和重发次数。 如果尚未解决告警,则将在一段时间后重复发送通知。 还可以为不同级别的告警设置不同的重复规则。 由于在上一步中设置的警报级别为**重要告警**,因此在**重要告警**的第二个字段选择`每5分钟告警一次`(发送周期),并在第三个字段中选`最多重发3次`(重发次数)。 请参考下图设置通知规则:
- ![alerting_policy_node_level_notification_rule](/images/docs/alerting/alerting_policy_node_level_notification_rule.png)
+ ![alerting_policy_node_level_notification_rule](/images/docs/alerting-zh/alerting_policy_node_level_notification_rule.png)
-3. Click **Create**, and you can see that the alert policy is successfully created.
+3. 单击**创建**,您可以看到告警策略已成功创建。
 {{< notice note >}}
-*Waiting Time for Alerting* **=** *Check Period* **x** *Consecutive Times*. For example, if the check period is 1 minute/period, and the number of consecutive times is 2, you need to wait for 2 minutes before the alert message appears.
+*警报等待时间* **=** *检测周期* **x** *连续次数*。 例如,如果检测周期为1分钟/周期,并且连续次数为2,则需要等待2分钟,然后才会显示告警消息。
 {{</ notice >}}
-### Task 6: View Alert Policy
+### 任务 6: 查看告警策略
-After an alert policy is successfully created, you can enter its detail information page to view the status, alert rules, monitoring targets, notification rule, alert history, etc. Click **More** and select **Change Status** from the drop-down menu to enable or disable this alert policy.
-
-![alerting-policy-node-level-detail-page](/images/docs/alerting/alerting-policy-node-level-detail-page.png)
+成功创建告警策略后,您可以进入其详细信息页面查看状态:告警规则、监控目标、通知规则和告警历史记录等。单击**更多操作**,然后从下拉菜单中选择**更改状态**可以启用或禁用此告警策略。
+![alerting-policy-node-level-detail-page](/images/docs/alerting-zh/alerting-policy-node-level-detail-page.png)
diff --git a/static/images/docs/alerting-zh/alerting-policy-node-level-detail-page.png b/static/images/docs/alerting-zh/alerting-policy-node-level-detail-page.png
new file mode 100644
index 000000000..9a3559977
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting-policy-node-level-detail-page.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_alerting_rule.png b/static/images/docs/alerting-zh/alerting_policy_node_level_alerting_rule.png
new file mode 100644
index 000000000..ee9f7d92d
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_alerting_rule.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_basic_info.png b/static/images/docs/alerting-zh/alerting_policy_node_level_basic_info.png
new file mode 100644
index 000000000..7ab507e0f
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_basic_info.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_create.png b/static/images/docs/alerting-zh/alerting_policy_node_level_create.png
new file mode 100644
index 000000000..53c09edfc
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_create.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_guide.png b/static/images/docs/alerting-zh/alerting_policy_node_level_guide.png
new file mode 100644
index 000000000..b90bd02e3
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_guide.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_monitoring_target.png b/static/images/docs/alerting-zh/alerting_policy_node_level_monitoring_target.png
new file mode 100644
index 000000000..cebbd6bde
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_monitoring_target.png differ
diff --git a/static/images/docs/alerting-zh/alerting_policy_node_level_notification_rule.png b/static/images/docs/alerting-zh/alerting_policy_node_level_notification_rule.png
new file mode 100644
index 000000000..eb00158c7
Binary files /dev/null and b/static/images/docs/alerting-zh/alerting_policy_node_level_notification_rule.png differ