Prometheus Deployment

Using Docker

Prometheus

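All Prometheus services are available as Docker images on Quay.io or Docker Hub.

Running Prometheus on Docker is as simple as docker run -p 9090:9090 prom/prometheus. This starts Prometheus with a sample configuration and exposes it on port 9090.

Prometheus stores its data in the /prometheus directory inside the container, so the data is lost whenever the container is removed and recreated. To keep it, set up persistent storage (a volume or a bind mount) for the container.

Run a Prometheus container with persistent storage: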

docker run -d \
    --name prometheus \
    -p 9090:9090 \
    -v /path/to/config:/etc/prometheus \
    -v /path/to/data:/prometheus \
    prom/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --web.enable-lifecycle

tip: /path/to/config is the configuration directory and /path/to/data is the data directory on the host.
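
Once the container is up, it can help to confirm that Prometheus is actually serving before moving on. The following is a minimal sketch, assuming Prometheus is reachable at the base URL you pass in; it polls the /-/ready management endpoint. The function name and the injectable opener parameter are illustrative choices, not part of any Prometheus tooling.

```python
import urllib.error
import urllib.request


def prometheus_ready(base_url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/-/ready answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/ready"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, prometheus_ready("http://localhost:9090") should return True shortly after the docker run command above succeeds.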

Configuration file reference: prometheus.yml

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    file_sd_configs:
      - files:
        - 'targets/*.json'  # or *.yaml
        refresh_interval: 1m # how often to re-read the files

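The prometheus job is the default scrape target. Ready-made exporters already exist for collecting most other kinds of data; here we use node exporter, which collects basic host metrics, as a test.

We add a job named node_exporter that discovers its targets dynamically from every JSON file in the targets directory under the Prometheus configuration directory.

To add a host later, just append its entry to a JSON file in that directory and Prometheus starts scraping it automatically. A target file looks like this: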

[
  {
    "targets": ["172.17.0.180:9100"],
    "labels": {
      "env": "prod",
      "service": "docker"
    }
  },
  {
    "targets": ["172.17.0.172:9100"],
    "labels": {
      "env": "dev"
    }
  }
]
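
Adding hosts by hand works, but a small helper keeps the file valid as it grows. The sketch below is one possible approach, not something Prometheus ships: add_target is a hypothetical function name, and the file path is whatever JSON file you keep under the targets directory. It writes through a temporary file and renames it, so Prometheus should never read a half-written file.

```python
import json
import os
import tempfile


def add_target(path, target, labels):
    """Append one target group to a file_sd JSON file, writing atomically."""
    groups = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            groups = json.load(f)

    # Skip hosts that are already listed so repeated runs are idempotent.
    if any(target in g.get("targets", []) for g in groups):
        return groups

    groups.append({"targets": [target], "labels": labels})

    # Write to a temp file in the same directory, then rename it over the
    # original. The ".tmp" suffix keeps it out of the 'targets/*.json' glob.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(groups, f, indent=2)
    # mkstemp creates the file as 0600; make it readable by the container user.
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)
    return groups
```

Calling add_target("/path/to/config/targets/nodes.json", "172.17.0.172:9100", {"env": "dev"}) would append the second entry shown above; nodes.json is an assumed file name.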

Alertmanager

Alertmanager supports multiple receivers, such as email and webhook, for sending notifications when alerts fire.
Start an alertmanager container to try it out.

docker run -d \
    --name alertmanager \
    -p 9093:9093 \
    -v /path/to/alertmanager:/etc/alertmanager \
    quay.io/prometheus/alertmanager

Alertmanager is now reachable at http://localhost:9093/.

tip: /path/to/alertmanager is the host directory that persists the configuration.
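
To exercise the notification path before any real rule fires, an alert can be pushed to Alertmanager by hand through its /api/v2/alerts endpoint. The following is a rough sketch under that assumption; the function names, the "Test alert" annotations, and the five-minute lifetime are all illustrative choices.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname, instance, severity="critical", minutes=5):
    """Return the list of alert objects that POST /api/v2/alerts expects."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "instance": instance, "severity": severity},
        "annotations": {"summary": "Test alert", "description": "Sent by hand to test notifications."},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def alert_request(base_url, alerts):
    """Build (but do not send) the POST request for Alertmanager."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(alert_request("http://localhost:9093", build_test_alert("ManualTest", "172.17.0.180:9100"))); ManualTest is a made-up alert name.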

Configuration file reference: alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_from: 'xxx@163.com' # sender address
  smtp_smarthost: 'smtp.163.com:465' # SMTP server host and port (for QQ Mail, smtp.qq.com with port 465 or 587)
  smtp_auth_username: 'xxx@163.com' # username
  smtp_auth_password: <secret> # authorization code
  smtp_require_tls: false
  smtp_hello: '163.com'

templates:
  - '/etc/alertmanager/templates/*.tmpl'

route:
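  group_by: ['alertname'] # labels used to group alerts
  group_wait: 5s # how long to wait before sending the first notification for a new group, so alerts arriving within 5 seconds are batched together
  group_interval: 5m # how long to wait before notifying about new alerts added to a group that has already been notified
  repeat_interval: 5m # how long to wait (s/m/h) before re-sending a notification for alerts that are still firing
  receiver: 'email' # default receiver
  routes: # child routes
  - receiver: email
    match_re:
      severity: email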

receivers:
- name: 'email'
  email_configs:
  - to: 'xxx@qq.com' # separate multiple recipients with ','
    send_resolved: true
    html: '{{ template "email.html" . }}'   # render the body with the custom template
- name: 'wechat'
  wechat_configs:
  - corp_id: 'xxxxxxxxxxxxx' # corp ID
    api_url: 'https://qyapi.weixin.qq.com/cgi-bin/' # WeCom (WeChat Work) API endpoint
    to_party: '2'  # ID of the party (department) to notify
    agent_id: '1000002' # agent_id of the newly created application
    api_secret: 'xxxxxxxxxxxxxx' # generated secret
    send_resolved: true
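
The route section is easier to reason about with a small model of how a receiver gets picked. The sketch below is a deliberate simplification, not Alertmanager's implementation: it assumes the first matching child route wins, that every child names its own receiver, and that match_re patterns must match the whole label value. The second child route, which sends critical alerts to wechat, is hypothetical and added only to make the example informative; the label is assumed to be spelled severity.

```python
import re


def pick_receiver(route, labels):
    """Return the receiver for an alert: first matching child route, else the parent's."""
    for child in route.get("routes", []):
        matchers = child.get("match_re", {})
        # Every listed label must fully match its regex; a missing label
        # is compared as the empty string.
        if all(re.fullmatch(pattern, labels.get(name, "")) for name, pattern in matchers.items()):
            return pick_receiver(child, labels)
    return route["receiver"]


ROUTE = {
    "receiver": "email",  # default receiver
    "routes": [
        {"receiver": "email", "match_re": {"severity": "email"}},
        # Hypothetical extra child route, not in the configuration above.
        {"receiver": "wechat", "match_re": {"severity": "critical|page"}},
    ],
}
```

With this tree, an alert labelled severity=critical goes to wechat, while an alert with any other severity, or none at all, falls back to the default email receiver.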

The text format of the messages each receiver delivers can be defined with templates. Create templates/email.tmpl in the Alertmanager configuration directory:

{{ define "email.html" }}
{{ range $i, $alert := .Alerts }}
========Monitoring Alert==========<br>
Status: {{ $alert.Status }}<br>
Severity: {{ $alert.Labels.severity }}<br>
Alert name: {{ $alert.Labels.alertname }}<br>
Summary: {{ $alert.Annotations.summary }}<br>
Instance: {{ $alert.Labels.instance }}<br>
Description: {{ $alert.Annotations.description }}<br>
Trigger value: {{ $alert.Annotations.value }}<br>
Started at: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}<br>
========end=============<br>
{{ end }}
{{ end }}

That completes the Alertmanager configuration. Now go back to Prometheus and update its configuration.

Add the rule files and the Alertmanager target to prometheus.yml:

...
rule_files:
  - 'rules/*.yml'

alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 172.17.0.180:9093
...

Add a rule file, rules/disk.yml, under the Prometheus configuration directory:

groups:
- name: disk_alerts
  rules:
  - alert: HighDiskUsage
    expr: (node_filesystem_size_bytes{fstype!=""} - node_filesystem_free_bytes{fstype!=""}) / node_filesystem_size_bytes{fstype!=""} * 100 > 80
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "High Disk Usage detected"
      description: "Filesystem usage is above 80% on {{ $labels.instance }}."

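This rule monitors the usage of every filesystem on each node and fires an alert, delivered by email, when usage stays above 80% for 1 minute.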
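
The arithmetic in the rule expression can be checked by hand. The following is a small sketch that mirrors it in Python; the function names are illustrative, the byte values stand in for node_filesystem_size_bytes and node_filesystem_free_bytes, and the for: 1m hold period is not modelled.

```python
def disk_usage_percent(size_bytes, free_bytes):
    """Mirror the rule expression: (size - free) / size * 100."""
    return (size_bytes - free_bytes) / size_bytes * 100


def should_alert(size_bytes, free_bytes, threshold=80.0):
    """True when usage is strictly above the threshold, as in `> 80`."""
    return disk_usage_percent(size_bytes, free_bytes) > threshold


GIB = 1024 ** 3
# A 100 GiB filesystem with 15 GiB free is about 85% used, so it would alert.
nearly_full = should_alert(100 * GIB, 15 * GIB)
# The same filesystem with 50 GiB free is about 50% used, so it would not.
half_full = should_alert(100 * GIB, 50 * GIB)
```

Because the comparison is strict, a filesystem sitting at exactly the threshold does not trigger the alert.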

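tip: Prometheus does not pick up rule changes automatically by default. Trigger a reload manually with curl -X POST http://172.17.0.180:9090/-/reload (this endpoint is available because the container was started with --web.enable-lifecycle).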
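
When rule files are generated by a script, the reload can be triggered from the same script instead of curl. This is a minimal sketch, assuming the lifecycle endpoint is enabled as in the docker run command earlier; both function names are illustrative.

```python
import urllib.request


def reload_request(base_url):
    """Build the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url, timeout=5.0):
    """Send the reload request; returns True on HTTP 200."""
    with urllib.request.urlopen(reload_request(base_url), timeout=timeout) as resp:
        return resp.status == 200
```

For example, reload_prometheus("http://172.17.0.180:9090") is the equivalent of the curl command above; urlopen raises an exception if the endpoint is disabled or the new configuration is rejected.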

Grafana

To run the latest stable version of Grafana, run the following command:

docker run -d \
    -p 3000:3000 \
    --name=grafana \
    grafana/grafana

tip: No data persistence is configured here. In production, mount the relevant directories with -v.

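Once Grafana is installed and running, open http://localhost:3000 in a browser. Log in with the default username admin and password admin, then set new credentials.

To add a data source to Grafana, click the gear icon in the sidebar and select Data Sources.

On the Data Sources screen you can see that Grafana supports many data sources, such as Graphite and PostgreSQL. Select Prometheus to set it up.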

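In the HTTP section, enter the URL http://localhost:9090 and click Save and Test. When Grafana itself runs in a container, localhost refers to the Grafana container, so use an address it can reach instead, for example the host address used elsewhere in this guide (http://172.17.0.180:9090).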
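
The same data source can also be registered without the UI through Grafana's HTTP API. The sketch below only builds the request; it assumes basic authentication with an admin account and uses "Prometheus" as the data source name, both of which are choices made for this example.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, prometheus_url, user, password):
    """Build the POST /api/datasources request that registers Prometheus."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on the browser's behalf
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it. Use the credentials you set after the first login rather than the defaults.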
