This article was last updated 1240 days ago; the information in it may have evolved or changed since then.

[TOC]

Official website

https://prometheus.io/download/

Main program: Prometheus v2.34.0

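Standalone download link
One-line download-and-install command through a proxy mirror for users in mainland China (accelerated download; install path /usr/local/prometheus)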

yum -y install wget && wget http://hk.2331314.xyz:5550/https://github.com/prometheus/prometheus/releases/download/v2.34.0/prometheus-2.34.0.linux-amd64.tar.gz && tar -zxvf prometheus-2.34.0.linux-amd64.tar.gz && mv prometheus-2.34.0.linux-amd64 /usr/local/prometheus && rm -rf prometheus-2.34.0.linux-amd64.tar.gz

Basic configuration file

https://hulining.gitbook.io/prometheus/prometheus/configuration/configuration

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["127.0.0.1:9090"]

  - job_name: 'Server Node'
    file_sd_configs:
      - files:
        - node_targets.yml

  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['127.0.0.1:9091','www.baidu.com:9091']
        labels:
          instance: pushgateway

  - job_name: 'alertmanager'
    static_configs:
      - targets: ['127.0.0.1:9093']

Configuration notes

Prometheus server IP address and port

  - job_name: "prometheus"
    static_configs:
      # Address of Prometheus itself; for a local setup, use the port it listens on
      - targets: ["127.0.0.1:9090"]

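Discovering new nodes from a file

  - job_name: 'Server Node'
    # File-based service discovery: new nodes are picked up without restarting the process
    file_sd_configs:
      - files:
        # node_targets.yml must be created in /usr/local/prometheus, i.e. the Prometheus program directory
        - node_targets.yml

Creating node_targets.yml

Create a file named node_targets.yml in the Prometheus program directory and list the hosts to monitor in it. The content below is written as JSON, which is also valid YAML, so it parses correctly under the .yml extension.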

[
    {
        "targets": [ "127.0.0.1:9182" ],
        "labels": {
            "instance": "instance-name"
        }
    },
    {
        "targets": [ "127.0.0.1:9182" ],
        "labels": {}
    },
    {
        "targets": [ "127.0.0.1:9100" ],
        "labels": {}
    },
    {
        "targets": [ "127.0.0.1:9100" ],
        "labels": {}
    }
]
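Because the targets file is re-read whenever it changes, it can be generated by a script instead of edited by hand. The following is a minimal Python sketch under stated assumptions: the helper names `build_target_groups` and `write_targets_file` and the host list are hypothetical, not part of Prometheus. The file is written to a temporary name and then renamed, so that a half-written file is never picked up.

```python
import json
from pathlib import Path


def build_target_groups(hosts):
    """Turn (address, labels) pairs into file_sd target groups."""
    groups = []
    for address, labels in hosts:
        groups.append({"targets": [address], "labels": dict(labels)})
    return groups


def write_targets_file(path, groups):
    """Write the groups as JSON (which is also valid YAML).

    The data goes to a temporary file first and is then renamed over the
    real file, so a reader never sees a partially written file.
    """
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(groups, indent=4) + "\n", encoding="utf-8")
    tmp.replace(path)


# Hypothetical hosts; replace them with real exporter addresses.
example_hosts = [
    ("127.0.0.1:9182", {"instance": "instance-name"}),
    ("127.0.0.1:9100", {}),
]
print(json.dumps(build_target_groups(example_hosts), indent=4))
```

Calling `write_targets_file("/usr/local/prometheus/node_targets.yml", build_target_groups(example_hosts))` would then replace the file in one step.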

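Pushgateway IP address and port

  - job_name: 'pushgateway'
    # Keep the job and instance labels carried by the pushed metrics instead of replacing them with the Pushgateway's own
    honor_labels: true
    static_configs:
      # Static discovery: changes take effect only after Prometheus is restarted or its configuration is reloaded; file-based discovery can be used instead
      - targets: ['127.0.0.1:9091','www.baidu.com:9091']
        labels:
          # Name shared by all Pushgateway targets
          instance: pushgateway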

Alertmanager IP address and port

  - job_name: 'alertmanager'
    static_configs:
      - targets: ['127.0.0.1:9093']

Starting Prometheus automatically on boot

# Write the unit file (overwrites any existing copy)
cat > /usr/lib/systemd/system/prometheus.service << EOF
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --web.enable-lifecycle --web.enable-admin-api --storage.tsdb.path=/usr/local/prometheus/data --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=100GB
[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart prometheus.service && systemctl enable prometheus.service

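Related command-line flags

Flag	Meaning
--version	Show the application version
--config.file="prometheus.yml"	Path to the Prometheus configuration file
--web.listen-address="0.0.0.0:9090"	Address to listen on
--web.enable-lifecycle	Enable shutdown and reload via HTTP request; this allows hot reloading, and a POST to localhost:9090/-/reload takes effect immediately
--web.enable-admin-api	Enable the admin HTTP API endpoints, which include operations such as deleting time series
--web.page-title="Prometheus Time Series Collection and Processing Server"	Title of the Prometheus web page
--storage.tsdb.path="/usr/local/prometheus/data/"	Directory where data is stored
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME	How long to retain data; the default is 15d (days). Units: y, w, d, h, m, s, ms
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE	Maximum number of bytes of storage blocks to retain; a unit is required. Supported: B, KB, MB, GB, TB, PB, EB
--storage.tsdb.no-lockfile	Do not create a lock file in the data directory
--storage.tsdb.wal-compression	Compress the TSDB WAL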

Deleting metrics

Deleting the metrics of one job

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={job="JOB_name"}'

Deleting the metrics of one instance

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="INSTANCE_name"}'

Deleting all data (dangerous operation)

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~".+"}'

Deleting all data under a job (dangerous operation)

curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={__name__=~".+",job="targets_list"}'

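Note, however, that the API calls above do not delete the data immediately: the data still exists on disk and is cleaned up later.

To control when old data is deleted, configure the --storage.tsdb.retention.time flag (by default, Prometheus keeps data for 15 days).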
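When the delete calls are scripted, the selector has to be URL-encoded, and it helps to know that repeated `match[]` parameters are combined as a union: a series is deleted if it matches any one of them. The sketch below only builds the URL and sends nothing; the helper name `delete_series_url` is hypothetical, and the base URL assumes the local instance used throughout this article.

```python
from urllib.parse import urlencode


def delete_series_url(base_url, selectors):
    """Build a delete_series URL for one or more series selectors.

    Every selector becomes its own match[] parameter. Prometheus deletes
    the series matched by ANY of them, so each selector should already be
    as narrow as intended.
    """
    if not selectors:
        raise ValueError("at least one selector is required")
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/api/v1/admin/tsdb/delete_series?{query}"


# Restrict the deletion to a single job by putting the matcher in one selector.
print(delete_series_url("http://localhost:9090", ['{job="targets_list"}']))
```

The resulting URL can be passed to `curl -X POST` without the `-g` flag, because the braces and brackets are already percent-encoded.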

Query statements

Querying all metrics

{job=~".+"}
{job=~".*",method="get"}

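Metric data over the past 5 minutes

http_requests_total{job="prometheus"}[5m]

Sum of the values as they were 5 minutes ago (offset shifts the evaluation time back; it does not select a range)

sum(http_requests_total{method="GET"} offset 5m)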
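The difference between a range selector such as `[5m]` and the `offset 5m` modifier can be illustrated with a toy model. This is a simplified sketch, not how Prometheus is implemented: samples are plain `(timestamp_seconds, value)` pairs, staleness handling is ignored, and the function names are hypothetical.

```python
def range_vector(samples, now, window):
    """All samples inside the window that ends at `now` (like metric[5m])."""
    return [(t, v) for t, v in samples if now - window <= t <= now]


def instant_with_offset(samples, now, offset):
    """The most recent sample at or before `now - offset` (like metric offset 5m)."""
    older = [(t, v) for t, v in samples if t <= now - offset]
    return max(older, default=None)


# Hypothetical samples: (timestamp in seconds, value)
samples = [(0, 10), (60, 12), (120, 15), (300, 20), (360, 25)]

print(range_vector(samples, now=360, window=300))         # several samples
print(instant_with_offset(samples, now=360, offset=300))  # a single sample
```

The range selector returns every sample in the window, while the offset query returns one value per series, taken from an earlier point in time.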

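Operators

+ addition
- subtraction
* multiplication
/ division
% modulo
^ exponentiation (power)
== equal
!= not equal
>  greater than
<  less than
>= greater than or equal
<= less than or equal
and intersection
or  union
unless complement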
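The three set operators match series by their label sets. A rough way to picture them, assuming each instant vector is modeled as a Python dict keyed by a tuple of labels (a simplification; the function names and the sample data are hypothetical):

```python
def vec_and(left, right):
    """and: keep left-hand series whose label set also appears on the right."""
    return {k: v for k, v in left.items() if k in right}


def vec_or(left, right):
    """or: all left-hand series, plus right-hand series with unseen label sets."""
    merged = dict(left)
    for labels, value in right.items():
        merged.setdefault(labels, value)
    return merged


def vec_unless(left, right):
    """unless: keep left-hand series whose label set does NOT appear on the right."""
    return {k: v for k, v in left.items() if k not in right}


# Hypothetical vectors keyed by (label, value) tuples
up = {(("instance", "a"),): 1, (("instance", "b"),): 0}
busy = {(("instance", "b"),): 97, (("instance", "c"),): 42}

print(vec_and(up, busy))
print(vec_or(up, busy))
print(vec_unless(up, busy))
```

In all three cases the values come from the side that contributed the series; only the label sets decide which series survive.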

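sum (sum over the given dimensions)
max (maximum over the given dimensions)
min (minimum over the given dimensions)
avg (average over the given dimensions)
stddev (standard deviation over the given dimensions)
stdvar (variance over the given dimensions)
count (number of elements in the vector)
count_values (number of elements that share the same value)
bottomk (the k smallest sample values)
topk (the k largest sample values)
quantile (φ-quantile (0 ≤ φ ≤ 1) over the given dimensions)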

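by keeps only the listed labels
Example: sum(ceph_osd_up{job=~".+"})by(instance)
Explanation: only the instance label of ceph_osd_up is kept; all other labels are dropped.

without removes the listed labels from the result vector and keeps all other labels. by does the opposite: it drops every label not listed in the by clause, even if its value is identical across all elements of the vector.

Example: avg(ceph_health_error_info) without (error_info) > 0
Explanation: in this query ceph_health_error_info carries the labels job, instance, error_info and cluster_name; without drops the error_info label and keeps the other three.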
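The grouping behavior of by and without can be reproduced in a few lines. This is an illustrative model only, assuming each series is a `(labels, value)` pair; it uses sum rather than avg to keep the arithmetic obvious, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_sum(series, by=None, without=None):
    """Sum series values, grouping by the labels that remain.

    by=[...]      keeps only the listed labels.
    without=[...] drops the listed labels and keeps everything else.
    """
    if (by is None) == (without is None):
        raise ValueError("pass exactly one of by= or without=")
    groups = defaultdict(float)
    for labels, value in series:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            kept = {k: v for k, v in labels.items() if k not in without}
        # A sorted tuple of label pairs serves as a hashable group key.
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)


# Hypothetical series
series = [
    ({"job": "ceph", "instance": "a", "error_info": "x"}, 1.0),
    ({"job": "ceph", "instance": "a", "error_info": "y"}, 2.0),
    ({"job": "ceph", "instance": "b", "error_info": "x"}, 4.0),
]

print(aggregate_sum(series, by=["instance"]))
print(aggregate_sum(series, without=["error_info"]))
```

Both calls produce the same groups here, but the result of by carries only the instance label, while the result of without still carries job as well.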

Common functions

https://hulining.gitbook.io/prometheus/prometheus/querying/functions

Using the API

# Reload the configuration file
curl -X POST http://localhost:9090/-/reload
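The same reload can be triggered from a script. The following sketch uses only the Python standard library; the function names are hypothetical, the default base URL assumes the local instance from this article, and the endpoint only responds when Prometheus was started with --web.enable-lifecycle.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = f"{base_url.rstrip('/')}/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090", timeout=10):
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error responses (for example
    when the new configuration is rejected) and urllib.error.URLError when
    the server cannot be reached.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling `reload_config()` after changing prometheus.yml is then equivalent to the curl command above.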
