rpki/monitor/README.md

132 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Ours RP Prometheus / Grafana Monitor
本目录提供本地开发监控栈,用于采集 `rpki_artifact_metrics` 暴露的 ours RP soak 指标。
## 前置条件
1. Docker + Docker Compose v2
2. 宿主机已启动 `rpki_artifact_metrics`,并监听 Docker 网桥可访问的地址,例如 `0.0.0.0:9556`
3. Prometheus 容器通过 `host.docker.internal:9556` 访问宿主 sidecar。
Linux Docker 下 compose 已配置:
```yaml
extra_hosts:
- host.docker.internal:host-gateway
```
## 启动
```bash
cd rpki_2/rpki/monitor
docker compose up -d
```
默认镜像使用官方 Docker Hub 镜像:
```text
prom/prometheus:v2.55.1
grafana/grafana:11.3.1
```
如需切到其它镜像源:
```bash
PROMETHEUS_IMAGE=<mirror>/prom/prometheus:v2.55.1 \
GRAFANA_IMAGE=<mirror>/grafana/grafana:11.3.1 \
docker compose up -d
```
默认端口:
- Prometheus: <http://localhost:9090>
- Grafana: <http://localhost:3000>
- Grafana 默认账号密码:`admin` / `admin`
如端口冲突:
```bash
PROMETHEUS_PORT=19090 GRAFANA_PORT=13000 docker compose up -d
```
## 停止
```bash
cd rpki_2/rpki/monitor
docker compose down
```
保留数据 volume。若要清理数据
```bash
docker compose down -v
```
## 典型本地联调命令
先启动 APNIC soak 和 metrics sidecar例如
```bash
# soak .env 关键配置
MAX_RUNS=-1
RIRS=apnic
RETAIN_RUNS=5
INTERVAL_SECS=0
# metrics sidecar
rpki_artifact_metrics \
--run-root /path/to/portable-soak \
--listen 0.0.0.0:9556 \
--poll-secs 5 \
--instance local-apnic-continuous
```
再启动监控栈:
```bash
cd rpki_2/rpki/monitor
docker compose up -d
```
## 验证
Prometheus target
```bash
curl -s 'http://localhost:9090/api/v1/targets' | python3 -m json.tool
```
Prometheus query
```bash
curl -G 'http://localhost:9090/api/v1/query' \
--data-urlencode 'query=up{job="ours-rp-artifact-metrics"}'
curl -G 'http://localhost:9090/api/v1/query' \
--data-urlencode 'query=ours_rp_run_completed_total{status="success"}'
```
Grafana health
```bash
curl -s http://localhost:3000/api/health | python3 -m json.tool
```
Grafana dashboard
- 打开 <http://localhost:3000/d/ours-rp-soak-overview/ours-rp-soak-overview>
## 主要指标
- `ours_rp_metrics_service_up`
- `ours_rp_run_completed_total`
- `ours_rp_run_duration_seconds`
- `ours_rp_run_max_rss_bytes`
- `ours_rp_vrps`
- `ours_rp_vaps`
- `ours_rp_publication_points`
- `ours_rp_repo_sync_phase_count`
- `ours_rp_large_publication_points{object_count_gt="10|50|100|..."}`
- `ours_rp_cir_objects`
- `ours_rp_ccr_state_items`