Monitoring your validator with Grafana and Prometheus

Prometheus is a monitoring platform that collects metrics by scraping the HTTP metrics endpoints of monitored targets. Official documentation is available on the Prometheus website. Grafana is a dashboard used to visualize the collected data.

6.1 Installation

Install Prometheus and the Prometheus node exporter.

sudo apt-get install -y prometheus prometheus-node-exporter

Install Grafana.

wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
echo "deb https://packages.grafana.com/oss/deb stable main" > grafana.list
sudo mv grafana.list /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana

Enable services so they start automatically.

sudo systemctl enable grafana-server.service prometheus.service prometheus-node-exporter.service
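A quick way to confirm the units will start at boot is to ask systemd for each unit's state. The helper below is a minimal sketch: the function name check_enabled is made up for illustration, and it relies only on systemctl is-enabled.

```shell
# Print the boot-time state (enabled, disabled, ...) of each unit passed as an argument.
# check_enabled is a hypothetical helper name, not part of any package.
check_enabled() {
  for unit in "$@"; do
    # is-enabled prints the state and exits non-zero when the unit is not enabled.
    state="$(systemctl is-enabled "$unit" 2>/dev/null)" || state="${state:-unknown}"
    echo "$unit: $state"
  done
}

# Usage on the staking node:
#   check_enabled grafana-server.service prometheus.service prometheus-node-exporter.service
```

Each of the three services should report enabled; anything else suggests the enable command did not take effect for that unit.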

Create the prometheus.yml config file. Copy and paste only the block that matches your consensus client.

Lighthouse

cat > $HOME/prometheus.yml << EOF
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Scrape configurations: one job per metrics endpoint
# (node exporter, beacon node, validator client).
scrape_configs:
   - job_name: 'node_exporter'
     static_configs:
       - targets: ['localhost:9100']
   - job_name: 'nodes'
     metrics_path: /metrics    
     static_configs:
       - targets: ['localhost:5054']
   - job_name: 'validators'
     metrics_path: /metrics
     static_configs:
       - targets: ['localhost:5064']
EOF

Nimbus

cat > $HOME/prometheus.yml << EOF
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Scrape configurations: one job per metrics endpoint
# (node exporter and beacon node).
scrape_configs:
   - job_name: 'node_exporter'
     static_configs:
       - targets: ['localhost:9100']
   - job_name: 'nodes'
     metrics_path: /metrics    
     static_configs:
       - targets: ['localhost:8008']
EOF

Teku

cat > $HOME/prometheus.yml << EOF
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Scrape configurations: one job per metrics endpoint
# (node exporter and beacon node).
scrape_configs:
   - job_name: 'node_exporter'
     static_configs:
       - targets: ['localhost:9100']
   - job_name: 'nodes'
     metrics_path: /metrics    
     static_configs:
       - targets: ['localhost:8008']
EOF

Prysm

cat > $HOME/prometheus.yml << EOF
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Scrape configurations: one job per metrics endpoint
# (node exporter, validator, beacon node, slasher).
scrape_configs:
   - job_name: 'node_exporter'
     static_configs:
       - targets: ['localhost:9100']
   - job_name: 'validator'
     static_configs:
       - targets: ['localhost:8081']
   - job_name: 'beacon node'
     static_configs:
       - targets: ['localhost:8080']
   - job_name: 'slasher'
     static_configs:
       - targets: ['localhost:8082']
EOF

Lodestar

cat > $HOME/prometheus.yml << EOF
scrape_configs:
   - job_name: 'node_exporter'
     static_configs:
       - targets: ['localhost:9100']
   - job_name: 'Lodestar'
     metrics_path: /metrics    
     static_configs:
       - targets: ['localhost:8008']
EOF

Set up Prometheus for your execution client. Start by editing prometheus.yml.

nano $HOME/prometheus.yml

Append the applicable job snippet for your execution client to the end of prometheus.yml. Save the file.

Indentation matters. Each - job_name entry must line up with the job_name entries already in the file.

   - job_name: 'geth'
     scrape_interval: 15s
     scrape_timeout: 10s
     metrics_path: /debug/metrics/prometheus
     scheme: http
     static_configs:
       - targets: ['localhost:6060']

   - job_name: 'besu'
     scrape_interval: 15s
     scrape_timeout: 10s
     metrics_path: /metrics
     scheme: http
     static_configs:
       - targets: ['localhost:9545']

   - job_name: 'nethermind'
     static_configs:
       - targets: ['localhost:6060']

   - job_name: 'erigon'
     scrape_interval: 10s
     scrape_timeout: 3s
     metrics_path: /debug/metrics/prometheus
     scheme: http
     static_configs:
       - targets: ['localhost:6060']
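Before moving the file into place, it can help to confirm that the edited prometheus.yml still parses and lists the targets you expect. The sketch below assumes promtool is available (it usually ships alongside Prometheus, but that depends on your package); the helper names list_targets and check_config are hypothetical.

```shell
# Hypothetical helpers; the names list_targets and check_config are illustrative only.

# Print each distinct localhost:port scrape target found in a Prometheus config file.
# Only matches localhost targets, which is what this guide uses.
list_targets() {
  grep -oE "localhost:[0-9]+" "$1" | sort -u
}

# Run promtool's syntax check when promtool is installed; otherwise say so and move on.
check_config() {
  if command -v promtool >/dev/null 2>&1; then
    promtool check config "$1"
  else
    echo "promtool not found; skipping syntax check" >&2
  fi
}

# Usage:
#   list_targets "$HOME/prometheus.yml"
#   check_config "$HOME/prometheus.yml"
```

If a target you appended is missing from the list, or promtool reports an error, the likely cause is misaligned indentation in the appended job.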

Move it to /etc/prometheus/prometheus.yml

sudo mv $HOME/prometheus.yml /etc/prometheus/prometheus.yml

Update file permissions.

sudo chmod 644 /etc/prometheus/prometheus.yml

Finally, restart the services.

sudo systemctl restart grafana-server.service prometheus.service prometheus-node-exporter.service

Verify that the services are running properly:

sudo systemctl status grafana-server.service prometheus.service prometheus-node-exporter.service
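Beyond systemctl status, each service can also be probed over HTTP. The following is a minimal sketch that assumes the default ports used in this guide (9090 for Prometheus, 9100 for the node exporter, 3000 for Grafana) and the health endpoints /-/healthy and /api/health; the function names are hypothetical.

```shell
# Report whether an HTTP endpoint answers with a success status.
# check_endpoint and check_all are hypothetical helper names.
check_endpoint() {
  name="$1"
  url="$2"
  if curl --fail --silent --max-time 5 --output /dev/null "$url"; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Probe the three services installed above; returns non-zero if any probe fails.
check_all() {
  rc=0
  check_endpoint "prometheus" "http://localhost:9090/-/healthy" || rc=1
  check_endpoint "node_exporter" "http://localhost:9100/metrics" || rc=1
  check_endpoint "grafana" "http://localhost:3000/api/health" || rc=1
  return $rc
}

# Usage on the staking node:
#   check_all
```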

🔥 Grafana Security: SSH Tunnels

Do not expose Grafana (port 3000) to the public internet, as doing so opens a new attack surface. A more secure approach is to access Grafana through an SSH tunnel.

Example of how to create an SSH tunnel on Linux or macOS:

ssh -N -v <user>@<staking.node.ip.address> -L 3000:localhost:3000

Example of how to create an SSH tunnel on Windows with PuTTY:

Navigate to Connection > SSH > Tunnels > Enter Source Port 3000 > Enter Destination localhost:3000 > Click Add

Now you can access Grafana on your local machine by pointing a web browser to http://localhost:3000
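If you open the tunnel often, the ssh command can be wrapped in a small function. This is only a sketch: open_grafana_tunnel is a hypothetical name, and the ExitOnForwardFailure option is added here (it is not part of the command above) so that ssh exits right away if the local port is already taken.

```shell
# Open an SSH tunnel from a local port to Grafana on the staking node.
#   $1 = user@host of the staking node
#   $2 = local port to listen on (defaults to 3000)
# open_grafana_tunnel is a hypothetical helper name.
open_grafana_tunnel() {
  tunnel_target="$1"
  tunnel_port="${2:-3000}"
  ssh -N -o ExitOnForwardFailure=yes \
    -L "${tunnel_port}:localhost:3000" "$tunnel_target"
}

# Usage (runs in the foreground until interrupted with Ctrl+C):
#   open_grafana_tunnel <user>@<staking.node.ip.address>
```

With a non-default local port, for example 8300, Grafana would be reachable at http://localhost:8300 instead.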

📶 6.2 Setting up Grafana Dashboards

1. Open http://localhost:3000 or http://<your validator's ip address>:3000 in your web browser.
2. Log in with admin / admin.
3. Change the password.
4. Click the configuration gear icon, then Add data source.
5. Select Prometheus.
6. Set Name to "Prometheus".
7. Set URL to http://localhost:9090
8. Click Save & Test.
9. Download and save your consensus client's JSON file. More JSON dashboard options are available below. [ Lighthouse | Teku | Nimbus | Prysm | Prysm > 10 Validators | Lodestar ]
10. Download and save your execution client's JSON file. [ Geth | Besu | Nethermind | Erigon ]
11. Download and save a node-exporter dashboard for general system monitoring.
12. Click Create + icon > Import.
13. Add the consensus client dashboard via Upload JSON file.
14. If needed, select Prometheus as Data Source.
15. Click the Import button.
16. Repeat steps 12-15 for the execution client dashboard.
17. Repeat steps 12-15 for the node-exporter dashboard.
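As an alternative to adding the Prometheus data source by hand in the UI, Grafana can load it from a provisioning file at startup. The sketch below assumes a package install that reads /etc/grafana/provisioning/datasources; the function name write_datasource and the file name prometheus.yaml are hypothetical choices.

```shell
# Write a Grafana data source provisioning file into the given directory.
#   $1 = target directory (for a package install, typically
#        /etc/grafana/provisioning/datasources; treat this path as an assumption)
# write_datasource is a hypothetical helper name.
write_datasource() {
  dest_dir="$1"
  mkdir -p "$dest_dir"
  cat > "$dest_dir/prometheus.yaml" << 'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Usage: write to a scratch directory first, inspect the file, then copy it
# into place with sudo and restart grafana-server.
#   write_datasource "$HOME/grafana-provisioning"
```

The name and URL mirror the values entered manually in the steps above, so imported dashboards that expect a data source called Prometheus should resolve it either way.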

🔥 Troubleshooting common Grafana issues

Symptom: Your dashboard is missing some data.

Solution: Ensure that the execution or consensus client has enabled the appropriate metrics flag.

Geth: geth --http --metrics --pprof
Besu: besu --metrics-enabled=true
Nethermind: Nethermind.Runner --Metrics.Enabled true
Erigon: erigon --metrics
Lighthouse beacon-node: lighthouse bn --metrics --validator-monitor-auto
Nimbus: nimbus_beacon_node --metrics --metrics-port=8008
Teku: teku --metrics-enabled=true --metrics-port=8008
Lodestar beacon-node: lodestar beacon --metrics true
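To check whether a client is actually exporting metrics, you can fetch its metrics endpoint and count the samples it returns. The helper below is a rough sketch: count_samples is a hypothetical name, and it assumes the endpoint serves the usual Prometheus text format, where lines starting with # are comments.

```shell
# Count metric samples in Prometheus text-format output read from stdin.
# Comment lines (# HELP, # TYPE) and empty lines are ignored.
# count_samples is a hypothetical helper name.
count_samples() {
  grep -v '^#' | grep -c .
}

# Usage (port 5054 is the Lighthouse beacon node target from prometheus.yml;
# substitute the port and metrics path for your own client):
#   curl -s http://localhost:5054/metrics | count_samples
```

A count of zero, or a connection error from curl, points to a missing metrics flag or a port that does not match the target in prometheus.yml.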

Example of Grafana Dashboards for each consensus client.

Beacon Chain JSON Download link: https://raw.githubusercontent.com/sigp/lighthouse-metrics/master/dashboards/Summary.json

Validator Client JSON download link: https://raw.githubusercontent.com/sigp/lighthouse-metrics/master/dashboards/ValidatorClient.json

Credits: https://github.com/sigp/lighthouse-metrics/

JSON Download link: https://raw.githubusercontent.com/Yoldark34/lighthouse-staking-dashboard/main/Yoldark_ETH_staking_dashboard.json

Credits: https://github.com/Yoldark34/lighthouse-staking-dashboard

Example of Grafana Dashboards for each execution client.

Credits: https://gist.github.com/karalabe/e7ca79abdec54755ceae09c08bd090cd

Example of Node-Exporter Dashboard

General system monitoring

Includes: CPU, memory, disk IO, network, temperature and other monitoring metrics.

Credits: starsliao

⚠️ 6.3 Setup Alert Notifications

Set up alerts to get notified if your validators go offline or run into other problems. Choose between email, Telegram, Discord, or Slack.

Visit https://beaconcha.in/
Sign up for an account.
Verify your email
Search for your validator's public key.
Add validators to your watchlist by clicking the bookmark symbol.

On the menu of Grafana, select Notification channels under the bell icon.
Click on Add channel.
Give the notification channel a name.
Select Telegram from the Type list.
To complete the Telegram API settings, a Telegram channel and a bot are required. For instructions on setting up a bot with @BotFather, see the Telegram documentation. You need to create a bot API token.
Create a new telegram group.
Invite the bot to your new group.
Type at least 1 message into the group to initialize it.
Visit https://api.telegram.org/botXXX:YYY/getUpdates where XXX:YYY is your BOT API Token.
In the JSON response, find and copy the Chat ID. It appears between chat and title. Example of Chat ID: -123123123
```
"chat":{"id":-123123123,"title":
```
Paste the Chat ID into the corresponding field in Grafana.
Save and test the notification channel for your alerts.
Now you can create custom alerts from your dashboards. See the Grafana documentation to learn how to create alerts.
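The Chat ID lookup and a test message can also be done from the command line. This is a minimal sketch, assuming the getUpdates response is compact JSON in the form shown above and that curl is installed; the function names extract_chat_id and send_test_message are hypothetical, and the message text is a placeholder.

```shell
# Pull the chat ID(s) out of a getUpdates JSON response read from stdin.
# Assumes the compact form "chat":{"id":-123123123,... shown in this guide.
# extract_chat_id is a hypothetical helper name.
extract_chat_id() {
  grep -oE '"chat":\{"id":-?[0-9]+' | grep -oE -e '-?[0-9]+$' | sort -u
}

# Send a test message through the Telegram Bot API sendMessage method.
#   $1 = bot API token (XXX:YYY), $2 = chat ID
# send_test_message is a hypothetical helper name.
send_test_message() {
  curl --silent --fail "https://api.telegram.org/bot$1/sendMessage" \
    --data-urlencode "chat_id=$2" \
    --data-urlencode "text=Test alert from the staking node"
}

# Usage, where XXX:YYY is your bot API token:
#   curl -s "https://api.telegram.org/botXXX:YYY/getUpdates" | extract_chat_id
#   send_test_message "XXX:YYY" "-123123123"
```

If the test message arrives in the group, the same token and Chat ID should work in the Grafana notification channel.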


Last updated 1 year ago