thaarok/go-opera-prometheus.md

## go-opera-prometheus.md

      
Prometheus installation

sudo apt install prometheus prometheus-node-exporter
sudo tee /etc/ufw/applications.d/prometheus <<EOF
[Prometheus]
title=Prometheus UI
description=Prometheus monitoring web UI.
ports=9090/tcp
EOF
sudo ufw allow from 85.195.116.121 to any app prometheus

In Grafana: Add data source -> Prometheus -> URL: http://server:9090/

Enable Prometheus metrics in Fantom Opera

Run Fantom Opera with the following parameters:
opera ... --pprof --metrics --metrics.expensive

You can omit --metrics.expensive if you are not interested in StateDB timers or per-method RPC stats.
To list the available metrics and measure how long one scrape takes (which tells you what scrape_timeout is needed), run:
time curl http://localhost:6060/debug/metrics/prometheus
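
The raw output of that endpoint can be long, so a small filter helps when browsing it. The following is a minimal sketch, assuming the endpoint returns the usual Prometheus text format (comment lines starting with #, then one "name value" sample per line, optionally with {labels}). The helper name metric_names is hypothetical and not part of Opera or Prometheus.

```shell
#!/usr/bin/env bash
# metric_names (hypothetical helper): read Prometheus text-format metrics
# on stdin and print the unique metric names, one per line.
metric_names() {
  # Drop comment lines (# HELP / # TYPE) and blank lines, cut off any
  # {label="..."} part, keep the first field, then de-duplicate.
  grep -v -e '^#' -e '^[[:space:]]*$' \
    | sed 's/{.*//' \
    | awk '{ print $1 }' \
    | sort -u
}

# Usage against a running node (endpoint taken from the text above):
#   curl -s http://localhost:6060/debug/metrics/prometheus | metric_names | grep '^txpool_'
```

Piping through grep, as in the usage comment, narrows the list to one family of metrics such as the txpool ones.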

To let Prometheus scrape Opera, append this job to the scrape_configs section of /etc/prometheus/prometheus.yml:
  - job_name: opera
    scrape_interval: 30s
    scrape_timeout: 30s
    metrics_path: '/debug/metrics/prometheus'
    static_configs:
      - targets: ['localhost:6060']

sudo systemctl restart prometheus

The job_name must be unique; otherwise Prometheus will not start!
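
A quick way to catch a repeated job_name before restarting is to scan the config for duplicates. This is a rough sketch that only pattern-matches "job_name:" lines rather than parsing YAML, so it assumes each job name sits on a single line; the helper name duplicate_jobs is hypothetical.

```shell
#!/usr/bin/env bash
# duplicate_jobs (hypothetical helper): print every job_name that occurs
# more than once in the given Prometheus config file.
duplicate_jobs() {
  # Select "job_name: value" lines (with or without a leading list dash),
  # strip the key, trailing spaces/comments and quotes, then let
  # "uniq -d" report the names that appear more than once.
  grep -E '^[[:space:]]*(-[[:space:]]*)?job_name:' "$1" \
    | sed -E 's/^[^:]*:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' \
    | tr -d "\"'" \
    | sort | uniq -d
}

# Usage:
#   duplicate_jobs /etc/prometheus/prometheus.yml
```

Empty output means no duplicates were found. If promtool is installed alongside Prometheus, "promtool check config /etc/prometheus/prometheus.yml" performs a full validation of the file and is the more thorough check.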

Google Cloud Monitoring using Ops Agent

# /etc/google-cloud-ops-agent/config.yaml
# apply changes: sudo service google-cloud-ops-agent restart
# from https://cloud.google.com/monitoring/agent/ops-agent/prometheus#oagent-config-json-exporter
logging:
  service:
    pipelines:
      default_pipeline:
        receivers: []
metrics:
  receivers:
    prometheus:
        type: prometheus
        config:
          scrape_configs:
            - job_name: 'sonic'
              scrape_interval: 30s
              scrape_timeout: 30s
              metrics_path: /debug/metrics/prometheus
              static_configs:
                - targets: ['localhost:6060']
  service:
    pipelines:
      prometheus_pipeline:
        receivers:
          - prometheus

Interesting metrics


p2p_peers - the number of Opera nodes this node is connected to
rpc_success - the number of successful RPC requests
rpc_failure - the number of failed RPC requests
rpc_duration_all - the time consumed by one RPC request (nanoseconds)
rpc_duration_${Operation}_success_count - the number of successful requests for the given RPC method
rpc_duration_${Operation}_failure_count - the number of failed requests for the given RPC method
txpool_slots - the current number of used memory slots (each pending or queued tx consumes one or more 32kB slots)
txpool_pending - the current number of pending txs (waiting to be included in the chain)
txpool_queued - the current number of queued txs (nonce out of order, waiting for a previous tx of the same account)
txpool_local - the current number of local txs (received from RPC, not from P2P; pending or queued)
txpool_reheap - the time consumed by the Reheap operation
txpool_valid - the total number of valid txs added (multiple txs of one account in one batch are counted as one tx) (???)
txpool_invalid - the total number of invalid txs discarded (invalid signature, underpriced, nonce too low, insufficient balance for value+gas*gasPrice)
txpool_underpriced - the total number of txs removed for being underpriced (when adding to the pool, or when making space for a more valuable one)
txpool_overflowed - the total number of remote txs discarded because no space could be made for them (although they were not underpriced)
txpool_queued_discard - txs discarded when inserting into the queue (a tx with the same sender+nonce already exists and the price bump is insufficient)
txpool_pending_discard - txs discarded when inserting into pending (a tx with the same sender+nonce already exists and the price bump is insufficient)
txpool_queued_replace - txs in the queue replaced using a price bump
txpool_pending_replace - txs in pending replaced using a price bump
txpool_queued_ratelimit - txs dropped from the queue because of rate limiting
txpool_pending_ratelimit - txs dropped from pending because of rate limiting
txpool_queued_nofunds - txs dropped from the queue because of insufficient sender balance
txpool_pending_nofunds - txs dropped from pending because of insufficient sender balance
txpool_queued_eviction - txs dropped from the queue because the account was inactive for too long (lifetime exceeded)


Metrics specific to go-opera-norma:

chain_txs_processed - the total number of txs in the chain (for on-chain txs/sec)
txpool_received - the total number of txs added to the txpool (excluding invalid and already included ones)

Some details can be found on the Ethereum blog.
Tip: when --pprof is enabled, you can also open http://localhost:6060/debug/pprof/ to browse the currently running goroutines or memory allocations.
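
As an illustration of how the metrics listed above can be combined, the sketch below derives an RPC failure ratio from rpc_success and rpc_failure. It assumes both are exposed as plain "name value" lines in the metrics output; the helper name rpc_failure_ratio is hypothetical.

```shell
#!/usr/bin/env bash
# rpc_failure_ratio (hypothetical helper): read Prometheus text-format
# metrics on stdin and print failed / (successful + failed) RPC requests.
rpc_failure_ratio() {
  awk '$1 == "rpc_success" { ok  = $2 }
       $1 == "rpc_failure" { bad = $2 }
       END {
         total = ok + bad
         # Avoid dividing by zero when no requests were recorded.
         if (total > 0) printf "%.4f\n", bad / total
         else print "n/a"
       }'
}

# Usage against a running node:
#   curl -s http://localhost:6060/debug/metrics/prometheus | rpc_failure_ratio
```

The same ratio can be graphed in Grafana from the Prometheus data source; the shell version is only meant for a quick check on the node itself.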

Importing events with metrics

Metrics are also available while importing events:
opera --datadir /var/opera/mainnet --pprof --metrics --metrics.expensive import events ./exported-events-file

To export the events first:
opera --datadir /var/opera/mainnet/ export events ./exported-events-file

Prometheus Retention

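For Prometheus installed using APT, the retention period can be configured through the ARGS variable in /etc/default/prometheus:
sudo nano /etc/default/prometheus

ARGS="--storage.tsdb.retention.time=365d"

Restart Prometheus afterwards to apply the change:
sudo systemctl restart prometheus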