Configure webhook in Prometheus

Configure a webhook endpoint in Prometheus so that Prometheus can communicate with Incident Response through that endpoint.

Before you begin

Ensure you have installed Prometheus. For information on how to install Prometheus, see https://prometheus.io/docs/prometheus/latest/installation/.

Role required: Responder, Manager, or Administrator

About this task

Configure Alertmanager to route alerts from Prometheus to Incident Response. Alerting rules in the Prometheus server send alerts to Alertmanager. Configure the webhook endpoint in Alertmanager so that Alertmanager can use it to communicate with Incident Response.

Note: While this integration with a third-party product is supported, the documentation here is based upon information provided by that third-party. More current information about the operation of that third-party’s system may be available from them directly.

Procedure

  1. Open the command prompt.
  2. Open the alertmanager.yml file.
    This file is included in the Alertmanager download.
  3. In the alertmanager.yml file, configure the receiver as a webhook.
  4. Use the webhook endpoint that you generated as the URL in the webhook configuration.
    Example:
    global:
      resolve_timeout: 5m

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: '<Lightstep prometheus webhook url>'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
  5. Open the Prometheus configuration file, prometheus.yml.
  6. In the file, specify the Alertmanager instance where the Prometheus server sends alerts and then save the file.
    Example:
    global:
      scrape_interval: 15s # By default, scrape targets every 15 seconds.
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'
    rule_files:
      - 'prometheus.rules.yml'
    # Alerting specifies settings related to the Alertmanager.
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                # Alertmanager's default port is xxxx
                - localhost:xxxx
    # Scrape configurations. The first job scrapes Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any time series scraped from this config.
      - job_name: 'prometheus'
        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:xxxx']
      - job_name: 'windows_lab'
        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
        static_configs:
          - targets: ['ip-xxx-xx-x-xxx.us-east-2.compute.internal:xxxx']
            labels:
              group: 'secopslab'
      - job_name: 'stack1'
        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
        metrics_path: '/actuator/prometheus'
        static_configs:
          - targets: ['stack1-web1.secops-eng.com:xxxx', 'stack1-app1.secops-eng.com:xxxx', 'stack1-app1.secops-eng.com:xxxx']
            labels:
              group: 'secopslab'
  7. Open the Prometheus rules file prometheus.rules.yml.
  8. In the file, create alerting rules to define alert conditions and then save the file.
    Example:
    groups:
      - name: AllInstances
        rules:
          - alert: cpuUsage
            # Condition for alerting
            expr: system_cpu_usage*100 > 1
            for: 1m
            # Annotations - additional informational labels to store more information
            annotations:
              title: 'Instance {{ $labels.instance }} has high CPU utilization'
              description: '{{ $labels.instance }} of job {{ $labels.job }} has CPU utilization greater than 1 percent for more than 1 minute.'
            # Labels - additional labels to be attached to the alert
            labels:
              severity: 'critical'
              metricName: 'CPU usage'
              resource: ''
              type: ''
      - name: cpu-node
        rules:
          # Recording rule: precompute the 5-minute average CPU rate per job, instance, and mode.
          - record: job_instance_mode:node_cpu_seconds:avg_rate5m
            expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
  9. Start the Prometheus and Alertmanager servers.
    Alerts are created in Incident Response when the alert conditions are met.
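
To check the webhook route without waiting for a rule to fire, it may help to push a test alert directly into Alertmanager. The following is a minimal sketch, not part of the product: it assumes Alertmanager exposes its v2 HTTP API at a base URL that you supply, and the function names `build_test_alert` and `post_test_alert`, as well as the `test-instance` label value, are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(alertname="cpuUsage", severity="critical", minutes=5):
    """Return a one-element list shaped like the body of POST /api/v2/alerts."""
    now = datetime.now(timezone.utc)
    return [
        {
            "labels": {
                "alertname": alertname,
                "severity": severity,
                "instance": "test-instance",  # hypothetical label value
            },
            "annotations": {
                "title": "Test alert for the webhook receiver",
            },
            # Alertmanager expects RFC 3339 timestamps.
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
        }
    ]


def post_test_alert(alertmanager_url, alerts):
    """POST the alerts to Alertmanager and return the HTTP status code.

    alertmanager_url is the base URL of your Alertmanager instance
    (an assumption: host and port depend on your installation).
    """
    request = urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `post_test_alert(<your Alertmanager base URL>, build_test_alert())` should hand the alert to the default route, which in the example alertmanager.yml is the `web.hook` receiver. Whether and how the test alert then appears in Incident Response depends on your webhook endpoint.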

What to do next

You can also view and verify the alerts in the Prometheus console.
Figure: Alerts in the Prometheus console.
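
The same check can be scripted against the Prometheus HTTP API, which lists active alerts at `/api/v1/alerts`. This is a minimal sketch under stated assumptions: the base URL of the Prometheus server is supplied by you, and the helper names `summarize_alerts` and `fetch_alerts` are hypothetical.

```python
import json
import urllib.request


def summarize_alerts(payload):
    """Return (alertname, state) pairs from a /api/v1/alerts response body."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus returned status %r" % payload.get("status"))
    alerts = payload.get("data", {}).get("alerts", [])
    return [
        (alert.get("labels", {}).get("alertname", ""), alert.get("state", ""))
        for alert in alerts
    ]


def fetch_alerts(prometheus_url):
    """GET the active alerts from the Prometheus server at prometheus_url.

    prometheus_url is the base URL of your Prometheus server
    (an assumption: host and port depend on your installation).
    """
    url = prometheus_url.rstrip("/") + "/api/v1/alerts"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

For example, `summarize_alerts(fetch_alerts(<your Prometheus base URL>))` returns one pair per active alert, such as the `cpuUsage` alert from the example rules file with a state of `pending` or `firing`.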