Monitoring MikroTik with Grafana and Prometheus: A Complete Setup Guide
Network monitoring prevents costly downtime and performance issues. MikroTik devices power enterprise networks worldwide, but standard monitoring tools often lack the depth needed for proactive management.
Why Monitor Your MikroTik Infrastructure?
- Performance optimization: Identify bandwidth bottlenecks before they impact users
- Proactive issue detection: Spot hardware failures and configuration problems early
- Compliance monitoring: Track SLA metrics and generate automated reports
- Capacity planning: Make data-driven decisions about network upgrades
The Grafana + Prometheus Advantage for MikroTik
- Open-source solution: No licensing costs for monitoring infrastructure
- Near-real-time metrics: Scrape intervals are configurable per job (this guide uses 30 seconds)
- Enterprise scalability: Monitor hundreds of devices from a single platform
- Flexible alerting: Custom notifications via email, Slack, or webhook
What This Guide Covers
- Complete step-by-step setup for Ubuntu Server 22.04
- SNMP and API monitoring configuration
- Dashboard templates for common use cases
- Production security and performance optimization
- Troubleshooting guide for common problems
Table of Contents
- Prerequisites and Architecture Overview
- Preparing Your MikroTik Devices
- Installing Prometheus
- SNMP Exporter Setup
- Grafana Installation
- Creating MikroTik Dashboards
- Setting Up Alerting
- Advanced Monitoring Scenarios
- Troubleshooting Common Issues
- Production Deployment Best Practices
- Conclusion
Prerequisites and Architecture Overview
MikroTik Monitoring Architecture
The monitoring stack consists of four main components working together:
- MikroTik devices: RouterOS 6.x or 7.x with SNMP enabled
- Prometheus server: Collects and stores time-series metrics
- SNMP Exporter: Translates SNMP data to Prometheus format
- Grafana: Creates dashboards and manages alerts
System Requirements
- Operating system: Ubuntu Server 20.04 LTS or newer (22.04 recommended)
- Memory: 4GB RAM minimum, 8GB for production environments
- Storage: 50GB+ depending on retention period and device count
- Network: Direct connectivity to all monitored MikroTik devices
- CPU: 2+ cores recommended for multiple device monitoring
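The storage requirement can be estimated rather than guessed. The sketch below applies the sizing rule from the Prometheus documentation: retention time in seconds, times ingested samples per second, times bytes per sample (roughly 1 to 2 bytes). The function name, the device count, and the figure of 500 series per device are illustrative assumptions, not measurements from this guide.

```python
def estimate_disk_bytes(devices, series_per_device, scrape_interval_s,
                        retention_days, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds * samples per second * bytes per sample."""
    # Every series yields one sample per scrape.
    samples_per_second = devices * series_per_device / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * retention_seconds * bytes_per_sample


# Assumed example: 50 devices, 500 series each, 30 s scrapes, 30 days retention.
estimate = estimate_disk_bytes(50, 500, 30, 30)
print(f"{estimate / 1e9:.1f} GB")  # prints "4.3 GB"
```

Treat the result as a lower bound and leave headroom for the write-ahead log and compaction.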
MikroTik RouterOS Compatibility
- RouterOS 6.x: Full SNMP v2c support, limited API functionality
- RouterOS 7.x: Enhanced SNMP features and improved API performance
- Performance considerations: Lower-end devices may need adjusted scrape intervals
Preparing Your MikroTik Devices for Monitoring
Enable SNMP on MikroTik RouterOS
Connect to your MikroTik device via SSH or Winbox and run these commands:
# Enable SNMP service
/snmp set enabled=yes
# Create read-only community
/snmp community add name=monitoring-ro addresses=10.0.0.0/8
# Verify SNMP configuration
/snmp print
/snmp community print
SNMP Security Best Practices
- Use specific IP ranges: Restrict community access to monitoring server subnets
- Avoid default communities: Never use “public” or “private” in production
- Generate complex strings: Use 16+ character random community names
- Regular rotation: Change community strings quarterly for security
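A random community string of the recommended length can be generated with Python's standard `secrets` module. This is a minimal sketch; the function name and the default length of 24 characters are assumptions chosen for illustration.

```python
import secrets
import string


def generate_community(length=24):
    """Return a random alphanumeric community string of the given length."""
    if length < 16:
        raise ValueError("use at least 16 characters for a community string")
    # Letters and digits only, so the value needs no quoting on the RouterOS CLI.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(generate_community())
```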
Configure SNMPv3 for Enhanced Security
# In RouterOS, SNMPv3 credentials are set on the community entry
# (authentication = SHA1, privacy = AES); replace the example passwords
/snmp community add name=snmpv3-monitor security=private \
authentication-protocol=SHA1 authentication-password=AuthKey123456 \
encryption-protocol=AES encryption-password=PrivKey654321 \
addresses=10.1.1.100/32
# Enable SNMP
/snmp set enabled=yes
# Verify the result
/snmp community print
Firewall Configuration for SNMP
Create firewall rules to allow SNMP access from your monitoring server:
# Allow SNMP from monitoring server
/ip firewall filter add chain=input protocol=udp dst-port=161 \
src-address=10.1.1.100 action=accept comment="SNMP Monitoring"
# Block all other SNMP traffic
/ip firewall filter add chain=input protocol=udp dst-port=161 \
action=drop comment="Block SNMP"
Key MikroTik SNMP OIDs for Monitoring
- System uptime: 1.3.6.1.2.1.1.3.0
- CPU load per core (hrProcessorLoad): 1.3.6.1.2.1.25.3.3.1.2
- Processor temperature: 1.3.6.1.4.1.14988.1.1.3.11.0
- Memory usage: 1.3.6.1.2.1.25.2.3.1.6
- Interface statistics: 1.3.6.1.2.1.2.2.1
- Temperature: 1.3.6.1.4.1.14988.1.1.3.10.0
- Voltage: 1.3.6.1.4.1.14988.1.1.3.8.0
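The system uptime OID returns an SNMP TimeTicks value, which counts hundredths of a second rather than seconds. The short sketch below converts it to a readable duration; the function name is an assumption made for this example.

```python
from datetime import timedelta


def timeticks_to_timedelta(ticks):
    """Convert SNMP TimeTicks (hundredths of a second) to a timedelta."""
    return timedelta(seconds=ticks / 100)


# 8,640,000 ticks = 86,400 seconds = one day of uptime.
print(timeticks_to_timedelta(8_640_000))  # prints "1 day, 0:00:00"
```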
Test SNMP Connectivity
From your monitoring server, verify SNMP access:
# Install SNMP tools
sudo apt update
sudo apt install snmp snmp-mibs-downloader
# Test SNMP connectivity
snmpwalk -v2c -c monitoring-ro 192.168.1.1 1.3.6.1.2.1.1.1.0
# Test MikroTik-specific OIDs
snmpget -v2c -c monitoring-ro 192.168.1.1 1.3.6.1.4.1.14988.1.1.3.11.0
Installing and Configuring Prometheus
Create Prometheus User and Directories
# Update system packages
sudo apt update && sudo apt upgrade -y
# Create prometheus system user
sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus
# Create directory structure
sudo mkdir /var/lib/prometheus
sudo mkdir -p /etc/prometheus/{rules,rules.d,files_sd}
# Set permissions
sudo chown prometheus:prometheus /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus
Download and Install Prometheus
# Download Prometheus (v2.45.0 shown; check the releases page for the current version)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
# Extract and install
tar -xvf prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
# Copy binaries
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
# Set permissions
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
# Copy configuration files
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
Create Prometheus Configuration File
sudo nano /etc/prometheus/prometheus.yml
Add this configuration:
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  external_labels:
    monitor: 'mikrotik-monitor'
rule_files:
  - "rules/*.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'mikrotik-snmp'
    static_configs:
      - targets:
          - 192.168.1.1  # MikroTik device IP
          - 192.168.1.2  # Add more devices as needed
    metrics_path: /snmp
    params:
      module: [mikrotik]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9116  # SNMP exporter address
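The relabeling rules in the mikrotik-snmp job are easier to follow once you see the request they produce. Prometheus never contacts the router directly: the device address becomes the `target` query parameter, and the scrape goes to the SNMP exporter. The sketch below builds an equivalent URL; the function name is an assumption, and the parameter order Prometheus actually sends may differ.

```python
from urllib.parse import urlencode


def snmp_scrape_url(target, module="mikrotik", exporter="localhost:9116"):
    """Build the exporter URL that corresponds to one relabeled target."""
    # __param_target -> ?target=..., params.module -> ?module=...,
    # __address__ is rewritten to the exporter's own address.
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/snmp?{query}"


print(snmp_scrape_url("192.168.1.1"))
# prints "http://localhost:9116/snmp?module=mikrotik&target=192.168.1.1"
```

The `instance` label is set to the device address by the second rule, so dashboards show the router rather than the exporter.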
Create Prometheus System Service
sudo nano /etc/systemd/system/prometheus.service
Add this service configuration:
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/
After=network-online.target
Wants=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--storage.tsdb.retention.time=30d \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--log.level=info
[Install]
WantedBy=multi-user.target
Start and Enable Prometheus
# Reload systemd
sudo systemctl daemon-reload
# Start Prometheus
sudo systemctl start prometheus
# Enable auto-start
sudo systemctl enable prometheus
# Check status
sudo systemctl status prometheus
# Test web interface
curl http://localhost:9090/metrics
SNMP Exporter Setup and Configuration
Download and Install SNMP Exporter
# Download SNMP exporter
cd /tmp
wget https://github.com/prometheus/snmp_exporter/releases/download/v0.24.1/snmp_exporter-0.24.1.linux-amd64.tar.gz
# Extract files
tar -xvf snmp_exporter-0.24.1.linux-amd64.tar.gz
cd snmp_exporter-0.24.1.linux-amd64
# Install binary
sudo cp snmp_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/snmp_exporter
# Create configuration directory
sudo mkdir /etc/snmp_exporter
sudo chown prometheus:prometheus /etc/snmp_exporter
Download MikroTik SNMP Configuration
# Download the pre-built configuration that matches the installed release
# (it includes a mikrotik module)
sudo wget -O /etc/snmp_exporter/snmp.yml \
https://raw.githubusercontent.com/prometheus/snmp_exporter/v0.24.1/snmp.yml
# Set permissions
sudo chown prometheus:prometheus /etc/snmp_exporter/snmp.yml
Create Custom MikroTik SNMP Configuration
For advanced monitoring, create a custom configuration:
sudo nano /etc/snmp_exporter/snmp.yml
Add MikroTik-specific configuration:
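# snmp_exporter v0.23.0 and later expects credentials under "auths" and
# modules under "modules". The auth is named public_v2 because that is the
# exporter's default when a scrape passes no auth parameter.
auths:
  public_v2:
    version: 2
    community: monitoring-ro
modules:
  mikrotik:
    walk:
      - 1.3.6.1.2.1.1              # System information
      - 1.3.6.1.2.1.2.2.1          # Interface statistics
      - 1.3.6.1.4.1.14988.1.1.3    # MikroTik health (temperature, voltage)
      - 1.3.6.1.2.1.25.2.3.1       # Storage and memory usage
      - 1.3.6.1.2.1.25.3.3.1.2     # Per-core CPU load (hrProcessorLoad)
    metrics:
      - name: sysUpTime
        oid: 1.3.6.1.2.1.1.3
        type: gauge
        help: System uptime in hundredths of a second
      - name: mikrotikCpuUsage
        oid: 1.3.6.1.2.1.25.3.3.1.2
        type: gauge
        help: CPU load percentage per core (hrProcessorLoad)
        indexes:
          - labelname: hrDeviceIndex
            type: gauge
      - name: mikrotikTemperature
        oid: 1.3.6.1.4.1.14988.1.1.3.10
        type: gauge
        help: System temperature in Celsius
      - name: mikrotikVoltage
        oid: 1.3.6.1.4.1.14988.1.1.3.8
        type: gauge
        help: System voltage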
Create SNMP Exporter Service
sudo nano /etc/systemd/system/snmp_exporter.service
Add service configuration:
[Unit]
Description=SNMP Exporter
Documentation=https://github.com/prometheus/snmp_exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/snmp_exporter \
--config.file=/etc/snmp_exporter/snmp.yml \
--web.listen-address=0.0.0.0:9116 \
--log.level=info
[Install]
WantedBy=multi-user.target
Start SNMP Exporter Service
# Start and enable service
sudo systemctl daemon-reload
sudo systemctl start snmp_exporter
sudo systemctl enable snmp_exporter
# Verify operation
sudo systemctl status snmp_exporter
# Test SNMP exporter
curl "http://localhost:9116/snmp?target=192.168.1.1&module=mikrotik"
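The exporter answers in the Prometheus text exposition format, so a quick way to validate a module is to check that the metric names you expect are present in the response. The sketch below parses a saved response; the function names and the sample text are assumptions made for illustration, not real exporter output.

```python
def metric_names(exposition_text):
    """Collect the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # The name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text, expected):
    """Return the expected metric names that are absent, sorted."""
    return sorted(set(expected) - metric_names(exposition_text))


# Hypothetical sample response, shortened for the example.
sample = """# HELP sysUpTime System uptime in hundredths of a second
# TYPE sysUpTime gauge
sysUpTime 8.64e+06
ifOperStatus{ifIndex="1"} 1
"""
print(missing_metrics(sample, ["sysUpTime", "mikrotikCpuUsage"]))
```

Save the curl output to a file and pass its contents to `missing_metrics` to run the same check against a real device.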
Grafana Installation and Configuration
Install Grafana from Official Repository
# Install prerequisites
sudo apt install -y software-properties-common
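# Add Grafana GPG key (apt-key is deprecated on Ubuntu 22.04)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | \
sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
# Add Grafana repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | \
sudo tee /etc/apt/sources.list.d/grafana.list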
# Update package list
sudo apt update
# Install Grafana
sudo apt install grafana
# Start and enable Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Configure Grafana Security Settings
# Edit Grafana configuration
sudo nano /etc/grafana/grafana.ini
Update these settings for security:
[server]
http_port = 3000
domain = your-domain.com
root_url = https://your-domain.com/grafana
[security]
admin_user = admin
admin_password = YourSecurePassword123
secret_key = YourSecretKey456789
disable_gravatar = true
[auth.anonymous]
enabled = false
[users]
allow_sign_up = false
allow_org_create = false
[auth]
disable_login_form = false
disable_signout_menu = false
Configure Grafana Data Sources
- Open a browser and navigate to http://your-server:3000
- Log in with the admin credentials you configured
- Go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions)
- Click “Add data source”
- Select “Prometheus”
- Configure settings:
- URL: http://localhost:9090
- Access: Server (default)
- Scrape interval: 30s
- Query timeout: 60s
Test Prometheus Connection
Click “Save & Test” to verify the connection. You should see a “Data source is working” message.
Creating Comprehensive MikroTik Dashboards
Network Overview Dashboard
Create a high-level dashboard showing network health:
Key Panels to Include:
- Device Status Panel: Shows online/offline status for all devices
- Total Bandwidth Usage: Aggregate traffic across all interfaces
- Critical Alerts Summary: Current alerts requiring attention
- System Uptime: Device availability over time
Sample Queries:
# Device reachability (1 = last scrape succeeded, 0 = failed)
up{job="mikrotik-snmp"}
# Inbound interface throughput in bits per second
rate(ifHCInOctets[5m]) * 8
# CPU usage across devices
avg by (instance) (mikrotikCpuUsage)
Interface Performance Dashboard
Focus on network interface metrics and performance:
Essential Interface Metrics:
- Bandwidth utilization graphs: In/out traffic with historical trends
- Packet rate monitoring: PPS (packets per second) statistics
- Error rate analysis: Input/output errors and discards
- Interface status: Up/down status with change notifications
Interface Performance Queries:
# Interface input rate in bits per second
rate(ifHCInOctets[5m]) * 8
# Interface output rate in bits per second
rate(ifHCOutOctets[5m]) * 8
# Inbound utilization percentage (ifHighSpeed is reported in Mbit/s)
(rate(ifHCInOctets[5m]) * 8) / (ifHighSpeed * 1000000) * 100
# Interface errors per second (input plus output)
rate(ifInErrors[5m]) + rate(ifOutErrors[5m])
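The arithmetic behind the throughput and utilization queries is worth checking by hand, because the units differ: octet counters are in bytes, while ifHighSpeed is reported in Mbit/s. The sketch below reproduces the calculation for two counter readings; the function name and the sample numbers are assumptions, and counter wrap is not handled.

```python
def interface_utilization_percent(octets_start, octets_end, interval_s,
                                  if_high_speed_mbps):
    """Utilization of a link from two octet-counter readings."""
    if interval_s <= 0 or if_high_speed_mbps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Bytes transferred -> bits per second over the interval.
    bits_per_second = (octets_end - octets_start) * 8 / interval_s
    # ifHighSpeed is in Mbit/s, so scale it to bit/s before dividing.
    capacity_bps = if_high_speed_mbps * 1_000_000
    return bits_per_second / capacity_bps * 100


# Assumed readings: 3.75 GB transferred in 5 minutes on a 1 Gbit/s port.
pct = interface_utilization_percent(0, 3_750_000_000, 300, 1000)
print(f"{pct:.1f}%")  # prints "10.0%"
```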
System Resources Dashboard
Monitor hardware and system performance metrics:
System Health Panels:
- CPU utilization trends: Historical CPU usage with peak identification
- Memory usage monitoring: RAM utilization and available memory
- Temperature monitoring: Hardware temperature with critical thresholds
- Storage utilization: Disk space usage and growth trends
System Resource Queries:
# CPU usage percentage
mikrotikCpuUsage
# Memory utilization
(hrStorageUsed / hrStorageSize) * 100
# System temperature
mikrotikTemperature
# System voltage
mikrotikVoltage
# Free storage in bytes (hrStorage values are counted in allocation units)
(hrStorageSize - hrStorageUsed) * hrStorageAllocationUnits
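The hrStorage values from the HOST-RESOURCES-MIB are counted in allocation units, not bytes, so a percentage can be computed directly but an absolute size needs hrStorageAllocationUnits. A minimal sketch of both calculations follows; the function name and the sample values are assumptions.

```python
def storage_usage(size_units, used_units, allocation_unit_bytes):
    """Return (percent used, free bytes) for one hrStorage table row."""
    if size_units <= 0:
        raise ValueError("size_units must be positive")
    # The ratio is unit-free, so allocation units cancel out.
    percent_used = used_units / size_units * 100
    # Absolute sizes need the allocation unit to convert to bytes.
    free_bytes = (size_units - used_units) * allocation_unit_bytes
    return percent_used, free_bytes


# Assumed row: 256 MiB of memory in 1 KiB units, a quarter of it in use.
print(storage_usage(262144, 65536, 1024))  # prints "(25.0, 201326592)"
```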
Wireless Performance Dashboard
For wireless-enabled MikroTik devices:
Wireless-Specific Metrics:
- Client connection statistics: Connected clients and session duration
- Signal strength monitoring: RSSI values and signal quality
- Channel utilization: RF spectrum usage and interference
- Throughput analysis: Wireless bandwidth utilization per client
Dashboard Configuration Best Practices
- Use consistent time ranges: Set default time range to 1 hour with 6-hour and 24-hour options
- Implement proper thresholds: Red for critical, yellow for warning, green for normal
- Group related metrics: Organize panels logically by function or device type
- Add contextual information: Include device names, locations, and purposes
- Enable auto-refresh: Set 30-second refresh intervals for real-time monitoring
Variable Configuration for Dynamic Dashboards
Create template variables for flexible dashboard views:
- Device selection: Allow filtering by specific MikroTik devices
- Interface filtering: Show specific interfaces or interface types
- Time range variables: Quick selection of common time periods
- Location grouping: Filter devices by physical location
Setting Up Alerting and Notifications
Create Prometheus Alerting Rules
sudo nano /etc/prometheus/rules/mikrotik.yml
Add essential alerting rules:
groups:
  - name: mikrotik.rules
    rules:
      - alert: MikroTikDeviceDown
        expr: up{job="mikrotik-snmp"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "MikroTik device {{ $labels.instance }} is down"
          description: "Device has been unreachable for more than 2 minutes"
      - alert: HighCPUUsage
        expr: mikrotikCpuUsage > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% for more than 5 minutes"
      - alert: HighMemoryUsage
        expr: (hrStorageUsed / hrStorageSize * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}% for more than 5 minutes"
      - alert: InterfaceDown
        expr: ifOperStatus == 2
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Interface down on {{ $labels.instance }}"
          description: "Interface {{ $labels.ifDescr }} has been down for more than 1 minute"
      - alert: HighTemperature
        expr: mikrotikTemperature > 70
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "High temperature on {{ $labels.instance }}"
          description: "Device temperature is {{ $value }}°C for more than 3 minutes"
      - alert: BandwidthUtilizationHigh
        # ifHighSpeed is reported in Mbit/s, so convert it to bit/s first
        expr: (rate(ifHCInOctets[5m]) * 8 / (ifHighSpeed * 1000000) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High bandwidth utilization on {{ $labels.instance }}"
          description: "Interface {{ $labels.ifDescr }} utilization is {{ $value }}%"
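The `for:` clause in each rule delays firing until the expression has been true continuously for the stated duration; a single false evaluation resets the timer. The sketch below is a simplified model of that behavior, not Prometheus code: the function name is an assumption, and evaluations are assumed to happen at a fixed interval.

```python
def alert_states(condition_results, evaluation_interval_s, for_s):
    """Model inactive/pending/firing for a sequence of rule evaluations."""
    states = []
    active_since = None  # time at which the condition last became true
    for step, is_true in enumerate(condition_results):
        now = step * evaluation_interval_s
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # Fire once the condition has held for the full "for" duration.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Device down for five evaluations, 30 s apart, with "for: 2m".
print(alert_states([True] * 5, 30, 120))
```

With a 30-second evaluation interval, the MikroTikDeviceDown rule stays pending for four evaluations and fires on the fifth.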
Install and Configure AlertManager
# Download AlertManager
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
# Extract and install
tar -xvf alertmanager-0.26.0.linux-amd64.tar.gz
cd alertmanager-0.26.0.linux-amd64
# Copy binary
sudo cp alertmanager /usr/local/bin/
sudo cp amtool /usr/local/bin/
# Set permissions
sudo chown prometheus:prometheus /usr/local/bin/alertmanager
sudo chown prometheus:prometheus /usr/local/bin/amtool
# Create configuration directory
sudo mkdir /etc/alertmanager
sudo chown prometheus:prometheus /etc/alertmanager
Configure AlertManager
sudo nano /etc/alertmanager/alertmanager.yml
Add notification configuration:
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'monitoring@yourcompany.com'
  smtp_auth_username: 'monitoring@yourcompany.com'
  smtp_auth_password: 'your-app-password'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'warning-alerts'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
  # email_configs has no "subject" or "body" field: the subject is set
  # under "headers" and the plain-text body under "text"
  - name: 'critical-alerts'
    email_configs:
      - to: 'admin@yourcompany.com'
        headers:
          Subject: 'CRITICAL: MikroTik Alert - {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          Severity: {{ .Labels.severity }}
          {{ end }}
  - name: 'warning-alerts'
    email_configs:
      - to: 'network-team@yourcompany.com'
        headers:
          Subject: 'WARNING: MikroTik Alert - {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Instance: {{ .Labels.instance }}
          {{ end }}
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
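The routing tree in this configuration is small enough to reason about as a lookup: an alert goes to the first child route whose labels all match, and falls back to the top-level receiver otherwise. The sketch below models only that behavior; the function name and data layout are assumptions, and features such as `continue`, nested routes, and regex matchers are deliberately left out.

```python
def pick_receiver(alert_labels, routes, default_receiver):
    """Return the receiver of the first route whose match labels all agree."""
    for route in routes:
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    # No child route matched, so the top-level receiver handles the alert.
    return default_receiver


# Mirrors the two child routes and the default receiver defined above.
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "critical-alerts"},
    {"match": {"severity": "warning"}, "receiver": "warning-alerts"},
]

print(pick_receiver({"alertname": "HighTemperature", "severity": "critical"},
                    ROUTES, "web.hook"))  # prints "critical-alerts"
```

An alert without a severity label, or with a value such as `info`, ends up at the `web.hook` receiver, which is worth keeping in mind when adding new rules.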
Create AlertManager Service
sudo nano /etc/systemd/system/alertmanager.service
Add service configuration:
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml \
--storage.path=/var/lib/alertmanager/ \
--web.external-url=http://localhost:9093
[Install]
WantedBy=multi-user.target
Start AlertManager
# Create data directory
sudo mkdir /var/lib/alertmanager
sudo chown prometheus:prometheus /var/lib/alertmanager
# Start and enable service
sudo systemctl daemon-reload
sudo systemctl start alertmanager
sudo systemctl enable alertmanager
# Verify operation
sudo systemctl status alertmanager
Configure Slack Notifications
For Slack integration, add this receiver to AlertManager configuration:
  - name: 'slack-alerts'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#network-alerts'
        title: 'MikroTik Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}
          *Description:* {{ .Annotations.description }}
          *Instance:* {{ .Labels.instance }}
          *Severity:* {{ .Labels.severity }}
          {{ end }}
Advanced Monitoring Scenarios
Multi-Site MikroTik Monitoring
Configure monitoring across multiple locations with centralized visibility:
Federated Prometheus Setup
- Central Prometheus server: Aggregates metrics from remote sites
- Site-specific exporters: Deploy SNMP exporters at each location
- VPN connectivity: Secure tunnels for remote monitoring access
- Hierarchical dashboards: Site overview and drill-down capabilities
Federation Configuration
# Add to central Prometheus configuration
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="mikrotik-snmp"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
          - 'site1-prometheus:9090'
          - 'site2-prometheus:9090'
          - 'site3-prometheus:9090'
High-Availability Monitoring Setup
Ensure monitoring system resilience and eliminate single points of failure:
Prometheus High Availability
- Identical Prometheus instances: Run multiple servers with same configuration
- External storage: Use remote storage solutions for data persistence
- Load balancer: Distribute queries across Prometheus instances
- AlertManager clustering: Prevent duplicate alert notifications
Grafana High Availability
# Install and configure MySQL/PostgreSQL backend
sudo apt install mysql-server
# Update Grafana configuration for external database
[database]
type = mysql
host = localhost:3306
name = grafana
user = grafana
password = secure_password
# Enable clustering
[remote_cache]
type = redis
connstr = addr=localhost:6379
Custom Metrics and Exporters
Extend monitoring beyond standard SNMP metrics:
Custom Script Exporter
#!/bin/bash
# Custom MikroTik monitoring script
# /opt/monitoring/mikrotik_custom.sh
# Usage: mikrotik_custom.sh <device-ip> <community>
DEVICE_IP="$1"
COMMUNITY="$2"
# NOTE: verify the OIDs below against the MIKROTIK-MIB for your RouterOS version
# Check VPN tunnel status
VPN_STATUS=$(snmpget -v2c -c "$COMMUNITY" -Oqv "$DEVICE_IP" 1.3.6.1.4.1.14988.1.1.1.2.1.4.1)
# Check DHCP pool utilization
DHCP_USED=$(snmpget -v2c -c "$COMMUNITY" -Oqv "$DEVICE_IP" 1.3.6.1.4.1.14988.1.1.6.1.1.6.1)
DHCP_TOTAL=$(snmpget -v2c -c "$COMMUNITY" -Oqv "$DEVICE_IP" 1.3.6.1.4.1.14988.1.1.6.1.1.7.1)
# Output Prometheus metrics
echo "mikrotik_vpn_status{device=\"$DEVICE_IP\"} $VPN_STATUS"
echo "mikrotik_dhcp_used{device=\"$DEVICE_IP\"} $DHCP_USED"
echo "mikrotik_dhcp_total{device=\"$DEVICE_IP\"} $DHCP_TOTAL"
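The script prints samples in the Prometheus text exposition format, where label values must have backslashes, double quotes, and newlines escaped. Building the lines with a small helper avoids malformed output when a value contains such characters. This is a sketch only; the function name is an assumption and it covers just the `name{labels} value` form used above.

```python
def format_sample(name, labels, value):
    """Render one sample line in Prometheus text exposition format."""
    def escape(text):
        # Escape order matters: backslashes first, then quotes and newlines.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    label_text = ",".join(
        f'{key}="{escape(str(val))}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


print(format_sample("mikrotik_dhcp_used", {"device": "192.168.1.1"}, 42))
# prints: mikrotik_dhcp_used{device="192.168.1.1"} 42
```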
Node Exporter Integration
Monitor the monitoring server itself:
# Install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar -xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
# Create service file
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
# Start service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
API-Based Monitoring Alternative
Use MikroTik API for enhanced monitoring capabilities:
Python API Monitoring Script
#!/usr/bin/env python3
# MikroTik API monitoring script
import time
import librouteros
from prometheus_client import start_http_server, Gauge
DEVICE = '192.168.1.1'
# Prometheus metrics
cpu_usage = Gauge('mikrotik_cpu_usage', 'CPU Usage', ['device'])
memory_usage = Gauge('mikrotik_memory_usage', 'Memory Usage', ['device'])
interface_rx = Gauge('mikrotik_interface_rx_bytes', 'Interface RX bytes', ['device', 'interface'])
def collect_metrics():
    try:
        # Connect to MikroTik
        api = librouteros.connect(host=DEVICE,
                                  username='monitoring',
                                  password='monitoring_password')
        try:
            # Get system resources (the call may return a generator, so materialize it)
            resources = list(api('/system/resource/print'))[0]
            cpu_usage.labels(device=DEVICE).set(resources['cpu-load'])
            # Memory in use as a percentage of total memory
            used = resources['total-memory'] - resources['free-memory']
            memory_usage.labels(device=DEVICE).set(used / resources['total-memory'] * 100)
            # Get interface statistics
            for interface in api('/interface/print', stats=True):
                interface_rx.labels(
                    device=DEVICE,
                    interface=interface['name']
                ).set(interface['rx-byte'])
        finally:
            # Close the API session so connections do not pile up on the router
            api.close()
    except Exception as e:
        print(f"Error collecting metrics: {e}")
if __name__ == '__main__':
    start_http_server(8000)
    while True:
        collect_metrics()
        time.sleep(30)
Troubleshooting Common Issues
SNMP Connectivity Problems
Problem: SNMP timeout errors
- Check network connectivity: Use ping and traceroute to verify path
- Verify SNMP service: Confirm SNMP is enabled on MikroTik device
- Test community string: Use snmpwalk to validate credentials
- Check firewall rules: Ensure UDP port 161 is accessible
Diagnostic Commands:
# Test basic connectivity
ping 192.168.1.1
# Test SNMP connectivity
snmpget -v2c -c monitoring-ro 192.168.1.1 1.3.6.1.2.1.1.1.0
# Check SNMP exporter logs
sudo journalctl -u snmp_exporter -f
# Verify Prometheus targets
curl http://localhost:9090/api/v1/targets
Problem: Incorrect or missing metrics
- Verify OID support: Check if device supports specific MIB objects
- Update SNMP configuration: Ensure correct module configuration
- Check device firmware: Some OIDs require specific RouterOS versions
- Validate MIB files: Ensure proper MIB compilation and loading
Performance and Scaling Issues
Problem: High memory usage on Prometheus server
- Adjust retention period: Shorten retention (this guide sets 30 days; the Prometheus default is 15 days)
- Optimize scrape intervals: Increase intervals for less critical metrics
- Use recording rules: Pre-calculate common queries
- Implement metric filtering: Drop unnecessary metrics at ingestion
Memory Optimization Configuration:
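# Update Prometheus configuration (/etc/prometheus/prometheus.yml)
global:
  scrape_interval: 60s # Increased from 30s
  evaluation_interval: 60s
# Retention is set with command-line flags in the ExecStart line of
# /etc/systemd/system/prometheus.service, not in prometheus.yml
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=10GB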
Problem: Slow dashboard loading times
- Optimize Prometheus queries: Use efficient PromQL expressions
- Implement query caching: Enable Grafana query result caching
- Reduce data resolution: Use lower resolution for long time ranges
- Limit concurrent queries: Set appropriate query concurrency limits
Query Optimization Examples:
# Query evaluated in full on every dashboard refresh
sum(rate(ifHCInOctets[5m])) by (instance)
# Same result read from a pre-computed recording rule
mikrotik:interface_bandwidth_total
# Recording rule definition (goes under "rules:" in a group in /etc/prometheus/rules/)
- record: mikrotik:interface_bandwidth_total
  expr: sum(rate(ifHCInOctets[5m])) by (instance)
Dashboard and Visualization Problems
Problem: Missing or incorrect data in panels
- Verify data source connection: Test Prometheus connectivity in Grafana
- Check query syntax: Validate PromQL expressions in Prometheus UI
- Confirm metric existence: Search for metrics in Prometheus graph interface
- Review time ranges: Ensure appropriate time windows for data availability
Problem: Template variable issues
- Check variable queries: Ensure queries return expected values
- Validate variable usage: Confirm proper variable syntax in panels
- Review dependencies: Check variable dependency chains
- Clear browser cache: Resolve cached variable value issues
Alerting and Notification Issues
Problem: Alerts not firing
- Check alert rule syntax: Validate PromQL expressions in alert rules
- Verify evaluation intervals: Ensure rules are evaluated regularly
- Review alert conditions: Confirm thresholds and duration settings
- Check AlertManager connectivity: Test communication between components
Problem: Duplicate or missing notifications
- Review routing rules: Check AlertManager routing configuration
- Verify receiver configuration: Confirm notification channel setup
- Check grouping settings: Ensure appropriate alert grouping
- Review inhibition rules: Verify alert suppression logic
Production Deployment Best Practices
Security Hardening Checklist
Network Security
- Implement network segmentation: Isolate monitoring infrastructure
- Use VPN connections: Secure communication for remote sites
- Configure firewall rules: Restrict access to monitoring ports
- Enable SSL/TLS: Encrypt web interface communications
Authentication and Authorization
- Change default passwords: Use complex, unique passwords for all services
- Implement RBAC: Role-based access control for Grafana users
- Enable audit logging: Track user actions and system changes
- Regular access reviews: Quarterly review of user permissions
SNMP Security Measures
# SNMPv3 configuration for enhanced security
/snmp set enabled=yes
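/snmp community add name=snmpv3-user security=private \
authentication-protocol=SHA1 authentication-password=ComplexPassword123 \
encryption-protocol=AES encryption-password=AnotherComplexPassword456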
/snmp set engine-id=80:00:00:00:01:02:03:04:05
Performance Optimization Guidelines
Resource Allocation
- CPU allocation: 2-4 cores for monitoring up to 100 devices
- Memory requirements: 8GB+ RAM for production environments
- Storage planning: 1GB per device per month for standard metrics
- Network bandwidth: Factor in SNMP polling and dashboard access
Monitoring Configuration Optimization
# Optimized scrape intervals by priority
scrape_configs:
  - job_name: 'critical-devices'
    scrape_interval: 30s
    static_configs:
      - targets: ['core-router-1', 'core-router-2']
  - job_name: 'standard-devices'
    scrape_interval: 60s
    static_configs:
      - targets: ['access-switch-1', 'access-switch-2']
  - job_name: 'edge-devices'
    scrape_interval: 300s
    static_configs:
      - targets: ['remote-ap-1', 'remote-ap-2']
Backup and Recovery Procedures
Automated Backup Script
#!/bin/bash
# /opt/monitoring/backup.sh
BACKUP_DIR="/opt/backups/monitoring"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Backup Prometheus data (stopped briefly so the TSDB files are consistent)
sudo systemctl stop prometheus
tar -czf "$BACKUP_DIR/$DATE/prometheus_data.tar.gz" /var/lib/prometheus/
sudo systemctl start prometheus
# Backup configurations
tar -czf "$BACKUP_DIR/$DATE/configs.tar.gz" \
/etc/prometheus/ \
/etc/grafana/ \
/etc/alertmanager/
# Backup Grafana database (dashboards, users, data sources). This is the
# default SQLite path; it does not apply when an external database is used
sudo cp /var/lib/grafana/grafana.db "$BACKUP_DIR/$DATE/grafana.db"
# Clean old backups (keep last 30 days); -mindepth 1 protects $BACKUP_DIR itself
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
echo "Backup completed: $BACKUP_DIR/$DATE"
Recovery Testing
- Monthly recovery tests: Validate backup integrity and restore procedures
- Document procedures: Maintain detailed recovery runbooks
- Test different scenarios: Complete failure, partial corruption, configuration loss
- Measure recovery times: Track RTO (Recovery Time Objective) metrics
Maintenance and Operations
Regular Health Checks
#!/bin/bash
# /opt/monitoring/health_check.sh
# curl -f makes HTTP error responses count as failures
# Check Prometheus health
if ! curl -sf http://localhost:9090/-/healthy > /dev/null; then
    echo "ERROR: Prometheus unhealthy"
fi
# Check Grafana health
if ! curl -sf http://localhost:3000/api/health > /dev/null; then
    echo "ERROR: Grafana unhealthy"
fi
# Check SNMP exporter
if ! curl -sf http://localhost:9116/metrics > /dev/null; then
    echo "ERROR: SNMP exporter unhealthy"
fi
# Check disk space
DISK_USAGE=$(df /var/lib/prometheus | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
    echo "WARNING: High disk usage: ${DISK_USAGE}%"
fi
echo "Health check completed"
Update Procedures
- Test updates in staging: Validate new versions before production deployment
- Schedule maintenance windows: Plan updates during low-traffic periods
- Create rollback plans: Document procedures for reverting changes
- Monitor post-update: Verify functionality after applying updates
Conclusion
Key Benefits Achieved
This comprehensive monitoring setup provides enterprise-grade visibility into your MikroTik infrastructure:
- Proactive monitoring: Identify issues before they impact users
- Historical analysis: Track performance trends and capacity planning
- Automated alerting: Immediate notification of critical issues
- Centralized visibility: Single dashboard for entire network infrastructure
- Cost-effective solution: Open-source tools with enterprise features
Scaling Your Monitoring Infrastructure
As your network grows, this monitoring foundation scales effectively:
- Add new devices: Simple configuration updates to monitor additional MikroTik devices
- Expand metrics: Custom exporters for specialized monitoring requirements
- Integrate systems: Connect with existing network management tools
- Advanced analytics: Machine learning integration for predictive monitoring
Next Steps for Advanced Implementation
- Implement automated remediation: Scripts triggered by specific alert conditions
- Deploy configuration management: Ansible or Terraform for infrastructure as code
- Add performance baselines: Statistical analysis for anomaly detection
- Integrate with ticketing systems: Automatic incident creation for critical alerts
Community Resources
- Prometheus documentation: https://prometheus.io/docs/
- Grafana community dashboards: https://grafana.com/grafana/dashboards/
This monitoring solution transforms network operations from reactive troubleshooting to proactive management. The combination of Prometheus metrics collection, Grafana visualization, and comprehensive alerting provides the foundation for reliable, high-performance network operations.
Regular maintenance, security updates, and continuous improvement ensure your monitoring infrastructure remains effective as your network evolves. The investment in proper monitoring pays dividends through reduced downtime, improved performance, and enhanced user satisfaction.