What is IT Monitoring?
IT monitoring is the process of gathering metrics about the hardware and software in an IT environment to ensure that the equipment is available and performing at the level your business requires. It relies on trend data to validate infrastructure updates before applications or services are affected, and on real-time alerting so that administrators can respond immediately to potential problems. IT monitoring follows a hierarchical philosophy based on three fundamental layers:
Foundation layer. The foundation layer of IT monitoring covers the physical and/or virtual devices in the system. These are often called ‘hosts’ and could include Windows servers, Linux servers, Cisco routers, Nokia firewalls, or VMware virtual machines. In this layer, information about the hosts is gathered using a combination of agents, application programming interfaces (APIs), or other standardized communication protocols that access data from hardware and software¹. Pinging the hosts confirms that they are up, so that they can be monitored.
IT monitoring layer. Once it is confirmed that a host is up, the next layer entails monitoring the items running on the host. On Linux servers these items could be swap space, CPU usage, or running services. On Windows servers these items could be memory usage, free space on the C: drive, or CPU usage. On VMware virtual machines these items could be free datastore space, temperature checks, or the number of VMs. These items are referred to as service checks. Here, monitoring software processes and analyzes the raw data, establishing trends and generating alarms.
Interface layer. After the hosts and the items running on them have been monitored, the interface layer is where the accumulated and analyzed data is interpreted through graphs, charts, and dashboards. Reports can be generated to show the historical health of monitored services, technical data, downtime analyses, performance information, and more.
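The three layers can be illustrated with a minimal Python sketch. It is not taken from any particular monitoring product: the names `ServiceCheck`, `monitor_host`, and `render_report`, the threshold values, and the stand-in probe functions are all hypothetical, and the sketch assumes that a higher metric value is worse (a free-space check would need the comparison reversed).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceCheck:
    """A service check with hypothetical warning/critical thresholds."""
    name: str
    warning: float
    critical: float

    def evaluate(self, value: float) -> str:
        # Assumes higher values are worse (e.g. CPU or swap usage in percent).
        if value >= self.critical:
            return "CRITICAL"
        if value >= self.warning:
            return "WARNING"
        return "OK"


def monitor_host(
    host: str,
    is_up: Callable[[str], bool],
    read_metrics: Callable[[str], Dict[str, float]],
    checks: List[ServiceCheck],
) -> dict:
    # Foundation layer: confirm the host is reachable before anything else.
    if not is_up(host):
        return {"host": host, "state": "DOWN", "services": {}}
    # Monitoring layer: turn raw metric values into service check statuses.
    metrics = read_metrics(host)
    services = {c.name: c.evaluate(metrics[c.name]) for c in checks}
    return {"host": host, "state": "UP", "services": services}


def render_report(results: List[dict]) -> str:
    # Interface layer: present the analyzed data (plain text stands in
    # for graphs, charts, and dashboards).
    lines = []
    for result in results:
        lines.append(f"{result['host']}: {result['state']}")
        for name, status in result["services"].items():
            lines.append(f"  {name}: {status}")
    return "\n".join(lines)


# Stand-in probes; a real deployment would use ping, agents, or APIs.
def fake_is_up(host: str) -> bool:
    return host != "router-1"


def fake_read_metrics(host: str) -> Dict[str, float]:
    return {"cpu_usage_percent": 97.0, "swap_used_percent": 10.0}


checks = [
    ServiceCheck("cpu_usage_percent", warning=80, critical=95),
    ServiceCheck("swap_used_percent", warning=50, critical=90),
]
results = [
    monitor_host(h, fake_is_up, fake_read_metrics, checks)
    for h in ["linux-1", "router-1"]
]
report = render_report(results)
print(report)
```

Passing the probes in as functions keeps the layers separate: the reachability test and the metric source can be swapped without touching the threshold logic or the report.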
Best practices for IT monitoring include:
Focus on infrastructure and apps. While there are ample data points that can be collected and analyzed throughout the system, the metrics that matter most to IT administrators are those related to infrastructure and application performance. Keep a clear view of what needs active monitoring, and limit the distraction caused by an influx of superfluous data.
Configure alerts judiciously. Carefully curate alerts so that they go directly to those who need to know and can take quick action, such as IT administrators. Generally, IT should receive an alert about a problem before a supervisor or a customer does. Configure only the alerts that are necessary, such as ones that IT can act on. An overwhelming number of needless alerts can cause distractions that delay action on critical ones.
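The alerting guidance above could be expressed as a small routing rule. This is only a sketch under stated assumptions: the `Alert` fields, the recipient names `it-admins` and `supervisor`, and the 30-minute escalation window are hypothetical illustrations, not settings from any specific tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    service: str
    severity: str                    # "OK", "WARNING", or "CRITICAL"
    actionable: bool                 # can IT actually do something about it?
    minutes_unacknowledged: int = 0


def recipients(alert: Alert, escalate_after_minutes: int = 30) -> List[str]:
    # Drop alerts nobody can act on, so they do not bury critical ones.
    if not alert.actionable or alert.severity == "OK":
        return []
    # IT administrators hear about a problem first.
    notify = ["it-admins"]
    # Assumed escalation policy: involve a supervisor only when a critical
    # alert has gone unacknowledged for too long.
    if (alert.severity == "CRITICAL"
            and alert.minutes_unacknowledged >= escalate_after_minutes):
        notify.append("supervisor")
    return notify


fresh = Alert("cpu_usage_percent", "CRITICAL", actionable=True)
stale = Alert("cpu_usage_percent", "CRITICAL", actionable=True,
              minutes_unacknowledged=45)
noise = Alert("debug_log_volume", "WARNING", actionable=False)

for alert in (fresh, stale, noise):
    print(alert.service, alert.severity, recipients(alert))
```

Keeping the rule in one function makes it easy to review which alerts reach which audience before they are enabled.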
1 Bigelow, 2020, “The definitive guide to enterprise IT monitoring”