Monitoring end devices on a network is not a novel idea, and many Network Monitoring System (NMS) solutions are available for traditional IT implementations. However, deploying an NMS to monitor non-traditional Operational Technology (OT) devices, such as those used to support power grids, presents a number of unique challenges. Recently, West Monroe Partners was engaged to deliver such a solution to monitor OT devices for an electric distribution utility in the central United States.
Operations equipment has traditionally been serially connected, and as a result it lacks security controls common in today’s IT equipment, such as access control, syslog, and monitoring capabilities. Requirements have recently evolved to move these OT devices to TCP/IP connectivity, which requires them to be on par with IT devices with regard to Ethernet, security, and rich SNMP capability. With government promotions and grants (at least in the U.S., such as the ARRA 2009) spurring development of smart grid infrastructure, many IT professionals and NOC administrators are investigating use cases for network monitoring tools to provide visibility into the availability and performance of their TCP/IP-connected OT devices. Many are finding that the wide range of equipment adopted in OT infrastructures can make monitoring end devices a daunting task.
During a recent project with a power utility, West Monroe was asked to assist in the deployment of a common NMS to monitor the OT infrastructure. As part of the deployment, the OT group wanted to monitor devices such as field Distribution Automation (DA) devices (i.e., a wireless mesh with 1000+ devices), serial-to-Ethernet converters, and cellular modems in order to gain insight into the health and status of the OT network. OT devices had never been monitored in this environment before, so the WMP team and the client didn’t know what to expect in terms of alerts and status.
The power utility had deployed network segmentation to protect sensitive operational devices and data per NERC-CIP compliance. Given concerns about a non-NERC-CIP system monitoring NERC-CIP-classified devices, IT security decided that devices classified as NERC-CIP had to be removed from scope and therefore would not be monitored. Workarounds such as remote pollers or a separate NMS instance were discussed, but neither was carried out as part of this deployment.
The largest challenge faced in delivering this solution resulted from a lack of published MIBs for the Distribution Automation devices. Inquiries by the team determined that the vendor had deliberately withheld Management Information Base (MIB) information from public access, limiting SNMP polling to its proprietary monitoring application. Vendors looking to monetize the ability to monitor (through proprietary MIBs) might find themselves at a disadvantage with utilities that pass over these solutions in favor of those they can integrate into existing and/or common NMS platforms. This will become a more common scenario as utilities focus on the single view of network health and capacity that a common network monitoring platform provides.
These issues resulted in the project falling back to up/down monitoring of field DA devices via ICMP (ping) only. Unfortunately, ICMP does not provide the rich statistics that SNMP Version 3 (SNMPv3) provides, such as capacity, performance, and health monitoring.
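To illustrate how little an ICMP-only check reveals, here is a minimal Python sketch of up/down polling. It is not the NMS used on the project. The function names are hypothetical, the addresses in the usage note are documentation placeholders, and the ping flags assume a Linux host (they differ on Windows and macOS).

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def build_ping_command(host: str, count: int = 1, timeout_s: int = 2) -> List[str]:
    """Build a Linux-style ping command (assumption: iputils ping flags)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def run_ping(command: List[str]) -> int:
    """Run the command and return its exit code; 0 means a reply was received."""
    try:
        completed = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # The ping binary is missing, so treat the device as unreachable.
        return 1
    return completed.returncode


def poll_devices(
    hosts: Iterable[str], runner: Callable[[List[str]], int] = run_ping
) -> Dict[str, str]:
    """Return an UP/DOWN status per host. That is all ICMP can tell us."""
    status = {}
    for host in hosts:
        exit_code = runner(build_ping_command(host))
        status[host] = "UP" if exit_code == 0 else "DOWN"
    return status
```

Calling poll_devices(["192.0.2.10", "192.0.2.11"]) returns a dictionary of host to "UP" or "DOWN". Nothing in the result describes capacity, performance, or device health, which is the gap left when SNMP polling is unavailable.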
Overall, this deployment was not as easy as it would be in a traditional IT-based IP network. Nearly all of the traditional devices were able to authenticate with the NMS on the first attempt, but some troubleshooting and configuration updates were required before all devices could authenticate successfully. Most obstacles faced when adding devices were related to a device’s lack of support for SNMPv3 or to errors made when configuring the SNMPv3 agent. Packet captures allowed the team to diagnose these errors quickly and are invaluable in a deployment of this nature. After entering the SNMPv3 credentials in Wireshark, we examined the payload of the authentication exchange to determine the root cause of authentication failures. In this case, correcting issues tied to unique agent identifiers allowed us to move forward using SNMPv3.
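As one possible illustration of an identifier check, the Python sketch below flags devices that share an identifier. It assumes that "unique agent identifiers" refers to the SNMPv3 engine ID (snmpEngineID), which must be unique within an administrative domain. The project details do not confirm this, and the device names and engine ID values in the usage note are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List


def normalize_engine_id(engine_id: str) -> str:
    """Reduce common hex notations (0x prefix, colons, spaces) to bare lowercase hex."""
    normalized = engine_id.lower().replace(":", "").replace(" ", "")
    if normalized.startswith("0x"):
        normalized = normalized[2:]
    return normalized


def find_duplicate_engine_ids(engine_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each engine ID used by more than one device to the devices sharing it."""
    devices_by_id = defaultdict(list)
    for device, engine_id in engine_ids.items():
        devices_by_id[normalize_engine_id(engine_id)].append(device)
    return {
        engine_id: sorted(devices)
        for engine_id, devices in devices_by_id.items()
        if len(devices) > 1
    }
```

Given a hypothetical inventory such as {"modem-a": "80:00:1F:88:80:AA", "modem-b": "0x80001f8880aa"}, the function reports both modems under the same normalized ID. Devices cloned from a common configuration are a typical source of this kind of collision.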
Once the system was deployed and in operation, several insights were gained.
First, data collection is always a good thing. If nothing else, an NMS provides a repository of statistics that can be used to validate event correlation during analysis. The team used this data when attempting to baseline alert thresholds for devices on the OT network. The software’s ability to represent network information and dependencies visually also allows non-technical individuals to extract value from the system.
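Baselining a threshold from collected statistics can be sketched in Python as follows. The method shown (mean plus a multiple of the standard deviation) is an assumption chosen for illustration, not necessarily the approach the team used, and the function names and the multiplier k are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def baseline_threshold(samples: Sequence[float], k: float = 3.0) -> float:
    """Return mean + k sample standard deviations of the historical samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    return mean(samples) + k * stdev(samples)


def breaches(samples: Sequence[float], threshold: float) -> List[float]:
    """Return the samples that exceed the threshold and would raise an alert."""
    return [value for value in samples if value > threshold]
```

A threshold derived this way is only as good as the history behind it, which is one reason to start collecting data well before alerting is switched on.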
Secondly, a network monitoring deployment should be considered during the design phase, as part of network planning and device selection. Otherwise, compliance and integration issues may arise down the road that make typical deployments infeasible or, at the least, more expensive. In a nutshell, NERC-CIP can complicate the deployment, as can a lack of publicly available monitoring information.
Thirdly, when working with an unfamiliar protocol, lab testing is essential to validate that configuration changes to a production environment will not impact connectivity.
Lastly, an NMS can provide more than just current performance details. The historical performance data and alert triggers are a wealth of information that, when combined into automated reports, can quickly highlight abnormalities in the network that would otherwise be missed when viewing only the current state.
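A report of this kind can be sketched in a few lines of Python. The example assumes the alert history can be exported as (device, alert type) pairs. The function name, the "DOWN" label, and the cutoff of three alerts are hypothetical choices rather than features of any particular NMS.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequently_down_devices(
    alert_log: Iterable[Tuple[str, str]], min_alerts: int = 3
) -> List[Tuple[str, int]]:
    """Return (device, count) pairs for devices with at least min_alerts DOWN alerts.

    Results are ordered from most to fewest alerts.
    """
    counts = Counter(
        device for device, alert_type in alert_log if alert_type == "DOWN"
    )
    return [(device, n) for device, n in counts.most_common() if n >= min_alerts]
```

A device that is up at the moment someone looks at the dashboard can still appear at the top of this list, which is the kind of abnormality a current-state view hides.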
The WMP team has extensive experience deploying NMS solutions for IT and, now, OT networks. If you are considering deploying an NMS or have experience doing so, especially in non-traditional network environments, or would like to have a specific question answered, please submit a comment to start a discussion!