The Lighthouse: The challenges of Operational Technology cyber security
By Lew Folkerth, Principal Reliability Consultant, External Affairs
In this recurring column, I explore various questions and concerns related to the NERC Critical Infrastructure Protection (CIP) Standards. I share my views and opinions with you, which are not binding. Rather, this information is intended to provoke discussion within your entity. It may also help you and your entity as you strive to improve your compliance posture and work toward continuous improvement in the reliability, security, resilience and sustainability of your CIP compliance programs. At times I may also discuss areas of the standards that other entities are struggling with and share my ideas for overcoming those issues. As with lighthouses, I can’t steer your ship for you, but perhaps I can help shed light on the sometimes-stormy waters of CIP compliance.
What is Operational Technology?
Operational Technology (OT) is the general term for the computers that control the world around us.
Although you may not realize it, you use OT many times a day. When you press the accelerator in your car, you send a command to the car’s OT computer to make the engine deliver more power. When you drive through an intersection on a green light, you’re trusting an OT system with your safety. When you press the “Up” button on an elevator, you send a command to the OT system that controls the elevator.
Our modern civilization relies on OT to keep essential services working. The electric grid, pipelines, water treatment plants, transportation systems, and many more all depend on OT to deliver reliable services.
That’s because OT is the computer hardware and software used in the real-time control and monitoring of physical processes. OT is also called Industrial Control Systems (ICS), Supervisory Control and Data Acquisition (SCADA), Energy Management Systems (EMS), Distributed Control Systems (DCS), Programmable Logic Controllers (PLC), and many other names. These systems come in many varieties, sizes, and purposes. From a control center with multiple staffed consoles and mapboards, to individual devices controlling physical processes, all these systems are considered to be OT systems.
IT and OT differences
How is OT different from Information Technology or “IT”? You also likely use IT every day. Documents, spreadsheets, presentations, email, messaging – the lifeblood of any office environment – are examples of IT.
While there are many commonalities between IT and OT, there are also some substantial differences. IT is, as its name implies, primarily about information. OT, on the other hand, is about control. Whether it’s your household thermostat controlling the temperature of your house, or a ground fault relay protecting an electric transmission line, OT systems control the world around us.
IT generally emphasizes confidentiality. Information has value, and many times its value depends on who has the information. Business plans, pricing strategies and equipment designs are more valuable when available only to those who need to know them.
OT generally emphasizes availability. A 30-minute outage of an email system during a business day might go unnoticed, but a 30-minute outage of a traffic light during rush hour will cause chaos, and a 30-minute outage of an EMS is a reportable event.
IT tends to have a relatively dynamic environment. Laptops change, servers get updated, and new versions of software with new features are common.
OT tends to be more static. Think about your household thermostat. How often do you get a new one? In the same way, OT systems may be in service for 10 to 20 years without a significant change.
While IT tends to deal with servers and workstations, such as your company laptop, OT deals with sensors and actuators and the control systems that drive them. For example, the badge reader on your building senses when you swipe your badge and sends the information on the badge to the control system for that door. The control system verifies that you are authorized to enter the building and sends a door open command to the lock on the door.
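The badge reader example can be sketched in a few lines of code. This is only an illustration of the sense, verify, and actuate pattern, not how any real access control product works. The class name, badge numbers, and log format are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DoorController:
    """Hypothetical door control system: decides whether to actuate the lock."""
    authorized_badges: set
    lock_open: bool = False
    log: list = field(default_factory=list)

    def on_badge_swipe(self, badge_id: str) -> bool:
        """Handle sensor input (a badge swipe) and command the actuator (the lock)."""
        if badge_id in self.authorized_badges:
            self.lock_open = True  # stands in for the "door open" command
            self.log.append(f"granted:{badge_id}")
            return True
        self.lock_open = False  # the lock stays closed for unknown badges
        self.log.append(f"denied:{badge_id}")
        return False


# Hypothetical badge numbers, for illustration only.
controller = DoorController(authorized_badges={"A1001", "A1002"})
granted = controller.on_badge_swipe("A1001")
denied = controller.on_badge_swipe("Z9999")
```

Note that the decision and the physical action happen in the same step. That tight coupling between a computer's decision and a physical result is what distinguishes OT from IT.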
Challenges to OT cyber security
Getting the basics right
To update an OT system, or even work on it at all, you may need an outage window, which is a planned time when changes can be made without affecting the operation of the asset the OT system controls. Some OT systems are essential to the operation of the asset they support. For example, a steam generating plant needs a boiler feed pump to keep the boiler supplied with water within very narrow parameters. Any outage of the boiler feed pump’s control system could force the plant into emergency shutdown. Any work on the control system would likely need to be done at a scheduled outage of the plant, which can sometimes take months to occur.
Many control centers are intolerant of an outage. Methods to update control center assets need to take this into account. One technique is to make an update on an offline redundant system, then switch over to that system to run the control center. Some control centers have periods of time during which no changes to OT systems are permitted, such as an electric transmission control center during periods of high load.
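The update-then-switch-over technique can be sketched as follows. This is a simplified illustration under stated assumptions: the server names, version strings, and health check are hypothetical, and a real control center would have its own vendor-specific failover procedure.

```python
from dataclasses import dataclass


@dataclass
class ControlServer:
    """Hypothetical stand-in for one of a redundant pair of control center servers."""
    name: str
    version: str
    active: bool


def update_with_switchover(primary, standby, new_version, health_check):
    """Update the offline standby, verify it, then swap roles.

    Returns the server that is active when the procedure ends.
    """
    if standby.active:
        raise ValueError("standby must be offline before it is updated")
    standby.version = new_version  # the change is made only on the offline system
    if not health_check(standby):
        return primary  # keep operating on the unchanged system
    primary.active, standby.active = False, True  # switch over
    return standby


# Hypothetical names and versions, for illustration only.
server_a = ControlServer("ems-a", "1.0", active=True)
server_b = ControlServer("ems-b", "1.0", active=False)
now_active = update_with_switchover(
    server_a, server_b, "1.1", health_check=lambda s: s.version == "1.1"
)
```

The operators never run on a system while it is being changed, and the previous system remains untouched as a fallback if the updated one misbehaves.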
Operating personnel in a control center, in general, may be resistant to reboots, upgrades, or other outages of their systems. These events need to be planned out and scheduled carefully.
Another challenge to cyber security basics is the wide variety of types, brands, and ages of OT equipment that may be present at a site. These devices may have varying capabilities and security concerns. This diversity can be caused by many factors, including vendor requirements for a particular capability, or equipment installations performed at various times using the equipment available at the time of each installation. This wide variety of equipment, perhaps even at a single site, can cause issues with cyber security updates and configuration. Some devices may do a good job of being secure, while others may need supporting equipment to maintain security. Equipment diversity can present training issues and can make incident response more complicated.
Hazardous locations
Safety must always be the first priority of any work performed.
OT systems can be located in hazardous environments. Chemical plants, refineries, electric substations, and many others pose hazards to personnel working at those sites. Requirements for personal protective equipment (PPE) at each site must be clearly understood and observed. This may include hard hats, safety glasses, hearing protection, gloves, flame retardant clothing, boots or shoes with dielectric soles, and other equipment based on the site in question.
Hazardous locations may require safety-specific training before entry is granted. These sites may also require clearance before entry is permitted.
Remote locations
Physical locations may have widely differing ways of controlling physical access. Some locations may have card access systems. Others may use physical keys, combination locks, or other methods of controlling entry. Some locations may have multiple layers of physical access control, such as a metal lock on a perimeter fence and a card access system on the structure housing the control systems. The staff performing updates or maintenance on OT systems at these locations will need to be granted access to all layers of physical security needed to access these systems.
Many remote locations are not staffed, being controlled only by the OT assets at the site. These sites may need to be visited by two-person teams for safety reasons. Sites that are located in remote areas or high crime areas might need even more than two people.
Third-party access
Access to OT assets by third parties brings with it a significant cyber security risk. Unauthorized or unsupervised physical access can permit unintended changes to OT equipment such as modified switch settings.
Similarly, unauthorized electronic access can have severe repercussions in security and reliability.
Both physical and electronic access to OT systems should be closely controlled and monitored. Vendors or other third parties may need access in order to perform contracted maintenance or other services, but such access should be tightly controlled and supervised.
Migration to cloud services
Service providers are turning, seemingly en masse, to cloud-provisioned systems. Services such as work management systems and network security monitoring are examples of this trend. In the future, we may find that even some control systems migrate to the cloud to enhance resilience and reliability. This has been shown at electric companies in Ukraine, where the war disrupted the physical control centers of some electric utilities but did not disrupt operations, thanks to backup systems in the cloud.
Static environment
One of the characteristics of OT systems is that they may have very long lifetimes compared to IT equipment. It is not uncommon to see equipment in service that is no longer supported by its vendor, or where the vendor no longer exists. This means security personnel may need to deal with a lack of security patches, and the unavailability of replacement parts or hardware when failures occur.
Field device communications
Communications to field devices can pose a difficult security problem. Many, especially older, OT devices lack robust user authentication. This forces access control at the network level rather than the user level. Because of this and other factors, OT devices should not be directly exposed to the internet.
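As a rough illustration of access control at the network level, the sketch below permits a connection to a field device only when it comes from an approved subnet and targets an approved port. The subnet and the choice of port are assumptions made for the example (TCP 502 is the port commonly used by Modbus TCP). In practice this rule would live in a firewall or similar device, not in a script.

```python
import ipaddress

# Hypothetical allowlist: one engineering workstation subnet and one protocol port.
ALLOWED_SOURCES = [ipaddress.ip_network("10.20.30.0/28")]
ALLOWED_PORTS = {502}  # assumption: the field device speaks Modbus TCP


def is_permitted(src_ip: str, dst_port: int) -> bool:
    """Allow traffic only from an approved subnet to an approved port."""
    addr = ipaddress.ip_address(src_ip)
    from_allowed_subnet = any(addr in net for net in ALLOWED_SOURCES)
    return from_allowed_subnet and dst_port in ALLOWED_PORTS


inside_ok = is_permitted("10.20.30.5", 502)       # approved subnet, approved port
inside_wrong_port = is_permitted("10.20.30.5", 23)  # approved subnet, other port
outside = is_permitted("203.0.113.9", 502)        # address outside the subnet
```

Because the device itself cannot tell one user from another, the decision rests entirely on where the traffic comes from and where it is going.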
Within control system networks, network encryption can cause performance issues, and encrypted networks make network monitoring difficult. Encryption can also introduce latency and other timing concerns not present in unencrypted networks. These factors may make it difficult to protect OT networks by encrypting traffic.
Challenges common to IT and OT
OT shares a number of challenges with IT.
Artificial Intelligence (AI) presents both opportunities and risks. AI can greatly increase human productivity but must be managed carefully. AI can also be used by an attacker to develop more sophisticated attacks against an organization.
Quantum computing has entered a stage of rapid growth and development. When a sufficiently capable quantum computer is built, it will be able to break much of the public-key encryption in use today. Quantum-resistant replacement techniques are being developed and standardized, and should be deployed as they become available.
As more OT systems take advantage of cloud computing, cyber security of cloud processes and environments will be needed. See this article for more details: The emerging risk of NOT using cloud services.
Some OT cyber security standards require internal network security monitoring to be able to detect, respond to, and recover from an intrusion. Internal network monitoring is different from the way most monitoring has been done in the past. Internal monitoring focuses on traffic within our internal networks, as opposed to monitoring at edge devices such as firewalls.
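One simple way to think about internal monitoring is to compare observed traffic inside the network against a baseline of expected communications. The sketch below is a minimal illustration of that idea, not a monitoring product. The addresses, ports, and the flow format are assumptions made for the example.

```python
# Hypothetical baseline of expected internal flows: (source, destination, TCP port).
BASELINE = {
    ("10.1.1.10", "10.1.2.20", 502),
    ("10.1.1.10", "10.1.2.21", 502),
}


def unexpected_flows(observed):
    """Return observed internal flows that are not in the baseline, in order seen."""
    return [flow for flow in observed if flow not in BASELINE]


# Hypothetical observations: one expected flow and one device-to-device
# connection that never crosses the edge firewall.
observed = [
    ("10.1.1.10", "10.1.2.20", 502),
    ("10.1.2.21", "10.1.2.20", 3389),
]
alerts = unexpected_flows(observed)
```

The second flow stays entirely inside the network, so monitoring at the edge would never see it. The relatively static nature of OT networks makes this kind of baseline comparison more practical than it would be in a typical IT environment.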
Counterfeit equipment and malicious software updates are some of the risks addressed by supply chain security. OT systems can be severely impacted by these types of attacks, because the attacks are difficult to detect and replacing compromised OT equipment is more challenging than replacing IT equipment.
Further study
For additional information about cyber security of OT assets, see these links:
CISA: Defending OT Operations Against Ongoing Pro-Russia Hacktivist Activity
NARUC: Cybersecurity Baselines for Electric Distribution Systems and DER
My Lighthouse articles in the RF Resource Center.
Requests for Assistance
If you are an entity registered within the RF Region and believe you need assistance in working your way through this or any compliance-related issue, remember RF has the Assist Visit program. Submit an Assist Visit Request via the RF website here.
Feedback
Please provide any feedback you may have on these articles. I may be reached at lew.folkerth@rfirst.org.