The move toward digital transformation throughout industry is clearly important. Yet, except in the IT departments of many plants and facilities, there still seems to be enormous misunderstanding as to which physical assets and systems at sites should be considered cyber-related and, thus, undergo appropriate security checks and steps.
The impacts of cyber-holes in today’s operations aren’t minor. They include loss of critical systems, damaged systems, and potential safety hazards, all of which are a concern for RAM (reliability, availability, and maintenance) professionals. It’s noteworthy, however, that implementing a successful cybersecurity program is not far removed from implementing the reliability programs that already exist at some sites. Accordingly, the ever-increasing deployment of smart, “connected” products throughout an enterprise should further justify the application of RAM tools such as Reliability Centered Maintenance (RCM). This article puts these issues in context.
WHAT YOU DON’T KNOW CAN HURT YOUR OPERATIONS
The implementation of cyber-information and cyber-physical systems (CIS and CPS), often referred to as “IoT” and “IIoT” (“Internet of Things” and “Industrial Internet of Things,” respectively), has the potential to improve overall operations, production, service, maintenance, and other aspects of industry at relatively low expense. IoT sensors and devices are critical to Machine Learning (ML) and Augmented Intelligence (AI), as well as for enabling remote operation and monitoring. Unfortunately, any devices, pieces of equipment, and systems with IP and MAC addresses have the potential to be exploited deliberately or by accident unless appropriate steps are taken. That means just about everything in a plant is at risk.
Consider this: Even energy-monitoring equipment, variable frequency drives (VFDs), and safety devices have IP and MAC addresses that can open cyber-holes into an operation if they aren’t set up correctly. And many aren’t. These are just a few examples of the countless plant assets that can allow access to a site’s system. Couple this situation with vendors who either neglect to mention the seriousness of cyber risks or, perhaps, describe how to avoid implementation of security protocols, and you have a recipe for disaster.
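As a rough illustration of the point above, the following minimal Python sketch flags which items in an asset list should be treated as cyber-related because they carry an IP or MAC address. The `Asset` record, the asset tags, and the addresses are hypothetical examples, not data from any real site, and a real census would draw on network scans and vendor documentation rather than a hand-typed list.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """Hypothetical asset-census record; field names are assumptions."""
    tag: str
    description: str
    ip_address: Optional[str] = None
    mac_address: Optional[str] = None


def is_cyber_related(asset: Asset) -> bool:
    """Treat an asset as cyber-related if it has a MAC or a valid IP address."""
    if asset.mac_address:
        return True
    if asset.ip_address:
        try:
            ipaddress.ip_address(asset.ip_address)  # raises ValueError if malformed
            return True
        except ValueError:
            return False
    return False


# Illustrative inventory only; tags and addresses are made up.
inventory = [
    Asset("VFD-101", "Variable frequency drive", ip_address="192.168.10.21"),
    Asset("EM-007", "Energy monitor", mac_address="00:1A:2B:3C:4D:5E"),
    Asset("P-220", "Manually operated pump"),
]

cyber_assets = [a.tag for a in inventory if is_cyber_related(a)]
print(cyber_assets)  # ['VFD-101', 'EM-007']
```

Anything the check flags would then be a candidate for the security review described in this article.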
Simply put, purchased devices, components, equipment, and systems aren’t always as secure as end users might expect. In fact, some products have been found to deliberately deliver information to hostile organizations or governments that, in turn, sell or share it with others. One example involved popular “smart bulbs,” one brand of which was discovered to share Wi-Fi and related passwords with hackers.
While many are calling for certification programs or processes to ensure the trustworthiness of CIS and CPS, and some efforts in that regard are underway, IoT and IIoT certification typically isn’t seen as a high priority at this time. Underwriters Laboratories (UL), though, is one organization that has met with some success in marketing this type of cybersecurity program. The challenge, however, is that standards also spell out how cybersecurity for a device should work, which, in turn, offers a roadmap for hackers.
UNDERSTANDING CYBERATTACK IMPACTS
Sadly, there have been plenty of accounts (published and unpublished) regarding the direct effect of hacks into weakly secured systems. They include stories about companies assumed to have had better-than-average security.
Cyberattacks can take many forms, including the changing or removing of safeties, “spoofing” of information and feedback while manipulating devices, disabling of response systems, harmful operation, and denial of service. As for the specific effect of such activities on safety, what would be the impact, say, if one of your site’s critical compressors was turned on and off repeatedly? Or, if a vessel near personnel was over-pressurized to the point of rupture?
DEALING WITH CYBER-SYSTEM CHALLENGES
Many systems and conditions make it difficult to control cybersecurity issues and ensure the trustworthiness of cyber systems. These include:
♦ Legacy systems (previous generations) with either no security or compromised security.
♦ Education of maintenance and reliability professionals.
♦ Vendor security-related updates and upgrades that aren’t performed or, if performed, generate holes in secure systems.
♦ No universally recognized certification program for ensuring trustworthiness of devices.
♦ No universally recognized practices or processes for the checking and ensuring of
♦ Human elements and actions that “go around” secure systems.
♦ The ability of hostile elements to exploit weak systems by detecting them with specialized search engines, e.g., Shodan.io, and software.
In addition to the above, one of the more gnawing issues is the fact that small- to medium-size companies have received little to no support or education in the area of cybersecurity. This is particularly concerning since these operations are frequently targeted directly, yet, typically, have the fewest resources for protecting themselves and educating personnel. Interestingly, many successful attacks on larger, more cyber-secure and aware organizations are initiated through third-party vendors with exploitive emails and other tactics that can only be combatted with highly sophisticated forms of cyber protection.
It’s important to understand that while half of cyberattacks are launched by outsiders, roughly 25% are instigated by disgruntled employees or other “insiders.” Motivations behind such attacks are (by percentage): strictly opportunistic/easy to hack (49%); dissatisfaction with employer (15%); and industrial espionage/financial crime/terrorism/data theft (23%). In effect, fewer than one out of four attacks are considered financially inspired, meaning that the potential risk exists across the spectrum.
INDUSTRY MAINTENANCE-DEVELOPMENT PRACTICE
Several years ago, a literature review turned up several academic articles discussing a “new concept” of Failure Modes, Effects and Vulnerability Analysis (FMEVA) for CIS/CPS. However, upon further examination, that concept seemed to be more of an attempt at the appearance of something new versus the actual way basic FMEA is utilized in physical asset management. For example, some key items related to systems undergoing RCM were outright ignored. To be clear, the impact of cyber vulnerabilities should already be part of the RCM process.
The general discussion and existing frameworks that have been developed to guide organizations, in particular public organizations, in cybersecurity have been quite complicated and daunting. This is where the existing process surrounding RCM comes into play. It is important to remember that not all failures can or should be prevented. Risk management is good policy.
The steps necessary to evaluate physical assets are:
1. Asset Census
2. Criticality Analysis
3. RCM Process
4. Maintenance Effectiveness Review.
The Criticality Analysis determines the level of analysis, or if an analysis will be performed, based upon:
♦ Impacts on personal safety
♦ Regulatory or environmental impacts
♦ Mission or operations
♦ All others (cost impact or special).
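The screening described above can be sketched in a few lines of Python. This is only an illustration: the four criteria come from the list, but the analysis levels attached to them, the `AssetImpacts` record, and the example asset tags are hypothetical and would be replaced by a company’s own criteria.

```python
from dataclasses import dataclass

# Criteria in the order listed above. The analysis level assigned to each
# criterion is a hypothetical placeholder, not part of any RCM standard.
ANALYSIS_LEVEL = {
    "safety": "full RCM analysis",
    "regulatory_environmental": "full RCM analysis",
    "mission_operations": "streamlined RCM analysis",
    "other": "basic maintenance review",
}


@dataclass
class AssetImpacts:
    """Hypothetical record of which criticality criteria an asset triggers."""
    tag: str
    safety: bool = False
    regulatory_environmental: bool = False
    mission_operations: bool = False
    other: bool = False


def analysis_level(asset: AssetImpacts) -> str:
    """Return the level for the highest-priority criterion the asset triggers."""
    for criterion, level in ANALYSIS_LEVEL.items():
        if getattr(asset, criterion):
            return level
    return "no analysis"


# Illustrative assets only.
compressor = AssetImpacts("C-301", safety=True, mission_operations=True)
office_lamp = AssetImpacts("L-900")

print(analysis_level(compressor))   # full RCM analysis
print(analysis_level(office_lamp))  # no analysis
```

Checking the criteria in priority order means a safety impact always decides the level, even when lower-priority criteria also apply.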
The systems determined to be critical, based upon a company’s criteria from above, are then taken through the seven steps of RCM:
1. Function: What are the functions and associated desired standards of performance of the asset in its operating context?
2. Functional Failures: In what ways can it fail to fulfill its function? This would include the potential impacts of cyber issues, easily determined if the system has an IP or MAC address.
3. Failure Modes: What causes each functional failure (including cyber issues)?
4. Failure Effects: What happens when failures occur (including compromising other systems)?
5. Failure Consequences: In what way does each failure matter?
6. Tasks and Intervals: What should be done to predict or prevent each failure?
7. Default Actions: What should be done if a suitable proactive task cannot be found? Can the system be made more resilient?
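One way to keep cyber issues from being overlooked is to record an answer to each of the seven questions for every failure mode, cyber-related or not. The Python sketch below assumes a simple `RcmRecord` structure with one field per step. The field names, the compressor scenario, and the answers are illustrative assumptions, not a prescribed worksheet format.

```python
from dataclasses import dataclass, fields


@dataclass
class RcmRecord:
    """One worksheet row: a field for each of the seven RCM questions."""
    function: str = ""
    functional_failure: str = ""
    failure_mode: str = ""
    failure_effect: str = ""
    failure_consequence: str = ""
    task_and_interval: str = ""
    default_action: str = ""


def unanswered(record: RcmRecord) -> list:
    """List the RCM steps that still have no answer recorded."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical cyber-related failure mode for a networked compressor.
record = RcmRecord(
    function="Supply plant air at the required pressure",
    functional_failure="Compressor cycles on and off outside operator control",
    failure_mode="Unauthorized start/stop commands over the network interface",
    failure_effect="Motor overheating and pressure swings in connected systems",
    failure_consequence="Equipment damage and possible safety hazard",
    default_action="Isolate the controller from the network; run in local mode",
)

print(unanswered(record))  # ['task_and_interval']
```

Here the record shows that no proactive task has yet been chosen for this failure mode, which is exactly the kind of gap a review should surface.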
The FMEA itself follows several basic steps:
1. List the function being evaluated
2. Define the functional failure
3. Determine failure modes
4. Failure effects at:
The purpose of this article is not to describe how to develop an RCM or FMEA process. Instead, it is primarily aimed at bringing readers to the realization that the cyber portion of a plant’s systems, meaning anything with access to an intranet or the Internet, needs to be considered when developing a maintenance program. Tools are available to ensure that the system is developed to be resilient, secure, or both.
KEEPING THE REAL GOAL IN MIND
Each organization has vulnerabilities that make it susceptible to cyberattacks. The application of RCM must consider the CIS and CPS impacts on these vulnerabilities; otherwise, the RCM process is not being fully utilized. In that case, the potential for damage to the system and impact on personnel safety would be substantial.
Reliability and maintenance professionals working closely with IT departments can improve the reliability and resilience of plant cyber-information and cyber-physical systems dramatically. Just because an operation may not appear to be a potential target doesn’t mean it will not be attacked. The goal is to ensure that RCM processes are fully exploited before a company or organization is tested by a malicious entity.
ABOUT THE AUTHOR
Howard Penrose, Ph.D., CMRP, is Founder and President of Motor Doc LLC, Lombard, IL, and, among other things, a Past Chair of the Society for Maintenance and Reliability Professionals, Atlanta (smrp.org). Email him at firstname.lastname@example.org, or email@example.com, and/or visit motordoc.com.
Tags: reliability, availability, maintenance, RAM, digital transformation, cyber-information systems, cyber-physical systems, reliability-centered maintenance, RCM, cybersecurity, cyberattacks, IoT, IIoT, intranet, safety, Failure Modes Effects and Vulnerabilities Analysis, FMEVA, Underwriters Laboratory, UL