
In 1963, the USS Thresher, the jewel in the crown of the U.S. Navy’s nuclear-submarine fleet, sank during a sea trial after its first shipyard overhaul. We lost 129 brave sailors in that accident. The investigation concluded that relying on “skill-of-the-craft” to guide execution of maintenance work, rather than on standardized procedures, led to the errors and defects that caused this tragic mishap. This difficult lesson resulted in the creation of the “Sub-Safe” program, which uses procedures and checklists to guide work and minimize human error.

Similar learning has taken place in aviation. As a passenger flying with any major airline today, you would, on average, have to take a commercial flight every day for 25,400 years before dying in an aviation crash. Every aspect of aircraft design, manufacture, operation, and maintenance is highly controlled with procedures and checklists. Why are the aviation, nuclear-submarine, and nuclear-power industries so procedure intensive? It’s simple: Reliability is not a luxury in those industries. It’s a mandatory requirement because failure is not an option. It should not be an option in other sectors, either.

As a reliability engineering consultant working in the manufacturing and process industries, I review numerous preventive and corrective maintenance plans and operational instructions. I can sum them up in a word: “Vague.” 

I’ve seen some really bad examples of work instructions, including “inspect the machine,” “repair the machine,” and “check electrical.” Consider the examples in Fig. 1. An instruction that says “check oil pressure” is incomplete because it gives the technician nothing to check the reading against. We need to clearly define the fit, tolerance, quantity, and quality details that are necessary to execute that PM. In this case, we need to define the oil pressure that’s acceptable. It could be that any pressure above a particular value is okay; that any pressure below a particular value is okay; or, as is the case in my example, that the pressure needs to fall between a specified lower and upper limit.

Fig. 1.  Example of common mistakes in preventive and corrective work orders.
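The point about acceptance limits can be made concrete with a minimal Python sketch. It is an illustration only, not a feature of any particular CMMS or work-order system: the `AcceptanceCriterion` class, its field names, and the 40 to 60 psi limits are all hypothetical. The sketch refuses to accept a check that has no limits at all, which is the “check oil pressure” case, and it supports a lower limit, an upper limit, or both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceptanceCriterion:
    """One measurable check in a PM task, with explicit pass/fail limits.

    Hypothetical structure for illustration; not taken from a real system.
    """
    description: str
    unit: str
    lower: Optional[float] = None  # None means there is no lower limit
    upper: Optional[float] = None  # None means there is no upper limit

    def __post_init__(self) -> None:
        # A check with no limits is exactly the vague instruction to avoid.
        if self.lower is None and self.upper is None:
            raise ValueError(f"'{self.description}' has no acceptance limits")
        if (self.lower is not None and self.upper is not None
                and self.lower > self.upper):
            raise ValueError("lower limit exceeds upper limit")

    def instruction(self) -> str:
        """Render the check as an unambiguous work-order line."""
        if self.lower is not None and self.upper is not None:
            limits = f"between {self.lower:g} and {self.upper:g}"
        elif self.lower is not None:
            limits = f"at least {self.lower:g}"
        else:
            limits = f"no more than {self.upper:g}"
        return f"Verify {self.description} is {limits} {self.unit}."

    def evaluate(self, reading: float) -> str:
        """Compare a recorded reading with the limits."""
        if self.lower is not None and reading < self.lower:
            return "FAIL: below lower limit"
        if self.upper is not None and reading > self.upper:
            return "FAIL: above upper limit"
        return "PASS"


# Hypothetical limits, chosen only to show the between-two-limits case.
oil_pressure = AcceptanceCriterion("oil pressure", "psi", lower=40.0, upper=60.0)
print(oil_pressure.instruction())
print(oil_pressure.evaluate(35.0))
```

Rejecting a limitless check when the task is written, rather than when it is performed, moves the burden of knowing the acceptable range from the technician’s memory to the person authoring the procedure.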

I’ve heard people argue that “My tradespeople are well trained and experienced. They should know what to do and what’s right.” Such logic is fundamentally flawed, for the following three reasons:

1. Paper is for Remembering. Is it truly reasonable to expect people to remember every torque value for every grade of bolt; the type and quantity of grease required in the drive end and non-drive end of every motor in the plant; the required tension on every v-belt in the plant; and the acceptable temperature, pressure, and flow ranges or limits for every machine in the plant? While I could go on and on, I think Albert Einstein summed it up well when he said, “Paper is to write things down that we need to remember. Our brains are used to think.” It’s settled science: We need procedures to capture all the details that are required to improve human performance and minimize the human factors of failure, which account for about 80% of all failures in the plant. My own research has concluded that missing or ineffective procedures are the number-one cause of human failure in the plant (Fig. 2).

Fig. 2. My own research suggests that lack of or ineffective procedures is the largest contributor to failures in the plant.

2. People + Processes = Success. It may well be that your very top tradespeople can work effectively without procedures and checklists. However, there is a reason why even the most experienced airline pilots finger-read and faithfully execute every item of their pre-flight inspections and checklists. Likewise, even the world’s most highly skilled surgeons have acknowledged that they, like all humans, are susceptible to slips and lapses, and they have turned to procedures and checklists to ensure top performance every time. This is not a question of skill, knowledge, or experience. It is one of discipline and professionalism to counteract distraction and the possibility of error. A person may receive a call, take a break, or get pulled away in the midst of executing a job. Might that person forget where the work left off? Additionally, creating human-performance-management systems that are designed for people whose skills are at the 90th percentile is a near-guarantee of failure. Even managing to the average, or 50th percentile, isn’t enough. You must create human-performance-management systems geared to the skills of the 10th percentile to assure success.

3. Corporate Amnesia. Corporate amnesia is a phenomenon whereby an organization literally forgets how to run its business. How does this occur? Generally, your most senior people are your most experienced personnel; they have the most well-developed skill sets for operating and maintaining the plant. If your senior people retire in large numbers, they walk out with the intellectual property that’s required to run the plant. Clear procedures geared for the 10th-percentile people convert that tacitly held “tribal” knowledge and know-how into formalized intellectual property. Once that knowledge is formalized, you’re in a position to drive consistent quality, manage organizational change and best practices, and enable management of change (MOC), continuous improvement, continuity, and sustainability.
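The interruption scenario in the second point (a call or a break in the middle of a job) can be illustrated with a minimal Python sketch of a checklist that records each sign-off, so that whoever picks the job back up can see exactly where it stopped. Everything here is hypothetical and for illustration only: the `Checklist` class, its method names, and the three sample steps are assumptions, not part of any real maintenance system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checklist:
    """An ordered procedure that remembers which steps are signed off."""
    steps: List[str]
    completed: List[bool] = field(init=False)

    def __post_init__(self) -> None:
        # Nothing is signed off when the job starts.
        self.completed = [False] * len(self.steps)

    def next_step(self) -> Optional[int]:
        """Return the index of the first open step, or None when done."""
        for index, done in enumerate(self.completed):
            if not done:
                return index
        return None

    def sign_off(self, index: int) -> None:
        """Sign off a step; steps may only be completed in order."""
        expected = self.next_step()
        if index != expected:
            raise ValueError(
                f"step {index} cannot be signed off; next open step is {expected}"
            )
        self.completed[index] = True


# Hypothetical steps, used only to show the resume behavior.
job = Checklist([
    "Lock out and tag out the motor",
    "Check belt tension against the specified range",
    "Reinstall the guard and remove the lock",
])
job.sign_off(0)
# ...the technician is called away here...
resume_at = job.next_step()
print(f"Resume at step {resume_at + 1}: {job.steps[resume_at]}")
```

The record of sign-offs, not anyone’s memory, determines where work resumes, and the in-order rule prevents a step from being skipped after a distraction.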

I’m a big fan of creativity and creative people for analyzing and solving complex problems and innovating new ideas and opportunities. These activities represent the artistic side of plant reliability engineering and management. For day-to-day operation and maintenance of the plant, however, we don’t want to rely on the artist’s approach. We want to adopt the jet pilot’s approach, which is driven by checklists and procedures. I believe that just 10% of the organizational and process rigor employed in the aviation industry would deliver 90% of the benefit in improving the reliability, performance, safety, and environmental results of a typical plant or factory through better management of the human factors of failure.


Drew Troyer has more than 30 years of experience in the RAM arena. Currently a Principal with T.A. Cook Consultants, he was a Co-founder and former CEO of Noria Corporation. A trusted advisor to a global blue-chip client base, this industry veteran has authored or co-authored more than 250 books, chapters, course books, articles, and technical papers and is a popular keynote and technical speaker at conferences around the world. Drew is a Certified Reliability Engineer (CRE) and Certified Maintenance & Reliability Professional (CMRP), holds B.S. and M.B.A. degrees, and is a Master’s degree candidate in Environmental Sustainability at Harvard University. Contact him directly at 512-800-6031 or

Tags: reliability, availability, maintenance, RAM, workforce development, safety, human performance, USS Thresher