
The goal of any reliability-focused asset management system is to avoid defects. Doing so requires a comprehensive management system that includes defect avoidance, defect detection and, finally, defect correction. Defect management for most organizations is oriented toward defect elimination, which is supplemented by a defect-detection process. Regrettably, defect avoidance, which is arguably the greatest dividend-generating aspect of the defect-management process, rarely gets much attention.

Let’s discuss the creation of a closed-loop system for avoiding, detecting, analyzing, and correcting defects in the plant. As shown in Fig. 1, the system includes a continuous-improvement loop that promotes desirable organizational behaviors to drive the process.

Fig. 1. This closed-loop system for avoiding, detecting, analyzing, and correcting defects includes a continuous-improvement loop that promotes desirable organizational behaviors to drive the process.

Your best approach for managing defects is to avoid them in the first place. We avoid defects by proactively controlling the forcing functions that lead to failure. By taking aim at mechanical and electrical balance, alignment, mechanical and electrical fasteners, and lubrication selection, application, and contamination control, we replace shaky, noisy, hot, and dirty machine operations with equipment that runs smooth, quiet, cool, and clean. These machines simply last longer (in many cases, much longer) and produce fewer defects to detect, analyze, and correct. It’s the law of large numbers. Fortunately, managing these underlying root causes of machine failure is comparatively easy.

For more detail, refer to my article “Optimum Reference States for Precision Maintenance.”

Naturally, we also must operate equipment correctly to avoid defects. Machines must be started and stopped with precision. Product and process changeovers must also be performed with a high degree of precision. Moreover, machines must be operated within their sustainable range of operating loads and speeds. It is very important that organizational leadership focus on proactive rewards that will extrinsically and intrinsically promote proactive behaviors.

Even the most robust efforts to avoid defects aren’t 100% effective. We must constantly survey machines with various forms of inspections and monitoring. This surveillance may take the form of sensory or simple gauge-assisted inspection rounds conducted by operators or craftspeople. Or, it could be in the form of condition monitoring activities, such as vibration analysis, ultrasonic analysis, lubricant analysis, thermographic analysis, motor analysis, or non-destructive testing performed by specialized technicians. In all instances, equipment surveillance detects proactive or predictive opportunities.

Proactive detection is ideal because it reveals a forcing condition that, if left uncorrected, will increase machine degradation, and, in turn, allows us to intervene to eliminate the undesirable condition. Predictive detection is valuable because it reveals early-stage damage that may be addressed very early, which, in turn, avoids collateral damage, enables planning and scheduling of maintenance, and has a minimal impact on production throughput.

Occasionally, a defect requires detailed analysis so that it may be better understood and eliminated. Typically, defect elimination requires a root-cause analysis (RCA).

Root-cause analysis differs from simple why-why (or five-why) analysis in terms of both philosophy and execution. Why-why analysis should be a routine activity that’s a part of the culture. Operators and craftspeople who routinely ask themselves, “Why did this happen?” are in a position to make minor adjustments in equipment operation, inspection, and care. Why-why analysis is a simple process of asking “why” in a progressive series until a root cause is found and, thus, can be addressed. More serious problems require a more serious intervention, which is where RCA enters the scene.

A problem may be considered serious either as a function of magnitude, meaning a really significant failure or near miss, or as a function of frequency, meaning a problem that occurs so often that it’s pervasive. Risk is magnitude multiplied by frequency, and it is the basis for determining whether RCA is required.
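As a rough illustration of that screening rule, the sketch below scores each failure as magnitude multiplied by frequency and flags those above a cutoff. The threshold, units (dollars per occurrence, occurrences per year), and example events are assumptions made up for illustration; each site would need to set its own.

```python
from dataclasses import dataclass

# Hypothetical cutoff (expected cost per year); not a value from the article.
RCA_RISK_THRESHOLD = 50_000.0


@dataclass
class FailureEvent:
    name: str
    magnitude: float  # consequence per occurrence, e.g. cost in dollars (assumed unit)
    frequency: float  # occurrences per year (assumed unit)

    @property
    def risk(self) -> float:
        # Risk is magnitude multiplied by frequency.
        return self.magnitude * self.frequency


def requires_rca(event: FailureEvent, threshold: float = RCA_RISK_THRESHOLD) -> bool:
    """Flag an event for RCA when its risk meets or exceeds the threshold."""
    return event.risk >= threshold


# Illustrative events only: one serious by magnitude, one by frequency, one minor.
events = [
    FailureEvent("Compressor seizure", magnitude=400_000.0, frequency=0.2),
    FailureEvent("Conveyor belt mistracking", magnitude=1_500.0, frequency=40.0),
    FailureEvent("Gauge glass fogging", magnitude=200.0, frequency=5.0),
]

rca_candidates = [e.name for e in events if requires_rca(e)]
print(rca_candidates)
```

Both the rare, costly seizure and the cheap but pervasive mistracking cross the assumed cutoff, while the minor nuisance stays in the routine why-why category.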

When conducting an RCA, one does not progress through a simple series of “why” questions. Rather, it is necessary to identify the universe of possible causes. I like the taxonomy of failure causes that’s defined in DOE NE 1004, which is a US Department of Energy guide for conducting RCA in a nuclear power plant. It’s generally applicable, easy to use, and free. (Email me for a PDF of the document.) Then, you go through a series of why-why analyses for each possible cause.

The goal of an RCA is to eliminate causes that you know DID NOT contribute to the failure. That, hopefully, leaves you with a manageable set of causal factors that you DO know contributed to the failure mode under investigation along with those factors that you simply can’t eliminate from the suspect list. The outcome of the RCA is to make changes in design of the asset, operation of the asset, or maintenance and care of the asset that will prevent recurrence of failure or reduce the risk to an acceptable level.
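The elimination step can be pictured as simple bookkeeping over the candidate causes. In this minimal sketch, the status labels and the example causes are hypothetical placeholders (they are not taken from the DOE guide); the point is only that the suspect list keeps both confirmed contributors and anything that cannot be ruled out.

```python
from enum import Enum


class Status(Enum):
    ELIMINATED = "known not to have contributed"
    CONFIRMED = "known to have contributed"
    UNRESOLVED = "cannot be ruled out"


def suspect_list(causes: dict[str, Status]) -> list[str]:
    """Return the causes that remain after dropping those known not to contribute."""
    return [cause for cause, status in causes.items() if status is not Status.ELIMINATED]


# Hypothetical findings for a bearing failure, for illustration only.
findings = {
    "Lubricant contamination": Status.CONFIRMED,
    "Shaft misalignment": Status.UNRESOLVED,
    "Operation above rated speed": Status.ELIMINATED,
    "Incorrect fastener torque": Status.ELIMINATED,
}

print(suspect_list(findings))
```

Corrective actions in design, operation, or maintenance would then target everything left on the list, not just the confirmed items.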

Whether or not a failure requires RCA, it’s going to require correction. Here, the key is to make sure the job is planned properly to include understandable work instructions, an accurate and complete Bill of Materials (BOM), high-quality parts, a clear definition of required tools (including any special tools), and required skills to execute the task effectively (including those of specialty contractors where required).

The work instructions must specify fit, tolerance, quantity, and quality details. For example, it’s not enough to merely say, “Install threaded fasteners.” The size and grade of the bolt, nut, and washers must be defined in the BOM. Likewise, the torque value, tightening sequence, and fastener-lubrication requirements must be defined in the work instructions. Other precision elements associated with the job must also be plainly defined and specified. Furthermore, the plan must include a clear definition for the successful completion of the job, so that the supervisor or a master tradesperson may conduct a job-quality inspection.
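One way a planner might check a fastener step for completeness is sketched below. The field names and sample values are assumptions for illustration, not engineering guidance, and for brevity the sketch combines BOM details (size, grade) and work-instruction details (torque, sequence, lubrication) in a single record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FastenerStep:
    description: str
    bolt_size: Optional[str] = None            # BOM detail
    bolt_grade: Optional[str] = None           # BOM detail
    torque_nm: Optional[float] = None          # work-instruction detail
    tightening_sequence: Optional[str] = None  # work-instruction detail
    thread_lubricant: Optional[str] = None     # work-instruction detail


def missing_details(step: FastenerStep) -> list[str]:
    """List the fields a planner still needs to specify for this step."""
    return [name for name, value in vars(step).items() if value is None]


# "Install threaded fasteners" alone leaves every precision detail undefined.
vague = FastenerStep("Install threaded fasteners")

# Sample values are made up for illustration only.
precise = FastenerStep(
    description="Install coupling guard bolts",
    bolt_size="M12",
    bolt_grade="8.8",
    torque_nm=70.0,
    tightening_sequence="cross pattern, two passes",
    thread_lubricant="none (dry threads)",
)

print(missing_details(vague))
print(missing_details(precise))
```

A check like this could run before a job is released for scheduling, so gaps surface at the planning desk and not at the machine.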

The defect-management process must be a closed loop. On completion of defect correction work, planners and reliability engineers require feedback.

Include the following feedback for planners:

    • What instructions were unclear?
    • What tools were required but not specified?
    • What parts were needed but not included in the BOM or job kit?
    • What risks were encountered completing the job?

Include the following feedback for reliability engineers:

    • What caused the failure to occur?
    • Could the failure have been avoided by modifying the design of the equipment or the parts used?
    • Could the failure have been avoided if the machine had been operated differently?
    • What difficulties did you encounter when completing the job?
    • If we had to buy the machine again today, what changes would you suggest?

A closed-loop process for defect management will pay big dividends for your organization. This is particularly true if you focus on the proactive elements. By controlling the physical forcing functions that cause machines to run shaky, noisy, hot, and dirty, focusing on building skills and know-how, and rewarding proactive behaviors, you can create a strong culture of defect avoidance. The payoff? Your equipment will be more reliable and less costly to own and operate, and your plant will be safer and produce fewer environmental impacts. What’s holding you back? TRR


Drew Troyer has more than 30 years of experience in the RAM arena. Currently a Principal with T.A. Cook Consultants, he was a Co-founder and former CEO of Noria Corporation. A trusted advisor to a global blue chip client base, this industry veteran has authored or co-authored more than 250 books, chapters, course books, articles, and technical papers and is a popular keynote and technical speaker at conferences around the world. Drew is a Certified Reliability Engineer (CRE) and Certified Maintenance & Reliability Professional (CMRP), holds B.S. and M.B.A. degrees, and is a Master’s degree candidate in Environmental Sustainability at Harvard University. Contact him directly at 512-800-6031.

Tags: reliability, availability, maintenance, RAM, fasteners, lubrication, alignment, balance, vibration, root-cause analysis