Planning for Failure and Its Investigation

In the world of maintenance and asset management, equipment failure is an all too common occurrence. Unfortunately, setting in place a strategy and methodology to understand and learn from each equipment failure is not a common occurrence.

It’s through an acute understanding of failure that the best preventive, predictive, and proactive programs are realized. Reliability Centered Maintenance (RCM) methodology relies heavily on the up-front Failure Modes and Effects Analysis (FMEA) process, which analyzes how the equipment or machine is to be operated, loaded, and stressed in order to predetermine the mode in which each component or system might fail and how it will fail, and to evaluate the effect or consequence of each failure. This is a valuable, but time-consuming and expensive, exercise that is not readily available to many maintenance departments.

There is an alternative. In lieu of an RCM program, a less onerous and less expensive failure-understanding method that any maintenance department can employ is the Failure Scene Investigation (FSI) approach to asset failure.

IMPLEMENTING AN FSI APPROACH
When a machine breakdown occurs, everyone agrees on the importance of a fast turnaround to get the machine up and running, but at what cost to the business? In a best-practice organization, a maintainer is expected to approach each and every breakdown in a logical manner that requires him or her to perform a series of functions that include the following:

1. Assess the safety of the situation.

2. Assess the failure.

3. Assess if the maintainer’s maintenance ticket allows him or her to complete the repair.

4. Assess and develop a repair strategy (can the work be completed immediately, or are special
parts and tools required to complete the repair at another time?).

5. Perform the repair.

6. Document time and parts used.

7. Hand machine back to operations.

8. Assess and determine if the event was a maintenance failure.

9. Assess and determine the probable root cause of failure.

10. Assess if the current preventive-maintenance (PM) or predictive-maintenance (PdM) program
should have caught the failure and make recommendations to the planner.

If your organization only performs functions 1 through 7 above, it has plenty in common with the majority of maintenance departments that excel in a reactive approach. Assuring maximum asset availability calls for a cold, calculated, proactive maintenance approach that demands recognition and understanding of each and every equipment failure.
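One way to make this distinction visible is to log which of the ten functions were actually completed for each breakdown. The following Python sketch is illustrative only: the enum names, the `classify_response` helper, and the three labels it returns are assumptions made for this example, not part of any particular CMMS.

```python
from enum import IntEnum


class BreakdownFunction(IntEnum):
    """The ten breakdown-response functions, numbered as in the list above."""
    ASSESS_SAFETY = 1
    ASSESS_FAILURE = 2
    CHECK_MAINTENANCE_TICKET = 3
    DEVELOP_REPAIR_STRATEGY = 4
    PERFORM_REPAIR = 5
    DOCUMENT_TIME_AND_PARTS = 6
    HAND_BACK_TO_OPERATIONS = 7
    ASSESS_MAINTENANCE_FAILURE = 8
    DETERMINE_ROOT_CAUSE = 9
    REVIEW_PM_PDM_COVERAGE = 10


# Functions 1-7 restore the machine; functions 8-10 learn from the failure.
REPAIR_FUNCTIONS = {f for f in BreakdownFunction if f <= 7}
LEARNING_FUNCTIONS = {f for f in BreakdownFunction if f >= 8}


def classify_response(completed):
    """Label one breakdown response by the functions that were completed.

    `completed` is any iterable of function numbers (1-10) or enum members.
    The labels are hypothetical and chosen only for this sketch.
    """
    done = {BreakdownFunction(f) for f in completed}
    if not REPAIR_FUNCTIONS <= done:
        return "incomplete"   # the repair sequence itself was not finished
    if LEARNING_FUNCTIONS <= done:
        return "proactive"    # all ten functions were performed
    return "reactive"         # machine is running, but nothing was learned


print(classify_response(range(1, 8)))    # functions 1 through 7 only
print(classify_response(range(1, 11)))   # all ten functions
```

Counting how many closed work orders fall under each label over a month gives a rough indication of how often functions 8 through 10 are being skipped.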

By nature, every maintainer must be a Failure Scene Investigator, responsible for equipment reliability through better understanding of equipment failure. Any time a piece of equipment or component fails, it leaves behind an evidence trail that can be documented and analyzed to determine the root failure cause and, in turn, fuel development of a suitable failure avoidance/management strategy.

Sadly, given the haste to “keep the equipment running at all costs,” many failure scenes are contaminated by maintainers themselves, with vital evidence ignored and simply thrown into the trash, and no photos taken or analysis performed. Incorporating a forensic (criminal) style of investigation, the following eight steps lay out an innovative procedure manual for better understanding and dealing with equipment failure.

1. Secure the scene. Prior to performing any “hands on” work, collaborate with operations to ensure
the area is safe and collectively perform a qualitative evaluation of the failure scene before commencing
repairs and/or restarting the equipment.

2. Photograph the scene. “A picture is worth a thousand words!” Photos and videos allow reliability
engineers, planners and other maintainers to revisit the failure scene again and again well after the
equipment is back up and running. This also provides excellent training material for preventing future
failures. (Note: Always place a 6-in. rule against photographed items to help assess scale.)

3. Perform on-scene diagnostics. Now is the time for the maintenance/reliability group to conduct
on-scene diagnostics that can include infrared signatures, oil analysis signatures, temperatures,
water evidence, and the like.

4. Bag and tag all physical failure evidence. Once all local physical evidence of tampering and breakage
has been photographed, tagged and bagged, the actual failed components can be dismantled and replaced.
Any parts for repair must be photographed. Any parts requiring replacement must also be bagged and
tagged. Use bubble wrap, heavy-duty freezer bags and heavy-duty cling wrap (for larger items) to protect
components and evidence.

5. Interview witnesses. Operators can describe any abnormal sound, smell, or vibration emanating from
the equipment prior to failure.

6. Code the failure on the work order. Complete the work order with a report of the findings, making
sure to include any failure symptom codes on it.

7. Perform necessary laboratory analysis. Examine all past failure records and diagnostic readings and
conduct any necessary destructive testing and metallurgical and/or oil analysis, etc., by sending out to
a recognized lab.

8. Analyze findings. Write up a Root Cause Analysis of Failure (RCAF) report. Discuss findings and
recommendations with planner and supervisor to update the proactive-maintenance program accordingly.
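The eight steps above map naturally onto a single record kept against the work order. The Python sketch below shows one possible shape for such a record, with a helper that lists the steps still lacking any documentation. All class names, field names, and sample values are hypothetical, and for simplicity the sketch treats every step as required, even though laboratory analysis (step 7) is only performed when necessary.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    tag_id: str                     # label written on the bag or wrap
    description: str
    photo_files: list = field(default_factory=list)
    replaced: bool = False          # True if the part was swapped out
    bagged: bool = False

    def is_documented(self):
        # Every part is photographed; replaced parts are also bagged and tagged.
        return bool(self.photo_files) and (self.bagged or not self.replaced)


@dataclass
class FailureSceneRecord:
    work_order: str
    asset_id: str
    scene_secured: bool = False
    scene_photos: list = field(default_factory=list)
    diagnostics: dict = field(default_factory=dict)   # e.g. {"oil_analysis": "..."}
    evidence: list = field(default_factory=list)      # EvidenceItem objects
    witness_notes: list = field(default_factory=list)
    symptom_codes: list = field(default_factory=list)
    lab_reports: list = field(default_factory=list)
    rcaf_summary: str = ""

    def outstanding_steps(self):
        """Return (number, name) for each FSI step with nothing recorded yet."""
        evidence_ok = bool(self.evidence) and all(
            item.is_documented() for item in self.evidence
        )
        checks = [
            (1, "Secure the scene", self.scene_secured),
            (2, "Photograph the scene", bool(self.scene_photos)),
            (3, "Perform on-scene diagnostics", bool(self.diagnostics)),
            (4, "Bag and tag all physical failure evidence", evidence_ok),
            (5, "Interview witnesses", bool(self.witness_notes)),
            (6, "Code the failure on the work order", bool(self.symptom_codes)),
            (7, "Perform necessary laboratory analysis", bool(self.lab_reports)),
            (8, "Analyze findings", bool(self.rcaf_summary.strip())),
        ]
        return [(number, name) for number, name, done in checks if not done]


# Usage with made-up identifiers:
record = FailureSceneRecord(work_order="WO-0001", asset_id="PUMP-01")
record.scene_secured = True
record.scene_photos.append("scene_overview.jpg")
record.evidence.append(
    EvidenceItem("TAG-1", "failed bearing", ["bearing.jpg"], replaced=True, bagged=True)
)
for number, name in record.outstanding_steps():
    print(f"Step {number} still open: {name}")
```

A planner could run a check like this before closing the work order, so that a repair completed in haste does not also close out an investigation that never took place.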

Adopting the FSI methodology described above requires a disciplined “planned and scheduled” approach to be in place, one wherein a maintainer is afforded time to perform both the initial on-site analysis and the RCAF follow-up. This, in itself, is another critical step toward RAM excellence.

ABOUT THE AUTHOR
Ken Bannister has 40+ years of experience in the RAM industry. For the past 30, he’s been a Managing Partner and Principal Asset Management Consultant with Engtech Industries Inc., where he has specialized in helping clients implement best-practice asset-management programs worldwide. A founding member and past director of the Plant Engineering and Maintenance Association of Canada, he is the author of several books, including three on lubrication, one on predictive maintenance, and one on energy reduction strategies, and is currently writing one on planning and scheduling. Contact him directly at 519-469-9173 or kbannister@theramreview.com.

