August 13, 2020, 12:24 pm

Managing Human Error in Maintenance


Numerous research studies have shown that over 50% of all equipment fails prematurely after maintenance work has been performed on it. In the most embarrassing cases, the maintenance work performed was intended to prevent the very failures that occurred. Building on the latest academic research, and based on practical experience, this paper outlines the key things that maintenance managers can do to reduce or eliminate the impact of human error in maintenance.

The key points that will be covered include:

  • Human error is inevitable - we ignore it at our peril
  • The role of an optimum PM program in minimising the impact of human error
  • Maintenance Quality Management - essential elements for managing maintenance error


In the ground-breaking work that led to the technique we now know as Reliability Centred Maintenance, Nowlan and Heap(i) analysed the failures of hundreds of mechanical, structural and electrical aircraft components and found that these failures followed six distinct patterns, as illustrated below.

[Figure: the six failure patterns]

The interesting finding, in the context of this paper, is that more than two-thirds of all components demonstrated early-life failure. It has been estimated that maintenance errors ranked second only to controlled-flight-into-terrain accidents in causing onboard aircraft fatalities between 1982 and 1991 (despite the application of RCM techniques in the airline industry during this period).(ii)

A study of coal-fired power stations indicated that 56% of forced outages occur less than a week after a planned or maintenance shutdown.(iii)

Other studies confirm these findings but, until recently, little research had investigated the reasons behind them. Several plausible theories have been proposed – possible explanations that I have heard include:

  • “Human Error” – the repair/replace task was not successfully completed due to a lack of knowledge or skill on the part of the person performing the repair.
  • “System Error” – the equipment was returned to service after a high-risk maintenance task without the repair having been properly inspected/tested.
  • “Design Error” – the capability of the component being replaced is too close to the performance expected of it, and therefore lower capability (quality) parts fail during periods of high performance demand. The remaining higher capability (quality) parts are capable of withstanding all performance demands placed on them. This could be envisaged in the following graph:

[Figure: Design Error]

  • “Parts Error” – the incorrect part or an inferior quality part has been supplied.

More recently, James Reason(iv) has compiled a table summarising the results of three surveys – two performed by the Institute of Nuclear Power Operations (INPO) in the USA, and one by the Central Research Institute for the Electrical Power Industry (CRIEPI) in Japan.  In all three of these studies, more than half of all identified performance problems were associated with maintenance, calibration and testing activities.  In comparison, on average only 16% of problems occurred while these power stations were operating under normal conditions.

Reason also quoted the results of a Boeing study(v), which indicated that the top seven causes of in-flight engine shutdowns (IFSDs) in Boeing aircraft were as follows:

  • Incomplete installation (33%)
  • Damaged on installation (14.5%)
  • Improper installation (11%)
  • Equipment not installed or missing (11%)
  • Foreign Object Damage (6.5%)
  • Improper fault isolation, inspection, test (6%)
  • Equipment not activated or deactivated (4%)

We can see from this that only one of these causes was unrelated to maintenance activities, and that maintenance activities contributed to at least 80% of all IFSDs.
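The arithmetic behind that figure can be checked with a short sketch. It uses only the percentages listed above; the choice of Foreign Object Damage as the one cause unrelated to maintenance is an assumption made here for illustration, since the text does not name it, and the variable names are likewise illustrative.

```python
# Top seven IFSD causes and their shares (percent), as listed in the text.
ifsd_causes = {
    "Incomplete installation": 33.0,
    "Damaged on installation": 14.5,
    "Improper installation": 11.0,
    "Equipment not installed or missing": 11.0,
    "Foreign Object Damage": 6.5,
    "Improper fault isolation, inspection, test": 6.0,
    "Equipment not activated or deactivated": 4.0,
}

# ASSUMPTION: Foreign Object Damage is the single cause treated as
# unrelated to maintenance. The source text does not say which one it is.
non_maintenance = {"Foreign Object Damage"}

# Share of all IFSDs covered by the seven listed causes.
total_listed = sum(ifsd_causes.values())

# Share attributable to maintenance under the assumption above.
maintenance_share = sum(
    pct for cause, pct in ifsd_causes.items() if cause not in non_maintenance
)

print(f"Top seven causes cover {total_listed}% of IFSDs")
print(f"Maintenance-related share (assumed split): {maintenance_share}%")
```

Under this assumption the maintenance-related causes sum to 79.5% of all IFSDs, which is close to the 80% quoted; excluding a different cause instead, or allowing for rounding in the published percentages, would shift the result by a point or two.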

If poor-quality maintenance causes so many incidents in highly regulated and hazardous industries such as Nuclear Power Generation and Civil Aviation, what proportion of failures might Maintenance be causing within your organisation?

What are the outcomes of maintenance-induced failures? Clearly, depending on the industry in which you operate, there are potentially significant safety and environmental risks. There is a long list of catastrophic failures in which the inadequate performance of a maintenance task played a significant role. Some of these include:

  • Flixborough
  • Three Mile Island
  • Piper Alpha
  • American Airlines Flight 191
  • Bhopal
  • Japan Airlines Flight 123
  • Clapham Junction
  • and many others

But besides the obvious safety risks, perhaps the bigger consequences are economic.  General Electric has estimated that each in-flight engine shutdown costs airlines in the region of US$500,000.  What could maintenance-induced failures be costing your organisation?

Clearly, we need to do something to reduce the number of equipment failures that are being caused, not prevented, by maintenance.  This paper suggests that the most appropriate approach is:

  • Admit that human error is inevitable (even in Maintenance!) and design our systems and processes around this inevitability
  • Use appropriate tools to ensure that we are not over-maintaining plant and equipment (every unnecessary task adds to the risk that work will be performed incorrectly), and
  • Work to improve the quality with which maintenance activities are performed – including error-proofing where possible.
