August 13, 2020, 11:47 am

Managing Human Error in Maintenance

Maintenance Quality Management – Key Principles

Drawing on Reason and Hobbs (vii), a Maintenance Quality Management system must embrace the following principles:

  • Human error is both universal and inevitable.  Human error is not a moral issue: making errors is as much a part of human life as eating and breathing.
  • Errors are not intrinsically bad.  Success and failure spring from the same roots.  We are error-guided creatures.  Errors mark the boundaries of the path to successful action.
  • You cannot change the human condition, but you can change the conditions in which humans work.  There are two parts to an error – a mental state and a situation.  We have limited control over people’s mental states, but we can control the situations in which they have to work.
  • The best people can make the worst mistakes.  No one is immune to error – if only a few people were responsible for most of the errors, then the solution would be simple, but some of the worst mistakes are made by the most experienced people.
  • People cannot easily avoid those actions they did not intend to commit.  Blame and punishment are not appropriate when people’s intentions were good but their actions did not go as planned.  This does not mean, however, that people should not be held accountable for their actions; they should be, and they should also be given the opportunity to learn from their mistakes.
  • Errors are consequences, rather than causes.  Errors are the product of a chain of actions and conditions which involve people, teams, tasks, workplace and organisational factors.  Discovering a human error is the beginning of the search for causes, not the end.
  • Many errors fall into recurrent patterns. More than half of maintenance errors are recognised as having happened before – often many times.  Targeting these recurrent errors is the most effective way of addressing human error issues.
  • Safety-significant errors can occur at all levels in the system.  Indeed, the higher in an organisation that an error is made, the more significant the consequences.
  • Error Management is all about managing the manageable.  Situations are manageable – human nature, in its broadest sense, is not.
  • Maintenance Quality Management is about making good people excellent.  It is not about making a few error-prone people better; rather, it is a way of making the larger proportion of well-trained and motivated people excellent.
  • There is no one best way.  Different Maintenance Quality Management methods will apply in different situations, and in different organisations.
  • Effective Maintenance Quality Management aims at Continuous Reform rather than Local Fixes.  The temptation is to resolve errors one at a time, as they arise, but as errors tend to be systemic in nature, a more appropriate method is to deal with human error systematically, and continuously.
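
As an illustration of the principle that many errors fall into recurrent patterns, the sketch below counts how often each category of error appears in a log of reported errors and lists those that have occurred more than once.  The log entries, the category names and the `min_count` threshold are all hypothetical; in practice the categories would come from the organisation's own error reporting system.

```python
from collections import Counter

# Hypothetical error log: one category label per reported maintenance error.
error_log = [
    "omitted step", "wrong part fitted", "omitted step", "tool left in equipment",
    "omitted step", "wrong part fitted", "incorrect torque", "omitted step",
]


def recurrent_errors(log, min_count=2):
    """Return (category, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(log)
    return [(category, n) for category, n in counts.most_common() if n >= min_count]


for category, count in recurrent_errors(error_log):
    share = count / len(error_log)
    print(f"{category}: {count} reports ({share:.0%} of all reports)")
```

A ranking of this kind only shows where to start looking; each recurrent category still needs to be investigated for its underlying causes.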

There are a number of Maintenance Quality Management tools that can be applied.  The exact combination of these that is most appropriate for any organisation varies, but they could include:

Person Measures

Provide training in error-provoking factors.  Training maintenance personnel so that they understand the factors and situations that make them more error-prone is a starting point in successfully addressing human error.  They should understand such factors as the limitations of human performance, the limitations of short-term memory, the impact of fatigue, the impact of interruptions, the impact of pressure and stress, the types of errors that they can make, and the situations in which these errors are most likely to arise.  Once maintainers are aware of their own limitations, they can start to detect the warning signals that indicate a higher risk of an error being made, and can take steps to prevent it.

Implement measures to reduce the number of deliberate violations.  Traditional approaches to the avoidance of violations tend to focus on scaring people into compliance.  This may have its place, but an additional, effective approach is to create a social environment within the workplace where deliberate violations bring disapproval from within people’s peer groups.  A number of approaches are being tried, both within and outside the workplace, which appear to be successfully creating this social environment, but overnight success stories are rare.

Encourage mental rehearsal of tasks before they are performed.  There is considerable evidence to suggest that achieving the right degree of mental readiness for a task before it begins has a significant positive impact on the quality and reliability with which the task is performed.  This evidence comes from studies of surgeons and Olympic athletes.(viii)

Control Distractions.  Anticipating the distractions that are likely to occur, and developing a strategy for dealing with them before they occur, is likely to enhance the quality of task performance.

Avoid Place-Losing Errors.  Use techniques such as inserting place-markers at appropriate points in the procedure.

Team Measures

Provide teamwork training.  Significant accidents have occurred as a result of poorly functioning teams.  The most notable of these was an aircraft accident involving KLM and Pan Am 747s at Tenerife, which resulted in the loss of more than 500 lives.  Effective teamwork training will focus on:

  • Communication skills
  • Crew development and leadership skills
  • Workload management, and
  • Technical proficiency

Workplace and Task Measures

Ensure that personnel only perform tasks when they are properly trained, skilled and qualified.  It goes without saying that quality work practices can only be put in place when maintenance personnel have the technical skills and capabilities required to perform the work that is allocated to them.

Fatigue Management.  Ensure that a well-designed shift roster is in place which minimises the impact of fatigue.  Also ensure that there are adequate controls in place for managing overtime work.

Assign tasks appropriately.  There is evidence to suggest that there is a link between the frequency with which a task is performed, and the likelihood that the task will be performed correctly.  Both infrequently performed and very frequently performed tasks tend to be those at greatest risk of human error.  Infrequently performed tasks are generally more at risk because of the lack of experience of the person performing the task, while very frequently performed tasks fall victim to skill-based slips and lapses, as the person performing the work operates on “auto-pilot”.  Intelligent allocation of work to individuals takes this into account, and can assist in minimising human error.
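
One way a planner might apply this when allocating work is sketched below: each person's recent history with a task is classified as unfamiliar, routine or normal.  The two thresholds and the `TaskHistory` record are assumptions made purely for illustration; appropriate values would depend on the task and on local experience.

```python
from dataclasses import dataclass

# Hypothetical thresholds (times performed in the last year); not taken from any study.
RARE_BELOW = 2
ROUTINE_ABOVE = 100


@dataclass
class TaskHistory:
    task: str
    times_performed_last_year: int


def frequency_risk(history: TaskHistory) -> str:
    """Classify the error risk suggested by how often a person has performed a task."""
    n = history.times_performed_last_year
    if n < RARE_BELOW:
        # Lack of recent experience with the task.
        return "unfamiliar: consider supervision or refresher training"
    if n > ROUTINE_ABOVE:
        # Highly practised work is prone to skill-based slips and lapses.
        return "routine: watch for auto-pilot slips and lapses"
    return "normal"


print(frequency_risk(TaskHistory("gearbox overhaul", 0)))
print(frequency_risk(TaskHistory("daily lubrication round", 250)))
```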

Ensure that equipment, and tasks, are properly designed.  In order to minimise the likelihood of error in performing maintenance tasks, the equipment should be designed for maintainability.  This should include consideration of such factors as:

  • Easy access to components
  • Components that are functionally related should be grouped together
  • Components should be clearly labelled
  • There should be minimal requirement for special tools
  • It should not be necessary to perform high-precision work in the field
  • Equipment should be designed to permit easy fault diagnosis

Enforce good housekeeping standards.  Housekeeping practices are a good indicator of attitudes and culture relating to quality.  The correct standards are those that avoid dangerous slovenliness, without resorting to anally-retentive cleanliness.

Ensure Spare Parts and Tools are managed well.  Maintenance personnel cannot perform high-quality work if the parts and tools that they need are not available when required.  Shortages lead to potentially dangerous short-cuts and workarounds being put in place.  An important aspect of Maintenance Quality Management is ensuring that Tool Management and Spare Parts Management processes and practices support the achievement of high-quality work.

Write, and Use, Effective Maintenance Work Instructions.  Omission of necessary steps is the most common form of maintenance error.  Some estimates suggest that omissions account for more than half of all human factors problems in maintenance.  The development, and use, of effective maintenance work instructions is an important tool in managing these types of errors.

Organisational Measures

Put in place effective processes for analysing, and learning from, past failures.  It is vitally important that any significant failures are investigated using an effective Root Cause Analysis process.  To be effective, this process should fully investigate all of the contributing causes of the failure, whether these are physical causes, human causes, or organisational causes.  The most effective solutions for preventing these failures from happening again will be those that deal with the organisational causes.

However, in order to analyse failures that result from human error, it is also necessary to engender a “Reporting Culture” within the organisation, in which all failures, no matter how seemingly insignificant, are reported.  This, in turn, particularly when we are dealing with human errors, requires the development of a high level of trust between management and those at lower levels in the organisation.  People must not feel that reporting human failures is likely to lead to adverse personal consequences.  Those who have researched so-called “High Reliability Organisations” (HROs) have noted that high levels of failure reporting are a significant feature of those organisations.(ix)

Put in place proactive processes for assessing the risk of future maintenance errors.  Avoiding the recurrence of past failures is an admirable, but insufficient, goal for those seeking to achieve high quality maintenance outcomes.  One proactive method is to perform a risk assessment of maintenance activities, in order to assess whether the likelihood of human error is high.  Areas that could be assessed in this risk assessment include:

  • The knowledge, skills and experience of maintenance personnel at all levels
  • Employee morale
  • The availability of tools, equipment and parts to perform maintenance tasks
  • Workforce fatigue, stress and time pressures
  • Shift rosters
  • The adequacy of maintenance procedures and work instructions
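
A simple way to record such an assessment is sketched below.  Each of the six areas listed above is given a score, and the areas of greatest concern are returned with the worst first.  The 1 to 5 scale, the `concern_at` threshold and the example scores are assumptions made for illustration only; this is not a description of any established assessment method.

```python
# The areas follow the list above; the scoring scale and threshold are assumptions.
AREAS = [
    "knowledge, skills and experience",
    "employee morale",
    "availability of tools, equipment and parts",
    "fatigue, stress and time pressures",
    "shift rosters",
    "adequacy of procedures and work instructions",
]


def areas_of_concern(scores: dict[str, int], concern_at: int = 4) -> list[str]:
    """Return areas scored at or above concern_at (1 = low risk, 5 = high risk), worst first."""
    missing = [area for area in AREAS if area not in scores]
    if missing:
        # Refuse to report on an incomplete assessment.
        raise ValueError(f"unscored areas: {missing}")
    flagged = [area for area in AREAS if scores[area] >= concern_at]
    return sorted(flagged, key=lambda area: scores[area], reverse=True)


# Hypothetical scores from one assessment, in the same order as AREAS.
example_scores = dict(zip(AREAS, [2, 3, 4, 5, 2, 4]))
for area in areas_of_concern(example_scores):
    print(f"{area}: {example_scores[area]}")
```

Raising an error when an area has not been scored is a deliberate choice in this sketch, so that a gap in the assessment cannot be mistaken for a low-risk result.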

One example of a risk assessment process used in the aviation industry is Managing Engineering Safety Health (MESH), which was developed initially by British Airways in the early 1990s, and has been further developed and adapted by Singapore Airlines.(x)

In addition, more specific review and assessment of error detection and containment defences can be performed.  This could ask questions such as:

  • Are there adequate processes in place for independent inspection of high-risk tasks?
  • Are functional tests and checks ever omitted or abbreviated, for any reason?
  • Have tasks ever been signed off as completed, when this was subsequently found not to be the case?
  • After maintenance, is equipment adequately tested before being returned to service?

Ultimately, even putting both proactive and reactive measures in place will not guarantee the absence of human error, but together these measures strengthen the organisation’s intrinsic resistance to human error.


The impact of human error on maintenance quality and costs, safety and equipment reliability is huge.  Yet we are only just starting to develop a better understanding of what causes error in maintenance activities, and to develop better tools and techniques to avoid or minimise the consequences of this error.  This paper has attempted to outline some of the latest research findings, and provide you with some ideas that you may find useful in addressing maintenance error within your organisation. 


[i] Nowlan FS & Heap H – Reliability-centered Maintenance.  Springfield, Virginia: National Technical Information Service, US Department of Commerce, 1978.

[ii] Davis RA – Human Factors in the Global Marketplace – Keynote address, Annual Meeting of the Human Factors and Ergonomics Society, Seattle, 12 October 1993

[iii] Smith A – Reliability Centered Maintenance – Boston, McGraw Hill, 1992

[iv] Reason J – Managing the Risks of Organizational Accidents – Ashgate Publishing, 1997

[v] Boeing – Maintenance Error Decision Aid, Seattle: Boeing Commercial Airplane Group, 1994

[vi] Reason J & Hobbs A – Managing Maintenance Error, Ashgate Publishing, 2003

[vii] Reason J & Hobbs A – Managing Maintenance Error, Ashgate Publishing, 2003

[viii] Orlick T – In Pursuit of Excellence – Ottawa, Zone of Excellence, 2000

[ix] See, for example, Weick KE & Sutcliffe KM – Managing the Unexpected: Assuring High Performance in an Age of Complexity – Jossey-Bass, 2001

[x] See Reason J – Managing the Risks of Organizational Accidents – Ashgate Publishing, 1997
