April 27, 2019, 12:07 am

Maintenance - Risky Business?

Maintenance is inherently risky.  In fact, you could argue that the only activity that is riskier than doing maintenance is NOT doing maintenance!

The nature of these risks is wide and varied, and includes:

  • Financial Risks (are your maintenance costs under control?)
  • Operational Risks (what is the impact of a maintenance duration overrun?)
  • Environmental Risks (what would be the impact of a spillage which occurred during a maintenance activity?)
  • Safety Risks (will the maintenance be performed safely?)
  • Quality Risks (will the equipment operate reliably after maintenance?)

In the context of maintenance, undertaking a risk assessment is the equivalent of looking before stepping into the street.  A risk assessment is a tool that helps ensure important steps are not overlooked or underestimated.  There is no doubt that undertaking equipment maintenance is risky.  We have to consider health, safety, and environmental implications.  Hurting people or damaging the environment is not only bad from a personal standpoint; it can severely impact your business's reputation.

In addition, what about the risk maintenance activities pose to your business's bottom line? For example, can your business tolerate a duration overrun on a major shutdown or overhaul?  In most cases the answer is 'no'.  Every hour of maintenance is an hour of lost production.  Given this, wouldn't it be nice to know ahead of time which maintenance activities were most likely to contribute to a duration overrun?  That way you could concentrate your effort on just those activities and come up with an action plan that removed the potential, or at the very least mitigated the overrun's impact.

Risk Analysis

There are two approaches to analysing risk: qualitative and quantitative.

Analysing risk using a qualitative approach ranks or separates tasks into descriptive categories such as low, medium, and high; or not important, important, and very important.  In terms of determining maintenance risks, a qualitative approach is best used to identify tasks, jobs, or entire work streams that should be further analysed via a more detailed method.

Quantitative Risk Analysis is that more detailed method.  A quantitative approach looks at the probability of a given event occurring and quantifies the impact (or range of impacts) of that event on your business.  Obtaining accurate data is the biggest challenge with applying quantitative techniques, and because of this, a quantitative approach can still produce results that are very subjective in nature.

So how do we go about applying greater objectivity in our risk analysis process?  As unlikely as it may seem, the answer might be found amongst the glitz, glamour, and bright lights of Monaco's Monte Carlo casino.

A 'Monte Carlo' simulation is a mathematical method used to approximate the distribution of potential results based on optimistic, expected, and pessimistic inputs.  Each simulation randomly pulls sample values for each input variable from a defined probability distribution.  These sample values are then used to calculate the results.  This is repeated until the statistical probability of a given event is sufficiently accurate.

In major shutdowns and other significant maintenance projects, 'Monte Carlo' simulations are most often performed using task durations from the project plan.  By using this technique you can determine the statistical probability of meeting a certain task duration.  Tasks with an unacceptably low probability of finishing within their planned duration are then identified, allowing mitigating actions to be assigned.  This enables effective risk management of schedule overrun.
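As a rough illustration of the idea, the Python sketch below simulates a chain of three sequential tasks and estimates the chance of finishing within a target duration.  The task names, the three-point estimates, and the 30 hour target are invented for the example, and the triangular distribution is an assumption here (one common choice, not the only one).

```python
import random

# Hypothetical three-point estimates in hours: (optimistic, expected, pessimistic)
TASKS = {
    "Isolate and drain pump": (3, 4, 8),
    "Replace mechanical seal": (10, 12, 20),
    "Reassemble and test": (6, 8, 14),
}


def simulate_total(rng):
    """One iteration: sample every task once and add the durations."""
    # random.triangular takes (low, high, mode), so reorder the estimates.
    return sum(rng.triangular(low, high, mode) for low, mode, high in TASKS.values())


def probability_within(target_hours, iterations=10_000, seed=1):
    """Estimate the chance that the whole chain finishes within target_hours."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    hits = sum(simulate_total(rng) <= target_hours for _ in range(iterations))
    return hits / iterations


if __name__ == "__main__":
    print(f"P(finish within 30 h) = {probability_within(30):.2f}")
```

Running the function for a range of targets gives a simple picture of how much schedule allowance buys how much confidence.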

Unfortunately, MS Project doesn't come with a built-in risk analysis tool.  To use a Monte Carlo simulation with MS Project you'll need to obtain an add-on tool.  The good news is that there is no shortage of commercially available options:

  • @RISK for Project (http://www.palisade.com)
  • RiskyProject (http://www.intaver.com)
  • Steelray Project Analyzer (http://www.steelray.com)
  • Full Monte (http://barbecana.com)

Now, there is no doubt these are all great tools.  They all have the ability to assist you in determining things such as:

  • the probability of 'on schedule' completion;
  • the probability of project completion within budget;
  • the probability of a certain task moving onto the critical path; and,
  • which tasks have the biggest effect on the project duration.
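To give a feel for what sits behind a figure such as the probability of a work stream becoming critical, the Python sketch below counts how often each of two parallel work streams turns out to be the longest.  The stream names and estimates are invented for illustration, the triangular distribution is an assumption, and none of this is meant to describe how the tools listed above actually work.

```python
import random

# Hypothetical parallel work streams; each task is
# (optimistic, expected, pessimistic) in hours.
STREAMS = {
    "Mechanical": [(10, 12, 20), (6, 8, 14)],
    "Electrical": [(8, 10, 12), (9, 10, 16)],
}


def criticality(iterations=10_000, seed=1):
    """Fraction of iterations in which each stream is the longest one."""
    rng = random.Random(seed)
    counts = {name: 0 for name in STREAMS}
    for _ in range(iterations):
        totals = {
            name: sum(rng.triangular(low, high, mode) for low, mode, high in tasks)
            for name, tasks in STREAMS.items()
        }
        # The longest stream in this iteration is the critical one.
        counts[max(totals, key=totals.get)] += 1
    return {name: n / iterations for name, n in counts.items()}


if __name__ == "__main__":
    for name, share in criticality().items():
        print(f"{name}: critical in {share:.0%} of iterations")
```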

However, before you make any significant capital outlay, I suggest you look at adopting a clever little MS Project macro developed by Jack Dahlgren.  Jack has kindly made his work freely available at http://masamiki.com/project/blackjack.htm.


 Figure 1: MS Project 'Monte Carlo' Table

Figure 1 shows the Microsoft Project table used to enter the required data for the Monte Carlo simulation.  Once all required data is entered, the Monte Carlo simulation can be run, with the results output to Excel.

Essentially, Jack's code picks up user-marked tasks that require analysis.  Optimistic and pessimistic durations are entered against those tasks.  These durations, in conjunction with the estimated task duration, are then used as the points of a triangular distribution.  The macro iterates through all the marked tasks, producing a data output to Excel.  This data can then be pivoted and graphed, allowing detailed analysis.
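The same flow can be sketched outside MS Project.  The Python example below is not Jack's macro and makes no claim to match its output; it simply mirrors the steps described above, using invented task names and estimates, and writes one row per task per iteration as CSV so that a spreadsheet can pivot it.

```python
import csv
import io
import random

# Hypothetical marked tasks: name -> (optimistic, estimated, pessimistic) days
MARKED_TASKS = {
    "Remove gearbox": (2, 3, 6),
    "Overhaul gearbox": (5, 7, 12),
}


def sample_rows(iterations=1000, seed=1):
    """Yield one (iteration, task, duration) row per marked task per iteration."""
    rng = random.Random(seed)
    for i in range(1, iterations + 1):
        for task, (low, mode, high) in MARKED_TASKS.items():
            # The three estimates are the points of the triangular distribution.
            yield i, task, round(rng.triangular(low, high, mode), 2)


def write_csv(stream, iterations=1000):
    """Write the samples in a long format that a spreadsheet can pivot."""
    writer = csv.writer(stream)
    writer.writerow(["iteration", "task", "duration_days"])
    writer.writerows(sample_rows(iterations))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(buffer, iterations=3)
    print(buffer.getvalue())
```

Passing an open file instead of the in-memory buffer produces a file that Excel can open directly.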

The real benefit of Jack's work lies in its adaptability.  With a little Visual Basic for Applications (VBA) knowledge, you can easily modify the code to meet your particular needs.  Once adapted, it will assist in risk analysis of critical path and near critical path work streams in any number of maintenance activities mapped using Microsoft Project.


Figure 2: 'Monte Carlo' Analysis Results Table

Figure 2 is just one example of the outputs that can be produced.  It shows the calculated expected duration of each task, its standard deviation, the task duration required for a 75% chance of on-time completion, and the task duration required for a 95% chance of on-time completion, and it recommends tasks for review based on statistical measures.  This output can then be used to determine areas within work streams where effort should be focussed and where actions are needed to reduce an unacceptable risk of schedule overrun.
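Measures of this kind are straightforward to compute from the simulated durations.  The Python sketch below shows one way to do it for a single task; the sample estimates are invented, and the review rule (flag a task when its standard deviation exceeds 20% of its expected duration) is purely an assumption for illustration, since the article does not state which statistical measures the table uses.

```python
import random
import statistics


def summarise(samples, cv_threshold=0.2):
    """Summarise simulated durations for one task."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    # quantiles(n=100) returns 99 cut points; index 74 is the 75th percentile.
    percentiles = statistics.quantiles(samples, n=100)
    return {
        "expected": mean,
        "std_dev": stdev,
        "p75": percentiles[74],
        "p95": percentiles[94],
        # Assumed rule: flag tasks with a high spread relative to their mean.
        "review": stdev / mean > cv_threshold,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical task: optimistic 5, expected 7, pessimistic 12 hours.
    samples = [rng.triangular(5, 12, 7) for _ in range(10_000)]
    for key, value in summarise(samples).items():
        print(key, value)
```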

We can then start applying some typical risk mitigation approaches:

  • Acceptance: Not changing the shutdown or project plan to deal with a risk.  This is usually done when the risk is determined to be either very unlikely or to have a very low impact.
  • Avoidance: Changing the plan to eliminate a risk or protect a project's objectives from the impact of the risk.
  • Contingency: Developing a plan of action to deal with the time or cost impacts of a risk occurring.
  • Risk Reduction: Taking actions to reduce the likelihood of a risk occurring, or its impact upon the project if it does.
  • Risk Transfer: Shifting the impact of a risk (along with its ownership) to a third party.

Once all identified areas of risk are addressed, we can tackle our maintenance activity with the confidence that we've done all we can to ensure we haven't overlooked important steps or underestimated a particular task.  The risk of activity overrun will be reduced and, most importantly, we'll have added value to our business's bottom line.

Brad Coulding

Assetivity, Principal Consultant
