Maintenance is inherently risky. In fact, you could argue that the only activity that is riskier than doing maintenance is NOT doing maintenance.

The nature of these risks is wide and varied, and includes:

  • Financial Risks (are your maintenance costs under control?)
  • Operational Risks (what is the impact of a maintenance duration overrun?)
  • Environmental Risks (what would be the impact of a spillage which occurred during a maintenance activity?)
  • Safety Risks (will the maintenance be performed safely?)
  • Quality Risks (will the equipment operate reliably after maintenance?)

In the context of maintenance, undertaking a risk assessment is the equivalent of looking before stepping into the street. A risk assessment is a tool that helps you ensure important steps are not overlooked or underestimated. There is no doubt that undertaking equipment maintenance is risky. We have to consider health, safety, and environmental implications. Hurting people or damaging the environment is not only bad from a personal standpoint; it can severely impact your business’s reputation.

In addition, what about the risk maintenance activities pose to your business’s bottom line? For example, can your business tolerate a duration overrun on a major shutdown or overhaul? In most cases the answer is ‘no’. Every hour of maintenance is an hour of lost production. Given this, wouldn’t it be nice to know ahead of time which maintenance activities were most likely to contribute to a duration overrun? That way you could concentrate your effort on just those activities and come up with an action plan that removed the potential, or at the very least mitigated the overrun’s impact.

Risk analysis

There are two approaches to analysing risk: qualitative and quantitative.

A qualitative approach ranks or separates tasks into descriptive categories such as low, medium, and high; or not important, important, and very important. In terms of determining maintenance risks, a qualitative approach is best used to identify tasks, jobs, or entire work streams that should be further analysed via a more detailed method.

Quantitative Risk Analysis is that more detailed method. A quantitative approach looks at the probability of a given event occurring and quantifies the impact (or range of impacts) of that event on your business. Obtaining accurate data is the biggest challenge with applying quantitative techniques, and because of this, a quantitative approach can still produce results that are very subjective in nature.
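As a rough illustration of what "probability and impact" can mean in practice, the sketch below ranks two risks by expected overrun cost, taken here as probability × overrun hours × cost per hour of lost production. The risk names, probabilities, hours, and cost figure are invented for the example, and expected cost is only one of several ways to quantify impact.

```python
def expected_overrun_cost(probability, overrun_hours, cost_per_hour):
    """Expected cost of a risk: chance it happens times what it costs if it does."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * overrun_hours * cost_per_hour


# Hypothetical risks: (name, probability, overrun hours, cost per lost hour)
risks = [
    ("Crane unavailable", 0.10, 12, 5000.0),
    ("Seal kit delivered late", 0.25, 8, 5000.0),
]

# Rank the risks so the largest expected cost comes first.
ranked = sorted(risks, key=lambda r: expected_overrun_cost(*r[1:]), reverse=True)

for name, probability, hours, rate in ranked:
    print(f"{name}: {expected_overrun_cost(probability, hours, rate):,.0f}")
```

The ranking is only as good as the probability estimates behind it, which is the subjectivity problem described above.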

So how do we go about applying greater objectivity in our risk analysis process? As unlikely as it may seem, the answer might be found amongst the glitz, glamour, and bright lights of Monaco’s Monte Carlo casino.

A ‘Monte Carlo’ simulation is a mathematical method used to approximate the distribution of potential results based on optimistic, expected, and pessimistic inputs. Each simulation randomly pulls sample values for each input variable from a defined probability distribution. These sample values are then used to calculate the results. This is repeated until the statistical probability of a given event is sufficiently accurate.
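The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool. It assumes the tasks run one after another, that each duration follows a triangular distribution built from the optimistic, expected, and pessimistic values, and that a fixed number of iterations is "sufficiently accurate". The task figures and the 24-hour target are made up.

```python
import random


def simulate_total_duration(tasks, iterations=10_000, seed=42):
    """Simulate the total duration of sequential tasks.

    tasks: list of (optimistic, expected, pessimistic) durations in hours.
    Returns one simulated total per iteration.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    totals = []
    for _ in range(iterations):
        # random.triangular takes (low, high, mode), in that order.
        total = sum(rng.triangular(opt, pess, exp) for opt, exp, pess in tasks)
        totals.append(total)
    return totals


def probability_within(totals, target):
    """Fraction of simulated runs that finished within the target duration."""
    return sum(1 for t in totals if t <= target) / len(totals)


# Hypothetical three-task job, durations in hours.
tasks = [(4, 6, 10), (8, 10, 16), (2, 3, 6)]
totals = simulate_total_duration(tasks)
print(f"Chance of finishing within 24 h: {probability_within(totals, 24):.0%}")
```

Increasing `iterations` tightens the estimate at the cost of run time. A real shutdown plan would also need to respect task dependencies and parallel work streams, which this sketch ignores.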

In major shutdowns and other significant maintenance projects, ‘Monte Carlo’ simulations are usually performed using task durations from the project plan. By using this technique you can determine the statistical probability of meeting a certain task duration. Unacceptable probability ranges in duration are then identified, allowing mitigating actions to be assigned. This enables effective risk management of schedule overrun.

Unfortunately, MS Project doesn’t come with a built-in risk analysis tool. To use a Monte Carlo simulation with MS Project you’ll need to obtain an add-on tool. The good news is there’s no shortage of these commercially available.

Now, there is no doubt these are all great tools. They all have the ability to assist you in determining things such as:

  • the probability of ‘on schedule’ completion;
  • the probability of project completion within budget;
  • the probability of a certain task moving onto the critical path; and,
  • which tasks have the biggest effect on the project duration.

Figure 1: MS Project ‘Monte Carlo’ Table

Figure 1 shows the Microsoft Project table used to enter the data required for the Monte Carlo simulation. Once all required data is entered, the simulation can be run, with the results output to Excel.

Essentially, Jack’s code picks up the tasks the user has marked for analysis. Optimistic and pessimistic durations are entered against those tasks. These durations, in conjunction with the estimated task duration, are then used as the points of a triangular distribution. The macro iterates through all the marked tasks and outputs the data to Excel. This data can then be pivoted and graphed, allowing detailed analysis.
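The workflow just described can be mimicked outside MS Project. The Python sketch below is not Jack’s macro, and every field name in it (`marked`, `optimistic`, `estimate`, `pessimistic`) is a hypothetical stand-in for whatever the project file actually holds. It samples a triangular distribution for each marked task and produces one row per task per iteration, written as CSV so it can be opened and pivoted in Excel.

```python
import csv
import io
import random


def sample_marked_tasks(tasks, iterations=1000, seed=1):
    """Return (task name, iteration, sampled duration) rows for marked tasks only."""
    rng = random.Random(seed)
    rows = []
    for task in tasks:
        if not task["marked"]:
            continue  # unmarked tasks are left out of the analysis
        for i in range(1, iterations + 1):
            # Optimistic, estimate, and pessimistic are the three points of the
            # triangle. random.triangular expects (low, high, mode).
            duration = rng.triangular(
                task["optimistic"], task["pessimistic"], task["estimate"]
            )
            rows.append((task["name"], i, round(duration, 2)))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["task", "iteration", "duration_hours"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical tasks, durations in hours.
tasks = [
    {"name": "Isolate pump", "marked": True, "optimistic": 2, "estimate": 3, "pessimistic": 6},
    {"name": "Replace seal", "marked": True, "optimistic": 4, "estimate": 6, "pessimistic": 10},
    {"name": "Site induction", "marked": False, "optimistic": 1, "estimate": 1, "pessimistic": 1},
]
rows = sample_marked_tasks(tasks)
csv_text = to_csv(rows)
```

The long, one-row-per-sample layout is a deliberate choice here: it is the shape a pivot table works with most easily.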

The real benefit of Jack’s work lies in its adaptability. With a little Visual Basic for Applications (VBA) knowledge, you can easily modify the code to meet your particular needs. Once adapted, it will assist in risk analysis of critical path and near critical path work streams in any number of maintenance activities mapped using Microsoft Project.

Figure 2: ‘Monte Carlo’ Analysis Results Table

Figure 2 is just one example of the outputs that can be produced. It shows the calculated expected duration of each task, its standard deviation, the task duration required for a 75% chance of on-time completion, and the task duration required for a 95% chance of on-time completion. It also recommends tasks for review based on statistical measures. This output can then be used to determine where within work streams effort should be focussed and which actions are needed to reduce an unacceptable risk of schedule overrun.
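The columns in a results table like this can be derived directly from the simulated durations. The sketch below shows one way to do it in Python for a single task. The review rule (flag the task when its 95% duration exceeds the planned duration by more than 25%) is an assumption made for illustration, as are the sample figures. The article does not say which statistical measures drive the recommendation.

```python
import random
import statistics


def summarise(samples, planned, review_ratio=1.25):
    """Summarise simulated durations for one task.

    review_ratio is a hypothetical threshold, not one taken from the article.
    """
    # quantiles(n=100) returns 99 cut points; index 74 is the 75th percentile
    # and index 94 is the 95th.
    cuts = statistics.quantiles(samples, n=100)
    p75, p95 = cuts[74], cuts[94]
    return {
        "expected": statistics.mean(samples),
        "std_dev": statistics.stdev(samples),
        "p75": p75,
        "p95": p95,
        "review": p95 > planned * review_ratio,
    }


# Hypothetical task: 8 h optimistic, 10 h planned, 16 h pessimistic.
rng = random.Random(7)
samples = [rng.triangular(8, 16, 10) for _ in range(5000)]
summary = summarise(samples, planned=10)

for key, value in summary.items():
    print(key, round(value, 2) if isinstance(value, float) else value)
```

Reading the result: if you need a 95% chance of finishing the task on time, the `p95` value is the duration to allow for it in the plan.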

We can then start applying some typical risk mitigation approaches:

  • Acceptance: Not changing the shutdown or project plan to deal with a risk. This is usually done when the risk is determined to be either very unlikely or have a very low impact.
  • Avoidance: Changing the plan to eliminate a risk or protect a project’s objectives from the impact of the risk.
  • Contingency: Develop a plan of action to deal with the time or cost impacts of a risk occurring.
  • Risk Reduction: Actions to reduce the likelihood of a risk occurring or the impact upon the project if it did.
  • Risk Transfer: Shift the impact of a risk (along with its ownership) to a third party.

Once all identified areas of risk are addressed, we can tackle our maintenance activity with the confidence that we’ve done all we can to ensure we haven’t overlooked important steps or underestimated a particular task. The risk of activity overrun will be reduced and, most importantly, we’ll have added value to our business’s bottom line.
