CGN Edge Blog

Perils Of Problem Management (Root Cause Analysis) (2 Of 2)

May 24, 2018 Posted by: CGN Team
Hayyal Ighneim

When we last left our heroes (you), we were discussing the traps or perils of conducting root cause analysis to drive out major IT “Problems” or issues. I can’t emphasize enough that this process improvement methodology, like any other undertaking, is worth executing even if there are some common traps. In today’s blog, I’m going to conclude with the last two major traps or perils.

Peril #4: Identifying a person or group as the root cause

Sure, we rely on spinning disks, routers, networks, and assembly lines, all machines that could fail. In my previous post, ‘Perils,’ I mentioned that when these pieces of equipment, hardware, or software fail, there is a deeper root cause. The same can be true of ‘operator’ error, or human failure.

LEAN has its roots in manufacturing, as does much of root cause analysis and Problem Management. One of the key lessons that Quality Management Systems teach us is that, more often than not, a ‘system’ failure rather than a person is at the root of system outages or malfunctions.

If your root cause investigation leads you to conclude that one of the system’s human operators failed, there are critical questions that you, as a leader or root cause investigation team, must ask, because there is a deeper root cause. Questions might include: “Did they have the right training? Were their procedures well documented? Did they understand ‘why’ they are pulling levers, or are they robotic in their activities? Are they incentivized/disincentivized properly?”

Yes, people make mistakes, but consider people-operated processes such as air travel, which use checklists and rigor to drive pilot error to near zero. Sure, airlines lose luggage, but that’s because they have made a business decision to focus their effort on safety rather than convenience. Depending on what your operators are doing, you may decide to invest little or much in their training and abilities to make sure your system is fail-safe.

In other words, I am suggesting that when you identify the root cause as operator error, you consider carefully the ROI of driving out that error vs. the acceptable risk level. This relates, of course, to a previous peril, “Pick the right problems.”

Peril #5: Identifying an outside process as the root cause

This peril perhaps carries the biggest temptation: upon investigation, you discover that the root cause of the failure is an outside process. Perhaps it is another team, upstream or downstream in your process map. Perhaps it is a leadership failure or a failure due to the culture of the ‘troops.’

The reality is that these are often true causes of failure. However, even if they are accurately identified as root causes, you must not close up your notebooks and live with the failure indefinitely. We all have a responsibility to effect change and drive out these problems, even if they are outside of our direct sphere of control. We must raise awareness, not assign blame. We must collaborate, not bifurcate.

So when the root cause isn’t you or your team, it means you have a little more work to do. But it’s worth doing.

Stay tuned for more of my blogs. Next topic: “Why IT can’t communicate effectively.”

By Hayyal Ighneim, Manager, IT Consulting