Robust Barriers and Good Design
02.24.06 22:15 Filed in: Philosophy | Operation of Complex Systems
Lurking about my favorite root cause analysis forum and reading the conversations about poka-yoke and robust barriers triggered a few thoughts, which are captured here. These are meant to encourage and challenge our collective thinking, not to criticize, though it is sometimes difficult to avoid sounding critical.
The issue of sound design, which is at the heart of poka-yoke and is central to the concept of robust barriers, seems to be worth discussing. Good design is the practice of anticipating the problems, breakdowns, and errors that can occur when equipment, systems, or processes are operated or executed, and incorporating features that PRECLUDE the possibility of errors occurring as a matter of design. It is a design issue - it takes place before the installation and operation of equipment or the execution of processes. Note the emphasis on preclude - this is important, and is echoed in the material available on poka-yoke: the possibility of the error is eliminated - made impossible.
Errors have occurred in hospitals where oxygen and nitrous oxide connectors allowed patients to be administered the wrong gas, resulting in several deaths. It was argued that the connectors were painted different colors to differentiate the two gases and avoid a mixup. Clearly this was not effective.
Painting O2 and nitrous oxide connections different colors is interior decorating - it can help alert people to differences, but it is not good design. Creating connectors of different shapes that preclude connecting to the wrong source is better design. Creating differently shaped connectors that do not rely on a fragile alignment pin or tab that can break off is even better design, because it anticipates two failures - one of operator error (plugging into the wrong system) and one of equipment breakage (no tab to fail, the square plug won't fit in the round hole). Creating differently shaped connectors with no fragile tab that also automatically isolate when disconnected addresses a third failure, that of uncontrolled gas release. The design gets better all the time.
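For readers who think in software, the keyed-connector idea can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the names Gas, Hose, Outlet, Connection, and can_connect are hypothetical, invented for illustration, and do not describe any real medical gas fitting standard. The point is that a mismatched connection is never constructed at all, rather than being constructed and then flagged by a warning (the software equivalent of paint).

```python
from dataclasses import dataclass
from enum import Enum


class Gas(Enum):
    OXYGEN = "oxygen"
    NITROUS_OXIDE = "nitrous oxide"


@dataclass(frozen=True)
class Outlet:
    gas: Gas  # stands in for the physical shape of the wall outlet


@dataclass(frozen=True)
class Hose:
    gas: Gas  # stands in for the physical shape of the hose fitting


class Connection:
    """A hose mated to an outlet. It can only exist if the shapes match."""

    def __init__(self, hose: Hose, outlet: Outlet) -> None:
        if hose.gas is not outlet.gas:
            # The square plug does not fit in the round hole.
            raise TypeError(
                f"{hose.gas.value} hose does not fit {outlet.gas.value} outlet"
            )
        self.hose = hose
        self.outlet = outlet


def can_connect(hose: Hose, outlet: Outlet) -> bool:
    """Report whether a connection could be made, without keeping it."""
    try:
        Connection(hose, outlet)
    except TypeError:
        return False
    return True
```

In a statically typed language the same mismatch could be rejected before the program ever runs, which is closer still to the spirit of precluding the error by design.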
The same holds for electrical appliance plugs. The error of connecting an ungrounded equipment case to the hot wire in the wall socket, which could kill someone, is prevented by, among other things, a polarized plug or, in other cases, the three-prong plug. These designs anticipate errors in usage - the user tries to plug into the wall the wrong way - but do not prevent installation errors - the electrician could have wired the outlet incorrectly.
In a certain nuclear power plant control room, control switches for different pieces of equipment are similar in shape and co-located, resulting in several significant errors where the operator turned off the wrong component. To minimize the possibility of this happening, covers were installed over one type of switch, to alert the operator that this was one type of component, not the other. This has prevented the error so far, and has been touted as an example of poka-yoke, or a robust barrier, though I would argue that it is not.
A cover over a switch is not a robust barrier against mis-operation. It works, when it works, because it requires an unusual second action on the part of the switch operator, thereby causing him to examine his work process by triggering the question, "Do I have the right switch?" I would consider it a workaround, however effective, and not good design, in the application where it was used to prevent deliberate operation of the wrong switch. My speculation is that if you covered all switches with the same covers, the error rate would go back up, because there would be nothing unique to remind operators to double-check themselves. That doesn't make it bad or ineffective necessarily, but I wouldn't consider it robust. It is easily defeated by deliberate or inadvertent mis-operation.
By contrast, the switch cover would be considered good design if the switch were in a location that resulted in it being bumped accidentally. (See the multiple NTSB reports on train crashes where a contributing factor was the engineer's knee bumping the switch that turned off dynamic braking in the other engines on multi-engine trains.)
Interestingly enough, robust barriers and good design tend to work well to prevent errors of 'how' and 'where', but not so well on errors of 'when' or 'what'. That is, the gas connector and the polarized plug force correct connections (issues of 'how'). They don't control when - you can plug in gas or electric whenever you want. The switch cover protects against accidental bumps (an inadvertent 'when' error), but not against intentional errors, errors of 'what' (I wanted pump A but I operated pump B) or 'when' (it is OK to switch off pumps when shutdown, but not when operating).
Maybe it would be better said that 'when' and 'what' errors require more complex solutions, from a design perspective. The Rod Worth Minimizer (RWM) and Rod Block Monitor (RBM) at a boiling water reactor (BWR) are good examples. If you only want control rods moved under certain conditions (when) and in certain configurations (what), then the RWM and RBM represent a very complex system that enforces those constraints, usually right up to the moment that they are overridden!
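To make the 'when' and 'what' distinction concrete, here is a minimal sketch of an interlock built around the pump example, not around the RWM or RBM, whose real logic is far more involved. Everything in it is an assumption made for illustration: the PumpPanel class, the authorized work-order field, and the override flag are hypothetical and describe no actual plant system.

```python
class InterlockError(Exception):
    """Raised when the panel refuses an operation."""


class PumpPanel:
    """Hypothetical panel that enforces 'what' and 'when' before stopping a pump."""

    def __init__(self, pumps):
        self.running = {name: True for name in pumps}
        self.plant_operating = True
        self.authorized = None  # pump named on the current work order ('what')
        self.override = False   # bypass switch; defeats the 'when' check

    def stop(self, pump):
        # 'what': only the pump named on the work order may be operated.
        if pump != self.authorized:
            raise InterlockError(f"{pump} is not the authorized pump")
        # 'when': pumps may be stopped only while the plant is shut down.
        if self.plant_operating and not self.override:
            raise InterlockError("pumps may only be stopped when shut down")
        self.running[pump] = False
```

Note that the override flag does exactly what the paragraph above warns about: the constraint holds right up to the moment someone sets it.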
The struggle to develop good design that prevents 'when' and 'what' errors is well-documented, and has been a challenge. The 1988 crash of an Airbus A320 at the Habsheim air show in France is an example of a failure in this battle. A complex system, designed to eliminate many common pilot errors in flight, did not anticipate a unique but not unusual flight condition, where the airframe was in a landing configuration but the pilot wanted to 'go around' and applied full power. The design of the fly-by-wire control system prevented the pilot from manipulating the aircraft as desired, and three of those on board died.
One example of 'what' errors being effectively addressed is the maintenance lock-out system on electrical breakers, where you cannot close a breaker until the lock is removed, and you can't remove it until the technician is done with maintenance, because he has the only key. An example of 'when' errors being addressed effectively is the two-key system for submarine ballistic missile launches, which requires both keys to be turned at separate locations within a fraction of a second. You can't launch a missile unless two separate individuals agree on when (and multiple other criteria!).
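Both mechanisms translate fairly directly into code, and a rough sketch may help show why they are robust. The names Breaker, LockedOut, and launch_permitted are hypothetical, and the 0.5 second window is an assumed stand-in for "a fraction of a second", not a real specification.

```python
class LockedOut(Exception):
    """Raised when a lock prevents the requested action."""


class Breaker:
    """Hypothetical breaker with a single maintenance lock."""

    def __init__(self):
        self.closed = False
        self._lock_key = None  # set while a lock is applied

    def apply_lock(self):
        """Open and lock the breaker; return the only key to the caller."""
        if self._lock_key is not None:
            raise LockedOut("breaker is already locked out")
        self.closed = False
        self._lock_key = object()  # unique token; it cannot be forged
        return self._lock_key

    def remove_lock(self, key):
        """Remove the lock; only the key returned by apply_lock works."""
        if self._lock_key is None or key is not self._lock_key:
            raise LockedOut("this key does not open the lock")
        self._lock_key = None

    def close(self):
        """Close the breaker; impossible while the lock is in place."""
        if self._lock_key is not None:
            raise LockedOut("breaker is locked out for maintenance")
        self.closed = True


def launch_permitted(turn_a: float, turn_b: float, window: float = 0.5) -> bool:
    """True only if the two key turns (timestamps in seconds) are close together."""
    return abs(turn_a - turn_b) <= window
```

The design choice worth noticing is that the key is handed to exactly one party when the lock is applied, so nobody else can restore power by any normal path.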
In conclusion - components of good design include anticipating problems in advance (or recognizing them in failure and recovering through organizational learning), and modifying the design to preclude those failures or errors.
-----------
This article and incorporated images are ©2006 Brad Williamson All Rights Reserved. Permission is granted for reproduction not for profit in its entirety including this copyright notice.
-----------
