Repairing systems and components when they break or otherwise malfunction is one of the most common activities in any production, processing, or business operation.
Repairs can involve shutting down a system entirely, but work may continue if redundant, parallel operations are available. Many nuclear power plant systems include a high degree of redundancy so operations can be extremely robust. Plants can often run for many months straight without interruption. Business processes can be designed the same way.
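The effect of redundancy can be sketched with a little arithmetic. The example below assumes identical parallel units that fail independently of one another, and that any single working unit is enough to keep the operation running. The function name and the availability figures are hypothetical, chosen only for illustration.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of a group of identical, independent parallel units.

    Assumption: the group is up as long as at least one unit is up.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The group is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


# Hypothetical figures: each unit is available 90% of the time.
for n in (1, 2, 3):
    print(n, round(parallel_availability(0.90, n), 4))
```

Under these assumptions each added unit cuts the expected downtime by a factor of ten. Real systems often share a power supply, a location, or a maintenance crew, so failures are rarely fully independent and the actual gain is usually smaller.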
Repairs to physical systems may require replacing all or part of any machine, assembly, or fitting, or making physical repairs through welding, patching, pulling dents, tightening and replacing fasteners, and so on.
Repairs to computer and communication systems may involve similar fixes, but may also include changes to configurations, permissions, or even code.
Some repairs will be obvious. In other cases, identifying the proper thing to repair may require extensive research and troubleshooting. In those cases it is important to understand the full scope of the system so you can examine all possibilities.
Systems and components can (and should) be designed to fail gracefully, in a way that will not cause undue damage or risk.
FMEA (Failure Mode and Effects Analysis) is an organized and deliberate way to identify as many modes of failure as possible, so contingency plans can be made to handle each one. Designs can also be changed to reduce the chance of some kinds of failures.
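One common way to prioritize the failure modes an FMEA turns up is a risk priority number (RPN): the product of severity, occurrence, and detection ratings, each typically scored from 1 to 10. The text above does not prescribe a scoring scheme, so the sketch below is only one possible convention, and the failure modes and ratings in it are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Risk priority number: higher means address it sooner."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes and ratings.
modes = [
    FailureMode("pump seal leak", severity=7, occurrence=4, detection=3),
    FailureMode("power supply failure", severity=9, occurrence=2, detection=2),
    FailureMode("sensor drift", severity=4, occurrence=6, detection=7),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.rpn:4d}  {mode.name}")
```

In this made-up data the least severe failure ranks first because it is both frequent and hard to detect. An RPN is a rough ranking aid; very severe failure modes generally deserve attention even when their RPN is low.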
Systems can be monitored so action can potentially be taken before failures occur. Naturally, this is not possible for every kind of failure.
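A minimal sketch of this kind of monitoring is shown below. It assumes a single numeric reading, such as a bearing temperature, where higher values are worse, and it raises a flag when the average of the most recent readings reaches a warning level set below the point of failure. The function name, window size, and numbers are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def needs_attention(readings: Sequence[float], warning_level: float, window: int = 3) -> bool:
    """Return True when the recent average reaches the warning level.

    Averaging over a short window keeps one noisy reading from
    triggering an alert on its own.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(readings) < window:
        return False  # not enough data yet to judge
    return mean(readings[-window:]) >= warning_level


# Hypothetical temperature log with a warning level of 85.
temperatures = [70.0, 72.0, 75.0, 88.0, 90.0, 92.0]
print(needs_attention(temperatures, warning_level=85.0))
```

A check like this only helps with failures that give a measurable warning first. Sudden failures still need the contingency plans and spare parts described elsewhere in this section.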
The proper tooling, spare parts, and materials should be kept on hand, and supplies should be replenished when necessary, especially those needed to address the most common failures.
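One simple way to decide when to replenish is a reorder point: expected usage during the supplier's lead time plus a safety stock. This is a standard inventory rule of thumb rather than something the text above specifies, and the part quantities in the sketch are hypothetical.

```python
def reorder_point(avg_daily_usage: float, lead_time_days: float, safety_stock: float) -> float:
    """Stock level at which a new order should be placed."""
    return avg_daily_usage * lead_time_days + safety_stock


def should_reorder(on_hand: float, avg_daily_usage: float,
                   lead_time_days: float, safety_stock: float) -> bool:
    """Return True when stock on hand has fallen to the reorder point."""
    return on_hand <= reorder_point(avg_daily_usage, lead_time_days, safety_stock)


# Hypothetical spare part: 2 used per day, 5-day delivery, 4 held in reserve.
print(reorder_point(2, 5, 4))
print(should_reorder(on_hand=12, avg_daily_usage=2, lead_time_days=5, safety_stock=4))
```

Parts for the most common failures would typically carry a larger safety stock, since running out of them stops repairs most often.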