IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, VOL. 34, NO. 6, NOVEMBER/DECEMBER 1998 1387

Reliability and Availability Assessment of Electrical and Mechanical Systems

Stephen J. Briggs, Michael J. Bartos, and Robert G. Arno

Abstract: Ensuring reliable electric power to critical equipment is vital to the operation of many industrial, commercial, and military facilities. Accurate assessments of electrical system reliability are needed to determine if improvements are necessary and, in many cases, these assessments are required as a part of any new construction. However, it is generally impossible to perform actual testing on the system, and hand calculations are also impossible for all but the most trivial systems. There are computer modeling tools which calculate system reliability or availability for electrical or mechanical systems based on the reliability of individual components, how those components are interconnected, and performance criteria for the system based on facility requirements. Other computer modeling tools are frequently cumbersome to use, require access to a mainframe computer, and base the results on suspect data compiled in the 1970’s. The Reliability and Availability Modeling Program is designed to perform reliability analysis using component operational and maintenance data on 234 items in the categories of power generation, power distribution, and heating, ventilation, and air conditioning.

Index Terms: Availability, electrical systems, mechanical systems, modeling, reliability.

I. INTRODUCTION

ENSURING reliable electric power to critical equipment is vital to the military and facilities. Accurate assessments of electrical system reliability and availability are needed to determine if improvements are necessary and, in many cases, these assessments are required as a part of any new construction. The Power Reliability Enhancement Program (PREP) of the U.S. Army Center for Public Works sponsored the development of a computer software package called the Reliability and Availability Modeling Program (RAMP) and an extensive data-gathering effort. They also publish a Draft Design Features Manual (DFM) for Utility Systems Design Requirements for Command, Control, Communications, Computers, and Intelligence (C4I) Facilities. The DFM requires the use of a computer program to assure that overall system reliability goals are met and recommends the use of RAMP and the reliability and availability data gathered by the PREP effort.

RAMP is a tool which calculates system reliability or availability for electrical or mechanical systems based on the reliability of individual components, how those components are interconnected, and performance criteria for the system based on mission requirements. RAMP can be used as a design tool for most systems; by trying a number of different system configurations and computing the system reliability for each, the design engineer can find a minimum cost method to achieve the desired reliability. Other computer modeling tools, such as the GO program,¹ are frequently cumbersome to use and may base the results on suspect data compiled in the 1970’s.

The component data is a culmination of a 24 000-h effort which collected data on 234 items in the categories of power generation, power distribution, and heating, ventilation, and air conditioning (HVAC). The minimum requirements for data collection were as follows:
• minimum of five years of operational data collected;
• minimum sample population of 40 with a maximum site allocation of ten items each;
• minimum of 3 500 000 operating hours total for each component.
Paper ICPSD 95–75, presented at the 1995 Industry Applications Society Annual Meeting, Lake Buena Vista, FL, October 8–12, and approved for publication in the IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS by the Power Systems Engineering Committee of the IEEE Industry Applications Society. Manuscript released for publication November 25, 1996.
S. J. Briggs is with the Construction Engineering Research Laboratory, Corps of Engineers, Department of the Army, Champaign, IL 61826-9005 USA.
M. J. Bartos is with the National Military Command Center, SAM-LAFN, Pentagon, Washington, DC 20330 USA.
R. G. Arno is with IIT Research Institute, Rome, NY 13442 USA.
Publisher Item Identifier S 0093-9994(98)08111-0.
¹ Electric Power Software Center, University Computing Company, Dallas, TX.
0093–9994/98$10.00 © 1998 IEEE

II. RAMP SAMPLE PROBLEM

Fig. 1 shows a sample electrical distribution system. Before this system can be converted to a RAMP model, the modeler must decide what the necessary and sufficient conditions are for proper functioning at each point in the system. For purposes of example, assume the following conditions are true.
• Any one of the three diesel generators can supply the load for Bus A.
• Either Bus A or commercial power can supply Bus B.
• The maintenance bypass connecting Buses D and E is not an acceptable path for power flow; the static switch or UPS must be functional to supply Bus E. (Different assumptions about system success might allow for power flow through the bypass.)
• While the critical load is served off of Bus E, we are also interested in the reliability at noncritical Bus C.
• The tie-breaker between Buses C and D is normally open.
• Bus C can be served either directly from Bus B, or through Bus D and the tie-breaker.
• Bus D can be served either directly from Bus B, or through Bus C and the tie-breaker.

Fig. 1. One-line diagram of sample distribution system.

The RAMP model contains information on component reliability, how those components are interconnected, and what is required of the system for successful completion of the mission. Once each component in the system has been identified, the physical connectivity of the system is used along with the mission requirements to create a logical model of the system.

The basic building block of a RAMP model is the node. Nodes can be either logical or physical, depending on their function in the model. All nodes have zero or more inputs and exactly one output. We use the term “signal” to refer to node outputs. Physical nodes correspond to actual pieces of equipment and have a reliability associated with them. Logical nodes correspond to interconnections within the system or represent constraints imposed by mission requirements.

A. Physical Nodes

1) Input Nodes: These nodes are signal sources for RAMP; they do not have any inputs and are not dependent on other nodes for successful operation.

2) Component Nodes: A component node models a piece of equipment, the successful output of which requires a successful input and proper functioning of the component.

B. Logical Nodes

1) AND Node: This is a logical node that represents the constraint “the output is valid if and only if all inputs are valid.”

2) OR Node: This is a logical node that represents the constraint “the output is valid if and only if at least one input is valid.”

3) M–N Node: This represents a point in the system with N inputs where the output is functional if at least M of the inputs are functional. A bus with six possible sources of power that required any four of them to be functional would be represented by an M–N node. RAMP internally calculates the reliability of an M–N node as a combination of AND and OR nodes.

A macro is a grouping of nodes into a larger aggregate for convenience.
For example, the modeler can define a macro to describe the physical and logical structure of a diesel generator and then simply enter the generator’s macro as one node in the model. Macros may be nested. RAMP expands macros into their individual nodes when it reads the input files; macros are provided as a convenience feature only and in no way affect the performance or limitations of RAMP.

Using the data compiled in the Appendix, the analyst should determine proper node types for each item on the one-line diagram. The analyst should then convert the one-line diagram to a graphical representation of the system operation using the node types and kind numbers, which are used by the program as reference numbers into the data table.

Control loops in which feedback signals propagate from downstream components to upstream components are not allowed in the model. If reliability is affected by items in the control loop, the influence of those items must be reduced to a series operator. This limitation also applies to other cases (such as some circuit breaker failure modes) where failure of downstream components may impact the reliability of upstream components.

Nodes which are not independent must have their dependency included explicitly in the model. For example, a substation might have two sources of utility power where the two sources were not independent. In this case, the two utility inputs would have to be modeled as two independent sources which would each be ANDed with a third source representing the dependent failure modes.

A structure is created which resembles a tree of nodes. At the top are input nodes, which represent sources of signals, such as batteries or sources from outside the area of interest such as power coming in at the main service entrance. Connected below the input nodes, and depending on the inputs for proper operation, are other nodes, which represent components and interconnections in the system.
Signals flow down through these nodes until they reach the bottom or output nodes, which have no nodes connected to their outputs.

The RAMP model corresponding to the sample electrical system is shown in Fig. 2. For the purposes of this illustration, we assume cable will have a reliability of one and neglect it; in reality, it would be included as specific components in the model.

In general, RAMP needs to be told what signals are of interest, i.e., which signals should have their output reliabilities computed and printed out. Any node in the model, the output signal of which is not used elsewhere in the model, will automatically have its reliability computed and printed out. Also, the user may explicitly request reliability information at a specific node. RAMP will solve the system for the requested reliabilities.

Fig. 2. RAMP model of sample distribution system. Note that RAMP has added a fictitious node at the bottom of the model to facilitate its computations. The numbers in the nodes are kind numbers that refer to specific components in the data table. The boxed region in the middle corresponds to the model of two buses connected with a tie-breaker. The “perfect operator” has reliability one and is needed to provide an input to the two bus operators and the tie-breaker operator.

III. RAMP SOFTWARE DESCRIPTION

There are several methods used to perform reliability calculations; RAMP uses a combination of network reduction and state-space methods to determine the system reliability. RAMP uses component reliability or availability values (each between 0 and 1) to express the reliability of each component. These values can be calculated from mean time between failures (MTBF) and mean time to repair (MTTR) values. RAMP is written in ANSI C and has been compiled and tested on a variety of 32-bit PC’s and workstations.
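As a rough illustration of these two ideas, the sketch below converts MTBF and MTTR into a component value and then evaluates a small system by enumerating component states, in the spirit of the state-space evaluation this section goes on to describe. It is not RAMP's code (RAMP is written in ANSI C); it is written in Python for brevity. It assumes the standard steady-state relation A = MTBF / (MTBF + MTTR), which the paper does not state explicitly. The function names `availability`, `annual_downtime_hours`, and `system_reliability`, the `max_failures` cutoff, and all numeric values are hypothetical.

```python
from itertools import combinations

HOURS_PER_YEAR = 8760.0  # non-leap year


def availability(mtbf_h, mttr_h):
    """Steady-state availability, ASSUMING A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def annual_downtime_hours(a):
    """Expected hours per year out of service for availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


def system_reliability(r, works, max_failures=None):
    """Sum the probabilities of all component states in which the system works.

    r            -- dict mapping component name to its reliability (0..1)
    works        -- function taking the set of failed names, returning bool
    max_failures -- examine states with at most this many simultaneous
                    failures (None examines every state)

    Returns (reliability, skipped), where `skipped` is the total probability
    of the states that were not examined, a bound on the truncation error.
    """
    names = list(r)
    limit = len(names) if max_failures is None else min(max_failures, len(names))
    total = 0.0
    examined = 0.0
    for k in range(limit + 1):            # k = number of simultaneous failures
        for failed in combinations(names, k):
            failed = set(failed)
            p = 1.0                       # probability of this state vector
            for n in names:
                p *= (1.0 - r[n]) if n in failed else r[n]
            examined += p
            if works(failed):
                total += p
    return total, 1.0 - examined


# Hypothetical figures for illustration only; not taken from the Appendix.
gen = availability(mtbf_h=1000.0, mttr_h=10.0)
generators = {"G1": gen, "G2": gen, "G3": gen}

# Bus A is served if at least one of the three generators works (1-of-3).
bus_a, skipped = system_reliability(generators, lambda failed: len(failed) < 3)
print(bus_a, skipped, annual_downtime_hours(bus_a))
```

Passing the success criterion as a function keeps the sketch independent of any particular node layout; a fuller model would build that function from AND, OR, and M–N nodes. The returned `skipped` value plays the role of a stopping criterion: enumeration can be cut off once the unexamined probability is acceptably small.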
RAMP reads data from two files; the first contains the reliability data for individual components (the kind data), and the other contains the system configuration information. The system description file contains information about the physical connections between various pieces of equipment and also logical information about the proper functioning of the system. RAMP searches for all nodes that are outputs of the system, that is, their output is not connected to another node. If there is more than one such output node, RAMP creates a fictitious output node which has all the other output nodes for inputs. This is done so that the system can have a unique output and is required by the system parsing algorithm. At this point, the system is ready for RAMP to begin reliability calculations. RAMP treats the system as a tree of nodes. It begins at the unique bottom and processes the tree up from that point. It searches up the tree for the tops of branches (represented by input components) and then returns down the tree and computes reliabilities. As RAMP traverses the tree, it attempts to recursively simplify the system using a combination of network reduction and direct evaluation techniques. There are two distinct cases where RAMP is capable of performing network reduction: series combinations and isolated subsystems. RAMP can combine components connected in series into one new component. Components are connected in series if and only if the following statements are true: 1) the output from the parent component is connected only to the child component and 2) the child component has no inputs other than the parent component. RAMP can replace any number of components connected in series with one new component. Fig. 3 shows the RAMP model of the one line after RAMP has combined all series elements. RAMP can distinguish and isolate subsystems characterized by isolation from the rest of the system except for one input signal and one output signal. 
Every node in the subsystem must be dependent on the one input signal, and the output signal must be dependent on every node in the subsystem. Graphically, the subsystem can be isolated by drawing a box around it which has only two signals passing through it, an input signal which every component in the subsystem depends upon and an output signal from the subsystem. RAMP can also isolate similar subsystems that have no input signals; instead, the subsystem has input nodes as signal sources. Fig. 3 shows examples of both subsystems. Once RAMP finds these subsystems, it uses a state-space method to determine the reliability for the subsystem and replaces the subsystem with a new component with reliability equal to the subsystem reliability. After RAMP has performed all the simplifications it can, it can evaluate the whole system.

Fig. 3. RAMP model of sample distribution system after RAMP has performed some simplifications. The portion in the box is a subsystem that RAMP will find and simplify.

RAMP treats a system as a collection of sources of logical signals, AND gates, and OR gates and uses Boolean algebra to calculate a sum of products expansion of the Boolean function of the system output component. This Boolean function is 1 if the output of the system is functional and 0 if the output of the system is invalid. It expresses all the dependence of the system output on the individual component reliabilities.

RAMP evaluates the Boolean function by direct sampling over all possible input state vectors to compute a numerical reliability. RAMP first determines all the independent variables in the function. It then begins to examine all possible combinations of working and failed inputs and tests the output function for validity. If $r_i$ is the reliability of node $i$, then the probability of any given input state vector, $P$, can be easily computed as

$$P = \prod_{i} r_i \prod_{j} (1 - r_j) \qquad (1)$$

where the product over $i$ is for all the nodes that are working and the product over $j$ is for all the failed nodes.

RAMP begins its evaluation by trying the state with all components working. It computes the probability of that state and evaluates the output function. If the output function is 1, the system reliability is greater than or equal to the probability of all components being functional. Then, RAMP considers all single mode failures (input vectors with only one component failed). For each, it computes the probability of that state, evaluates the output function, and, if the function is valid, adds that probability to the output reliability. Essentially, RAMP is calculating the probability-weighted fraction of input vectors for which the output state is valid.

Systems with many components may have an extremely large state space; direct calculation of all possible input vectors is then computationally infeasible. Fortunately, in these cases, the probability of a significant number of components failing simultaneously is generally negligible and RAMP utilizes a user-defined error parameter to determine when to stop sampling of the input space. For a typical system, RAMP will sample all but 10 % of the input space after examining all input states with up to five simultaneous failures.

IV. RELIABILITY AND MAINTAINABILITY DATA

TABLE I
EXCERPT OF OLD RELIABILITY NUMBERS FROM THE DESIGN FEATURES MANUAL COMPILED IN 1980 (SOME DATA ADAPTED FROM IEEE STANDARD 493-1980)

Prior to this study, PREP used a table of Reliability and Maintainability data developed in the late 1970’s compiled from multiple sources (see Table I for a section of the table). This table contradicts itself numerous times and contains some confusing data.
For example, using the 1970’s data, a synchronous motor over 600 V will be down approximately 6 h per year due to failures, while a synchronous motor under 600 V will only be down 90 s per year. This could be justified if the MTTR numbers were greatly different and the MTBF was relatively the same. In this case, it would mean that, while both motors failed at about the same rate, the higher voltage motors were not kept in spare parts stock, which resulted in extended acquisition time. However, a comparison of MTBF between the two motors indicates that this is not the case. Given these sorts of disparities, it became apparent that the data needed updating.

A. Data Results

Data results of this study are included in the Appendix. The reliability is calculated for total period hours and the MTTR numbers are also listed. The data should be adjusted using numbers that are representative of the site’s maintenance, spare parts stock, and manpower. The designer should consider using data already included in [2, pp. 222–225].

B. Discussion of Results

The discussions which follow highlight the plausibility of the results and the ease of use. First, we will examine the previous problem with the motors. In the previous table, again, there is a difference between lower voltage and higher voltage synchronous motors. Higher voltage motors are out of service about 39 min per year, with lower voltage motors being out of service for an average of 35 s per year. While these numbers are different, the MTBF’s are approximately the same. The differences can be attributed to the logistics of procuring and delivering the larger motors and parts not normally kept in stock.

Second, the results of our study are easier to use for the RAMP reliability program than earlier data. The RAMP program can now be used to find availabilities using a single value for a single piece of equipment.
When using the old data, the modeler was limited to 120 items and had added operations (computational nodes) to account for maintenance. Now, we have data points for 239 types of equipment and the availability number contains maintenance. We can break out maintenance into categories such as no maintenance, corrective only, GSA estimated maintenance, or normally/commercially performed maintenance. In short, we can tailor the data to more closely resemble the modeled site.

C. Collecting Data

In order to collect statistically sound data on each of the items, we adopted the following guidelines.
• Component size was important to ensure similar technologies.
• A minimum of five years of operational data was collected.
• The minimum sample population was 40.
• There were no more than 10 samples of any one category at any one site. This limit was critical to eliminate any data skewing which could be caused by the influence of one or more data points in a small population, particularly those due to abnormal operating or maintenance procedures.
• Locations surveyed had varying degrees of maintenance.
• Age of equipment was important, since we wanted only new equipment and technologies.
• There was a minimum of 3 500 000 operating hours total for each component.

Components with no failures during this period were arbitrarily assumed to have a failure rate of one for the period observed (all components with no failures have at least 4 200 000 hours of data). Due to the low number of observed failures, the uncertainty in MTBF is large for components with large MTBF. We have flagged those components for which we observed fewer than eight failures. Since the model does not directly use MTBF, we use the failure rate, period, and MTTR data to calculate reliabilities for use in the model. It can be shown that the uncertainty in reliability is small for components with large MTBF, even if the uncertainty in MTBF is large.
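A first-order error estimate makes this claim concrete. The following is a sketch only: it assumes the reliability value used in the model is the steady-state availability $A$ given by the standard relation below, which the paper does not state explicitly, and the symbols $A$ and $\Delta$ and the numbers in the example are illustrative assumptions rather than values from the Appendix.

$$A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}, \qquad \frac{\partial A}{\partial\,\mathrm{MTBF}} = \frac{\mathrm{MTTR}}{(\mathrm{MTBF} + \mathrm{MTTR})^{2}}$$

$$\frac{\Delta A}{A} \approx \frac{\mathrm{MTTR}}{\mathrm{MTBF} + \mathrm{MTTR}} \cdot \frac{\Delta\,\mathrm{MTBF}}{\mathrm{MTBF}} = (1 - A)\,\frac{\Delta\,\mathrm{MTBF}}{\mathrm{MTBF}}$$

Under this assumption, the relative uncertainty in $A$ is the relative uncertainty in MTBF scaled by the unavailability $1 - A$. For a hypothetical component with MTBF = 1 000 000 h and MTTR = 10 h, $1 - A \approx 10^{-5}$, so even a 50% uncertainty in MTBF changes $A$ by only about $5 \times 10^{-6}$. For a hypothetical component with MTBF = 1000 h and the same MTTR, $1 - A \approx 10^{-2}$, and the same 50% uncertainty changes $A$ by about $5 \times 10^{-3}$.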
It is only for small MTBF’s that it is important to minimize uncertainty in MTBF. In any case, real-world system failures are dominated by components with lower reliabilities, and it is much more important to correctly model the items with numerous failures than it is to put every near-perfect (no failures) component into the model.

D. Data Summarization

As with every collection program, there are varying degrees of data gathered. Some of the data sources had complete records and could give statistics on operational characteristics on every piece of equipment from installation to the present time. More often, the only items tracked were major items, such as cooling towers, generators, and boilers. Data for items such as valves and filters were not usually recorded. Other problems included incomplete or noncurrent versions of the blueprints. Numerous technicians had to develop parts lists by hand, recording data from nameplates and relying on facility engineers for component descriptions. It is important to determine and be aware of the different levels of data quality. Data quality can be separated into four groups.

1) Perfect Data: Perfect data would include a parts list, failure history data with time to failure numerics, item descriptions, parts stock, operational periods, and ten continuous years of service. No engineering judgement is necessary for reduction. The database is comprised of 10%–20% of this type of data.

2) Not Perfect Data: Not perfect data is defined as data with no serious flaws, but the data collection process demanded additional time to ensure useful information was gathered. Examples of this type of data would include parts lists determined by inspection, incomplete blueprints, or a service period less than ten years. The database is comprised of 35%–40% of this type of data.

3) Verbal/Inspection Data: This category of data had serious gaps which could not be utilized in the database until corrected.
Items included were typically major items, such as generators and boilers. Interviews of senior maintenance personnel were used to supplement the data. An example of this would be a description of a particular failure or clarification of repair times. The database is comprised of 25% of this type of data.

4) Soft Data: Soft data was the most difficult type of data to collect. Data collection involved working from scheduling sheets and repair records of repair companies. The database is comprised of 10%–15% of this type of data.

Analysts identified and collected data from locations which had a variety of maintenance policies and practices. Maintenance policies and practices directly affect the availability of the equipment. High levels of maintenance will lower availability, but have the potential to increase reliability. The amount of maintenance performed can drastically affect the performance parameters being collected.

V. CONCLUSIONS

The data and methodology presented here are very useful in determining the site reliability or availability. The actual number produced for a given system may not be totally definitive; however, comparisons between system numbers are of great value. The data and procedure can be used in different manners to aid the facility designers and the facility engineers.

The designers can use the software and data to evaluate different designs. They can also estimate occurrence of downtimes by adding the failure times to the production or mission loss and can estimate the total length of time from line stop to line start as a result of failures. New designs or redesigns can be evaluated to minimize the production or mission failure with estimates on money saved by avoiding downtime.

The facility engineer can use the data and software to estimate downtimes associated with the systems or subsystems and compare these results to the actual downtimes.
This could identify problem areas that may need more (or less) maintenance time and systems that may benefit from redundancy or replacement.

The RAMP program, along with the user manual which contains the reliability and availability data, may be ordered from the U.S. Army Center For Public Works, Attn.: CECPWK, 7701 Telegraph Road, Alexandria, VA 22315-3862 USA.

APPENDIX
RELIABILITY DATA

[Table columns: Description, MTBF (h), MTTR (h), Reliability. The table rows did not survive extraction.]

REFERENCES

[1] Draft Design Features Manual, U.S. Army Center for Public Works, Power Reliability Enhancement Program, Ft. Belvoir, VA, 1980.
[2] Design of Reliable Industrial and Commercial Power Systems, IEEE Standard 493-1990.
[3] W. H. Dickinson, P. E. Gannon, C. R. Heising, A. D. Patton, and D. W. McWilliams, “Fundamentals of reliability techniques as applied to industrial power systems,” in Conf. Rec. IEEE Industrial and Commercial Power Systems Tech. Conf., 1971, pp. 10–31.
[4] C. Singh and R. Billinton, System Reliability Modeling and Evaluation. London, U.K.: Hutchinson, 1977.
[5] Design Features Manual, U.S. Army Center for Public Works, Power Reliability Enhancement Program, Ft. Belvoir, VA, 1980.

Stephen J. Briggs received the B.A. degree in physics and mathematics from Knox College, Galesburg, IL, and the M.S. degree in nuclear engineering and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 1982, 1985, and 1990, respectively.
Currently, he is a Principal Investigator at the Army Corps of Engineers Construction Engineering Research Laboratory (USACERL), Champaign, IL, where he is involved in the causes and effects of harmonics on power systems. Other interests include reliability/availability modeling tools for electrical systems, electric vehicles, applications of neural networks to electrical power systems, and electrical energy storage technologies.

Michael J. Bartos received the B.S. degree in mechanical engineering from The Pennsylvania State University, University Park, in 1985. Currently, he is the Facility Engineer, Site Project Manager, National Military Command Center, Pentagon, Washington, DC. Previously, he was a Project Engineer for The Power Reliability Enhancement Program, Center for Public Works of the Army Corps of Engineers. Throughout his career, he has been developing reliability modeling techniques and reliability and maintainability data. He recently served as a consultant on the Pentagon renovations in the areas of power quality, low harmonic fluorescent lighting, reliable power and HVAC designs and command, control, communications, computer, and intelligence facility requirements.

Robert G. Arno received the B.S. degree in electrical engineering from the State University of New York at Utica/Rome in 1982. He has worked in the reliability field for 19 years, joining IIT Research Institute, Rome, NY, in 1977. His principal responsibilities include program management, electrical and mechanical system analysis and modeling, and data collection and analysis. His most recent program included the data collection of power generation, distribution, and HVAC data. Mr. Arno is a member of the American Society for Quality Control.