Reliability and Availability Assessment of
Electrical and Mechanical Systems
Stephen J. Briggs, Michael J. Bartos, and Robert G. Arno
Abstract— Ensuring reliable electric power to critical equipment is vital to the operation of many industrial, commercial,
and military facilities. Accurate assessments of electrical system
reliability are needed to determine if improvements are necessary
and, in many cases, these assessments are required as a part
of any new construction. However, it is generally impossible to
perform actual testing on the system, and hand calculations are
also impossible for all but the most trivial systems. There are
computer modeling tools which calculate system reliability or
availability for electrical or mechanical systems based on the
reliability of individual components, how those components are
interconnected, and performance criteria for the system based
on facility requirements. Other computer modeling tools are
frequently cumbersome to use, require access to a mainframe
computer, and base the results on suspect data compiled in the
1970’s. The Reliability and Availability Modeling Program is
designed to perform reliability analysis using component operational and maintenance data on 234 items in the categories of
power generation, power distribution, and heating, ventilation,
and air conditioning.
Index Terms—Availability, electrical systems, mechanical systems, modeling, reliability.
RAMP is a tool which calculates system reliability or
availability for electrical or mechanical systems based on the
reliability of individual components, how those components
are interconnected, and performance criteria for the system
based on mission requirements. RAMP can be used as a
design tool for most systems; by trying a number of different
system configurations and computing the system reliability for
each, the design engineer can find a minimum cost method
to achieve the desired reliability. Other computer modeling
tools, such as the GO program,¹ are frequently cumbersome
to use and may base the results on suspect data compiled in
the 1970’s.
The component data is a culmination of a 24 000-h effort
which collected data on 234 items in the categories of power
generation, power distribution, and heating, ventilation, and
air conditioning (HVAC). The minimum requirements for data
collection were as follows:
• minimum of five years of operational data collected;
• minimum sample population of 40 with a maximum site
allocation of ten items each;
• minimum of 3 500 000 operating hours total for each
component.
ENSURING reliable electric power to critical equipment is
vital to the military and facilities. Accurate assessments
of electrical system reliability and availability are needed
to determine if improvements are necessary and, in many
cases, these assessments are required as a part of any new
construction.
The Power Reliability Enhancement Program (PREP) of the
U.S. Army Center for Public Works sponsored the development of a computer software package called the Reliability
and Availability Modeling Program (RAMP) and an extensive
data-gathering effort. PREP also publishes a Draft Design Features Manual (DFM) [1] for Utility Systems Design Requirements for Command, Control, Communications, Computers,
and Intelligence (C4I) Facilities. The DFM requires the use of
a computer program to assure that overall system reliability
goals are met and recommends the use of RAMP and the
reliability and availability data gathered by the PREP effort.
Paper ICPSD 95–75, presented at the 1995 Industry Applications Society
Annual Meeting, Lake Buena Vista, FL, October 8–12, and approved for
Power Systems Engineering Committee of the IEEE Industry Applications
Society. Manuscript released for publication November 25, 1996.
S. J. Briggs is with the Construction Engineering Research Laboratory,
Corps of Engineers, Department of the Army, Champaign, IL 61826-9005
M. J. Bartos is with the National Military Command Center, SAM-LAFN,
Pentagon, Washington, DC 20330 USA.
R. G. Arno is with IIT Research Institute, Rome, NY 13442 USA.
Publisher Item Identifier S 0093-9994(98)08111-0.
Fig. 1 shows a sample electrical distribution system. Before
this system can be converted to a RAMP model, the modeler
must decide what the necessary and sufficient conditions are
for proper functioning at each point in the system. For purposes
of example, assume the following conditions are true.
• Any one of the three diesel generators can supply the
load for Bus A.
• Either Bus A or commercial power can supply Bus B.
• The maintenance bypass connecting Buses D and E is
not an acceptable path for power flow; the static switch
or UPS must be functional to supply Bus E. (Different
assumptions about system success might allow for power
flow through the bypass.)
• While the critical load is served from Bus E, we are also
interested in the reliability at noncritical Bus C.
• The tie-breaker between Buses C and D is normally open.
• Bus C can be served either directly from Bus B, or
through Bus D and the tie-breaker.
• Bus D can be served either directly from Bus B, or through
Bus C and the tie-breaker.
1 Electric Power Software Center, University Computing Company, Dallas,
0093–9994/98$10.00 © 1998 IEEE
2) Component Nodes: A component node models a piece of
equipment, the successful output of which requires a successful
input and proper functioning of the component.
B. Logical Nodes
Fig. 1. One-line diagram of sample distribution system.
The RAMP model contains information on component
reliability, how those components are interconnected, and
what is required of the system for successful completion of
the mission. Once each component in the system has been
identified, the physical connectivity of the system is used along
with the mission requirements to create a logical model of the
system.
The basic building block of a RAMP model is the node.
Nodes can be either logical or physical, depending on their
function in the model. All nodes have zero or more inputs
and exactly one output. We use the term “signal” to refer to
node outputs. Physical nodes correspond to actual pieces of
equipment and have a reliability associated with them. Logical
nodes correspond to interconnections within the system or
represent constraints imposed by mission requirements.
A. Physical Nodes
1) Input Nodes: These nodes are signal sources for RAMP;
they do not have any inputs and are not dependent on other
nodes for successful operation.
1) AND Node: This is a logical node that represents the
constraint “the output is valid if and only if all inputs are
valid.”
2) OR Node: This is a logical node that represents the
constraint “the output is valid if and only if at least one input
is valid.”
3) M–N Node: This represents a point in the system with
N inputs where the output is functional if at least M of the
inputs are functional. A bus with six possible sources of power
that required any four of them to be functional would be
represented by an M–N node (here M = 4 and N = 6). RAMP
internally calculates the reliability of an M–N node as a
combination of AND and OR nodes.
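The three logical node types can be illustrated with a short sketch. It is not RAMP's implementation: the function names and the 0.95 source reliability are hypothetical, and the inputs are assumed to be statistically independent. The M–N node is evaluated here by enumerating input states, and the AND and OR nodes appear as its two limiting cases (N of N and 1 of N).

```python
from itertools import product
from math import prod


def and_node(reliabilities):
    """Output valid only if every input is valid (independent inputs assumed)."""
    return prod(reliabilities)


def or_node(reliabilities):
    """Output valid if at least one input is valid (independent inputs assumed)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def m_of_n_node(m, reliabilities):
    """Output valid if at least m inputs are valid, summed over all input states."""
    total = 0.0
    for state in product((True, False), repeat=len(reliabilities)):
        if sum(state) >= m:
            # Probability of this particular combination of working/failed inputs.
            total += prod(r if up else 1.0 - r
                          for r, up in zip(reliabilities, state))
    return total


# Hypothetical example: a bus with six sources that needs any four of them.
sources = [0.95] * 6
print("4-of-6 bus reliability:", m_of_n_node(4, sources))
```

With identical inputs the result reduces to the familiar binomial sum, which gives a convenient check on the enumeration.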
A macro is a grouping of nodes into a larger aggregate for
convenience. For example, the modeler can define a macro to
describe the physical and logical structure of a diesel generator
and then simply enter the generator’s macro as one node in the
model. Macros may be nested. RAMP expands out macros into
their individual nodes when it reads the input files—macros
are provided as a convenience feature only and in no way
affect the performance or limitations of RAMP.
Using the data compiled in the Appendix, the analyst should
determine proper node types for each item on the one-line
diagram. The analyst should then convert the one-line diagram
to a graphical representation of the system operation using the
node types and kind numbers, which are used by the program as
reference numbers into the data table.
Control loops in which feedback signals propagate from
downstream components to upstream components are not
allowed in the model. If reliability is affected by items in the
control loop, the influence of those items must be reduced
to a series operator. This limitation also applies to other
cases (such as some circuit breaker failure modes) where
failure of downstream components may impact the reliability
of upstream components.
Nodes which are not independent must have their dependency included explicitly in the model. For example, a
substation might have two sources of utility power where the
two sources were not independent. In this case, the two utility
inputs would have to be modeled as two independent sources
which would each be ANDed with a third source representing
the dependent failure modes.
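The effect of modeling the dependency explicitly can be seen in a small sketch. The reliabilities below are hypothetical, and the bus is assumed to need only one of the two utility feeds. Each feed is ANDed with a shared source representing the common failure modes, and the result is compared with a naive model that wrongly treats the two lumped feeds as independent.

```python
from itertools import product


def bus_reliability(r_feed_1, r_feed_2, r_common):
    """Enumerate the three independent sources; the bus needs either feed,
    and each feed is ANDed with the shared (common-mode) source."""
    total = 0.0
    for f1, f2, c in product((True, False), repeat=3):
        p = ((r_feed_1 if f1 else 1.0 - r_feed_1)
             * (r_feed_2 if f2 else 1.0 - r_feed_2)
             * (r_common if c else 1.0 - r_common))
        if (f1 and c) or (f2 and c):
            total += p
    return total


def naive_reliability(r_feed_1, r_feed_2, r_common):
    """Incorrect model: lumps the common mode into each feed and then
    treats the two feeds as if they failed independently."""
    a = r_feed_1 * r_common
    b = r_feed_2 * r_common
    return 1.0 - (1.0 - a) * (1.0 - b)


# Hypothetical values, for illustration only.
r1, r2, rc = 0.99, 0.99, 0.995
print("explicit dependency:", bus_reliability(r1, r2, rc))
print("naive independence: ", naive_reliability(r1, r2, rc))
```

The naive model reports the higher reliability because it counts the shared failure mode as two separate, unrelated events.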
A structure is created which resembles a tree of nodes. At
the top are input nodes, which represent sources of signals, such
as batteries or sources from outside the area of interest such
as power coming in at the main service entrance. Connected
below the input nodes, and depending on the inputs for proper
operation, are other nodes, which represent components and
interconnections in the system. Signals flow down through
these nodes until they reach the bottom or output nodes, which
have no nodes connected to their outputs.
The RAMP model corresponding to the sample electrical
system is shown in Fig. 2. For the purposes of this illustration,
we assume cable will have a reliability of one and neglect it;
in reality, it would be included as specific components in the
model.
In general, RAMP needs to be told what signals are of
interest, i.e., which signals should have their output reliabilities
computed and printed out. Any node in the model, the output
signal of which is not used elsewhere in the model will
automatically have its reliability computed and printed out.
Also, the user may explicitly request reliability information at
a specific node. RAMP will solve the system for the requested
Fig. 2. RAMP model of sample distribution system. Note RAMP has added
a fictitious node at the bottom of the model to facilitate its computations. The
numbers in the nodes are kind numbers that refer to specific components in the
data table. The boxed region in the middle corresponds to the model of two
buses connected with a tie-breaker. The “perfect operator” has reliability one
and is needed to provide an input to the two bus operators and the tie-breaker
There are several methods used to perform reliability calculations [2]; RAMP uses a combination of network reduction
and state-space methods to determine the system reliability.
RAMP uses component reliability or availability values (each
between 0 and 1) to express the reliability of each component.
These values can be calculated from mean time between
failures (MTBF) and mean time to repair (MTTR) values.
RAMP is written in ANSI C and has been compiled and tested
on a variety of 32-bit PC’s and workstations.
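As a sketch of how such a value might be obtained, the standard steady-state availability relation MTBF / (MTBF + MTTR) can be used. Whether RAMP's data preparation uses exactly this form is an assumption, and the MTBF and MTTR figures below are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_h, mttr_h):
    """Steady-state availability from MTBF and MTTR, both in hours."""
    return mtbf_h / (mtbf_h + mttr_h)


def annual_downtime_hours(avail):
    """Expected hours per year that the component is unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical component: fails every 50 000 h on average, 8 h to repair.
a = availability(50_000.0, 8.0)
print("availability:", a)
print("downtime per year (h):", annual_downtime_hours(a))
```

The second function converts an availability back into expected downtime per year, which is often the easier figure to sanity-check against site records.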
RAMP reads data from two files; the first contains the
reliability data for individual components (the kind data), and
the other contains the system configuration information. The
system description file contains information about the physical
connections between various pieces of equipment and also
logical information about the proper functioning of the system.
RAMP searches for all nodes that are outputs of the system,
that is, their output is not connected to another node. If there
is more than one such output node, RAMP creates a fictitious
output node which has all the other output nodes for inputs.
This is done so that the system can have a unique output and
is required by the system parsing algorithm.
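The search for output nodes can be sketched as follows. The dictionary representation, the node names, and the name of the added node are all hypothetical; the paper does not describe RAMP's internal data structures.

```python
def add_unique_output(inputs_of):
    """inputs_of maps each node name to the list of nodes feeding it.

    Returns a copy of the model with a fictitious output node appended when
    more than one node has an unconnected output, plus the list of outputs.
    """
    # Any node that appears in some input list has its output connected.
    used = {src for srcs in inputs_of.values() for src in srcs}
    outputs = [name for name in inputs_of if name not in used]

    model = {name: list(srcs) for name, srcs in inputs_of.items()}
    if len(outputs) > 1:
        # Hypothetical name for the fictitious node that gathers all outputs.
        model["__system_output__"] = list(outputs)
    return model, outputs


# Hypothetical fragment: two buses of interest fed from a common bus.
fragment = {
    "utility": [],
    "bus_b": ["utility"],
    "bus_c": ["bus_b"],
    "bus_e": ["bus_b"],
}
model, outputs = add_unique_output(fragment)
print(outputs)
print(model["__system_output__"])
```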
At this point, the system is ready for RAMP to begin
reliability calculations. RAMP treats the system as a tree of
nodes. It begins at the unique bottom and processes the tree
up from that point. It searches up the tree for the tops of
branches (represented by input components) and then returns
down the tree and computes reliabilities. As RAMP traverses
the tree, it attempts to recursively simplify the system using
a combination of network reduction and direct evaluation
There are two distinct cases where RAMP is capable of
performing network reduction: series combinations and isolated subsystems. RAMP can combine components connected
in series into one new component. Components are connected
in series if and only if the following statements are true:
1) the output from the parent component is connected only
to the child component and 2) the child component has no
inputs other than the parent component. RAMP can replace
any number of components connected in series with one new
component. Fig. 3 shows the RAMP model of the one-line
diagram after RAMP has combined all series elements.
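The series rule can be sketched in a few lines. This is an illustrative reduction and not RAMP's code: the graph representation and node names are hypothetical, every node is assumed to carry a reliability (a logical node would carry 1.0), and failures are assumed independent so that a series pair combines by multiplication.

```python
def combine_series(inputs_of, reliability):
    """Repeatedly merge parent/child pairs that satisfy the series test:
    the child has exactly one input, and the parent feeds only that child."""
    inputs_of = {name: list(srcs) for name, srcs in inputs_of.items()}
    reliability = dict(reliability)
    merged = True
    while merged:
        merged = False
        for child, parents in inputs_of.items():
            if len(parents) != 1:
                continue
            parent = parents[0]
            fanout = sum(parent in srcs for srcs in inputs_of.values())
            if fanout != 1:
                continue
            # Replace the pair with one node whose reliability is the product.
            reliability[child] *= reliability.pop(parent)
            inputs_of[child] = inputs_of.pop(parent)
            merged = True
            break  # the dictionary changed, so restart the scan
    return inputs_of, reliability


# Hypothetical fragment: generator -> breaker -> cable, feeding a bus that
# also has a utility input.
graph = {"gen": [], "breaker": ["gen"], "cable": ["breaker"],
         "bus": ["cable", "utility"], "utility": []}
rel = {"gen": 0.98, "breaker": 0.999, "cable": 0.9999, "bus": 1.0, "utility": 0.995}
reduced_graph, reduced_rel = combine_series(graph, rel)
print(reduced_graph)
print(reduced_rel["cable"])
```

The bus is left untouched because it has two inputs, so only the generator, breaker, and cable collapse into a single equivalent component.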
RAMP can distinguish and isolate subsystems characterized
by isolation from the rest of the system except for one input
signal and one output signal. Every node in the subsystem
must be dependent on the one input signal and the output
signal must be dependent on every node in the subsystem.
Graphically, the subsystem can be isolated by drawing a box
around it which has only two signals passing through it, an
input signal which every component in the subsystem depends
upon and an output signal from the subsystem. RAMP can also
isolate similar subsystems that have no input signals; instead,
the subsystem has input nodes as signal sources. Fig. 3 shows
examples of both subsystems.
Fig. 3. RAMP model of sample distribution system after RAMP has performed some simplifications. The portion in the box is a subsystem that
RAMP will find and simplify.
Once RAMP finds these subsystems, it uses a state-space
method to determine the reliability for the subsystem and
replaces the subsystem with a new component with reliability
equal to the subsystem reliability. After RAMP has performed
all the simplifications it can, it can evaluate the whole system.
RAMP treats a system as a collection of sources of logical
signals, AND gates, and OR gates and uses Boolean algebra to
calculate a sum of products expansion of the Boolean function
of the system output component. This Boolean function is 1
if the output of the system is functional and 0 if the output of
the system is invalid. It expresses all the dependence of the
system output on the individual component reliabilities.
RAMP evaluates the Boolean function by direct sampling
over all possible input state vectors to compute a numerical
reliability. RAMP first determines all the independent variables in the function. It then begins to examine all possible
combinations of working and failed inputs and tests the output
function for validity. If $r_i$ is the reliability of node $i$, then the
probability of any given input state vector, $P$, can be
easily computed as
$$P = \prod_{i} r_i \prod_{j} (1 - r_j)$$
where the product over $i$ is for all the nodes that are working
and the product over $j$ is for all the failed nodes.
RAMP begins its evaluation by trying the state with all
components working. It computes the probability of that state
and evaluates the output function. If the output function is 1,
the system reliability is greater than or equal to the probability
of all components being functional. Then, RAMP considers all
single mode failures (input vectors with only one component
failed). For each, it computes the probability of that state,
evaluates the output function, and, if the function is valid, adds
that probability to the output reliability. Essentially, RAMP is
calculating the probability-weighted fraction of input vectors for
which the output state is valid.
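The evaluation described above can be sketched as an exhaustive sum over input state vectors. The example uses part of the sample system (Bus A is fed by any one of three diesel generators, and Bus B by either Bus A or commercial power). The reliabilities and names are hypothetical, component failures are assumed independent, and the code is an illustration only.

```python
from itertools import product
from math import prod


def system_reliability(reliability, output_ok):
    """Sum the probabilities of all input states for which output_ok is true.

    reliability maps each independent component to its reliability;
    output_ok takes a dict of component -> bool (working) and returns a bool.
    """
    names = list(reliability)
    total = 0.0
    for bits in product((True, False), repeat=len(names)):
        state = dict(zip(names, bits))
        if output_ok(state):
            # Product over working nodes of r, and over failed nodes of (1 - r).
            total += prod(reliability[n] if state[n] else 1.0 - reliability[n]
                          for n in names)
    return total


def bus_b_ok(state):
    """Bus A needs any one generator; Bus B needs Bus A or commercial power."""
    bus_a = state["gen1"] or state["gen2"] or state["gen3"]
    return bus_a or state["utility"]


# Hypothetical reliabilities, for illustration only.
rel = {"gen1": 0.97, "gen2": 0.97, "gen3": 0.97, "utility": 0.999}
print("Bus B reliability:", system_reliability(rel, bus_b_ok))
```

For this simple all-OR structure the answer can be checked by hand, since Bus B fails only when all four sources fail at once.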
Systems with many components may have an extremely
large state space; direct calculation of all possible input vectors
is then computationally infeasible. Fortunately, in these cases,
the probability of a significant number of components failing
simultaneously is generally negligible and RAMP utilizes
a user-defined error parameter to determine when to stop
sampling of the input space. For a typical system, RAMP will
sample all but 10 % of the input space after examining all
input states with up to five simultaneous failures.
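One way to realize this truncation is to visit states in order of increasing failure count and stop once the unsampled probability falls below the error parameter. The stopping rule below is an assumed form, since the paper does not give RAMP's exact criterion, and the example system is hypothetical.

```python
from itertools import combinations
from math import prod


def truncated_reliability(reliability, output_ok, tolerance=1e-9):
    """Enumerate states with 0, 1, 2, ... simultaneous failures until the
    probability of the states not yet examined is at most tolerance.

    Returns (lower_bound, unsampled); the true reliability lies between
    lower_bound and lower_bound + unsampled.
    """
    names = list(reliability)
    reliable = 0.0   # probability of examined states with a valid output
    sampled = 0.0    # probability of all examined states
    for n_failed in range(len(names) + 1):
        for failed in combinations(names, n_failed):
            failed = set(failed)
            p = prod(1.0 - reliability[n] if n in failed else reliability[n]
                     for n in names)
            sampled += p
            if output_ok({n: n not in failed for n in names}):
                reliable += p
        if 1.0 - sampled <= tolerance:
            break
    return reliable, max(0.0, 1.0 - sampled)


# Hypothetical system: ten identical units, at least eight must work.
rel = {f"unit{i}": 0.99 for i in range(10)}
lower, unsampled = truncated_reliability(
    rel, lambda s: sum(s.values()) >= 8, tolerance=1e-6)
print("lower bound:", lower, "unsampled probability:", unsampled)
```

Because highly reliable components rarely fail together, the unsampled probability shrinks quickly and the loop usually ends after only a few failure counts.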
Prior to this study, PREP used a table of Reliability and
Maintainability data developed in the late 1970’s compiled
from multiple sources (see Table I for a section of the table).
This table contradicts itself numerous times and contains some
confusing data. For example, using the 1970’s data, a synchronous motor over 600 V will be down approximately 6 h
per year due to failures, while a synchronous motor under
600 V will only be down 90 s per year. This could be justified
if the MTTR numbers were greatly different and the MTBF
was relatively the same. In this case, it would mean that,
while both motors failed at relatively the same time period, the
higher voltage motors were not kept in spare parts stock, which
resulted in extended acquisition time. However, a comparison
of MTBF between the two motors indicates that this is not the
case. Given these sorts of disparities, it became apparent that
the data needed updating.
A. Data Results
Data results of this study are included in the Appendix. The
reliability is calculated for total period hours and the MTTR
numbers are also listed. The data should be adjusted using
numbers that are representative of the site’s maintenance, spare
parts stock, and manpower. The designer should consider using
data already included in [2, pp. 222–225].
B. Discussion of Results
The discussions which follow highlight the plausibility of
the results and the ease of use. First, we will examine the
previous problem with the motors. In the previous table, again,
there is a difference between lower voltage and higher voltage
synchronous motors. Higher voltage motors are out of service
about 39 min per year, with lower voltage motors being out of
service for an average of 35 s per year. While these numbers
are different, the MTBF’s are approximately the same. The
differences can be attributed to the logistics of procuring and
delivering the larger motors and parts not normally kept in
stock.
Second, the results of our study are easier to use for
the RAMP reliability program than earlier data. The RAMP
program can now be used to find availabilities using a single
value for a single piece of equipment. When using the old
data, the modeler was limited to 120 items and had added
operations (computational nodes) to account for maintenance.
Now, we have data points for 239 types of equipment and the
availability number contains maintenance. We can break out
maintenance into categories such as no maintenance, corrective
only, GSA estimated maintenance, or normally/commercially
performed maintenance. In short, we can tailor the data to
more closely resemble the modeled site.
C. Collecting Data
In order to collect statistically sound data on each of the
items, we adopted the following guidelines.
• Component size was important to ensure similar technologies.
• A minimum of five years of operational data was collected.
• The minimum sample population was 40.
• There were no more than 10 samples of any one category
at any one site. This limit was critical to eliminate any
data skewing which could be caused by the influence of
one or more data points in a small population, particularly those due to abnormal operating or maintenance
• Locations surveyed had varying degrees of maintenance.
• Age of equipment was important, since we wanted only
new equipment and technologies.
• There was a minimum of 3 500 000 operating hours total
for each component. Components with no failures during
this period were arbitrarily assumed to have a failure rate
of one for the period observed (all components with no
failures have at least 4 200 000 hours of data). Due to
the low number of observed failures, the uncertainty in
MTBF is large for components with large MTBF. We
have flagged those components for which we observed
fewer than eight failures.
Since the model does not directly use MTBF, we use the
failure rate, period, and MTTR data to calculate reliabilities
for use in the model. It can be shown that the uncertainty in
reliability is small for components with large MTBF, even if
the uncertainty in MTBF is large. It is only for small MTBF’s
that it is important to minimize uncertainty in MTBF.
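This claim can be checked numerically with a short sketch. It assumes the steady-state relation availability = MTBF / (MTBF + MTTR), which may differ from the exact calculation used to prepare the data, and the MTBF and MTTR values are hypothetical. The MTBF is varied by a factor of two in each direction and the resulting spread in availability is reported.

```python
def availability(mtbf_h, mttr_h):
    """Steady-state availability from MTBF and MTTR, both in hours."""
    return mtbf_h / (mtbf_h + mttr_h)


def availability_spread(mtbf_h, mttr_h, factor=2.0):
    """Change in availability when MTBF is uncertain by the given factor."""
    low = availability(mtbf_h / factor, mttr_h)
    high = availability(mtbf_h * factor, mttr_h)
    return high - low


# Hypothetical components sharing an 8 h repair time.
mttr = 8.0
for mtbf in (100.0, 10_000.0, 1_000_000.0):
    print(f"MTBF {mtbf:>11.0f} h: spread {availability_spread(mtbf, mttr):.2e}")
```

The spread falls roughly in proportion to MTTR / MTBF, so a component with a very large MTBF contributes almost the same availability even when its MTBF is poorly known.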
In any case, real-world system failures are dominated by
components with lower reliabilities, and it is much more
important to correctly model the items with numerous failures
than it is to put every near-perfect (no failures) component
into the model.
D. Data Summarization
As with every collection program, there are varying degrees
of data gathered. Some of the data sources had complete
records and could give statistics on operational characteristics
on every piece of equipment from installation to the present
time. More often, the only items tracked were major items,
such as cooling towers, generators, and boilers. Data for
items such as valves and filters were not usually recorded.
Other problems included incomplete or noncurrent versions of
the blue prints. Numerous technicians had to develop parts
lists by hand, recording data from nameplates and relying on
facility engineers for component descriptions. It is important to
determine and be aware of the different levels of data quality.
Data quality can be separated into four groups.
1) Perfect Data: Perfect data would include a parts list,
failure history data with time to failure numerics, item descriptions, parts stock, operational periods, and ten continuous years
of service. No engineering judgement is necessary for reduction. The database is comprised of 10%–20% of this type of data.
2) Not Perfect Data: Not perfect data is defined as data
with no serious flaws, but the data collection process demanded
additional time to ensure useful information was gathered. Examples of this type of data would include parts lists determined
by inspection, incomplete blueprints, or a service period less
than ten years. The database is comprised of 35%–40% of this
type of data.
3) Verbal/Inspection Data: This category of data had serious gaps which could not be utilized in the database until
corrected. Items included were typically major items, such
as generators and boilers. Interviews of senior maintenance
personnel were used to supplement the data. An example
of this would be a description of a particular failure or
clarification of repair times. The database is comprised of 25%
of this type of data.
4) Soft Data: Soft data was the most difficult type of data
to collect. Data collection involved working from scheduling
sheets and repair records of repair companies. The database is
comprised of 10%–15% of this type of data.
Analysts identified and collected data from locations which
had a variety of maintenance policies and practices. Maintenance policies and practices directly affect the availability
of the equipment. High levels of maintenance will lower
availability, but have the potential to increase reliability. The
amount of maintenance performed can drastically affect the
performance parameters being collected.
The data and methodology presented here are very useful
in determining the site reliability or availability. The actual
number produced for a given system may not be totally
definitive; however, comparisons between system numbers
are of great value. The data and procedure can be used
in different manners to aid the facility designers and the
facility engineers. The designers can use the software and
data to evaluate different designs. They can also estimate
occurrence of downtimes by adding the failure times to the
production or mission loss and can estimate the total length
of time from line stop to line start as a result of failures.
New designs or redesigns can be evaluated to minimize the
production or mission failure with estimates on money saved
by avoiding downtime. The facility engineer can use the
data and software to estimate downtimes associated with the
systems or subsystems and compare these results to the actual
downtimes. This could identify problem areas that may need
more (or less) maintenance time and systems that may benefit
from redundancy or replacement.
The RAMP program, along with the user manual which
contains the reliability and availability data, may be ordered
from the U.S. Army Center For Public Works, Attn.: CECPWK, 7701 Telegraph Road, Alexandria, VA 22315-3862 USA.
[1] Draft Design Features Manual, U.S. Army Center for Public Works,
Power Reliability Enhancement Program, Ft. Belvoir, VA, 1980.
[2] Design of Reliable Industrial and Commercial Power Systems, IEEE
Standard 493-1990.
[3] W. H. Dickinson, P. E. Gannon, C. R. Heising, A. D. Patton, and D. W.
McWilliams, “Fundamentals of reliability techniques as applied to industrial power systems,” in Conf. Rec. IEEE Industrial and Commercial
Power Systems Tech. Conf., 1971, pp. 10–31.
[4] C. Singh and R. Billinton, System Reliability Modeling and Evaluation.
London, U.K.: Hutchinson, 1977.
[5] Design Features Manual, U.S. Army Center for Public Works, Power
Reliability Enhancement Program, Ft. Belvoir, VA, 1980.
Stephen J. Briggs received the B.A. degree in
physics and mathematics from Knox College, Galesburg, IL, and the M.S. degree in nuclear engineering
and the Ph.D. degree in electrical engineering from
the University of Illinois at Urbana-Champaign in
1982, 1985, and 1990, respectively.
Currently, he is a Principal Investigator at the
Army Corps of Engineers Construction Engineering
Research Laboratory (USACERL), Champaign, IL,
where he is involved in the causes and effects of
harmonics on power systems. Other interests include
reliability/availability modeling tools for electrical systems, electric vehicles,
applications of neural networks to electrical power systems, and electrical
energy storage technologies.
Michael J. Bartos received the B.S. degree in mechanical engineering from The Pennsylvania State
University, University Park, in 1985.
Currently, he is the Facility Engineer, Site Project
Manager, National Military Command Center, Pentagon, Washington, DC. Previously, he was a Project
Engineer for The Power Reliability Enhancement
Program, Center for Public Works of the Army Corps
of Engineers. Throughout his career, he has been
developing reliability modeling techniques and reliability and maintainability data. He recently served
as a consultant on the Pentagon renovations in the areas of power quality, low harmonic fluorescent lighting, reliable power and HVAC designs
and command, control, communications, computer, and intelligence facility
Robert G. Arno received the B.S. degree in electrical engineering from the State University of New
York at Utica/Rome in 1982.
He has worked in the reliability field for 19 years,
joining IIT Research Institute, Rome, NY, in 1977.
His principal responsibilities include program management, electrical and mechanical system analysis
and modeling, and data collection and analysis. His
most recent program included the data collection of
power generation, distribution, and HVAC data.
Mr. Arno is a member of the American Society
for Quality Control.