Reliability Evaluation of Manufacturing Systems: Methods and Applications

Several studies (Ascher et al., 1984; Battini et al., 2009; Louit et al., 2007; Persona et al., 2007) state that these complex methodologies are often applied under false assumptions, such as constant failure rates, statistical independence among components, renewal processes and others. This common approach results in poor evaluations of the real reliability performance of components, and all subsequent analysis may be compromised by an incorrect initial assessment of the failure process. A correct definition of the model describing the failure mode is therefore a critical issue, requiring an effort that is often not sufficiently applied.

This chapter deals with the stochastic failure and repair behavior of complex manufacturing systems assembled from a variety of components. Consequently, the first part of the chapter presents a general framework for components which describes the procedure for solving the complete Failure Process Modeling (FPM) problem, from data collection to final failure modeling; in particular, it develops the fitting analysis for the renewal process and the contribution of censored data throughout the whole process. The chapter then discusses the main methods provided in the proposed framework.
Applications, strictly derived from industrial case studies, are presented to show the capability and the usefulness of the framework and methods proposed.

Failure Process Modeling (FPM) framework
A robust reliability analysis requires an effective investigation of the failure process, normally based on non-trivial knowledge of the past performance of components or systems, in particular in terms of failure times. This data collection is a fundamental step, and the introduction of a Computer Maintenance Management System (CMMS) and of a Maintenance Remote Control System (Persona et al., 2007) can play an important role. Ferrari et al. (2003) demonstrate the risk due to small data sets or to hasty hypotheses often adopted (e.g. constant failure rate, independent identically distributed failure times, etc.).
The literature suggests different frameworks for the investigation of the failure process modeling of components and complex systems, generally focused on a particular feature of the problem (e.g. trend tests on failure data, renewal versus non-renewal approaches, etc.).
In this chapter a general framework is proposed, covering the whole FPM process from data collection to final failure modeling and also considering the contribution of censored data. Figure 1 presents the proposed framework (Regattieri et al., 2010). Data collection is the first step of the procedure and a very important issue, since the robustness of the analysis is strictly related to the collected data set. Both failure times and censored times are gathered: times to failure are used for the failure process characterization, while censored times enrich the data set used for the estimation of the parameters of the failure models, resulting in a more robust modeling.
In general, considering a population composed of m units, each specific failure (or inter-failure) time can be recorded. The result is a set of times Xi,j, where Xi,j represents the i-th failure time of the j-th unit. When all m unit failure times are available, this is a complete data situation.
Unfortunately, this is frequently not the real situation, because a lot of time and information would be required. A real-world test often ends before all units have failed, or several units finish their work before data monitoring ends, so their real working times are unknown. These conditions are usually known as censored data situations.
Technically, censoring may be further categorized into:

1. Individual censored data: all units share the same test time t*; each unit has either failed before t* or is still running (generating a censored time).
2. Multiple censored data: test times vary from unit to unit. Failure times differ, but so do the censoring times: censored units are removed from the sample at different times, while units go into service at different times.

In reference to Figure 1, let X1,j < X2,j < … < Xi,j < … < Xn,j be the ordered set of failure or inter-failure times of item j; censored times (denoted Xi,j+) are temporarily removed from the data set. A trend test applied to the ordered failure times (graphical trend test, Mann test, etc.) determines whether the process is stationary.
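The reverse-arrangement count underlying the Mann test can be sketched as follows. This is a minimal illustration, not taken from the chapter: a common convention counts the pairs (i, j) with i < j in which the earlier inter-failure time exceeds the later one, and compares the count with tabulated critical values (under the trend-free hypothesis its mean is n(n − 1)/4). The sample times below are hypothetical.

```python
from itertools import combinations

def reverse_arrangements(inter_failure_times):
    """Count pairs (i, j), i < j, where the earlier inter-failure time
    exceeds the later one; counts far from n*(n-1)/4 suggest a trend."""
    return sum(1 for a, b in combinations(inter_failure_times, 2) if a > b)

# Hypothetical inter-failure times (hours); compare the count with
# tabulated critical values for n = 6 at the chosen significance level.
sample = [310, 290, 330, 305, 280, 315]
count = reverse_arrangements(sample)
```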
If the process presents a trend, the Xi,j are not identically distributed and a non-stationary model must be fitted. The NHPP model is the most widely used form, thanks to its simplicity and to the significant experimental evidence available (Coetzee, 1997). At this point the censored data must be reconsidered in the model; their impact is discussed by Jiang et al. (2005).
If the failure process is trend free, the next step is to check whether the inter-failure times are independent. Many tests for independence exist, but this check is usually skipped by practitioners, as stated by Ascher and Feingold, because the relevance of this type of test is not well understood. An effective way of testing dependence is the serial correlation analysis discussed by Cox and Lewis (1966). Dependence between data leads to a Branching Poisson Process (BPP), also analyzed by Cox and Lewis (1966). Censored data again play an important role in the BPP model and must be considered during final modelling.
In real applications the failure process is frequently stationary and the failure data are independent: a renewal process is then involved. Even so, the proposed framework pays attention to the evaluation of the reliability functions, in particular in the presence of censored data.
More precisely, non parametric methods and distribution based techniques are suggested to estimate the reliability functions (survival function, hazard function, etc.) taking censored data into account.
The Product Limit Estimator and Kaplan-Meier methods in the first category, and Least Squares Analysis and the Maximum Likelihood Estimator technique in the second, are robust and consistent approaches. Regattieri et al. (2010), Manzini et al. (2009) and Ebeling (2005) discuss in detail each method referred to in the presented framework.

Applications
The proposed framework has been applied in several case studies. In this chapter, two applications are presented, in order to discuss methods, advantages and problems.
The first application deals with an important international manufacturer of oleo-dynamic valves. Using a reliability data set collected during the life of the manufacturing system, the effect of including or neglecting the censored data is discussed.
The second application involves the application of the complete framework in a light commercial vehicle manufacturing system. In particular, the estimation of the failure time distribution is discussed.

Application 1: The significant effect of censored data
During production, the Company collects times to failure using the CMMS. In particular, the performance of component r.090.1768 is analysed; this is a very important electric motor, responsible for the movement of the transfer system of the valve assembly line. Figure 2 shows a sketch of the component.
The failure process can be considered a Renewal Process; the stationarity and dependence tests are omitted for the sake of simplicity (for details see Application 2).
Using non parametric methods, in particular the Kaplan-Meier method (Manzini et al., 2009; Ebeling, 2005), it is possible to evaluate the empirical form of the reliability function, called R(ti). Let ti be the ranked failure times and ni the number of components at risk just prior to the i-th failure; the estimated reliability is calculated by:

R(ti) = Π j=1..i [(nj − 1) / nj]
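The Kaplan-Meier estimate can be sketched in code. This is a minimal sketch, assuming one unit fails or is suspended at each recorded time; the event list is hypothetical, and ties are broken by sort order.

```python
def kaplan_meier(events):
    """Kaplan-Meier estimate: R(t_i) is the product over failures j <= i
    of (n_j - 1)/n_j, where n_j is the number of units still at risk just
    before the j-th event. events: (time, is_failure) pairs, is_failure
    False for a censored (suspended) time."""
    n = len(events)
    R = 1.0
    curve = []
    for k, (t, failed) in enumerate(sorted(events)):
        at_risk = n - k            # units not yet failed or censored
        if failed:
            R *= (at_risk - 1) / at_risk
            curve.append((t, R))   # R(t_i) at each failure time
    return curve

# Hypothetical times (hours): two failures and one suspension at 150 h
curve = kaplan_meier([(100, True), (150, False), (300, True)])
```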

The results of the reliability analysis are summarized in Table 2 and Figure 3. Figure 4 compares the reliability functions obtained by the Kaplan-Meier method applied to the data set including censored data and by the Improved Direct Method (Manzini et al., 2009) applied only to the failure times. If censored times are not considered, a significant error is introduced.

Application 2: A complete failure process modeling
In this application the FPM process has been applied to carry out the reliability analysis of several components of the production system of a light commercial vehicle manufacturer. The plant is composed of many subsystems; for each of them a set of critical components is considered. Each component has a predominant failure mode (wear, mechanical failure, thermal failure, etc.); from now on, the generic term failure refers to the particular, prominent failure mode of each component. Table 3 shows the subset of critical components, called S1, analyzed in the chapter. Among them, the main welding robot, named wr1, is very critical; in particular its components named KKL5699, which are the main actuators, are considered chiefly responsible for the poor reliability performance of the entire manufacturing system. For this reason, the chapter presents the application of the proposed framework to component KKL5699. Finally, the conclusions take into account all the critical components shown in Table 3.

Analysis of KKL5699 component
The tow system is composed of 9 identical repairable components KKL5699, working in 9 different positions, labelled A, B, …, L, under the same operating conditions. For this reason, they are pooled into a single enlarged data set. The working time is 24 hours/day, 222 days/year.
The CMMS has collected failure data since the initial installation (T0 = 0). Table 4 reports the inter-failure times Xij (failure i of item j) and the cumulative failure times Fij, as shown in Figure 5.
The data were collected during 5 years of operating time, but FPM must be an iterative procedure applied at different instants of the system service. The growth of the data set allows a more robust investigation of the failure process. In particular, the chapter presents the results of the analyses developed at the end of different time intervals [T0, t]: 1,440, 3,696, 4,824, 6,720, 8,448, 11,472, 13,440, 15,560, 18,816 and 23,688 hours. For the generic time t, the analysis uses all the failure times collected, together with the censored times of the components still in service. For each instant of analysis, 9 suspended times are collected, corresponding to the working times of the components since their last repair action. Table 5 reports the data set of failure times, the censored times and the relative working positions available at the instant of analysis 3,696 hours. Obviously, as the instant of analysis increases, the number of failure times increases too, while the number of censored times remains constantly 9; the Censoring Rate (CR), given by (1), therefore decreases:

CR(t) = Nc(t) / Ntot(t)    (1)

where Nc(t) is the number of censored times available at the instant of analysis t, and Ntot(t) is the total number of times (failure and censored) available at the instant of analysis t.
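The censoring rate CR(t) = Nc(t)/Ntot(t) is straightforward to compute; the sketch below simply shows how CR decreases as failures accumulate while the 9 suspended times stay fixed (the failure counts used are illustrative, not the chapter's data).

```python
def censoring_rate(n_censored, n_failures):
    """CR(t) = Nc(t) / Ntot(t): fraction of censored observations
    in the data set available at the instant of analysis t."""
    return n_censored / (n_censored + n_failures)

# With the 9 suspended times fixed, CR falls as failure data accumulate
rates = [censoring_rate(9, n_fail) for n_fail in (5, 20, 60)]
```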
The different Censoring Rates involved in the analysis are presented in Table 6. According to the proposed framework, the first test deals with the stationarity condition. Figure 6, referring to the whole pooled data set, presents the cumulative failures versus time plot. No trend can be appreciated in the failure data. The Mann test, counting the number of reverse arrangements, confirms this conclusion, both for each component and for the pooled data set. The Laplace trend test leads to the same result: its test statistic is u_L = 0.55, corresponding to a p-value p = 0.580. Comparing u_L² with the χ² distribution with 1 degree of freedom, there is no evidence of a trend at the 5% significance level (Ansell and Phillips, 1994).

A first assessment therefore shows that the component failure process is stationary. The next step deals with the evaluation of the renewal process hypothesis. Serial correlation analysis is used to verify the independence of the analyzed data set. According to the autocorrelation plot in Figure 7, the null hypothesis of no autocorrelation cannot be rejected for any lag (at the 5% significance level).
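The Laplace trend statistic can be sketched as follows; the cumulative failure times in the example are hypothetical, not the chapter's data.

```python
import math

def laplace_u(cumulative_failure_times, T):
    """Laplace trend test statistic for n failures at cumulative times
    t_1..t_n observed over (0, T]. Under a trend-free (homogeneous
    Poisson) process u is approximately standard normal, so |u| < 1.96
    means no evidence of trend at the 5% significance level."""
    n = len(cumulative_failure_times)
    return ((sum(cumulative_failure_times) / n - T / 2)
            / (T * math.sqrt(1 / (12 * n))))

# Hypothetical: failures spread evenly over the observation window,
# so u is close to 0 and no trend is detected
u = laplace_u([250, 500, 750], 1000)
```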
The Durbin-Watson statistic confirms this result; the failure process of component KKL5699 can therefore be considered a renewal process (RP).
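The serial correlation check can be sketched as a lag-1 sample autocorrelation of the inter-failure times, a simplified stand-in for the full autocorrelation plot and Durbin-Watson statistic; the data below are hypothetical.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of the inter-failure times; values
    near 0 support the independence (renewal) hypothesis."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical inter-failure times (hours)
r1 = lag1_autocorrelation([310, 290, 330, 305, 280, 315])
```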
Under the RP assumption, non parametric methods or distribution based techniques are available to define the failure process model. It is important to take into account the role of censored data: as demonstrated in Application 1, their use enriches the data set and thus increases the confidence in the model.
Following this consideration of censored data, the data are first analyzed by the Product Limit Estimator method (Manzini et al., 2009; Ebeling, 2005), and then the best reliability distributions (i.e. survival function, hazard rate, etc.) are fitted by the Least Squares method. This approach is applied for each time interval of the system service time (i.e. each instant of analysis).
From now on, only the pooled data set is considered; the time notation Xij therefore collapses into Xz. The value of the survival function R(Xz) after a working time equal to Xz, using the Product Limit Estimator method, is given by:

R(Xz) = R(Xz−1) · [(n − z) / (n − z + 1)]^δz

where δz = 1 if a failure occurs at time Xz and δz = 0 if a censoring occurs at time Xz; R(0) = 1; and n is the number of failure and censored events available. All the service times are investigated. Figure 8 shows the empirical survival curves for component KKL5699 obtained at different instants of analysis, sometimes widely spaced (e.g. several months apart).
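The Product Limit recursion translates directly into code; this is a minimal sketch with hypothetical (time, δ) pairs.

```python
def product_limit(events):
    """Product Limit Estimator: events is a list of (time, delta) pairs,
    with delta = 1 for a failure and 0 for a censored time. Implements
    R(X_z) = R(X_{z-1}) * ((n - z)/(n - z + 1))**delta_z, with R(0) = 1
    and n the total number of failure and censored events."""
    n = len(events)
    R = 1.0
    curve = []
    for z, (t, delta) in enumerate(sorted(events), start=1):
        R *= ((n - z) / (n - z + 1)) ** delta
        curve.append((t, R))
    return curve

# Hypothetical pooled data: failures at 120 h and 400 h, censoring at 200 h
curve = product_limit([(120, 1), (200, 0), (400, 1)])
```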
The reliability evaluation after 1,440 hours (roughly 3 months of service) is very approximate, while after 23,688 hours (more than 4 years of service) it is much more trustworthy. Along the life of the component, the survival function estimates change significantly (by up to 25%); data collection and the upkeep of the collected data are therefore very important issues.
An alternative approach is to identify a proper statistical distribution for the principal reliability functions, such as the survival function R(t), the cumulative failure probability function F(t) and the hazard function h(t), to evaluate its parameter(s), and to perform a goodness-of-fit test. This approach is very attractive because in recent years a significant number of techniques based on the knowledge of the reliability distributions have been developed, which can provide an important contribution to the performance improvement of both industrial and non-industrial complex systems. These techniques are often referred to as RAMS (Reliability, Availability, Maintainability and Safety) analysis.
For example, scientific literature includes a huge number of interesting methods, some being more complex than others, linked to preventive maintenance models; these models support the determination of the best intervention interval, or the optimization of the procedures, that determine spare parts consumption or the best management of their operating costs.
The two-parameter Weibull distribution is one of the most commonly used distributions in reliability engineering because of the wide range of failure behaviors it can represent for different values of its parameters. It can therefore model a great variety of data and life characteristics. The distribution parameters can be estimated using several methods: the Least Squares method (LS), the Maximum Likelihood Estimator (MLE) and others (Ebeling, 2005). In the case of component KKL5699, the Least Squares method is preferred for its simplicity and robustness.

The parameters of the Weibull distribution move toward steady values of about 3.0 for β and about 2,400 hours for η. Figure 9 shows graphically the trend of the parameters at the different instants of analysis. Another interesting analysis deals with the link between the estimates of the Weibull parameters and the censoring rate: Figures 10 and 11 show the significant role played by CR on the paths of β and η. The different censoring rates are due to a fixed number of censored data (i.e. 9) and an increasing number of failure data.

Reliability functions, such as the survival function, the cumulative probability of failure and the hazard rate, can be directly derived from the two-parameter Weibull distribution using the estimated parameters of Table 7. Figure 12 shows the survival function R(t) of component KKL5699 for a generic time interval t, estimated after 2,410, 4,824 and 23,688 hours of service.

The proposed framework (Fig. 1) provides both the non parametric and the distribution based approaches. Their use usually leads to similar outcomes but, as stated before, the latter incorporates more information and is preferable when possible. Figure 13 compares the estimates of the survival functions of component KKL5699 obtained after 23,688 hours of service with the two different approaches.
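A median-rank-regression sketch of the Least Squares fit for a complete (uncensored) sample is given below; with censored data the ranks would have to be adjusted. Bernard's approximation for the median ranks is an assumption of this sketch, not something stated in the chapter.

```python
import math

def weibull_ls_fit(failure_times):
    """Least Squares (median-rank regression) fit of the two-parameter
    Weibull. Linearizes F(t) = 1 - exp(-(t/eta)**beta) as
    y = beta*x - beta*ln(eta), with x = ln(t), y = ln(-ln(1 - F)),
    using Bernard's approximation F_i ~ (i - 0.3)/(n + 0.4)
    for the ranked failure times."""
    t = sorted(failure_times)
    n = len(t)
    xs = [math.log(v) for v in t]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    xm, ym = sum(xs) / n, sum(ys) / n
    beta = (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
            / sum((x - xm) ** 2 for x in xs))
    eta = math.exp(xm - ym / beta)      # intercept = -beta * ln(eta)
    return beta, eta

def weibull_survival(t, beta, eta):
    """Survival function R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-(t / eta) ** beta)
```

With illustrative values beta = 3.0 and eta = 2,400 hours, `weibull_survival(1200, 3.0, 2400)` gives the probability of the component surviving 1,200 hours of service.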

Analysis of the subset S1
The stationarity and dependence tests on the times to failure show that all the components of subset S1 can be described by a Renewal Process behavior. Failure and censored data are used in the Product Limit Estimator method to evaluate the empirical values of the cumulative failure distributions. The two-parameter Weibull distribution is used to express the reliability functions in analytical form; in particular, the distribution parameters are estimated using the Least Squares method at each instant of analysis. Table 8 reports the parameters of the Weibull distribution of the S1 components after 23,688 operating hours. Figure 14 shows the survival functions of the critical components, calculated on the basis of the parameters reported in Table 8.

Conclusion
Failure Process Modeling plays a fundamental role in the reliability analysis of manufacturing systems. Complex methodologies are often applied using false assumptions such as constant failure rates, statistical independence among components, renewal processes and others. These misconceptions result in a poor evaluation of the real reliability performance of components and systems, and any subsequent analysis may be compromised by an incorrect initial assessment of the failure process.
The experimental evidence shows that a correct definition of the model describing the failure mode is a very critical issue, requiring an effort on which engineers often do not sufficiently focus.
The collection of both failure data and censored data is a fundamental step. A CMMS, together with a system that automatically manages the alarms coming from the various installed sensors, can be a valid tool to improve this phase.
As demonstrated in the presented applications, neglecting the censored information results in significant errors in the evaluation of the reliability performance of the components.
Knowledge of a fitted analytical distribution is very valuable, because it enables several developments: for example, the determination of the best intervention frequency, the optimization of the procedures that determine spare parts consumption, or the best management of their operating costs.

The FPM procedure must also be maintained: during the service life of the system, the reliability data set grows, allowing a more robust estimation of the reliability functions. The FPM process, performed using the proposed framework (Fig. 1), must be an iterative procedure renewed throughout the life of the system.