Reliability analysis of computing systems using simulations.

The reliability of a computer system depends on how strongly its components, and the system as a whole, are affected by factors such as environmental conditions, supply voltage, mode of operation, the preventive maintenance schedule, and others. PKTB ASUZHT investigated the influence of these factors for different types of computers.

Reliability is conventionally characterized by the number of failures over a given period of time (or by downtime due to technical faults). The impact of external factors on reliability was analyzed using a simulation model based on regression equations well known from mathematical statistics.

The number of computer failures was taken as the output (dependent) variable. The number of failures, rather than downtime due to technical faults, was chosen because the latter is also affected by the qualification of personnel, the design features of the computers, the availability of spare parts, and other factors.

Using fault logs and voltage and temperature charts covering three years, monthly regression equations were constructed for three types of computers; their coefficients were calculated by computer.
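The fitting step can be sketched as an ordinary least-squares fit. This is a minimal illustration only: the source does not state which regression form or fitting procedure was used, so the linear form, the function name `fit_linear_regression`, and the synthetic data below are all assumptions.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    rows    -- list of observations, each a list of k input values
    targets -- list of observed outputs (e.g. failures per month)
    Returns [b0, b1, ..., bk].
    """
    # Design matrix with a leading 1 for the intercept term.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Synthetic data for illustration only (not from the study):
# y is generated exactly as 2 + 3*x1 - 1*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
targets = [3, 7, 7, 11, 12]
coefficients = fit_linear_regression(rows, targets)
```

In a real study each row would hold one month's inputs (for example temperature, supply voltage, and maintenance time) and the target would be that month's failure count.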

The degree to which the model fits the object under study allows it to be used for a simulation experiment. The essence of the experiment is that one input variable at a time is changed by 1%, and the corresponding change in the output variable is determined from the equations.
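The one-variable-at-a-time experiment described above can be sketched as follows. The linear model form and the names `predict` and `one_percent_sensitivity` are assumptions made for illustration; the coefficients would come from the fitted regression equations.

```python
def predict(coefficients, inputs):
    """Evaluate y = b0 + b1*x1 + ... + bk*xk."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], inputs))


def one_percent_sensitivity(coefficients, baseline):
    """Percent change in the output when each input alone is raised by 1%.

    Assumes the baseline prediction is non-zero.
    """
    y0 = predict(coefficients, baseline)
    result = []
    for i in range(len(baseline)):
        perturbed = list(baseline)
        perturbed[i] *= 1.01          # change only input i by 1%
        y1 = predict(coefficients, perturbed)
        result.append(100.0 * (y1 - y0) / y0)
    return result
```

Each entry of the returned list answers the question "by what percentage does the number of failures change if this factor alone changes by 1%?", which makes inputs measured in different units directly comparable.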

As can be seen, an increase in total preventive maintenance time has the least effect on the number of failures. For the «Minsk-32» computer the number of failures is reduced by 0.3%; for the other types of computers the influence is even smaller.

To estimate the relationship between an increase in preventive maintenance time and the reduction in downtime due to technical faults, an equation describing the dependence of downtime on the number of faults was used.

Based on the quantitative characteristics obtained, practical measures were proposed to improve computer reliability. To improve temperature conditions, local air conditioning was installed. As the calculations predicted, stabilizing the temperature by 2% reduced the number of failures by 15-18%. To stabilize the supply voltage, power was supplied through a motor-generator set.

The second stage of the research examined how the reliability of individual devices affects the reliability of the system as a whole. Other conditions being equal, the better balanced the reliability of the individual devices, the higher the reliability of the computer, because even a significant increase in the reliability of one device cannot significantly affect the reliability of the overall system.

To study this factor, a simulation model was built on a regression equation in which the dependent variable (y) is the computer downtime due to technical faults (in hours) and the independent input variables (x) are the numbers of failures of the computer's devices. The simulation experiment was used to determine the influence of the number of failures of each device on computer downtime. By simulating different device failure flows, the expected computer downtime was obtained. The device whose failures had the greatest impact on the reliability of the entire system was classified as unreliable. By simulating a reduced failure flow for that device, corresponding to an increase in its reliability, the next unreliable unit was identified, and so on.
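The iterative search for unreliable devices can be sketched as below. This is a hedged illustration, not the procedure used in the study: measuring a device's impact as coefficient times failure count, the `reduction` factor, and the device names in the example are all assumptions.

```python
def rank_unreliable_devices(coefficients, failures, reduction=0.5, rounds=3):
    """Repeatedly pick the device contributing most downtime, then
    simulate an improvement of that device and look for the next one.

    coefficients -- dict: device -> regression coefficient (hours per failure)
    failures     -- dict: device -> number of failures in the period
    reduction    -- assumed factor applied to the failure count of the
                    device picked in each round (hypothetical value)
    """
    flows = dict(failures)
    order = []
    for _ in range(rounds):
        # Contribution of each device to downtime under the linear model.
        contribution = {d: coefficients[d] * flows[d] for d in flows}
        worst = max(contribution, key=contribution.get)
        order.append(worst)
        flows[worst] *= reduction     # simulate a more reliable device
    return order


# Hypothetical devices and values, for illustration only.
example_order = rank_unreliable_devices(
    {"printer": 2.0, "disk": 1.0, "memory": 1.5},
    {"printer": 10, "disk": 12, "memory": 6},
)
```

A device may appear more than once in the result: that indicates it remains the bottleneck even after the simulated improvement.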

The simulation model was built from failure data covering the complete set of computer devices over three years. Note that in this case all failures were recorded for each device in question. Since every device is equally exposed to external factors, failures of several devices (2-3) were sometimes observed simultaneously.

In general, the effect of failures on downtime due to technical faults depends on both the intensity and the nature of the failure flow, since different failures, even of a single device, affect downtime differently. To isolate the effect of failures by device type, the failure flows were normalized; in this case the coefficients of the regression equations express how strongly failures of a given type of device affect downtime. The higher the coefficient in the regression equation, the greater the influence of that device's failures on total computer downtime. The equations are constructed so that the combined influence of these two components is determined.

To maximize the fit of the model to the object under study, observations with a large spread were not used.

By simulating a 100% increase in the failure flow for each device type, the resulting increase in computer downtime was obtained as a percentage of total downtime.
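The doubling experiment can be sketched as follows, assuming a linear downtime model with one coefficient per device type. The function name `downtime_increase_percent` and its parameters are hypothetical and are not taken from the study.

```python
def downtime_increase_percent(intercept, coefficients, failures):
    """For each device, double its failure count (a 100% increase) and
    report the rise in predicted downtime as a percentage of the baseline.

    intercept    -- constant term of the regression equation (hours)
    coefficients -- dict: device -> regression coefficient (hours per failure)
    failures     -- dict: device -> baseline number of failures
    Assumes the baseline downtime is non-zero.
    """
    def downtime(flows):
        return intercept + sum(coefficients[d] * flows[d] for d in flows)

    base = downtime(failures)
    result = {}
    for device in failures:
        doubled = dict(failures)
        doubled[device] *= 2          # 100% increase for this device only
        result[device] = 100.0 * (downtime(doubled) - base) / base
    return result
```

The device with the largest percentage is the one whose failures weigh most heavily on total downtime, that is, the bottleneck in the sense used here.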

The main bottleneck in the «Minsk-32» computer is the alphanumeric printer, and the reliability of the «Minsk-32» is determined mainly by the reliability of the VNU. It should be noted that the reliability of all the devices of this computer is fairly well balanced. In the ES 1030 and ES 1022 computers the reliability of the units is balanced worse; the main bottlenecks are main memory and the disk controller. The imbalance is especially noticeable in the ES 1030, about half of whose reliability is determined by the reliability of main memory (OP). The simulations also show an insufficient level of reliability of the consoles of all these computers.

In accordance with these findings, we took a number of measures to improve computer reliability: the disk controller was duplicated, the typewriters were replaced with «Videoton-340» displays, a proposal was implemented that allows the failed part of the OP on the ES 1030 to be disabled, and others.

We also used the experimental data in determining the configuration of multi-machine complexes based on a shared field of disk memory.

Computer-based analysis of the computing systems by this method yielded the quantitative characteristics needed to assess the actual state of computer reliability. Plans for innovation work, technical training, and so on were drawn up accordingly.

Based on this analysis, some recommendations can be made. To ensure stable operation of third-generation computers used on the rail network, it is necessary to: comply strictly with the manufacturers' temperature requirements, for which local individual air-conditioning units are advisable; supply power to the computers through a motor-generator set; and duplicate the least reliable electromechanical devices or replace them with more reliable ones.

In conclusion, simulation can also be used to study other questions of computer system reliability, such as accounting for software reliability, building multicomputer systems, redundancy, and others.

