Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Publication year: 2014
Volume: 7
Issue: 2
Pages: 53-68
DOI: 10.14257/ijgdc.2014.7.2.06
Publisher: SERSC
Abstract: In this age of data and knowledge, Cloud, Grid and P2P systems are becoming more common and more advanced. Because of its heterogeneous and distributed nature, a Grid is particularly vulnerable to faults. Trace files are a good way to collect and store fault and workload information from a system. FTA (Fault Trace Archive) and GWA (Grid Workload Archive) are two such trace files. Researchers have previously analyzed FTA and GWA individually; in this paper we analyze, for the first time, the combination of FTA and GWA as a single research problem. The trace files are joined on their event timestamp values. Both trace files are analyzed to establish a correlation-based model among node failures, failed jobs, number of nodes and failure duration. We find that these factors are positively correlated with each other, but to different extents. In addition to results on node failure frequency, failure resume time and node dedication factor, we find that interactive jobs have a higher failure probability than batch jobs.
Keywords: FTA (Fault Trace Archive); GWA (Grid Workload Archive); Fault Tolerance; QoS (Quality of Service)
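
Illustrative sketch: the abstract states that the two traces are joined on event timestamps and that correlations are then computed among failure-related factors. The Python below is a minimal sketch of that general idea, not the authors' method. The record layout (job id, start time, end time), the rule that a failure belongs to a job when its timestamp falls inside the job's run interval, and the function names `count_failures_per_job` and `pearson` are all assumptions made for illustration; the input values in the usage note are toy data, not results from the paper.

```python
from bisect import bisect_left, bisect_right
from math import sqrt


def count_failures_per_job(jobs, failure_times):
    """Count failure events whose timestamp falls within each job's run interval.

    jobs: iterable of (job_id, start, end) tuples (hypothetical layout).
    failure_times: iterable of failure event timestamps in the same time unit.
    """
    times = sorted(failure_times)
    counts = {}
    for job_id, start, end in jobs:
        # Number of timestamps t with start <= t <= end, via binary search.
        counts[job_id] = bisect_right(times, end) - bisect_left(times, start)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Usage note: calling `count_failures_per_job` with a list of job intervals and a list of failure timestamps yields a per-job failure count, and `pearson` can then be applied to any two per-node or per-job series (for example, failure count against failure duration) to check the sign and strength of their linear relationship.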