Publisher: SISSA, Scuola Internazionale Superiore di Studi Avanzati
Abstract: From its conception, the CMS job management system has been distributed to increase scalability and robustness. The system consists of several applications (called ProdAgents) that manage Monte Carlo, reconstruction, and skimming jobs on collections of sites within different Grid environments (OSG, NorduGrid, LCG) and submission systems such as GlideIn and local batch.
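As an illustration of this pluggable design, the sketch below shows how an agent might dispatch jobs to interchangeable submission backends selected by configuration. All names here (Submitter, LocalBatchSubmitter, GlideInSubmitter, submit_job) are hypothetical and do not reflect the actual ProdAgent API.

    from abc import ABC, abstractmethod

    class Submitter(ABC):
        """One backend per submission system (GlideIn, local batch, ...)."""

        @abstractmethod
        def submit(self, job_spec: dict) -> str:
            """Submit a job and return a backend-specific job identifier."""

    class LocalBatchSubmitter(Submitter):
        def submit(self, job_spec: dict) -> str:
            # A real backend would invoke the local batch scheduler here.
            return "batch-%s" % job_spec["name"]

    class GlideInSubmitter(Submitter):
        def submit(self, job_spec: dict) -> str:
            # A real backend would hand the job to a GlideIn factory here.
            return "glidein-%s" % job_spec["name"]

    # Registry mapping configuration keys to submission backends.
    SUBMITTERS = {
        "LocalBatch": LocalBatchSubmitter(),
        "GlideIn": GlideInSubmitter(),
    }

    def submit_job(job_spec: dict, backend: str) -> str:
        """Dispatch a job to the backend named in the agent configuration."""
        return SUBMITTERS[backend].submit(job_spec)

    print(submit_job({"name": "mc-prod-001"}, "GlideIn"))

Keeping the submission systems behind a common interface is what lets the same agent run against OSG, NorduGrid, or LCG without changes to the workflow logic.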
Production of simulated data in CMS mainly takes place on so-called Tier2 resources (small to medium-sized computing centers). Approximately 50% of the CMS Tier2 resources are allocated to running simulation jobs. The so-called Tier1s (medium to large computing centers with high-capacity tape storage systems) will be used mainly for skimming and reconstructing detector data. Over the last one and a half years, the job management system has been adapted so that it can be configured to convert Data Acquisition (DAQ) / High Level Trigger (HLT) output from the CMS detector to the CMS data format and to manage the real-time data stream from the experiment. In parallel, the system has been upgraded to accommodate the increasing scale of CMS production and to adapt to the procedures used by its operators.
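A minimal sketch of this dual use is given below, assuming (hypothetically) that a single configuration switch selects between Monte Carlo production and DAQ/HLT conversion; the keys and values are illustrative and are not the actual ProdAgent configuration schema.

    # Hypothetical sketch: one agent codebase, switched by configuration
    # between simulation at Tier2s and real-time DAQ/HLT conversion.
    AGENT_CONFIG = {
        "mode": "Repack",           # or "MonteCarlo" for simulation jobs
        "input_source": "DAQ/HLT",  # raw output from the detector
        "output_format": "EDM",     # stand-in for the CMS data format
    }

    def build_workflow(config: dict) -> str:
        """Select a workflow from the configured mode (illustrative only)."""
        if config["mode"] == "Repack":
            return ("convert %s output to %s"
                    % (config["input_source"], config["output_format"]))
        return "generate and simulate Monte Carlo events"

    print(build_workflow(AGENT_CONFIG))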
In this paper we discuss the current high-level architecture of ProdAgent, our experience using the system in computing challenges, the feedback from these challenges, and future work, including migration to a set of core libraries to facilitate convergence between the different data management projects within CMS that deal with analysis, simulation, and initial reconstruction of real data. This migration is important, as it will decrease the code footprint used by these projects and increase the maintainability of the code base.