ATLAS Offline Software Computing Training: Spring 2013
CERN, Week of 11 March 2013
Overview: Software Tutorial Twiki page
INDICO schedule with presentation and tutorial links (requires CERN log-in)
Day 1
Session 1: Welcome and Basics (James)
Data processing chain: Yellow boxes represent data formats
AOD: Analysis Object Data (main data format)
Code is stored in SVN; builds are managed by CMT (releases about once per month, major releases every six months). Notation is majorRelease.developerRelease.bugFix, with additional digits for the addition of small packages (we are using 17.2.7.4.1 for this week's examples). Use releases at least as recent as the data you are trying to read.
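The numbering scheme above can be sketched as a small helper. This is purely illustrative (not an ATLAS tool); the field names follow the notation in these notes:

```python
# Illustrative sketch: split a release string like '17.2.7.4.1' into
# the fields named in the notes (majorRelease.developerRelease.bugFix,
# plus extra digits for small-package additions).
def parse_release(release):
    parts = release.split(".")
    labeled = dict(zip(["majorRelease", "developerRelease", "bugFix"], parts[:3]))
    labeled["extra"] = parts[3:]  # additional digits for small packages
    return labeled

print(parse_release("17.2.7.4.1"))
# {'majorRelease': '17', 'developerRelease': '2', 'bugFix': '7', 'extra': ['4', '1']}
```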
Package = one library that can be run via the Athena executable (requires certain header files and CMT alignment); looks quite handy, worth learning about (example: BPhysExamples)
ATLANTIS and VP1 are event displays to check that we’re analyzing what we think we are (going to play with on Friday)
Session 2: Where do my data and MC come from?
Python (Sebastien)
ATLAS is using python 2.6 (SLC5 has python 2.4 installed by default)
Standard Python indentation for ATLAS is 4 spaces (no tabs)
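A minimal example of that style (the function and data here are invented for illustration, not from the tutorial):

```python
# ATLAS Python style: indent with exactly 4 spaces, never tabs.
def count_good_events(events):
    n_good = 0
    for event in events:
        if event.get("good"):
            n_good += 1
    return n_good

print(count_good_events([{"good": True}, {"good": False}, {"good": True}]))  # 2
```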
Session 3: Finding and Getting my Data and MC
Hands on with Athena and Python (Sebastien, Karsten)
Start with this tutorial to set up your computer on the network
Then: Great beginner's tutorial for Athena and Python used in this session
After completing this tutorial, you can just type asetup 17.2.7.4.1 to set up Athena after logging in.
(Type echo $TestArea to check that it's set up correctly)
COMA & AMI Reports
Gray = no stable beam; yellow = ready to collect data; green = taking data
Runs with missing luminosity are intended only for expert use
Hands on with AMI
Hands on with MC and DQ2
Day 2
Session 1: Introduction to the ATLAS Event Data Model (EDM) (Sebastien)
This session covers what is in the ATLAS data and how to access and examine it
ESD = Event Summary Data: how the variables used in analysis get from detection into my code
D3PD = simplified data format in a ROOT tree, with varying content depending on the choices of the group that made the D3PD.
Once you get to the D3PD level, the amount of included information is fixed.
Tracking
Specific aspects of the trajectory and vertex are written out to the D3PD – some data, such as specific hits along the track, are not written out. This section covers what tracking information is included.
Calorimetry
Electron/Photon (‘egamma’) Objects
‘Combined’: cluster and tracker information combined
Jets (Stephen)
Hadron level: what you get from MC (requires truth information, available only from MC)
From data: Reconstructed or track jets
B Tagging
Muons
Taus
Handled completely differently from electrons and muons
It is hard to distinguish a tau decay from a QCD jet
An Introduction to Athena ROOT Access (Karsten)
See presentation.
Using ARA to Plot Data
Session 2: DPDs, Root Analysis Tools, and D3PD Analysis
DPDs: Derived Physics Data, derived from formats too large to download locally, with events and information skimmed out. This session also defines slimming, trimming, and thinning.
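A toy sketch of these reduction steps on a plain-Python event list may help fix the vocabulary. This is not real ATLAS code, and the event/jet fields are invented; it only illustrates the usual distinction between dropping events, dropping objects, and dropping variables:

```python
# Toy sketch of DPD-style reductions (not real ATLAS code):
# skimming drops whole events, thinning drops objects within an event,
# and slimming drops variables from each object.
events = [
    {"n_jets": 3, "jets": [{"pt": 50.0, "eta": 0.1, "raw_hits": [1, 2]},
                           {"pt": 12.0, "eta": 2.5, "raw_hits": [3]},
                           {"pt": 30.0, "eta": 1.0, "raw_hits": []}]},
    {"n_jets": 0, "jets": []},
]

# Skimming: keep only events with at least one jet.
skimmed = [e for e in events if e["n_jets"] > 0]

# Thinning: within each kept event, keep only jets with pt > 20.
for e in skimmed:
    e["jets"] = [j for j in e["jets"] if j["pt"] > 20.0]

# Slimming: drop the bulky 'raw_hits' variable from each jet.
for e in skimmed:
    e["jets"] = [{"pt": j["pt"], "eta": j["eta"]} for j in e["jets"]]

print(len(skimmed), len(skimmed[0]["jets"]))  # 1 2
```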
Day 3
Session 1: ATHENA to D3PDs
Introduction to Athena (Karsten)
We don’t need to know all the underlying code, but enough to see how the data are structured in order to build our D3PDs
Algorithm: our processing is built around the event structure (initialize before, execute once per event in the loop, finalize after)
If we create a sub-sequence, it can have conditions for completion/failure, but the main sequence will still complete
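The initialize/execute/finalize life cycle can be mimicked in plain Python. This is only a structural sketch with invented names; real Athena algorithms inherit from framework base classes and return status codes:

```python
# Plain-Python sketch of the algorithm life cycle described above
# (not the real Athena API): initialize once, execute per event,
# finalize once.
class ToyAlgorithm:
    def initialize(self):
        self.n_seen = 0

    def execute(self, event):
        self.n_seen += 1
        # A failing step in a sub-sequence need not stop the main loop.
        return event.get("ok", True)

    def finalize(self):
        return self.n_seen

def run(alg, events):
    alg.initialize()
    for event in events:
        alg.execute(event)
    return alg.finalize()

print(run(ToyAlgorithm(), [{"ok": True}, {"ok": False}, {}]))  # 3
```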
The MyTopOptions.py referenced in these presentations comes from today's later tutorials
Protected twiki pages do not come up in Google; search the twiki separately!
Includes e-group lists to join for help if/when building D3PDs
Making D3PDs: Introduction (Louise)
This info is helpful not just for making D3PDs, but also for adding and understanding variables, event filtering, and quick analysis
Using Tools in Athena and ROOT (Karsten)
Yesterday we analyzed a D3PD in ROOT and selected jets on specialized ('cleaning') criteria; this was actually done with cumbersome selection cuts deciding whether each jet is 'clean'
Today: Make this selection cut in Athena (ensures consistency and is scalable)
All cuts get configured in one file, for both Athena and ROOT in making the D3PD
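The "configure once, apply everywhere" idea can be sketched as follows. All names here are invented for illustration; the real tutorial configures the cuts through Athena job options:

```python
# Sketch (invented names): the cleaning cut is defined in exactly one
# configuration dict, and any code (Athena-side or ROOT-side) applies
# the same selection by reading from it.
CLEANING_CUTS = {"min_pt": 20.0, "max_abs_eta": 2.5}

def is_clean_jet(jet, cuts=CLEANING_CUTS):
    return jet["pt"] > cuts["min_pt"] and abs(jet["eta"]) < cuts["max_abs_eta"]

jets = [{"pt": 35.0, "eta": 0.4}, {"pt": 15.0, "eta": 0.1}, {"pt": 40.0, "eta": 3.0}]
print([is_clean_jet(j) for j in jets])  # [True, False, False]
```

Keeping the numbers in one place is what makes the selection consistent between the D3PD-making step and later analysis, and scalable when cuts change.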
Use this tutorial to make a D3PD!
The tutorial and instructions also contain pointers to available D3PD-making scripts
We only need to recompile in the cmt directory when we add new python files or C++ code.
Session 2: Use Athena for Analysis (Mana)
Positives of working with D3PDs in Athena:
- Easy to work with several people on an analysis at one time
- Long-term solution for analysis
- Flexible and powerful, although it feels complex for beginners
- Makes reading D3PDs efficient and fast
The included tutorial is lengthy but informative.
Day 4
Session 1: The Grid
Introduction to the Grid
This session gives an overview of the Grid: widely distributed computing resources
The hierarchy of data placement is:
CERN > National Computing Centres > Regional > Individual sites
Includes data management, databases/bookkeeping, production (prodsys, panda), and distributed analysis (ganga, pathena)
Jobs go to the data – it’s harder to move data around on the grid, so computing jobs reference the data wherever it lives on the grid (and you can download small chunks of data)
Between 10 and 100 GB per user per day is informally acceptable; past that you will probably have issues
Rucio = emerging data management software (replacing DDM/DQ2)
PanDA Overview
This session gives an overview of PanDA, and how to use it to accomplish distributed analysis on the GRID. (The following tutorial offers additional info)
PanDA = ATLAS Production and Distributed Analysis system
If you can use Athena to run jobs, you can use PanDA with just a few added commands at the command line (pathena, prun)
You can monitor your PanDA jobs on http://panda.cern.ch
This session also includes useful contact info for getting help with using PanDA, the Grid, and distributed analysis! (last slide)
ATLAS Task Monitoring
This section explains the web site that we can use to monitor our tasks submitted to the Grid (web UI). Has good, detailed slides enumerating what each part of the web page means and how to use it.
Ganga: Helpful Grid Tool
In addition to using pathena/prun to access PanDA and other distributed computing venues, we can also use Ganga to submit jobs to the Grid. This section gives an overview of Ganga and how to use it.
The power of Ganga is ‘configure once, run anywhere’ – jobs look the same whether run locally or over the Grid
Can be accessed from the command line, from IPython, and/or through a GUI
Ganga jobs can be monitored in several ways, including a GUI and a web site; these slides outline those methods.
This session also includes useful mailing lists, contacts and links.
Hands-on: Practice Running Jobs on the Grid
Software Tutorial Using the Grid
Be sure to make a separate directory for the job files to submit! The prun command sends (almost) all files in the current directory and its subdirectories to the Grid
Always be careful to test-run the job locally before submitting it to the Grid
Session 2: DAOD Making
Software Tutorial of DAOD Making
Day 5
Session 1: Athena-based Analysis - Calculating a Cross-Section
This session covers how to get Good Run Lists from the official web site, and continues through the process of gathering information about the run all the way to calculating a cross section.
Luminosity Calculation Instructions
Note that online luminosity = approximate integrated luminosity
Session 2: Event Displays
This session gives a good overview of using event displays (VP1 and ATLANTIS), which are useful to confirm that we’re analyzing what we think we’re analyzing, as well as to make great pictures for publications!
ATLANTIS Tutorial