diff --git a/episodes/02-cpp-hello-world.md b/episodes/02-cpp-hello-world.md new file mode 100644 index 0000000..e31428c --- /dev/null +++ b/episodes/02-cpp-hello-world.md @@ -0,0 +1,178 @@ +--- +title: "Lightning overview of C++" +teaching: 5 +exercises: 10 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How do I write and execute C++ code? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- To write a *hello-world* C++ program + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Setting up your working area + +If you completed the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/) +you should already have worked through +[this episode](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/03-docker-for-cms-opendata/index.html), under **Download the docker images for ROOT and python tools and start container**, and you will have + +- a working directory `cms_open_data_root` on your local computer +- a docker container with name `my_root` created with the working directory `cms_open_data_root` mounted into the `/code` directory of the container. + +Start your ROOT container with + +```bash +docker start -i my_root +``` + +In the container, you will be in the `/code` directory and it shares the files with your local `cms_open_data_root` directory. + +:::::::::::: callout + +## If you're using apptainer: + +Whenever you see a `docker start` instruction, replace it with `apptainer shell` to open either the ROOT or Python container image. +The specific commands in this pre-exercise and during the live workshop will be given for docker, since that is the most common application. +As a general rule, editing of files will be done in the standard terminal (the containers do not have all text editors!) or via the jupyter-lab interface, and then commands will be executed inside the container shell. If you see `Singularity>` on your command line, you are ready to run a ROOT or python script. + +::::::::::::: + +## Your first C/C++ program (Optional Review!) + +Let's start with writing a simple `hello world` program in C. First we'll edit the +*source* code with an editor of your choice. + +Note that you will +*edit* the file in a local terminal on your computer and then *run* the file +in the Docker environment. This is because we mounted the `cms_open_data_root` directory +on your local disk such that it is visible inside the Docker container. + +Let's create a new file called `hello_world.cc` in the `cms_open_data_root` directory, using your preferred editor. + +The first thing we need to do, is `include` some standard libraries. These libraries +allow us to access the C and C++ commands to print to the screen (`stdout` and `stderr`) as +well as other basic function. + +At the very beginning of your file, add these three lines + +```cpp +#include +#include +#include +``` + +The first library, `cstdlib`, you will see in almost every C++ program as it has many of the very +basic functions, including those to allocate and free up memory, or even just exit the program. + +The second library, `cstdio`, contains the basic C functions to print to screen, like `printf`. + +The third library, `iostream`, contains C++ functions to print to screen or write to files. + +Usually people will use one or the other of the C or C++ printing functions, but for pedagogical purposes, +we show you both. + +Every C++ program must have a `main` function. So let's define it here. The scope of this function +is defined by curly brackets `{ }`. So let's add + +```cpp +int main() { + + + return 0; +} +``` + +The `int` at the beginning tells us that this function will be returning an integer value. At the end of +the `main` function we have `return 0`, which usually means the function has run successfully to completion. + +:::::::::::::::::: callout +## Warning! + +Note that at the end of `return 0`, we have a semicolon `;`, which is how C/C++ programs terminate lines. +If you're used to programming in python or any other language that does not use a similar terminator, this +can be tough to remember. If you get errors when you compile, check the error statements for the lack +of `;` in any of your lines! +:::::::::::::::::: + +For this function, we are not passing in any arguments so we just have the empty `( )` after the `main`. + +This function would compile, but it doesn't do anything. Let's print some text to screen. Before +the `return 0;` line, let's add these three lines. + +```cpp + printf("Hello world! This uses the ANSI C 'printf' statement\n"); + + std::cout << "Hello world! This uses the C++ 'iostream' library to direct output to standard out." << std::endl; + + std::cerr << "Hello world! This uses the C++ 'iostream' library to direct output to standard error." << std::endl; +``` + +The text itself, should explain what they are doing. If you want to learn more about standard error and standard +output, you can read more on [Wikipedia](https://en.wikipedia.org/wiki/Standard_streams). + +OK! Your full `hello_world.cc` should look like this. + +::::::::::::::::::::::: spoiler +## Full source code file for `hello_world.cc` + +```cpp +#include +#include +#include + + +int main() { + + printf("Hello world! This uses the ANSI C 'printf' statement\n"); + + std::cout << "Hello world! This uses the C++ 'iostream' library to direct output to standard out." << std::endl; + + std::cerr << "Hello world! This uses the C++ 'iostream' library to direct output to standard error." << std::endl; + + return 0; + +} +``` +::::::::::::::::::::::::::: + +This won't do anything yet though! We need to *compile* the code, which means turning this into +[*machine code*](https://en.wikipedia.org/wiki/Machine_code). To do this, we'll use the GNU C++ compiler, `g++`. +Once you have saved your file, go to the container shell, make sure (e.g. with `ls -l`) that you your file is in the current directory and type this in your shell. + +```bash +g++ hello_world.cc -o hello_world + +``` + +This compiles your code to an executable called `hello_world`. You can now run this by typing the following on +the shell command line, after which you'll see the subsequent output. + +```bash +./hello_world +``` + +```output +Hello world! This uses the ANSI C 'printf' statement +Hello world! This uses the C++ 'iostream' library to direct output to standard out. +Hello world! This uses the C++ 'iostream' library to direct output to standard error. +``` + +When you are working with the Open Data, you will be looping over events +and may find yourself making selections based on certain physics criteria. +To that end, you may want to familiarize yourself with the C++ syntax for +[loops](https://www.w3schools.com/cpp/cpp_for_loop.asp) +and +[conditionals](https://www.w3schools.com/cpp/cpp_conditions.asp). + +::::::::::::::::::::::::::::::::::::: keypoints + +- We must compile our C++ code before we can execute it. + +:::::::::::::::::::::::::::::::::::::::::::::::: + diff --git a/episodes/03-root-and-cpp-read-and-write.md b/episodes/03-root-and-cpp-read-and-write.md new file mode 100644 index 0000000..ac8fec5 --- /dev/null +++ b/episodes/03-root-and-cpp-read-and-write.md @@ -0,0 +1,697 @@ +--- +title: "Using ROOT with C++ to write and read a file" +teaching: 15 +exercises: 20 +--- + +:::::::::::::::::::::::::::::: questions + +- Why do I need to use ROOT? +- How do I use ROOT with C++? + +:::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::: objectives + +- Write a ROOT file using compiled C++ code. +- Read a ROOT file using compiled C++ code. + +:::::::::::::::::::::::::::::: + + +## Why ROOT? + +HEP data can be challenging! Not just to analyze but to store! The data don't lend themselves to +neat rows in a spreadsheet. One event might have 3 muons and the next event might have none. +One event might have 2 jets and the next event might have 20. What to do??? + +The ROOT toolkit provides a file format that can allow for efficient storage of this type of +data with heterogenous entries in each event. It *also* provides a pretty complete analysis +environment with specialized libraries and visualization packages. Until recently, you had +to install the entire ROOT package just to read a file. The software provided by CMS to +read the open data relies on some minimal knowledge of ROOT to access. From there, you +can write out more ROOT files for further analysis or dump the data (or some subset of the data) +to a format of your choosing. + +## Interfacing with ROOT + +ROOT is a toolkit. That is, it is a set of functions and libraries that can be utilized in a variety +of languages and workflows. It was originally written in C++ and lends itself nicely to being used in standard, +compiled C++ code. + +However, analysts wanted something more interactive, and so the ROOT team developed +[CINT, a C++ interpreter](https://root.cern.ch/root/html534/guides/users-guide/CINT.html). This gave users +an iteractive environment where they could type of C++ code one line at a time and have it executed +immediately. This gave rise to C++ scripts that many analysts use and in fact the sample +[ROOT tutorals](https://root.cern/doc/master/group__Tutorials.html) +are almost exclusively written as these C++ scripts (with a `.C` file extension). Because they are written +to run in CINT, they usually do not need the standard C++ `include` statements that you will see +in the examples below. + +With the rise of the popularity of python, a set of Python-C++ bindings were written and eventually +officially supported by the ROOT development team, called [PyROOT](https://root.cern/manual/python/). +Many analysts currently write the code which plots or fits their code using PyROOT, and we will +show you some examples later in this exercise. + +## What won't you learn here + +ROOT is an incredibly powerful toolkit and has *a lot* in it. It is heavily used by most +nuclear and particle physics experiments running today. As such, a full overview is beyond the +scope of this minimal tutorial! + +This tutorial will *not* teach you how to: + +* Make any plots more sophisticated than a basic histogram. +* Fit your data +* Use any of the HEP-specific libraries (e.g. `TLorentzVector`) + +## OK, where *can* I learn that stuff? + +There are some great resources and tutorials out there for going further. + +* [The official ROOT Primer](https://root.cern/primer/). The recommended starting point to learn what ROOT can do. +* [The official ROOT tutorials](https://root.cern/doc/master/group__Tutorials.html) This is a fairly comprehensive listing +of well-commented examples, written in C++ *scripts* that are designed to be run from within the ROOT C-interpreter. +* [ROOT tutorial (2022).](https://github.com/root-project/training/tree/master/SummerStudentCourse/2022). Tutorial from offical ROOT project for summer students. +* [Efficient analysis with ROOT](https://cms-opendata-workshop.github.io/workshop-lesson-root/). This is a more complete, +end-to-end tutorial on using ROOT in a CMS analysis workflow. It was created in 2020 by some of our CMS colleagues +for a separate workshop, but much of the material is relevant for the Open Data effort. It takes about 2.5 hours +to complete the tutorial. +* [ROOT tutorial from Nevis Lab (Columbia Univ.)](https://www.nevis.columbia.edu/~seligman/root-class/). Very complete and always up-to-date tutorial from our friends at Columbia. + +::::::::::::::::::: callout +## Be in the container! +For this episode, you'll still be running your code from the `my_root` docker container +that you launched in the previous episode. + +As you edit the files though, you may want to do the editing from your *local* environment, +so that you have access to your preferred editors. +::::::::::::::::::::: + +## ROOT terminology + +To store these datasets, ROOT uses an object called `TTree` (ROOT objects are often prefixed by a `T`). + +Each variable on the `TTree`, for example the transverse momentum of a muon, is stored in its own +`TBranch`. + +:::::::::::::::::::::: callout + +## Source code for C++ lessons + +There's a fair amount of C++ in this lesson and you'll get the most out of it if you work through it all and type it all out. + +However, it's easy to make a mistake, particularly with the `Makefile`. We've made the the source code available in this +[tarball](https://github.com/cms-opendata-workshop/workshop2022-lesson-cpp-root-python/blob/gh-pages/data/root_and_cpp_tutorial_source_code.tgz). Just follow the link and click on the **Download** button or download it directly with `wget` to you working directory `cms_open_data_root` in your local bash terminal with: + +```bash +wget https://raw.githubusercontent.com/cms-opendata-workshop/workshop2022-lesson-cpp-root-python/gh-pages/data/root_and_cpp_tutorial_source_code.tgz +``` + +Untar the file with: +```bash +tar -zxvf root_and_cpp_tutorial_source_code.tgz +``` + +It will create a directory called `root_and_cpp_tutorial_source_code` with the files in it. +:::::::::::::::::::::::: + +## Write to a file + +Let's create a file with name `write_ROOT_file.cc` using our preferred editor. We'll call this file `write_ROOT_file.cc` and it will be saved in the `cms_open_data_root` directory. + +As in our last example, we first `include` some header files, both the standard C++ ones +and some new ROOT-specific ones. + +```cpp +#include +#include +#include + +#include "TROOT.h" +#include "TTree.h" +#include "TFile.h" +#include "TRandom.h" +``` + +Note the inclusion of `TRandom.h`, which we'll be using to generate some random data for our +test file. + +Next, we'll create our `main` function and start it off by defining our ROOT +file object. We'll also include some explanatory comments, which in the C++ syntax +are preceded by two slashes, `//`. + +```cpp +int main() { + + // Create a ROOT file, f. + // The first argument, "tree.root" is the name of the file. + // The second argument, "recreate", will recreate the file, even if it already exists. + TFile f("tree.root","recreate"); + + return 0; +} +``` + +Now we define the `TTree` object which will hold all of our variables and the data they represent. + +This line comes after the `TFile` creation, but before the `return 0` statement at the end of the +main function. Subsequent edits will also follow the previous edit but come before `return 0` statement. + +```cpp + // A TTree object called t1. + // The first argument is the name of the object as stored by ROOT. + // The second argument is a short descriptor. + TTree t1("t1","A simple Tree with simple variables"); +``` + +For this example, +we'll assume we're recording the missing transverse energy, which means +there is only one value recorded for each event. + +We'll also record the energy and momentum (transverse momentum, eta, phi) +for jets, where there could be between 0 and 5 jets in each event. + +This means we will define some C++ variables that will be used in the program. +We do this *before* we define the `TBranch`es in the `TTree`. + +When we define the variables, we use ROOT's `Float_t` and `Int_t` types, which +are analogous to `float` and `int` but are less dependent on the underlying +computer OS and architecture. + +```cpp + Float_t met; // Missing energy in the transverse direction. + + Int_t njets; // Necessary to keep track of the number of jets + + // We'll define these assuming we will not write information for + // more than 16 jets. We'll have to check for this in the code otherwise + // it could crash! + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; +``` + +We now define the `TBranch` for the `met` variable. + +```cpp + // The first argument is ROOT's internal name of the variable. + // The second argument is the *address* of the actual variable we defined above + // The third argument defines the *type* of the variable to be stored, and the "F" + // at the end signifies that this is a float + t1.Branch("met",&met,"met/F"); +``` + +Next we define the `TBranch`es for each of the other variables, but the syntax is slightly different +as these are acting as arrays with a varying number of entries for each event. + +```cpp + // First we define njets where the syntax is the same as before, + // except we take care to identify this as an integer with the final + // /I designation + t1.Branch("njets",&njets,"njets/I"); + + // We can now define the other variables, but we use a slightly different + // syntax for the third argument to identify the variable that will be used + // to count the number of entries per event + t1.Branch("pt",&pt,"pt[njets]/F"); + t1.Branch("eta",&eta,"eta[njets]/F"); + t1.Branch("phi",&phi,"phi[njets]/F"); +``` + +OK, we've defined where everything will be stored! Let's now generate 1000 mock events. + +```cpp + + Int_t nevents = 1000; + + for (Int_t i=0;iRndm() + 10; + + // Generate random number between 0-5, inclusive + njets = gRandom->Integer(6); + + for (Int_t j=0;jRndm(); + eta[j] = 6*gRandom->Rndm(); + phi[j] = 6.28*gRandom->Rndm() - 3.14; + } + + // After each event we need to *fill* the TTree + t1.Fill(); + } + + // After we've run over all the events, we "change directory" (cd) to the file object + // and write the tree to it. + // We can also print the tree, just as a visual identifier to ourselves that + // the program ran to completion. + f.cd(); + t1.Write(); + t1.Print(); +``` + +The full `write_ROOT_file.cc` should now look like this: + +::::::::::::::::::::::: spoiler + +## Full source code file for `write_ROOT_file.cc` + +```cpp +#include +#include +#include + +#include "TROOT.h" +#include "TTree.h" +#include "TFile.h" +#include "TRandom.h" + +int main() { + + // Create a ROOT file, f. + // The first argument, "tree.root" is the name of the file. + // The second argument, "recreate", will recreate the file, even if it already exists. + TFile f("tree.root", "recreate"); + + // A TTree object called t1. + // The first argument is the name of the object as stored by ROOT. + // The second argument is a short descriptor. + TTree t1("t1", "A simple Tree with simple variables"); + + Float_t met; // Missing energy in the transverse direction. + + Int_t njets; // Necessary to keep track of the number of jets + + // We'll define these assuming we will not write information for + // more than 16 jets. We'll have to check for this in the code otherwise + // it could crash! + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; + + // The first argument is ROOT's internal name of the variable. + // The second argument is the *address* of the actual variable we defined above + // The third argument defines the *type* of the variable to be stored, and the "F" + // at the end signifies that this is a float + t1.Branch("met",&met,"met/F"); + + // First we define njets where the syntax is the same as before, + // except we take care to identify this as an integer with the final + // /I designation + t1.Branch("njets", &njets, "njets/I"); + + // We can now define the other variables, but we use a slightly different + // syntax for the third argument to identify the variable that will be used + // to count the number of entries per event + t1.Branch("pt", &pt, "pt[njets]/F"); + t1.Branch("eta", &eta, "eta[njets]/F"); + t1.Branch("phi", &phi, "phi[njets]/F"); + + Int_t nevents = 1000; + + for (Int_t i=0;iRndm() + 10; + + // Generate random number between 0-5, inclusive + njets = gRandom->Integer(6); + + for (Int_t j=0;jRndm(); + eta[j] = 6*gRandom->Rndm(); + phi[j] = 6.28*gRandom->Rndm() - 3.14; + } + + // After each event we need to *fill* the TTree + t1.Fill(); + } + + // After we've run over all the events, we "change directory" to the file object + // and write the tree to it. + // We can also print the tree, just as a visual identifier to ourselves that + // the program ran to completion. + f.cd(); + t1.Write(); + t1.Print(); + + return 0; +} +``` +::::::::::::::::::::::::::::::::: + + +Because we need to compile this in such a way that it links to the ROOT libraries, we will use a `Makefile` +to simplify the build process. + +Create a new file called `Makefile` in the same directory as `write_ROOT_file.cc` and add the following to the +file. You'll most likely do this with the editor of your choice. + +```makefile +CC=g++ + +CFLAGS=-c -g -Wall `root-config --cflags` + +LDFLAGS=`root-config --glibs` + +all: write_ROOT_file + +write_ROOT_file: write_ROOT_file.cc + $(CC) $(CFLAGS) -o write_ROOT_file.o write_ROOT_file.cc + $(CC) -o write_ROOT_file write_ROOT_file.o $(LDFLAGS) +``` + +:::::::::::::::::::::::: callout +## Warning! Tabs are important in Makefiles! + +Makefiles have been around a long time and are used for many projects, not just +C/C++ code. While other build tools are slowly supplanting them (e.g. CMake), +Makefiles are a pretty tried and true standard and it is worth taking time +at some point and learning [more about them](https://www.tutorialspoint.com/makefile/makefile_rules.htm). + +One frustrating thing though can be a Makefile's reliance on *tabs* for specific purposes. +In the example above, the following lines are preceeded by a *tab* and ***not*** four (4) spaces. + +```makefile + $(CC) $(CFLAGS) -o write_ROOT_file.o write_ROOT_file.cc + $(CC) -o write_ROOT_file write_ROOT_file.o $(LDFLAGS) +``` + +If your Makefile has spaces at those points instead of a tab, `make` will not work for you +and you will get an error. +:::::::::::::::::::::: + + +You can now compile and run your compiled program from the command line of your `my_root` container shell! + +```bash +make write_ROOT_file +./write_ROOT_file +``` + +:::::::::::::::::::::::::::::: spoiler +## Output from `write_ROOT_file` + +```output +****************************************************************************** +*Tree :t1 : A simple Tree with simple variables * +*Entries : 1000 : Total = 51536 bytes File Size = 36858 * +* : : Tree compression factor = 1.35 * +****************************************************************************** +*Br 0 :met : met/F * +*Entries : 1000 : Total Size= 4542 bytes File Size = 3641 * +*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.12 * +*............................................................................* +*Br 1 :njets : njets/I * +*Entries : 1000 : Total Size= 4552 bytes File Size = 841 * +*Baskets : 1 : Basket Size= 32000 bytes Compression= 4.84 * +*............................................................................* +*Br 2 :pt : pt[njets]/F * +*Entries : 1000 : Total Size= 14084 bytes File Size = 10445 * +*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.29 * +*............................................................................* +*Br 3 :eta : eta[njets]/F * +*Entries : 1000 : Total Size= 14089 bytes File Size = 10424 * +*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.30 * +*............................................................................* +*Br 4 :phi : phi[njets]/F * +*Entries : 1000 : Total Size= 14089 bytes File Size = 10758 * +*Baskets : 1 : Basket Size= 32000 bytes Compression= 1.26 * +*............................................................................* +``` + +::::::::::::::::::::::::::: + +Your numbers may be slightly different because of the random numbers that +are generated. + +After you've run this, you can look in this directory (`ls -ltr`) and see +if you have a new file called `tree.root`. This is the output +of what you just ran. + +Huzzah! You've successfully written your first ROOT file! + +:::::::::::::::::::: callout +## Will I have to `make` my Open Data analysis code? + +Maybe! + +- If you prefer to write your end-level analysis code in C++, your `make` setup will be very similar to this exercise +- If you are analyzing AOD or MiniAOD data and using the dedicated CMS software, a configuration and build system called [SCRAM](https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideScram) is used for this purpose of compiling and linking code. +- If you only analyze NanoAOD samples and do so in python (see the upcoming lesson pages!), then you will not use `make` +:::::::::::::::::::::: + + +## Read a ROOT file + +Let's try to read the `tree.root` file now. We won't do much with but we'll try to understand the process necessary +to read in all the data and loop over this event-by-event. + +You will now edit (with your favourite editor) a file called `read_ROOT_file.cc`. In this file, + you'll add the following code. + +We'll start with the basic include statements and the main program. + +```cpp +#include +#include +#include + +#include "TROOT.h" +#include "TTree.h" +#include "TFile.h" +#include "TRandom.h" + +int main() { + + return 0; +} +``` + +In the `main` function, we'll define the input file. + +```cpp + // Here's the input file + // Without the 'recreate' argument, ROOT will assume this file exists to be read in. + TFile f("tree.root"); +``` + +We'll make use of the built-in member functions to `TFile` to pull out the `TTree` named +`t1`. There's a few other things to note. + +First, we're going to assign it to a local variable named `input_tree`. This is to emphasize +that `t1` is just a string that refers to the name of the object stored in the `TFile` and +that we can assign it to any variable name, not just one named `t1`. + +The second thing to note is that we are going to create a *pointer* to `input_tree`, which makes +some of the memory management easier. This means that we precede our variable name with +an asterix `*`, we have to cast the object pulled out of the `TFile` as a `TTree` pointer (`TTree*`), +and subsequent uses of `input_tree` will access data members and member functions with +the `->` operator rather than a period `.`. + +If you want to learn more about pointers, there are [many](https://www.cplusplus.com/doc/tutorial/pointers/), +[many](https://www.w3schools.com/cpp/cpp_pointers.asp), +[resources](https://www.tutorialspoint.com/cplusplus/cpp_pointers.htm) out there. + +```cpp + // We will now "Get" the tree from the file and assign it to + // a new local variable. + TTree *input_tree = (TTree*)f.Get("t1"); +``` + +Just as we did in the `write_ROOT_file.cc` example, we will define some local variables. +These variables will actually get "filled" by the ROOT file when we loop over the events. + +```cpp + Float_t met; + Int_t njets; + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; +``` + +We'll now assign these local variables to specific `TBranch`es in `input_tree`. Note +that we'll be using the *address* of each local variable when we precede +the variable name with an ampersand `&`. + +```cpp + // Assign these variables to specific branch addresses + input_tree->SetBranchAddress("met", &met); + input_tree->SetBranchAddress("njets", &njets); + input_tree->SetBranchAddress("pt", &pt); + input_tree->SetBranchAddress("eta", &eta); + input_tree->SetBranchAddress("phi", &phi); + +``` + +We're ready now to loop over events! Each time we call `input_tree->GetEntry(i)`, +it pulls the `i`th values out of `input_tree` and "fills" the local variables +with those values. + +```cpp + for (Int_t i=0;iGetEntry(i); + + // Print the number of jets in this event + printf("%d\n", njets); + + // Print out the momentum for each jet in this event + for (Int_t j=0; j +#include +#include + +#include "TROOT.h" +#include "TTree.h" +#include "TFile.h" +#include "TRandom.h" + +int main() { + + // Here's the input file + // Without the 'recreate' argument, ROOT will assume this file exists to be read in. + TFile f("tree.root"); + + // We will now "Get" the tree from the file and assign it to + // a new local variable. + TTree *input_tree = (TTree*)f.Get("t1"); + + Float_t met; // Missing energy in the transverse direction. + + Int_t njets; // Necessary to keep track of the number of jets + + // We'll define these assuming we will not write information for + // more than 16 jets. We'll have to check for this in the code otherwise + // it could crash! + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; + + // Assign these variables to specific branch addresses + input_tree->SetBranchAddress("met", &met); + input_tree->SetBranchAddress("njets", &njets); + input_tree->SetBranchAddress("pt", &pt); + input_tree->SetBranchAddress("eta", &eta); + input_tree->SetBranchAddress("phi", &phi); + + // Get the number of events in the file + Int_t nevents = input_tree->GetEntries(); + + for (Int_t i=0;iGetEntry(i); + + // Print the number of jets in this event + printf("%d\n",njets); + + // Print out the momentum for each jet in this event + for (Int_t j=0; j +#include +#include + +#include "TROOT.h" +#include "TTree.h" +#include "TFile.h" +#include "TRandom.h" +#include "TH1F.h" + +int main() { + + // Here's the input file + // Without the 'recreate' argument, ROOT will assume this file exists to be read in. + TFile f("tree.root"); + + // Let's make an output file which we'll use to save our + // histogram + TFile fout("output.root","recreate"); + + // We define an histogram for the transverse momentum of the jets + // The arguments are as follow + // * Internal name of the histogram + // * Title that will be used if the histogram is plotted + // * Number of bins + // * Low edge of the lowest bin + // * High edge of the highest bin + TH1F h1("h1","jet pT (GeV/c)",50,0,150); + + // We will now "Get" the tree from the file and assign it to + // a new local variable. + TTree *input_tree = (TTree*)f.Get("t1"); + + Float_t met; // Missing energy in the transverse direction. + + Int_t njets; // Necessary to keep track of the number of jets + // We'll define these assuming we will not write information for + // more than 16 jets. We'll have to check for this in the code otherwise + // it could crash! + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; + + // Assign these variables to specific branch addresses + input_tree->SetBranchAddress("met",&met); + input_tree->SetBranchAddress("njets",&njets); + input_tree->SetBranchAddress("pt",&pt); + input_tree->SetBranchAddress("eta",&eta); + input_tree->SetBranchAddress("phi",&phi); + + // Get the number of events in the file + Int_t nevents = input_tree->GetEntries(); + + for (Int_t i=0;iGetEntry(i); + + // Print the number of jets in this event + printf("%d\n",njets); + + // Print out the momentum for each jet in this event + for (Int_t j=0;jPrint() +``` + +Please copy the output this statement generates and paste it into the corresponding section in our [assignment form](https://docs.google.com/forms/d/e/1FAIpQLSdxsc-aIWqUyFA0qTsnbfQrA6wROtAxC5Id4sxH08STTl8e5w/viewform); remember you must sign in and click on the submit button in order to save your work. You can go back to edit the form at any time. +Then, quit ROOT. + +:::::::::::::::::::::::::::::::: + +## Using a ROOT script + +We could also loop over all the events, create and save the histogram, but also +draw the histogram onto a `TCanvas` object and have it pop up, all from a ROOT +script and the CINT. + +First, let's copy over our C++ source code into a C++ script. + +```bash +cp fill_histogram.cc fill_histogram_SCRIPT.C +``` + +Next we'll remove the headers at the beginning and even get rid of the `int main` designation, +though we keep the curly brackets. + +We'll also define a `TCanvas` object on which we'll plot our histogram. After we do that, +we "change directory" to the canvas and draw our histogram. We can even save it to a +`.png` file. + +```cpp + // Declare a TCanvas with the following arguments + // * Internal name of the TCanvas object + // * Title to be displayed when it is drawn + // * Width of the canvas + // * Height of the canvas + TCanvas *c1 = new TCanvas("c1", "Canvas on which to display our histogram", 800, 400); + + c1->cd(0); + h1.Draw(); + c1->SaveAs("h_pt.png"); +``` + +Your `fill_histogram_SCRIPT.C` should look like this: + +:::::::::::::::::::::::::: spoiler + +## Source code for `fill_histogram_SCRIPT.C` + +```cpp +{ + + // Here's the input file + // Without the 'recreate' argument, ROOT will assume this file exists to be read in. + TFile f("tree.root"); + + // Let's make an output file which we'll use to save our + // histogram + TFile fout("output.root","recreate"); + + // We define an histogram for the transverse momentum of the jets + // The arguments are as follow + // * Internal name of the histogram + // * Title that will be used if the histogram is plotted + // * Number of bins + // * Low edge of the lowest bin + // * High edge of the highest bin + TH1F h1("h1","jet pT (GeV/c)",50,0,150); + + // We will now "Get" the tree from the file and assign it to + // a new local variable. + TTree *input_tree = (TTree*)f.Get("t1"); + + Float_t met; // Missing energy in the transverse direction. + + Int_t njets; // Necessary to keep track of the number of jets + + // We'll define these assuming we will not write information for + // more than 16 jets. We'll have to check for this in the code otherwise + // it could crash! + Float_t pt[16]; + Float_t eta[16]; + Float_t phi[16]; + + // Assign these variables to specific branch addresses + input_tree->SetBranchAddress("met",&met); + input_tree->SetBranchAddress("njets",&njets); + input_tree->SetBranchAddress("pt",&pt); + input_tree->SetBranchAddress("eta",&eta); + input_tree->SetBranchAddress("phi",&phi); + + // Get the number of events in the file + Int_t nevents = input_tree->GetEntries(); + + for (Int_t i=0;iGetEntry(i); + + // Print the number of jets in this event + printf("%d\n",njets); + + // Print out the momentum for each jet in this event + for (Int_t j=0;jcd(0); + h1.Draw(); + c1->SaveAs("h_pt.png"); + + fout.cd(); + h1.Write(); + fout.Close(); + +} +``` +::::::::::::::::::::::::::::: + +To run this, you need only type the following on the command line. + +```bash +root -l fill_histogram_SCRIPT.C +``` + +You'll be popped into the CINT environment and you should see the following plot pop up! + +::::::::::::: callout + +## TBrowser + +![](fig/h_pt.png) + +:::::::::::: + +Exit from the container. If you are using a container with VNC, first stop VNC with `stop_vnc`. + +::::::::::::::::: keypoints + +- You can quickly inspect your data using just ROOT +- A simple ROOT script is often all you need for diagnostic work + +::::::::::::::::: + diff --git a/episodes/05-using-root-with-python.md b/episodes/05-using-root-with-python.md new file mode 100644 index 0000000..ab36eab --- /dev/null +++ b/episodes/05-using-root-with-python.md @@ -0,0 +1,92 @@ +--- +title: "Using ROOT with python" +teaching: 5 +exercises: 0 +--- + +:::::::::::::: questions + +- Can I call ROOT from python? + +:::::::::::::: + +:::::::::::::: objectives + +- Find resources for PyROOT +- Find resources for Scikit-HEP + +:::::::::::::: + + +## PyROOT + +The PyROOT project started with Wim Lavrijsen in the late `00s and became very popular, +paralleling the rise of more general python tools within the community. Python has become the primary analysis language for the majority of HEP experimentalists. It has a +rich ecosystem that is constantly evolving. This is a good thing because it means that improvements +and new tools are always being developed, but it can sometimes be a challenge to keep up with the +latest and greatest projects! :) + +If you want to learn how to use PyROOT, you can go through some individual examples +[here](https://root.cern.ch/doc/master/group__tutorial__pyroot.html), or a more guided tutorial +[here](https://root.cern/manual/python/). + +Feel free to challenge yourself to rewrite the previous C++ code using PyROOT! + +## Scikit-HEP libraries + +Over the past several years, an effort has developed to provide more python tools that can interface with CMS ROOT file formats as well as typical scientific python tools used widely beyond particle physics. +We will use several of the [Scikit-HEP](https://scikit-hep.org/) libraries to analyze NanoAOD: `uproot`, `awkward`, and `vector`. As CMS datasets grow larger, we increasingly rely on tools for array-based +data processing in python, and the scikit-HEP tools are very important for that task. + +You can check out a tutorial for many of their tools [here](https://hsf-training.github.io/hsf-training-scikit-hep-webpage/). + +## Using the Python docker container + +The tools in the Python docker container will allow you to can easily open +and analyze ROOT files. This is useful for when you make use of the CMS open data tools to skim +some subset of the open data and then copy it to your local laptop, desktop, or perhaps an +HPC cluster at your home institution. + +If you completed the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/) +you should already have worked through +[this episode](https://cms-opendata-workshop.github.io/workshopwhepp-lesson-docker/03-docker-for-cms-opendata/index.html), under **Download the docker images for ROOT and python tools and start container**, and you will have + +- a working directory `cms_open_data_python` on your local computer +- a docker container with name `my_python` created with the working directory `cms_open_data_python` mounted into the `/code` directory of the container. + +Start your python container with + +```bash +docker start -i my_python +``` + +In the container, you will be in the `/code` directory and it shares the files with your local `cms_open_data_python` directory. + +:::::::::::: callout + +## If you're using apptainer: + +Whenever you see a `docker start` instruction, replace it with `apptainer shell` to open either the ROOT or Python container image. +The specific commands in this pre-exercise and during the live workshop will be given for docker, since that is the most common application. +As a general rule, editing of files will be done in the standard terminal (the containers do not have all text editors!) or via the jupyter-lab interface, and then commands will be executed inside the container shell. If you see `Singularity>` on your command line, you are ready to run a ROOT or python script. + +::::::::::::: + +If you want to test out the installation, from within Docker you can launch and +interactive python session by typing `python` (in Docker) and then trying + +```python +import uproot +import awkward as ak +``` + +If you don't get any errors then congratulations! You have a working environment and you are ready to +perform some HEP analysis with your new python environment! + +:::::::::::::: keypoints + +- PyROOT is a complete interface to the ROOT libraries +- Scikit-HEP provides tools to interface between ROOT and global scientific python tools +- We will use `uproot`, `awkward`, and `vector` in our NanoAOD analysis + +:::::::::::::: diff --git a/episodes/06-uproot.md b/episodes/06-uproot.md new file mode 100644 index 0000000..f0ca2d5 --- /dev/null +++ b/episodes/06-uproot.md @@ -0,0 +1,240 @@ +--- +title: "Using uproot to open ROOT files" +teaching: 10 +exercises: 10 +--- + +::::::::::: questions + +- How do I open a ROOT file with uproot? +- How do I explore what is in the file? + +::::::::::: + +::::::::::: objectives + +- Use a different library than ROOT to open ROOT files +- Get comfortable with a different way of examining ROOT files + +::::::::::: + + +## Other resources + +Before we go any further, we point out that this episode and the next are only the +*most basic* introductions to `uproot` and `awkward`. There is a plethora of material +that go much deeper and we list just a few here. + +* [Official uproot documentation](https://uproot.readthedocs.io/en/latest/basic.html) +* [HSF uproot and awkward tutorial](https://hsf-training.github.io/hsf-training-uproot-webpage/aio/index.html) +* [Uproot, awkward, and columnar analysis](https://github.com/jpivarski-talks/2020-06-08-uproot-awkward-columnar-hats) +from Jim Pivarski. + +## How to type these commands? + +Now that you've installed the necessary python modules in your `my_python` container you can choose to write and execute the code however +you like. There are a number of options, but we will point out two here. + +* [Jupyter notebook](https://jupyter.org/). This provides an editor and an evironment in which to run +your python code. Often you will run the code one *cell* at a time, but you could always put all your +code in one cell if you prefer. There are many, many tutorials out there on using Jupyter notebooks +and if you chose to use Jupyter as your editing/executing environment that you have developed some +familiarity with it. + + * In the `my_python` container, you can start `jupyter-lab` with + ```bash + jupyter-lab --ip=0.0.0.0 --no-browser + ``` + and open the link given in the message on your browser. Choose the icon under "Notebook". + +* Python scripts. In this approach, you edit the equivalent of a text file and then pass that text +file into a python interpreter. For example, if you edited a file called `hello_world.py` such that +it contained + +```python +print("Hello world!") +``` + +You could save the file and then (perhaps in another Terminal window), execute + +```bash +python hello_world.py +``` + +This would interpret your text file as python commands and produce the output + +```output +Hello world! +``` +We leave it to you to decide which approach you prefer. + + +## Open a file + +Let's open a ROOT file! +If you're writing a python script, let's call it `open_root_file.py` and if you're using +a Jupyter notebook, let's call it `open_root_file.ipynb`. If you are working in the container, you will open and *edit* the python script on your local computer and *run* it in the container, or you will open a notebook on your jupyter-lab window in the browser. + +First we will import the `uproot` library, as well as some other standard +libraries. These can be the first lines of your python script or the first cell of your Jupyter notebook. + +*If this is a script, you may want to run `python open_root_file.py` every few lines or so to see the output. +If this is a Jupyter notebook, you will want to put each snippet of code in its own cell and execute +them as you go to see the output.* + + +```python +import numpy as np +import matplotlib.pylab as plt + +import uproot +import awkward as ak +``` + +Let's open the file! We'll make use of `uproot`s use of [XRootD](https://xrootd.slac.stanford.edu/) to +read in the file *over the network*. This will save us from having to download the file. + +```python +infile_name = 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/SMHiggsToZZTo4L.root' + +infile = uproot.open(infile_name) +``` + +::::::::::::::::: callout +## Download the file? +If too many people are trying to open the same file, it may be easier to download the file +to your laptop. +You can execute the following command in the bash terminal. + +```python +curl http://opendata.cern.ch/record/12361/files/SMHiggsToZZTo4L.root --output SMHiggsToZZTo4L.root +``` + +Alternatively, you can follow [this link](https://opendata.cern.ch/record/12361) to the data record +on the CERN Open Data Portal. If you scroll down to the bottom of the page and click +the **Download** button. + +For the remainder of this tutorial you will want the file to be in the same directory/folder +as your python code, whether you are using a Jupyter notebook or a simple python script. So make +sure you move this file to that location after you have downloaded it. + +To read in the file, you'll change one line to define the input file to be +```python +infile_name = 'SMHiggsToZZTo4L.root' +``` +:::::::::::::::::: + +## Investigate the file + +So you've opened the file with `uproot`. What is this `infile` object? Let's add the following code + +```python +print(type(infile)) +``` + +and we get + +```output + +``` + +We can interface with this object similar to how we would interface +with a [python dictionary](https://www.w3schools.com/python/python_dictionaries.asp). + + +```python +keys = infile.keys() + +print(keys) +``` + +```output +['Events;1'] +``` + +But what is this? + +```python +events = infile['Events'] + +print(type(events)) +``` + +```output + +``` + +Ah, this is the `TTree` object that we learned a bit about in the previous episodes! Let's see what's in it! + + +```python +branches = infile['Events'].keys() + +for branch in branches: + print(f"{branch:20s} {infile['Events'][branch]}") +``` + +```output +run +luminosityBlock +event +PV_npvs +PV_x +PV_y +PV_z +nMuon +Muon_pt +Muon_eta +Muon_phi +Muon_mass +Muon_charge +Muon_pfRelIso03_all +Muon_pfRelIso04_all +Muon_dxy +Muon_dxyErr +Muon_dz +Muon_dzErr +nElectron +Electron_pt +Electron_eta +Electron_phi +Electron_mass +Electron_charge +Electron_pfRelIso03_all +Electron_dxy +Electron_dxyErr +Electron_dz +Electron_dzErr +MET_pt +MET_phi +``` + +There are multiple syntax you can access each of these branches. + + +```python +pt = infile['Events']['Muon_pt'] + +# or + +pt = infile['Events/Muon_pt'] + +# or + +pt = events.Muon_pt + +# or + +pt = events['Muon_pt'] +``` + +We'll use that last one for this lesson just to save some typing. :) + +In the next episode we'll use the `awkward` array object when we extract these data +and see how we can use `awkward` in a standard-but-slow way or in a clever-and-fast way! + +::::::::::: keypoints + +- You can use uproot to interface with ROOT files which is often easier than installing the full ROOT ecosystem. + +::::::::::: diff --git a/episodes/07-awkward.md b/episodes/07-awkward.md new file mode 100644 index 0000000..8645338 --- /dev/null +++ b/episodes/07-awkward.md @@ -0,0 +1,422 @@ +--- +title: "Using awkward arrays to analyze HEP data" +teaching: 10 +exercises: 10 +--- + +:::::::::::: questions + +- What are awkward arrays? +- How do I work with awkward arrays? + +:::::::::::: + +:::::::::::: objectives + +- Learn to use awkward arrays in a simple, naive way. +- Learn to use awkward arrays in a much faster way, making use of the built-in functionality of the library. + +:::::::::::: + +::::::::::::::::::: testimonial +## Why awkward? +A natural question to ask would be "*Why do I have to learn about awkward? Why can't I just +use numpy?*" And yes, those are two questions. :) The quick answers are: you don't have to +and you can! But let's dig deeper. + +Awkward-array is a newer python tool written by Jim Pivarski (Princeton) and others that +allows for very fast manipulation of ``*jagged*" arrays, like we find in HEP. You definitely +don't *have* to use it, but we present it here because it can speed things up considerably...if +you know how to write your code. + +Similarly, you can of course use standard `numpy` arrays for many parts of your analysis, but +your data may not always fit into `numpy` arrays without some careful attention to your code. +In either case, you'll want to think about how to work with your data once you get it out +of your file. +::::::::::::::::::::: + +## Environment + +Use the Python environment that you set up in the previous two lesson pages, such as your `my_python` docker container. +We leave it up to you whether or not you write and execute this code in a script or as a Jupyter notebook. + +## Numpy arrays: a review + +Before we dive into `awkward` lets review some of the awsome aspects of `numpy` +arrays...and also examine their limitations. + +Let's import `numpy` (if you have not already done that in this python session). + +```python +import numpy as np +``` + +Next, let's make a simple `numpy` array to use for our exmples. + +```python +x = np.array([1, 2, 3, 4]) +print(x) +``` + +```output +[1 2 3 4] +``` + +Numpy arrays are great because we can perform mathematical operations on them +and the operation is quickly and efficiently carried out on every member of the array. + +```python +y = 2*x +print(y) +print() # This just puts a blank line between our other print statements + +z = x**2 +print(z) +print() + +a = np.sqrt(x) +print(a) +``` + +```output +[2 4 6 8] + +[ 1 4 9 16] + +[1. 1.41421356 1.73205081 2. ] +``` + +Note that in that last operation where we take the square root, we made use of a +`numpy` function `sqrt`. That function ``knows" how to operate on the elements of +a numpy array, as opposed to the standard python `math` library which does *not* know +how to work with `numpy` arrays. + +So this seems great! However, `numpy` arrays break down when you have arrays +that are not 1D and cannot be expressed in a regular ``n x m" format. +For example, suppose you have two events and each event has two muons and you +want to store the transverse momentum ($p_T$) for these muons in a `numpy` array. + +```python +pt = np.array([[20.9, 12.3], [127.1, 60.2]]) +x = 2*pt + +print(pt) +print() +print(x) +``` + + +```output +[[ 20.9 12.3] + [127.1 60.2]] + + [[ 41.8 24.6] + [254.2 120.4]] +``` + +Great! Everything looks good and we get our expected behavior! + +Now, suppose there are *3* muons in the second event. Does this still work? + +```python +pt = np.array([[20.9, 12.3], [127.1, 60.2, 23.8]]) +x = 2*pt + +print(x) +``` + +```output +[list([20.9, 12.3, 20.9, 12.3]) + list([127.1, 60.2, 23.8, 127.1, 60.2, 23.8])] +``` + +Wait...what??? It looks like it just duplicated the entries so now it looks +like we're storing information for *4* muons in the first event and *6* muons +in the second event! A closer looks shows us that while `pt` is a `numpy` +array, the elements are not arrays but python `list` objects, which behave differently. + +The reason this happened is that `numpy` arrays can't deal with this type of +``jagged" behavior where the first row of your data might have 2 elements +and the second row might have 3 elements and the third row might have 0 elements +and so on. For that, we need `awkward-array`. + +## Access or download a ROOT file for use with this exercise + +We'll work with the same file as in the previous lesson. If you have jumped straight to +this lesson, please go back and review how to access the file over the network +or by downloading it. + +## Open the file + +::::::::::::::::::::::::: prereq +## *Stop!* + +If you haven't already, make sure you have run through the +[previous lesson](https://cms-opendata-workshop.github.io/workshop2024-lesson-cpp-root-python/06-uproot/index.html) on working with uproot. +:::::::::::::::::::::::: + +Let's open this ROOT file! If you're writing a python script, let's call it `open_root_file_and_analyze_data.py` and if you're using +a Jupyter notebook, let's call it `open_root_file_and_analyze_data.ipynb`. + +First we will import the `uproot` library, as well as some other standard +libraries. These can be the first lines of your python script or the first cell of your Jupyter notebook. + +*If this is a script, you may want to run `python open_root_file_and_analyze_data.py` every few lines or so to see the output. +If this is a Jupyter notebook, you will want to put each snippet of code in its own cell and execute +them as you go to see the output.* + +```python +import numpy as np +import matplotlib.pylab as plt +import time + +import uproot +import awkward as ak +``` + +Let's open the file and pull out some data. + +```python +# Depending on if you downloaded the file or not, you'll use either +infile_name = 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/SMHiggsToZZTo4L.root' +# or +#infile_name = 'SMHiggsToZZTo4L.root' +# Uncomment the above line if you downloaded the file. + +infile = uproot.open(infile_name) + +events = infile['Events'] + +pt = events['Muon_pt'] +eta = events['Muon_eta'] +phi = events['Muon_phi'] +``` + +Let's inspect these objects a little closer. To access the actual values, we'll see +we need to use the `.array()` member function. + +```python +print(pt) +print() + +print(pt.array()) +print() + +print(len(pt.array())) +print() + +for i in range(5): + print(pt.array()[i]) +``` + +```output + + +[[63, 38.1, 4.05], [], [], [54.3, 23.5, ... 43.1], [4.32, 4.36, 5.63, 4.75], [], []] + +299973 + +[63, 38.1, 4.05] +[] +[] +[54.3, 23.5, 52.9, 4.33, 5.35, 8.39, 3.49] +[] +``` + +Taking a closer look at the entries, we see different numbers of values in each ``row", where +the rows correspond to events recorded in the CMS detector. + +So can we use manipulate this object like a numpy array? Yes! If we're careful about accessing +the array properly. + +```python +x = 2*pt.array() + +print(x[0:5]) +``` + +```output +[[126, 76.2, 8.1], [], [], [109, 47, 106, 8.66, 10.7, 16.8, 6.98], []] +``` + +When we histogram, however, we need to make use of the `awkward.flatten` function. +This turns our `awkward` array into a 1-dimensional array, so that we lose all record of +what muon belonged to which event. + + +```python +print(ak.flatten(pt.array())) + +plt.figure() +plt.hist(ak.flatten(pt.array()),bins=100,range=(0,100)); +plt.xlabel(r'Muon $p_T$ (GeV/c)',fontsize=14) +``` + +```output +[63, 38.1, 4.05, 54.3, 23.5, 52.9, 4.33, ... 32.6, 43.1, 4.32, 4.36, 5.63, 4.75] +``` + +![](fig//awkward_muon_pt.png) + +We can also manipulate the data quite quickly. Let's see how! + +This sample file is Monte Carlo data that simulates the decay of Higgs bosons +to 4 charged leptons. Let's look for decays to 4 muons, where there are +two positively charged muons and 2 negatively charged muons. + +Since the data stores momentum information as $p_T, \eta, \phi$, +first we'll calculate the Cartesian $x,y,z$ components of momentum, and then +we'll *loop* over our events to calculate an invariant mass. We'll find that +looping over the entries is slow, but there is a faster way! + +First the slow but explicit way: + +:::::::::::::::::: spoiler +## Python code (event loop) + +```python +# Some helper functions + +def energy(m, px, py, pz): + E = np.sqrt( (m**2) + (px**2 + py**2 + pz**2)) + return E + +def invmass(E, px, py, pz): + m2 = (E**2) - (px**2 + py**2 + pz**2) + + if m2 < 0: + m = -np.sqrt(-m2) + else: + m = np.sqrt(m2) + return m + +def convert(pt, eta, phi): + px = pt * np.cos(phi) + py = pt * np.sin(phi) + pz = pt * np.sinh(eta) + + return px, py, pz + +# Convert momentum to x,y,z components + +muon_number = events['nMuon'].array() + +pt = events['Muon_pt'].array() +eta = events['Muon_eta'].array() +phi = events['Muon_phi'].array() +muon_q = events['Muon_charge'].array() +mass = events['Muon_mass'].array() + +muon_px,muon_py,muon_pz = convert(pt, eta, phi) +muon_e = energy(mass, muon_px, muon_py, muon_pz) + +# Do the calculation + +masses = [] + +nevents = len(pt) +print(f"Nevents: {nevents}") + +start = time.time() + +for n in range(nevents): + + if n%10000==0: + print(n) + + nmuons = muon_number[n] + + e = muon_e[n] + q = muon_q[n] + px = muon_px[n] + py = muon_py[n] + pz = muon_pz[n] + + + if nmuons < 4: + continue + + for i in range(0, nmuons-3): + for j in range(i+1, nmuons-2): + for k in range(j+1, nmuons-1): + for l in range(k+1, nmuons): + + if q[i] + q[j] + q[k] + q[l] == 0: + etot = e[i] + e[j] + e[k] + e[l] + pxtot = px[i] + px[j] + px[k] + px[l] + pytot = py[i] + py[j] + py[k] + py[l] + pztot = pz[i] + pz[j] + pz[k] + pz[l] + + m = invmass(etot, pxtot, pytot, pztot) + masses.append(m) + +print(f"Time to run: {(time.time() - start)} seconds") + +# Plot the results + +plt.figure() +plt.hist(masses,bins=140,range=(80,150)) +plt.xlabel(r'4-muon invariant mass (GeV/c$^2$',fontsize=18) +plt.show() +``` +::::::::::::::::::::::: + + +![](fig/awkward_Higgs_4l_loop.png) + +When I run this on my laptop, it takes a little over 3 minutes to run. Is there a better way? + +Yes! + +We've adapted some code from +[this tutorial](https://hsf-training.github.io/hsf-training-scikit-hep-webpage/04-awkward/index.html), +put together by the HEP Software Foundation to show you +how much faster using the built-in awkward functions can be: + +:::::::::::::::::::: spoiler +## Python code (awkward array) + +```python +start = time.time() + +muons = ak.zip({ + "px": muon_px, + "py": muon_py, + "pz": muon_pz, + "e": muon_e, + "q": muon_q, +}) + +quads = ak.combinations(muons, 4) + +mu1, mu2, mu3, mu4 = ak.unzip(quads) + +mass_fast = (mu1.e + mu2.e + mu3.e + mu4.e)**2 - ((mu1.px + mu2.px + mu3.px + mu4.px)**2 + (mu1.py + mu2.py + mu3.py + mu4.py)**2 + (mu1.pz + mu2.pz + mu3.pz + mu4.pz)**2) + +mass_fast = np.sqrt(mass_fast) + +qtot = mu1.q + mu2.q + mu3.q + mu4.q + +print(f"Time to run: {(time.time() - start)} seconds") + +plt.hist(ak.flatten(mass_fast[qtot==0]), bins=140,range=(80,150)); +plt.xlabel(r'4-muon invariant mass (GeV/c$^2$',fontsize=18) +plt.show() +``` +:::::::::::::::::::::::: + +![](fig/awkward_Higgs_4l_ak_zip.png) + +On my laptop, this takes less than 0.2 seconds! Note that we are making use of +[boolean arrays to perform masking](https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html) +when we type `mass_fast[qtot==0]`. + +While we cannot teach you *everything* about `awkward`, we hope we've given you a basic introduction +to what it can do and where you can find more information so that you can quickly process +the output of any of your open data jobs and get started on your own analysis! + +:::::::::::: keypoints + +- Awkward arrays can help speed up your code, but it requires a different way of thinking and more awareness of what functions are coded up in the awkward library. + +:::::::::::: diff --git a/episodes/fig/awkward_Higgs_4l_ak_zip.png b/episodes/fig/awkward_Higgs_4l_ak_zip.png new file mode 100755 index 0000000..08286a9 Binary files /dev/null and b/episodes/fig/awkward_Higgs_4l_ak_zip.png differ diff --git a/episodes/fig/awkward_Higgs_4l_loop.png b/episodes/fig/awkward_Higgs_4l_loop.png new file mode 100755 index 0000000..08286a9 Binary files /dev/null and b/episodes/fig/awkward_Higgs_4l_loop.png differ diff --git a/episodes/fig/awkward_muon_pt.png b/episodes/fig/awkward_muon_pt.png new file mode 100755 index 0000000..abc8308 Binary files /dev/null and b/episodes/fig/awkward_muon_pt.png differ diff --git a/episodes/fig/cpp_and_root_uproot_awkward_logo.png b/episodes/fig/cpp_and_root_uproot_awkward_logo.png new file mode 100755 index 0000000..c832a6d Binary files /dev/null and b/episodes/fig/cpp_and_root_uproot_awkward_logo.png differ diff --git a/episodes/fig/h_pt.png b/episodes/fig/h_pt.png new file mode 100755 index 0000000..7f3453e Binary files /dev/null and b/episodes/fig/h_pt.png differ diff --git a/episodes/fig/tbrowser_screenshot_00.png b/episodes/fig/tbrowser_screenshot_00.png new file mode 100755 index 0000000..adf40a8 Binary files /dev/null and b/episodes/fig/tbrowser_screenshot_00.png differ diff --git a/episodes/fig/tbrowser_screenshot_01.png b/episodes/fig/tbrowser_screenshot_01.png new file mode 100755 index 0000000..55bdc9d Binary files /dev/null and b/episodes/fig/tbrowser_screenshot_01.png differ diff --git a/episodes/introduction.md b/episodes/introduction.md index 7065d23..6a4dbbe 100644 --- a/episodes/introduction.md +++ b/episodes/introduction.md @@ -1,114 +1,68 @@ --- -title: "Using Markdown" -teaching: 10 -exercises: 2 +title: "Introduction" +teaching: 5 +exercises: 0 --- :::::::::::::::::::::::::::::::::::::: questions -- How do you write a lesson using Markdown and `{sandpaper}`? +- What is C++? +- What is ROOT? +- What is the point of these exercises? :::::::::::::::::::::::::::::::::::::::::::::::: ::::::::::::::::::::::::::::::::::::: objectives -- Explain how to use markdown with The Carpentries Workbench -- Demonstrate how to include pieces of code, figures, and nested challenge blocks +- Learn a bit about C++ and how to compile C++ code. +- Learn how to use ROOT to write and read from files, using the C++ libraries. +- Learn how to use ROOT to investigate data and create simple histograms. +- Explore the ROOT python libraries. :::::::::::::::::::::::::::::::::::::::::::::::: -## Introduction +## Let's talk about ROOT! -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown](https://pandoc.org/MANUAL.txt) for static files and -[R Markdown][r-markdown] for dynamic files that can render code into output. -Please refer to the [Introduction to The Carpentries -Workbench](https://carpentries.github.io/sandpaper-docs/) for full documentation. +From the [ROOT website](https://root.cern/): -What you need to know is that there are three sections required for a valid -Carpentries lesson: +::::::::::::::::::::::: testimonial +*ROOT is a framework for data processing, born at CERN, at the heart of the research on high-energy physics. Every day, thousands of physicists use ROOT applications to analyze their data or to perform simulations.* - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. `objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. +In short, ROOT is an overarching toolkit that defines a file structure, methods to analyze particle physics +data, visualization tools, and is the backbone of many widely used statistical analysis tool kits, +such as RooFit and RooStats. You don't *need* to use ROOT for your own analysis, but you will have to have +some familiarity with it when you are first accessing the open data. +::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor +OK, that sounds cool. So what's the deal with C++? -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" +## ROOT and C++ -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: +In the mid-80's, [C++](https://en.wikipedia.org/wiki/C%2B%2B) extended the very popular [C programming language](https://en.wikipedia.org/wiki/C_(programming_language)), +primarily with the introduction of [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming) (OOP). +This programming paradigm was adopted by many particle physics experiments in the 1990's and when ROOT was written, +it was written in C++. While there are now python hooks to call the ROOT functions, the underlying code is all in C++. -::::::::::::::::::::::::::::::::::::: challenge +Because C++ is a compiled code, it is usually much faster than a scripted language like python, though that is +changing with modern python tools. Still, since much of the original analysis and access code was written in C++ +and calling the C++ ROOT libraries, it's good to know some of the basics of ROOT, in a C++ context. -## Challenge 1: Can you do it? +Most CMS analysts interface with ROOT using python scripts and you may find yourself using a similar workflow. +Later on in this lesson, we'll walk you through some very basic python scripts and point you toward more in-depth +tutorials, for those who want to go further. -What is the output of this command? +::::::::::::::::::::::::::::: callout +## You still have choices! -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. - -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can use standard markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -::::::::::::::::::::::::::::::::::::: callout - -Callout sections can highlight information. - -They are sometimes used to emphasise particularly important points -but are also used in some lessons to present "asides": -content that is not central to the narrative of the lesson, -e.g. by providing the answer to a commonly-asked question. - -:::::::::::::::::::::::::::::::::::::::::::::::: - - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? +Just to emphasize, you really only *need* to use ROOT and C++ at the early stages of analyzing CMS Open Data in the AOD (Run 1) or MiniAOD (Run 2) formats. These datasets require using CMS-provided tools that perform much better in C++ than python. However, downstream in your analysis or to analyze Run 2 NanoAOD files, you are welcome to use whatever tools and file formats you choose. +::::::::::::::::::::::::::::: ::::::::::::::::::::::::::::::::::::: keypoints -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally +- C++ has a reputation for being intimidating, but there are only a few things you need to learn to edit the open data code for your own uses. +- You can use the ROOT toolkit using both C++ and python. +- Some ROOT code is written in C++ to access the datafiles. +- People will often use simpler C++ scripts or python scripts to analyze reduced datasets. :::::::::::::::::::::::::::::::::::::::::::::::: -[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/learners/setup.md b/learners/setup.md index 4244a31..7f282ac 100644 --- a/learners/setup.md +++ b/learners/setup.md @@ -2,53 +2,5 @@ title: Setup --- -FIXME: Setup instructions live in this document. Please specify the tools and -the data sets the Learner needs to have installed. - -## Data Sets - - -Download the [data zip file](https://example.com/FIXME) and unzip it to your Desktop - -## Software Setup - -::::::::::::::::::::::::::::::::::::::: discussion - -### Details - -Setup for different systems can be presented in dropdown menus via a `spoiler` -tag. They will join to this discussion block, so you can give a general overview -of the software used in this lesson here and fill out the individual operating -systems (and potentially add more, e.g. online setup) in the solutions blocks. - -::::::::::::::::::::::::::::::::::::::::::::::::::: - -:::::::::::::::: spoiler - -### Windows - -Use PuTTY - -:::::::::::::::::::::::: - -:::::::::::::::: spoiler - -### MacOS - -Use Terminal.app - -:::::::::::::::::::::::: - - -:::::::::::::::: spoiler - -### Linux - -Use Terminal - -:::::::::::::::::::::::: +This lesson requires a computer with an internet connection and a bash shell (either native Linux, MacOs or Windows WSL2 Linux). You should have Docker installed and the [Docker pre-exercises](https://cms-opendata-workshop.github.io/workshop2023-lesson-docker/) finished so that you can access the containers `my_root` and `my_python` created as instructed there.