Embedding at PDSF : A User's Guide

Document Authors : Matt Lamont and Peter Jacobs

Other Contact People : Iwona Sakrejda, Bum Choi, Eric Hjort, Yuri Fisyak and Patricia Fachini


Overview

In order to run the embedding, only a few operations need to be done. The rest of this document fills in the gaps as to what is happening and is worth reading, but is not needed to actually run the jobs.

Before you actually start doing anything, you should contact Peter Jacobs so that you can co-ordinate your embedding requirements with whatever else is planned. Once you have got the go-ahead, then this document will tell you how to run the embedding jobs, and has some trouble-shooting tips if things go wrong.

Production embedding is run from the starofl account by a limited number of operators. It is a production account and has enhanced LSF priority relative to other STAR users. Requests to run production embedding jobs can be made only by pwg convenors, and the current status of production embedding is shown on the Simulations Request page.

The first thing you have to do is to create a setup file. There are example setup files in the ~starofl/embedding/setup directory to copy, and the section Starting the Embedding Jobs describes what is needed. The point of the setup file is to define the specific embedding jobs you want to run (e.g. particle species, phase-space distribution ...), where a job represents one embedding run through a particular daq file. Once this is done, you need to edit a .csh script which submits a set of parallel batch jobs to LSF, to run on different batch nodes. Again there are examples in the ~starofl/embedding directory, and more information is given in Starting the Embedding Jobs. Once you have changed these, you just execute the shell script and the jobs will be submitted to the LSF queue. They will be in pending mode until a node becomes available for them to run on. A notification will be sent by e-mail (to the address specified in the .csh script) when each particular job has finished.


Getting Started

One important reference resource for STAR at PDSF that you should read before getting started, is here. This contains some very useful links that are good to read when starting out.

The production embedding jobs are currently run from the starofl account. This is a production account in the sense that it has significantly enhanced LSF priority relative to other STAR users and is currently allowed to consume up to 50% of the CPU resources available to STAR at any given time. This fraction is our current mechanism for balancing STAR production and user resources at PDSF and will be tuned as we gain more experience. Therefore, in order to run production embedding jobs you need to be able to log on as user starofl. To do this, you need to communicate with Iwona Sakrejda. You will need to create a public key (see here). Once this has been done, you can log onto the interactive PDSF nodes with the command :

ssh -l starofl pdsflx00n (n = 1 through 7)

You will then be asked to give your passphrase in order to log on. Once you have logged on as starofl, you need to move to the embedding directory, which contains all the scripts you will run. To do this, just type :

cd embedding

Apart from all of the perl (.pl) scripts, there are a number of soft links set up to other directories. The important ones are daq_dir, tags_dir, data and LOG. If your daq files aren't in one of the specified daq directories, create a uniquely named soft link to the directory your files are in. Do not just re-direct one of the existing links, as this will cause problems for other people. The same applies to the tags directory. The tags files are needed so that the MC primary vertex is the same as that of the reconstructed data (more about this later). The data directory will contain your embedding output if you choose to write the files to disk, and your log-files will appear in the LOG directory (more about this in the next section). There is also a setup directory, which contains a setup file with your run control parameters.
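
The rule about never re-directing an existing link can be enforced with a small helper. The following is a minimal sketch in POSIX sh (not the csh used elsewhere on the account); the function name make_link and the directory names in the usage comment are hypothetical and only illustrate the idea :

```shell
#!/bin/sh
# make_link TARGET_DIR LINK_NAME
# Creates LINK_NAME -> TARGET_DIR, but refuses to touch a name that already
# exists, so that shared links such as daq_dir are never re-directed.
# Hypothetical usage:  make_link /path/to/my/daq/files daq_dir_myname
make_link() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing to overwrite existing entry: $link" >&2
        return 1
    fi
    if [ ! -d "$target" ]; then
        echo "target is not a directory: $target" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

The new link name is then what you would pass as the daq (or tags) directory argument when submitting jobs.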


The Embedding Chain

The embedding chain was set up by Peter Jacobs and is controlled by the perl script Embedding_v3.pl, which runs a series of perl modules in the directory EmbeddingLib_v3. The chain incorporates the following steps :

Note that the library version (e.g. P00hm) is taken from the path of the tags directory. Also, if no primary vertex is found, then a default value of (0.01, 0.01, 0.01) is used for the primary vertex.

The chain can be modified if necessary, as was done for the strangeness embedding, where an acceptance filter was incorporated into the chain to filter out, and not embed, MC particles which obviously do not get into the TPC.
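
To illustrate how a library version such as P00hm can be read off the path of the tags directory, here is a minimal sketch in POSIX sh. It is not the logic of the perl modules themselves: the function name lib_version, the pattern used to recognise a version tag, and the path in the usage comment are all assumptions made for the example :

```shell
#!/bin/sh
# lib_version PATH
# Prints the first path component that looks like a production library tag
# (a capital P, two digits, then lower-case letters, e.g. P00hm).
# The pattern is an assumption; the real scripts may parse the path differently.
# Hypothetical usage:  lib_version /some/disk/reco/P00hm/2000/08
lib_version() {
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^P[0-9]{2}[a-z]+$' | head -n 1
}
```

If the path contains no such component, the function prints nothing, which a calling script could treat as an error.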

The bfcMixer.C macro

The bfcMixer.C macro was written by Patricia Fachini and Yuri Fisyak and is the heart and soul of the embedding process. The macro in the embedding directory is the same as that in the CVS repository except that it doesn't write out all branches of the dst files.

The macro consists of four separate chains. There is a master chain which controls the other 3 chains, which in turn process the daq file, and then the MC with TRS. These are then mixed, and the mixed event is run through the reconstruction.

You do not need to alter the perl scripts or the bfcMixer.C macro to run the embedding unless you want additional functionality; for normal running you can treat them as a black box.


Starting the Embedding Jobs

The first thing that must be done is create a setup file and put this in the ~starofl/embedding/setup directory. An example of a setup file for deuterons is :

Here, the mult_fraction represents the number of MC particles you want to generate as a fraction of the multiplicity of an event. The pid is the GEANT pid of the particle species you are interested in, while the pt_low, pt_high, y_low and y_high represent the rapidity-pT phase space over which you wish to generate.

The "phasespace" generator, a built in function of GSTAR, is used to generate the MC uniformly in pT and rapidity (not eta), where the full azimuthal interval is used (0 < phi < two pi radians).
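
As a worked example of what mult_fraction means, the number of MC particles generated for an event is the event multiplicity multiplied by the fraction. The sketch below (POSIX sh with awk) does this arithmetic; the function name n_embedded is hypothetical, and rounding to the nearest integer is an assumption, since the actual scripts may truncate instead :

```shell
#!/bin/sh
# n_embedded MULTIPLICITY MULT_FRACTION
# Prints the number of MC particles to generate for one event,
# rounded to the nearest integer (rounding rule is an assumption).
n_embedded() {
    awk -v m="$1" -v f="$2" 'BEGIN { printf "%d\n", int(m * f + 0.5) }'
}
```

For instance, an event with a multiplicity of 2000 and a mult_fraction of 0.05 would receive 100 embedded particles.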

It is best to create a unique setup file for each embedding run you do, as this makes the retrieval of files from Disk and off HPSS a lot simpler.

The starofl account is set up so that the embedding jobs automatically use the local database (see ~starofl/dbServers.xml) and local star libraries (see ~starofl/.pdsf_setup). An AFS token is not needed. However, if you want to use AFS you must ensure that you have a valid AFS token. To do this, log onto pdsfsu05.nersc.gov and type the command :

klogx -cell rhic

This will ask you for an AFS password. To get the starofl AFS password you should contact Peter Jacobs. The rhic AFS token is supposed to last 100 hours but in practice is unreliable so using the local libraries and database is recommended.

Once you have created a setup file, you are ready to roll.

The Embedding_v3.pl script is used to run the embedding, and it takes a series of input arguments (which can be soft links pointing to directories). These are :

In order for the embedding to work, the required tags and daq files must be in the correct directory. You can use the following link to get an overview of what is currently on the data vaults. If the requisite data is not available, then either it has to be shipped off HPSS (minor job) or transferred from RCF (major job). The management of data on the data vaults and the shipping of data from RCF must be co-ordinated with Peter Jacobs and Eric Hjort. Disk space and bandwidth are scarce resources which need to be carefully managed.

You should usually use HPSS for the second option, as this has a much larger storage capacity than disk. However, you can use the DISK option when running small test jobs, as long as you remember to clean up after yourself and remove the files from disk.

Apart from running small test jobs, the Embedding_v3.pl script will not be run directly by the user. Instead, a wrapper script, submit.pl, will be used which submits jobs to the LSF batch nodes. The options for submit.pl are :

The syntax for the submit.pl script is therefore :

submit.pl [options] [setup file] [hpss mode] [tags dir] [daq dir] [log dir] [ prefix job name]

An example of this is :

submit.pl -q medium -m 1243006 -s Embedding_v3.pl -u my_e-mail_address setup/Lambdas.setup HPSS tags_dir_strange daq_dir LOG Lambdas_test

The jobs will then be submitted to the medium queue on LSF, with 1 job per daq file in the run. The medium queue has a time limit for jobs of 24 hours, and they typically take 12 hours once they have started.

The most common mode for running the jobs will be via the execution of a shell script (.csh file). An example file in the ~starofl/embedding directory is :

So all you have to do is create a .csh file of your own and execute it, and the embedding jobs will be submitted to the LSF queue. Note that there is no need in the general scheme of things to edit either Embedding_v3.pl or submit.pl.
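
When several runs are to be submitted with the same setup file, it can help to print the submit.pl command lines first and inspect them before running anything. The following is a minimal dry-run sketch in POSIX sh, not a copy of the example .csh file. It follows the argument order of the syntax line above and fixes the queue to medium and the output mode to HPSS, as in the example. The function name print_submit_cmds is hypothetical, the -m option is taken to be the run number as the examples suggest, and the job name prefix built from the setup file name and run number is an assumption :

```shell
#!/bin/sh
# print_submit_cmds SETUP_FILE TAGS_DIR DAQ_DIR EMAIL RUN [RUN ...]
# Prints (does not execute) one submit.pl command per run number.
# Hypothetical usage:
#   print_submit_cmds setup/Lambdas.setup tags_dir_strange daq_dir my_e-mail_address 1243006
print_submit_cmds() {
    setup=$1
    tags=$2
    daq=$3
    email=$4
    shift 4
    # Job name prefix: setup file name without its .setup suffix (an assumption).
    name=$(basename "$setup" .setup)
    for run in "$@"; do
        echo "submit.pl -q medium -m $run -s Embedding_v3.pl -u $email $setup HPSS $tags $daq LOG ${name}_$run"
    done
}
```

Once the printed lines look right, they can be pasted into your own .csh file.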


Checking the Jobs

There are a number of commands which you can use in order to check what jobs are running :

Peter Jacobs has written a perl script (perl ~starofl/embedding/BatchJobsReport.pl) which succinctly reports the number of jobs running on all queues, both in total and also just by user starofl.

If you need to kill some of your jobs, then you should use the command :

If you kill a job that is running, then the output from the job will reside on the scratch disk of that particular node. You must clean this up. The command to do this is :

lsrun -m {host} {command}

An example usage of this command to clean up the scratch disk of a node would then be of the form :

lsrun -m pdsflx152 rm -fr /scratch/starofl

In order to minimize page swapping, embedding jobs are submitted only to nodes whose CPUs have 600 MB or more of memory. At present (7/13/01) this limits the total number of embedding jobs that can run simultaneously to about 130.

When a job has finished running, you will receive an e-mail from LSF saying either that the job completed successfully or that it crashed with an error. Note that a report of successful completion only means that the job ran to the end; it doesn't necessarily mean that the embedding itself was successful. One way of checking this is to look at the log-files, which reside in the LOG directory. Usually, if the job was unsuccessful, its log-file will be a lot smaller than the other log-files in that directory.
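
The size check on the log-files can be automated. The sketch below (POSIX sh) lists the files in a directory that are much smaller than the largest file there. The function name small_logs and the default threshold of 50% of the largest file are assumptions chosen for illustration, not values taken from the embedding scripts :

```shell
#!/bin/sh
# small_logs DIR [PERCENT]
# Lists regular files in DIR whose size is below PERCENT (default 50)
# of the size of the largest regular file in DIR.
# Hypothetical usage:  small_logs LOG 30
small_logs() {
    dir=$1
    pct=${2:-50}
    max=0
    # First pass: find the size in bytes of the largest file.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        size=$(($(wc -c < "$f")))
        if [ "$size" -gt "$max" ]; then
            max=$size
        fi
    done
    # Second pass: print every file below the threshold.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        size=$(($(wc -c < "$f")))
        if [ $((size * 100)) -lt $((max * pct)) ]; then
            echo "$f"
        fi
    done
}
```

A file flagged this way is only a candidate for a failed job; the log-file itself still has to be read to find out what went wrong.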


Retrieving Files from DISK and HPSS

DISK

If, via your options in submit.pl, you chose to write the output to disk, then the files can be accessed via the data directory. Once inside the directory, the data is then organised into the production library version (e.g. P00hm). Your data is then organised into subdirectories based upon the name of your setup file.

HPSS

If the HPSS option was chosen, then your data will be written to tape. You can get this data by ftp'ing to archive.nersc.gov, or by using HSI. HSI is the preferable way as it is very user friendly (see link). Another way of retrieving the data is to use a script from Bum Choi called getembed.pl. This resides in the ~starofl/embedding/data directory and retrieves the .dst, .runco and .geant dst branches. The syntax for getembed.pl is :

getembed.pl -m (run number) (HPSS_dir) [Local_dir]

HPSS_dir is the directory your data is in on HPSS. This information can be obtained from the end of the log-file, but more generally can be assumed to be ~/embedding/(STAR_Library_version)/(setup_filename). This is the main reason to have unique setup filenames for each embedding run. Local_dir is just the directory on disk you want to copy the files to. An example usage of this script might then be :

getembed.pl -m 1243006 embedding/P00hm/Lambdas_1 ~macl/embedding/Lambdas_1
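
The HPSS directory in this example follows the pattern described above, and can be built from the library version and the setup file name. This is a minimal sketch in POSIX sh. The function name hpss_dir is hypothetical, and stripping a .setup suffix from the setup file name is an assumption based on the example file names in this guide; the end of the log-file remains the authoritative source :

```shell
#!/bin/sh
# hpss_dir LIBRARY_VERSION SETUP_FILE
# Prints the expected HPSS directory (relative to the home directory on HPSS)
# for the output of one embedding run.
# Hypothetical usage:  hpss_dir P00hm setup/Lambdas_1.setup
hpss_dir() {
    echo "embedding/$1/$(basename "$2" .setup)"
}
```

The printed path is what would be passed as the HPSS_dir argument of getembed.pl.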

The .dst and .geant branches of the files are quite large: each file can be up to 1 GByte in size and, for central events, contains only ~150 events. It is therefore necessary first to make sure you have enough disk space available before you retrieve large amounts of data, and secondly to clean up after yourself. Note that copying files from HPSS is quite resource intensive, and the download bandwidth from HPSS is ~ 5 MBytes/sec. Therefore, if you want to transfer > 50 GBytes, you must first consult with Peter Jacobs, Iwona Sakrejda or Eric Hjort.
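
To get a feel for these numbers, the time needed for a transfer can be estimated from the data volume and the bandwidth. The sketch below (POSIX sh with awk) does this arithmetic; the function name transfer_hours is hypothetical, and 1 GByte is taken as 1000 MBytes :

```shell
#!/bin/sh
# transfer_hours SIZE_GBYTES RATE_MBYTES_PER_SEC
# Prints the estimated transfer time in hours, to one decimal place,
# assuming the full rate is sustained for the whole transfer.
transfer_hours() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 3600 }'
}
```

At 5 MBytes/sec, 50 GBytes corresponds to 10000 seconds, or about 2.8 hours, during which the bandwidth is not available to anyone else.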

If you have any further questions regarding the embedding, don't hesitate to contact one of the experts named throughout the text. To keep abreast of information regarding PDSF, you should check the PDSF Announcements page, as well as the STAR PDSF hypernews page.