This page provides both general and machine-specific tips for running on XSEDE infrastructure. General information comes first, followed by tips listed by machine name (e.g. Lonestar, Kraken, Trestles, Stampede). If you are interested in running on a specific machine, scroll down until you see the machine name.
If a machine is not listed, BigJob may still run on it even though it is not yet covered in the documentation. Feel free to email bigjob-users@googlegroups.com to request that information for a machine be added.
In general, production-grade science on XSEDE machines should be done in the $SCRATCH or $WORK directory, not in $HOME. This means you should run your BigJob script and create your BigJob agent directory in $SCRATCH or $WORK.
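The rule above can be checked from inside a script before anything is submitted. The following is a minimal sketch, not part of BigJob: the helper name pick_run_directory is hypothetical, and preferring $SCRATCH over $WORK when both are set is an assumption made only for this illustration.

```python
import os


def pick_run_directory(env=None):
    """Return the value of $SCRATCH if set, otherwise $WORK.

    Hypothetical helper for illustration only. The SCRATCH-before-WORK
    order is an assumption, not a BigJob or XSEDE requirement.
    """
    env = os.environ if env is None else env
    for name in ("SCRATCH", "WORK"):
        path = env.get(name)
        if path:
            return path
    # Deliberately no fallback to $HOME.
    raise RuntimeError("Neither $SCRATCH nor $WORK is set")
```

Calling pick_run_directory() with no argument reads the real environment; passing a dictionary makes the helper easy to try out on a machine where these variables are not defined.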
When creating BigJob scripts for XSEDE machines, it is necessary to add the project field to the pilot_compute_description.
"project": "TG-XXXXXXXXX"
Replace TG-XXXXXXXXX with the allocation number that XSEDE provided to you.
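As a rough illustration of where the project field sits, the sketch below builds a description as a plain Python dictionary. Only the "project" entry and the Stampede service_url value come from this page. The remaining keys, their default values, the placeholder working directory, and the helper name make_pilot_compute_description are assumptions for this example, so check them against the BigJob examples you are working from.

```python
def make_pilot_compute_description(service_url, project, working_directory,
                                   number_of_processes=16, walltime=10):
    """Build a pilot_compute_description dictionary (illustrative sketch)."""
    return {
        "service_url": service_url,
        # The allocation to charge, as required on XSEDE machines.
        "project": project,
        # Assumed keys below: adjust to match your own BigJob script.
        "working_directory": working_directory,
        "number_of_processes": number_of_processes,
        "walltime": walltime,
    }


pilot_compute_description = make_pilot_compute_description(
    service_url="slurm+ssh://login1.stampede.tacc.utexas.edu",
    project="TG-XXXXXXXXX",                      # replace with your allocation
    working_directory="/path/to/scratch/agent",  # hypothetical placeholder path
)
```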
Stampede uses the SLURM batch queuing system. When editing your scripts, the service_url should be set to slurm+ssh://login1.stampede.tacc.utexas.edu.
Installing a virtual environment on Lonestar requires a newer Python version than the default. To load Python 2.7.x before installing the virtual environment, execute:
module load python
You can then proceed with the Installation instructions. Make sure that you activate your virtual environment in your .bashrc before you try to run BigJob.
You will need to put the following two lines in both your .bashrc and your .bash_profile in order to run on Lonestar. This is necessary because login shells read .bash_profile, while interactive non-login shells read .bashrc.
module load python
source $HOME/bigjob/.python/bin/activate
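To confirm that the two lines above took effect, you can inspect the running interpreter. This is a small sketch using only the Python standard library. The function names are hypothetical, and the check relies on the usual convention that older virtualenv releases set sys.real_prefix, while venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix.

```python
import sys


def python_is_at_least(required=(2, 7), version_info=None):
    """Return True if the interpreter version is at least `required`."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= tuple(required)


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("Python 2.7 or newer:", python_is_at_least())
    print("Virtual environment active:", in_virtualenv())
```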
Lonestar uses the Sun Grid Engine (SGE) batch queuing system. When editing your scripts, the service_url should be set to sge://localhost for running locally on Lonestar or sge+ssh://lonestar.tacc.utexas.edu for running remotely.
Before installing your virtual environment on Kraken, you must run module load python to ensure you are using Python 2.7.x instead of the system-level Python.
Prior to running code on Kraken, you will need to make a directory called agent in the same location that you are running your scripts from. The BigJob agent relies on aprun to execute subjobs. aprun works only if the working directory of the Pilot and Compute Units is set to the scratch space of Kraken.
Create your agent directory in /lustre/scratch/<username> by typing:
cd /lustre/scratch/<username>
mkdir agent
Replace <username> with your Kraken username.
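If you prefer to create the agent directory from your script rather than by hand, a sketch like the following does the same as the two commands above. The helper name ensure_agent_dir is hypothetical; the scratch root is passed in so that no username is hard-coded.

```python
import os


def ensure_agent_dir(scratch_root):
    """Create <scratch_root>/agent if it is missing and return its path."""
    agent_dir = os.path.join(scratch_root, "agent")
    if not os.path.isdir(agent_dir):
        os.makedirs(agent_dir)
    return agent_dir
```

On Kraken you would call it as ensure_agent_dir("/lustre/scratch/<username>") with your own username substituted.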
Submitting jobs to Kraken from another resource using gsissh requires MyProxy. To obtain a proxy credential from the MyProxy server, execute the following command:
myproxy-logon -T -t <number of hours> -l <your username>
Use your XSEDE portal username and password. To verify that you hold a valid proxy, type grid-proxy-info.
If the logon was successful, the output shows a valid proxy.
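The same check can be scripted before a BigJob run. The sketch below assumes that your installation of grid-proxy-info accepts the -timeleft option and prints the remaining lifetime in seconds; confirm this with grid-proxy-info -help on your system. The function name proxy_seconds_left and its run parameter are hypothetical and exist only for this example.

```python
import subprocess


def proxy_seconds_left(run=subprocess.check_output):
    """Return the remaining proxy lifetime in seconds, or 0 if none is usable.

    `run` is injectable so the parsing can be exercised on machines
    that do not have the Globus tools installed.
    """
    try:
        output = run(["grid-proxy-info", "-timeleft"])
    except (OSError, subprocess.CalledProcessError):
        # Command missing, or it exited with an error (for example, no proxy).
        return 0
    if isinstance(output, bytes):
        output = output.decode()
    try:
        return max(int(output.strip()), 0)
    except ValueError:
        return 0
```

A script could refuse to submit when proxy_seconds_left() returns less than the walltime it is about to request.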
Kraken is a Cray machine with a special Torque queuing system. It requires the use of GSISSH (Globus certificates required). Initiate a grid proxy (using myproxy-logon) before executing the BigJob application. When editing your scripts, the service_url should be set to xt5torque+gsissh://gsissh.kraken.nics.xsede.org.
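The service_url values given on this page can be collected in one lookup table so that a script can switch machines by name. This is only a sketch: the URLs are copied from the text above, while the table layout, the "local"/"remote" labels, and the helper name service_url_for are assumptions. Stampede is filed under "remote" only because its URL uses ssh.

```python
SERVICE_URLS = {
    ("stampede", "remote"): "slurm+ssh://login1.stampede.tacc.utexas.edu",
    ("lonestar", "local"): "sge://localhost",
    ("lonestar", "remote"): "sge+ssh://lonestar.tacc.utexas.edu",
    ("kraken", "remote"): "xt5torque+gsissh://gsissh.kraken.nics.xsede.org",
}


def service_url_for(machine, mode="remote"):
    """Look up the service_url for a machine name and access mode."""
    key = (machine.lower(), mode)
    if key not in SERVICE_URLS:
        raise KeyError("No service_url documented for %s (%s)" % (machine, mode))
    return SERVICE_URLS[key]
```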