Submit jobs
sbatch <run_script>.sh
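Scripts that submit other jobs often need the new job's ID, for example for `scancel` or `--dependency=afterok:<jobid>`. By default `sbatch` prints `Submitted batch job <jobid>`, and `sbatch --parsable` prints only the ID (optionally followed by `;cluster`). The following is a minimal sketch that extracts the ID from either form. The function name `job_id_from_sbatch` and the script names in the usage comment are hypothetical.

```bash
# Hypothetical helper: extract the numeric job ID from sbatch output.
# Accepts the default form ("Submitted batch job 1234567")
# and the --parsable form ("1234567" or "1234567;cluster").
job_id_from_sbatch() {
  local out=$1
  out=${out##* }    # keep the last space-separated field
  out=${out%%;*}    # drop an optional ";cluster" suffix
  if [[ $out =~ ^[0-9]+$ ]]; then
    echo "$out"
  else
    echo "could not parse job ID from: $1" >&2
    return 1
  fi
}

# Usage on a cluster (not run here; script names are placeholders):
#   jobid=$(job_id_from_sbatch "$(sbatch run_script.sh)")
#   sbatch --dependency=afterok:"$jobid" next_step.sh
```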
Examples of job scripts
- Serial job
#!/bin/bash
#SBATCH --account=def-afyshe-ab
# wall-clock time limit (DD-HH:MM:SS)
#SBATCH --time=00-00:01:00
echo 'Hello, world!'
sleep 5
- Array job
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=def-afyshe-ab
#SBATCH --time=0-01:00:00
#SBATCH --mem-per-cpu=500M
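# output file named by job name (%x) and job ID (%j); the output/ directory must exist before submission
#SBATCH --output=output/%x-%j.txt
# run a 10-task job array (indexes 1-10), with at most 5 tasks running at a time
#SBATCH --array=1-10%5
# alternative: a job array with indexes 1, 2, 3, 5, 7 (keep only one --array line active)
##SBATCH --array=1,2,3,5,7
./myapplication "$SLURM_ARRAY_TASK_ID"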
- Interactive job
salloc --time=1:0:0 --ntasks=2 --account=def-someuser
salloc --time=0-01:00:00 --cpus-per-task=2 --account=def-afyshe-ab --mem-per-cpu=512M
salloc: Granted job allocation 1234567
... # do some work
exit # terminate the allocation
salloc: Relinquishing job allocation 1234567
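The `--time` values in these examples use several of the formats Slurm accepts: `minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes` and `days-hours:minutes:seconds`. A sketch that converts any of them to seconds, which can help when comparing a requested limit with a job's elapsed time, is shown here. The function name `slurm_time_to_seconds` is hypothetical.

```bash
# Hypothetical helper: convert a Slurm --time value to seconds.
# Formats: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_seconds() {
  local spec=$1 days=0 h=0 m=0 s=0 v
  local -a parts
  if [[ $spec == *-* ]]; then
    days=${spec%%-*}                       # part before the first "-"
    IFS=: read -r -a parts <<< "${spec#*-}"
    if (( ${#parts[@]} > 3 )); then
      echo "unsupported time format: $spec" >&2; return 1
    fi
    h=${parts[0]:-0} m=${parts[1]:-0} s=${parts[2]:-0}
  else
    IFS=: read -r -a parts <<< "$spec"
    case ${#parts[@]} in
      1) m=${parts[0]} ;;
      2) m=${parts[0]} s=${parts[1]} ;;
      3) h=${parts[0]} m=${parts[1]} s=${parts[2]} ;;
      *) echo "unsupported time format: $spec" >&2; return 1 ;;
    esac
  fi
  for v in "$days" "$h" "$m" "$s"; do
    if [[ ! $v =~ ^[0-9]+$ ]]; then
      echo "unsupported time format: $spec" >&2; return 1
    fi
  done
  # 10# forces base 10 so values such as "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For example, `slurm_time_to_seconds 0-01:00:00` and `slurm_time_to_seconds 1:0:0` both print 3600.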
Monitoring jobs
- Show jobs for a specific user
squeue -u <user> -r
- Show only running jobs, or only pending jobs
squeue -u <user> -t running
squeue -u <user> -t pending
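# Job states accepted by -t include: PENDING, RUNNING, SUSPENDED, COMPLETING, COMPLETED, OUT_OF_MEMORY, FAILED (finished jobs leave squeue shortly after ending; use sacct for them)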
- Show (detailed) information for a specific job
scontrol show job <jobid>
scontrol show job -dd <jobid>
- Show status information for a running job
sstat -j <jobid>
# Resources used by a running job: average CPU time, maximum resident memory (MaxRSS), maximum virtual memory (MaxVMSize), job ID
sstat -j <jobid> --format=AveCPU,MaxRSS,MaxVMSize,JobID
- Attach to a running job
srun --jobid=<jobid> --pty bash -i
- Email notification
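#SBATCH --mail-user=<email_address>
# ALL includes the BEGIN, END, FAIL and REQUEUE events
#SBATCH --mail-type=ALL
# TIME_LIMIT: notify when the time limit is reached; TIME_LIMIT_80: at 80% of the time limit
#SBATCH --mail-type=TIME_LIMIT
#SBATCH --mail-type=TIME_LIMIT_80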
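`squeue` shows job states in its `ST` column as compact codes rather than the long names used with `-t`. A small lookup for the states mentioned above, plus two other common ones, is sketched below. The function name `state_code` is hypothetical, and the list is not exhaustive.

```bash
# Hypothetical helper: map a long Slurm job state name to the compact
# code that squeue prints in its ST column.
state_code() {
  case ${1^^} in          # accept any letter case, e.g. "running"
    PENDING)       echo PD ;;
    RUNNING)       echo R ;;
    SUSPENDED)     echo S ;;
    COMPLETING)    echo CG ;;
    COMPLETED)     echo CD ;;
    OUT_OF_MEMORY) echo OOM ;;
    FAILED)        echo F ;;
    CANCELLED)     echo CA ;;
    TIMEOUT)       echo TO ;;
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}
```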
Completed jobs
- Show a short summary of a completed job
seff <jobid>
- Show a detailed summary of a completed job or all jobs of a user
sacct -j <jobid>
sacct -j <jobid> --format=JobID,JobName,AveCPU,MaxRSS,MaxVMSize,Elapsed
sacct -u <user> --format=JobID,JobName,AveCPU,MaxRSS,MaxVMSize,Elapsed
# without -j, sacct lists only jobs since midnight today; add -S YYYY-MM-DD to look further back
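`sacct` and `sstat` report memory fields such as `MaxRSS` with a unit suffix (for example `512000K` or `1.50G`). To compare them with a `--mem-per-cpu` request, a sketch like the following normalizes a value to MiB. The function name `mem_to_mib` is hypothetical; it assumes binary suffixes (K = 1024 bytes, then M, G, T) and treats a bare number as bytes. `sacct` can also convert units itself with `--units=M`.

```bash
# Hypothetical helper: convert a sacct/sstat memory value to MiB.
# Assumes binary suffixes (K, M, G, T) and treats a bare number as bytes.
mem_to_mib() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                              # numeric prefix, e.g. 1.50 from "1.50G"
    u = toupper(substr(v, length(v), 1))   # last character is the unit
    if (u == "K")        n /= 1024
    else if (u == "M")   n *= 1
    else if (u == "G")   n *= 1024
    else if (u == "T")   n *= 1024 * 1024
    else if (u ~ /[0-9]/) n /= 1024 * 1024
    else { print "unknown unit in " v > "/dev/stderr"; exit 1 }
    printf "%.1f\n", n
  }'
}
```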
Controlling jobs
# Cancel a specific job
scancel <jobid>
# Cancel all jobs for a specific user
scancel -u $USER
# Cancel all pending jobs for a specific user
scancel -t PENDING -u $USER
# Cancel all running jobs for a specific user
scancel -t RUNNING -u $USER
# Cancel one or more jobs by name
scancel --name <jobName>
# Hold a job, preventing it from starting
scontrol hold <jobid>
# Release a job hold, allowing the job to try to start
scontrol release <jobid>
# Resume a suspended job (suspend and resume usually require operator or admin privileges)
scontrol resume <jobid>
# Requeue a running, suspended or finished job into pending state
scontrol requeue <jobid>
# List only the job IDs of a user's running jobs (-h: no header, -o %A: job ID)
squeue -u <user> -ho %A -t RUNNING
# Change the time limit of a pending or running job (users can usually only reduce it; increasing it requires admin privileges)
scontrol update jobid=<jobid> TimeLimit=<TimeLimit>
# Set other parameters for a job
scontrol update jobid=<jobid> Account=<account> CPUsPerTask=<count> MinMemoryCPU=<MB> Gres=<list>
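The `squeue -u <user> -ho %A -t RUNNING` output (one job ID per line, no header) can be piped straight into `scancel`. A cautious sketch that only prints the `scancel` command unless `--run` is passed is shown here. The function name `cancel_job_ids` and its `--run` flag are hypothetical.

```bash
# Hypothetical helper: read job IDs (one per line) from stdin and cancel them.
# Default is a dry run that only prints the scancel command; pass --run to execute it.
cancel_job_ids() {
  local run=0 id
  local -a ids=()
  if [[ ${1:-} == --run ]]; then run=1; fi
  while IFS= read -r id || [[ -n $id ]]; do
    id=${id//[[:space:]]/}                  # strip any padding
    # keep plain IDs or array tasks such as 123_4; skip blanks and headers
    if [[ $id =~ ^[0-9]+(_[0-9]+)?$ ]]; then
      ids+=("$id")
    fi
  done
  if (( ${#ids[@]} == 0 )); then
    echo "no job IDs given" >&2
    return 1
  fi
  if (( run )); then
    scancel "${ids[@]}"
  else
    echo "scancel ${ids[*]}"                # dry run: show what would be cancelled
  fi
}

# Usage on a cluster:
#   squeue -u "$USER" -ho %A -t RUNNING | cancel_job_ids          # preview
#   squeue -u "$USER" -ho %A -t RUNNING | cancel_job_ids --run    # cancel
```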
SLURM Environment Variables
Environment Variable | Description |
---|---|
SLURM_JOB_NAME | User-specified job name |
SLURM_JOB_ID | Slurm job ID |
SLURM_NNODES | Number of nodes allocated to the job |
SLURM_NTASKS | Number of tasks allocated to the job |
SLURM_ARRAY_TASK_ID | Array index of this task within the job array |
SLURM_ARRAY_TASK_MAX | Highest array index in the job array |
SLURM_ARRAY_TASK_COUNT | Total number of tasks in the job array |
SLURM_MEM_PER_CPU | Memory allocated per CPU, in MB (set when --mem-per-cpu is used) |
SLURM_JOB_NODELIST | List of nodes allocated to the job |
SLURM_JOB_CPUS_PER_NODE | Number of CPUs allocated per node |
SLURM_JOB_PARTITION | Partition(s) the job is running in |
SLURM_JOB_ACCOUNT | Account under which this job is run |
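In an array job, `SLURM_ARRAY_TASK_ID` is typically used to select one input per task. The following is a minimal sketch, assuming a hypothetical file `inputs.txt` with one input path per line, where line N is processed by task N (matching `--array=1-10%5`). The function name `input_for_task` is also hypothetical; the `:-1` fallback lets the script run outside Slurm for testing.

```bash
# Hypothetical helper: print line N of a list file, where N is the array task ID.
input_for_task() {
  local task_id=$1 list=$2 line
  if [[ ! $task_id =~ ^[1-9][0-9]*$ ]]; then
    echo "task ID must be a positive integer: $task_id" >&2; return 1
  fi
  line=$(sed -n "${task_id}p" "$list") || return 1
  if [[ -z $line ]]; then
    echo "no input on line $task_id of $list" >&2; return 1
  fi
  printf '%s\n' "$line"
}

# In the job script (inputs.txt is a hypothetical file name):
#   input=$(input_for_task "${SLURM_ARRAY_TASK_ID:-1}" inputs.txt) || exit 1
#   ./myapplication "$input"
```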
Account information
# Show a user, their default account, and their associations (accounting groups)
sacctmgr show user <user> withassoc
# Show usage info for user
sshare -l -U <user>
# Show usage info for all users under a specific account
sshare -l -A <account>_cpu --all
Cluster information
# Show idle nodes on the cluster
sinfo --states=idle
# Show down, drained and draining nodes and their reason
sinfo -R
# Show detailed node info
sinfo --Node --long
# Show reservations on the cluster
scontrol show reservation
# Read the documentation of Slurm configuration parameters
man slurm.conf
# Check configuration values (here, those whose names contain "Max")
scontrol show config | grep Max
# Summarize queued and running jobs per partition (Alliance-specific script)
partition-stats
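`scontrol show config` prints one `Key = Value` pair per line, so a single setting can be extracted more precisely than with `grep Max`. A sketch, assuming that `Key = Value` layout, is shown here. The function name `config_value` is hypothetical.

```bash
# Hypothetical helper: read `scontrol show config` output on stdin and
# print the value of one key, e.g. MaxArraySize. Exits non-zero if the key is absent.
config_value() {
  awk -v key="$1" '
    $1 == key && $2 == "=" {
      sub(/^[^=]*=[ \t]*/, "")   # strip "Key   = " and keep the rest of the line
      print
      found = 1
    }
    END { exit !found }'
}

# Usage on a cluster:
#   scontrol show config | config_value MaxArraySize
```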
Software modules
# Show currently loaded modules
module list
# List available modules whose names match <name>
module avail <name>
# Search all modules, with more detail than avail (versions and how to load them)
module spider <name>
# Load a module
module load <moduleName>
# Unload a module
module unload <moduleName>
# Show the commands a module runs (e.g. environment variables it sets)
module show <moduleName>
Disk usage
quota
quota --per_user
diskusage_report --per_user --all_users