SLURM 19.04
Basic concepts
Jobs
Partitions
Tasks
Basic commands
Checking the queue
Submitting a program
Cancelling a job
Working with SLURM
Simple usage for soroban
0. Note.
'-p intel' (equivalent long option: '--partition=intel') is required on soroban.
1. Save the text below as a file (e.g. my_first_slurm.sh).
#!/bin/bash
#SBATCH --job-name=example        # Name of the job to run on the cluster
#SBATCH --partition=intel         # Name of the queue/partition
#SBATCH --output=example_%j.out   # Standard output of the run
#SBATCH --error=example_%j.err    # Error log of the run

ls -lh
pwd
2. Submit it as SLURM job.
sbatch (e.g. sbatch my_first_slurm.sh)
3. Check progress.
squeue
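A typical submit-and-check session looks like the sketch below. It requires a SLURM installation, so it is not runnable locally; the job id in the comment is illustrative.

```shell
sbatch my_first_slurm.sh
# sbatch replies with a line such as: Submitted batch job 123456
# (the job id 123456 is illustrative)

squeue -u "$USER"    # show only your own jobs
scancel 123456       # cancel the job by id, if needed
```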
Running a program with OpenMPI, using a base script for SLURM:
#!/bin/bash
#SBATCH --job-name=example                 # Name of the job to run on the cluster
#SBATCH --partition=intel                  # Name of the QUEUE/PARTITION to submit the runs to
#SBATCH -n 48                              # Should be a multiple of 24, ideally, to use all cores of each CPU
#SBATCH --ntasks-per-node=24               # Number of tasks per node
#SBATCH --output=example_%j.out            # Standard output of the run
#SBATCH --error=example_%j.err             # Error log of the run
#SBATCH --mail-user=username@ufrontera.cl  # Email for notifications about start/end or problems of the run
#SBATCH --mail-type=ALL

srun ./mpi_programa                        # MPI command and program to run; replace with the corresponding program
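The comment on '-n 48' can be checked with a line of shell arithmetic. This is a local sketch; the figure of 24 cores per node comes from the comments in the script above and is an assumption about the intel partition, not a value queried from SLURM.

```shell
# Check that the requested task count fills whole nodes
# (24 cores per node, per the comments above -- an assumption).
ntasks=48
cores_per_node=24
if (( ntasks % cores_per_node == 0 )); then
  nodes=$(( ntasks / cores_per_node ))
  echo "OK: $ntasks tasks fill $nodes full nodes"
else
  echo "WARNING: $ntasks tasks leave a node partially used"
fi
```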
4. Basic example 3
This is an example script (ejemplo3.sh) with the minimal elements needed to run the program R-3.6.1 through SLURM:
#!/bin/bash
#SBATCH -J R-NOMBRE-SIMULACION
#SBATCH -a 1-5%3                # 5 array tasks, at most 3 running simultaneously
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=100G
#SBATCH --partition=intel

module load R/3.6.1

cmds=(
  'sleep 10; echo 10'
  'sleep 20; echo 20'
  'sleep 30; echo 30'
  'sleep 40; echo 40'
  'sleep 50; echo 50'
)
eval "${cmds[$SLURM_ARRAY_TASK_ID - 1]}"   # task IDs are 1-based; the bash array is 0-based
To submit this script to SLURM, create a job, and start processing, the following is required:
chmod +x ejemplo3.sh
sbatch ejemplo3.sh
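The array mechanics of the example above can be tried without SLURM by setting SLURM_ARRAY_TASK_ID by hand (on the cluster, SLURM exports it for each array task):

```shell
cmds=(
  'echo 10'
  'echo 20'
  'echo 30'
)
SLURM_ARRAY_TASK_ID=2                                # simulated; SLURM sets this per array task
result=$(eval "${cmds[$SLURM_ARRAY_TASK_ID - 1]}")   # 1-based task ID indexes the 0-based array
echo "$result"
```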
List of available clusters and partitions
List of available clusters
List of available partitions
FAQ (Frequently Asked Questions)
Q. What is the difference between a cluster and a partition?
Q. I always use only one cluster. Is there any way to omit --clusters=[cluster_name] when I check/delete jobs with scontrol/scancel?
Further information
Use the 'man' command after logging in to the servers
man sbatch
SLURM commands
- sacct
- salloc
- sbatch
- scancel
- scontrol
- sinfo
- squeue
- sreport
Useful reference pages
About job array
https://slurm.schedmd.com/job_array.html
https://rcc.uchicago.edu/docs/running-jobs/array/index.html
https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf
How to rewrite a PBSPro/SGE script as a SLURM script
Common commands
 | PBS command | SGE command | SLURM command |
Job submission | qsub [scriptfile] | qsub [scriptfile] | sbatch [scriptfile] |
Job deletion | qdel [job_id] | qdel [job_id] | scancel --clusters=[cluster_name] [job_id] |
Job status (for user) | qstat -u [username] | qstat -u [username] | squeue -u [username] |
Extended job status | qstat -f [job_id] | qstat -f -j [job_id] | scontrol --clusters=[cluster_name] show job [job_id] |
Hold a job temporarily | qhold [job_id] | qhold [job_id] | scontrol hold [job_id] |
Release job hold | qrls [job_id] | qrls [job_id] | scontrol release [job_id] |
List of usable queues | qstat -Q | qconf -sql | sinfo, squeue |
Resource specification
 | PBS command | SGE command | SLURM command |
Queue | #PBS -q [queue] | #$ -q [queue] | #SBATCH -M [queue] / #SBATCH --clusters=[queue] |
Processors (single host) | #PBS -l select=1:ncpus=[#] | #$ -pe smp [#] | #SBATCH -c [#] (--cpus-per-task=[#]) |
Wall clock limit | #PBS -l walltime=[hh:mm:ss] | #$ -l time=[hh:mm:ss] | #SBATCH -t [time] (accepted formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds") |
Memory requirement | #PBS -l mem=XXXXmb | #$ -mem [#]G | #SBATCH --mem=[#][unit: K/M/G/T] |
Standard output file | #PBS -o [file] | #$ -o [path] | #SBATCH -o [path] |
Standard error | #PBS -e [file] | #$ -e [path] | #SBATCH -e [path] |
Array job | #PBS -J [#-#] | #$ -t [#-#] | #SBATCH -a [#-#] |
Array index variable name | ${PBS_ARRAY_INDEX} | ${SGE_TASK_ID} | ${SLURM_ARRAY_TASK_ID} |
Max simultaneously running tasks for an array job | n/a? | #$ -tc [#] | #SBATCH -a [#-#]%[#] (e.g. -a 0-15%4) |
Copy environment | #PBS -V | #$ -V | #SBATCH --get-user-env (the submission environment is exported by default) |
Notification event | #PBS -m abe | #$ -m abe | #SBATCH --mail-type=[BEGIN,END,FAIL,ALL,...] |
Email address | #PBS -M [email] | #$ -M [email] | #SBATCH --mail-user=[email] |
Job name | #PBS -N [name] | #$ -N [name] | #SBATCH -J [name] |
Job restart | #PBS -r [y/n] | #$ -r [yes/no] | #SBATCH --requeue / #SBATCH --no-requeue |
Move to current directory | n/a | #$ -cwd | n/a (SLURM jobs start in the submission directory by default) |
Set working directory | n/a (in the main part of the script, add cd ${PBS_O_WORKDIR}) | #$ -wd [path] | #SBATCH -D [working_dirpath] |
Use bash | #PBS -S /bin/bash | #$ -S /bin/bash | shebang line (at the first line of the script, add #!/bin/bash) |
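As a worked example of the table above, a minimal SLURM translation of a PBS script might look like this. The partition name, job name, and time limit are illustrative; each directive's PBS original is noted in a comment.

```shell
#!/bin/bash
# SLURM translation of a minimal PBS script (values are illustrative).
#SBATCH --partition=intel       # was: #PBS -q intel
#SBATCH -J converted_example    # was: #PBS -N converted_example
#SBATCH -o converted_%j.out     # was: #PBS -o converted.out
#SBATCH -t 00:10:00             # was: #PBS -l walltime=00:10:00

# No 'cd ${PBS_O_WORKDIR}' needed: SLURM starts jobs in the submission directory.
started_in=$(pwd)
echo "running from $started_in"
```

The `#SBATCH` lines are comments to the shell, so the body behaves the same whether the file is run directly or submitted with sbatch.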