====== SLURM 19.04 ======
* Basic concepts
  * Jobs
  * Partitions
  * Tasks
* Basic commands (examples below)
  * Checking the queue
  * Submitting a program
  * Cancelling a job
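As a quick orientation, the basic commands listed above correspond roughly to the following; the script name and job ID are placeholders, not real examples from this cluster:
squeue                 # check the queue
sbatch my_script.sh    # submit a program
scancel 12345          # cancel a job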
===== Working with SLURM =====
**Simple usage for soroban**
0. Note.
'-p intel' (equivalent long option: '--partition=intel') is required for soroban.
1. Save the script below as a text file (e.g. my_first_slurm.sh).
#!/bin/bash
#SBATCH --job-name=example # Name for the job to run on the cluster
#SBATCH --partition=intel
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
ls -lh
pwd
2. Submit it as a SLURM job.
sbatch [scriptfile] (e.g. sbatch my_first_slurm.sh)
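If the job is accepted, sbatch prints the ID assigned to it; the job number below is only illustrative:
sbatch my_first_slurm.sh
Submitted batch job 12345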
3. Check progress.
squeue
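squeue with no arguments lists every job in the queue; to show only your own jobs, filter by user, for example:
squeue -u $USER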
**Running a program with OpenMPI, using a base script for SLURM:**
#!/bin/bash
#SBATCH --job-name=example # Name for the job to run on the cluster
#SBATCH --partition=intel
#SBATCH -n 32 # Number of processes; must be a multiple of 16
#SBATCH --ntasks-per-node=16 # Maximum per node
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
#SBATCH --mail-user=username@ufrontera.cl # Email address for notifications
#SBATCH --mail-type=ALL
srun ./mpi_programa
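A minimal sketch of how this script might be used, assuming the MPI program still needs to be compiled with an OpenMPI compiler wrapper available on the login node; the source and script file names are assumptions:
mpicc mpi_programa.c -o mpi_programa   # compile the MPI program (assumed source file)
sbatch mpi_script.sh                   # submit the batch script shown above (assumed file name)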
===== 4. Basic example 3 =====
This is an example of a script (ejemplo3.sh) with the minimum elements needed to run the R 3.6.1 program through SLURM:
#!/bin/bash
#SBATCH -J R-NOMBRE-SIMULACION
#SBATCH -a 1-5%3 # 5 array tasks (one per command below), at most 3 running at a time
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=100G
#SBATCH --partition=intel
module load R/3.6.1
cmds=(
'sleep 10;echo 10'
'sleep 20;echo 20'
'sleep 30;echo 30'
'sleep 40;echo 40'
'sleep 50;echo 50'
)
eval ${cmds[$SLURM_ARRAY_TASK_ID - 1]}
To submit this script to SLURM, create the job, and start processing, the following is required:
chmod +x ejemplo3.sh
sbatch ejemplo3.sh
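The script above loads the R/3.6.1 module but the array tasks only run shell commands; if each task should run an R script instead, the eval line could be replaced with something like the following (the R script name is an assumption):
Rscript analysis_${SLURM_ARRAY_TASK_ID}.R   # hypothetical per-task R script
Progress of the array tasks can then be checked with squeue -u $USER or, after they finish, with sacct -j [job_id].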
===== List of available clusters and partitions =====
==== List of available clusters ====
==== List of available partitions ====
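Both of these lists can be generated directly on the login node, for example (output depends on the site configuration):
sinfo                    # partitions on the current cluster
sinfo --clusters=all     # partitions on all available clusters
sacctmgr show clusters   # configured clusters (if accounting is enabled)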
===== FAQ (Frequently Asked Questions) =====
==== Q. What is the difference between a cluster and a partition? ====
==== Q. I always use only one cluster. Is there any way to omit --clusters=[cluster_name] when I check/delete jobs with scontrol/scancel? ====
===== Further information =====
==== Use the 'man' command after logging in to the servers ====
man sbatch
SLURM commands
* sacct
* salloc
* sbatch
* scancel
* scontrol
* sinfo
* squeue
* sreport
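Each of these commands has its own manual page, for example:
man squeue
man scancel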
==== Useful reference pages ====
[[https://doku.lrz.de/display/PUBLIC/Running+parallel+jobs+on+the+Linux-Cluster#RunningparalleljobsontheLinux-Cluster-Step1:Editajobscript|https://doku.lrz.de/display/PUBLIC/Running+parallel+jobs+on+the+Linux-Cluster#RunningparalleljobsontheLinux-Cluster-Step1:Editajobscript]]
=== About job array ===
[[https://slurm.schedmd.com/job_array.html|https://slurm.schedmd.com/job_array.html]]
[[https://rcc.uchicago.edu/docs/running-jobs/array/index.html|https://rcc.uchicago.edu/docs/running-jobs/array/index.html]]
[[https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf|https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf]]
[[https://help.rc.ufl.edu/doc/SLURM_Job_Arrays|https://help.rc.ufl.edu/doc/SLURM_Job_Arrays]]
===== How to rewrite PBSPro/SGE script to SLURM script =====
==== Common commands ====
| |**PBS command** |**SGE command** |**SLURM command** |
|Job submission|qsub [scriptfile]|qsub [scriptfile]|sbatch [scriptfile]|
|Job deletion|qdel [job_id]|qdel [job_id]|scancel --clusters=[cluster_name] [job_id]|
|Job status (for user)|qstat -u [username]|qstat -u [username]|squeue -u [username]|
|Extended job status|qstat -f [job_id]|qstat -f -j [job_id]|scontrol --clusters=[cluster_name] show job [job_id]|
|Hold a job temporarily|qhold [job_id]|qhold [job_id]|scontrol hold [job_id]|
|Release job hold|qrls [job_id]|qrls [job_id]|scontrol release [job_id]|
|List of usable queues|qstat -Q|qconf -sql|sinfo, squeue|
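For example, a typical PBS session (submit, check status, delete) translates row by row as follows; the job ID and cluster name are placeholders:
PBS:
qsub job.sh
qstat -u $USER
qdel 12345
SLURM:
sbatch job.sh
squeue -u $USER
scancel --clusters=[cluster_name] 12345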
==== Resource specification ====
| |**PBS command** |**SGE command** |**SLURM command** |
|Queue|#PBS -q [queue]|#$ -q [queue]|#SBATCH -p [queue] / #SBATCH --partition=[queue] (to select a cluster: #SBATCH -M [cluster] / #SBATCH --clusters=[cluster])|
|Processors (single host)|#PBS -l select=1:ncpus=[#]|#$ -pe smp [#]|#SBATCH -c [#] / #SBATCH --cpus-per-task=[#]|
|Wall clock limit|#PBS -l walltime=[hh:mm:ss]|#$ -l time=[hh:mm:ss]|#SBATCH -t [time] / #SBATCH --time=[time] (formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds")|
|Memory requirement|#PBS -l mem=XXXXmb|#$ -mem [#]G|#SBATCH --mem=[#][unit: K/M/G/T]|
|Standard output file|#PBS -o [file]|#$ -o [path]|#SBATCH -o [path]|
|Standard error|#PBS -e [file]|#$ -e [path]|#SBATCH -e [path]|
|Array job|#PBS -J [#-#]|#$ -t [#-#]|#SBATCH -a [#-#] / #SBATCH --array=[#-#]|
|Array number Variable name|${PBS_ARRAY_INDEX}|${SGE_TASK_ID}|${SLURM_ARRAY_TASK_ID}|
|Max simultaneously running tasks for an array job|n/a|#$ -tc [#]|#SBATCH -a [#-#]%[#] (e.g. -a 0-15%4)|
|Copy environment|#PBS -V|#$ -V|#SBATCH --get-user-env|
|Notification event|#PBS -m abe|#$ -m abe|#SBATCH --mail-type=[BEGIN/END/FAIL/ALL]|
|Email address|#PBS -M [email]|#$ -M [email]|#SBATCH --mail-user=[email]|
|Job name|#PBS -N [name]|#$ -N [name]|#SBATCH -J [name]|
|Job restart|#PBS -r [y/n]|#$ -r [yes/no]|#SBATCH --requeue / #SBATCH --no-requeue|
|Move current directory|n/a|#$ -cwd|n/a (SLURM jobs start in the directory where sbatch was run by default)|
|Move working directory|n/a (in the main part of the script, add cd ${PBS_O_WORKDIR})|#$ -wd [dirpath]|#SBATCH -D [working_dirpath]|
|Use BASH|#PBS -S /bin/bash|#$ -S /bin/bash|shebang line (add #!/bin/bash as the first line of the script)|
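As a worked example, here is a small PBS job header and a SLURM header assembled from the rows above; all names and values are placeholders:
#PBS -N myjob
#PBS -q workq
#PBS -l walltime=02:00:00
#PBS -l select=1:ncpus=8
#PBS -M user@example.com
#PBS -m abe
becomes:
#SBATCH -J myjob
#SBATCH --partition=workq
#SBATCH -t 02:00:00
#SBATCH -c 8
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=ALL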