====== SLURM 19.04 ======
  
Basic concepts
Cancelling a job
  
===== Working with SLURM =====
  
**Simple usage on soroban**
  
0. Note.
  
The option '-p intel' (long form: '--partition=intel') is required on soroban.
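Both forms request the same partition; a minimal illustration, where the script name job.sh is just a placeholder:

<code>
# Request the partition on the sbatch command line ...
sbatch -p intel job.sh

# ... or as a directive inside the batch script itself
#SBATCH --partition=intel
</code>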
  
1. Save the following as a text file (e.g. my_first_slurm.sh).
  
<code>
#!/bin/bash
#SBATCH --job-name=example  # Name for the job to be run on the cluster
#SBATCH --partition=intel
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
  
ls -lh
pwd
</code>
  
2. Submit it as a SLURM job.
  
sbatch (e.g. sbatch my_first_slurm.sh)
  
3. Check progress.
  
squeue
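For example, to list only your own jobs and to cancel one if needed (the job id below is a placeholder):

<code>
squeue -u $USER        # show only your pending/running jobs
scancel 123456         # cancel a job, using the job id reported by sbatch/squeue
</code>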
  
**Running a program with OpenMPI, using a base script for SLURM:**
<code>
#!/bin/bash
#SBATCH --job-name=example  # Name for the job to be run on the cluster
#SBATCH --partition=intel
#SBATCH -n 32                 # Number of processes; must be a multiple of 16
#SBATCH --ntasks-per-node=16  # Maximum per node
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
#SBATCH --mail-user=username@ufrontera.cl  # Email address for notifications
#SBATCH --mail-type=ALL
  
srun ./mpi_programa
</code>

===== 4. Basic example 3 =====

This is an example of a script (ejemplo3.sh) with the minimal elements needed to run the program R-3.6.1 through SLURM (a variant of the command list that actually calls R is sketched after the block):

<code>
#!/bin/bash

#SBATCH -J R-NOMBRE-SIMULACION
#SBATCH -a 1-5%3              # one array task per entry in cmds below; at most 3 run at once
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=100G
#SBATCH --partition=intel

module load R/3.6.1

cmds=(
'sleep 10;echo 10'
'sleep 20;echo 20'
'sleep 30;echo 30'
'sleep 40;echo 40'
'sleep 50;echo 50'
)
eval ${cmds[$SLURM_ARRAY_TASK_ID - 1]}   # run the entry for this task (task IDs start at 1, array indices at 0)
</code>
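Each array task receives its own value of ${SLURM_ARRAY_TASK_ID} (1 to 5 here) and uses it to pick one entry of cmds. In a real run the entries would normally invoke R rather than sleep/echo; a minimal sketch, assuming a hypothetical script analisis.R that takes the task id as its first argument:

<code>
cmds=(
'Rscript analisis.R 1'
'Rscript analisis.R 2'
'Rscript analisis.R 3'
'Rscript analisis.R 4'
'Rscript analisis.R 5'
)
eval ${cmds[$SLURM_ARRAY_TASK_ID - 1]}

# Equivalent without the array, passing the task id directly:
# Rscript analisis.R ${SLURM_ARRAY_TASK_ID}
</code>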
To submit this script to SLURM as a job and start processing, the following is required (commands for monitoring the array job follow these blocks):

<code>
chmod +x ejemplo3.sh
</code>

<code>
sbatch ejemplo3.sh
</code>
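Once submitted, the array tasks can be monitored and, if necessary, cancelled individually; the job id 123456 below is a placeholder:

<code>
squeue -u $USER      # array tasks appear as <jobid>_<taskid>
scancel 123456_3     # cancel only array task 3
scancel 123456       # cancel the whole array job
</code>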
  
  
==== Q. I always use only one cluster. Is there any way to omit --clusters=[cluster_name] when I check/delete jobs with scontrol/scancel? ====
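One approach that may work, depending on the installed SLURM version (please verify with man scancel and man scontrol), is to set the SLURM_CLUSTERS environment variable, which these commands read as the default for --clusters; the cluster name and job id below are placeholders:

<code>
export SLURM_CLUSTERS=mycluster     # hypothetical cluster name; could go in ~/.bashrc
scontrol show jobid=123456          # no --clusters needed if the variable is honored
scancel 123456
</code>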

===== Further information =====

==== Use the 'man' command after logging in to the servers ====

man sbatch

The main SLURM commands (example invocations follow the list):

  * sacct
  * salloc
  * sbatch
  * scancel
  * scontrol
  * sinfo
  * squeue
  * sreport
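As a quick, non-exhaustive illustration of how these are typically invoked (the job id is a placeholder):

<code>
sinfo                                                  # partitions and node states
squeue -u $USER                                        # your queued and running jobs
sacct -j 123456 --format=JobID,JobName,State,Elapsed   # accounting for a finished job
scontrol show job 123456                               # full details of a job
man sacct                                              # each command has its own man page
</code>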

==== Useful reference pages ====

[[https://doku.lrz.de/display/PUBLIC/Running+parallel+jobs+on+the+Linux-Cluster#RunningparalleljobsontheLinux-Cluster-Step1:Editajobscript|https://doku.lrz.de/display/PUBLIC/Running+parallel+jobs+on+the+Linux-Cluster#RunningparalleljobsontheLinux-Cluster-Step1:Editajobscript]]

=== About job arrays ===

[[https://slurm.schedmd.com/job_array.html|https://slurm.schedmd.com/job_array.html]]

[[https://rcc.uchicago.edu/docs/running-jobs/array/index.html|https://rcc.uchicago.edu/docs/running-jobs/array/index.html]]

[[https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf|https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf]]

[[https://help.rc.ufl.edu/doc/SLURM_Job_Arrays|https://help.rc.ufl.edu/doc/SLURM_Job_Arrays]]

===== How to rewrite a PBSPro/SGE script as a SLURM script =====

==== Common commands ====

| |**PBS command** |**SGE command** |**SLURM command** |
|Job submission|qsub [scriptfile]|qsub [scriptfile]|sbatch [scriptfile]|
|Job deletion|qdel [job_id]|qdel [job_id]|scancel --clusters=[cluster_name] [job_id]|
|Job status (for user)|qstat -u [username]|qstat -u [username]|squeue -u [username]|
|Extended job status|qstat -f [job_id]|qstat -f -j [job_id]|scontrol --clusters=[cluster_name] show jobid=[job_id]|
|Hold a job temporarily|qhold [job_id]|qhold [job_id]|scontrol hold [job_id]|
|Release job hold|qrls [job_id]|qrls [job_id]|scontrol release [job_id]|
|List of usable queues|qstat -Q|qconf -sql|sinfo, squeue|

==== Resource specification ====

A combined example script header follows the table.

|   |**PBS command** |**SGE command** |**SLURM command** |
|Queue|#PBS -q [queue]|#$ -q [queue]|#SBATCH -M [queue] / #SBATCH --clusters=[queue]|
|Processors (single host)|#PBS -l select=1:ncpus=[#]|#$ -pe smp [#]|#SBATCH -c [#]?|
|Wall clock limit|#PBS -l walltime=[hh:mm:ss]|#$ -l time=[hh:mm:ss]|#SBATCH -t [#] ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds")|
|Memory requirement|#PBS -l mem=XXXXmb|#$ -mem [#]G|#SBATCH --mem=[#][unit; K/M/G/T]?|
|Standard output file|#PBS -o [file]|#$ -o [path]|#SBATCH -o [path]|
|Standard error|#PBS -e [file]|#$ -e [path]|#SBATCH -e [path]|
|Array job|#PBS -J [#-#]|#$ -t [#-#]|#SBATCH -a [#-#]|
|Array index variable name|${PBS_ARRAY_INDEX}|${SGE_TASK_ID}|${SLURM_ARRAY_TASK_ID}|
|Max simultaneously running tasks for an array job|n/a?|#$ -tc [#]|#SBATCH -a [#-#]%[#] (e.g. -a 0-15%4)|
|Copy environment|#PBS -V|#$ -V|#SBATCH --get-user-env|
|Notification event|#PBS -m abe|#$ -m abe|?|
|Email address|#PBS -M [email]|#$ -M [email]|#SBATCH --mail-user=[email]|
|Job name|#PBS -N [name]|#$ -N [name]|#SBATCH -J [name]|
|Job restart|#PBS -r [y/n]|#$ -r [yes/no]|#SBATCH --requeue / #SBATCH --no-requeue?|
|Run in current working directory|n/a|#$ -cwd|?|
|Set working directory|n/a (in the main part of the script, add cd ${PBS_O_WORKDIR})|#$ -wd|#SBATCH -D [working_dirpath]|
|Use bash|#PBS -S /bin/bash|?|shebang line (at the first line of the script, add #!/bin/bash)|
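Putting several of the SLURM-column directives together, a minimal sketch of a job script header (all values are placeholders, not site defaults):

<code>
#!/bin/bash
#SBATCH -J myjob                  # job name
#SBATCH --partition=intel         # queue/partition
#SBATCH -c 4                      # CPUs per task
#SBATCH -t 01:30:00               # wall clock limit (hours:minutes:seconds)
#SBATCH --mem=8G                  # memory
#SBATCH -o myjob_%j.out           # standard output (%j expands to the job id)
#SBATCH -e myjob_%j.err           # standard error
#SBATCH -a 0-15%4                 # array tasks 0-15, at most 4 running at once
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=ALL           # notification events
#SBATCH -D /path/to/workdir       # working directory
#SBATCH --requeue                 # allow the job to be requeued after a node failure

echo "Task ${SLURM_ARRAY_TASK_ID:-0} running in $(pwd)"
</code>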
  
\\
  
  