====== SLURM 19.04 ======

Basic concepts
Cancel a job
===== Working with SLURM =====

**Simple usage for soroban**

0. Note: the option '-p intel' (equivalent long option: '--partition=intel') is required on soroban.
1. Save the following as a text file (e.g. my_first_slurm.sh).
<code>
#!/bin/bash
#SBATCH --job-name=example       # Name for the job to run on the cluster
#SBATCH --partition=intel
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err

ls -lh
pwd
</code>
2. Submit it as a SLURM job:

  sbatch my_first_slurm.sh

3. Check its progress:

  squeue
**Running a program with OpenMPI, using a base script for SLURM:**
<code>
#!/bin/bash
#SBATCH --job-name=example                 # Name for the job to run on the cluster
#SBATCH --partition=intel
#SBATCH -n 32                              # Number of processes; must be a multiple of 16
#SBATCH --ntasks-per-node=16               # Maximum per node
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
#SBATCH --mail-user=username@ufrontera.cl  # Email address for notifications
#SBATCH --mail-type=ALL

srun ./mpi_programa
</code>
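A quick sanity check of the arithmetic behind those directives (a sketch; the values 32 and 16 are taken from the script above): with -n 32 total tasks and --ntasks-per-node=16, SLURM must allocate 2 nodes.

```shell
#!/bin/bash
# Node count implied by the directives above:
# total tasks (-n) divided by tasks per node (--ntasks-per-node), rounded up.
ntasks=32
per_node=16
nodes=$(( (ntasks + per_node - 1) / per_node ))   # ceiling division
echo "nodes needed: $nodes"                       # prints "nodes needed: 2"
```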

===== 4. Basic example 3 =====

This is an example of a script (ejemplo3.sh) with the minimum elements needed to run R-3.6.1 through SLURM:
<code>
#!/bin/bash

#SBATCH -J R-NOMBRE-SIMULACION
#SBATCH -a 1-5%3               # One task per command below; at most 3 run at once
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=100G
#SBATCH --partition=intel

module load R/3.6.1

# Array task N evaluates cmds[N-1]
cmds=(
  'sleep 10;echo 10'
  'sleep 20;echo 20'
  'sleep 30;echo 30'
  'sleep 40;echo 40'
  'sleep 50;echo 50'
)
eval "${cmds[$SLURM_ARRAY_TASK_ID - 1]}"
</code>
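The index-to-command mapping can be checked without SLURM by setting SLURM_ARRAY_TASK_ID by hand (a sketch only; the sleeps are dropped so it runs instantly):

```shell
#!/bin/bash
# Simulate the array dispatch: task N evaluates cmds[N-1], just like the script above.
cmds=(
  'echo 10'
  'echo 20'
  'echo 30'
  'echo 40'
  'echo 50'
)
for SLURM_ARRAY_TASK_ID in 1 2 3 4 5; do
  eval "${cmds[$SLURM_ARRAY_TASK_ID - 1]}"   # prints 10, 20, 30, 40, 50 on separate lines
done
```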

To submit this script to SLURM, create the job, and start processing, run the following:
<code>
chmod +x ejemplo3.sh
</code>

<code>
sbatch ejemplo3.sh
</code>
  * scancel
  * scontrol
  * sinfo
  * squeue
  * sreport

==== Useful reference pages ====
[[https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf|https://www.accre.vanderbilt.edu/wp-content/uploads/2016/04/UsingArrayJobs.pdf]]

[[https://help.rc.ufl.edu/doc/SLURM_Job_Arrays|https://help.rc.ufl.edu/doc/SLURM_Job_Arrays]]

===== How to rewrite a PBSPro/SGE script as a SLURM script =====

==== Common commands ====

| |**PBS command** |**SGE command** |**SLURM command** |
|Job submission|qsub [scriptfile]|qsub [scriptfile]|sbatch [scriptfile]|
|Job deletion|qdel [job_id]|qdel [job_id]|scancel --clusters=[cluster_name] [job_id]|
|Job status (for user)|qstat -u [username]|qstat -u [username]|squeue -u [username]|
|Extended job status|qstat -f [job_id]|qstat -f -j [job_id]|scontrol --clusters=[cluster_name] show jobid=[job_id]|
|Hold a job temporarily|qhold [job_id]|qhold [job_id]|scontrol hold [job_id]|
|Release job hold|qrls [job_id]|qrls [job_id]|scontrol release [job_id]|
|List of usable queues|qstat -Q|qconf -sql|sinfo, squeue|

==== Resource specification ====

| |**PBS directive** |**SGE directive** |**SLURM directive** |
|Queue / partition|#PBS -q [queue]|#$ -q [queue]|#SBATCH -p [partition] (long option: --partition=[partition])|
|Processors (single host)|#PBS -l select=1:ncpus=[#]|#$ -pe smp [#]|#SBATCH -c [#] (long option: --cpus-per-task=[#])|
|Wall clock limit|#PBS -l walltime=[hh:mm:ss]|#$ -l time=[hh:mm:ss]|#SBATCH -t [time] (accepts "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds")|
|Memory requirement|#PBS -l mem=[#]mb|#$ -l mem_free=[#]G|#SBATCH --mem=[#][unit: K/M/G/T]|
|Standard output file|#PBS -o [file]|#$ -o [path]|#SBATCH -o [path]|
|Standard error|#PBS -e [file]|#$ -e [path]|#SBATCH -e [path]|
|Array job|#PBS -J [#-#]|#$ -t [#-#]|#SBATCH -a [#-#]|
|Array index variable|${PBS_ARRAY_INDEX}|${SGE_TASK_ID}|${SLURM_ARRAY_TASK_ID}|
|Max simultaneously running tasks in an array job|n/a|#$ -tc [#]|#SBATCH -a [#-#]%[#] (e.g. -a 0-15%4)|
|Copy environment|#PBS -V|#$ -V|#SBATCH --get-user-env|
|Notification events|#PBS -m abe|#$ -m abe|#SBATCH --mail-type=[BEGIN/END/FAIL/ALL]|
|Email address|#PBS -M [email]|#$ -M [email]|#SBATCH --mail-user=[email]|
|Job name|#PBS -N [name]|#$ -N [name]|#SBATCH -J [name]|
|Job restart|#PBS -r [y/n]|#$ -r [yes/no]|#SBATCH --requeue / #SBATCH --no-requeue|
|Run in current directory|n/a|#$ -cwd|n/a (SLURM starts the job in the submission directory by default)|
|Set working directory|n/a (in the main part of the script, add cd ${PBS_O_WORKDIR})|#$ -wd [path]|#SBATCH -D [working_dirpath]|
|Use bash|#PBS -S /bin/bash|#$ -S /bin/bash|shebang line (add #!/bin/bash as the first line of the script)|
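As a worked example of the table (a sketch only; the job name, time limit, core count, and email are invented placeholders, not site values): a PBS header consisting of #PBS -N test, #PBS -l walltime=01:00:00, #PBS -l select=1:ncpus=4, #PBS -m ae and #PBS -M user@example.com would become the SLURM header below. Because #SBATCH directives are ordinary comments, the script body also runs under plain bash.

```shell
#!/bin/bash
#SBATCH -J test                        # job name           (PBS: #PBS -N test)
#SBATCH -t 01:00:00                    # wall clock limit   (PBS: #PBS -l walltime=01:00:00)
#SBATCH -c 4                           # CPUs, single host  (PBS: #PBS -l select=1:ncpus=4)
#SBATCH --mail-type=END,FAIL           # notification events (PBS: #PBS -m ae)
#SBATCH --mail-user=user@example.com   # email address      (PBS: #PBS -M user@example.com)

# The body below is ordinary shell and runs unchanged under either scheduler.
msg="job body runs here"
echo "$msg"
```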