slurm

SLURM 19.04

Basic concepts

Jobs

Partitions

Task

Basic commands

Query the queue

Submit a program

Cancel a job
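
A quick reference for these basic operations (a minimal sketch; the script name my_job.sh and the job ID 12345 are placeholders):

squeue                 # query the queue (all jobs)
squeue -u $USER        # only your own jobs
sbatch my_job.sh       # submit a program as a batch job
scancel 12345          # cancel a job by its job ID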

Simple usage for soroban

0. Note.

'-p intel' (equivalent long option: '--partition=intel') is required for soroban.
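
The option can be given either on the command line or inside the script header; for example, both of the following submit to the intel partition (my_first_slurm.sh is the example script from the next step):

sbatch -p intel my_first_slurm.sh
sbatch --partition=intel my_first_slurm.sh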

1. Save the following as a text file (e.g. my_first_slurm.sh).

#!/bin/bash
#SBATCH --job-name=example       # Name for the job on the cluster
#SBATCH --partition=intel        # Partition (queue) to use; required on soroban
#SBATCH --output=example_%j.out  # Standard output file (%j expands to the job ID)
#SBATCH --error=example_%j.err   # Standard error file

ls -lh
pwd

2. Submit it as a SLURM job.

sbatch [scriptfile] (e.g. sbatch my_first_slurm.sh)

3. Check progress.

squeue

Running a program with OpenMPI, using a base SLURM script:

#!/bin/bash
#SBATCH --job-name=example                 # Name for the job on the cluster
#SBATCH --partition=intel
#SBATCH -n 32                              # Number of processes; must be a multiple of 16
#SBATCH --ntasks-per-node=16               # Maximum number of tasks per node
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
#SBATCH --mail-user=username@ufrontera.cl  # Email address for notifications
#SBATCH --mail-type=ALL                    # Notify on all events (begin, end, fail)

srun ./mpi_programa
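
Before submitting, the MPI executable has to exist. A minimal build sketch, assuming OpenMPI is provided as an environment module and the source file is mpi_programa.c (both the module name and the file name are assumptions for this cluster):

module load openmpi                        # assumed module name; check with 'module avail'
mpicc -O2 -o mpi_programa mpi_programa.c   # builds the binary that srun launches above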

This is an example of a script (ejemplo3.sh) with the minimum elements needed to run R 3.6.1 through SLURM:

#!/bin/bash

#SBATCH -J R-NOMBRE-SIMULACION   # Job name
#SBATCH -a 1-5%3                 # Array of 5 tasks (one per command below), at most 3 running at a time
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=100G               # Memory per node
#SBATCH --partition=intel

module load R/3.6.1

# List of commands; each array task runs one entry (placeholders for real R calls)
cmds=(
'sleep 10;echo 10'
'sleep 20;echo 20'
'sleep 30;echo 30'
'sleep 40;echo 40'
'sleep 50;echo 50'
)
# SLURM_ARRAY_TASK_ID starts at 1, bash array indices at 0
eval "${cmds[$SLURM_ARRAY_TASK_ID - 1]}"

To submit this script to SLURM, create a job, and start processing, do the following:

chmod +x ejemplo3.sh
sbatch ejemplo3.sh
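
In a real run, each array task would typically launch R instead of sleep. A minimal sketch, assuming a hypothetical R script called analisis.R that receives the task index as an argument:

module load R/3.6.1
Rscript analisis.R ${SLURM_ARRAY_TASK_ID}   # analisis.R is a hypothetical script name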

For the full list of options, see the manual page:

man sbatch

SLURM commands

  • sacct: accounting information for completed and running jobs
  • salloc: allocate resources for an interactive job
  • sbatch: submit a batch script
  • scancel: cancel a pending or running job
  • scontrol: view and modify job and cluster state
  • sinfo: show the state of partitions and nodes
  • squeue: show the job queue
  • sreport: generate reports from accounting data
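
Some usage sketches for these commands (the job ID 12345 and the date are placeholders):

sinfo -p intel                                        # state of the intel partition
salloc -p intel -n 1                                  # interactive allocation of one task
scontrol show job 12345                               # detailed information about one job
sacct -j 12345 --format=JobID,JobName,Elapsed,State   # accounting for a finished job
sreport cluster Utilization start=2024-09-01          # cluster-wide usage report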

About job arrays (PBS / SGE / SLURM comparison)

Action                 | PBS command         | SGE command          | SLURM command
Job submission         | qsub [scriptfile]   | qsub [scriptfile]    | sbatch [scriptfile]
Job deletion           | qdel [job_id]       | qdel [job_id]        | scancel --clusters=[cluster_name] [job_id]
Job status (for user)  | qstat -u [username] | qstat -u [username]  | squeue -u [username]
Extended job status    | qstat -f [job_id]   | qstat -f -j [job_id] | scontrol --clusters=[cluster_name] show jobid=[job_id]
Hold a job temporarily | qhold [job_id]      | qhold [job_id]       | scontrol hold [job_id]
Release job hold       | qrls [job_id]       | qrls [job_id]        | scontrol release [job_id]
List of usable queues  | qstat -Q            | qconf -sql           | sinfo, squeue
Directive                    | PBS                                               | SGE                   | SLURM
Queue                        | #PBS -q [queue]                                   | #$ -q [queue]         | #SBATCH -M [queue] / #SBATCH --clusters=[queue]
Processors (single host)     | #PBS -l select=1:ncpus=[#]                        | #$ -pe smp [#]        | #SBATCH -c [#]
Wall clock limit             | #PBS -l walltime=[hh:mm:ss]                       | #$ -l time=[hh:mm:ss] | #SBATCH -t [#] ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds")
Memory requirement           | #PBS -l mem=XXXXmb                                | #$ -mem [#]G          | #SBATCH --mem=[#][K|M|G|T]
Standard output file         | #PBS -o [file]                                    | #$ -o [path]          | #SBATCH -o [path]
Standard error               | #PBS -e [file]                                    | #$ -e [path]          | #SBATCH -e [path]
Array job                    | #PBS -J [#-#]                                     | #$ -t [#-#]           | #SBATCH -a [#-#]
Array index variable         | ${PBS_ARRAY_INDEX}                                | ${SGE_TASK_ID}        | ${SLURM_ARRAY_TASK_ID}
Max simultaneous array tasks | n/a                                               | #$ -tc [#]            | #SBATCH -a [#-#]%[#] (e.g. -a 0-15%4)
Copy environment             | #PBS -V                                           | #$ -V                 | #SBATCH --get-user-env
Notification event           | #PBS -m abe                                       | #$ -m abe             | #SBATCH --mail-type=[BEGIN|END|FAIL|ALL]
Email address                | #PBS -M [email]                                   | #$ -M [email]         | #SBATCH --mail-user=[email]
Job name                     | #PBS -N [name]                                    | #$ -N [name]          | #SBATCH -J [name]
Job restart                  | #PBS -r [y/n]                                     | #$ -r [yes/no]        | #SBATCH --requeue / #SBATCH --no-requeue
Run in current directory     | n/a                                               | #$ -cwd               | default (the job starts in the submission directory)
Set working directory        | n/a (in the script body, add cd ${PBS_O_WORKDIR}) | #$ -wd [path]         | #SBATCH -D [working_dirpath]
Use bash                     | #PBS -S /bin/bash                                 | #$ -S /bin/bash       | shebang line (add #!/bin/bash as the first line of the script)
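
As a worked example of the table, a PBS-style header and a sketch of its SLURM equivalent (all values are placeholders; the queue is mapped here to a partition, matching the soroban note above):

# PBS version
#PBS -N ejemplo
#PBS -q intel
#PBS -l walltime=01:00:00
#PBS -l mem=4000mb
#PBS -J 1-10
#PBS -M username@ufrontera.cl
#PBS -m abe

# SLURM equivalent
#SBATCH -J ejemplo
#SBATCH --partition=intel
#SBATCH -t 01:00:00
#SBATCH --mem=4000M
#SBATCH -a 1-10
#SBATCH --mail-user=username@ufrontera.cl
#SBATCH --mail-type=ALL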

