

SLURM 19.04

Basic concepts

Jobs

Partitions

Task

Basic commands

Checking the queue

Submitting a program

Cancelling a job

Simple usage for soroban

0. Note.

'-p intel' (equivalent long option: '--partition=intel') is required for soroban.

1. Save the script below as a text file (e.g. my_first_slurm.sh).

#!/bin/bash
#SBATCH --job-name=example  # Name for the job to run on the cluster
#SBATCH --partition=intel
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err

ls -lh
pwd

2. Submit it as a SLURM job.

sbatch (e.g. sbatch my_first_slurm.sh)
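
If the job is accepted, sbatch prints the assigned job ID; the exact number below is only illustrative.

sbatch my_first_slurm.sh
# Typical response (the job ID will differ):
# Submitted batch job 12345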

3. Check progress.

squeue
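
To list only your own jobs on a shared cluster, squeue can be filtered by user; the job ID in the second line is illustrative.

squeue -u $USER        # only your jobs
squeue -j 12345        # a specific job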

Running a program with OpenMPI, using a base script for SLURM:

#!/bin/bash
#SBATCH --job-name=example  # Name for the job to run on the cluster
#SBATCH --partition=troquil
#SBATCH -n 32  # Must be a multiple of 16
#SBATCH --ntasks-per-node=16 # maximum per blade
#SBATCH --output=example_%j.out
#SBATCH --error=example_%j.err
#SBATCH --mail-user=username@ufrontera.cl
#SBATCH --mail-type=ALL

srun ./mpi_programa
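
The binary mpi_programa above has to be built with the MPI compiler wrapper before submitting. A minimal sketch, assuming the source file is mpi_programa.c and that an OpenMPI environment module named openmpi is available (check with module avail):

module load openmpi           # assumed module name; adjust to the site's module list
mpicc -O2 -o mpi_programa mpi_programa.c
sbatch mpi_job.sh             # hypothetical filename for the script above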

This is an example of a script (ejemplo3.sh) with the minimal elements needed to run R 3.6.1 through SLURM (the sleep/echo commands in the array below stand in for the actual R commands):

#!/bin/bash

#SBATCH -J R-NOMBRE-SIMULACION
#SBATCH -a 1-5%3   # 5 array tasks (one per entry in cmds), at most 3 running at the same time
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --mem=100G
#SBATCH --partition=intel

module load R/3.6.1

cmds=(
'sleep 10;echo 10'
'sleep 20;echo 20'
'sleep 30;echo 30'
'sleep 40;echo 40'
'sleep 50;echo 50'
)
eval ${cmds[$SLURM_ARRAY_TASK_ID - 1]}

To submit this script to SLURM, create the job, and start processing, the following is needed:

chmod +x ejemplo3.sh
sbatch ejemplo3.sh
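
Once submitted, each array task is shown as jobid_taskid and can be monitored or cancelled individually; the job ID below is illustrative.

squeue -u $USER                               # tasks appear as e.g. 12345_1, 12345_2, ...
sacct -j 12345 --format=JobID,State,Elapsed   # accounting summary once tasks have run
scancel 12345_3                               # cancel only task 3 of the array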

man sbatch

SLURM commands

  • sacct
  • salloc
  • sbatch
  • scancel
  • scontrol
  • sinfo
  • squeue
  • sreport
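
A few quick, illustrative uses of these commands (partition, dates and job IDs are only examples):

sinfo                          # partitions and node states
squeue -p intel                # jobs in the intel partition
sacct -u $USER -S 2020-09-01   # accounting for your jobs since a date
scontrol show job 12345        # full details of one job
scancel 12345                  # cancel a job
salloc -p intel -n 1           # interactive allocation (released when the shell exits)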

PBS / SGE / SLURM equivalences

Action | PBS command | SGE command | SLURM command
Job submission | qsub [scriptfile] | qsub [scriptfile] | sbatch [scriptfile]
Job deletion | qdel [job_id] | qdel [job_id] | scancel --clusters=[cluster_name] [job_id]
Job status (for user) | qstat -u [username] | qstat -u [username] | squeue -u [username]
Extended job status | qstat -f [job_id] | qstat -f -j [job_id] | scontrol --clusters=[cluster_name] show jobid=[job_id]
Hold a job temporarily | qhold [job_id] | qhold [job_id] | scontrol hold [job_id]
Release job hold | qrls [job_id] | qrls [job_id] | scontrol release [job_id]
List of usable queues | qstat -Q | qconf -sql | sinfo, squeue

Directive | PBS | SGE | SLURM
Queue | #PBS -q [queue] | #$ -q [queue] | #SBATCH -M [queue] / #SBATCH --clusters=[queue]
Processors (single host) | #PBS -l select=1:ncpus=[#] | #$ -pe smp [#] | #SBATCH -c [#] / #SBATCH --cpus-per-task=[#]
Wall clock limit | #PBS -l walltime=[hh:mm:ss] | #$ -l time=[hh:mm:ss] | #SBATCH -t [#] ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" or "days-hours:minutes:seconds")
Memory requirement | #PBS -l mem=XXXXmb | #$ -mem [#]G | #SBATCH --mem=[#][K|M|G|T]
Standard output file | #PBS -o [file] | #$ -o [path] | #SBATCH -o [path]
Standard error | #PBS -e [file] | #$ -e [path] | #SBATCH -e [path]
Array job | #PBS -J [#-#] | #$ -t [#-#] | #SBATCH -a [#-#]
Array index variable | ${PBS_ARRAY_INDEX} | ${SGE_TASK_ID} | ${SLURM_ARRAY_TASK_ID}
Max simultaneously running tasks for an array job | n/a | #$ -tc [#] | #SBATCH -a [#-#]%[#] (e.g. -a 0-15%4)
Copy environment | #PBS -V | #$ -V | #SBATCH --get-user-env
Notification event | #PBS -m abe | #$ -m abe | #SBATCH --mail-type=[BEGIN,END,FAIL,ALL]
Email address | #PBS -M [email] | #$ -M [email] | #SBATCH --mail-user=[email]
Job name | #PBS -N [name] | #$ -N [name] | #SBATCH -J [name]
Job restart | #PBS -r [y/n] | #$ -r [yes/no] | #SBATCH --requeue / #SBATCH --no-requeue
Run from current directory | n/a | #$ -cwd | default (the job starts in the submission directory)
Set working directory | n/a (add cd ${PBS_O_WORKDIR} in the script body) | #$ -wd [path] | #SBATCH -D [working_dirpath]
Use bash | #PBS -S /bin/bash | ? | shebang line (add #!/bin/bash as the first line of the script)
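
As a sketch of how the equivalences above translate in practice, here is a minimal PBS header and a SLURM counterpart; on this cluster the queue corresponds to a partition, as in the examples above, and all names and values are illustrative.

# PBS
#PBS -N myjob
#PBS -q workq
#PBS -l walltime=01:00:00
#PBS -o myjob.out

# SLURM equivalent
#SBATCH -J myjob
#SBATCH -p intel
#SBATCH -t 01:00:00
#SBATCH -o myjob.out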

