# Job Submission and Management Examples
1. Submitting a job with sbatch
:::success
```
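# Create an sbatch job script (sample-job.sh)
# This example allocates two compute nodes, runs 112 tasks on each node (224 tasks in total), and gives each task 1 CPU core
# Because no memory option is specified, all available memory on each node is allocated by default (482582 MB * 2 in total)
# If --mem=450G is specified, the total memory is 450G * 2 (nodes), because --mem is a per-node value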
[user@ilgn01 ~]$ vim sample-job.sh
#!/bin/bash
#SBATCH --account=<PROJECT_ID> # (-A) iService Project ID
#SBATCH --job-name=sbatch # (-J) Job name
#SBATCH --partition=development # (-p) Slurm partition
#SBATCH --nodes=2 # (-N) Maximum number of nodes to be allocated
#SBATCH --cpus-per-task=1 # (-c) Number of cores per MPI task
#SBATCH --ntasks-per-node=112 # Maximum number of tasks on each node
#SBATCH --time=00:30:00 # (-t) Wall time limit (days-hrs:min:sec)
#SBATCH --output=job-%j.out # (-o) Path to the standard output file
#SBATCH --error=job-%j.err # (-e) Path to the standard error file
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=user@example.com # Where to send mail. Set this to your email address
module purge
module load intel/....
mpiexec ./hello
```
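The resource totals in the comments of the script above follow simple arithmetic: total tasks = nodes × tasks per node, total CPU cores = total tasks × CPUs per task, and total memory = nodes × memory per node. The following is a minimal sketch of that calculation, assuming memory is given in MB and that `G` means 1024 MB as in Slurm's unit suffixes; `job_totals` is a hypothetical helper, not a Slurm command, and 482582 MB is simply the per-node figure quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): compute the totals a job script requests.
# Arguments: nodes, ntasks-per-node, cpus-per-task, memory per node in MB.
job_totals() {
    local nodes=$1 tasks_per_node=$2 cpus_per_task=$3 mem_per_node_mb=$4
    local tasks=$(( nodes * tasks_per_node ))
    local cores=$(( tasks * cpus_per_task ))
    local mem_mb=$(( nodes * mem_per_node_mb ))
    echo "tasks=${tasks} cores=${cores} mem_mb=${mem_mb}"
}

# sample-job.sh without --mem: 482582 MB per node (figure quoted above)
job_totals 2 112 1 482582
# With --mem=450G, assuming 1G = 1024 MB
job_totals 2 112 1 $(( 450 * 1024 ))
```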
```
# Submit the job script
[user@ilgn01 ~]$ sbatch sample-job.sh
```
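By default sbatch reports the new job's ID in a message of the form `Submitted batch job <job_id>`; with `--parsable` it prints only the job ID, followed by `;<cluster>` when a cluster name is present. A minimal sketch for capturing the ID so it can be passed to later commands such as `scontrol show job`; the helper `extract_job_id` and the cluster name `mycluster` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the numeric job ID out of sbatch's output.
# Handles "Submitted batch job 6938" as well as the --parsable forms "6938" and "6938;cluster".
extract_job_id() {
    local out=$1
    out=${out##* }      # keep only the last space-separated word
    echo "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Typical use on the login node (requires Slurm):
#   job_id=$(extract_job_id "$(sbatch --parsable sample-job.sh)")
#   scontrol show job "$job_id"
extract_job_id "Submitted batch job 6938"
extract_job_id "6938;mycluster"
```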
```
# Check the status of your jobs in the queue
[user@ilgn01 ~]$ squeue -u $UID
```
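For scripting, squeue can print selected columns without a header: `-h` suppresses the header and `-o "%i %T"` prints the job ID and the job state in long form (for example `PENDING` or `RUNNING`). A minimal sketch that counts jobs per state from such output; the function `count_by_state` is hypothetical and the input lines are sample data rather than a live squeue call.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count jobs per state from "JOBID STATE" lines on stdin,
# e.g. the output of:  squeue -u "$USER" -h -o "%i %T"
count_by_state() {
    # Tally column 2 (state), then sort so the output order is stable.
    awk 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input in place of a live squeue call
printf '%s\n' "6938 RUNNING" "6939 PENDING" "6940 PENDING" | count_by_state
```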
```
# Show the detailed status of a job
[user@ilgn01 ~]$ scontrol show job <job_id>
```
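`scontrol show job` prints the job record as whitespace-separated `Key=Value` pairs (for example `JobState=RUNNING`), and `-o` (`--oneliner`) puts the whole record on one line. A minimal sketch that extracts one field from that output, assuming the value itself contains no whitespace; `job_field` is a hypothetical helper and the record below is sample text in that format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one Key=Value field read from stdin.
# Usage with Slurm:  scontrol show job -o "$job_id" | job_field JobState
# Only whole keys match, so "JobId" does not match "ArrayJobId=...".
# Assumes the value contains no whitespace.
job_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, key "=") == 1) { print substr($i, length(key) + 2); exit }
    }'
}

# Sample record in the scontrol Key=Value format
printf '%s\n' "JobId=6938 JobName=sbatch" "   JobState=RUNNING Reason=None" | job_field JobState
```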
:::
2. Submitting a job with salloc
:::success
```
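# Submit an interactive job that uses a single core on a single CPU node
# Once the resources are allocated, the output shows that the job ID is 6938 and the allocated compute node is icpnp305.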
[user@ilgn01 ~]$ salloc --partition=development --account=<PROJECT_ID> --ntasks=1 --tasks-per-node=1
salloc: Granted job allocation 6938
salloc: Waiting for resource configuration
salloc: Nodes icpnp305 are ready for job
# You are now in a dedicated salloc shell
# Until you exit this shell, job 6938 stays in the RUNNING state and continues to be billed
[user@ilgn01 salloc_6938 ~]$
# To leave the salloc shell, type exit
[user@ilgn01 salloc_6938 ~]$
```
```
# List the Slurm environment variables, which describe the current job
[user@ilgn01 salloc_6938 ~]$ env | grep -i slurm
```
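Instead of filtering the whole environment, a script running inside the allocation can read specific variables. Slurm sets, among others, `SLURM_JOB_ID`, `SLURM_JOB_NODELIST` and `SLURM_JOB_NUM_NODES`, and sets `SLURM_NTASKS` when a task count was requested. A minimal sketch that prints these variables with a fallback when one is not set, for example when run on the login node outside a job; `job_summary` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print selected Slurm job variables from the environment,
# or "<not set>" when a variable is absent (e.g. outside a Slurm allocation).
job_summary() {
    for var in SLURM_JOB_ID SLURM_JOB_NODELIST SLURM_JOB_NUM_NODES SLURM_NTASKS; do
        # printenv exits non-zero if the variable is not in the environment.
        val=$(printenv "$var") || val="<not set>"
        echo "${var}=${val}"
    done
}

job_summary
```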
```
# Inside the salloc shell you can run srun; each srun invocation creates one job step
[user@ilgn01 salloc_6938 ~]$ srun hostname
icpnp305
```
```
# You can ssh directly into the compute node allocated to this job and run programs or commands there
[user@ilgn01 salloc_6938 ~]$ ssh icpnp305
[user@icpnp305 ~]$ hostname
icpnp305
[user@icpnp305 ~]$ exit
logout
Connection to icpnp305 closed.
[user@ilgn01 salloc_6938 ~]$
# Alternatively, you can use srun to open a shell on the job's compute node and run programs or commands there
[user@ilgn01 salloc_6938 ~]$ srun --pty /bin/bash
[user@icpnp305 salloc_6938 ~]$ hostname
icpnp305
[user@icpnp305 salloc_6938 ~]$ exit
exit
[user@ilgn01 salloc_6938 ~]$
```
```
# Use sacct to list the job steps and their results
# 6938.0 and 6938.1 are the steps run above (srun hostname and srun --pty /bin/bash); 0 and 1 are the step numbers
[user@ilgn01 salloc_6938 ~]$ sacct -j $SLURM_JOBID
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
6938         interacti+ developme+  govXXXXXX          1    RUNNING      0:0
6938.extern      extern             govXXXXXX          1    RUNNING      0:0
6938.0         hostname             govXXXXXX          1  COMPLETED      0:0
6938.1             bash             govXXXXXX          1  COMPLETED      0:0
```
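For scripts, sacct can print delimiter-separated output that is easier to parse than the fixed-width table: `-n` (`--noheader`) drops the header, `-P` (`--parsable2`) separates fields with `|`, and `--format` selects the columns. A minimal sketch that keeps only numbered job steps (IDs such as `6938.0`), skipping the job itself and the `extern` step; `list_steps` is a hypothetical helper and the input lines are sample data.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "JobID|JobName|State|ExitCode" lines on stdin, e.g. from
#   sacct -j "$SLURM_JOB_ID" -n -P --format=JobID,JobName,State,ExitCode
# and print only numbered job steps (JobID ending in ".<number>").
list_steps() {
    awk -F'|' '$1 ~ /\.[0-9]+$/ { printf "step %s (%s): %s, exit %s\n", $1, $2, $3, $4 }'
}

# Sample lines in that format
printf '%s\n' "6938|interactive|RUNNING|0:0" "6938.extern|extern|RUNNING|0:0" \
    "6938.0|hostname|COMPLETED|0:0" "6938.1|bash|COMPLETED|0:0" | list_steps
```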
```
# To leave the interactive job, type exit
[user@ilgn01 salloc_6938 ~]$ exit
salloc: Relinquishing job allocation 6938
salloc: Job allocation 6938 has been revoked.
[user@ilgn01 ~]$
```
:::