Commit 3e45dd0c authored by Jakub Yaghob

Update the description to reflect the preemption settings

parent 25fc82af
...@@ -19,8 +19,8 @@ Anyway, we have provided a short description of SLURM and of our clusters. ...@@ -19,8 +19,8 @@ Anyway, we have provided a short description of SLURM and of our clusters.
### Terminology ### Terminology
A **cluster** is a bunch of **nodes**. A **cluster** is a bunch of **nodes**.
Nodes are grouped together to partitions. Nodes are grouped together to **partitions**.
Partitions may overlap, i.e., one node can be in several **partitions**. Partitions may overlap, i.e., one node can be in several partitions.
**Feature** is a string describing a feature of a node, e.g. avx2 for a node capable of executing AVX2 instructions. **Feature** is a string describing a feature of a node, e.g. avx2 for a node capable of executing AVX2 instructions.
Each node has two sets of features: current features and available features. Each node has two sets of features: current features and available features.
Usually, they are the same. Usually, they are the same.
...@@ -42,7 +42,12 @@ Each step is executed by a bunch of **tasks** (usually 1 task), where resources ...@@ -42,7 +42,12 @@ Each step is executed by a bunch of **tasks** (usually 1 task), where resources
Jobs are inserted to a **scheduling queue**, where you can find them. Jobs are inserted to a **scheduling queue**, where you can find them.
Partition has a priority. Partition has a priority.
A job submitted to a partition with higher **priority** can suspend another job submitted to a partition with lower priority. A job submitted to a partition with higher **priority** can preempt another job submitted to a partition with lower priority.
Our clusters use only two kinds of preemption:
- **SUSPEND** - the preempted job is suspended and releases CPUs. Unfortunately, it does not release memory or GPUs.
- **REQUEUE** - the preempted job is killed and requeued to the scheduling queue, waiting for available resources.
All resources including GPUs are released.
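
The difference between the two preemption kinds can be sketched as a small lookup table. This is only an illustration of the description above, not part of SLURM or of the cluster configuration; the names `PreemptMode`, `RELEASED`, `released_resources` and `keeps_running_state` are hypothetical, and treating memory as released on REQUEUE is an assumption drawn from "all resources".

```python
from enum import Enum


class PreemptMode(Enum):
    """The two preemption kinds described above (hypothetical helper type)."""
    SUSPEND = "SUSPEND"
    REQUEUE = "REQUEUE"


# Resources a preempted job gives up under each mode.
# SUSPEND frees only CPUs; REQUEUE frees everything, GPUs included.
RELEASED = {
    PreemptMode.SUSPEND: frozenset({"cpu"}),
    PreemptMode.REQUEUE: frozenset({"cpu", "memory", "gpu"}),
}


def released_resources(mode):
    """Return the set of resources freed when a job is preempted."""
    return RELEASED[mode]


def keeps_running_state(mode):
    """A suspended job resumes later; a requeued job is killed and restarts."""
    return mode is PreemptMode.SUSPEND


if __name__ == "__main__":
    for mode in PreemptMode:
        freed = ", ".join(sorted(released_resources(mode)))
        print(f"{mode.value}: frees {freed}; keeps state: {keeps_running_state(mode)}")
```

Read this way, a GPU job waiting behind a suspended job may still find the GPUs occupied, while a job preempted by REQUEUE has to be able to start again from the beginning.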
### Important commands ### Important commands
...@@ -134,15 +139,15 @@ Parlab nodes ...@@ -134,15 +139,15 @@ Parlab nodes
Parlab partitions Parlab partitions
| Name | Nodes | Priority | Timelimit | Intended use | | Name | Nodes | Priority | Timelimit | Preemption | Intended use |
| -------- | ---------- | -------- | --------- | ------------ | | -------- | ---------- | -------- | --------- | ---------- | ------------ |
| big-lp | w[401-404] | low | 1 day | default, general or MPI debugging, long jobs | | big-lp | w[401-404] | low | 1 day | SUSPEND | default, general or MPI debugging, long jobs |
| big-hp | w[401-404] | high | 1 hour | executing short jobs on 4-socket system, MPI jobs | | big-hp | w[401-404] | high | 1 hour | SUSPEND | executing short jobs on 4-socket system, MPI jobs |
| small-lp | w[201-208] | low | 1 day | debugging on newer CPUs, MPI debugging, long jobs | | small-lp | w[201-208] | low | 1 day | SUSPEND | debugging on newer CPUs, MPI debugging, long jobs |
| small-hp | w[201-208] | high | 30 mins | executing short jobs on 2-socket system, MPI jobs | | small-hp | w[201-208] | high | 30 mins | SUSPEND | executing short jobs on 2-socket system, MPI jobs |
| phi-lp | phi[01-02] | low | 1 day | KNL debugging, long jobs | | phi-lp | phi[01-02] | low | 1 day | SUSPEND | KNL debugging, long jobs |
| phi-hp | phi[01-02] | high | 30 mins | executing short jobs on KNL | | phi-hp | phi[01-02] | high | 30 mins | SUSPEND | executing short jobs on KNL |
| all | all | high | 30 mins | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing | | all | all | high | 30 mins | SUSPEND | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing |
#### Gpulab cluster specification #### Gpulab cluster specification
...@@ -161,13 +166,13 @@ Gpulab nodes ...@@ -161,13 +166,13 @@ Gpulab nodes
Gpulab partitions Gpulab partitions
| Name | Nodes | Priority | Timelimit | Intended use | | Name | Nodes | Priority | Timelimit | Preemption | Intended use |
| --------- | ---------------- | --------- | --------- | ------------ | | --------- | ---------------- | --------- | --------- | ---------- | ------------ |
| debug-lp | dw[01-04],varjag | low | 7 days | default, general debugging, long jobs, build Docker image | | debug-lp | dw[01-04],varjag | low | 7 days | SUSPEND | default, general debugging, long jobs, build Docker image |
| debug-hp | dw[01-04],varjag | high | 1 hour | short jobs, build Docker image | | debug-hp | dw[01-04],varjag | high | 1 hour | SUSPEND | short jobs, build Docker image |
| volta-elp | volta03 | extra low | 7 days | extra long GPU jobs | | volta-elp | volta[01-03] | extra low | 7 days | REQUEUE | extra long GPU jobs |
| volta-lp | volta[02-03] | low | 1 day | long GPU jobs | | volta-lp | volta[01-03] | low | 1 day | REQUEUE | long GPU jobs |
| volta-hp | volta[01-03] | high | 1 hour | debugging GPU task, executing short GPU jobs | | volta-hp | volta[01-03] | high | 1 hour | SUSPEND | debugging GPU task, executing short GPU jobs |
### Useful examples ### Useful examples
......