Commit 3e45dd0c authored by Jakub Yaghob

Update the description to reflect the preemption settings

parent 25fc82af
@@ -19,8 +19,8 @@ Anyway, we have provided a short description of SLURM and of our clusters.
### Terminology
A **cluster** is a bunch of **nodes**.
Nodes are grouped into partitions.
Partitions may overlap, i.e. one node can belong to several **partitions**.
Nodes are grouped into **partitions**.
Partitions may overlap, i.e. one node can belong to several partitions.
**Feature** is a string describing a feature of a node, e.g. avx2 for a node capable of executing AVX2 instructions.
Each node has two sets of features: current features and available features.
Usually, they are the same.
@@ -42,7 +42,12 @@ Each step is executed by a bunch of **tasks** (usually 1 task), where resources
Jobs are inserted into a **scheduling queue**, where you can find them.
Each partition has a priority.
A job submitted to a partition with higher **priority** can suspend another job submitted to a partition with lower priority.
A job submitted to a partition with higher **priority** can preempt another job submitted to a partition with lower priority.
Our clusters use only two kinds of preemption:
- **SUSPEND** - the preempted job is suspended and releases its CPUs. Unfortunately, it does not release memory or GPUs.
- **REQUEUE** - the preempted job is killed and returned to the scheduling queue, where it waits for available resources.
All resources including GPUs are released.
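
Because a job preempted in REQUEUE mode is killed and later started again from the beginning, it helps if the job can resume from saved state. The following is a minimal Python sketch under stated assumptions, not a prescribed method: the checkpoint file, its JSON layout, and the helper names (`restart_count`, `load_step`, `save_step`, `run`) are hypothetical, and the `stop_after` parameter exists only to simulate a preemption. `SLURM_RESTART_COUNT` is the environment variable SLURM sets for a job that has been requeued.

```python
import json
import os
from pathlib import Path
from typing import Optional


def restart_count() -> int:
    """How many times SLURM has requeued this job (0 on the first run or outside SLURM)."""
    return int(os.environ.get("SLURM_RESTART_COUNT", "0"))


def load_step(checkpoint: Path) -> int:
    """Return the next step to run, or 0 if no checkpoint exists yet."""
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())["next_step"]
    return 0


def save_step(checkpoint: Path, next_step: int) -> None:
    """Write to a temporary file and rename it, so a kill mid-write cannot corrupt the checkpoint."""
    tmp = checkpoint.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_step": next_step}))
    os.replace(tmp, checkpoint)


def run(checkpoint: Path, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Run steps starting at the last checkpoint and return the next step to run.

    stop_after is a hypothetical knob that stops early to imitate a preemption.
    """
    step = load_step(checkpoint)
    done = 0
    while step < total_steps:
        if stop_after is not None and done >= stop_after:
            break
        # One unit of real work would go here.
        step += 1
        done += 1
        save_step(checkpoint, step)
    return step
```

A job script would call `run(Path("checkpoint.json"), total_steps=...)` once; after a requeue the same call continues from the saved step instead of step 0. The checkpoint has to live on storage that survives the restart.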
### Important commands
@@ -134,15 +139,15 @@ Parlab nodes
Parlab partitions
| Name | Nodes | Priority | Timelimit | Intended use |
| -------- | ---------- | -------- | --------- | ------------ |
| big-lp | w[401-404] | low | 1 day | default, general or MPI debugging, long jobs |
| big-hp | w[401-404] | high | 1 hour | executing short jobs on 4-socket system, MPI jobs |
| small-lp | w[201-208] | low | 1 day | debugging on newer CPUs, MPI debugging, long jobs |
| small-hp | w[201-208] | high | 30 mins | executing short jobs on 2-socket system, MPI jobs |
| phi-lp | phi[01-02] | low | 1 day | KNL debugging, long jobs |
| phi-hp | phi[01-02] | high | 30 mins | executing short jobs on KNL |
| all | all | high | 30 mins | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing |
| Name | Nodes | Priority | Timelimit | Preemption | Intended use |
| -------- | ---------- | -------- | --------- | ---------- | ------------ |
| big-lp | w[401-404] | low | 1 day | SUSPEND | default, general or MPI debugging, long jobs |
| big-hp | w[401-404] | high | 1 hour | SUSPEND | executing short jobs on 4-socket system, MPI jobs |
| small-lp | w[201-208] | low | 1 day | SUSPEND | debugging on newer CPUs, MPI debugging, long jobs |
| small-hp | w[201-208] | high | 30 mins | SUSPEND | executing short jobs on 2-socket system, MPI jobs |
| phi-lp | phi[01-02] | low | 1 day | SUSPEND | KNL debugging, long jobs |
| phi-hp | phi[01-02] | high | 30 mins | SUSPEND | executing short jobs on KNL |
| all | all | high | 30 mins | SUSPEND | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing |
#### Gpulab cluster specification
@@ -161,13 +166,13 @@ Gpulab nodes
Gpulab partitions
| Name | Nodes | Priority | Timelimit | Intended use |
| --------- | ---------------- | --------- | --------- | ------------ |
| debug-lp | dw[01-04],varjag | low | 7 days | default, general debugging, long jobs, build Docker image |
| debug-hp | dw[01-04],varjag | high | 1 hour | short jobs, build Docker image |
| volta-elp | volta03 | extra low | 7 days | extra long GPU jobs |
| volta-lp | volta[02-03] | low | 1 day | long GPU jobs |
| volta-hp | volta[01-03] | high | 1 hour | debugging GPU task, executing short GPU jobs |
| Name | Nodes | Priority | Timelimit | Preemption | Intended use |
| --------- | ---------------- | --------- | --------- | ---------- |------------ |
| debug-lp | dw[01-04],varjag | low | 7 days | SUSPEND | default, general debugging, long jobs, build Docker image |
| debug-hp | dw[01-04],varjag | high | 1 hour | SUSPEND | short jobs, build Docker image |
| volta-elp | volta[01-03] | extra low | 7 days | REQUEUE | extra long GPU jobs |
| volta-lp | volta[01-03] | low | 1 day | REQUEUE | long GPU jobs |
| volta-hp | volta[01-03] | high | 1 hour | SUSPEND | debugging GPU task, executing short GPU jobs |
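
The Nodes column in the partition tables uses SLURM's bracketed range notation, e.g. `w[401-404]` or `dw[01-04],varjag`. As an illustration of how such an expression maps to individual node names, here is a simplified Python sketch. The function names `split_top_level` and `expand_nodes` are hypothetical, and the sketch handles only one bracket group per name; on a cluster, `scontrol show hostnames` performs the real expansion. The `all` entry in the Parlab table is a label rather than a range, so this sketch would return it unchanged.

```python
import re
from typing import List


def split_top_level(expr: str) -> List[str]:
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_nodes(expr: str) -> List[str]:
    """Expand e.g. 'dw[01-04],varjag' into a list of node names."""
    names = []
    for part in split_top_level(expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if m is None:
            # Plain name without a bracket group.
            names.append(part)
            continue
        prefix, body, suffix = m.groups()
        for item in body.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding such as '01'
            for n in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{n:0{width}d}{suffix}")
    return names


print(expand_nodes("dw[01-04],varjag"))
```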
### Useful examples
......