Commit 3e45dd0c authored Jan 06, 2020 by Jakub Yaghob
Update the description to reflect the preemption settings
parent 25fc82af
README.md
...
...
@@ -19,8 +19,8 @@ Anyway, we have provided a short description of SLURM and of our clusters.
### Terminology
A **cluster** is a bunch of **nodes**.
-Nodes are grouped together into partitions.
-Partitions may overlap, i.e. one node can be in multiple **partitions**.
+Nodes are grouped together into **partitions**.
+Partitions may overlap, i.e. one node can be in multiple partitions.
**Feature** is a string describing a feature of a node, e.g. avx2 for a node capable of executing AVX2 instructions.
Each node has two sets of features: current features and available features.
Usually, they are the same.
...
...
@@ -42,7 +42,12 @@ Each step is executed by a bunch of **tasks** (usually 1 task), where resources
Jobs are inserted into a **scheduling queue**, where you can find them.
Each partition has a priority.
-A job submitted to a partition with higher **priority** can suspend another job submitted to a partition with lower priority.
+A job submitted to a partition with higher **priority** can preempt another job submitted to a partition with lower priority.
+Our clusters use only two kinds of preemption:
+- **SUSPEND** - the preempted job is suspended and releases CPUs. Unfortunately, it does not release memory or GPUs.
+- **REQUEUE** - the preempted job is killed and requeued to the scheduling queue, waiting for available resources.
+All resources including GPUs are released.
### Important commands
...
...
@@ -134,15 +139,15 @@ Parlab nodes
Parlab partitions
-| Name     | Nodes      | Priority | Timelimit | Intended use |
-| -------- | ---------- | -------- | --------- | ------------ |
-| big-lp   | w[401-404] | low      | 1 day     | default, general or MPI debugging, long jobs |
-| big-hp   | w[401-404] | high     | 1 hour    | executing short jobs on 4-socket system, MPI jobs |
-| small-lp | w[201-208] | low      | 1 day     | debugging on newer CPUs, MPI debugging, long jobs |
-| small-hp | w[201-208] | high     | 30 mins   | executing short jobs on 2-socket system, MPI jobs |
-| phi-lp   | phi[01-02] | low      | 1 day     | KNL debugging, long jobs |
-| phi-hp   | phi[01-02] | high     | 30 mins   | executing short jobs on KNL |
-| all      | all        | high     | 30 mins   | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing |
+| Name     | Nodes      | Priority | Timelimit | Preemption | Intended use |
+| -------- | ---------- | -------- | --------- | ---------- | ------------ |
+| big-lp   | w[401-404] | low      | 1 day     | SUSPEND    | default, general or MPI debugging, long jobs |
+| big-hp   | w[401-404] | high     | 1 hour    | SUSPEND    | executing short jobs on 4-socket system, MPI jobs |
+| small-lp | w[201-208] | low      | 1 day     | SUSPEND    | debugging on newer CPUs, MPI debugging, long jobs |
+| small-hp | w[201-208] | high     | 30 mins   | SUSPEND    | executing short jobs on 2-socket system, MPI jobs |
+| phi-lp   | phi[01-02] | low      | 1 day     | SUSPEND    | KNL debugging, long jobs |
+| phi-hp   | phi[01-02] | high     | 30 mins   | SUSPEND    | executing short jobs on KNL |
+| all      | all        | high     | 30 mins   | SUSPEND    | executing short jobs on all nodes, used primarily for testing heterogeneous MPI computing |
#### Gpulab cluster specification
...
...
@@ -161,13 +166,13 @@ Gpulab nodes
Gpulab partitions
-| Name      | Nodes            | Priority  | Timelimit | Intended use |
-| --------- | ---------------- | --------- | --------- | ------------ |
-| debug-lp  | dw[01-04],varjag | low       | 7 days    | default, general debugging, long jobs, build Docker image |
-| debug-hp  | dw[01-04],varjag | high      | 1 hour    | short jobs, build Docker image |
-| volta-elp | volta03          | extra low | 7 days    | extra long GPU jobs |
-| volta-lp  | volta[02-03]     | low       | 1 day     | long GPU jobs |
-| volta-hp  | volta[01-03]     | high      | 1 hour    | debugging GPU task, executing short GPU jobs |
+| Name      | Nodes            | Priority  | Timelimit | Preemption | Intended use |
+| --------- | ---------------- | --------- | --------- | ---------- | ------------ |
+| debug-lp  | dw[01-04],varjag | low       | 7 days    | SUSPEND    | default, general debugging, long jobs, build Docker image |
+| debug-hp  | dw[01-04],varjag | high      | 1 hour    | SUSPEND    | short jobs, build Docker image |
+| volta-elp | volta[01-03]     | extra low | 7 days    | REQUEUE    | extra long GPU jobs |
+| volta-lp  | volta[01-03]     | low       | 1 day     | REQUEUE    | long GPU jobs |
+| volta-hp  | volta[01-03]     | high      | 1 hour    | SUSPEND    | debugging GPU task, executing short GPU jobs |
### Useful examples
...
...
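As a reading aid for the Gpulab table in this commit, the following is a minimal Python sketch that encodes the new Preemption column and the two modes described in the README text. It is not part of the repository. It assumes that the Preemption column names the mode applied to a job in that partition when it is preempted; the names `RELEASES`, `GPULAB_PREEMPTION`, `released_on_preemption` and `keeps_gpus_when_preempted` are hypothetical and exist only for illustration.

```python
# Resources a preempted job gives up, per the README text:
# SUSPEND releases CPUs only; REQUEUE releases everything, including GPUs.
RELEASES = {
    "SUSPEND": frozenset({"cpus"}),
    "REQUEUE": frozenset({"cpus", "memory", "gpus"}),
}

# Gpulab partitions after this commit: partition name -> preemption mode.
# Assumption: the mode applies to jobs running in that partition.
GPULAB_PREEMPTION = {
    "debug-lp": "SUSPEND",
    "debug-hp": "SUSPEND",
    "volta-elp": "REQUEUE",
    "volta-lp": "REQUEUE",
    "volta-hp": "SUSPEND",
}


def released_on_preemption(partition):
    """Return the resources a job in `partition` releases when preempted."""
    return RELEASES[GPULAB_PREEMPTION[partition]]


def keeps_gpus_when_preempted(partition):
    """True if a preempted job in `partition` still holds its GPUs."""
    return "gpus" not in released_on_preemption(partition)


if __name__ == "__main__":
    for name in GPULAB_PREEMPTION:
        print(name, sorted(released_on_preemption(name)))
```

Under that assumption, a job in `volta-lp` or `volta-elp` loses all of its resources when preempted and has to restart from the queue, so long GPU jobs submitted there would likely benefit from checkpointing.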