Commit 84619bd3 authored by Jakub Yaghob

slightly changed Charliecloud workflow and description

updated clusters parameters
parent a0cfaa72
@@ -158,7 +158,8 @@ Gpulab nodes
| Node names | CPU | Sockets | Cores | HT | RAM | GRES | Additional info |
| ------------ | ---------------------- | ------- | ----- | -- | ------ | ---- | --------------- |
| dw[01-02] | Intel Xeon E5450 | 2 | 4 | 1 | 32 GB | | Docker installed |
| dw01 | Intel Xeon E5450 | 2 | 4 | 1 | 32 GB | | Docker installed |
| dw02 | Intel Xeon E5-2630v3 | 2 | 8 | 2 | 256 GB | | Docker installed |
| dw03 | Intel Xeon E5640 | 2 | 4 | 2 | 96 GB | | Docker installed |
| dw[04-05] | Intel Xeon E5-2660v2 | 2 | 10 | 2 | 256 GB | | Docker installed |
| varjag | Intel Xeon E7-4830 | 4 | 8 | 2 | 256 GB | | |
@@ -227,9 +228,11 @@ Charliecloud provides user-defined software stacks (UDSS) for HPC.
It allows you to run nearly any software stack (like TensorFlow) on the cluster even if it is not installed system-wide.
All information about Charliecloud can be found on its [Charliecloud documentation](https://hpc.github.io/charliecloud/) page.
### Simple workflow
### Basic workflow
1. #### Create or get Docker image
This workflow is valid for Charliecloud version 0.21.
1. #### Get or create Docker image
Docker is installed on the dw[01-05] workers in the gpulab cluster. You can access them using
@@ -237,51 +240,43 @@ Docker is installed on dw[01-05] workers in gpulab cluster. You can access them
You can either pull an already prepared Docker image (e.g. for TensorFlow) or create your own.
You perform this step only once for a given UDSS.
Of course, you must begin the whole workflow again if there is a new version of the UDSS.
Of course, you must restart the whole workflow if there is a new version of the UDSS.
If you are pulling a Docker image, use the `sudo` command
`sudo docker pull dockertag`
If you are building your own Docker image, Charliecloud offers a simplified version of the Docker invocation
If you are building your own Docker image from a Dockerfile, Charliecloud offers a simplified version of the Docker invocation
`ch-build -t dockertag .`
which must be run on the dw[01-05] workers.
which must be run on the dw[01-05] workers as well.
Recent versions of Charliecloud added a new command, `ch-grow`, which builds an image in unprivileged mode from a Dockerfile.
This command doesn't need Docker, so it can be run anywhere.
Moreover, using the `ch-grow` command allows you to skip workflow step 2, because it immediately builds a directory image.
Just remember: every use of the `docker` command must be prefixed with `sudo`.
2. #### Create tar/directory or SquashFS image from Docker image
2. #### Create tar/directory image from Docker image
You must convert the prepared Docker image either to a TAR file and then to a directory structure, or
to a SquashFS file. You perform this step only once for a given UDSS.
All commands for this step must again be run on the dw[01-04] nodes.
You must convert the prepared Docker image to a TAR file and then to a directory structure.
You perform this step only once for a given UDSS.
All commands for this step must again be run on the dw[01-05] nodes.
For the first case, run
In the first step, run
`ch-builder2tar dockerimage ${TMPDIR}`
which creates a TAR file in your temporary directory. Then you must convert the created TAR file to a directory structure using
which creates a TAR.GZ file in your temporary directory. Then you must convert the created TAR.GZ file to a directory structure using
`ch-tar2dir ${TMPDIR}/myudss.tar.gz imgdir`
which expands the TAR file to your output image directory (usually your home directory on the shared volume).
For the second case, run
`ch-builder2squash dockerimage outdir`
which creates a SquashFS file in your output directory (usually your home directory on the shared volume).
which expands the TAR.GZ file to your output image directory (usually your home directory on the shared volume).
3. #### Import CUDA libraries
This step is required only for a UDSS that requires CUDA (like TensorFlow).
If your UDSS does not require CUDA, skip this step.
You perform this step only once for a given UDSS.
It works only with the tar/directory structure.
All commands for this step must be run on the volta[01-03] nodes.
You perform this step only once for a given UDSS, but you should repeat it whenever new host drivers or CUDA versions are installed.
It works only with the directory structure.
All commands for this step must be run on the volta[01-05] nodes.
Execute on gpulab
@@ -293,19 +288,57 @@ which copies some necessary CUDA files from the host to your image directory str
This step is executed as many times as necessary on any node of the parlab and gpulab clusters.
For the tar/directory structure, run
For an interactive job, run
`srun <slurm params> ch-run <charlie options> imgdir/<dockertag> <my img command>`
You will probably use SLURM batch job mode more often, as the length of computation is usually several hours or days.
In this case use
`ch-run <charlie options> imgdir/<dockertag> <my img command>`
in your shell script passed to the `sbatch` SLURM command.
Beware that, by default, your home directory is bound into the image, which overrides the image's entire `/home` directory.
You may disable this by specifying the `--no-home` option.
Moreover, you may bind additional directories by using the options `--bind=/some/dir`
(which will appear as `/mnt/0` in your UDSS environment) or `--bind=/source/dir:/dest/dir`.
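The batch-mode usage and bind options described above can be sketched as a small shell snippet that writes a batch script. This is only an illustration under stated assumptions: the file name `myudss.sh`, the image path `imgdir/myudss`, the data directory `/some/dir`, the SLURM parameter values, and the `ls /mnt/0` command are all hypothetical placeholders, not values taken from the cluster configuration.

```shell
#!/bin/sh
# Sketch only: every name and value below is a hypothetical placeholder.
IMG="imgdir/myudss"     # image directory created in step 2 (assumed name)
DATA="/some/dir"        # host directory to expose inside the UDSS (assumed path)

# Write the batch script; ${DATA} and ${IMG} are expanded here, %j is left for SLURM.
cat > myudss.sh <<EOF
#!/bin/sh
#SBATCH --job-name=myudss
#SBATCH --time=02:00:00
#SBATCH --output=myudss-%j.out

# --no-home keeps the host home directory out of the image;
# --bind without a destination makes the directory appear as /mnt/0.
ch-run --no-home --bind=${DATA} ${IMG} ls /mnt/0
EOF

echo "Submit with: sbatch myudss.sh"
```

Replace `ls /mnt/0` with the actual command of your UDSS and adjust the `#SBATCH` lines to the resources your job needs before submitting.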
### Advanced techniques and notes
#### Builders
Charliecloud introduces the term **builder**. A builder is anything capable of producing a Linux filesystem tree
from either a prepared set of container images or from some container description.
Currently, Charliecloud supports several builders. We have enabled only two of them: Docker and ch-image.
The `ch-image` command/builder builds an image in unprivileged mode from a `Dockerfile`.
This command doesn't need Docker, so it can be run anywhere.
Moreover, using the `ch-image` command allows you to skip workflow step 2, because it immediately builds a directory image.
#### Image creation speedup
You can greatly accelerate UDSS image creation by skipping the compression step in step 2 of the basic workflow.
Instead of the recommended sequence of commands, which produces a .tar.gz file, you can use
ch-builder2tar --nocompress dockerimage ${TMPDIR}
ch-tar2dir ${TMPDIR}/myudss.tar imgdir
This can save several minutes depending on the size of the image.
`srun <slurm params> ch-run <charlie options> imgdir/<dockertag>`
This improvement was contributed by Vít Kabele.
which will execute your UDSS using SLURM in interactive mode. Beware that, by default, your home directory is bound into the image, which overrides the image's entire `/home` directory. You may disable this by specifying the `--no-home` option.
Moreover, you may bind additional directories with `--bind=/some/dir` (which will appear as `/mnt/0` in your UDSS environment) or with `--bind=/source/dir:/dest/dir`.
#### CUDA import errors
The SquashFS case is better used in batch mode, e.g. prepare a shell script that looks something like
Executing `ch-fromhost` in step 3 may produce errors that look something like
ch-mount mysquashimg ${TMPDIR}
ch-run ${TMPDIR}/mysquashimg --bind=/mnt/home/myhome
ch-umount ${TMPDIR}/mysquashimg
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
Then execute the script using
You can safely ignore these errors; they are harmless.
`sbatch <slurm params> myudss.sh`
This warning/notice was contributed by Vít Kabele.
\ No newline at end of file