Kruskal Cluster

Martin David Kruskal

 

Introduction

The kruskal cluster consists of 36 servers, each with dual 16-core AMD Opteron processors (1152 cores in total). Hardware details can be found here.
The servers are interconnected by InfiniBand Quad Data Rate (40 Gb/s) switches, and all of them boot from the same CentOS 6.8 operating system image.

The kruskal queue schedules batch jobs on the kruskal cluster. The cluster is optimized for mid-scale (>= 32 core) parallel jobs and is not intended for small jobs that can be accommodated by other clusters.
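As a rough illustration of the sizes involved, the sketch below computes the minimum number of servers that a given core count spans, using the 32 cores per server stated above. The helper name nodes_needed is hypothetical, and the calculation assumes that a job fills each server completely; the scheduler's actual placement policy is not described here.

```shell
#!/bin/bash
# Figures taken from the hardware description: 36 servers, 2 x 16 cores each.
servers=36
cores_per_node=32

# nodes_needed CORES
# Hypothetical helper: prints the smallest number of servers that can hold
# CORES processes, assuming every server is filled completely (ceiling division).
nodes_needed() {
    echo $(( ($1 + cores_per_node - 1) / cores_per_node ))
}

echo "total cores: $(( servers * cores_per_node ))"
echo "servers for a 128-core job: $(nodes_needed 128)"
```

A 32-core job, the lower bound mentioned above, corresponds to exactly one full server.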

Tips To Run Kruskal Jobs

1. Users are encouraged to build their executables with "pathscale/3.2" and "openmpi". Applications built with MPICH will run on kruskal over IPoIB, but they cannot use RDMA and therefore will not fully utilize the underlying InfiniBand capacity.

2. OpenMPI is integrated with the batch system. It uses a considerable amount of locked (pinned) memory for RDMA. Users running OpenMPI need to increase the locked-memory limit:

For a C shell user, put the following line in the ~/.cshrc file:

limit memorylocked 1048576

For a Bash shell user, put the following line in the ~/.bashrc file:

ulimit -l 1048576
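To confirm that the new limit is in effect, a check along the following lines could be run in a fresh Bash session or at the top of a job script. This is only a sketch: check_memlock is a hypothetical helper name, and the threshold is simply the value recommended above. In Bash, "ulimit -l" reports the limit in kilobytes or prints "unlimited".

```shell
#!/bin/bash
# Locked-memory limit (in KB) recommended in the tip above.
required_kb=1048576

# check_memlock VALUE
# Hypothetical helper: succeeds if VALUE (as printed by `ulimit -l`)
# is "unlimited" or at least the required number of kilobytes.
check_memlock() {
    if [ "$1" = "unlimited" ]; then
        return 0
    fi
    [ "$1" -ge "$required_kb" ]
}

current=$(ulimit -l)
if check_memlock "$current"; then
    echo "locked-memory limit OK: $current"
else
    echo "locked-memory limit too low: $current (need >= $required_kb)"
fi
```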

3. OpenMPI attempts to determine which BTL driver to use at run time. The following example shows how to use the MCA option (--mca) to instruct it to use the InfiniBand driver ("np" is the number of cores):

mpirun --mca btl openib -np 128 myprog

Note: This is only recommended for older versions of OpenMPI, such as version 1.2.7. "--mca btl openib" became obsolete in version 1.3 and later, so the MCA option is no longer needed. Adding it improperly could stop your program from running.
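The version rule in the note can be sketched as a small shell helper that chooses the command line from an OpenMPI version string. The function names needs_btl_flag and mpirun_cmd are hypothetical, the version is passed in by hand in MAJOR.MINOR or MAJOR.MINOR.PATCH form, and the commands are only printed, not executed.

```shell
#!/bin/bash
# needs_btl_flag VERSION
# Hypothetical helper: succeeds when VERSION is older than 1.3, i.e. when the
# explicit "--mca btl openib" option from the example above still applies.
needs_btl_flag() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second version component
    [ "$major" -lt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -lt 3 ]; }
}

# mpirun_cmd VERSION NP
# Prints (does not run) the mpirun command line for NP cores.
mpirun_cmd() {
    if needs_btl_flag "$1"; then
        echo "mpirun --mca btl openib -np $2 myprog"
    else
        echo "mpirun -np $2 myprog"
    fi
}

mpirun_cmd 1.2.7 128
mpirun_cmd 1.3 128
```

Printing the command first makes it easy to inspect the options before submitting a job.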

4. Sample job script: