Overview of the PPPL Research Clusters
The PPPL cluster provides convenient, easy-to-use computing resources for PPPL.
It serves as a computing facility for the small to mid-size jobs not favored at
the leadership computing facilities.
It consists of several sub-clusters.
The entry point to the PPPL clusters is portal.pppl.gov.
Users log in to the cluster through this load-balanced pool of systems
using their PPPL Unix accounts. To get an account, send email to lscimeca@pppl.gov.
On portal, users test programs, stage their jobs, submit their jobs, and do other
interactive tasks.
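For example, a user can connect from a terminal with SSH ("myuser" below is a placeholder for your PPPL Unix account name):

    # Log in to the load-balanced portal pool
    ssh myuser@portal.pppl.gov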
PPPL's clusters are typically named after notable PPPL research physicists:
The dawson cluster, named after John Dawson, is the main cluster, with thousands
of CPU cores and 2 GB of memory per core. It is used for parallel (>4 CPU) jobs
and is linked via a low-latency private 10 Gb/s network.
The ellis cluster, named after Robert Ellis, has ~160 CPUs and 4 GB of memory
per core. It is used exclusively for small (1-4 CPU) jobs and is linked via a
1 Gb/s private network.
The kruskal cluster, named after Martin Kruskal, has 36 systems, each with
32 CPU cores and 2 GB of memory per core, linked via a 40 Gb/s InfiniBand
interconnect.
The greene cluster, named after John Greene, has 48 systems, each with 16 CPUs
and a large memory size (32-128 GB of RAM per system), linked via a 40 Gb/s
InfiniBand interconnect. This cluster is used mainly by the XGC and M3D groups.
The ganesh cluster, named after the Hindu elephant god Ganesh, has systems with
a large memory size (>= 4 GB of RAM per CPU). Several systems contain 32 cores
and 192 GB of memory, and two have 64 cores and 384 GB. These systems are useful
for very large simulations or modeling applications.
The GPU cluster has systems with NVIDIA Tesla GPU cards. The system gpusrv01 contains two NVIDIA Tesla C2070 cards, each with 448 GPU cores (896 GPU cores in total), and the system gpusrv02 has two NVIDIA Tesla K20 cards, each with 2496 GPU cores (4992 GPU cores in total). Each system has 32 CPU cores.
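Once logged in to one of these systems, the stock NVIDIA nvidia-smi utility can confirm which cards are present (this assumes the NVIDIA driver is installed, as it must be on GPU nodes):

    # List installed GPUs along with utilization and memory use
    nvidia-smi
    # Show detailed per-GPU properties
    nvidia-smi -q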
Based on the job attributes specified in a job script, the job scheduler will
select the right cluster and set of systems on which to run a user's job, queue
the job for execution, and, when an appropriate set of systems becomes available,
execute the job.
It is therefore not necessary for the user to specify a job queue in the job script
unless a special interconnect (e.g., InfiniBand), large memory, or some other feature is required.
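As an illustration, a minimal job script might look like the sketch below. This assumes a Slurm-style scheduler; the directive syntax, resource values, and the simulation binary are placeholders, so check the PPPL documentation for the exact form used on portal.

    #!/bin/bash
    # Hypothetical Slurm-style job script: only resource attributes are
    # given, so the scheduler chooses the cluster; no queue is named.
    #SBATCH --job-name=my_sim         # placeholder job name
    #SBATCH --ntasks=32               # a parallel (>4 CPU) job
    #SBATCH --mem-per-cpu=2G          # matches the 2 GB-per-core clusters
    #SBATCH --time=04:00:00           # wall-clock limit

    # Launch the (placeholder) MPI executable on the allocated cores
    srun ./my_simulation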
The PPPL cluster also offers interactive use of its nodes.
Users can enter the use command on portal to reserve one or more
systems for their exclusive use.
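The syntax of the use command itself is site-specific and documented on portal. As a rough equivalent, and only under the assumption of a Slurm-style scheduler, an exclusive interactive session could be requested as follows (this is a sketch, not the PPPL use command):

    # Request an exclusive interactive shell on one node for two hours
    srun --nodes=1 --exclusive --time=02:00:00 --pty bash
    # ...work interactively on the allocated system...
    exit   # ends the shell and releases the node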
Users' home directories (/u) and project directories (/p) are served by
high-speed UNIX/ZFS file servers, which are responsible for the storage and backup of files.
Home directories are limited in size and thus should not be used for project data.
Most projects already have directories created, and new users can be added
to the group that has write access to a given project. For a new project, a
project directory can be created by submitting a request in the
helpdesk ticketing system.
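Standard tools show how close a home directory is to its limit (if user quotas are reported via the usual quota utility) and how much space the project filesystem has; the project path below is a placeholder:

    # Report home-directory quota usage in human-readable form
    quota -s
    # Show free space on the project filesystem
    df -h /p
    # Keep large data under the project area rather than /u
    mkdir -p /p/myproject/run01    # "myproject" is a placeholder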