Frequently Asked Questions
Q1: My code runs on a single processor. How can I run multiple copies, each with different input parameters, with a single batch submission? Also, since each host has multiple processors, allocating a whole host to one process leaves the other CPU sitting idle. How do I run two processes on each host?
A1: Ernie Valeo has created a script that allows you to do this.
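Ernie Valeo's script is not reproduced here. Purely to illustrate the idea, the following is a hypothetical Torque/PBS job script; the program ./my_model, the parameter values, and the run_sweep helper are all made up. It requests one host with two processor slots and runs one copy of a serial program per parameter, two at a time, so neither CPU sits idle. Spreading copies over several hosts would also require starting them on the other nodes listed in $PBS_NODEFILE (for example with pbsdsh or ssh), which this sketch does not do.

```bash
#!/bin/bash
# Hypothetical Torque/PBS job script: one host, two processor slots.
#PBS -l nodes=1:ppn=2
#PBS -N param_sweep

# run_sweep PROGRAM PARAM...
# Runs "PROGRAM PARAM" once per parameter in the background, keeping at
# most MAX_PAR copies (default 2, one per CPU) running at a time. Each
# copy writes its output to run_<PARAM>.out in the current directory.
run_sweep() {
  local prog=$1
  shift
  local max_par=${MAX_PAR:-2}
  local running=0
  local p
  for p in "$@"; do
    "$prog" "$p" > "run_${p}.out" 2>&1 &
    running=$((running + 1))
    if [ "$running" -ge "$max_par" ]; then
      wait          # let the current batch finish before starting more
      running=0
    fi
  done
  wait              # wait for any copies still running
}

# Only start the sweep when running inside a PBS job.
if [ -n "$PBS_JOBID" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  run_sweep ./my_model 0.1 0.2 0.3 0.4   # hypothetical program and inputs
fi
```

You would submit it with qsub. Waiting for each batch of two keeps the script portable to older bash versions; with bash 4.3 or later, wait -n could start a new copy as soon as any one finishes.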
Q2: Do you have PBS reporting tools or packages that provide more information than plain pbs-report?
A2: Yes. Run fnodes, which reports counts of free nodes, free processors, and busy nodes for each subcluster, as well as counts of running and queued jobs.
Q3: What is Muda?
A3: Muda is the Japanese term for waste and comes from Lean Enterprise/Manufacturing. In this context we mean 'waiting/idle muda': the share of a job's time spent suspended, measured as the ratio of suspend time to CPU time. Our goal is to drive muda to zero.
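As a worked example of the ratio: a job that was suspended for 30 minutes (1800 s) and used 2 hours (7200 s) of CPU time has a muda of 1800 / 7200 = 0.25. The small helper below does the same arithmetic. It is a hypothetical function, not one of the site's reporting tools, and it assumes both times are given in seconds.

```bash
# muda SUSPEND_SECONDS CPU_SECONDS
# Prints suspend time divided by CPU time, to three decimal places.
# Exits with status 1 when CPU time is zero or negative (ratio undefined).
muda() {
  awk -v s="$1" -v c="$2" 'BEGIN {
    if (c <= 0) { print "undefined"; exit 1 }
    printf "%.3f\n", s / c
  }'
}

muda 1800 7200   # prints 0.250
```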
Q4: UserA always runs 8-16 node jobs and has an Efcy of 0.128. UserB always runs single-process jobs and has an Efcy of 2.624. Can you explain the difference?
A4: MPI jobs spawn processes that PBS does not track, so much of the CPU time used by userA's MPI jobs is not counted. That explains the difference between the Efcy values of userA and userB.
Q5: Where is ghostview on portal?
A5: Please try ggv, which stands for Gnome Ghostview.
Q6: What is the maximum value that I can set for P4_GLOBMEMSIZE?
A6: Currently, on all PPPL cluster nodes, the maximum value for P4_GLOBMEMSIZE is 1073741824.
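P4_GLOBMEMSIZE is an environment variable whose value is given in bytes, and 1073741824 bytes is exactly 2^30 bytes (1 GiB). A minimal sketch of setting it to that maximum in a bash job script or login file follows; csh/tcsh users would write setenv P4_GLOBMEMSIZE 1073741824 instead. Whether a given job actually needs the maximum is something to check against its own memory use.

```bash
# Maximum allowed on PPPL cluster nodes, in bytes.
export P4_GLOBMEMSIZE=1073741824

# Sanity check: the value is exactly 2^30 bytes, i.e. 1 GiB.
echo $(( P4_GLOBMEMSIZE == (1 << 30) ))   # prints 1
```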
Q7: When I run Mathematica on the portal systems, the fonts do not appear correctly.
A7: In your login file, you need to specify a font server from which to get the fonts. Add this line to your login file (.cshrc or .login or .bashrc):
xset +fp tcp/fontsrv1.pppl.gov:7100
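Files such as .cshrc and .bashrc are also read by shells that have no X display (for example, commands run over ssh), where xset can only print errors. A cautious variant for .bashrc, offered as a suggestion rather than a site requirement, adds the font server only when DISPLAY is set and xset is installed. In .cshrc the equivalent is: if ($?DISPLAY) xset +fp tcp/fontsrv1.pppl.gov:7100

```bash
# Add the PPPL font server to the X font path, but only when an X display
# is available and the xset command is installed.
if [ -n "$DISPLAY" ] && command -v xset >/dev/null 2>&1; then
  xset +fp tcp/fontsrv1.pppl.gov:7100
fi
```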
Q8: How do I access a project space through samba on my iMac desktop or Mac notebook?
A8: On your Mac, start Finder and choose "Go" => "Connect to Server..." from the menu bar. Type "smb://samba.pppl.gov/projects" under "Server Address" and authenticate. Then open a terminal window on your Mac and "cd" to /Volumes/projects/my_project. Your project is automounted, so it may not appear under /Volumes/projects until you "cd" into it.
Q9: How do I access a project space through samba on my linux desktop?
A9: First, make sure the samba-common and samba-client RPM packages are installed (on newer distributions, mount.cifs is provided by the cifs-utils package). Then, as root, mount the Samba projects share as in the following example:
# mount.cifs //samba.pppl.gov/projects /mnt -o user=[my_cluster_user_name],workgroup=PPPL
# cd /mnt/my_project/
... do your work
# umount /mnt
Note: /mnt/my_project/ is mounted with your cluster user's privileges.
Q10: How do I access a project space through samba on my windows desktop?
A10: On your Windows system, start Windows Explorer and click "Tools" => "Map Network Drive...". Select a drive letter from the "Drive" drop-down list and type "\\samba\projects\my_project" under "Folder", where my_project is the same name that you use under /p/ on a cluster node. You may need to log in with your domain account.
Q11: How can I recover a file that I accidentally deleted from my home directory?
A11: Sysadmins take a snapshot of each user's home directory nightly. To recover a file that you accidentally deleted, look for it in the snapshot taken the previous night. First, 'cd' into $HOME/.zfs/snapshot/ and run 'ls' to see the available snapshots; they are named by date and time, for example 2011-11-29. Then copy the file you need from the snapshot back into your home directory ($HOME). Because disk space is limited, only the snapshot from the previous night is kept and available to users for now. We are currently working on deploying this feature to all project spaces.
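The steps above can be wrapped in a small shell function. This is a hypothetical helper, not a site-provided command. It assumes the snapshot layout described above ($HOME/.zfs/snapshot/<date>/...) and snapshot names that sort in date order, such as 2011-11-29; it copies one file, given relative to $HOME, back from the newest snapshot and refuses to overwrite an existing file.

```bash
# restore_from_snapshot RELATIVE_PATH [SNAPSHOT_ROOT]
# Copies $HOME/RELATIVE_PATH back from the newest ZFS snapshot.
# Assumes snapshot directory names sort chronologically (e.g. 2011-11-29),
# so the last name in sorted order is the newest snapshot.
restore_from_snapshot() {
  local rel=$1
  local root=${2:-$HOME/.zfs/snapshot}
  local newest
  newest=$(ls -1 "$root" 2>/dev/null | LC_ALL=C sort | tail -n 1)
  if [ -z "$newest" ]; then
    echo "no snapshots found in $root" >&2
    return 1
  fi
  local src="$root/$newest/$rel"
  local dest="$HOME/$rel"
  if [ ! -e "$src" ]; then
    echo "$rel is not in snapshot $newest" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "$dest already exists; not overwriting" >&2
    return 1
  fi
  # Recreate the parent directory if needed, preserving file attributes.
  mkdir -p "$(dirname "$dest")" && cp -p "$src" "$dest"
}
```

For example, restore_from_snapshot notes/todo.txt (a made-up path) would restore $HOME/notes/todo.txt from last night's snapshot.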
Q12: When I try to rm files in my portal user directory, I get the following error:
rm: cannot remove `myfile.txt': Disk quota exceeded
This seems like a chicken-and-egg problem. How do I deal with the disk quota problem if I can't remove files?
A12: On ZFS, the filesystem that hosts our home directories and project disks, you may find yourself unable to delete files once your disk quota is full. ZFS is a copy-on-write filesystem, so deleting a file transiently takes slightly more space on disk: ZFS has to write the metadata for the deletion before it frees the space allocated to the file being deleted. This is how ZFS keeps its on-disk state consistent at all times, even in the event of a crash.
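Workaround: copy /dev/null over the file you want to delete (for example, cp /dev/null myfile.txt), which truncates it to zero length, and then remove it with rm. If this fails, try removing a different file; in some cases, some files can be removed while others cannot.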
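A minimal sketch of this workaround as a shell function; the name force_rm is hypothetical and not a site-provided command. For each regular file it copies /dev/null over the file to truncate it, then removes it, and it reports any file it could not handle so you can move on to another one.

```bash
# force_rm FILE...
# Works around "Disk quota exceeded" on rm: first truncate each regular
# file to zero length by copying /dev/null over it, then delete it.
# Returns non-zero if any file could not be truncated or removed.
force_rm() {
  local f status=0
  for f in "$@"; do
    # Skip anything that is not a regular file (e.g. directories).
    if [ -f "$f" ] && cp -- /dev/null "$f" && rm -f -- "$f"; then
      echo "removed $f"
    else
      echo "could not remove $f" >&2
      status=1
    fi
  done
  return "$status"
}
```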