CF: Services: High Performance Computing
 

Table of Contents

  1. Introduction
  2. Access
  3. Support
  4. HPC Wiki
  5. Configuration
  6. Current cluster status is available here (@cf.harvard.edu), or here (@si.edu).
  7. Past cluster use is available here (@cf.harvard.edu), or here (@si.edu).

Introduction

SAO has access to a Linux-based Beowulf cluster known as Hydra.
The cluster originated at SAO, where it was managed by the CF. It has since moved to the Smithsonian's Data Center in Herndon, VA (near Washington, D.C.), where it is managed by SI's Office of Research Computing (ORIS), part of SI's Office of the Chief Information Officer (OCIO). It is now an SI-wide resource.


Access

Accounts on the cluster (passwords and home directories) are separate and distinct from CF or HEA unix accounts.
To use Hydra, you must first request an account (CfA/SAO users only):
  • Use the account request page to request an account (do not email CF or HEA support).
  • All cluster users, regardless of their status, must read and sign SD 931 and be entered in the SDF.
  • Passwords on Hydra expire and must be changed every 90 days (per SD 931).
To log in to Hydra, use ssh from a trusted machine. A trusted machine is
  • any CF or HEA managed computer, including:
    • login.cfa.harvard.edu (CF users), or
    • pogoN.cfa.harvard.edu (HEA users), where N = 1, 2, ..., 6,
  • or any machine connected to CfA via VPN.
Use either of the two login nodes, i.e.,
 % ssh hydra-login01.si.edu
or
 % ssh hydra-login02.si.edu 
Both machines are identical.
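
Because Hydra accounts are separate from CF or HEA unix accounts, your user name on Hydra may differ from the one on the trusted machine you connect from. The sketch below builds the ssh command with an explicit user name; the function name hydra_ssh_cmd and the fallback name "username" are hypothetical and used only for illustration. It prints the command rather than running it.

```shell
# Sketch only: hydra_ssh_cmd is a hypothetical helper, not a site-provided tool.
# Usage: hydra_ssh_cmd [login-node] [hydra-user-name]
hydra_ssh_cmd() {
    # Default to the first login node; both login nodes are identical.
    node="${1:-hydra-login01.si.edu}"
    # Default to the local user name, which may NOT match your Hydra account.
    user="${2:-${USER:-username}}"
    echo "ssh ${user}@${node}"
}

# Print the command for the second login node.
hydra_ssh_cmd hydra-login02.si.edu
```

Run the printed command from a trusted machine, or from a machine connected to CfA via VPN.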


Support

The cluster is managed by DJ Ding (DingDJ@si.edu), the system administrator in Herndon, VA.
Do not contact the CF or HEA support sys-admins for issues regarding Hydra.

Support for SAO users is provided by SAO's HPC analyst, a role currently filled by Sylvain Korzennik.

Sylvain is not the sys-admin, so contact DJ Ding at SI-HPC-Admin@si.edu for critical problems; contact Sylvain for application support, advice, help, and suggestions.

A mailing list is available for cluster users. Messages posted to this list are read by the cluster sys-admin, the HPC analyst, and all other cluster users.

Use this mailing list to communicate with other cluster users, share ideas, ask for help with cluster use problems, offer advice and solutions to problems, etc.

To post messages to the list, or to read past messages, you must log in to the listserv.
You will need to choose a password the first time you use it (upper right, under "Options").
Because of the way this listserv is set up and managed at SI, emailing directly from your machine to HPCC-L@si-listserv.si.edu is likely to fail (with no warning or error message), so use the web portal.

Office Hours

Rather than dropping in during preset office hours, feel free to book a time slot for a Zoom-based one-on-one session using this page. You will receive confirmation and a Zoom link via email.

By default, these sessions take place on Thursdays between 2 and 5, on a first-come, first-served basis.

The target audience is primarily SAO users, since the bioinformatics support group already offers Bioinformatics Brown Bags on Wednesdays at noon EST. This should offer a convenient way to get help with running your tasks on Hydra.
That said, non-SAO users are welcome to book a session if the problems they want to resolve relate to job scheduling, scripting, etc., rather than to bioinformatics applications specifically.

Tutorials & Past Presentations

The plan is to create tutorials on various aspects of scientific computing and HPC. This is a placeholder for future links to these tutorials.

You can view slides of past presentations:


HPC Wiki

Information on how to use the system is at SI's HPC Wiki.

Contact SAO's HPC analyst if you think that other topics or questions should be covered.


Configuration

Hardware

The cluster consists of some 100 compute nodes, for a total of around 5,000 compute cores (CPUs) and a few GPUs.
All the nodes are interconnected via a 10 GbE switch and via a 100 Gbps capable InfiniBand (IB) fabric, although older nodes have a 40 Gbps IB interface.
There is some 3 PB of disk space, organized in a three-tier architecture (NetApp/NFS, GPFS, and FreeNAS); some of it is public and some is project-specific.


Software

The cluster is a Linux-based distributed cluster. We use Bright Cluster Manager to manage the OS and the Univa Grid Engine as the job scheduler.

Access to the cluster is via two login nodes, while the queuing system runs on a separate front-end node. To run on the cluster, you log in to one of the login nodes and submit jobs through the queuing system (in batch mode, with the qsub command). You do not start jobs interactively on the compute nodes; instead, you use a job script or request access to one of the interactive nodes. The login nodes are for normal interactive use such as editing, compiling, and script writing. Computations are not run on the login nodes.
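
As a rough illustration of the batch workflow described above, the following sketch writes a minimal job script and submits it with qsub. The file name hello.job, the job name, and the log file name are hypothetical. The "#$" lines use standard Grid Engine options (-N, -cwd, -j, -o). Queue names, memory or time requests, and parallel environments are site-specific and deliberately omitted; consult the HPC Wiki for the values that apply on Hydra.

```shell
# Sketch only: write a minimal, hypothetical job script to hello.job.
cat > hello.job <<'EOF'
#!/bin/sh
# Grid Engine directives (lines starting with "#$"):
#$ -N hello
#$ -cwd
#$ -j y
#$ -o hello.log
# -N names the job, -cwd runs it in the submission directory,
# -j y merges stderr into stdout, -o names the output file.
echo "Hello from $(uname -n)"
EOF

# Submit only where qsub exists, i.e. on a login node.
if command -v qsub >/dev/null 2>&1; then
    qsub hello.job
    # List your pending and running jobs.
    qstat -u "${USER:-$(id -un)}"
else
    echo "qsub not found: run this on one of the login nodes"
fi
```

The job's output ends up in hello.log in the directory you submitted from; the computation itself runs on a compute node, not on the login node.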

The following software is available on the cluster:

  • Compilers
    • GCC compilers,
    • NVIDIA's HPC SDK compilers - formerly Portland Group (PGI) compilers,
    • Intel compilers.
  • Libraries
    • MPI for all compilers (OpenMPI and MVAPICH), including IB support,
    • the libraries that come with the compilers,
    • GSL, BLAS, LAPACK, etc.
  • Packages
    • IDL, including 128 run-time licenses, GDL/FL,
    • MATLAB - runtime only,
    • Python,
    • Julia, R, Java, etc.

If you need specific software, or believe that the user community would benefit from having additional software available on the cluster, contact SAO's HPC analyst.



SAO HPC analyst - Sylvain Korzennik  (hpc@cfa.harvard.edu)
Last modified Saturday, 13-Nov-2021 08:10:14 EST
 
 
