Hydra

Data Processing on the Hydra Cluster

The Hydra Cluster

Hydra is the Smithsonian Institution's shared High Performance Computing (HPC) cluster, now fully hosted in the Ashburn Data Center. It provides large-scale CPU resources, GPU nodes, and high-performance storage for research computing. Hydra is designed for running parallel, batch, and high-memory workloads that exceed the capabilities of a single workstation or server.

Users connect to the Hydra login nodes via SSH and submit work through the cluster scheduler rather than running computations on the login nodes themselves. Hydra is well suited to compute-intensive CASA, Python, and other scientific workflows that benefit from parallel processing or GPU acceleration.
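The sketch below illustrates the batch workflow. It writes a minimal job script for the scheduler so the work does not run on a login node. It assumes the scheduler accepts standard Grid Engine "#$" directives, which the QSub tooling listed under External links suggests but this page does not confirm. The names make_job_script, casa_test, and my_analysis.py are hypothetical. Hydra-specific queue, memory, and CPU options are deliberately omitted; take those from Running Jobs on Hydra or the Hydra QSub Generator.

```python
"""Hypothetical helper that writes a minimal Grid Engine-style job script.

Assumption: Hydra's scheduler accepts standard qsub "#$" directives.
Hydra-specific queue and resource options are intentionally left out.
"""
from typing import Optional


def make_job_script(job_name: str, command: str, log_file: Optional[str] = None) -> str:
    """Return the text of a minimal serial batch job script."""
    log_file = log_file or f"{job_name}.log"
    lines = [
        "#!/bin/sh",
        f"#$ -N {job_name}",   # job name shown by qstat
        "#$ -cwd",             # run in the directory qsub was called from
        "#$ -j y",             # merge stderr into stdout
        f"#$ -o {log_file}",   # file that receives the job's output
        "#",
        'echo "Job $JOB_ID started on $(hostname) at $(date)"',
        command,               # the actual computation
        'echo "Job $JOB_ID finished at $(date)"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_job_script("casa_test", "python my_analysis.py"))
```

Save the printed text to a file such as casa_test.job and submit it from a login node with qsub casa_test.job. The scheduler then runs the command on a compute node.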

Hydra's dedicated GPU-enabled compute nodes serve accelerated workloads. GPUs are allocated by the job scheduler and must be requested explicitly when a job is submitted. These nodes support CUDA-based applications and suit machine learning, large-scale simulations, and other GPU-accelerated scientific workflows. Users requesting GPUs should ensure their software respects the CUDA_VISIBLE_DEVICES environment variable set by the scheduler, using only the devices it lists.
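The sketch below shows one way a Python job might respect CUDA_VISIBLE_DEVICES. The variable holds a comma-separated list of GPU indices or UUIDs. CUDA renumbers the listed devices from 0 inside the process, so code should use indices 0 to n-1 rather than hard-coding physical GPU numbers. The helper names assigned_gpus and logical_device_indices are hypothetical. Treating an unset variable as "no GPU assigned" is an assumption about how a batch job should behave.

```python
import os
from typing import List, Mapping, Optional


def assigned_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    None means the variable is unset (the scheduler applied no GPU assignment);
    an empty list means it is set but lists no devices.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [dev.strip() for dev in value.split(",") if dev.strip()]


def logical_device_indices(env: Optional[Mapping[str, str]] = None) -> List[int]:
    """CUDA numbers the visible devices 0..n-1 inside the process."""
    gpus = assigned_gpus(env)
    return list(range(len(gpus))) if gpus else []


if __name__ == "__main__":
    gpus = assigned_gpus()
    if gpus is None:
        print("CUDA_VISIBLE_DEVICES is not set; was a GPU requested for this job?")
    else:
        print(f"Scheduler assigned {len(gpus)} GPU(s): {gpus}")
        print(f"Use CUDA device indices {logical_device_indices()} in this job")
```

For example, if the scheduler sets CUDA_VISIBLE_DEVICES=2,3, the job sees two GPUs and should address them as devices 0 and 1.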

Using Hydra

For detailed instructions and template scripts for running a job on Hydra, see Running Jobs on Hydra.

Hydra can be used for both serial (single-core) and parallel jobs. The link above includes examples of each.
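A parallel job should start only as many workers as the scheduler granted it. The minimal Python sketch below assumes a Grid Engine-style scheduler that exports the granted slot count in the NSLOTS environment variable. The helper names and the square() workload are hypothetical placeholders. The sketch falls back to a plain serial loop when only one slot is available. It only parallelizes within a single node; multi-node jobs need MPI or a similar framework.

```python
import os
from multiprocessing import Pool
from typing import Iterable, List, Mapping, Optional


def slots_for_job(env: Optional[Mapping[str, str]] = None) -> int:
    """Worker count granted by the scheduler (assumed to be exported as NSLOTS).

    Defaults to 1, i.e. a serial job, when NSLOTS is unset or invalid.
    """
    env = os.environ if env is None else env
    try:
        n = int(env.get("NSLOTS", "1"))
    except ValueError:
        return 1
    return max(n, 1)


def square(x: int) -> int:
    # Placeholder for the real per-item computation.
    return x * x


def run_work(items: Iterable[int], env: Optional[Mapping[str, str]] = None) -> List[int]:
    n = slots_for_job(env)
    if n == 1:
        return [square(x) for x in items]  # serial path: no worker processes
    with Pool(processes=n) as pool:        # parallel path: one process per slot
        return pool.map(square, items)


if __name__ == "__main__":
    results = run_work(range(10))
    print(f"Used {slots_for_job()} slot(s); results: {results}")
```

Reading the slot count from the scheduler rather than from os.cpu_count() matters on a shared cluster. cpu_count() reports every core on the node, including cores allocated to other users' jobs.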

Performance comparison of the different Hydra disks [2023].
Performance comparison of the RTDC and Hydra [2024].

External links

CF: High Performance Computing pages
Hydra Quick start guide
Hydra Reference pages
How to use GPUs on Hydra
Hydra QSub Generator (web page to generate your job submission file - VPN required)
Hydra QSub Generator - Instructions
Hydra password reset (username & VPN access required)
Operating status page - see what the cluster is running now.