Brief introduction to the cluster

To provide a good experience across a wide range of tasks and users, the cluster consists of many components. Below are some highlights:

  1. Variety in compute nodes
  2. Parallel file system storage
  3. Fast InfiniBand and Ethernet networks
  4. SSH server cluster
  5. Web portal
  6. CLI client
  7. Software by modules and containers

This user guide does not cover detailed hardware specifications; it focuses on the user experience instead. If you are interested in those technical details, please feel free to contact us.

Compute nodes

We want to provide a heterogeneous cluster with a wide variety of hardware and software, so users can experience different combinations. Compute nodes may differ greatly in model, architecture, and performance. We carefully build and fine-tune the software on the cluster to ensure it fully leverages the available computing power, and we provide tools on our web portal to help you choose what suits you.
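If you prefer the command line, SLURM's own sinfo command (available on the login nodes) also gives a quick summary of the hardware on offer. The format fields below are standard SLURM options; the partition and feature names you see will be specific to our site.

    # Show each partition with its node count, CPUs, memory, GRES (e.g. GPUs),
    # and the feature tags advertised on the nodes
    sinfo -o "%P %D %c %m %G %f"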

Besides the OneAsia resources, you are also welcome to bring in your own hardware. Our billing system charges jobs per individual node, so a single large job can allocate computing power owned by multiple providers. To align the terminology, we group the hardware into three pools (a sample submission targeting one of these pools is sketched after the list):

  1. OneAsia
    • Hardware owned by OneAsia Network Limited.
  2. Bring-in shared
    • Hardware brought in by external parties who are willing to share it with others.
    • Quotas, priority, preemption, and fair-share policies may be enforced to control who has priority.
  3. Bring-in dedicated
    • Hardware brought in by external parties who do not wish to share it.
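Exactly how these pools are exposed to the scheduler is site-specific. Assuming each pool maps to a SLURM partition (an assumption on our part; run sinfo to see the real partition names), a submission targeting a shared bring-in pool might look like this:

    # "bringin-shared" is a hypothetical partition name used for illustration only,
    # and my_job.sh stands for your own batch script
    sbatch --partition=bringin-shared --nodes=2 --time=01:00:00 my_job.sh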

Storage

The cluster has a parallel file system that provides fast and reliable access to your data. Storage is charged monthly based on your maximum allowed quota. You can check your quota in our web portal or through our CLI client, and you may request a larger quota at any time by submitting a ticket to us.
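Since the CLI client's own commands are not documented in this introduction, a generic check from a login node is sketched below. The paths are the file set mount points described in the next list, and the lfs line applies only if the parallel file system is Lustre, which is an assumption on our part.

    # Overall usage on your personal file sets
    df -h /pfss/home/$USER /pfss/scratch01/$USER
    # If /pfss is backed by Lustre (an assumption), per-user quota can be read with:
    lfs quota -h -u $USER /pfss/home/$USER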

You should see at least three file sets (a quick way to confirm them from a login node is sketched after this list):

  1. User home directory
    • This is where you may store your persistent data. It is mounted at /pfss/home/$USER and has a default quota of 10GB.
  2. User scratch directory
    • This is the space to use for I/O during your jobs. It is mounted at /pfss/scratch01/$USER and the default quota is 100GB. Inactive files may be purged every 30 days.
  3. Group scratch directory
    • This is where you share files with your group members. It is mounted at /pfss/scratch02/$GROUP and the default quota is 1TB.
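A minimal sketch of confirming the three file sets from a login node and staging input into scratch; $GROUP stands for your group name as in the list above, and input.dat is just a placeholder file name.

    # Check that the three file sets are mounted and writable
    ls -ld /pfss/home/$USER /pfss/scratch01/$USER /pfss/scratch02/$GROUP
    # Stage input data into your scratch directory before submitting a job
    cp ~/input.dat /pfss/scratch01/$USER/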

There are several ways to access your files (an SFTP example follows the list):

  1. From the web portal file browser
  2. SSH / SFTP
  3. Mount the file sets on your local computer using our CLI client
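For the SSH / SFTP option, any standard SFTP client works. The hostname below is a placeholder for the address provided with your account.

    # Connect (replace the hostname with the login address in your account details)
    sftp your_username@login.cluster.example.com

    # Then, at the sftp> prompt, upload or download as usual:
    #   put local_data.tar.gz /pfss/scratch01/your_username/
    #   get /pfss/home/your_username/results.log .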

When running jobs, whether you are using our modules or containers, all file sets you have access to will be available.
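As a concrete illustration, a minimal batch script that simply lists the file sets from inside a job might look like the sketch below; the resource request is an arbitrary example, and $GROUP again stands for your group name. Submit it with sbatch from a login node.

    #!/bin/bash
    #SBATCH --job-name=filesets-check
    #SBATCH --output=filesets-check.%j.out
    #SBATCH --time=00:05:00

    # The same file sets described in the Storage section are visible on the compute node
    ls -ld /pfss/home/$USER /pfss/scratch01/$USER /pfss/scratch02/$GROUP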

Networking

Traffic between compute nodes, and between compute nodes and the parallel file system, goes through our InfiniBand network. Both our modules and containers are compiled with the latest MPI toolchain to fully utilize the bandwidth.
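A minimal sketch of launching an MPI job; the module name below is a placeholder (run module avail on a login node to see the MPI toolchains actually installed), and my_mpi_app stands for your own MPI program.

    # Load an MPI-enabled toolchain and launch ranks across two nodes over InfiniBand
    module load openmpi        # placeholder module name
    srun --nodes=2 --ntasks-per-node=4 ./my_mpi_app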

Login node farm

Whether you access the cluster through our web portal or your own SSH client, you will be connecting to our SSH server cluster. Our load balancer will direct you to the server with the fewest connections.

Each connection gets one dedicated CPU core and 1GB of memory, which is intended for preparing your software, job scripts, and data. It is free of charge, and you will have access to all your file sets, all modules, containers, SLURM commands, and our CLI client.
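Connecting from your own SSH client is a single command; the hostname below is a placeholder for the login address provided with your account, and my_job.sh is your own batch script.

    ssh your_username@login.cluster.example.com

    # On the login node you can browse software, submit jobs, and watch the queue
    module avail
    sbatch my_job.sh
    squeue -u $USER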

Web portal

CLI client

Software