Brief introduction to the cluster
The cluster consists of many components to provide a good experience for various tasks and various users. Below are some highlights:
- Variety in compute nodes
- Parallel file system storage
- Fast Infiniband and Ethernet networks
- SSH server cluster
- Web portal
- CLI client
- Software by modules and containers
This user guide will not cover the detailed hardware specifications but will focus on the experience instead. If you are interested in those technical details, please get in touch with us.
Compute nodes
We want to provide a heterogeneous cluster with a wide variety of hardware and software for users to experience different combinations. Compute nodes may have very different models, architectures, and performance. We carefully build and fine-tune the software on the cluster to ensure it fully leverages the computing power. We provide tools on our web portal to help you choose what suits you.
Besides the OneAsia resources, bringing in hardware is also welcome. Our billing system is smart enough to charge jobs by individual nodes. That means one can submit a giant job to allocate computing power owned by multiple providers. To align the terminology, we group them into three pools:
- OneAsia
  - Hardware owned by OneAsia Network Limited.
- Bring-in shared
  - Hardware brought in by external providers who are willing to share with others. Access can be controlled with quota, priority, preemption, and fair share.
- Bring-in dedicated
  - Hardware brought in by external providers who are not willing to share.
Storage
The cluster has a parallel file system which provides fast and reliable access to your data. We charge monthly by the maximum allowed quota. You can quickly check your quota through our web portal or through our CLI client. You may also request a larger quota anytime by submitting a ticket to us.
One should see at least three file sets:
- User home directory
  To store your persistent data. It is mounted at /pfss/home/$USER and has a default quota of 10GB.
- User scratch directory
  To be used for I/O during your job. It is mounted at /pfss/scratch01/$USER and has a default quota of 100GB. We may purge inactive files every 30 days.
- Group scratch directory
  To share your files with your group mates. It is mounted at /pfss/scratch02/$GROUP and has a default quota of 1TB.
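As a quick orientation, the file sets above can be inspected from a login node with ordinary POSIX tools. This is an illustrative sketch only: the `lfs quota` line assumes a Lustre-style parallel file system, which this guide does not confirm; prefer the web portal or the CLI client for authoritative quota figures.

```shell
# List the file sets you have access to (paths as documented above)
ls -ld /pfss/home/$USER /pfss/scratch01/$USER

# Current usage against the 10GB home default
du -sh /pfss/home/$USER

# Lustre-specific quota query -- an assumption about the file system
lfs quota -hu $USER /pfss/home
```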
There are many ways you can access your files.
- From the web portal file browser
- SSH / SFTP
- Mount to your local computer using our CLI client
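For the SSH/SFTP and mount options above, standard OpenSSH tooling works as-is. The host name below is hypothetical (check the web portal for the real login address), and the CLI client provides its own mount command; `sshfs` is shown as the generic equivalent.

```shell
# Upload a job script over SFTP (host name is a placeholder)
sftp alice@hpc.example.com <<'EOF'
put job.sh /pfss/home/alice/job.sh
EOF

# Mount the home file set locally with sshfs, the generic
# alternative to the CLI client's mount feature
mkdir -p ~/cluster-home
sshfs alice@hpc.example.com:/pfss/home/alice ~/cluster-home
```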
When running jobs, no matter whether you are using our modules or containers, all file sets you have access to will be available.
Networking
Traffic between compute nodes, or between compute nodes and the parallel file system, goes through our Infiniband network. Both our modules and containers are compiled with the latest MPI toolchain to utilize the bandwidth fully.
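A minimal sketch of an MPI batch job that runs over this fabric. The module name and partition are assumptions, not confirmed by this guide; browse the software list on the web portal for the exact names.

```shell
#!/bin/bash
# Sketch of a two-node MPI job; adjust names to your cluster.
#SBATCH --job-name=mpi-demo
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

module load OpenMPI   # assumed module name -- check the software browser
srun ./my_mpi_app     # srun launches the ranks across both nodes
```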
Login nodes farm
Whether you access the cluster through our web portal or your own SSH client, you will be connecting to our SSH server cluster. Our load balancer will connect you to the server with the least connections. We only grant connections authenticated by a private key; no password authentication is allowed. You may connect through the web portal or the CLI client if you don't want to keep the private key.
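Setting up key authentication takes two standard OpenSSH commands. The host name is a placeholder; how the public key is registered (e.g. via the web portal) is an assumption to verify with us.

```shell
# Generate a key pair locally; keep the private key, register the .pub file
ssh-keygen -t ed25519 -f ~/.ssh/cluster_key

# Connect using the private key (host name is hypothetical)
ssh -i ~/.ssh/cluster_key alice@hpc.example.com
```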
You will have one dedicated CPU core, 1GB of memory, and limited bandwidth per connection. It is free of charge for you to prepare your software, job scripts, and data. You will have access to all your file sets, all modules, containers, SLURM commands, and our CLI client.
Please leverage compute nodes for heavy workloads. If you really need more resources on the login node, please submit a ticket and let us help.
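Since SLURM commands are available on the login nodes, moving a heavy task to a compute node can be as simple as requesting an interactive shell. The partition name and resource sizes below are assumptions; adjust them to what the web portal shows for your pools.

```shell
# Request an interactive shell on a compute node instead of
# compiling or preprocessing on the login node
srun --partition=debug --cpus-per-task=4 --mem=8G --pty bash
```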
Web portal
Our web portal provides many features to make the journey easier. Our goal is to enable users from different backgrounds to consume HPC resources quickly and efficiently. We also leverage the web portal internally for research and management. We will cover the details in later chapters. Below are some highlights:
- Web Terminal
- File browser
- Software browser
- Quick jobs launcher
- Job efficiency viewer and alert
- Quota control
- Team management
- Ticket system
- Cost allocation
CLI client
To further accelerate the workflow, we created our own command line client. We will cover the details later, but below are some example use cases:
- Connect to the login nodes farm without the private key
- Mount a file set to your local computer
- Allocate ports from a compute node for GUI workloads
- Check quota and usage
- Check cluster health
Software
The cluster currently provides free software in two ways: Lmod and containers. Our team is working hard to provide state-of-the-art software fine-tuned for the cluster's compute nodes. You may log in to our web portal to browse the available software.
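For the Lmod path, the commands below are standard Lmod usage; the package name and version shown are assumptions, so use the software browser on the web portal to find the real ones.

```shell
module avail            # list everything the cluster provides
module spider gcc       # search for a package across all versions
module load gcc/12.2.0  # assumed version string -- check the browser
module list             # confirm what is currently loaded
module purge            # reset to a clean environment
```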
Besides software, we also provide pre-trained models and popular data sets. We will cover the details later.