
Jobs, quotas, and alert setup

You may want to check the jobs your teammates submitted to ensure they are making reasonable use of your resources. This article covers how to check jobs, how the quota system works, and how to set up alerts so the system monitors usage for you.

Check running, queuing, and completed jobs

Job owners can inspect their own jobs through the dropdown in the top-right corner. An account owner, on the other hand, can review every member's jobs on the account page. From the Accounts page, click the name of the account you want to check. On the overview sub-page, you will see links for inspecting jobs. Click either running or completed jobs to open the jobs window.

In the running tab, you see the jobs currently running under the selected account, along with their requester, partition, duration, and real-time per-job CPU or memory utilization. You may cancel any job by clicking its cancel button.

In the queuing tab, you see the jobs currently waiting in queues, the partition each one is in, and the CPU or memory it requested. You may cancel a queued job or change its priority.

In the completed tab, you see jobs that have already completed or failed. Click a job ID to view its detailed charge and utilization status.

If you want to sort jobs by CPU or memory utilization, click the gear button in the top-right corner to toggle which columns the table shows.

Set up utilization alerts

To spot under-utilized jobs, you can inspect jobs on the portal in real time, but the system also provides a way to monitor them automatically. Switch to the settings tab on the account page, and you will see a job efficiency monitor section.

job-eff-monitor.png

By default, the system notifies the job owner if a job uses less than 50% of either its CPU or its memory. The first 10 minutes of each job are not counted, because the application is assumed to be warming up during that time. You can adjust these settings to suit your needs.
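The default rule can be pictured with a small Python sketch. It assumes the monitor compares the average utilization over samples taken after the 10-minute warm-up against the 50% threshold; the `Sample` type, its fields, and the plain averaging are illustrative assumptions, not how OAsis actually aggregates utilization.

```python
from dataclasses import dataclass

# Defaults taken from the article; the aggregation below (a plain average) is an assumption.
THRESHOLD = 0.50          # notify when usage is below 50% of CPU or memory
WARMUP_SECONDS = 10 * 60  # the first 10 minutes are not counted

@dataclass
class Sample:
    """A hypothetical utilization sample for one running job."""
    elapsed_s: int   # seconds since the job started
    cpu_util: float  # fraction of the requested CPU in use, 0.0 to 1.0
    mem_util: float  # fraction of the requested memory in use, 0.0 to 1.0

def is_underutilized(samples, threshold=THRESHOLD, warmup_s=WARMUP_SECONDS):
    """Return True if average CPU or memory utilization after warm-up falls below threshold."""
    counted = [s for s in samples if s.elapsed_s >= warmup_s]
    if not counted:
        return False  # still warming up, so there is nothing to judge yet
    avg_cpu = sum(s.cpu_util for s in counted) / len(counted)
    avg_mem = sum(s.mem_util for s in counted) / len(counted)
    return avg_cpu < threshold or avg_mem < threshold

samples = [
    Sample(elapsed_s=60, cpu_util=0.05, mem_util=0.10),    # warm-up, ignored
    Sample(elapsed_s=900, cpu_util=0.90, mem_util=0.30),
    Sample(elapsed_s=1200, cpu_util=0.85, mem_util=0.35),
]
print(is_underutilized(samples))  # True: memory stays below 50% after warm-up
```

Note that the low readings in the first sample do not matter; the job is flagged only because its memory use stays low after warm-up.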

How quota works

OAsis uses our own quota system, which differs from a typical SLURM setup. Instead of a single combined total, quota is divided into six meters:

  • CPU Oneasia
  • GPU Oneasia
  • CPU Shared
  • GPU Shared
  • CPU Dedicated
  • GPU Dedicated

As their names suggest, the meters track CPU usage and GPU usage across three node pools: Oneasia, Shared, and Dedicated. CPU usage is measured in hours spent on one AMD EPYC 7713 core, and GPU usage in hours spent on one NVIDIA A100 GPU card.
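As a worked example of these units, usage is the number of cores or GPU cards multiplied by the hours they were held. The helper names below are hypothetical, and the sketch counts cores only; the CLI's "CPU/Mem" column labels hint that memory may also factor into the CPU charge, which is not modeled here.

```python
def cpu_units(cores: int, hours: float) -> float:
    """CPU usage in core-hours: hours spent on one AMD EPYC 7713 core."""
    return cores * hours

def gpu_units(gpus: int, hours: float) -> float:
    """GPU usage in GPU-hours: hours spent on one NVIDIA A100 card."""
    return gpus * hours

# A hypothetical job holding 8 cores and 2 GPUs for 90 minutes (1.5 hours).
print(cpu_units(8, 1.5))  # 12.0
print(gpu_units(2, 1.5))  # 3.0
```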

Quota is applied at the account (group) level, and it considers not only your account's quota but also that of every upper-level account. For example, an institute may have 1,000 units of "GPU Oneasia" distributed evenly across 4 departments (250 units each), and each department can in turn assign its share to project groups. The system accepts a new job only when every level (institute, department, and project group) has enough quota.
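The all-levels rule can be sketched as a walk from the project group up to the top of the hierarchy. This is a hypothetical model: the `Account` class is invented for illustration, each level's `used` value is assumed to include its descendants' usage, and a job's expected usage is assumed to be known at submission time, which this article does not specify.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Account:
    """A hypothetical account node (institute, department, or project group) on one meter."""
    name: str
    quota: float                       # units allotted for the current reset period
    used: float = 0.0                  # assumed to include all descendants' usage
    parent: Optional["Account"] = None

def can_accept(account: Account, requested: float) -> bool:
    """Accept a job only if the account and every upper-level account can absorb it."""
    node = account
    while node is not None:
        if node.used + requested > node.quota:
            return False
        node = node.parent
    return True

# 1,000 "GPU Oneasia" units split evenly across 4 departments (250 each).
institute = Account("institute", quota=1000, used=600)
dept_a = Account("dept-a", quota=250, used=200, parent=institute)
dept_b = Account("dept-b", quota=250, used=245, parent=institute)
group_1 = Account("group-1", quota=100, used=90, parent=dept_a)
group_2 = Account("group-2", quota=100, used=0, parent=dept_b)

print(can_accept(group_1, 5))   # True: every level has room
print(can_accept(group_1, 20))  # False: group-1 would exceed 100
print(can_accept(group_2, 10))  # False: dept-b would exceed 250
```

Because the check fails at the first level without room, a group with plenty of its own quota can still be refused when its department is nearly exhausted, as `group-2` shows.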

The system supports a custom reset period per account. You may choose weekly, monthly, quarterly, or yearly.

Check your current usage and quota

You can check them in the web portal on the Accounts page.

view-quota.png

You can also check them with the CLI client, as follows:

$ hc quotas
# Account   | CPU/Mem Oneasia   | CPU/Mem Shared    | GPU Shared        | CPU/Mem Dedicated | GPU Oneasia       | GPU Dedicated
# appcara   | 0.2 / 800         | 0.0               | 0.0               | 0.0               | 0.0 / 100         | 0.0

# or, if you prefer JSON output
$ hc quotas -o json
# [
#  {
#    "account_id": "appcara",
#    "quota": {
#      "oneasia_csu": 800.0,
#      "oneasia_gsu": 100.0
#    },
#    "usage": {
#      "dedicated_csu": 0.0,
#      "dedicated_gsu": 0.0,
#      "oneasia_gsu": 0.047666665,
#      "shared_csu": 0.0,
#      "shared_gsu": 0.0,
#      "oneasia_csu": 0.16666667
#    }
#  }
# ]
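If you want usage as a percentage of quota, the JSON output is straightforward to post-process. The Python sketch below parses the sample output above. It assumes that meters missing from `quota` have no limit configured (matching the CLI table, which shows no "/ limit" for them) and skips them; the `usage_report` helper and the script name are hypothetical.

```python
import json

# Sample output of `hc quotas -o json`, copied from above. To use live data instead,
# import sys, replace RAW with sys.stdin.read(), and run:
#   hc quotas -o json | python3 quota_report.py
RAW = """
[
  {
    "account_id": "appcara",
    "quota": {"oneasia_csu": 800.0, "oneasia_gsu": 100.0},
    "usage": {
      "dedicated_csu": 0.0, "dedicated_gsu": 0.0,
      "oneasia_gsu": 0.047666665, "shared_csu": 0.0,
      "shared_gsu": 0.0, "oneasia_csu": 0.16666667
    }
  }
]
"""

def usage_report(accounts):
    """Yield (account, meter, used, quota, percent) for every meter that has a quota set."""
    for acct in accounts:
        for meter, limit in acct["quota"].items():
            used = acct["usage"].get(meter, 0.0)
            pct = 100.0 * used / limit if limit else 0.0
            yield acct["account_id"], meter, used, limit, pct

for account, meter, used, limit, pct in usage_report(json.loads(RAW)):
    print(f"{account:<10} {meter:<12} {used:>8.2f} / {limit:<6.0f} ({pct:.3f}% used)")
```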

Set quota and auto alerts

If your upper-level account has granted you permission to modify quotas, you can do so on the account settings page.

quota-settings.png

If you want a hard quota limit, change "Behavior when quota exceeded" from "Notify Only" to "Auto kill jobs".
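The difference between the two options amounts to a soft versus a hard limit. The sketch below is a hypothetical illustration of that decision, not the portal's actual enforcement logic: it assumes a meter counts as exceeded only when usage is strictly greater than quota, and it does not model whether "Auto kill jobs" also sends a notification.

```python
from enum import Enum

class ExceededBehavior(Enum):
    """The two options of the "Behavior when quota exceeded" setting."""
    NOTIFY_ONLY = "Notify Only"
    AUTO_KILL = "Auto kill jobs"

def action_on_check(used: float, quota: float, behavior: ExceededBehavior) -> str:
    """Return a hypothetical enforcer's action for one meter after a usage update."""
    if used <= quota:
        return "none"       # within quota: nothing to do
    if behavior is ExceededBehavior.AUTO_KILL:
        return "kill jobs"  # hard limit: stop the account's jobs
    return "notify"         # soft limit: alert only, jobs keep running

print(action_on_check(850.0, 800.0, ExceededBehavior.NOTIFY_ONLY))  # notify
print(action_on_check(850.0, 800.0, ExceededBehavior.AUTO_KILL))    # kill jobs
```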