Advanced Job Management @ A*STAR II

The advanced job management training focuses on job arrays, job dependencies, checkpoint/restart, file stage-in/out, and troubleshooting job submission issues. It helps users run their jobs more efficiently by making the best use of the hardware.

2 May 2019

Level 17, Connexis South Tower, 1 Fusionopolis Way, Singapore 138632

Overview

  • Introduction
  • Job management and project info in brief
  • Job exit codes
  • MPI jobs in batch mode
    • “mpiprocs” parameter
    • MPI tight integration
  • Multithreaded jobs and OMP_NUM_THREADS in batch mode
  • Details on memory enforcement
  • Job Arrays
  • Job dependencies
  • PBS Reservations
  • Using checkpointing
  • File stage-in/out
  • Using IME
  • Troubleshooting
  • Lab Session
  • Using Compute Manager
  • Using Display Manager

Prerequisites

  • A valid user account on the NSCC system, ASPIRE1
  • An SSH client such as PuTTY or MobaXterm installed on the user’s laptop to connect to ASPIRE1
  • Basic understanding of Linux commands:
    1. File management
    2. The “vi” editor
    3. Using “modules” in Linux
    4. Process management
  • Basic PBS Pro job management

By the end of this course, participants will have a fair understanding of advanced job management topics such as job arrays, reservations, job dependencies, file stage-in/out, IME, Compute Manager and Display Manager.