Run :program:`mix-models` on UnICC
**********************************

This is a guide on how to get set up on the Unified IIASA Computing Cluster (UnICC) and how to run MESSAGEix scenarios on the cluster.

.. attention::

   - The steps in this guide will only be actionable for IIASA staff and collaborators who have access to the UnICC.
     It *may* be of use to others with access to similar systems, but is not (yet) intended as a general guide.
   - The information contained is up-to-date as of 2025-01-16.
     Changes to the cluster configuration may change the required steps.

.. contents::
   :local:
   :backlinks: none

Prerequisites and good-to-knows
===============================

Access to the UnICC
-------------------

To access the UnICC, an IIASA account is required.
With your IIASA account, create a ticket with Information and Communication Technologies (ICT) to request access to the UnICC.
The intranet page on the UnICC can be found `here `__.
On the intranet page, the Slurm User Guide file has a section on how to request access to the UnICC, including the information that needs to be provided to ICT in your request:

1. Are there any existing shared project folders inside the cluster that you need access to?
2. Do you need a new shared project folder inside the cluster?
   In this case, please specify the project name (default size 1 TB) and the names of the users who need access to the folder.
3. Please note that existing home folders will be automatically attached.
4. Please describe which already-existing P: drive folder(s) you need access to from inside the cluster.
5. Please note that a 5 GB home folder will be automatically created for you in the cluster.

Storage space
~~~~~~~~~~~~~

When requesting access to UnICC, 5 GB of space in your home directory will likely be given by default.
While setting up the MESSAGE environment, it is easy to hit this limit: repositories like ``message_data`` are large, and a GAMS installation alone is almost 2 GB.
So, request more space upfront, or ask for an increase later (it is possible to request 50 GB of storage space and to increase that even further afterwards).

Network drive access
~~~~~~~~~~~~~~~~~~~~

As part of the questionnaire above for the ticket, specify which P: drive folders you need access to.
Additionally, access to your H: drive on the cluster is granted automatically.
Every user's H: drive is located on the cluster at :file:`/hdrive/all_users/[username]`.
If a shared project folder was requested, it will be located at :file:`/projects/[project name]`.

Using MESSAGE environments on the H: drive vs. setting up new MESSAGE environments
----------------------------------------------------------------------------------

This guide walks through the process of installing a MESSAGEix environment from source on the cluster (in your home directory).
In principle, because the H: drive can be accessed from the cluster, repositories and MESSAGEix environments could instead live in your H: drive folder.
In that case, you could simply activate your existing MESSAGE environment(s) from the H: drive, saving the trouble of creating new ones.

Working in terminal
-------------------

The rest of this document assumes you're in a terminal window on the UnICC cluster, not in a notebook.
Throughout this guide, :program:`nano` is used to edit files; if :program:`nano` is not familiar, use :program:`vim`, :program:`emacs`, or any other text editor you're comfortable with.
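As noted under *Storage space* above, the default quota is small, so it is worth keeping an eye on usage while installing.
A minimal check using standard Linux tools (the exact quota tooling on UnICC may differ):

.. code:: bash

   # Space used/available on the filesystem holding your home directory
   df -h "$HOME"

   # Per-directory usage of your home folder, largest first
   du -h --max-depth=1 "$HOME" | sort -rh | head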
Git-related setup
=================

Generate SSH Key
----------------

An SSH key is needed to clone GitHub repositories over SSH.
Follow GitHub's instructions to `generate a new SSH key and add it to the ssh-agent `_, then `add the new SSH key to your GitHub account `_.

Run:

.. code:: bash

   ssh-keygen -t ed25519 -C "you@email.com" # replace with your own keygen info and email

A prompt like this is received:

.. code:: bash

   Generating public/private ed25519 key pair.
   Enter file in which to save the key (/h/u142/username/.ssh/id_ed25519):
   Enter passphrase (empty for no passphrase):

(Save your passphrase somewhere safe.)

Add SSH Key to SSH-Agent
------------------------

Start ssh-agent in the background:

.. code:: bash

   eval "$(ssh-agent -s)"

Add the SSH private key to the ssh-agent:

.. code:: bash

   ssh-add ~/.ssh/id_ed25519

Add SSH Key to GitHub Account
-----------------------------

Run:

.. code:: bash

   cat ~/.ssh/id_ed25519.pub

Copy the content.
On GitHub, go to Settings > SSH and GPG keys, click on “New SSH key”, then name the new SSH key and paste the key.

Creating Personal Access Tokens
-------------------------------

A personal access token is needed to clone ``message_data`` over HTTPS, since it is a private repository.
Refer to `creating a personal access token `_ for instructions.

In Settings > Developer settings > Personal access tokens:

1. Click “Tokens (classic)”
2. Select Generate new token > Generate new token (classic)
3. Enter token name “IIASA UnICC”
4. Select “No expiration”.

Add Email and Username to Global Git Config
-------------------------------------------

.. code:: bash

   git config --global user.email "you@email.com" # replace with your GitHub email
   git config --global user.username "username" # replace with your GitHub username
   git config --global user.name "Firstname Lastname" # replace with your name

Auto Load Python and Java on Startup
------------------------------------

Add the following to :file:`$HOME/.bash_profile` (by entering :code:`nano ~/.bash_profile`):

.. code:: bash

   module purge
   module load Python/3.11.5-GCCcore-13.2.0
   module load Java
   module load git-lfs

This ensures that the correct Python version and Java are loaded (and added to ``$PATH``) each time a terminal session starts.

Create Virtual Environment
--------------------------

Many people on the team use ``conda``, but here Python's built-in ``venv`` is used to create the virtual environment.

.. important::
   Initially, creating a virtual environment by just running :code:`python -m venv my_env` caused issues when trying to activate the environment in a Slurm job.
   It works just fine interactively on the node, but within a job it would fail to activate.
   The reason is that the default :program:`python` command on the interactive node creates an environment using the default Python instance, inherited from Jupyter, which is not accessible from the compute nodes where the Slurm job will run.
   So it is necessary to create the environment the following way.

In the home directory (:file:`~` or :file:`$HOME`), run the following to create and activate the virtual environment (if :code:`module purge` and :code:`module load` were added to your :file:`~/.bash_profile` as described earlier, those two steps don't have to be repeated):

.. code:: bash

   module purge
   module load Python/3.11.5-GCCcore-13.2.0
   python3 -m venv env/env_name
   source ~/env/env_name/bin/activate
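To confirm that the environment activates with the intended interpreter (and not the Jupyter-inherited one described above), check which Python is found after activation:

.. code:: bash

   source ~/env/env_name/bin/activate
   which python     # should point into ~/env/env_name/bin/
   python --version # should match the loaded module, i.e. Python 3.11.x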
Install MESSAGEix Ecosystem from Source
=======================================

Get ``message_ix`` Repository
-----------------------------

Run:

.. code:: bash

   git clone https://github.com/username/message_ix.git # replace with your own fork or the IIASA repo
   cd message_ix
   git remote add upstream https://github.com/iiasa/message_ix
   git pull upstream main
   git fetch --all --tags

Install ``message_ix``
----------------------

1. Navigate to the local ``message_ix`` repo root directory.
2. Ensure you're on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up-to-date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .[docs,reporting,tests,tutorial]

6. Check ``message_ix`` is installed correctly:

   .. code:: bash

      message-ix show-versions

Get ``ixmp`` Repository
-----------------------

.. code:: bash

   git clone https://github.com/username/ixmp.git # replace with your own fork or the IIASA repo
   cd ixmp
   git remote add upstream https://github.com/iiasa/ixmp
   git pull upstream main
   git fetch --all --tags

Install ``ixmp``
----------------

1. Navigate to the local ``ixmp`` repo root directory.
2. Ensure you're on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up-to-date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .[docs,tests,tutorial]

Get ``message-ix-models`` Repository
------------------------------------

.. code:: bash

   git clone https://github.com/username/message-ix-models.git # replace with your own fork or the IIASA repo
   cd message-ix-models
   git remote add upstream https://github.com/iiasa/message-ix-models
   git fetch --all --tags
   git pull upstream main

Install ``message-ix-models``
-----------------------------

1. Navigate to the local ``message-ix-models`` root directory.
2. Ensure you're on the ``main`` branch:

   .. code:: bash

      git checkout main

3. Ensure the branch is up-to-date:

   .. code:: bash

      git pull upstream main

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source:

   .. code:: bash

      pip install --editable .

Install :program:`git-lfs`
--------------------------

UnICC already has :program:`git-lfs` installed on the system, but you may still need to set up large file storage for ``message_data`` or ``message-ix-models``.
Note that you may not have to, if you don't need the large files in these repositories for your work; the benefit of skipping this step is that the corresponding storage space is not used.
If you do need access to those files, follow the instructions below.
The same instructions work from the root directory of either ``message_data`` or ``message-ix-models``.

Load :program:`git-lfs` (if included in your :file:`~/.bash_profile` as written earlier, this line doesn't have to be run):

.. code:: bash

   module load git-lfs

Then, within the root directory of ``message-ix-models`` or ``message_data``, run the following:

.. code:: bash

   git lfs install

Then fetch and pull the LFS files (this might take a while):

.. code:: bash

   git lfs fetch --all
   git lfs pull
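To check whether the LFS objects were actually downloaded, rather than left as small pointer files, :program:`git-lfs` can list the tracked files; in its output, ``*`` marks files whose content is present locally, while ``-`` marks pointers:

.. code:: bash

   git lfs ls-files | head # "*" = content downloaded, "-" = pointer only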
Get ``message_data`` Repository
-------------------------------

.. code:: bash

   git clone git@github.com:username/message_data.git # replace with your own fork or the IIASA repo
   cd message_data
   git remote add upstream https://github.com/iiasa/message_data
   git fetch --all --tags

Install ``message_data``
------------------------

1. Navigate to the local ``message_data`` root directory.
2. Ensure you're on the branch you want to be on:

   .. code:: bash

      git checkout branch # replace "branch" with the branch you want to be on

3. Ensure the branch is up-to-date:

   .. code:: bash

      git pull upstream branch

4. Fetch the version tags:

   .. code:: bash

      git fetch --all --tags

5. Install from source with all options:

   .. code:: bash

      pip install --no-build-isolation --editable .[ci,dl,scgen,tests]

   If the above doesn't work, remove the ``--no-build-isolation``:

   .. code:: bash

      pip install --editable .[ci,dl,scgen,tests]

Also fetch and pull the LFS files:

.. code:: bash

   git lfs fetch --all
   git lfs pull

GAMS
----

From module
~~~~~~~~~~~

GAMS is provided as a module.
Load the module:

.. code:: bash

   module load gams
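To see which GAMS versions the module system provides, and what loading the module changes (for example, additions to ``$PATH``), the usual Lmod inspection commands can be used (assuming Lmod, which UnICC appears to use; see the ``lmod`` path in the job script further below):

.. code:: bash

   module avail gams # list the available gams module versions
   module show gams  # show what loading the module sets up (PATH, etc.)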
Install manually
~~~~~~~~~~~~~~~~

Go to the following website to get the GAMS download: https://www.gams.com/download/

Click on the Linux download link; when the download popup window shows up, right-click and copy the link instead.
Then paste the link into a :program:`wget` command in the terminal to download the file:

.. code:: bash

   mkdir -p ~/downloads # create the folder if it doesn't exist yet
   cd ~/downloads
   wget https://d37drm4t2jghv5.cloudfront.net/distributions/46.5.0/linux/linux_x64_64_sfx.exe

The Linux installation instructions are here: https://www.gams.com/46/docs/UG_UNIX_INSTALL.html

Create a directory where GAMS will be installed and navigate to it (in this case, a folder called :file:`~/opt/gams`):

.. code:: bash

   cd ~
   mkdir opt
   cd opt/
   mkdir gams
   cd gams/

Run the installation file by simply entering the filename (complete with path) on the command line:

.. code:: bash

   ~/downloads/linux_x64_64_sfx.exe # replace with your own path

However, a permissions error was received:

.. code:: bash

   bash: /home/username/downloads/linux_x64_64_sfx.exe: Permission denied

If so, make the file executable:

.. code:: bash

   chmod 754 /home/username/downloads/linux_x64_64_sfx.exe # replace path with your own path to the .exe file

Then try to run the executable file again:

.. code:: bash

   ~/downloads/linux_x64_64_sfx.exe

This should start the installation of GAMS and create a folder in :file:`~/opt/gams` (or wherever GAMS is being installed) called :file:`gams46.5_linux_x64_64_sfx`.
Navigate into this folder:

.. code:: bash

   cd gams46.5_linux_x64_64_sfx

From within :file:`/home/username/opt/gams/gams46.5_linux_x64_64_sfx`, run the ``gams`` command to see if it works (at this point, the full path of the ``gams`` command, :file:`/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams`, has to be used):

.. code:: bash

   → /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   --- Job ? Start 06/11/24 14:18:48 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release     : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date     : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License          : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** GAMS Demo, for EULA and demo limitations see   G240131/0001CB-GEN
   *** https://www.gams.com/latest/docs/UG%5FLicense.html
   *** DC0000  00
   ***
   *** Licensed platform                             : Generic platforms
   *** The installed license is valid.
   *** Evaluation expiration date (GAMS base module) : Jun 29, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 14:18:48 elapsed 0:00:00.001

Based on the output, there already is a :file:`gamslice.txt` license file (located in :file:`~/opt/gams/gams46.5_linux_x64_64_sfx`), whose contents can be checked:

.. code:: bash

   → cat gamslice.txt
   GAMS_Demo,_for_EULA_and_demo_limitations_see_________________
   […]
   https://www.gams.com/latest/docs/UG%5FLicense.html_______________
   […]

This is a demo license, so rename it to :file:`gamslice_demo.txt` so that it can be replaced with a proper license:

.. code:: bash

   mv gamslice.txt gamslice_demo.txt

Copy one of the GAMS licenses in the ECE program folder and put it into the H: drive in a folder called :file:`gams`.
Within UnICC, the H: drive can be accessed via :file:`/hdrive/all_users/username/`.
So, copy the GAMS license from the H: drive to the GAMS installation location (the paths will differ depending on where the file is saved on your own H: drive):

.. code:: bash

   cp /hdrive/all_users/username/gams/gamslice_wCPLEX_2024-12-20.txt /home/username/opt/gams/gams46.5_linux_x64_64_sfx/

Then, within the :file:`/home/username/opt/gams/gams46.5_linux_x64_64_sfx/` folder, rename the :file:`gamslice_wCPLEX_2024-12-20.txt` file to just :file:`gamslice.txt`:

.. code:: bash

   mv gamslice_wCPLEX_2024-12-20.txt gamslice.txt

Now, when the ``gams`` command is called, the output looks like this:

.. code:: bash

   → /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   --- Job ? Start 06/11/24 14:24:43 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release     : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date     : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License          : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** Small MUD - 5 User License                    S230927|0002AP-GEN
   *** IIASA, Information and Communication Technologies Dep.
   *** DC216  01M5CODICLPTMB
   *** License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
   ***
   *** Licensed platform                             : Generic platforms
   *** The installed license is valid.
   *** Maintenance expiration date (GAMS base module): Dec 20, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 14:24:43 elapsed 0:00:00.000

I then add the GAMS path to my :file:`~/.bash_profile`:

.. code:: bash

   # add GAMS to path
   export PATH=$PATH:/home/username/opt/gams/gams46.5_linux_x64_64_sfx

I also add the GAMS aliases:

.. code:: bash

   # add GAMS to aliases
   alias gams=/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gams
   alias gamslib=/home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslib
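For the new ``PATH`` entry and the aliases to take effect in the current session, reload the profile (new login shells pick it up automatically):

.. code:: bash

   source ~/.bash_profile
   type gams # should report the gams alias/location set up above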
Now, running just ``gams`` anywhere in the terminal gives the following output:

.. code:: bash

   → gams
   --- Job ? Start 06/11/24 15:14:28 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   ***
   *** GAMS Base Module 46.5.0 a671108d May 8, 2024 LEG x86 64bit/Linux
   ***
   *** GAMS Development Corporation
   *** 2751 Prosperity Ave, Suite 210
   *** Fairfax, VA 22031, USA
   *** +1 202-342-0180, +1 202-342-0181 fax
   *** support@gams.com, www.gams.com
   ***
   *** GAMS Release     : 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   *** Release Date     : May 8, 2024
   *** To use this release, you must have a valid license file for
   *** this platform with maintenance expiration date later than
   *** Feb 17, 2024
   *** System Directory : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   ***
   *** License          : /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
   *** Small MUD - 5 User License                    S230927|0002AP-GEN
   *** IIASA, Information and Communication Technologies Dep.
   *** DC216  01M5CODICLPTMB
   *** License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
   ***
   *** Licensed platform                             : Generic platforms
   *** The installed license is valid.
   *** Maintenance expiration date (GAMS base module): Dec 20, 2024
   *** Note: For solvers, other expiration dates may apply.
   *** Status: Normal completion
   --- Job ? Stop 06/11/24 15:14:28 elapsed 0:00:00.000

I can also test if GAMS is working properly by running ``gams trnsport``:

.. code:: bash

   → gams trnsport
   --- Job trnsport Start 06/11/24 15:15:00 46.5.0 a671108d LEX-LEG x86 64bit/Linux
   --- Applying:
       /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gmsprmun.txt
   --- GAMS Parameters defined
       Input /home/username/opt/gams/gams46.5_linux_x64_64_sfx/trnsport.gms
       ScrDir /home/username/opt/gams/gams46.5_linux_x64_64_sfx/225a/
       SysDir /home/username/opt/gams/gams46.5_linux_x64_64_sfx/
   Licensee: Small MUD - 5 User License                  S230927|0002AP-GEN
             IIASA, Information and Communication Technologies Dep.  DC216
             /home/username/opt/gams/gams46.5_linux_x64_64_sfx/gamslice.txt
             License Admin: Melanie Weed-Wenighofer, wenighof@iiasa.ac.at
             The maintenance period of the license will expire on Dec 20, 2024
   Processor information: 2 socket(s), 128 core(s), and 256 thread(s) available
   GAMS 46.5.0   Copyright (C) 1987-2024 GAMS Development. All rights reserved
   --- Starting compilation
   --- trnsport.gms(66) 3 Mb
   --- Starting execution: elapsed 0:00:00.022
   --- trnsport.gms(43) 4 Mb
   --- Generating LP model transport
   --- trnsport.gms(64) 4 Mb
   ---   6 rows  7 columns  19 non-zeroes
   --- Range statistics (absolute non-zero finite values)
   --- RHS       [min, max] : [ 2.750E+02, 6.000E+02] - Zero values observed as well
   --- Bound     [min, max] : [        NA,        NA] - Zero values observed as well
   --- Matrix    [min, max] : [ 1.260E-01, 1.000E+00]
   --- Executing CPLEX (Solvelink=2): elapsed 0:00:00.053

   IBM ILOG CPLEX   46.5.0 a671108d May 8, 2024           LEG x86 64bit/Linux

   --- GAMS/CPLEX Link licensed for continuous and discrete problems.
   --- GMO setup time: 0.00s
   --- GMO memory 0.50 Mb (peak 0.50 Mb)
   --- Dictionary memory 0.00 Mb
   --- Cplex 22.1.1.0 link memory 0.00 Mb (peak 0.00 Mb)
   --- Starting Cplex

   Version identifier: 22.1.1.0 | 2022-11-28 | 9160aff4d
   CPXPARAM_Advance                                 0
   CPXPARAM_Simplex_Display                         2
   CPXPARAM_MIP_Display                             4
   CPXPARAM_MIP_Pool_Capacity                       0
   CPXPARAM_MIP_Tolerances_AbsMIPGap                0
   Tried aggregator 1 time.
   LP Presolve eliminated 0 rows and 1 columns.
   Reduced LP has 5 rows, 6 columns, and 12 nonzeros.
   Presolve time = 0.00 sec. (0.00 ticks)

   Iteration      Dual Objective            In Variable           Out Variable
        1              73.125000    x(seattle,new-york)  demand(new-york) slack
        2             119.025000     x(seattle,chicago)   demand(chicago) slack
        3             153.675000    x(san-diego,topeka)    demand(topeka) slack
        4             153.675000  x(san-diego,new-york)   supply(seattle) slack

   --- LP status (1): optimal.
   --- Cplex Time: 0.00sec (det. 0.01 ticks)
   Optimal solution found
   Objective:          153.675000

   --- Reading solution for model transport
   --- Executing after solve: elapsed 0:00:00.482
   --- trnsport.gms(66) 4 Mb
   *** Status: Normal completion
   --- Job trnsport.gms Stop 06/11/24 15:15:01 elapsed 0:00:00.483
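With GAMS working and on ``$PATH``, it is also worth confirming that the MESSAGEix stack picks it up; the report printed by ``message-ix show-versions`` (used earlier) should include the detected GAMS version:

.. code:: bash

   source ~/env/env_name/bin/activate # if the environment is not already active
   message-ix show-versions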
Set Up ``ixmp_dev``
-------------------

If you are a MESSAGEix developer with access to the ``ixmp_dev`` database, set up your access so that the platform can be reached from the cluster.
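One way to do this is with the :program:`ixmp` command-line interface, which registers a named platform in your local ixmp configuration.
The driver, address, and credentials below are placeholders; the actual values must be obtained from the database administrators:

.. code:: bash

   # All values after the platform name are placeholders; obtain the real
   # driver, address, and credentials from the database administrators.
   ixmp platform add ixmp_dev jdbc oracle database.example.com:PORT:SID db_user db_password

   # Confirm the platform is registered
   ixmp platform list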
Running MESSAGEix on the cluster
================================

Example script
--------------

Here is a simple Python script that loads, clones, and solves a MESSAGEix scenario.
Create it by calling :code:`nano ~/job/message/solve.py`, then pasting the following:

.. code:: python

   import message_ix

   # select scenario
   model_orig = "model" # replace with name of real model
   scen_orig = "scenario" # replace with name of real scenario

   # target scenario
   model_tgt = "unicc_test"
   scen_tgt = scen_orig + "_cloned"
   comment = "Cloned " + model_orig + "/" + scen_orig

   # load scenario
   print("Loading scenario...")
   s, mp = message_ix.Scenario.from_url("ixmp://ixmp_dev/" + model_orig + "/" + scen_orig)

   # clone scenario
   print("Cloning scenario...")
   s_new = s.clone(model_tgt, scen_tgt, comment, keep_solution=False)

   # solve the cloned scenario
   print("Solving scenario...")
   s_new.set_as_default()
   s_new.solve("MESSAGE")

   # close db
   print("Closing database...")
   mp.close_db()

Submitting Jobs
---------------

To submit a job, create a new job script; here it is called ``job.do``, but any name and file extension (for example ``submit.job``) would work just as well.
So, run:

.. code:: bash

   nano ~/job/message/job.do

In the editor, write/paste:

.. code:: bash

   #!/bin/bash
   #SBATCH --time=3:00:00
   #SBATCH --mem=40G
   #SBATCH --mail-type=BEGIN,END,FAIL
   #SBATCH --mail-user=username@iiasa.ac.at
   #SBATCH -o /home/username/out/solve_%J.out
   #SBATCH -e /home/username/err/solve_%J.err

   module purge
   source /opt/apps/lmod/8.7/init/bash
   module load Python/3.11.5-GCCcore-13.2.0
   module load Java

   echo "Activating environment..."
   source ~/env/env_name/bin/activate

   echo "Running python script..."
   python ~/job/message/solve.py

This script requests the following:

- 3 hours of time
- 40 GB of memory
- Send an email when the job begins and ends (or fails)
- Send the email to the address provided
- Save the outputs of the job (not the solved scenario, just any print statements in the Python script or anything like that) in ``/home/username/out/``, in a file called ``solve_%J.out``, where the “%J” is the job number
- Same as above, but save the errors in an ``err`` folder. This is helpful when the script outputs a lot of warnings or errors: there is then a separate file for errors/warnings and a separate file for just the output.

Note that full paths are used for the output and error files (replace ``username`` with your own): Slurm does not expand ``~`` in ``#SBATCH`` directives, so a pattern like ``~/out/solve_%J.out`` ends up pointing at a literal ``~`` subdirectory, as can be seen in the ``scontrol`` output further below.

You can choose to forego saving the outputs and errors to files, but it is helpful to have them saved somewhere in case you need to refer back to them or to see what happened during the job.
If using the exact same script as above, you will have to manually create the ``out`` and ``err`` folders in the home directory first, if they don't already exist.
You can do this by running:

.. code:: bash

   mkdir ~/out
   mkdir ~/err

It is important (I think) to load the Python and Java modules.
I'm not sure why the ``source /opt/apps/lmod/8.7/init/bash`` line is there, but ICT included it in an email when I asked for help; it likely initializes the Lmod module system so that the ``module`` command is available in the non-interactive batch shell.

To submit the job, run the following (assuming you are in the folder where ``job.do`` is located):

.. code:: bash

   sbatch job.do

The ``sbatch`` command submits the job; the argument that follows it is your job file.

Checking queue
--------------

To check the status of the job(s) submitted by a user:

.. code:: bash

   squeue -u username

While the job is waiting/pending, your queue may look like this:

.. code:: bash

     JOBID PARTITION  NAME     USER ST  TIME NODES NODELIST(REASON)
   1234567     batch  job1 username PD  0:00     1 (Resources)

The ``ST`` column shows the status of the job; ``PD`` means pending.
When the job is running, the queue may look like this:

.. code:: bash

     JOBID PARTITION  NAME     USER ST  TIME NODES NODELIST(REASON)
   1234567     batch  job1 username  R  0:01     1 node1

Usually my jobs run right away or within a few minutes of being submitted, but sometimes they can sit in the queue for a while.
This is usually because there are a lot of jobs in the queue and the cluster is busy.
To check where the jobs submitted by all users are in the queue:

.. code:: bash

   squeue

Checking job run information
----------------------------

To check information about a specific job, a helpful command is (replace ``1234567`` with the actual job ID):

.. code:: bash

   scontrol show jobid 1234567

Your output will look something like this:

.. code:: bash

   JobId=404543 JobName=job.do
      UserId=mengm(32712) GroupId=mengm(60100) MCS_label=N/A
      Priority=10000 Nice=0 Account=default QOS=normal
      JobState=FAILED Reason=NonZeroExitCode Dependency=(null)
      Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=1:0
      DerivedExitCode=0:0
      RunTime=00:00:11 TimeLimit=03:00:00 TimeMin=N/A
      SubmitTime=2025-01-22T05:56:31 EligibleTime=2025-01-22T05:56:31
      AccrueTime=2025-01-22T05:56:31
      StartTime=2025-01-22T05:56:35 EndTime=2025-01-22T05:56:46 Deadline=N/A
      PreemptEligibleTime=2025-01-22T05:56:35 PreemptTime=None
      SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-01-22T05:56:35 Scheduler=Backfill
      Partition=generic AllocNode:Sid=10.42.153.116:248
      ReqNodeList=(null) ExcNodeList=(null)
      NodeList=compute2 BatchHost=compute2
      NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
      ReqTRES=cpu=1,mem=40G,node=1,billing=1
      AllocTRES=cpu=1,mem=40G,node=1,billing=1
      Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
      JOB_GRES=(null)
        Nodes=compute2 CPU_IDs=2 Mem=40960 GRES=
      MinCPUsNode=1 MinMemoryNode=40G MinTmpDiskNode=0
      Features=(null) DelayBoot=00:00:00
      OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
      Command=/home/mengm/job/message/job.do
      WorkDir=/home/mengm
      StdErr=/home/mengm/~/err/solve_%J.err
      StdIn=/dev/null
      StdOut=/home/mengm/~/out/solve_%J.out
      Power=
      MailUser=username@iiasa.ac.at MailType=BEGIN,END,FAIL

Here you see the job information, including the submit time, the associated commands/files, and the output files.
(Note the literal ``~`` in the ``StdErr`` and ``StdOut`` paths here: this job was submitted with ``~`` in the ``#SBATCH -o``/``-e`` directives, which Slurm did not expand.)
Additionally, here you can see the resources requested and allocated for the job, such as the number of nodes, CPUs, memory, etc.

The ``JobState`` will show the status of the job.
If it is ``FAILED``, the ``Reason`` will show why it failed.
The ``ExitCode`` shows the exit code of the job: ``0:0`` means the job ran successfully, while a non-zero code such as ``1:0`` means the job failed.
When my job fails, I usually go ahead and check both the ``err`` and ``out`` files to see what happened.
The ``err`` file will show any errors or warnings that occurred during the job, and the ``out`` file will show any print statements or output from the Python script.
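For example, to watch a running job's output live, or to page through a failed job's error log (replace the job ID with your own):

.. code:: bash

   tail -f ~/out/solve_1234567.out # follow the job's print output as it runs
   less ~/err/solve_1234567.err    # page through warnings/errors after a failure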
Another useful command to check recent jobs and their information is:

.. code:: bash

   sacct -l

However, this shows a lot of information, so it might be better to run a more specific query, like:

.. code:: bash

   sacct --format=jobid,MaxRSS,MaxVMSize,start,end,CPUTimeRAW,NodeList

Resources to request for reducing MESSAGEix run time
----------------------------------------------------

The following information is based on non-scientific "testing" (goofing around), so take it with a grain of salt.

I have found that requesting more CPUs per task can help reduce the run time of a MESSAGEix solve.
For example, a MESSAGE job with ``#SBATCH --cpus-per-task=4`` took over 30 minutes to finish, whereas the same job with ``#SBATCH --cpus-per-task=16`` took about 20 minutes.
I also tried changing ``#SBATCH --ntasks=1`` to ``#SBATCH --ntasks=4``, but that didn't seem to make a difference in run time.
So usually my ``SBATCH`` job request settings look like this:

.. code:: bash

   #SBATCH --time=20:00:00
   #SBATCH --mem=100G
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --cpus-per-task=16

I usually request lots of run time (20 hours) and lots of memory (100 GB) because I don't want my job to fail for those reasons.

.. caution::
   Many users making such requests simultaneously is likely to worsen congestion on UnICC and make it less usable for all users.
   A better approach is to use one's own best estimates of the actual resource use, multiplied by a safety factor.

I keep ``--nodes=1`` because I don't know enough about running on multiple nodes, and since I don't really do any parallel computing, I don't think I need to request more than one node.
In general, I'm sure there are other settings to play around with to optimize job run time, perhaps including on the CPLEX side, but I haven't really looked into that; this is just what I've found so far.

Note on memory
--------------

If ``--mem`` is not specified, the default amount of memory assigned to the job is 2 GB.
I think more CPUs per job could also be requested instead, which would also give more memory (2 GB times the number of CPUs), but it is simpler to just request more memory directly.
I especially recommend this when running legacy reporting, which requires quite a bit of memory; the job might fail if not enough memory is requested.

Changes
=======

2025-01-16
   Initial version of the guide by :gh-user:`measrainsey`.