`glurmo` by example

Version 0.1

A brief guide to using the `glurmo` command line utility

Configuration files

Overview

At the core of every glurmo simulation there is a settings directory, .glurmo, which contains three configuration files: settings.json, script_template, and slurm_template. We will start by going through each of these files in turn.

settings.json

settings.json is the file in which you specify the general settings of your simulation as well as the specific parameters for the script and the slurm script. We will go over each of these in turn, but first, an important note: you cannot use comments in settings.json! Unfortunately, comments break the JSON parser in Go’s standard library (standard JSON does not allow comments either). I’m hoping to fix this when I have time, or simply switch to a different file format like TOML. Sorry about this!
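To see the comment problem for yourself, here is a small sketch. It assumes "the parser shipped with Go" refers to encoding/json from the standard library. The helper parseSettings is hypothetical, and the settings values are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseSettings decodes raw settings JSON into a generic map.
// encoding/json follows the JSON spec strictly, so "//" comments are a syntax error.
func parseSettings(raw []byte) (map[string]any, error) {
	var settings map[string]any
	err := json.Unmarshal(raw, &settings)
	return settings, err
}

func main() {
	plain := []byte(`{"general": {"n_sims": 10, "id": "demo"}}`)
	commented := []byte(`{
  // number of simulations
  "general": {"n_sims": 10, "id": "demo"}
}`)

	if _, err := parseSettings(plain); err == nil {
		fmt.Println("plain JSON: ok")
	}
	if _, err := parseSettings(commented); err != nil {
		fmt.Println("commented JSON:", err)
	}
}
```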

General settings

General settings are stored under the “general” sub-object of the overall settings object. Right now, there are only two things you need to specify here: the number of simulations (“n_sims”) and the simulation id (“id”). The “id” setting should be unique, since this is how glurmo identifies the simulations associated with this study. If you reuse an id across studies, you may run into trouble when cancelling jobs, because glurmo may not be able to tell which jobs belong to which study.
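As an illustration, the “general” object maps naturally onto a small Go struct. Only the JSON keys n_sims and id come from this guide. The struct names, the validation rules, and the example id are assumptions, so this should not be read as glurmo’s actual loader.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// General mirrors the "general" sub-object of settings.json.
// Go field names are illustrative; only the JSON keys come from the guide.
type General struct {
	NSims int    `json:"n_sims"`
	ID    string `json:"id"`
}

// Settings holds only the part of settings.json this sketch needs.
type Settings struct {
	General General `json:"general"`
}

// loadGeneral parses settings JSON and checks the two general fields.
func loadGeneral(raw []byte) (General, error) {
	var s Settings
	if err := json.Unmarshal(raw, &s); err != nil {
		return General{}, err
	}
	if s.General.NSims <= 0 {
		return General{}, errors.New("general.n_sims must be positive")
	}
	if s.General.ID == "" {
		return General{}, errors.New("general.id must be set")
	}
	return s.General, nil
}

func main() {
	// Hypothetical values for illustration.
	raw := []byte(`{"general": {"n_sims": 100, "id": "normal_mean_study"}}`)
	g, err := loadGeneral(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d simulations with id %q\n", g.NSims, g.ID)
}
```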

Script settings

Script settings are stored under the “templates” sub-object of the overall settings object. Entries here represent parameters that you may want to vary across your simulation scripts or slurm scripts. For example, this settings file specifies the number of data points to simulate (N) as well as the number of CPUs to use in the simulation study, among other things.

There are two settings that must be specified under script settings: “script_extension” and “result_extension.” The first tells glurmo what kind of scripts you’re running. In this case, we’re running our simulations with R, so we’re using the “.R” extension. The second tells glurmo the file extension for the results, i.e. how we’ll store the results of a simulation. In this case, we’re storing our results as .RData files, so the “result_extension” is “.RData”.

script_template

The script_template file serves as a template for (you guessed it) your scripts. A script template is a mix of code in a particular language (in this case, R) and templating markup, denoted by pairs of curly brackets, i.e. {{.parameter}}. The key utility of this file is that, for each simulation, the templating markup is replaced by its corresponding parameter in the “templates” section of settings.json. For example, {{.N}} will be replaced with 100 in this particular simulation.
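The {{.parameter}} syntax looks like Go’s text/template, so here is a minimal sketch of the substitution under that assumption. This is not glurmo’s implementation. The R fragment is made up, and the helper renderScript is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// renderScript fills a script template with values from the "templates" settings.
// missingkey=error makes an undefined {{.name}} fail instead of printing "<no value>".
func renderScript(src string, params map[string]any) (string, error) {
	tmpl, err := template.New("script").Option("missingkey=error").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, params); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// A hypothetical R fragment; only the {{.N}} placeholder comes from the guide.
	src := "N <- {{.N}}\nx <- rnorm(N)\n"
	out, err := renderScript(src, map[string]any{"N": 100})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(out)
}
```

Failing on missing keys is a design choice of this sketch. A typo in a placeholder then surfaces as an error rather than as a silently broken script.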

If you could only use a static set of variables, glurmo would not be very useful, since it would just create and run the same script a certain number of times. This might yield different results, but it wouldn’t be reproducible. That is why glurmo makes a couple of “script-specific” variables available to you, even though they aren’t specified in settings.json: index (see line 1 of the example script template) and results_path (see line 33 of the same example).

index is the number of the script you’re in, and it is zero-indexed: for the first script it takes the value 0, for the second 1, and so on. results_path is the path you should use to save the result of the current simulation. What you save and how you save it is up to you, but you must create a file at the results path with the result extension from settings.json. Otherwise, glurmo has no way of knowing that this simulation has completed. This can be as simple as creating an empty file with the extension you set in “result_extension”.

slurm_template

slurm_template is analogous to script_template, but for templating your slurm submissions. Again, we use templating markup to substitute in parameters from settings.json, and again glurmo makes certain script-specific variables available to you. These are:

  • job_id: this is the id of the specific simulation. Note that you must set the job name to {{.job_id}} for glurmo to be able to properly manage your simulations.
  • error_path: this is the path to the error output file for this simulation, which will be /absolute/path/to/dir/slurm_errors/error___{index}
  • output_path: this is the path to the general output file for this simulation, which will be /absolute/path/to/dir/slurm_out/output___{index}
  • path_to_script: this is the absolute path to the script for this simulation, which will be /absolute/path/to/dir/scripts/script_{index}{extension}
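The variables above can be illustrated with a sketch that builds them for one simulation and renders a small slurm template. The path patterns follow the list above, with root standing in for /absolute/path/to/dir. The job id is passed in as-is because its exact format isn’t described here. The "cpus" parameter, the Rscript call, and the helper names are all hypothetical. The sbatch options --job-name, --cpus-per-task, --error and --output are standard Slurm flags.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"text/template"
)

// slurmVars builds the script-specific variables for simulation i using the
// path patterns from the guide. jobID is passed in unchanged (format unspecified).
func slurmVars(root, jobID string, i int, scriptExt string) map[string]any {
	return map[string]any{
		"job_id":         jobID,
		"error_path":     filepath.Join(root, "slurm_errors", fmt.Sprintf("error___%d", i)),
		"output_path":    filepath.Join(root, "slurm_out", fmt.Sprintf("output___%d", i)),
		"path_to_script": filepath.Join(root, "scripts", fmt.Sprintf("script_%d%s", i, scriptExt)),
	}
}

// renderSlurm merges settings parameters with the script-specific variables
// (in this sketch the latter win on name clashes) and executes the template.
func renderSlurm(src string, params, vars map[string]any) (string, error) {
	data := make(map[string]any, len(params)+len(vars))
	for k, v := range params {
		data[k] = v
	}
	for k, v := range vars {
		data[k] = v
	}
	tmpl, err := template.New("slurm").Option("missingkey=error").Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

// A hypothetical slurm template; "cpus" is an illustrative settings parameter.
const slurmTemplate = `#!/bin/bash
#SBATCH --job-name={{.job_id}}
#SBATCH --cpus-per-task={{.cpus}}
#SBATCH --error={{.error_path}}
#SBATCH --output={{.output_path}}

Rscript {{.path_to_script}}
`

func main() {
	vars := slurmVars("/absolute/path/to/dir", "my_study", 0, ".R")
	out, err := renderSlurm(slurmTemplate, map[string]any{"cpus": 1}, vars)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```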

Conclusion

If this all sounds somewhat abstract right now, that’s fine; it will become much clearer in the next section, when we set up and run this simulation. The key takeaway is that with just these three files we can specify the script, the slurm script, and the parameter settings of a simulation. This might seem like overkill right now, and it is: this particular example could be run just as well using a Slurm job array. But there are still a few benefits to using glurmo even in this simple setting, as we’ll see in the next section.

Last updated on 26 Aug 2024
Published on 26 Aug 2024