.. _checkpointing: ********************************************* Checkpointing Overview ********************************************* .. Git Repo SHA1 ID: a1abdd291b75176d6581df41329781ae5d5e1b7d .. note:: The simulations and visualizations in this tutorial were generated using Blender 2.67 and CellBlender 1.0 RC. It may or may not work with other versions. Checkpointing allows you to stop a simulation at a specified iteration and resume it at some later point. This can be beneficial for several different reasons: * You are using any sort of multi-user system that requires you to share time with others * The computer you are using crashes or is shutdown unexpectedly * There are parameters you want to change partway through a simulation We'll cover how to set up checkpointing in the next several sections, starting with a simple case where we modify a couple parameters. .. contents:: :local: .. _checkpointing_mdl: Creating the MDL --------------------------------------------- Eventually, it will be possible to use checkpointing directly within CellBlender. Until that time, you can still do it by manually hand-editing some files. Inside of **/home/user/mcell_tutorial**, type the following command:: mkdir -p change_dc/change_dc_files/mcell .. note:: CellBlender expects a certain directory structure for visualization. That is why we are creating these sub-directories in a very specific way. Then within that directory (i.e. **/home/user/mcell_tutorial/change_dc/change_dc_files/mcell**), create a file called **change_dc1.mdl**. Add the following text to that file: .. code-block:: mdl :emphasize-lines: 1-3 CHECKPOINT_INFILE = "dc_chkpt" CHECKPOINT_OUTFILE = "dc_chkpt" CHECKPOINT_ITERATIONS = 100 ITERATIONS = 200 TIME_STEP = 1E-6 DEFINE_MOLECULES { vol1 {DIFFUSION_CONSTANT_3D = 1E-7} } INSTANTIATE World OBJECT { vol1_rel RELEASE_SITE { SHAPE = SPHERICAL LOCATION = [0,0,0] SITE_DIAMETER = 0.0 MOLECULE = vol1 NUMBER_TO_RELEASE = 100 } } sprintf(seed,"%05g",SEED) VIZ_OUTPUT { MODE = CELLBLENDER FILENAME = "./viz_data/seed_" & seed & "/Scene" MOLECULES { NAME_LIST {vol1} ITERATION_NUMBERS {ALL_DATA @ ALL_ITERATIONS} } } :index:`\ ` :index:`\ ` :index:`\ ` There are three new commands in this file (which have been highlighted): **CHECKPOINT_INFILE**, **CHECKPOINT_OUTFILE**, and **CHECKPOINT_ITERATIONS**. As we mentioned earlier, checkpointing allows you to stop a simulation and resume it later. This is accomplished by means of a checkpoint file that is written (**CHECKPOINT_OUTFILE**) when the simulation is temporarily stopped and later read (**CHECKPOINT_INFILE**) when the simulation is resumed. The value assigned to these two commands is the name of the file that is written or read. In this case, they both have the same name, although that is not required. **CHECKPOINT_ITERATIONS** indicates at what iteration the simulation is temporarily stopped and the checkpoint file is created. Now make a copy of **change_dc1.mdl** called **change_dc2.mdl** by entering the command:: cp change_dc1.mdl change_dc2.mdl Then change the diffusion constant from **1E-7** to **1E-5** in the second mdl. Once again, save and quit. Running the Simulation --------------------------------------------- Now run the first mdl by entering the command:: mcell change_dc1.mdl When it is finished running, enter the command:: ls Notice that a file called **dc_chkpt** was created. This file stores the information needed to recommence running the simulation. Let's finish it now by entering the command:: mcell change_dc2.mdl Visualizing the Results --------------------------------------------- Start Blender. Save your blend file with the name **change_dc.blend** in **/home/user/mcell_tutorial/change_dc**. Be careful to name it correctly, as the directory structure we set up earlier depends upon it. Normally, this is all handled automatically by CellBlender, but we must be careful when hand-editing files. Delete the default **Cube** now (select and hit **x**), since it's not actually a part of our simulation. Hit **Read Viz Data** under the **Visualize Simulation Results** panel. Hit **Alt-a** to begin playing back the animation. You will notice that the molecules start off moving rather slowly, and then speed up halfway through the simulation, coinciding with the change in diffusion constant. This is just a simple example of one parameter you can change. Here is a partial list of some other parameters that you could change: * **TIME_STEP** * reaction rates * **SURFACE_CLASS** properties (**ABSORPTIVE**, **TRANSPARENT**, **REFLECTIVE**) Time Based Checkpointing --------------------------------------------- Instead of checkpointing at a specific iteration, you can alternatively create a checkpoint at a set time. To do this, replace **CHECKPOINT_ITERATIONS** with **CHECKPOINT_REALTIME**. The value assigned to this is a series of numbers separated by colons. The units and formatting are illustrated below: * **days:hours:minutes:seconds** * **hours:minutes:seconds** * **minutes:seconds** * **seconds** For example, if you set **CHECKPOINT_REALTIME = 1:30**, then the simulation would create a checkpoint after running for 1 minute and 30 seconds. Or if you set **CHECKPOINT_ITERATIONS = 2:6:3:40**, then the simulation would create a checkpoint after running for 2 days, 6 hours, 3 minutes, and 40 seconds. If you want the simulation to automatically continue running after writing a checkpoint file, you have to put the keyword **NOEXIT** at the end of the **CHECKPOINT_REALTIME** command, like this: **CHECKPOINT_REALTIME = 1:30 NOEXIT**. You will know that a checkpoint file has been created, because MCell will report something like this while it is running:: MCell: time = 1098, writing to checkpoint file chkpt (periodic). Checkpointing with SIGUSR1 and SIGUSR2 --------------------------------------------- Sometimes, you need to end a simulation *right now*, but a lot of time can be wasted if you haven't checkpointed recently. To deal with this problem, pass the **SIGUSR1** or **SIGUSR2** flags to the **kill** command along with MCell's PID. If you use **SIGUSR1**, MCell will create a checkpoint and continue running. If you use **SIGUSR2**, MCell will create a checkpoint and end the simulation. You can use the **top** or **ps** commands to find MCell's PID. For example, if your MCell executable is called **mcell**, then type the following command while MCell is running:: ps -e | grep mcell This will output something similar to this:: 7984 pts/4 00:00:10 mcell The first number listed, **7984**, is the PID. Next, enter the following command (using your own PID in place of **7984**):: kill -SIGUSR1 7984 This creates a checkpoint and keeps the simulation running. However, to create a checkpoint and *kill* the simulation, you would enter the following command:: kill -SIGUSR2 7984 You will know that these worked if MCell reports something like this:: MCell: time = 1282, writing to checkpoint file chkpt (user signal detected).