1. Getting Started

This notebook walks through an example of how to run RADICAL-Pilot (RP) on any Linux or macOS machine.

The tutorial includes an example of how to execute a workload of tasks using a Bag of Tasks (BoT) approach.

1.1. Activate your environment

source ~/.virtualenvs/radical-pilot-env/bin/activate

1.2. Check the versions

!radical-stack

python               : /home/workstation/.local/share/virtualenvs/radical-pilot-env/bin/python3
pythonpath           :
version              : 3.6.15
virtualenv           : /home/workstation/.local/share/virtualenvs/radical-pilot-env

radical.gtod         : 1.6.7
radical.pilot        : 1.13.0-v1.13.0-161-gef63995ca@feature-issue_1578
radical.saga         : 1.13.0
radical.utils        : 1.13.0

Load the environment variables from a .env file. To read about how to set up .env for RP, see this. RADICAL_PILOT_DBURL must be set in the .env file for RP to work.

[1]:
%load_ext dotenv
%dotenv ../../../.env

cannot find .env file
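Because the cell above can report a missing .env file, it may help to confirm that RADICAL_PILOT_DBURL is actually set before creating a session. The following is a minimal sketch using only the standard library; `check_dburl` is a hypothetical helper (not part of RP), and the check for a `mongodb` scheme is an assumption based on the session output shown later in this notebook.

```python
import os
from urllib.parse import urlparse


def check_dburl(env=None):
    """Return RADICAL_PILOT_DBURL, or raise if it is missing or looks wrong.

    Hypothetical helper, not part of RADICAL-Pilot.
    """
    env = os.environ if env is None else env
    url = env.get('RADICAL_PILOT_DBURL')
    if not url:
        raise RuntimeError('RADICAL_PILOT_DBURL is not set')
    # Assumption: the URL uses a mongodb scheme, as in this tutorial's output.
    if not urlparse(url).scheme.startswith('mongodb'):
        raise RuntimeError('RADICAL_PILOT_DBURL does not look like a MongoDB URL')
    return url
```

Calling `check_dburl()` without arguments inspects the current process environment, so it can be run right after the `%dotenv` cell.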
[2]:
import os
import sys

import radical.pilot as rp
import radical.utils as ru

1.3. Reporter for a better visualization

All code examples in this guide use the reporter facility of RADICAL-Utils to print well-formatted runtime and progress information. You can control that output with the RADICAL_PILOT_REPORT environment variable, which can be set to TRUE or FALSE to enable or disable reporter output. We assume the setting is TRUE when referring to any output in this chapter.
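One way to set the variable from within the notebook is through `os.environ`. This is a small sketch, assuming the variable is read when the reporter is created, so it should be set before that point; `set_rp_report` is a hypothetical helper name, not an RP function.

```python
import os


def set_rp_report(enabled, env=None):
    """Set RADICAL_PILOT_REPORT to 'TRUE' or 'FALSE' and return the new value.

    Hypothetical helper; by default it writes to the process environment.
    """
    env = os.environ if env is None else env
    env['RADICAL_PILOT_REPORT'] = 'TRUE' if enabled else 'FALSE'
    return env['RADICAL_PILOT_REPORT']
```

For example, `set_rp_report(True)` would enable reporter output for the cells that follow.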

[3]:
report = ru.Reporter(name='radical.pilot')
report.title('Getting Started (RP version %s)' % rp.version)

================================================================================
 Getting Started (RP version 1.18.1)
================================================================================


1.4. Setting up the session

Create a new session. A radical.pilot.Session is the root object for all other objects in RADICAL-Pilot: radical.pilot.PilotManager and radical.pilot.TaskManager instances are always attached to a Session, and their lifetime is controlled by the session.

[4]:
session = rp.Session()

new session: [rp.session.e8881108-70bb-11ed-ab2e-0242ac110002]                 \
database   : [mongodb://kartikmodi:****@95.217.193.116:27017/rp_km]           ok

[5]:
pmgr = rp.PilotManager(session=session)
tmgr = rp.TaskManager(session=session)
create pilot manager                                                          ok
create task manager                                                           ok

1.5. Resource configuration

RP supports two levels of resource configuration: pilot level and task level.

1.6. Pilot level resource specification

We first specify how many cores and GPUs the pilot requires via the pilot description:

[6]:
resource = 'local.localhost'
#TODO
# config = ru.read_json('%s/config.json'
#                             % os.path.dirname(__file__)).get(resource, {})

pd_init = {'resource'      : resource,
           'runtime'       : 30,  # pilot runtime (min)
           'exit_on_error' : True,
           'project'       : None,
           'queue'         : None,
           'cores'         : 1,
           'gpus'          : 0
          }
pdesc = rp.PilotDescription(pd_init)

# pdesc.resource       = 'local.localhost'
# pdesc.runtime        = 30  # pilot runtime (min)
# pdesc.exit_on_error  = True
# pdesc.cores          = 4
# pdesc.gpus           = 1

1.7. Setting up the Pilots

Pilots are created via a radical.pilot.PilotManager, by passing a radical.pilot.PilotDescription. The most important elements of the PilotDescription are:

resource: a label which specifies the target resource, either local or remote, on which to run the pilot, i.e., the machine on which the pilot executes;
cores: the number of CPU cores the pilot is expected to manage, i.e., the size of the pilot;
runtime: the number of minutes the pilot is expected to be active, i.e., the runtime of the pilot.
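As a quick sanity check before submitting, one could verify that a description dictionary carries the three elements listed above. This is only an illustrative sketch: `missing_pilot_keys` is a hypothetical helper, and RP itself may apply defaults or different validation rules.

```python
# The three elements highlighted in the text above.
REQUIRED_KEYS = ('resource', 'cores', 'runtime')


def missing_pilot_keys(pd_init):
    """Return the highlighted keys that are absent or None in pd_init."""
    return [key for key in REQUIRED_KEYS if pd_init.get(key) is None]


# Same values as the pilot description used in this notebook.
pd_init = {'resource': 'local.localhost',
           'runtime' : 30,
           'cores'   : 1,
           'gpus'    : 0}

if missing_pilot_keys(pd_init):
    raise ValueError('incomplete pilot description: %s'
                     % missing_pilot_keys(pd_init))
```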

Now that we have created the pilot description, let’s launch the pilot.

[7]:
pilot = pmgr.submit_pilots(pdesc)
submit 1 pilot(s)
        pilot.0000   local.localhost           1 cores       0 gpus           ok

Once the pilot is launched, it must be registered with a TaskManager object:

[8]:
tmgr.add_pilots(pilot)

1.8. Task level resource specification

For a more fine-grained resource specification, RP lets you specify how many CPU processes, threads per process, and GPUs a task requires via the task description:

[9]:
def shell_task():
    t = rp.TaskDescription()
    t.stage_on_error = True
    t.executable     = '/bin/date'
    t.cpu_processes  = 1
    t.cpu_threads    = 2
    t.gpu_processes  = 1
    t.gpu_threads    = 1
    return t
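To get a feel for what such a description asks for, the sketch below computes the number of cores one task would occupy. It assumes that each process occupies cpu_threads cores, which is an assumption about RP's accounting rather than something stated in this tutorial; `requested_cores` is a hypothetical helper, not an RP function.

```python
def requested_cores(cpu_processes=1, cpu_threads=1):
    """Cores requested by one task, assuming one core per thread."""
    return cpu_processes * cpu_threads


# Values from shell_task() above: 1 process with 2 threads.
cores_per_task = requested_cores(cpu_processes=1, cpu_threads=2)

# A bag of 10 such tasks requests this many cores in total
# (not necessarily all at the same time).
total_cores = 10 * cores_per_task
```

How many of these tasks actually run concurrently depends on the resources the pilot holds.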
[10]:
bot = list()  # bag of tasks: collects the task descriptions to submit
for i in range(0, 10):
    task = shell_task()
    bot.append(task)
    report.progress()

report.progress_done()
..........
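The tasks in a bag do not have to be identical. The sketch below builds a bag of plain dictionaries whose arguments differ per task. `make_bag` is a hypothetical helper, and the keys `executable` and `arguments` are assumed to match the corresponding task description attributes.

```python
def make_bag(n, executable='/bin/echo'):
    """Build n task dictionaries that differ only in their arguments."""
    return [{'executable': executable,
             'arguments' : ['task', str(i)]}
            for i in range(n)]


bag = make_bag(10)
```

Each dictionary could then be turned into a task description, for example with `rp.TaskDescription(d)`, assuming it accepts a dictionary the way `rp.PilotDescription(pd_init)` does above.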

Submit the previously created task descriptions to the TaskManager. This will trigger the selected scheduler to start assigning tasks to the pilots.

[11]:
tmgr.submit_tasks(bot)
submit: ########################################################################

[11]:
[<Task object, uid task.000000>,
 <Task object, uid task.000001>,
 <Task object, uid task.000002>,
 <Task object, uid task.000003>,
 <Task object, uid task.000004>,
 <Task object, uid task.000005>,
 <Task object, uid task.000006>,
 <Task object, uid task.000007>,
 <Task object, uid task.000008>,
 <Task object, uid task.000009>]

Now wait for all tasks to reach a final state (DONE, CANCELED or FAILED).

[12]:
tmgr.wait_tasks()
wait  : ########################################################################
     DONE      :    10
                                                                              ok

[12]:
['DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE',
 'DONE']
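With larger bags, a list of state strings like the one above is easier to read as a summary. The following sketch counts the states with the standard library; `summarize_states` and `all_done` are hypothetical helper names, and the input is assumed to be a list of state strings like the one shown above.

```python
from collections import Counter


def summarize_states(states):
    """Count how many tasks ended in each final state."""
    return dict(Counter(states))


def all_done(states):
    """True only if every task reached the DONE state."""
    return all(state == 'DONE' for state in states)


states = ['DONE'] * 10           # same shape as the output above
print(summarize_states(states))  # prints {'DONE': 10}
```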

Once the wait is finished, clean up the session:

[13]:
report.header('finalize')
session.close(cleanup=True)

--------------------------------------------------------------------------------
finalize

closing session rp.session.e8881108-70bb-11ed-ab2e-0242ac110002                \
close task manager                                                            ok
close pilot manager                                                            \
wait for 1 pilot(s)
        O/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|/-\|      0                                                          timeout
                                                                              ok
session lifetime: 89.7s                                                       ok
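Since the lifetime of the managers is controlled by the session, it can be useful to make sure the session is closed even if the workload raises an exception. The following is a minimal sketch of that pattern; `run_with_session` is a hypothetical helper, and it only assumes that the session object has a `close(cleanup=True)` method as used above.

```python
def run_with_session(make_session, work):
    """Create a session, run work(session), and always close the session.

    make_session: callable returning a session-like object
                  (for example rp.Session in this notebook)
    work        : callable that receives the session and runs the workload
    """
    session = make_session()
    try:
        return work(session)
    finally:
        # Runs on success and on error, mirroring the cleanup cell above.
        session.close(cleanup=True)
```

In this notebook, the helper could be called as `run_with_session(rp.Session, my_workload)`, where `my_workload` is a placeholder for a function that creates the managers, submits the pilot and tasks, and waits for them.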