1. Getting Started
This notebook walks through an example of how to run RADICAL-Pilot (RP) on any Linux or macOS machine.
The tutorial includes an example of how to execute a workload of tasks using a Bag of Tasks (BoT) approach.
1.1. Activate your environment
source ~/.virtualenvs/radical-pilot-env/bin/activate
1.2. Check the versions
!radical-stack
python : /home/workstation/.local/share/virtualenvs/radical-pilot-env/bin/python3
pythonpath :
version : 3.6.15
virtualenv : /home/workstation/.local/share/virtualenvs/radical-pilot-env
radical.gtod : 1.6.7
radical.pilot : 1.13.0-v1.13.0-161-gef63995ca@feature-issue_1578
radical.saga : 1.13.0
radical.utils : 1.13.0
Load the environment variables from a .env file. To read about how to set up .env for RP, see this. RADICAL_PILOT_DBURL must be defined in the .env file for RP to work.
[1]:
%load_ext dotenv
%dotenv ../../../.env
cannot find .env file
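The output above shows what happens when the .env file is not found. As a minimal sketch, assuming a .env file made of simple KEY=VALUE lines, the following standard-library code checks that RADICAL_PILOT_DBURL is present before any RP object is created. The helpers `load_env_file` and `missing_keys` are hypothetical names introduced here for illustration; they are not part of RP or of the dotenv extension, and the URL in the demo is a placeholder.

```python
import os
import tempfile


def load_env_file(path):
    """Parse simple KEY=VALUE lines from `path`; return {} if the file is missing."""
    values = {}
    if not os.path.isfile(path):
        return values
    with open(path) as fin:
        for line in fin:
            line = line.strip()
            # skip blank lines, comments, and lines without an assignment
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, val = line.partition('=')
            values[key.strip()] = val.strip().strip('"\'')
    return values


def missing_keys(values, required=('RADICAL_PILOT_DBURL',)):
    """Return the required keys that are absent or empty in `values`."""
    return [key for key in required if not values.get(key)]


# Demo with a throwaway file; the URL is a placeholder, not a real endpoint.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, '.env')
    with open(env_path, 'w') as fout:
        fout.write('# RP settings\n')
        fout.write('RADICAL_PILOT_DBURL="mongodb://user:pass@host:27017/db"\n')

    env_ok      = load_env_file(env_path)
    env_missing = load_env_file(os.path.join(tmp, 'no-such.env'))

print(missing_keys(env_ok))
print(missing_keys(env_missing))
```

An empty list from `missing_keys` means the required variable was found; a non-empty list names what still has to be added to the .env file.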
[2]:
import os
import sys
import radical.pilot as rp
import radical.utils as ru
1.3. Reporter for a better visualization
All code examples in this guide use the reporter facility of RADICAL-Utils to print well-formatted runtime and progress information. You can control that output with the RADICAL_PILOT_REPORT variable, which can be set to TRUE or FALSE to enable or disable reporter output. We assume the setting to be TRUE when referencing any output in this chapter.
[3]:
report = ru.Reporter(name='radical.pilot')
report.title('Getting Started (RP version %s)' % rp.version)
================================================================================
Getting Started (RP version 1.18.1)
================================================================================
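The reporter switch described above is an environment variable, so it can be set from Python before the reporter is created. The sketch below only illustrates that TRUE/FALSE convention: `report_enabled` is a hypothetical helper, and how RADICAL-Utils itself interprets the value (for example, which spellings it accepts) is an assumption here, not something this tutorial specifies.

```python
import os

# Variable name taken from the text above; accepted spellings are assumed.
REPORT_VAR = 'RADICAL_PILOT_REPORT'


def report_enabled(environ=None, default=True):
    """Hypothetical helper: interpret the variable as a TRUE/FALSE switch."""
    environ = os.environ if environ is None else environ
    value = environ.get(REPORT_VAR)
    if value is None:
        return default
    return value.strip().upper() == 'TRUE'


# Set the variable before creating the reporter so the setting is in place.
os.environ[REPORT_VAR] = 'TRUE'
print(report_enabled())

# The same check against an explicit mapping instead of the process environment.
print(report_enabled({REPORT_VAR: 'FALSE'}))
```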
1.4. Setting up the session
Create a new session. A radical.pilot.Session is the root object for all other objects in RADICAL-Pilot: radical.pilot.PilotManager and radical.pilot.TaskManager instances are always attached to a Session, and their lifetime is controlled by the session.
[4]:
session = rp.Session()
new session: [rp.session.e8881108-70bb-11ed-ab2e-0242ac110002] \
database : [mongodb://kartikmodi:****@95.217.193.116:27017/rp_km] ok
[5]:
pmgr = rp.PilotManager(session=session)
tmgr = rp.TaskManager(session=session)
create pilot manager ok
create task manager ok
1.5. Resource configuration
RP supports two levels of resource configuration: pilot level and task level.
1.6. Pilot level resource specification
We first specify how many cores and GPUs the pilot requires via a pilot description:
[6]:
resource = 'local.localhost'
#TODO
# config = ru.read_json('%s/config.json'
# % os.path.dirname(__file__)).get(resource, {})
pd_init = {'resource'      : 'local.localhost',
           'runtime'       : 30,  # pilot runtime (min)
           'exit_on_error' : True,
           'project'       : None,
           'queue'         : None,
           'cores'         : 1,
           'gpus'          : 0
          }
pdesc = rp.PilotDescription(pd_init)
# pdesc.resource = 'local.localhost'
# pdesc.runtime = 30 # pilot runtime (min)
# pdesc.exit_on_error = True
# pdesc.cores = 4
# pdesc.gpus = 1
1.7. Setting up the Pilots
Pilots are created via a radical.pilot.PilotManager, by passing a radical.pilot.PilotDescription. The most important elements of the PilotDescription are:
resource: a label that specifies the target resource, either local or remote, on which to run the pilot, i.e., the machine on which the pilot executes;
cores: the number of CPU cores the pilot is expected to manage, i.e., the size of the pilot;
runtime: the number of minutes the pilot is expected to be active, i.e., the runtime of the pilot.
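Because the pilot description starts out as a plain dictionary, it can be checked before it is handed to radical.pilot.PilotDescription. The following is a minimal sketch of such a pre-flight check for the three elements listed above. `check_pilot_description` and `REQUIRED_KEYS` are hypothetical names, and the rules they enforce are assumptions made for illustration; RP performs its own validation, which may differ.

```python
REQUIRED_KEYS = ('resource', 'cores', 'runtime')


def check_pilot_description(pd):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    # the three elements named in the text must be present
    for key in REQUIRED_KEYS:
        if pd.get(key) is None:
            problems.append('missing %s' % key)
    # numeric elements, when given, must be non-negative integers
    for key in ('cores', 'gpus', 'runtime'):
        val = pd.get(key)
        if val is not None and (not isinstance(val, int) or val < 0):
            problems.append('%s must be a non-negative integer' % key)
    return problems


good = {'resource': 'local.localhost', 'runtime': 30, 'cores': 1, 'gpus': 0}
bad  = {'resource': 'local.localhost', 'cores': -4}

print(check_pilot_description(good))
print(check_pilot_description(bad))
```

Returning a list of problems rather than raising on the first one makes it easy to report every issue in a description at once.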
Now that we have created the pilot description, let’s launch the pilot.
[7]:
pilot = pmgr.submit_pilots(pdesc)
submit 1 pilot(s)
pilot.0000 local.localhost 1 cores 0 gpus ok
Once the pilot is launched, it must be registered with a TaskManager object:
[8]:
tmgr.add_pilots(pilot)
1.8. Task level resource specification
For a more fine-grained resource specification, RP lets you specify how many CPU processes, threads per process, and GPUs each task requires via a task description:
[9]:
def shell_task():
    t = rp.TaskDescription()
    t.stage_on_error = True
    t.executable     = '/bin/date'
    t.cpu_processes  = 1
    t.cpu_threads    = 2
    t.gpu_processes  = 1
    t.gpu_threads    = 1  # attribute on t (was a stray local variable)
    return t
[10]:
bot = list()  # bag of tasks: collects the task descriptions to submit
for i in range(0, 10):
    task = shell_task()
    bot.append(task)
    report.progress()
report.progress_done()
..........
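A bag of tasks does not have to contain identical tasks. As a minimal sketch, the bag can first be assembled as plain dictionaries whose keys mirror the TaskDescription attributes used above. `make_bag` is a hypothetical helper, and turning each dictionary into a radical.pilot.TaskDescription is left out so that the example runs without RP; it assumes TaskDescription can be built from a dictionary in the same way PilotDescription is above.

```python
def make_bag(executables, copies=1, cpu_threads=1):
    """Hypothetical helper: one plain-dict description per (executable, copy)."""
    bag = []
    for exe in executables:
        for _ in range(copies):
            bag.append({'executable'   : exe,
                        'cpu_processes': 1,
                        'cpu_threads'  : cpu_threads})
    return bag


# Two different executables, five copies each: ten independent tasks.
bag = make_bag(['/bin/date', '/bin/hostname'], copies=5)

print(len(bag))
print(bag[0]['executable'], bag[-1]['executable'])
```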
Submit the previously created task descriptions to the TaskManager. This triggers the selected scheduler to start assigning tasks to the pilots.
[11]:
tmgr.submit_tasks(bot)
submit: ########################################################################
[11]:
[<Task object, uid task.000000>,
<Task object, uid task.000001>,
<Task object, uid task.000002>,
<Task object, uid task.000003>,
<Task object, uid task.000004>,
<Task object, uid task.000005>,
<Task object, uid task.000006>,
<Task object, uid task.000007>,
<Task object, uid task.000008>,
<Task object, uid task.000009>]
Now wait for all tasks to reach a final state (DONE, CANCELED, or FAILED).
[12]:
tmgr.wait_tasks()
wait : ########################################################################
DONE : 10
ok
[12]:
['DONE',
'DONE',
'DONE',
'DONE',
'DONE',
'DONE',
'DONE',
'DONE',
'DONE',
'DONE']
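The call above returns one state string per task. With more than a handful of tasks, a per-state count is easier to read than the raw list. The sketch below summarizes such a list with the standard library only; `summarize` is a hypothetical helper, and the input is stand-in data shaped like the list above with one failure added for illustration.

```python
from collections import Counter

FINAL_STATES = ('DONE', 'CANCELED', 'FAILED')


def summarize(states):
    """Count tasks per final state and report whether every task is DONE."""
    counts   = Counter(states)
    summary  = {state: counts.get(state, 0) for state in FINAL_STATES}
    all_done = bool(states) and summary['DONE'] == len(states)
    return summary, all_done


# Stand-in data: nine successful tasks and one failure.
states = ['DONE'] * 9 + ['FAILED']
summary, all_done = summarize(states)

print(summary)
print(all_done)
```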
Once the wait is finished, clean up the session:
[13]:
report.header('finalize')
session.close(cleanup=True)
--------------------------------------------------------------------------------
finalize
closing session rp.session.e8881108-70bb-11ed-ab2e-0242ac110002 \
close task manager ok
close pilot manager \
wait for 1 pilot(s)
0 timeout
ok
session lifetime: 89.7s ok
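Since the session controls the lifetime of its managers, it is worth making sure it is closed even when a step in between raises an exception. One common way to do that in Python is a try/finally block, sketched below. `StandInSession` is a hypothetical stand-in that mimics only the close(cleanup=True) call used above, so the example runs without RP or a database; `run_workload` is likewise an illustrative name.

```python
class StandInSession:
    """Hypothetical stand-in that mimics only the close() call used above."""

    def __init__(self):
        self.closed = False

    def close(self, cleanup=False):
        self.closed = True


def run_workload(session, work):
    """Run `work(session)` and close the session even if the work raises."""
    try:
        return work(session)
    finally:
        session.close(cleanup=True)


def failing_work(session):
    """Simulate a step that fails partway through the workload."""
    raise RuntimeError('task submission failed')


demo_session = StandInSession()
try:
    run_workload(demo_session, failing_work)
except RuntimeError as exc:
    print('error: %s' % exc)

print(demo_session.closed)
```

With a real session in place of the stand-in, the same structure would wrap the pilot and task submission steps of this tutorial.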