13. Release Notes

For a list of open issues and known problems, see: https://github.com/radical-cybertools/radical.pilot/issues/

13.1. 1.21.0 Release 2023-02-01

  • add worker rank heartbeats to raptor

  • ensure descr defaults for raptor worker submission

  • move blocked_cores/gpus under system_architecture in resource config

  • fix blocked_cores/gpus parameters in configs for ACCESS and ORNL resources

  • fix core-option in JSRun LM

  • fix inconsistency in launching order if some LMs failed to be created

  • fix thread-safety of PilotManager staging operations.

  • add ANL’s polaris and polaris_interactive support

  • refactor raptor dispatchers to worker base class

13.2. 1.20.1 Hotfix Release 2023-01-07

  • fix task cancellation call

13.3. 1.20.0 Release 2022-12-16

  • interactive amarel cfg

  • add docstring for run_task, remove sort

  • add option -r (number of RS per node) is case of GPU tasks

  • add TaskDescription attribute pre_exec_sync

  • add test for Master.wait

  • add test for tasks cancelling

  • add test for TMGR StagingIn

  • add comment for config addition Fixes #2089

  • add TASK_BULK_MKDIR_THRESHOLD as configurable Fixes #2089

  • agent does not need to pull failed tasks

  • bump python test env to 3.7

  • cleanup error reporting

  • document attributes as attr, not data.

  • extended tests for RM PBSPro

  • fix allocated_cores/gpus in PMGR Launching

  • fix commands per rank (either a single string command or list of commands)

  • fix JSRun test

  • fix nodes indexing (node_id)

  • fix option -b (–bind)

  • fix setup procedure for agent staging test(s)

  • fix executor test

  • fix task cancelation if task is waiting in the scheduler wait queue

  • fix Sphinx syntax.

  • fix worker state statistics

  • implement task timeout for popen executor

  • refactor popen task cancellation

  • removed pre_rank and post_rank from Popen executor

  • rename XSEDE to ACCESS #2676

  • reorder env setup per rank (by RP) and consider (enforce) CPU/GPU types

  • reorganized task/rank-execution processes and synced that with launch processes

  • support schema aliases in resource configs

  • task attribute slots is not required in an executor

  • unify raptor and non-raptor prof traces

  • update amarel cfg

  • update RM Fork

  • update RM PBSPro

  • update SRun option cpus-per-task - set the option if cpu_threads > 0

  • update test for PMGR Launching

  • update test for Popen (for pre/post_rank transformation)

  • update test for RM Fork

  • update test for JSRun (w/o ERF)

  • update test for RM PBSPro

  • update profile events for raptor tasks

  • interactive amarel cfg

13.4. 1.18.1 Hotfix Release 2022-11-01

  • fix Amarel configuration

13.5. 1.18.0 Release 2022-10-11

  • move raptor profiles and logfiles into sandboxes

  • consistent use of task modes

  • derive etypes from task modes

  • clarify and troubleshoot raptor.py example

  • docstring update

  • make sure we issue a bootstrap_0_stop event

  • raptor tasks now create rank_start/ranks_stop events

  • reporte allocated resources for RA

  • set MPIRun as default LM for Summit

  • task manager cancel wont block: fixes #2336

  • update task description (focus on ranks)

13.6. 1.17.0 Release 2022-09-15

  • add docker compose recipe.

  • add option -gpu for IBM Spectrum MPI

  • add comet resource config

  • add doc of env variable

  • add interactive schema to frontera config

  • add rcfg inspection utilities

  • also tarball log files, simplify code

  • clarify semantics on file and pwd schemas

  • document programmatical inspection resource definitions

  • ensure RADICAL_SMT setting, document for end user

  • fixed session cache (resolved cachedir)

  • fix ornl resource sbox and summit interactive mode

  • fix session test cleanup

  • keep Spock’s resource config in sync with Crusher’s config

  • make pilot launch and bootstrap CWD-independent

  • make staging schemas consistent for pilot and task staging

  • only use major and minor version for prep_env spec version

  • pilot profiles and logfiles are now transferred as tarball #2663

  • fix scheduler termination

  • remove deprecated FUNCS executor

  • support RP within interactive jobs

  • simple attempt on api level reconnect

  • stage_in.target fix for absolute path Fixes #2590

  • update resource config for Crusher@ORNL

  • use current working tree for docker rp source.

13.7. 1.16.0 Release 2022-08-15

  • add check for exception message

  • add test for Agent_0

  • fix cpu_threads for special tasks (service, sub-agent)

  • fix task[‘resources’] value

  • fix uid generation for components (use shared file for counters)

  • fix master task tmgr

  • fix raptor tests

  • fix rp serializer unittest

  • fix sub_agent keyerror

  • keep agent’s config with sub-agents in sync with default one

  • remove confusion of task attribute names (slots vs. resources)

  • set default values for agent and service tasks descriptions

  • set env variable (RP_PILOT_SANDBOX) for agent and service tasks launchers

  • update exec profile events

  • update headers for mpirun- and mpiexec-modules

  • update LM env setup for MPIRun and MPIExec special case (MPT=true)

  • update LM IBRun

  • update mpi-info extraction

13.8. 1.15.1 Hotfix Release 2022-07-04

  • fix syntactic error in env prep script

13.9. 1.15.0 Release 2022-07-04

  • added tests for PRTE LM

  • added tests for rank_cmd (IBRun and SRun LMs)

  • adding TMGR stats

  • adding xsede.expanse to the resource config

  • always interprete prep_env version request

  • anaconda support for prepare_env

  • Checking input staging exists before tar-ing Fixes #2483

  • ensure pip in venv mode

  • fixed _rm_info in IBRun LM

  • fixed status callback for SAGA Launcher

  • fixed type in ornl.summit_prte config

  • fix Ibrun set rank env

  • fix raptor env vals

  • use os.path to check if file exists Fixes #2483

  • remove node names duplication in SRun LM command

  • hide node-count from saga job description

  • ‘state_history’ is no longer supported

  • support existing VEs for prepare_env

  • updated installation of dependencies in bootstrapper

  • updated PRTE LM setup and config (including new release of PRRTE on Summit)

  • updating PMGR/AGENT stats - see #2401

13.10. 1.14.0 Release 2022-04-13

  • support for MPI function tasks

  • support different RAPTOR workers

  • simplify / unify task and function descriptions

  • refactor resource aquisition

  • pilot submission via PSIJ or SAGA

  • added resource config for Crusher@OLCF/ORNL

  • support for execution of serialized function

  • pilot size can now be specified in number of nodes

  • support for PARSL integration

  • improved SMT handling

  • fixed resource configuration for jsrun

  • fix argument escapes

  • raptor consistently reports exceptions now

13.11. 1.13.0 Release 2022-03-21

  • fix slurm nodefile/nodelist

  • clean temporary setup files

  • fixed test for LM Srun

  • local execution needs to check FORK first

  • fix Bridges-2 resource config

13.12. 1.12.0 Release 2022-02-28

  • fix callback unregistration

  • fix capturing of task exit code

  • fix srun version command

  • fix metric setup / lookup in tmgr

  • get backfilling scheduler back in sync

  • re-introduced LM to handle aprun

  • Remove task log and the state_history

  • ru.Description -> ru.TypedDict

  • set LM’s initial env with activated VE

  • updated LSF handling cores indexing for LM JSRun

  • use proper shell quoting

  • use ru.TypedDict for Munch, fix tests

13.13. 1.11.2 Hotfix Release 2022-01-21

  • for non-mpi tasks, ensure that $RP_RANK is set to 0

13.14. 1.11.0 Release 2022-01-19

  • improve environment isolation for tasks and RCT components

  • add test for LM Srun

  • add resource manager instance to Executor base class

  • add test for blocked cores and gpus parameters (RM base)

  • add unittest to test LM base class initialization from Registry

  • add raptor test

  • add prepare_env example

  • add raptor request and result cb registration

  • avoid shebang use during bootstrap, pip sometimes screws it up

  • detect slurm version and use node file/list

  • enable nvme on summit

  • ensure correct out/err file paths

  • extended GPU handling

  • fix configs to be aligned with env isolation setup

  • fix LM PRTE rank setup command

  • fix cfg.task_environment handling

  • simplify BS env setup

  • forward resource reqs for raptor tasks

  • iteration on flux executor integration

  • limit pymongo version

  • provision radical-gtod

  • reconcile named env with env isolation

  • support Spock

  • support ALCF/JLSE Arcticus and Iris testbeds

  • fix staging behavior under stage_on_error

  • removed dead code

13.15. 1.10.2 Hotfix Release 2021-12-14

  • constrain mongodb version dependency

13.16. 1.10.0 Release 2021-11-22

  • Add fallback for ssh tunnel on ifconfig-less nodes

  • cleanup old resources

  • removed OSG leftovers

  • updating test cases

  • fix recursive flag

13.17. 1.9.2 Hotfix Release 2021-10-27

  • fix shell escaping for task arguments

13.18. 1.9.0 Release 2021-10-18

  • amarel cfg

13.19. 1.8.0 Release 2021-09-23

  • fixed pilot staging for input directories

  • clean up configs

  • disabled os.setsid in Popen executor/spawner (in subprocess.Popen)

  • refreshed module list for Summit

  • return virtenv setup parameters

  • Support for radical.pilot.X links. (@eirrgang)

  • use local virtual env (either venv or conda) for Summit

13.20. 1.6.8 Hotfix Release 2021-08-24

  • adapt flux integration to changes in flux event model

  • fix a merge problem on flux termination handling

13.21. 1.6.7 Release 2021-07-09

  • artifact upload for RA integration test

  • encapsulate kwargs handling for Session.close().

  • ensure state updates

  • fail tasks which can never be scheduled

  • fixed jsrun resource_set_file to use cpu_index_using: logical

  • separate cpu/gpu utilization

  • fix error handling in data stager

  • use methods from the new module host within RU (>=1.6.7)

13.22. 1.6.6 Release 2021-05-18

  • added flags to keep prun aware of gpus (PRTE2 LM)

  • add service node support

  • Bridges mpiexec confing fix

  • task level profiling now python independent

  • executor errors should not affect task bulks

  • revive ibrun support, include layout support

  • MPI standard prescribes -H, not -host

  • remove pilot staging area

  • reduce profiling verbosity

  • restore original env before task execution

  • scattered repex staging fixes

  • slurm env fixes

  • updated documentation for PilotDescription and TaskDescription

13.23. 1.6.5 Release 2021-04-14

  • added flag exclusive for tags (in task description, default False)

  • Adding Bridges2 and Comet

  • always specifu GPU number on srun

  • apply RP+* env vars to raptor tasks

  • avoid a termination race

  • Summit LFS config and JSRUN integration tests

  • gh workflows and badges

  • ensure that RU lock names are unique

  • fixed env creation command and updated env setup check processes

  • fixed launch command for PRTE2 LM

  • fix missing event updates

  • fix ve isolation for prep_env

  • keep track of tagged nodes (no nodes overlapping between different tags)

  • ensure conda activate works

  • allow output staging on failed tasks

  • python 2 -> 3 fix for shebangs

  • remove support for add_resource_config

  • Stampede2 migrates to work2 filesystem

  • update setup module (use python3)

13.24. 1.6.3 Hotfix Release 2021-04-03

  • fix uid assignment for managers

13.25. 1.6.2 Hotfix Release 2021-03-26

  • switch to pep-440 for sdist and wheel versioning, to keep pip happy

13.26. 1.6.1 Release 2021-03-09

  • support for Andes@ORNL, obsolete Rhea@ORNL

  • add_pilot() also accepts pilot dict

  • fixed conda activation for PRTE2 config (Summit@ORNL)

  • fixed partitions handling in LSF_SUMMIT RM

  • reorganized DVM start process (prte2)

  • conf fixes for comet

  • updated events for PRTE2 LM

  • integration test for Bridges2

  • prepare partitioning

13.27. 1.6.0 Release 2021-02-13

  • rename ComputeUnit -> Task

  • rename ComputeUnitDescription -> TaskDescription

  • rename ComputePilot -> Pilot

  • rename ComputePilotDescription -> PilotDescription

  • rename UnitManager -> TaskManager

  • related renames to state and constant names etc

  • backward compatibility for now deprecated names

  • preparation for agent partitioning (RM)

  • multi-DVM support for PRTE.v1 and PRTE.v2

  • RM class tests

  • Bridges2 support

  • fix to co-scheduling tags

  • fix handling of IP variable in bootstrap

  • doc and test updates, linter fixes, etc

  • update scheduler tag types

13.28. 1.5.12 Release 2021-02-02

  • multi-dvm support

  • cleanup of raptor

  • fix for bootstrap profiling

  • fix help string in bin/radical-pilot-create-static-ve

  • forward compatibility for tags

  • fix data stager for multi-pilot case

  • parametric integration tests

  • scattered fixes for raptor and sub-agent profiling

  • support new resource utilization plots

13.29. 1.5.11 Release 2021-01-19

  • cleanup pypi tarball

13.30. 1.5.10 Release 2021-01-18

  • gpu related fixes (summit)

  • avoid a race condition during termination

  • fix bootstrapper timestamps

  • fixed traverse config

  • fix nod counting for FORK rm

  • fix staging context

  • move staging ops into separate worker

  • use C locale in bootstrapper

13.31. 1.5.8 Release 2020-12-09

  • improve test coverage

  • add env isolation prototype and documentation

  • change agent launcher to ssh for bridges

  • fix sub agent init

  • fix Cheyenne support

  • define an intel-friendly bridges config

  • add environment preparation to pilot

  • example fixes

  • fixed procedure of adding resource config to the session

  • fix mpiexec_mpt LM

  • silence scheduler log

  • removed resource aliases

  • updated docs for resource config

  • updated env variable RADICAL_BASE for a job description

  • work around pip problem on Summit

13.32. 1.5.7 Release 2020-10-30

  • Adding init files in all test folders

  • document containerized tasks

  • Fix #2221

  • Fix read_config

  • doc fixes / additions

  • adding unit tests, component tests

  • remove old examples

  • fixing rp_analytics #2114

  • inject workers as MPI task

  • remove debug prints

  • mpirun configs for traverse, stampede2

  • ru.Config is responsible to pick configs from correct paths

  • test agent execution/base

  • unit test for popen/spawn #1881

13.33. 1.5.4 Release 2020-10-01

  • fix jsrun GPU mapping

13.34. 1.5.4 Release 2020-09-14

  • Arbitrary udurations for consumed resources

  • Fix unit tests

  • Fix python stack on Summit

  • add module test

  • added PRTE2 for PRRTEv2

  • added attribute for SAGA job description using env variable (SMT)

  • added config for PRRTE launch method at Frontera

  • added test for PRTE2

  • added test for rcfg parameter SystemArchitecture

  • allow virtenv_mode=local to reuse client ve

  • bulk communication for task overlay

  • fixed db close/disconnect method

  • fixed tests and pylint

  • PRTE fixes / updates

  • remove “debug” rp_version remnant

13.35. 1.5.2 Hotfix Release 2020-08-11

  • add/fix RA prof metrics

  • clean dependencies

  • fix RS file system cache

13.36. 1.5.1 Hotfix Release 2020-08-05

  • added config parameter for MongoDB tunneling

  • applied exception chaining

  • filtering for login/batch nodes that should not be considered (LSF RM)

  • fix for Resource Set file at JSRUN LM

  • support memory required per node at the RP level

  • added Profiler instance into Publisher and Subscriber (zmq.pubsub)

  • tests added and fixed

  • configs for Lassen, Frontera

  • radical-pilot-resources tool

  • document event model

  • comm bulking

  • example cleanup

  • fix agent base dir

  • Fix durations and add defaults for app durations

  • fixed flux import

  • fixing inconsistent nodelist error

  • iteration on task overlay

  • hide passwords on dburl reports / logs

  • multi-master load distribution

  • pep8

  • RADICAL_BASE_DIR -> RADICAL_BASE

  • remove private TMPDIR export - this fixes #2158

  • Remove SKIP_FAILED (unused)

  • support for custom batch job names

  • updated cuda hook for JSRUN LM

  • updated license file

  • updated readme

  • updated version requirement for python (min is 3.6)

13.37. 1.4.1 Hotfix Release 2020-06-09

  • fix tmpdir mosconfiguration for summit / prrte

13.38. 1.4.0 Release 2020-05-12

  • merge #2122: fixed n_nodes for the case when slots are set

  • merge #2123: fix #2121

  • merge #2124: fixed conda-env path definition

  • merge #2127: bootstrap env fix

  • merge #2133, #2138: IBRun fixes

  • merge #2134: agent stage_in test1

  • merge #2137: agent_0 initialization fix

  • merge #2142: config update

  • add deactivate support for tasks

  • add cancelation example

  • added comet_mpirun to resource_xsede.json

  • added test for launch method “srun”

  • adding cobalt test

  • consistent process counting

  • preliminary FLUX support

  • fix RA utilization in case of no agent nodes

  • fix queue naming, prte tmp dir and process count

  • fix static ve location

  • fixed version discovery (srun)

  • cleanup bootstrap_0.sh

  • separate tacc and xsede resources

  • support for Princeton’s Traverse cluster

  • updated IBRun tests

  • updated LM IBRun

13.39. 1.3.0 Release 2020-04-10

  • task overlay + docs

  • iteration on srun placement

  • add env support to srun

  • theta config

  • clean up launcher termination guard against lower level termination errors

  • cobalt rm

  • optional output stager

  • revive ibrun support

  • switch comet FS

13.40. 1.2.1 Hotfix Release 2020-02-11

  • scattered fixes cfor summit

13.41. 1.2.0 Release 2020-02-11

  • support for bulk callbacks

  • fixed package paths for launch methods (radical.pilot.agent.launch_method)

  • updated documentation references

  • raise minimum Python version to 3.6

  • local submit configuration for Frontera

  • switch frontera to default agent cfg

  • fix cray agent config

  • fix issue #2075 part 2

13.42. 1.1.1 Hotfix Release 2020-02-11

  • fix dependency version for radical.utils

13.43. 1.1 Release 2020-02-11

  • code cleanup

13.44. 1.0.0 Release 2019-12-24

  • transition to Python3

  • migrate Rhea to Slurm

  • ensure PATH setting for sub-agents

  • CUDA is now handled by LM

  • fix / improve documentation

  • Sched optimization: task lookup in O(1)

  • Stampede2 prun config

  • testing, flaking, linting and travis fixes

  • add pilot.stage_out (symmetric to pilot.stage_in)

  • add noop sleep executor

  • improve prrte support

  • avoid state publish during idle times

  • cheyenne support

  • default to cont scheduler

  • configuration system revamp

  • heartbeat based process management

  • faster termination

  • support for Frontera

  • lockfree scheduler base class

  • switch to RU ZMQ layer

13.45. 0.90.1 Release 2019-10-12

  • port pubsub hotfix

13.46. 0.90.0 Release 2019-10-07

  • transition to Python3

13.47. 0.73.1 Release 2019-10-07

  • Stampede-2 support

13.48. 0.72.2 Hotfix Release 2019-09-30

  • fix sandbox setting on absolute paths

13.49. 0.72.0 Release 2019-09-11

  • implement function executor

  • implement / improve PRTE launch method

  • PRTE profiling support (experimental)

  • agent scheduler optimizations

  • summit related configuration and fixes

  • initial frontera support

  • archive ORTE

  • increase bootstrap timeouts

  • consolidate MPI related launch methods

  • unit testing and linting

  • archive ORTE, issue #1915

  • fix get_mpi_info for Open MPI

  • base classes to raise notimplemented. issue #1920

  • remove outdated resources

  • ensure that pilot env reaches func executor

  • ensureID uniqueness across processes

  • fix inconsistencies in task sandbox handling

  • fix gpu placement alg

  • fix issue #1910

  • fix torque nodefile name and path

  • add metric definitions in RA support

  • make DB comm bulkier

  • expand resource configs with pilot description keys

  • better tiger support

  • add NOOP scheduler

  • add debug executor

13.50. 0.70.3 Hotfix Release 2019-08-02

  • fix example and summit configuration

13.51. 0.70.2 Hotfix Release 2019-07-31

  • fix static ve creation for Tiger (Princeton)

13.52. 0.70.1 Hotfix Release 2019-07-30

  • fix configuration for Tiger (Princeton)

13.53. 0.70.0 Release 2019-07-07

  • support summitdev, summit @ ORNL (JSRUN, PRTE, RS, ERF, LSF, SMT)

  • support tiger @ princeton (JSRUN)

  • implement NOOP scheduler

  • backport application communicators from v2

  • ensure session close on some tests

  • continous integration: pep8, travis, increasing test coverage

  • fix profile settings for several LMs

  • fix issue #1827

  • fix issue #1790

  • fix issue #1759

  • fix HOMBRE scheduler

  • remove cprof support

  • unify mpirun / mpirun_ccmrun

  • unify mpirun / mpirun_dplace

  • unify mpirun / mpirun_dplace

  • unify mpirun / mpirun_dplace

  • unify mpirun / mpirun_mpt

  • unify mpirun / mpirun_rsh

13.54. 0.63.0 Release 2019-06-25

  • support for summit (experimental, jsrun + ERF)

  • PRRTE support (experimental, summit only)

  • many changes to the test setup (pytest, pylint, flake8, coverage, travis)

  • support for Tiger (adds SRUN launch method)

  • support NOOP scheduler

  • support application level communication

  • support ordered scheduling of tasks

  • partial code cleanup (coding guidelines)

  • simplifying MPI base launch methods

  • support for resource specific SMT settings

  • resource specific ranges of cores/threads can now be blocked from use

  • ORTE support is doscontinued

  • fixes in hombre scheduler

  • improvements on GPU support

  • fix in continuous scheduler which caused underutilization on heterogeneous tasks

  • fixed: #1758, #1764, #1792, #1790, #1827, #187

13.55. 0.62.0 Release 2019-06-08

  • add unit test

  • trigger tests

  • remove obsolete fifo scheduler (use the ordered scheduler instead)

  • add ordered scheduler

  • add tiger support

  • add ssh access to cheyenne

  • cleanup examples

  • fix dplace support

  • support app specified task sandboxes

  • fix pilot statepush over tunnels

  • fix titan ve creation, add new static ve

  • fix for cheyenne

13.56. 0.61.0 Release 2019-05-07

  • add travis support, test cleanup

  • ensure safe bootstrapper termination on faulty installs

  • push node_list to mongodb for analytics

  • fix default dburl

  • fix imports in tests

  • remove deprecated special case in bootstrapper

13.57. 0.60.1 Hotfix 2019-04-12

  • work around a pip install problem

13.58. 0.60.0 Release 2019-04-10

  • add issue template

  • rename RP_PILOT_SBOX to RP_PILOT_STAGING and expose to tasks

  • fix bridges default partition (#1816)

  • fix #1826

  • fix off-by-one error on task state check

  • ignore failing DB disconnect

  • follow rename of saga-python to radical.saga

13.59. 0.50.23 Release 2019-03-20

  • hotfix: use popen spawner for localhost

13.60. 0.50.22 Release 2019-02-11

  • another fix LSF var expansion

13.61. 0.50.21 Release 2018-12-19

  • fix LSF var expansion

13.62. 0.50.20 Release 2018-11-25

  • fix Titan OMPI installation

  • support metdata for tasks

  • fix git error detection during setup

13.63. 0.50.19 Release 2018-11-15

  • ensure profile fetching on empty tarballs

13.64. 0.50.18 Release 2018-11-13

  • support for data locality aware scheduling

13.65. 0.50.17 Release 2018-10-31

  • improve event documentation

  • support Task level metadata

13.66. 0.50.16 Release 2018-10-26

  • add new shell spawner as popen replacement

13.67. 0.50.15 Release 2018-10-24

  • fix recursive pilot staging

13.68. 0.50.14 Release 2018-10-24

  • add Cheyenne support - thanks Vivek!

13.69. 0.50.13 Release 2018-10-16

  • survive if SAGA does not support job.name (#1744)

13.70. 0.50.12 Release 2018-10-12

  • fix stacksize usage on BW

13.71. 0.50.11 Release 2018-10-09

  • fix ‘getting_started’ example (no MPI)

13.72. 0.50.10 Release 2018-09-29

  • ensure the correct code path in SAGA for Blue Waters

13.73. 0.50.9 Release 2018-09-28

  • fix examples

  • fix issue #1715 (#1716)

  • remove Stampede’s resource configs. issue #1711

  • supermic does not like curl -1 (#1723)

13.74. 0.50.8 Release 2018-08-03

  • make sure that CUD values are not None (#1688)

  • don’t limit pymongo version anymore (#1687)

13.75. 0.50.7 Release 2018-08-01

  • fix bwpy handling

13.76. 0.50.6 Release 2018-07-31

  • fix curl tssl negotiation problem (#1683)

13.77. 0.50.5 Release 2018-07-30

  • fix default values for process and thread types (#1681)

  • fix outdated links in ompi deploy script

  • fix/issue 1671 (#1680)

  • fix scheduler config checks (#1677)

13.78. 0.50.4 Release 2018-07-13

  • set oversubscribe default to True

13.79. 0.50.3 Release 2018-07-11

  • disable rcfg expnsion

13.80. 0.50.2 Release 2018-07-08

  • fix relative tarball unpack paths

13.81. 0.50.1 Release 2018-07-05

  • GPU support

  • many bug fixes

13.82. 0.47.14 Release 2018-06-13

  • fix recursive output staging

13.83. 0.47.13 Release 2018-06-02

  • catch up with RU log, rep and prof settings

13.84. 0.47.12 Release 2018-05-19

  • ensure that tasks are started in their own process group, to ensure clean cancellation semantics.

13.85. 0.47.11 Release 2018-05-08

  • fix schemas on BW (local orte, local aprun)

13.86. 0.47.10 Release 2018-04-19

  • fix #1602

13.87. 0.47.9 Release 2018-04-18

  • fix default scheduler for localhost

13.88. 0.47.8 Release 2018-04-16

  • hotfix to catch up with pypi upgrade

13.89. 0.47.7 Release 2018-04-15

  • bugfix related to radical.entk #255

13.90. 0.47.6 Release 2018-04-12

  • bugfix related to #1590

13.91. 0.47.5 Release 2018-04-12

  • make sure a dict object exists even on empty env settings (#1590)

13.92. 0.47.4 Release 2018-03-20

  • fifo agent scheduler (#1537)

  • hombre agent scheduler (#1536)

  • Fix/issue 1466 (#1544)

  • Fix/issue 1501 (#1541)

  • switch to new OMPI deployment on titan (#1529)

  • add agent configuration doc (#1540)

13.93. 0.47.3 Release 2018-03-20

  • add resource limit test

  • add tmp cheyenne config

  • api rendering proposal for partitions

  • fix bootstrap sequence (BW)

  • tighten bootstrap process, add documentation

13.94. 0.47.2 Release 2018-02-28

  • fix issue 1538

  • fix issue 1554

  • expose profiler to LM hooks (#1522)

  • fix bin names (#1549)

  • fix event docs, add an event for symmetry (#1527)

  • name attribute has been changed to uid, fixes issue #1518

  • make use of flags consistent between RP and RS (#1547)

  • add support for recursive data staging (#1513. #1514) (JD, VB, GC)

  • change staging flags to integers (inherited from RS)

  • add support for bulk data transfer (#1512) (IP, SM)

13.95. 0.47 Release 2017-11-19

  • Correctly added ‘lm_info.cores_per_node’ SLURM

  • Torque RM now respects config settings for cpn

  • Update events.md

  • add SIGUSR2 for clean termination on SGE

  • add information about partial event orders

  • add issue demonstrators

  • add some notes on cpython issue demonstrators

  • add xsede.supermic_orte configuration

  • add xsede.supermic_ortelib configuration

  • apply RU’s managed process to termination stress test

  • attempt to localize aprun tasks

  • better hops for titan

  • better integration of Task script and app profs

  • catch up with config changes for local testing

  • centralize URL derivation for pilot job service endpoints, hops, and sandboxes

  • clarify use of namespace vs. full qualified URL in the context of RP file staging

  • clean up config management, inheritance

  • don’t fetch json twice

  • ensure that profiles are flushed and packed correctly

  • fail missing pilots on termination

  • fix AGENT_ACTIVE profile timing

  • fix close-session purge mode

  • fix cray agent config, avoid termination race

  • fix duplicated transition events

  • fix osg config

  • fix #1283

  • fixing error from bootstrapper + aprun parsing error

  • force profile flush earlier

  • get cpn for ibrun

  • implement session.list_resources() per #1419

  • make sure a canceled pilot stays canceled

  • make cb return codes consistent

  • make sure profs are flushed on termination

  • make sure the tmgr only pulls tasks its interested in

  • profile mkdir

  • publish resource_details (incl. lm_info) again

  • re-add a profile flag to advance()

  • remove old controllers

  • remove old files

  • remove uid clashes for sub-agent components and components in general

  • setup number of cores per node on stampede2

  • smaller default pilot size for supermic

  • switch to ibrun for comet_ssh

  • track task drops

  • use js hop for untar

  • using new process class

  • GPU/CPU pinning test is now complete, needs some env settings in the launchers

13.96. 0.46.2 Release 2017-09-02

  • hotfix for #1426 - thanks Iannis!

13.97. 0.46.1 Release 2017-08-23

  • hotfix for #1415

13.98. Version 0.46 2017-08-11

  • TODO

13.99. 0.45.3 Release 2017-05-09

  • Documentation update for the BW tutorial

13.100. 0.45.1 Release 2017-03-05

  • NOTE: OSG and ORTE_LIB on titan are considered unsupported. You can enable

    those resources for experiments by setting the enabled keys in the respective config entries to true.

  • hotfix the configurations markers above

13.101. 0.45 Release 2017-02-28

  • NOTE: OSG and ORTE_LIB on titan are considered unsupported. You can enable

    those resources for experiments by removing the comment markers from the respective resource configs.

  • Adapt to new orte-submit interface.

  • Add orte-cffi dependency to bootstrapper.

  • Agent based staging directives.

  • Fixes to various resource configs

  • Change orte-submit to orterun.

  • Conditional importing of executors. Fixes #926.

  • Config entries for orte lib on Titan.

  • Corrected environment export in executing POPEN

  • Extend virtenv lock timeout, use private rp_installs by default

  • Fix non-mpi execution analogous to #975.

  • Fix/issue 1226 (#1232)

  • Fresh orte installation for bw.

  • support more OSG sites

  • Initial version of ORTE lib interface.

  • Make cprofiling of scheduler conditional.

  • Make list of cprofile subscribers configurable.

  • Move env safekeeping until after the pre bootstrap.

  • Record OSG site name in mongodb.

  • Remove bash’isms from shell script.

  • pylint motivated cleanups

  • Resolving issue #1211.

  • Resource and example config for Shark at LUMC.

  • SGE changes for non-homogeneous nodes.

  • Use ru.which

  • add allegro.json config file for FUB allegro cluster

  • add rsh launch method

  • switch to gsissh on wrangler

  • use new ompi installation on comet (#1228)

  • add a simple/stupid ompi deployment helper

  • updated Config for Stampede and YARN

  • fix state transition to UNSCHEDDULED to avoid repetition and invalid state ordering

13.102. 0.44.1 Release 2016-11-01

  • add an agent config for cray/aprun all on mom node

  • add anaconda config for examples

  • gsissh as default for wrangler, stampede, supermic

  • add conf for spark n wrangler, comet

  • add docs to the cu env inject

  • expose spark’s master url

  • fix Task env setting (stampede)

  • configuration for spark and anaconda

  • resource config entries for titan

  • disable PYTHONHOME setting in titan_aprun

  • dynamic configuration of spark_env

  • fix for gordon config

  • hardcode the netiface version until it is fixed upstream.

  • implement NON_FATAL for staging directives.

  • make resource config available to agent

  • rename scripts

  • update installation.rst

  • analytics backport

  • use profiler from RU

  • when calling a task state callback, missed states also trigger callbacks

13.103. 0.43.1 Release 2016-09-09

  • hotfix: fix netifaces to version 0.10.4 to avoid trouble on BlueWaters

13.104. 0.43 Release 2016-09-08

  • Add aec_handover for orte.

  • add a local confiuration for bw

  • add early binding eample for osg

  • add greenfield config (only works for single-node runs at the moment)

  • add PYTHONPATH to the vars we reset for Task envs

  • allow overloading of agent config

  • fix #1071

  • fix synapse example

  • avoid profiling of empty state transitions

  • Check of YARN start-all script. Raising Runtime error in case of error.

  • disable hwm altogether

  • drop clones before push

  • enable scheduling time measurements.

  • First commit for multinode YARN cluster

  • fix getip

  • fix iface detection

  • fix reordering of states for some update sequences

  • fix task cancellation

  • improve ve create script

  • make orte-submit aware of non-mpi CUs

  • move env preservation to an earlier point, to avoid pre-exec stuff

  • Python distribution mandatory to all confs

  • Remove temp agent config directory.

  • Resolving #1107

  • Schedule behind the real task and support multicore.

  • SchedulerContinuous -> AgentSchedulingComponent.

  • Take ccmrun out of bootstrap_2.

  • Tempfile is not a tempfile so requires explicit removal.

  • resolve #1001

  • Unbreak CCM.

  • use high water mark for ZMQ to avoid message drops on high loads

13.105. 0.42 Release 2016-08-09

  • change examples to use 2 cores on localhost

  • Iterate documentation

  • Manual cherry pick fix for getip.

13.106. 0.41 Release 2016-07-15

  • address some of error messages and type checks

  • add scheduler documentation simplify interpretation of BF oversubscription fix a log message

  • fix logging problem reported by Ming and Vivek

  • global default url, sync profile/logfile/db fetching tools

  • make staging path resilient against cwd changes

  • Switch SSH and ORTE for Comet

  • sync session cleanup tool with rpu

  • update allocation IDs

13.107. 0.40.4 Release 2016-05-18

  • point release with more tutorial configurations

13.108. 0.40.3 Release 2016-05-17

  • point release with tutorial configurations

13.109. 0.40.2 Release 2016-05-13

  • hotfix to fix vnode parsing on archer

13.110. 0.40.1 Release 2016-02-11

  • hotfix which makes sure agents don’t report FAILED on cancel()

13.111. 0.40 Release 2016-02-03

  • Really numberous changes, fixes and features, most prominently: - OSG support - Yarn support - new resource supported - ORTE used for more resources - improved examples, profiling - communication cleanup - large Task support - lrms hook fixes - agent code splitup

13.112. 0.38 Release 2015-12-22

  • fix busy mongodb pull

13.113. 0.37.10 Release 2015-10-20

  • config fix

13.114. 0.37.9 Release 2015-10-20

  • Example fix

13.115. 0.37.8 Release 2015-10-20

  • Allocation fix

13.116. 0.37.7 Release 2015-10-20

  • Allocation fix

13.117. 0.37.6 Release 2015-10-20

  • Documentation

13.118. 0.37.5 Release 2015-10-19

  • timing fix to ensure task state ordering

13.119. 0.37.3 Release 2015-10-19

  • small fixes, doc changes

13.120. 0.37.2 Release 2015-10-18

  • fix example installation

13.121. 0.37.1 Release 2015-10-18

  • update of documentation and examples

  • some small fixes on shutdown installation

13.122. 0.37 Release 2015-10-15

  • change default spawner to POPEN

  • use hostlist to avoid mpirun* limitations

  • support default callbacks on tasks and pilots

  • use a config for examples

  • add lrms shutdown hook for ORTE LM

  • various updates to examples and documentation

  • create logfile and profile tarballs on the fly

  • export some RP env vars to tasks

  • Fix a mongodb race

  • internally unregister pilot cbs on shutdown

  • move agent.stop to finally clause, to correctly react on signals

  • remove RADICAL_DEBUG, use proper logger in queue, pubsub

  • small changes to getting_started

  • add APRUN entry for ARCHER.

  • Updated APRUN config for ARCHER. Thanks Vivek!

  • Use designated termination procedure for ORTE.

  • Use statically compiled and linked OMPI/ORTE.

  • Wait for its component children on termination

  • make localhost (ForkLRMS) behave like a resource with an inifnite number of cores

13.123. 0.36 Release 2015-10-08

(the release notes also cover some changes from 0.34 to 0.35)

  • simplify agent process tree, process naming

  • improve session and agent termination

  • several fixes and chages to the task state model (refer to documentation!)

  • fix POPEN state reporting

  • split agent component into individual, relocatable processes

  • improve and generalize agent bootstrapping

  • add support for dynamic agent layout over compute nodes

  • support for ORTE launch method on CRAY (and others)

  • add a watcher thread for the ORTE DVM

  • improves profiling support, expand to RP module

  • add various profiling analysis tools

  • add support for profile fetching from remote pilot sandbox

  • synchronize and recombine profiles from different pilots

  • add a simple tool to run a recorded session.

  • add several utility classes: component, queue, pubsub

  • clean configuration passing from module to agent.

  • clean tunneling support

  • support different data frame formats for profiling

  • use agent infrastructure (LRMS, LM) for spawning sub-agents

  • allow LM to specify env vars to be unset.

  • allow agent on mom node to use tunnel.

  • fix logging to avoid log leakage from lower layers

  • avoid some file system bottlenecks

  • several resource specific configuration fixes (mostly stampede, archer, bw)

  • backport stdout/stderr/log retrieval

  • better logging of clone/drops, better error handling for configs

  • fix, improve profiling of Task execution

  • make profile an object

  • use ZMQ pubsub and queues for agent/sub-agent communication

  • decouple launch methods from scheduler for most LMs NOTE: RUNJOB remains coupled!

  • detect disappearing orte-dvm when exit code is zero

  • perform node allocation for sub-agents

  • introduce a barrier on agent startup

  • fix some errors on shell spanwer (quoting, monotoring delays)

  • make localhost layout configurable via cpn

  • make setup.py report a decent error when being used with python3

  • support nodename lookup on Cray

  • only mkdir in input staging controller when we intent to stage data

  • protect agent cb invokation by lock

  • (re)add command line for profile fetching

  • cleanup of data staging, with better support for different schemas (incl. GlobusOnline)

  • work toward better OSG support

  • Use netifaces for ip address mangling.

  • Use ORTE from the 2.x branch.

  • remove Url class

13.124. 0.35.1 Release 2015-09-29

  • hotfix to use popen on localhost

13.125. 0.35 Release 2015-07-14

  • numerous bug fixes and support for new resources

13.126. 0.34 Release 2015-07-14

  • Hotfix release for an installation issue

13.127. 0.33 Release 2015-05-27

  • Hotfix release for off-by-one error (#621)

13.128. 0.32 Release 2015-05-18

  • Hotfix release for MPIRUN_RSH on Stampede (#572).

13.129. 0.31 Release 2015-04-30

  • version bump to trigger pypi release update

13.130. 0.30 Release 2015-04-29

  • hotfix to handle broken pip/bash combo on archer

13.131. 0.29 Release 2015-04-28

  • hotfix to handle stale ve locks

13.132. 0.28 Release 2015-04-16

  • This release contains a very large set of commits, and covers a fundamental overhaul of the RP agent (amongst others). It also includes: - support for agent profiling - removes a number of state race conditions - support for new backends (ORTE, CCM) - fixes for other backends - revamp of the integration tests

13.133. 0.26 Release 2015-04-08

  • hotfix to cope with API changing pymongo release

13.134. 0.25 Release 2015-04-01

  • hotfix for a stampede configuration change

13.135. 0.24 Release 2015-03-30

  • More support for URLs in StagingDirectives (#489).

  • Create parent directories of staged files.

  • Only process entries for Output FTW, fixes #490.

  • SuperMUC config change.

  • switch from bson to json for session dumps

  • fixes #451

  • update resources.rst

  • remove superfluous n

  • fix #438

  • add documentation on resource config changes, closes #421

  • .ssh/authorized_keys2 is deprecated since 2011

  • improved intra-node SSH FAQ item

13.136. 0.23 Release 2014-12-13

  • fix #455

13.137. 0.22 Release 2014-12-11

  • several state races fixed

  • fix to tools for session cleanup and purging

  • partial fix for pilot cancelation

  • improved shutdown behavior

  • improved hopper support

  • adapt plotting to changed slothistory format

  • make instructions clearer on data staging examples

  • addresses issue #216

  • be more resilient on pilot shutdown

  • take care of cancelling of active pilots

  • fix logic error on state check for pilot cancellation

  • fix blacklight config (#360)

  • attempt to cancel pilots timely…

  • as fallback, use PPN information provided by SAGA

  • hopper usues torque (thanks Mark!)

  • Re-fix blacklight config. Addresses #359 (again).

  • allow to pass application data to callbacks

  • threads should not be daemons…

  • workaround on failing bson encoding…

  • report pilot id on cu inspection

  • ignore caching errors

  • also use staging flags on input staging

  • stampede environment fix

  • Added missing stampede alias

  • adds timestamps to task and pilot logentries

  • fix state tags for plots

  • fix plot style for waitq

  • introduce UNSCHEDULED state as per #233

  • selectable terminal type for plot

  • document pilot log env

  • add faq about VE problems on setuptools upgrade

  • allow to specify session cache files

  • added configuration for BlueBiou (Thanks Jordane)

  • better support for json/bson/timestamp handling; cache mongodb data for stats, plots etc

  • localize numpy dependency

  • retire input_data and output_data

  • remove obsolete staging examples

  • address #410

  • fix another subtle state race

13.138. 0.21 Release 2014-10-29

  • Documentation of MPI support

  • Documentation of data staging operations

  • correct handling of data transfer exceptions

  • fix handling of non-ascii data in task stdio

  • simplify switching of access schemas on pilot submission

  • disable pilot virtualenv for task execution

  • MPI support for DaVinci

  • performance optimizations on file transfers, task sandbox setup

  • fix ibrun tmp file problem on stampede

13.139. 0.19 Release September 12. 2014

13.140. 0.18 Release July 22. 2014

13.141. 0.17 Release June 18. 2014

Bugfix release - fixed file permissions et al. :/

13.142. 0.16 Release June 17. 2014

Bugfix release - fixed file permissions et al.

13.143. 0.15 Release June 12. 2014

Bugfix release - fixed distribution MANIFEST:

https://github.com/radical-cybertools/radical.pilot/issues/174

13.144. 0.14 Release June 11. 2014

Closed Tickets:

New Features

  • Experimental pilot-agent for Cray systems

  • New multi-core agent with MPI support

  • New ResourceConfig mechanism does not reuquire the user to add resource configurations explicitly. Resources can be configured programatically on API-level.

API Changes:

  • TaskDescription.working_dir_priv removed

  • Extended state model

  • resource_configurations parameter removed from PilotManager c`tor

13.145. 0.13 Release May 19. 2014

  • ExTASY demo release

  • Support for project / allocation

  • Updated / simplified resource files

  • Refactored bootstrap mechnism

13.146. 0.12 Release May 09. 2014

13.147. 0.11 Release Apr. 29. 2014

  • Fixes error in state history reporting

13.148. 0.10 Release Apr. 29. 2014

13.149. 0.9 Release Apr. 16. 2014

13.150. 0.8 Release Mar. 24. 2014

13.151. 0.7 Release Feb. 25. 2014

13.152. 0.6 Release Feb. 24. 2014

  • BROKEN RELEASE

13.153. 0.5 Release Feb. 06. 2014

13.154. 0.4 Release

  • Tutorial 1 release (Github only)

  • Consistent naming (sagapilot instead of sinon)

13.155. 0.1.3 Release

13.156. 0.1.2 Release