btgym.algorithms.runner.base module

btgym.algorithms.runner.base.BaseEnvRunnerFn(sess, env, policy, task, rollout_length, summary_writer, episode_summary_freq, env_render_freq, atari_test, ep_summary, memory_config, log, **kwargs)[source]

Default function defining the runtime logic of the thread runner. In brief, it constantly runs the policy and, once the collected rollout reaches the specified length, appends all the collected data to the queue.

Parameters:
  • env – environment instance
  • policy – policy instance
  • task – int
  • rollout_length – int
  • episode_summary_freq – int
  • env_render_freq – int
  • atari_test – bool, Atari or BTgym
  • ep_summary – dict of tf.summary op and placeholders
  • memory_config – replay memory configuration dictionary
  • log – logbook logger
Yields:
collected data as dictionary of on_policy, off_policy rollouts and episode statistics.
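
For reference, a minimal sketch of how the generator might be consumed by a trainer. The surrounding objects (sess, env, policy, summary_writer, ep_summary, log) are assumed to be built elsewhere, the keyword values are illustrative, and the dictionary keys follow the names given above; this is a sketch, not the trainer's actual wiring.

    from btgym.algorithms.runner.base import BaseEnvRunnerFn

    def consume_rollouts(sess, env, policy, summary_writer, ep_summary, log, n_rollouts=10):
        """Pull a few data dictionaries from the runner generator (sketch)."""
        data_stream = BaseEnvRunnerFn(
            sess=sess,
            env=env,
            policy=policy,
            task=0,
            rollout_length=20,             # illustrative values
            summary_writer=summary_writer,
            episode_summary_freq=10,
            env_render_freq=100,
            atari_test=False,              # BTgym environment, not Atari
            ep_summary=ep_summary,
            memory_config=None,
            log=log,
        )
        collected = []
        for _, data in zip(range(n_rollouts), data_stream):
            # `data` is expected to hold on_policy / off_policy rollouts
            # and episode statistics, as described above:
            collected.append(data)
        return collected
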

btgym.algorithms.runner.threadrunner module

class btgym.algorithms.runner.threadrunner.RunnerThread(env, policy, task, rollout_length, episode_summary_freq, env_render_freq, test, ep_summary, runner_fn_ref=<function BaseEnvRunnerFn>, memory_config=None, log_level=13, **kwargs)[source]

Async. framework code comes from OpenAI repository under MIT licence: https://github.com/openai/universe-starter-agent

Despite the fact that BTgym is not a real-time environment [yet], the thread-runner approach is still used here. From the original universe-starter-agent: …One of the key distinctions between a normal environment and a universe environment is that a universe environment is _real time_. This means that there should be a thread that would constantly interact with the environment and tell it what to do. This thread is here.

Another idea is to treat the ThreadRunner as an all-in-one data provider, so that the data distribution fed to the estimator is shaped in a single place. Hence the replay memory also lives here, along with some service functions (collecting summary data).

Parameters:
  • env – environment instance
  • policy – policy instance
  • task – int
  • rollout_length – int
  • episode_summary_freq – int
  • env_render_freq – int
  • test – bool, Atari or BTgym
  • ep_summary – tf.summary
  • runner_fn_ref – callable defining runner execution logic
  • memory_config – replay memory configuration dictionary
  • log_level – int, logbook.level
run()[source]

Just keep running.
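
A minimal construction sketch, assuming env, policy and ep_summary are already built; starting the thread within a session and reading finished rollouts from its queue follow the trainer's convention (universe-starter-agent pattern) and are not shown, so they remain assumptions here.

    from btgym.algorithms.runner.threadrunner import RunnerThread

    def make_runner(env, policy, ep_summary):
        """Build a thread-runner for a single worker (sketch, illustrative values)."""
        runner = RunnerThread(
            env=env,
            policy=policy,
            task=0,
            rollout_length=20,
            episode_summary_freq=10,
            env_render_freq=100,
            test=False,           # BTgym mode
            ep_summary=ep_summary,
            memory_config=None,
            log_level=13,
        )
        # The owning trainer is expected to start the thread within a session
        # and consume finished rollouts from its queue.
        return runner
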

btgym.algorithms.runner.synchro module

class btgym.algorithms.runner.synchro.BaseSynchroRunner(env, task, rollout_length, episode_summary_freq, env_render_freq, ep_summary, test=False, policy=None, data_sample_config=None, memory_config=None, test_conditions=None, test_deterministic=True, slowdown_steps=0, global_step_op=None, aux_render_modes=None, _implemented_aux_render_modes=None, name='synchro', log_level=13, **kwargs)[source]

Experience provider class. Interacts with the environment and outputs data in the form of rollouts augmented with relevant summaries and metadata. This runner is synchronous in the sense that data collection is in-process and every rollout is collected by an explicit call to the respective get_data() method [unlike the ‘async’ thread-runner version found earlier in this package which, once started, runs on its own and cannot be moderated]. This makes precise control over the policy being executed possible. Does not support ‘atari’ mode. A construction sketch follows the parameter list below.

Parameters:
  • env – BTgym environment instance
  • task – int, runner task id
  • rollout_length – int
  • episode_summary_freq – int
  • env_render_freq – int
  • test – legacy, not used
  • ep_summary – legacy, not used
  • policy – policy instance to execute
  • data_sample_config – dict, data sampling configuration dictionary
  • memory_config – dict, replay memory configuration
  • test_conditions – dict or None, dictionary of single experience conditions to check to mark it as test one.
  • test_deterministic – bool, if True - act deterministically for test episodes
  • slowdown_steps – time to sleep between steps
  • aux_render_modes – iterable of str, additional summaries to compute
  • _implemented_aux_render_modes – iterable of str, implemented additional summaries
  • name – str, name scope
  • log_level – int, logbook.level
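
A minimal construction sketch, assuming a ready BTgym environment instance and a policy instance; keyword values are illustrative and the legacy test / ep_summary arguments are left unused.

    from btgym.algorithms.runner.synchro import BaseSynchroRunner

    def make_synchro_runner(env, policy):
        """Build a synchronous experience provider (sketch)."""
        return BaseSynchroRunner(
            env=env,
            task=0,
            rollout_length=20,            # illustrative
            episode_summary_freq=10,
            env_render_freq=100,
            ep_summary=None,              # legacy, not used
            policy=policy,
            data_sample_config=None,
            memory_config=None,
            name='my_synchro',
            log_level=13,
        )
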
start_runner(sess, summary_writer, **kwargs)[source]

Legacy wrapper.

start(sess, summary_writer, init_context=None, data_sample_config=None)[source]

Executes initial sequence; fills initial replay memory if any.

get_init_experience(policy, policy_sync_op=None, init_context=None, data_sample_config=None)[source]

Starts new environment episode.

Parameters:
  • policy – policy to execute.
  • policy_sync_op – operation copying local behavioural policy params from global one
  • init_context – initial policy context for new episode.
  • data_sample_config – configuration dictionary of type btgym.datafeed.base.EnvResetConfig
Returns:

incomplete initial experience of the episode as a dictionary (missing the bootstrapped R value), next_state, next policy RNN context, action_reward

get_experience(policy, state, context, action, reward, policy_sync_op=None)[source]

Get single experience (possibly terminal).

Returns:incomplete experience as a dictionary (missing the bootstrapped R value), next_state, next policy RNN context, action_reward
get_train_stat(is_test=False)[source]

Updates and computes average statistics for train episodes.

Parameters:is_test – bool, current episode type

Returns:dict of stats
get_test_stat(is_test=False)[source]

Updates and computes statistics for single test episode.

Parameters:is_test – bool, current episode type
Returns:dict of stats
get_ep_render(is_test=False)[source]

Collects environment renderings. Relies on environment renderer class methods, so it is only valid when environment rendering is enabled (typically true for the master runner).

Returns:dictionary of images as rgb arrays
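
As an illustration, the returned dictionary could be written to disk with any RGB-array writer; the key names depend on the enabled render modes, and imageio is just one assumed choice here.

    import imageio  # assumption: any RGB-array writer will do

    def dump_renders(runner, prefix='render'):
        """Save whatever renderings the runner produced (sketch)."""
        renders = runner.get_ep_render(is_test=False)
        for mode, rgb_array in renders.items():   # key names depend on render modes
            imageio.imwrite('{}_{}.png'.format(prefix, mode), rgb_array)
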
get_data(policy=None, policy_sync_op=None, init_context=None, data_sample_config=None, rollout_length=None, force_new_episode=False)[source]

Collects a single trajectory rollout and a bunch of summaries using the specified policy. Updates episode statistics and replay memory.

Parameters:
  • policy – policy to execute
  • policy_sync_op – operation copying local behavioural policy params from global one
  • init_context – if specified, overrides the initial episode context provided by self.context (valid only if a new episode is started within this rollout).
  • data_sample_config – environment configuration parameters for the next episode to sample: configuration dictionary of type btgym.datafeed.base.EnvResetConfig
  • rollout_length – length of rollout to collect, if specified - overrides self.rollout_length attr
  • force_new_episode – bool, if True - resets the environment
Returns:

data dictionary
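
A rough usage sketch, assuming the runner has been constructed as above and that sess, summary_writer and policy already exist; the keys inside the returned data dictionary are not spelled out here.

    def collect_rollouts(runner, sess, summary_writer, policy, n_rollouts=5):
        """Run the initial sequence, then pull several rollouts (sketch)."""
        runner.start(sess, summary_writer)
        rollouts = []
        for _ in range(n_rollouts):
            data = runner.get_data(policy=policy)
            # ... feed `data` to the estimator / optimizer here ...
            rollouts.append(data)
        return rollouts
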

get_batch(size, policy=None, policy_sync_op=None, require_terminal=True, same_trial=True, init_context=None, data_sample_config=None)[source]

Returns a batch as a list of ‘size’ or more rollouts collected under the specified policy. Rollouts can be collected from several consecutive episodes; there may be more rollouts than the requested ‘size’ if it is necessary to collect at least one terminal rollout.

Parameters:
  • size – int, number of rollouts to collect
  • policy – policy to use
  • policy_sync_op – operation copying local behavioural policy params from global one
  • require_terminal – bool, if True - require at least one terminal rollout to be present.
  • same_trial – bool, if True - all episodes are sampled from same trial
  • init_context – if specified, overrides the initial episode context provided by self.context
  • data_sample_config – environment configuration parameters for all episodes in the batch: configuration dictionary of type btgym.datafeed.base.EnvResetConfig
Returns:

‘data’ key holding a list of data dictionaries; ‘terminal_context’ key holding a list of terminal output contexts. If require_terminal=True, this list is guaranteed to hold at least one element.

Return type:

dict
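
A hedged sketch of collecting a batch that contains at least one terminal rollout; argument values are illustrative, and the dictionary keys follow the names documented above.

    def collect_terminal_batch(runner, policy, size=8):
        """Collect at least `size` rollouts including a terminal one (sketch)."""
        batch = runner.get_batch(
            size=size,
            policy=policy,
            require_terminal=True,
            same_trial=True,
        )
        data_list = batch['data']                      # list of rollout dictionaries
        terminal_contexts = batch['terminal_context']  # at least one entry expected
        return data_list, terminal_contexts
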

class btgym.algorithms.runner.synchro.VerboseSynchroRunner(name='verbose_synchro', aux_render_modes=('action_prob', 'value_fn', 'lstm_1_h', 'lstm_2_h'), **kwargs)[source]

Extends the BaseSynchroRunner class with additional visualisation summaries at some expense of running speed.

get_ep_render(is_test=False)[source]

Collects episode, environment and policy visualisations. Relies on environment renderer class methods, so it is only valid when environment rendering is enabled (typically true for the master runner).

Returns:dictionary of images as rgb arrays
start(sess, summary_writer, init_context=None, data_sample_config=None)[source]

Executes initial sequence; fills initial replay memory if any. Extra: initialises environment renderer to get aux. images.