btgym.algorithms.runner.base module¶
-
btgym.algorithms.runner.base.BaseEnvRunnerFn(sess, env, policy, task, rollout_length, summary_writer, episode_summary_freq, env_render_freq, atari_test, ep_summary, memory_config, log, **kwargs)[source]¶ Default function defining the runtime logic of the thread runner. In brief, it continuously runs the policy and, once a rollout reaches the specified length, appends the collected data to the queue. A minimal sketch of this pattern follows the parameter list below.
Parameters: - env – environment instance
- policy – policy instance
- task – int
- rollout_length – int
- episode_summary_freq – int
- env_render_freq – int
- atari_test – bool, Atari or BTgym
- ep_summary – dict of tf.summary ops and placeholders
- memory_config – replay memory configuration dictionary
- log – logbook logger
- Yields:
- collected data as a dictionary of on_policy, off_policy rollouts and episode statistics.
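A minimal, hypothetical sketch of the generator pattern such a runner function follows; the policy interface (policy.act), the environment step signature and the dictionary keys are illustrative assumptions, not the exact BTgym API::

    def runner_fn_sketch(sess, env, policy, task, rollout_length, **kwargs):
        """Continuously steps the environment and yields fixed-length rollouts."""
        state = env.reset()
        while True:
            rollout = []
            for _ in range(rollout_length):
                action = policy.act(state)                       # assumed policy call
                next_state, reward, done, info = env.step(action)
                rollout.append((state, action, reward, done))
                state = env.reset() if done else next_state
                if done:
                    break
            # Collected data is handed to the consuming thread (here via yield):
            yield {'on_policy': rollout, 'off_policy': [], 'ep_summary': {}}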
btgym.algorithms.runner.threadrunner module¶
-
class btgym.algorithms.runner.threadrunner.RunnerThread(env, policy, task, rollout_length, episode_summary_freq, env_render_freq, test, ep_summary, runner_fn_ref=<function BaseEnvRunnerFn>, memory_config=None, log_level=13, **kwargs)[source]¶ Asynchronous framework code comes from the OpenAI universe-starter-agent repository under MIT licence: https://github.com/openai/universe-starter-agent
Despite the fact that BTgym is not a real-time environment [yet], the thread-runner approach is kept here. From the original universe-starter-agent: …One of the key distinctions between a normal environment and a universe environment is that a universe environment is _real time_. This means that there should be a thread that would constantly interact with the environment and tell it what to do. This thread is here.
Another idea is to treat the RunnerThread as an all-in-one data provider, so the data distribution fed to the estimator is shaped in a single place. Hence the replay memory also lives here, along with some service functions (collecting summary data). A usage sketch follows the parameter list below.
Parameters: - env – environment instance
- policy – policy instance
- task – int
- rollout_length – int
- episode_summary_freq – int
- env_render_freq – int
- test – bool, Atari or BTgym
- ep_summary – dict of tf.summary ops and placeholders
- runner_fn_ref – callable defining runner execution logic
- memory_config – replay memory configuration dictionary
- log_level – int, logbook.level
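A hedged usage sketch: start_runner() and the queue attribute follow the universe-starter-agent convention and may differ between BTgym versions; env, policy and ep_summary are assumed to be built elsewhere::

    import tensorflow as tf
    from btgym.algorithms.runner.threadrunner import RunnerThread

    runner = RunnerThread(
        env=env,                      # assumed: BTgym environment instance
        policy=policy,                # assumed: policy instance
        task=0,
        rollout_length=20,
        episode_summary_freq=20,
        env_render_freq=10000,
        test=False,
        ep_summary=ep_summary,        # assumed: dict of tf.summary ops and placeholders
    )
    with tf.Session() as sess:
        summary_writer = tf.summary.FileWriter('./logs', sess.graph)
        runner.start_runner(sess, summary_writer)   # assumed launcher method name
        data = runner.queue.get(timeout=600)        # blocks until a rollout is ready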
btgym.algorithms.runner.synchro module¶
-
class btgym.algorithms.runner.synchro.BaseSynchroRunner(env, task, rollout_length, episode_summary_freq, env_render_freq, ep_summary, test=False, policy=None, data_sample_config=None, memory_config=None, test_conditions=None, test_deterministic=True, slowdown_steps=0, global_step_op=None, aux_render_modes=None, _implemented_aux_render_modes=None, name='synchro', log_level=13, **kwargs)[source]¶ Experience provider class. Interacts with the environment and outputs data in the form of rollouts augmented with relevant summaries and metadata. This runner is synchronous in the sense that data collection is in-process and every rollout is collected by an explicit call to the get_data() method [unlike the 'async' thread-runner version found earlier in this package, which, once started, runs on its own and cannot be moderated]. This makes precise control over the policy being executed possible. Does not support 'atari' mode. A construction sketch follows the parameter list below.
Parameters: - env – BTgym environment instance
- task – int, runner task id
- rollout_length – int
- episode_summary_freq – int
- env_render_freq – int
- test – legacy, not used
- ep_summary – legacy, not used
- policy – policy instance to execute
- data_sample_config – dict, data sampling configuration dictionary
- memory_config – dict, replay memory configuration
- test_conditions – dict or None, dictionary of single experience conditions to check to mark it as test one.
- test_deterministic – bool, if True - act deterministically for test episodes
- slowdown_time – time to sleep between steps
- aux_render_modes – iterable of str, additional summaries to compute
- _implemented_aux_render_modes – iterable of str, implemented additional summaries
- name – str, name scope
- log_level – int, logbook.level
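A hedged construction sketch; env and policy are assumed to be created elsewhere, and only a subset of the keyword arguments listed above is shown::

    from btgym.algorithms.runner.synchro import BaseSynchroRunner

    runner = BaseSynchroRunner(
        env=env,                  # assumed: BTgym environment instance
        task=0,
        rollout_length=20,
        episode_summary_freq=10,
        env_render_freq=100,
        ep_summary=None,          # legacy, not used
        policy=policy,            # assumed: policy instance to execute
        data_sample_config=None,
        memory_config=None,
        name='my_synchro',
    )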
-
start(sess, summary_writer, init_context=None, data_sample_config=None)[source]¶ Executes initial sequence; fills initial replay memory if any.
-
get_init_experience(policy, policy_sync_op=None, init_context=None, data_sample_config=None)[source]¶ Starts new environment episode.
Parameters: - policy – policy to execute.
- policy_sync_op – operation copying local behavioural policy params from global one
- init_context – initial policy context for new episode.
- data_sample_config – configuration dictionary of type btgym.datafeed.base.EnvResetConfig
Returns: incomplete initial experience of the episode as a dictionary (missing the bootstrapped R value), next_state, next policy RNN context, action_reward
-
get_experience(policy, state, context, action, reward, policy_sync_op=None)[source]¶ Get single experience (possibly terminal).
Returns: incomplete experience as a dictionary (missing the bootstrapped R value), next_state, next policy RNN context, action_reward
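A rough sketch of manual stepping with these two methods, assuming the return order stated above and an action_reward pair of (action, reward); the exact unpacking and the 'terminal' key should be verified against the installed BTgym version::

    # Start a new episode and obtain the initial (incomplete) experience:
    experience, state, context, action_reward = runner.get_init_experience(
        policy=policy,
        init_context=None,
    )
    # Step the environment one experience at a time:
    for _ in range(20):                           # illustrative rollout length
        experience, state, context, action_reward = runner.get_experience(
            policy=policy,
            state=state,
            context=context,
            action=action_reward[0],
            reward=action_reward[1],
        )
        if experience.get('terminal'):            # assumed key marking episode end
            break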
-
get_train_stat(is_test=False)[source]¶ Updates and computes average statistics for train episodes.
Parameters: is_test – bool, current episode type
Returns: dict of stats
-
get_test_stat(is_test=False)[source]¶ Updates and computes statistics for single test episode.
Parameters: is_test – bool, current episode type
Returns: dict of stats
-
get_ep_render(is_test=False)[source]¶ Collects environment renderings. Relies on environment renderer class methods, so it is only valid when environment rendering is enabled (typically true for the master runner).
Returns: dictionary of images as rgb arrays
-
get_data(policy=None, policy_sync_op=None, init_context=None, data_sample_config=None, rollout_length=None, force_new_episode=False)[source]¶ Collects a single trajectory rollout and a bunch of summaries using the specified policy. Updates episode statistics and replay memory. A usage sketch follows the parameter list below.
Parameters: - policy – policy to execute
- policy_sync_op – operation copying local behavioural policy params from global one
- init_context – if specified, overrides the initial episode context provided by self.context (valid only if a new episode is started within this rollout).
- data_sample_config – environment configuration parameters for the next episode to sample: configuration dictionary of type btgym.datafeed.base.EnvResetConfig
- rollout_length – length of rollout to collect, if specified - overrides self.rollout_length attr
- force_new_episode – bool, if True - resets the environment
Returns: data dictionary
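A hedged sketch of the intended synchronous collection loop: start() primes the runner, then every rollout is pulled by an explicit get_data() call, so the training loop fully controls when data is gathered; sess, summary_writer, num_train_steps, sync_op and process_rollout are illustrative placeholders assumed to exist::

    runner.start(sess, summary_writer)       # runs initial sequence, fills replay memory if any
    for step in range(num_train_steps):      # illustrative loop bound
        data = runner.get_data(policy=policy, policy_sync_op=sync_op)
        process_rollout(data)                # assumed user-defined training update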
-
get_batch(size, policy=None, policy_sync_op=None, require_terminal=True, same_trial=True, init_context=None, data_sample_config=None)[source]¶ Returns a batch as a list of 'size' or more rollouts collected under the specified policy. Rollouts can be collected from several consecutive episodes; there may be more rollouts than the requested 'size' if it is necessary to collect at least one terminal rollout. A usage sketch follows the parameter list below.
Parameters: - size – int, number of rollouts to collect
- policy – policy to use
- policy_sync_op – operation copying local behavioural policy params from global one
- require_terminal – bool, if True - require at least one terminal rollout to be present.
- same_trial – bool, if True - all episodes are sampled from same trial
- init_context – if specified, overrides the initial episode context provided by self.context
- data_sample_config – environment configuration parameters for all episodes in batch: configuration dictionary of type btgym.datafeed.base.EnvResetConfig
Returns: 'data' key holding a list of data dictionaries; 'terminal_context' key holding a list of terminal output contexts. If require_terminal=True, this list is guaranteed to hold at least one element.
Return type: dict
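A hedged get_batch() sketch using the keyword arguments listed above; runner and policy are assumed to be built as shown earlier::

    batch = runner.get_batch(
        size=4,
        policy=policy,
        require_terminal=True,
        same_trial=True,
    )
    rollouts = batch['data']                        # list of at least 4 rollout dictionaries
    terminal_contexts = batch['terminal_context']   # non-empty when require_terminal=True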
-
class btgym.algorithms.runner.synchro.VerboseSynchroRunner(name='verbose_synchro', aux_render_modes=('action_prob', 'value_fn', 'lstm_1_h', 'lstm_2_h'), **kwargs)[source]¶ Extends the BaseSynchroRunner class with additional visualisation summaries at some expense of running speed.