btgym.algorithms.nn.losses module

btgym.algorithms.nn.losses.aac_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon=None, name='_aac_', verbose=False)

Advantage Actor-Critic loss definition. Paper: https://arxiv.org/abs/1602.01783

Parameters:
  • act_target – tensor holding policy actions targets;
  • adv_target – tensor holding policy estimated advantages targets;
  • r_target – tensor holding policy empirical returns targets;
  • pi_logits – policy logits output tensor;
  • pi_vf – policy value function output tensor;
  • pi_prime_logits – not used;
  • entropy_beta – entropy regularization constant;
  • epsilon – not used;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated AAC loss; list of related TensorBoard summaries.
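
As a rough illustration of the quantity this function builds, the pure-Python sketch below evaluates the standard A3C objective from the paper on plain lists: a policy-gradient term weighted by advantages, a value-regression term and an entropy bonus scaled by entropy_beta. It assumes discrete actions with one-hot act_target and a 0.5 weight on the value term; the exact weighting inside aac_loss_def is not stated here, and the helpers log_softmax and aac_loss_sketch are hypothetical, not part of btgym.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def aac_loss_sketch(act_target, adv_target, r_target, pi_logits, pi_vf, entropy_beta):
    """Hypothetical batch-mean A3C loss:
    policy-gradient term + 0.5 * value MSE - entropy_beta * entropy.

    act_target holds one-hot action rows; the other arguments are per-sample lists.
    """
    n = len(pi_logits)
    pi_loss = vf_loss = entropy = 0.0
    for a, adv, r, logits, v in zip(act_target, adv_target, r_target, pi_logits, pi_vf):
        log_p = log_softmax(logits)
        # Negative log-probability of the taken action, weighted by its advantage.
        neg_log_prob = -sum(ai * lp for ai, lp in zip(a, log_p))
        pi_loss += neg_log_prob * adv
        # Squared error between the empirical return and the value estimate.
        vf_loss += (r - v) ** 2
        # Entropy of the categorical policy, H = -sum(p * log p).
        entropy -= sum(math.exp(lp) * lp for lp in log_p)
    return pi_loss / n + 0.5 * vf_loss / n - entropy_beta * entropy / n
```

Subtracting the entropy term rewards less peaked policies, which is the regularisation role entropy_beta plays in the parameter list above.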

btgym.algorithms.nn.losses.ppo_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon, name='_ppo_', verbose=False)

PPO clipped surrogate loss definition, as in Eq. (7) of https://arxiv.org/pdf/1707.06347.pdf

Parameters:
  • act_target – tensor holding policy actions targets;
  • adv_target – tensor holding policy estimated advantages targets;
  • r_target – tensor holding policy empirical returns targets;
  • pi_logits – policy logits output tensor;
  • pi_vf – policy value function output tensor;
  • pi_prime_logits – old_policy logits output tensor;
  • entropy_beta – entropy regularization constant;
  • epsilon – L^CLIP clipping epsilon tensor;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated PPO L^CLIP loss; list of related TensorBoard summaries.
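
The clipped surrogate of Eq. (7) can be sketched in pure Python as follows, assuming discrete actions with one-hot act_target and treating pi_prime_logits as the old policy. Only the L^CLIP term is shown; since the function also receives r_target, pi_vf and entropy_beta, btgym presumably adds value and entropy terms, but how it combines them is not documented here. log_softmax and ppo_clip_loss_sketch are hypothetical names.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def ppo_clip_loss_sketch(act_target, adv_target, pi_logits, pi_prime_logits, epsilon):
    """Hypothetical negated L^CLIP, averaged over the batch."""
    total = 0.0
    for a, adv, logits, old_logits in zip(act_target, adv_target, pi_logits, pi_prime_logits):
        # Log-probabilities of the taken (one-hot) action under new and old policies.
        logp = sum(ai * lp for ai, lp in zip(a, log_softmax(logits)))
        logp_old = sum(ai * lp for ai, lp in zip(a, log_softmax(old_logits)))
        ratio = math.exp(logp - logp_old)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: the smaller of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # L^CLIP is maximised, so the loss to minimise is its negative.
    return -total / len(pi_logits)
```

When the new and old logits coincide the ratio is 1 and the loss reduces to the negated mean advantage; clipping only matters once the policy has moved.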

btgym.algorithms.nn.losses.value_fn_loss_def(r_target, pi_vf, name='_vr_', verbose=False)

Value function loss.

Parameters:
  • r_target – tensor holding policy empirical returns targets;
  • pi_vf – policy value function output tensor;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated value function loss; list of related TensorBoard summaries.

btgym.algorithms.nn.losses.pc_loss_def(actions, targets, pi_pc_q, name='_pc_', verbose=False)

Pixel control auxiliary task loss definition.

Paper: https://arxiv.org/abs/1611.05397

Borrows heavily from Kosuke Miyoshi's code, under Apache License 2.0:

https://miyosuda.github.io/

https://github.com/miyosuda/unreal

Parameters:
  • actions – tensor holding policy actions;
  • targets – tensor holding estimated pixel-change targets;
  • pi_pc_q – policy Q-value features output tensor;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated pc loss; list of related TensorBoard summaries.
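
A minimal sketch of the pixel-control loss, assuming the formulation of the miyosuda/unreal code referenced above: the auxiliary Q-value of the taken action is picked out at every spatial cell and regressed onto the pixel-change target with a half sum of squares (the tf.nn.l2_loss convention). Whether btgym sums or averages, or applies an extra weight, is not stated here; pc_loss_sketch and the nested-list layout are hypothetical.

```python
def pc_loss_sketch(actions, targets, pi_pc_q):
    """Hypothetical pixel-control loss on nested lists.

    actions:  [batch][num_actions] one-hot rows
    targets:  [batch][h][w] pixel-change targets
    pi_pc_q:  [batch][h][w][num_actions] auxiliary Q-values
    """
    loss = 0.0
    for a, tgt, q in zip(actions, targets, pi_pc_q):
        for tgt_row, q_row in zip(tgt, q):
            for t, q_cell in zip(tgt_row, q_row):
                # Select Q(s, a) for the taken action via the one-hot mask.
                q_a = sum(ai * qi for ai, qi in zip(a, q_cell))
                loss += (t - q_a) ** 2
    # Half the sum of squares, as tf.nn.l2_loss computes it.
    return 0.5 * loss
```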

btgym.algorithms.nn.losses.rp_loss_def(rp_targets, pi_rp_logits, name='_rp_', verbose=False)

Reward prediction auxiliary task loss definition.

Paper: https://arxiv.org/abs/1611.05397

Borrows heavily from Kosuke Miyoshi's code, under Apache License 2.0:

https://miyosuda.github.io/

https://github.com/miyosuda/unreal

Parameters:
  • rp_targets – tensor holding reward prediction targets;
  • pi_rp_logits – policy reward predictions tensor;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated rp loss; list of related TensorBoard summaries.
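
The reward-prediction task in the UNREAL paper classifies the upcoming reward as zero, positive or negative, so a natural loss is softmax cross-entropy over three classes. The sketch below assumes exactly that, with class order [zero, positive, negative] chosen for illustration; rp_target_one_hot and rp_loss_sketch are hypothetical helpers, and the encoding actually expected in rp_targets is not documented here.

```python
import math


def rp_target_one_hot(reward):
    """Hypothetical three-way encoding, class order [zero, positive, negative]."""
    if reward > 0:
        return [0.0, 1.0, 0.0]
    if reward < 0:
        return [0.0, 0.0, 1.0]
    return [1.0, 0.0, 0.0]


def rp_loss_sketch(rp_targets, pi_rp_logits):
    """Mean softmax cross-entropy between one-hot reward classes and logits."""
    loss = 0.0
    for t, logits in zip(rp_targets, pi_rp_logits):
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy: -sum(target * log_softmax(logits)).
        loss -= sum(ti * (z - log_sum) for ti, z in zip(t, logits))
    return loss / len(rp_targets)
```

With uniform logits the loss is log(3), the cost of an uninformed guess among three classes.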

btgym.algorithms.nn.losses.ae_loss_def(targets, logits, alpha=1.0, name='ae_loss', verbose=False, **kwargs)

Mean squared autoencoder reconstruction loss definition.

Parameters:
  • targets – tensor holding reconstruction target;
  • logits – tensor holding autoencoder decoder output;
  • alpha – loss weight constant;
  • name – scope;
  • verbose – summary level.
Returns:

tensor holding estimated reconstruction loss; list of summaries.
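
A minimal sketch of a weighted mean squared reconstruction error, assuming alpha scales the whole term and the error is averaged over all elements; ae_loss_sketch and the flat-list inputs are hypothetical.

```python
def ae_loss_sketch(targets, logits, alpha=1.0):
    """Hypothetical alpha-weighted mean squared error over flat lists."""
    n = len(targets)
    mse = sum((t - y) ** 2 for t, y in zip(targets, logits)) / n
    return alpha * mse
```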

btgym.algorithms.nn.losses.beta_vae_loss_def(targets, logits, d_kl, alpha=1.0, beta=1.0, name='beta_vae_loss', verbose=False)

Beta-variational autoencoder loss definition

Papers:
http://www.matthey.me/pdf/betavae_iclr_2017.pdf https://drive.google.com/file/d/0Bwy4Nlx78QCCNktVTFFMTUs4N2oxY295VU9qV25MWTBQS2Uw/view
Parameters:
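  • targets – tensor holding reconstruction target;
  • logits – tensor holding decoder output;
  • d_kl – tensor holding estimated KL divergence;
  • alpha – reconstruction loss weight constant;
  • beta – KL divergence weight constant;
  • name – scope;
  • verbose – summary level.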
Returns:

tensor holding estimated loss; list of summaries.
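
Below is a sketch of how the pieces could fit together, assuming the usual beta-VAE form: reconstruction error weighted by alpha plus the KL term weighted by beta, with d_kl holding one KL value per sample. The closed-form KL between a diagonal Gaussian posterior (parameterised here by mean and log-variance) and a standard normal prior is included for completeness. All names are hypothetical and the weighting is an assumption, not btgym's documented formula.

```python
import math


def gaussian_kl_sketch(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def beta_vae_loss_sketch(targets, logits, d_kl, alpha=1.0, beta=1.0):
    """Hypothetical alpha * MSE reconstruction + beta * batch-mean KL."""
    rec = sum((t - y) ** 2 for t, y in zip(targets, logits)) / len(targets)
    kl = sum(d_kl) / len(d_kl)
    return alpha * rec + beta * kl
```

Setting beta above 1 penalises the KL term more heavily, which is the beta-VAE paper's lever for encouraging disentangled latents.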

btgym.algorithms.nn.networks module

btgym.algorithms.nn.networks.conv_2d_network(x, ob_space, ac_space, conv_2d_layer_ref=<function conv2d>, conv_2d_num_filters=(32, 32, 64, 64), conv_2d_filter_size=(3, 3), conv_2d_stride=(2, 2), pad='SAME', dtype=tf.float32, name='conv2d', collections=None, reuse=False, keep_prob=None, **kwargs)

Stage1 network: from preprocessed 2D input to estimated features. Encapsulates convolutions + layer normalisation + nonlinearity. Can be shared.

Returns:

tensor holding state features.

btgym.algorithms.nn.networks.conv_1d_network(x, ob_space, ac_space, conv_1d_num_layers=4, conv_1d_num_filters=32, conv_1d_filter_size=3, conv_1d_stride=2, pad='SAME', dtype=tf.float32, collections=None, reuse=False, **kwargs)

Stage1 network: from preprocessed 1D input to estimated features. Encapsulates convolutions, [possibly] skip-connections etc. Can be shared.

Returns:

tensor holding state features.

btgym.algorithms.nn.networks.lstm_network(x, lstm_sequence_length, lstm_class=<class 'tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell'>, lstm_layers=(256, ), static=False, keep_prob=None, name='lstm', reuse=False, **kwargs)

Stage2 network: from features to flattened LSTM output. Defines [multi-layered] dynamic [possibly shared] LSTM network.

Returns:

batch-wise flattened output tensor; LSTM initial state tensor; LSTM state output tensor; LSTM flattened feed placeholders, as a tuple.

btgym.algorithms.nn.networks.dense_aac_network(x, ac_space_depth, name='dense_aac', linear_layer_ref=<function noisy_linear>, reuse=False)

Stage3 network: from LSTM flattened output to advantage actor-critic outputs.

Returns:

logits tensor, value function tensor and action sampling function for every space in ac_space_shape dictionary

Return type:

dictionary containing tuples

btgym.algorithms.nn.networks.dense_rp_network(x, linear_layer_ref=<function noisy_linear>)

Stage3 network: from shared convolutions to reward-prediction task output tensor.

btgym.algorithms.nn.networks.pixel_change_2d_estimator(ob_space, pc_estimator_stride=(2, 2), **kwargs)

Defines tf operation for estimating pixel change as subsampled absolute difference of two states.

Note

Crops the input array by one pixel on either side; a 1D signal should therefore be shaped as [signal_length, 3].
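
The pixel-change estimate can be sketched on plain nested lists for a single-channel frame: crop the one-pixel border, take the absolute difference of the two states, then subsample. Mean-pooling over non-overlapping stride-sized blocks is an assumption borrowed from the UNREAL reference code, and pixel_change_sketch is a hypothetical helper rather than the TensorFlow op this function defines.

```python
def pixel_change_sketch(state, last_state, stride=(2, 2)):
    """Hypothetical subsampled absolute difference of two 2D single-channel frames."""
    # Crop a one-pixel border from both frames.
    cur = [row[1:-1] for row in state[1:-1]]
    prev = [row[1:-1] for row in last_state[1:-1]]
    diff = [[abs(a - b) for a, b in zip(r1, r2)] for r1, r2 in zip(cur, prev)]
    sh, sw = stride
    h, w = len(diff), len(diff[0])
    out = []
    # Mean-pool non-overlapping sh x sw blocks (assumed subsampling scheme).
    for i in range(0, h - sh + 1, sh):
        out_row = []
        for j in range(0, w - sw + 1, sw):
            block = [diff[i + di][j + dj] for di in range(sh) for dj in range(sw)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```

A 6x6 frame is cropped to 4x4 and pooled with the default (2, 2) stride into a 2x2 map of average pixel change.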

btgym.algorithms.nn.networks.duelling_pc_network(x, ac_space, duell_pc_x_inner_shape=(9, 9, 32), duell_pc_filter_size=(4, 4), duell_pc_stride=(2, 2), linear_layer_ref=<function noisy_linear>, reuse=False, **kwargs)

Stage3 network for 'pixel control' task: from LSTM output to Q-aux features tensor.
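
Pixel-control heads in UNREAL use a duelling decomposition, so the Q-aux features are presumably assembled as Q = V + A - mean(A) at every spatial cell. The sketch below shows only that aggregation on plain lists; the deconvolution layers that would produce the value and advantage maps in duelling_pc_network are omitted, and duelling_q_sketch is a hypothetical name.

```python
def duelling_q_sketch(value, advantage):
    """Hypothetical per-cell duelling aggregation: Q = V + A - mean(A).

    value:     [h][w] state-value map
    advantage: [h][w][num_actions] advantage map
    """
    q = []
    for v_row, a_row in zip(value, advantage):
        q_row = []
        for v, a in zip(v_row, a_row):
            # Subtracting the mean advantage makes V and A identifiable.
            mean_a = sum(a) / len(a)
            q_row.append([v + ai - mean_a for ai in a])
        q.append(q_row)
    return q
```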

btgym.algorithms.nn.layers module

btgym.algorithms.nn.layers.categorical_sample(logits, depth)

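Given logits, returns a one-hot encoded categorical sample.

Parameters:
  • logits – tensor holding categorical distribution logits;
  • depth – one-hot encoding depth, i.e. number of categories.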

Returns:

tensor of shape [batch_dim, logits_depth]
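
For intuition, the sketch below samples one category per row from softmax(logits) and one-hot encodes it, mirroring the description above. It uses Python's random.choices in place of a TensorFlow op and assumes depth equals the number of logits per row; categorical_sample_sketch is a hypothetical name.

```python
import math
import random


def categorical_sample_sketch(logits, depth, rng=random):
    """Hypothetical: draw one category per row from softmax(logits), return one-hot rows.

    Assumes depth == len(row) for every row of logits.
    """
    samples = []
    for row in logits:
        m = max(row)
        # Unnormalised softmax weights; random.choices normalises them itself.
        weights = [math.exp(z - m) for z in row]
        idx = rng.choices(range(depth), weights=weights, k=1)[0]
        samples.append([1.0 if i == idx else 0.0 for i in range(depth)])
    return samples
```

The result has shape [batch_dim, depth], matching the documented return shape.
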

btgym.algorithms.nn.layers.linear(x, size, name, initializer=None, bias_init=0, reuse=False)

Linear network layer.

btgym.algorithms.nn.layers.noisy_linear(x, size, name, bias=True, activation_fn=<function identity>, reuse=False, **kwargs)

NoisyNet linear network layer using factorised Gaussian noise. Code by Andrew Liao: https://github.com/andrewliao11/NoisyNet-DQN

Papers:
https://arxiv.org/abs/1706.10295 https://arxiv.org/abs/1706.01905
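
The factorised Gaussian noise of the NoisyNet paper can be sketched for a single input row as below: one noise value per input and per output unit, each passed through f(x) = sign(x) * sqrt(|x|), with their outer product perturbing the weights. This is a plain-Python illustration of the technique, not Andrew Liao's code or btgym's layer; the parameter names mu_w, sigma_w, mu_b and sigma_b are hypothetical, and noise is redrawn on every call.

```python
import math
import random


def _f(x):
    """Noise-shaping function from the NoisyNet paper: sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


def noisy_linear_sketch(x, mu_w, sigma_w, mu_b, sigma_b, rng=random):
    """Hypothetical y = x @ (mu_w + sigma_w * eps_w) + (mu_b + sigma_b * eps_b).

    Factorised noise: eps_w[i][j] = f(eps_in[i]) * f(eps_out[j]), eps_b[j] = f(eps_out[j]).
    mu_w, sigma_w: [in][out]; mu_b, sigma_b: [out]; x: [in].
    """
    n_in, n_out = len(mu_w), len(mu_b)
    eps_in = [_f(rng.gauss(0.0, 1.0)) for _ in range(n_in)]
    eps_out = [_f(rng.gauss(0.0, 1.0)) for _ in range(n_out)]
    y = []
    for j in range(n_out):
        acc = mu_b[j] + sigma_b[j] * eps_out[j]
        for i in range(n_in):
            w_ij = mu_w[i][j] + sigma_w[i][j] * eps_in[i] * eps_out[j]
            acc += x[i] * w_ij
        y.append(acc)
    return y
```

Factorising needs only n_in + n_out noise samples per call instead of n_in * n_out, which is the paper's motivation for this variant. With all sigma parameters at zero the layer reduces to an ordinary linear layer.
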

btgym.algorithms.nn.layers.conv2d(x, num_filters, name, filter_size=(3, 3), stride=(1, 1), pad='SAME', dtype=tf.float32, collections=None, reuse=False)

2D convolution layer.

btgym.algorithms.nn.layers.deconv2d(x, output_channels, name, filter_size=(4, 4), stride=(2, 2), dtype=tf.float32, collections=None, reuse=False)

Deconvolution layer. Paper: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

btgym.algorithms.nn.layers.conv1d(x, num_filters, name, filter_size=3, stride=2, pad='SAME', dtype=tf.float32, collections=None, reuse=False)

1D convolution layer.
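
As an illustration of what 'SAME' padding with this layer's defaults (filter_size=3, stride=2) means, the sketch below runs a single-channel, single-filter 1D cross-correlation using TensorFlow's padding rule: output length ceil(n / stride), with any odd extra padding placed on the right. The trainable weights, multiple filters, bias and dtype handling of the real layer are left out; conv1d_same_sketch is a hypothetical name.

```python
import math


def conv1d_same_sketch(x, kernel, stride=2):
    """Hypothetical single-channel 1D cross-correlation with TF-style 'SAME' padding."""
    n, k = len(x), len(kernel)
    out_len = math.ceil(n / stride)
    # Total padding so that the output length is exactly ceil(n / stride).
    pad_total = max((out_len - 1) * stride + k - n, 0)
    pad_left = pad_total // 2  # any odd extra padding goes to the right
    padded = [0.0] * pad_left + list(x) + [0.0] * (pad_total - pad_left)
    return [
        sum(kernel[t] * padded[i * stride + t] for t in range(k))
        for i in range(out_len)
    ]
```

For a length-5 signal and a summing kernel [1, 1, 1], one zero is padded on each side and three windows are evaluated.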

btgym.algorithms.nn.layers.conv2d_dw(x, num_filters, name='conv2d_dw', filter_size=(3, 3), stride=(1, 1), pad='SAME', dtype=tf.float32, collections=None, reuse=False)

Depthwise 2D convolution layer. Slow, do not use.

btgym.algorithms.nn.ae module

btgym.algorithms.nn.ae.conv2d_encoder(x, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', name='encoder', reuse=False)

Defines convolutional encoder.

Parameters:
  • x – input tensor
  • layer_config – first to last nested layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]
  • pad – str, padding scheme: ‘SAME’ or ‘VALID’
  • name – str, name scope
  • reuse – bool
Returns:

list of tensors holding encoded features for every layer, outer to inner; level-wise list of encoding layer shapes, first to last.
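
With 'SAME' padding each encoding layer shrinks the spatial dimensions to ceil(dim / stride), so the default layer_config halves the first axis three times and leaves the second axis untouched. The sketch below tracks per-layer output shapes for a hypothetical [30, 1, 5] input; whether conv2d_encoder records exactly these output shapes (rather than, say, input shapes) is an assumption, and encoder_shapes_sketch is a hypothetical helper.

```python
import math


def encoder_shapes_sketch(input_shape, layer_config):
    """Hypothetical per-layer output shapes [h, w, c] for stacked 'SAME'-padded 2D convs.

    input_shape: [h, w, c]; layer_config: ((num_filters, filter_size, stride), ...).
    """
    h, w, _ = input_shape
    shapes = []
    for num_filters, _filter_size, (sh, sw) in layer_config:
        # 'SAME' padding: each spatial dim becomes ceil(dim / stride).
        h, w = math.ceil(h / sh), math.ceil(w / sw)
        shapes.append([h, w, num_filters])
    return shapes


DEFAULT_CONFIG = ((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)))
```

Reversing the returned list would give a last-to-first ordering like the one described for conv2d_decoder's layer_shapes.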

btgym.algorithms.nn.ae.conv2d_decoder(z, layer_shapes, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', resize_method=0, name='decoder', reuse=False)

Defines convolutional decoder.

Parameters:
  • z – tensor holding encoded state
  • layer_shapes – level-wise list of matching encoding layers shapes, last to first.
  • layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]
  • pad – str, padding scheme: ‘SAME’ or ‘VALID’
  • resize_method – up-sampling method, one of the supported tf.image.ResizeMethod values
  • name – str, name scope
  • reuse – bool
Returns:

list of tensors holding decoded features for every layer, inner to outer.

btgym.algorithms.nn.ae.conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='base_conv2d_autoencoder', reuse=False, **kwargs)

Basic convolutional autoencoder. Hidden state is passed through a dense linear layer.

Parameters:
  • inputs – input tensor
  • layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck, while the decoder part is inferred symmetrically
  • resize_method – up-sampling method, one of the supported tf.image.ResizeMethod values
  • pad – str, padding scheme: ‘SAME’ or ‘VALID’
  • linear_layer_ref – linear layer callable to use
  • name – str, name scope
  • reuse – bool
Returns:

list of tensors holding encoded features, layer-wise from outer to inner; tensor holding batch-wise flattened hidden state vector; list of tensors holding decoded features, layer-wise from inner to outer; tensor holding reconstructed output; None value

btgym.algorithms.nn.ae.cw_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='cw_conv2d_autoencoder', reuse=False, **kwargs)

Channel-wise convolutional autoencoder. Hidden state is passed through a dense linear layer. Painfully slow, do not use.

Parameters:
  • inputs – input tensor
  • layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck, while the decoder part is inferred symmetrically
  • resize_method – up-sampling method, one of the supported tf.image.ResizeMethod values
  • pad – str, padding scheme: ‘SAME’ or ‘VALID’
  • linear_layer_ref – linear layer callable to use
  • name – str, name scope
  • reuse – bool
Returns:

per-channel list of lists of tensors holding encoded features, layer-wise from outer to inner; tensor holding batch-wise flattened hidden state vector; per-channel list of lists of tensors holding decoded features, layer-wise from inner to outer; tensor holding reconstructed output; None value

btgym.algorithms.nn.ae.beta_var_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='vae_conv2d', max_batch_size=256, reuse=False)

Variational autoencoder.

Papers:
https://arxiv.org/pdf/1312.6114.pdf https://arxiv.org/pdf/1606.05908.pdf http://www.matthey.me/pdf/betavae_iclr_2017.pdf
Parameters:
  • inputs – input tensor
  • layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck, while the decoder part is inferred symmetrically
  • resize_method – up-sampling method, one of the supported tf.image.ResizeMethod values
  • pad – str, padding scheme: ‘SAME’ or ‘VALID’
  • linear_layer_ref – linear layer callable, not used
  • name – str, name scope
  • max_batch_size – int, dynamic batch size should be no greater than this value
  • reuse – bool
Returns:

list of tensors holding encoded features, layer-wise from outer to inner; tensor holding batch-wise flattened hidden state vector; list of tensors holding decoded features, layer-wise from inner to outer; tensor holding reconstructed output; tensor holding estimated KL divergence
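
Variational autoencoders sample the hidden state with the reparameterisation trick of Kingma and Welling (the first paper cited above), z = mu + sigma * eps with eps ~ N(0, 1), which keeps sampling differentiable with respect to mu and sigma. The sketch assumes the encoder outputs a log-variance per latent dimension; how beta_var_conv2d_autoencoder actually parameterises sigma is not documented here, and reparameterize_sketch is a hypothetical name.

```python
import math
import random


def reparameterize_sketch(mu, log_var, rng=random):
    """Hypothetical: sample z = mu + sigma * eps, eps ~ N(0, 1), sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
```

As the log-variance becomes very negative, sigma vanishes and z collapses onto mu.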

class btgym.algorithms.nn.ae.KernelMonitor(conv_input, layer_output)

Visualises convolution filters learnt for a specific layer. Source: https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html

Parameters:
  • conv_input – convolution stack input tensor
  • layer_output – tensor holding output of layer of interest from stack

fit(sess, kernel_index, step=0.001, num_steps=40)

Learns an input signal that maximizes the activation of the given kernel.

Parameters:
  • sess – tf.Session object
  • kernel_index – filter number of interest
  • step – gradient ascent step size
  • num_steps – number of steps to fit
Returns:

learnt signal as np.array
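
The idea behind fit(), following the Keras blog post cited above, is gradient ascent on the input with an RMS-normalised gradient. The sketch below uses a toy linear "kernel" whose activation is sum(w * x), so the gradient with respect to the input is just the weight vector; a real convolution stack would obtain it by backpropagation (for example tf.gradients in TensorFlow 1.x). fit_sketch and the toy activation are hypothetical; the defaults mirror the step and num_steps values in the signature.

```python
import math
import random


def fit_sketch(weights, step=0.001, num_steps=40, rng=random):
    """Hypothetical gradient ascent on an input signal for a toy linear kernel.

    Activation is sum(w * x), so d(activation)/dx == weights.
    """
    x = [rng.uniform(-0.1, 0.1) for _ in weights]  # small random starting signal
    for _ in range(num_steps):
        grad = list(weights)
        # RMS-normalise the gradient, following the Keras blog recipe.
        rms = math.sqrt(sum(g * g for g in grad) / len(grad)) + 1e-5
        x = [xi + step * g / rms for xi, g in zip(x, grad)]
    return x
```

Normalising the gradient keeps the step size comparable across kernels whose raw gradients differ greatly in magnitude.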