btgym.algorithms.nn.losses module

btgym.algorithms.nn.losses.aac_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon=None, name='_aac_', verbose=False)

Advantage Actor Critic loss definition. Paper: https://arxiv.org/abs/1602.01783

Parameters:

act_target – tensor holding policy actions targets;
adv_target – tensor holding policy estimated advantages targets;
r_target – tensor holding policy empirical returns targets;
pi_logits – policy logits output tensor;
pi_prime_logits – not used;
pi_vf – policy value function output tensor;
entropy_beta – entropy regularization constant;
epsilon – not used;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated AAC loss; list of related tensorboard summaries.
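As a rough illustration of how the arguments above usually combine in an advantage actor-critic objective (policy-gradient term, value regression term, entropy bonus), here is a minimal plain-Python sketch. It is not btgym's TensorFlow code: the name `aac_loss_sketch`, the use of integer action indices for `act_target`, the 0.5 weight on the value term and the batch averaging are all assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def aac_loss_sketch(act_target, adv_target, r_target, pi_logits, pi_vf, entropy_beta):
    """Hypothetical batch-averaged actor-critic loss.

    act_target: list of action indices (assumption; could be one-hot in practice)
    adv_target, r_target, pi_vf: lists of floats
    pi_logits: list of per-sample logit lists
    """
    pi_loss = vf_loss = entropy = 0.0
    for a, adv, r, logits, v in zip(act_target, adv_target, r_target, pi_logits, pi_vf):
        log_pi = log_softmax(logits)
        pi_loss += -log_pi[a] * adv            # policy-gradient term
        vf_loss += 0.5 * (v - r) ** 2          # value regression term (0.5 is assumed)
        entropy += -sum(math.exp(lp) * lp for lp in log_pi)
    n = len(act_target)
    # The entropy bonus is subtracted: higher entropy lowers the loss.
    return (pi_loss + vf_loss - entropy_beta * entropy) / n
```

With uniform two-action logits, unit advantage and a perfect value estimate, the loss reduces to the policy term alone when `entropy_beta` is zero.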

btgym.algorithms.nn.losses.ppo_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon, name='_ppo_', verbose=False)

PPO clipped surrogate loss definition, as in Eq. (7) of https://arxiv.org/pdf/1707.06347.pdf

Parameters:

act_target – tensor holding policy actions targets;
adv_target – tensor holding policy estimated advantages targets;
r_target – tensor holding policy empirical returns targets;
pi_logits – policy logits output tensor;
pi_vf – policy value function output tensor;
pi_prime_logits – old_policy logits output tensor;
entropy_beta – entropy regularization constant;
epsilon – L^Clip epsilon tensor;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated PPO L^Clip loss; list of related tensorboard summaries.
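The clipped surrogate term from Eq. (7) of the PPO paper can be sketched in plain Python as below. This covers only the L^Clip part and is not btgym's implementation: `ppo_clip_loss_sketch` is a hypothetical name, `act_target` is assumed to hold integer action indices, and the sign is flipped so that minimising the returned value maximises the surrogate.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def ppo_clip_loss_sketch(act_target, adv_target, pi_logits, pi_prime_logits, epsilon):
    """Hypothetical negative clipped surrogate, averaged over the batch."""
    total = 0.0
    for a, adv, logits, old_logits in zip(act_target, adv_target, pi_logits, pi_prime_logits):
        # Probability ratio pi(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(log_softmax(logits)[a] - log_softmax(old_logits)[a])
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: take the smaller of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(act_target)
```

Note the asymmetry: with a positive advantage the ratio is capped at 1 + epsilon, while with a negative advantage a ratio above that bound is left unclipped, so the penalty is not softened.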

btgym.algorithms.nn.losses.value_fn_loss_def(r_target, pi_vf, name='_vr_', verbose=False)

Value function loss.

Parameters:

r_target – tensor holding policy empirical returns targets;
pi_vf – policy value function output tensor;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated value fn. loss; list of related tensorboard summaries.

btgym.algorithms.nn.losses.pc_loss_def(actions, targets, pi_pc_q, name='_pc_', verbose=False)

Pixel control auxiliary task loss definition.

Paper: https://arxiv.org/abs/1611.05397

Borrows heavily from Kosuke Miyoshi's code, under Apache License 2.0:

https://miyosuda.github.io/

https://github.com/miyosuda/unreal

Parameters:

actions – tensor holding policy actions;
targets – tensor holding estimated pixel-change targets;
pi_pc_q – policy Q-value features output tensor;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated pc loss; list of related tensorboard summaries.

btgym.algorithms.nn.losses.rp_loss_def(rp_targets, pi_rp_logits, name='_rp_', verbose=False)

Reward prediction auxiliary task loss definition.

Paper: https://arxiv.org/abs/1611.05397

Borrows heavily from Kosuke Miyoshi's code, under Apache License 2.0:

https://miyosuda.github.io/

https://github.com/miyosuda/unreal

Parameters:

rp_targets – tensor holding reward prediction target;
pi_rp_logits – policy reward predictions tensor;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated rp loss; list of related tensorboard summaries.
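In the UNREAL paper linked above, reward prediction is posed as classification of the upcoming reward, which suggests a softmax cross-entropy between `rp_targets` and `pi_rp_logits`. The sketch below assumes exactly that, with one-hot targets and batch averaging; the name `rp_loss_sketch` is hypothetical and the actual reduction used by btgym may differ.

```python
import math


def rp_loss_sketch(rp_targets, pi_rp_logits):
    """Hypothetical softmax cross-entropy, averaged over the batch.

    rp_targets: list of one-hot lists (assumed encoding of the reward class)
    pi_rp_logits: list of logit lists of matching length
    """
    total = 0.0
    for target, logits in zip(rp_targets, pi_rp_logits):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy: minus the log-probability assigned to the target class.
        total += -sum(t * (l - log_z) for t, l in zip(target, logits))
    return total / len(rp_targets)
```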

btgym.algorithms.nn.losses.ae_loss_def(targets, logits, alpha=1.0, name='ae_loss', verbose=False, **kwargs)

Mean quadratic autoencoder reconstruction loss definition

Parameters:

targets – tensor holding reconstruction target;
logits – tensor holding decoder output;
alpha – loss weight constant;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated reconstruction loss;
list of summaries.
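A mean quadratic reconstruction loss amounts to the mean squared difference between target and decoder output, scaled by the weight constant. A minimal sketch on flat Python lists, assuming `alpha` is a plain multiplier (the function name `ae_loss_sketch` is hypothetical):

```python
def ae_loss_sketch(targets, logits, alpha=1.0):
    """Hypothetical mean squared reconstruction error, scaled by alpha."""
    squared_errors = [(t - y) ** 2 for t, y in zip(targets, logits)]
    return alpha * sum(squared_errors) / len(squared_errors)
```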

btgym.algorithms.nn.losses.beta_vae_loss_def(targets, logits, d_kl, alpha=1.0, beta=1.0, name='beta_vae_loss', verbose=False)

Beta-variational autoencoder loss definition

Papers: http://www.matthey.me/pdf/betavae_iclr_2017.pdf https://drive.google.com/file/d/0Bwy4Nlx78QCCNktVTFFMTUs4N2oxY295VU9qV25MWTBQS2Uw/view

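Parameters:

targets – tensor holding reconstruction target;
logits – tensor holding decoder output;
d_kl – tensor holding estimated KL divergence;
alpha – reconstruction loss weight constant;
beta – KL divergence term weight constant;
name – scope;
verbose – summary level.

Returns:

tensor holding estimated loss;
list of summaries.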
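The beta-VAE objective weights a reconstruction term against a KL divergence term. The sketch below assumes `alpha` scales the mean squared reconstruction error and `beta` scales `d_kl`; both function names are hypothetical. It also shows the closed-form KL divergence between a diagonal Gaussian and the standard normal prior, which is the usual source of a `d_kl` value in this setting.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one sample."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss_sketch(targets, logits, d_kl, alpha=1.0, beta=1.0):
    """Hypothetical weighted sum of reconstruction error and KL divergence."""
    recon = sum((t - y) ** 2 for t, y in zip(targets, logits)) / len(targets)
    return alpha * recon + beta * d_kl
```

Setting `beta` above 1 penalises the KL term more heavily than in a plain VAE, which is the mechanism the beta-VAE paper uses to encourage disentangled latents.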

btgym.algorithms.nn.networks module

btgym.algorithms.nn.networks.conv_2d_network(x, ob_space, ac_space, conv_2d_layer_ref=<function conv2d>, conv_2d_num_filters=(32, 32, 64, 64), conv_2d_filter_size=(3, 3), conv_2d_stride=(2, 2), pad='SAME', dtype=tf.float32, name='conv2d', collections=None, reuse=False, keep_prob=None, **kwargs)

Stage1 network: from preprocessed 2D input to estimated features. Encapsulates convolutions + layer normalisation + nonlinearity. Can be shared.

Returns:	tensor holding state features;

btgym.algorithms.nn.networks.conv_1d_network(x, ob_space, ac_space, conv_1d_num_layers=4, conv_1d_num_filters=32, conv_1d_filter_size=3, conv_1d_stride=2, pad='SAME', dtype=tf.float32, collections=None, reuse=False, **kwargs)

Stage1 network: from preprocessed 1D input to estimated features. Encapsulates convolutions, [possibly] skip-connections etc. Can be shared.

Returns:	tensor holding state features;

btgym.algorithms.nn.networks.lstm_network(x, lstm_sequence_length, lstm_class=<class 'tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell'>, lstm_layers=(256, ), static=False, keep_prob=None, name='lstm', reuse=False, **kwargs)

Stage2 network: from features to flattened LSTM output. Defines [multi-layered] dynamic [possibly shared] LSTM network.

Returns:	batch-wise flattened output tensor; lstm initial state tensor; lstm state output tensor; lstm flattened feed placeholders as tuple.

btgym.algorithms.nn.networks.dense_aac_network(x, ac_space_depth, name='dense_aac', linear_layer_ref=<function noisy_linear>, reuse=False)

Stage3 network: from LSTM flattened output to advantage actor-critic.

Returns:

for every space in ac_space_shape dictionary, a tuple of: logits tensor; value function tensor; action sampling function.

Return type: dictionary containing tuples

btgym.algorithms.nn.networks.dense_rp_network(x, linear_layer_ref=<function noisy_linear>)

Stage3 network: from shared convolutions to reward-prediction task output tensor.

btgym.algorithms.nn.networks.pixel_change_2d_estimator(ob_space, pc_estimator_stride=(2, 2), **kwargs)

Defines tf operation for estimating pixel change as subsampled absolute difference of two states.

Note

Crops input array by one pixel from either side; consequently, a 1D signal is to be shaped as [signal_length, 3].
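The estimate described above can be sketched on plain nested lists: crop a one-pixel border, take the absolute difference of the two states, then subsample with the given stride. The sketch handles a single-channel 2D array and assumes subsampling is done by averaging over each stride-sized cell; that choice, and the name `pixel_change_sketch`, are assumptions rather than the library's documented behaviour.

```python
def pixel_change_sketch(state, next_state, stride=(2, 2)):
    """Hypothetical pixel-change map for two equally sized 2D arrays."""
    sh, sw = stride
    # Crop one pixel from every side, then take the absolute difference.
    diff = [
        [abs(b - a) for a, b in zip(row_a[1:-1], row_b[1:-1])]
        for row_a, row_b in zip(state[1:-1], next_state[1:-1])
    ]
    height, width = len(diff), len(diff[0])
    pooled = []
    # Average each non-overlapping (sh x sw) cell (assumed subsampling scheme).
    for i in range(0, height - sh + 1, sh):
        pooled.append([
            sum(diff[i + di][j + dj] for di in range(sh) for dj in range(sw)) / (sh * sw)
            for j in range(0, width - sw + 1, sw)
        ])
    return pooled
```

Because of the crop, changes on the outermost rows and columns never reach the output.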

btgym.algorithms.nn.networks.duelling_pc_network(x, ac_space, duell_pc_x_inner_shape=(9, 9, 32), duell_pc_filter_size=(4, 4), duell_pc_stride=(2, 2), linear_layer_ref=<function noisy_linear>, reuse=False, **kwargs)

Stage3 network for 'pixel control' task: from LSTM output to Q-aux. features tensor.

btgym.algorithms.nn.layers module

btgym.algorithms.nn.layers.categorical_sample(logits, depth)

Given logits, returns one-hot encoded categorical sample.

Returns:	tensor of shape [batch_dim, logits_depth]
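For intuition, drawing a one-hot categorical sample from a batch of logits can be sketched with the standard library as follows. This is a plain-Python stand-in, not the TensorFlow op: `categorical_sample_sketch` is a hypothetical name, and the one-hot depth is taken from the length of each logits row rather than passed separately.

```python
import math
import random


def categorical_sample_sketch(logits, rng=random):
    """Return one one-hot row per logits row, shape [batch_dim, logits_depth]."""
    samples = []
    for row in logits:
        m = max(row)
        # Unnormalised softmax weights; subtracting the max avoids overflow.
        weights = [math.exp(l - m) for l in row]
        index = rng.choices(range(len(row)), weights=weights, k=1)[0]
        samples.append([1 if i == index else 0 for i in range(len(row))])
    return samples
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.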

btgym.algorithms.nn.layers.linear(x, size, name, initializer=None, bias_init=0, reuse=False)

Linear network layer.

btgym.algorithms.nn.layers.noisy_linear(x, size, name, bias=True, activation_fn=<function identity>, reuse=False, **kwargs)

Noisy Net linear network layer using Factorised Gaussian noise; Code by Andrew Liao, https://github.com/andrewliao11/NoisyNet-DQN

Papers: https://arxiv.org/abs/1706.10295 https://arxiv.org/abs/1706.01905
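Factorised Gaussian noise, as described in the first paper above, draws one noise vector per input unit and one per output unit, passes each through f(x) = sgn(x)·sqrt(|x|), and forms the weight noise as their outer product. A minimal plain-Python sketch of one forward pass follows; the function names and the list-of-lists weight layout `[out][in]` are assumptions for the example, not the layer's actual variables.

```python
import math
import random


def f(x):
    """Noise shaping function: sgn(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


def factorised_noise(in_size, out_size, rng):
    """Return (weight noise [out][in], bias noise [out])."""
    eps_in = [f(rng.gauss(0.0, 1.0)) for _ in range(in_size)]
    eps_out = [f(rng.gauss(0.0, 1.0)) for _ in range(out_size)]
    eps_w = [[e_out * e_in for e_in in eps_in] for e_out in eps_out]
    return eps_w, eps_out


def noisy_linear_sketch(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b."""
    out_size, in_size = len(mu_w), len(mu_w[0])
    eps_w, eps_b = factorised_noise(in_size, out_size, rng)
    return [
        sum((mu_w[j][i] + sigma_w[j][i] * eps_w[j][i]) * x[i] for i in range(in_size))
        + mu_b[j] + sigma_b[j] * eps_b[j]
        for j in range(out_size)
    ]
```

Only in_size + out_size Gaussian draws are needed per pass, instead of one per weight as with independent noise.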

btgym.algorithms.nn.layers.conv2d(x, num_filters, name, filter_size=(3, 3), stride=(1, 1), pad='SAME', dtype=tf.float32, collections=None, reuse=False)

2D convolution layer.

btgym.algorithms.nn.layers.deconv2d(x, output_channels, name, filter_size=(4, 4), stride=(2, 2), dtype=tf.float32, collections=None, reuse=False)

Deconvolution layer, paper: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

btgym.algorithms.nn.layers.conv1d(x, num_filters, name, filter_size=3, stride=2, pad='SAME', dtype=tf.float32, collections=None, reuse=False)

1D convolution layer.

btgym.algorithms.nn.layers.conv2d_dw(x, num_filters, name='conv2d_dw', filter_size=(3, 3), stride=(1, 1), pad='SAME', dtype=tf.float32, collections=None, reuse=False)

Depthwise 2D convolution layer. Slow; do not use.

btgym.algorithms.nn.ae module

btgym.algorithms.nn.ae.conv2d_encoder(x, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', name='encoder', reuse=False)

Defines convolutional encoder.

Parameters:

x – input tensor;
layer_config – first to last nested layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)];
pad – str, padding scheme: ‘SAME’ or ‘VALID’;
name – str, name scope;
reuse – bool.

Returns:

list of tensors holding encoded features for every layer, outer to inner;
level-wise list of encoding layers shapes, first to last.
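To see how a `layer_config` of this form determines the per-layer shapes, the sketch below walks a configuration and applies the standard TensorFlow output-size rules for 'SAME' and 'VALID' padding. It only does the shape arithmetic on a `(height, width, channels)` tuple; the helper name `encoder_layer_shapes` is hypothetical and no tensors are built.

```python
import math


def encoder_layer_shapes(input_shape, layer_config, pad='SAME'):
    """Return [(height, width, num_filters), ...] for each layer, first to last."""
    height, width, _ = input_shape
    shapes = []
    for num_filters, (filter_h, filter_w), (stride_h, stride_w) in layer_config:
        if pad == 'SAME':
            # SAME: output size depends on the stride only.
            height = math.ceil(height / stride_h)
            width = math.ceil(width / stride_w)
        elif pad == 'VALID':
            # VALID: the filter must fit entirely inside the input.
            height = math.ceil((height - filter_h + 1) / stride_h)
            width = math.ceil((width - filter_w + 1) / stride_w)
        else:
            raise ValueError("pad must be 'SAME' or 'VALID'")
        shapes.append((height, width, num_filters))
    return shapes


# The default configuration from the signature above.
default_config = ((32, (3, 1), (2, 1)),) * 3
```

For example, `encoder_layer_shapes((30, 1, 4), default_config)` halves the first dimension at every layer (rounding up) and leaves the second untouched, since the stride along it is 1.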

btgym.algorithms.nn.ae.conv2d_decoder(z, layer_shapes, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', resize_method=0, name='decoder', reuse=False)

Defines convolutional decoder.

Parameters:

z – tensor holding encoded state
layer_shapes – level-wise list of matching encoding layers shapes, last to first.
layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]
pad – str, padding scheme: ‘SAME’ or ‘VALID’
resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
name – str, name scope
reuse – bool

Returns:

list of tensors holding decoded features for every layer inner to outer

btgym.algorithms.nn.ae.conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='base_conv2d_autoencoder', reuse=False, **kwargs)

Basic convolutional autoencoder. Hidden state is passed through dense linear layer.

Parameters:

inputs – input tensor
layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck; the decoder part is inferred symmetrically
resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
pad – str, padding scheme: ‘SAME’ or ‘VALID’
linear_layer_ref – linear layer class to use
name – str, name scope
reuse – bool

Returns:

list of tensors holding encoded features, layer-wise from outer to inner;
tensor holding batch-wise flattened hidden state vector;
list of tensors holding decoded features, layer-wise from inner to outer;
tensor holding reconstructed output;
None value.

btgym.algorithms.nn.ae.cw_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='cw_conv2d_autoencoder', reuse=False, **kwargs)

Channel-wise convolutional autoencoder. Hidden state is passed through dense linear layer. Painfully slow; do not use.

Parameters:

inputs – input tensor
layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck; the decoder part is inferred symmetrically
resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
pad – str, padding scheme: ‘SAME’ or ‘VALID’
linear_layer_ref – linear layer class to use
name – str, name scope
reuse – bool

Returns:

per-channel list of lists of tensors holding encoded features, layer-wise from outer to inner;
tensor holding batch-wise flattened hidden state vector;
per-channel list of lists of tensors holding decoded features, layer-wise from inner to outer;
tensor holding reconstructed output;
None value.

btgym.algorithms.nn.ae.beta_var_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='vae_conv2d', max_batch_size=256, reuse=False)

Variational autoencoder.

Papers: https://arxiv.org/pdf/1312.6114.pdf https://arxiv.org/pdf/1606.05908.pdf http://www.matthey.me/pdf/betavae_iclr_2017.pdf

Parameters:

inputs – input tensor
layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list represents the encoder part of the autoencoder bottleneck; the decoder part is inferred symmetrically
resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
pad – str, padding scheme: ‘SAME’ or ‘VALID’
linear_layer_ref – linear layer class - not used
name – str, name scope
max_batch_size – int, dynamic batch size should be no greater than this value
reuse – bool

Returns:

list of tensors holding encoded features, layer-wise from outer to inner;
tensor holding batch-wise flattened hidden state vector;
list of tensors holding decoded features, layer-wise from inner to outer;
tensor holding reconstructed output;
tensor holding estimated KL divergence.

class btgym.algorithms.nn.ae.KernelMonitor(conv_input, layer_output)

Visualises convolution filters learnt for specific layer. Source: https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html

Parameters:

conv_input – convolution stack input tensor;
layer_output – tensor holding output of layer of interest from stack.

fit(sess, kernel_index, step=0.001, num_steps=40)

Learns input signal that maximizes the activation of given kernel.

Parameters:

sess – tf.Session object;
kernel_index – filter number of interest;
step – gradient ascent step size;
num_steps – number of steps to fit.

Returns:

learnt signal as np.array
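The idea behind `fit` is gradient ascent on the input: start from some signal and repeatedly nudge it in the direction that increases the chosen kernel's activation. The toy sketch below reproduces that loop without TensorFlow, using a finite-difference gradient on an arbitrary Python callable. The name `fit_sketch`, the `activation_fn` argument and the L2 normalisation of the gradient are assumptions for the illustration; the real method works on a tf.Session and graph tensors.

```python
import math


def fit_sketch(activation_fn, signal, step=0.001, num_steps=40, h=1e-5):
    """Gradient ascent on `signal` to increase activation_fn(signal)."""
    x = list(signal)
    for _ in range(num_steps):
        # Central finite-difference estimate of d(activation)/d(x_i).
        grad = []
        for i in range(len(x)):
            up = x[:i] + [x[i] + h] + x[i + 1:]
            down = x[:i] + [x[i] - h] + x[i + 1:]
            grad.append((activation_fn(up) - activation_fn(down)) / (2 * h))
        # Normalise so the step size does not depend on gradient scale (assumed).
        norm = math.sqrt(sum(g * g for g in grad)) + 1e-8
        x = [xi + step * g / norm for xi, g in zip(x, grad)]
    return x


def toy_activation(x, kernel=(3.0, 4.0)):
    """Stand-in for a kernel response: a simple dot product."""
    return sum(k * xi for k, xi in zip(kernel, x))
```

Calling `fit_sketch(toy_activation, [0.0, 0.0])` moves the signal along the kernel direction, which is the input pattern this toy "filter" responds to most strongly.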