btgym.algorithms.nn.losses module

btgym.algorithms.nn.losses.aac_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon=None, name='_aac_', verbose=False)
Advantage Actor Critic loss definition. Paper: https://arxiv.org/abs/1602.01783
Parameters: - act_target – tensor holding policy actions targets;
- adv_target – tensor holding policy estimated advantages targets;
- r_target – tensor holding policy empirical returns targets;
- pi_logits – policy logits output tensor;
- pi_prime_logits – not used;
- pi_vf – policy value function output tensor;
- entropy_beta – entropy regularization constant;
- epsilon – not used;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated AAC loss; list of related tensorboard summaries.
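A minimal TF1-style sketch of the objective this function estimates (the standard A3C loss: policy-gradient term plus value error minus entropy bonus); names follow the parameters above, and the exact scaling and summaries of the real implementation are omitted:

    import tensorflow as tf

    def aac_loss_sketch(act_target, adv_target, r_target, pi_logits, pi_vf, entropy_beta):
        # Policy term: -log pi(a|s) weighted by estimated advantage.
        neg_log_pi = tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=act_target, logits=pi_logits)
        pi_loss = tf.reduce_mean(neg_log_pi * adv_target)
        # Value term: squared error against empirical returns.
        vf_loss = 0.5 * tf.reduce_mean(tf.square(pi_vf - r_target))
        # Entropy bonus encourages exploration.
        entropy = -tf.reduce_mean(tf.reduce_sum(
            tf.nn.softmax(pi_logits) * tf.nn.log_softmax(pi_logits), axis=-1))
        return pi_loss + vf_loss - entropy_beta * entropy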

btgym.algorithms.nn.losses.ppo_loss_def(act_target, adv_target, r_target, pi_logits, pi_vf, pi_prime_logits, entropy_beta, epsilon, name='_ppo_', verbose=False)
PPO clipped surrogate loss definition, as (7) in https://arxiv.org/pdf/1707.06347.pdf
Parameters: - act_target – tensor holding policy actions targets;
- adv_target – tensor holding policy estimated advantages targets;
- r_target – tensor holding policy empirical returns targets;
- pi_logits – policy logits output tensor;
- pi_vf – policy value function output tensor;
- pi_prime_logits – old_policy logits output tensor;
- entropy_beta – entropy regularization constant;
- epsilon – L^Clip epsilon tensor;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated PPO L^Clip loss; list of related tensorboard summaries.
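A sketch of the clipped surrogate objective, eq. (7) of the paper, assuming one-hot act_target (illustrative names, summaries omitted):

    import tensorflow as tf

    def ppo_clip_loss_sketch(act_target, adv_target, pi_logits, pi_prime_logits, epsilon):
        # Log-likelihood of taken actions under current and old policies.
        log_pi = -tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=act_target, logits=pi_logits)
        log_pi_old = -tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=act_target, logits=pi_prime_logits)
        ratio = tf.exp(log_pi - tf.stop_gradient(log_pi_old))
        # L^Clip: pessimistic bound of clipped and unclipped surrogates.
        surrogate = tf.minimum(
            ratio * adv_target,
            tf.clip_by_value(ratio, 1.0 - epsilon, 1.0 + epsilon) * adv_target)
        return -tf.reduce_mean(surrogate)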

btgym.algorithms.nn.losses.value_fn_loss_def(r_target, pi_vf, name='_vr_', verbose=False)
Value function loss.
Parameters: - r_target – tensor holding policy empirical returns targets;
- pi_vf – policy value function output tensor;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated value fn. loss; list of related tensorboard summaries.
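The estimate reduces to squared error between predicted values and empirical returns; a sketch (implementations often scale this term by 0.5):

    import tensorflow as tf

    def value_fn_loss_sketch(r_target, pi_vf):
        # Squared error of value predictions against empirical returns.
        return tf.reduce_mean(tf.square(pi_vf - r_target))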

btgym.algorithms.nn.losses.pc_loss_def(actions, targets, pi_pc_q, name='_pc_', verbose=False)
Pixel control auxiliary task loss definition.
Paper: https://arxiv.org/abs/1611.05397
Borrows heavily from Kosuke Miyoshi code, under Apache License 2.0:
https://github.com/miyosuda/unreal
Parameters: - actions – tensor holding policy actions;
- targets – tensor holding estimated pixel-change targets;
- pi_pc_q – policy Q-value features output tensor;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated pc loss; list of related tensorboard summaries.
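A sketch under the usual UNREAL shape conventions, which are assumptions here: pi_pc_q as a [batch, height, width, num_actions] auxiliary Q-value map, actions one-hot over num_actions, targets as [batch, height, width]:

    import tensorflow as tf

    def pc_loss_sketch(actions, targets, pi_pc_q):
        # Select the Q-value map of the action actually taken.
        q_taken = tf.reduce_sum(pi_pc_q * actions[:, None, None, :], axis=-1)
        # Regress it onto the estimated pixel-change targets.
        return tf.reduce_mean(tf.square(targets - q_taken))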

btgym.algorithms.nn.losses.rp_loss_def(rp_targets, pi_rp_logits, name='_rp_', verbose=False)
Reward prediction auxiliary task loss definition.
Paper: https://arxiv.org/abs/1611.05397
Borrows heavily from Kosuke Miyoshi code, under Apache License 2.0:
https://github.com/miyosuda/unreal
Parameters: - targets – tensor holding reward prediction target;
- pi_rp_logits – policy reward predictions tensor;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated rp loss; list of related tensorboard summaries.
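In the UNREAL setup, reward prediction is a three-way classification (negative, near-zero, positive reward), so the loss reduces to softmax cross-entropy; a sketch assuming one-hot rp_targets:

    import tensorflow as tf

    def rp_loss_sketch(rp_targets, pi_rp_logits):
        # Softmax cross-entropy over reward classes.
        return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=rp_targets, logits=pi_rp_logits))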

btgym.algorithms.nn.losses.ae_loss_def(targets, logits, alpha=1.0, name='ae_loss', verbose=False, **kwargs)
Mean quadratic autoencoder reconstruction loss definition.
Parameters: - targets – tensor holding reconstruction target;
- logits – tensor holding decoder output;
- alpha – loss weight constant;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated reconstruction loss; list of related tensorboard summaries.
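Equivalently, a one-line sketch of the weighted mean quadratic reconstruction error:

    import tensorflow as tf

    def ae_loss_sketch(targets, logits, alpha=1.0):
        # Weighted mean squared reconstruction error.
        return alpha * tf.reduce_mean(tf.square(targets - logits))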

btgym.algorithms.nn.losses.beta_vae_loss_def(targets, logits, d_kl, alpha=1.0, beta=1.0, name='beta_vae_loss', verbose=False)
Beta-variational autoencoder loss definition.
Papers:
- http://www.matthey.me/pdf/betavae_iclr_2017.pdf
- https://drive.google.com/file/d/0Bwy4Nlx78QCCNktVTFFMTUs4N2oxY295VU9qV25MWTBQS2Uw/view
Parameters: - targets – tensor holding reconstruction target;
- logits – tensor holding decoder output;
- d_kl – tensor holding estimated KL divergence;
- alpha – reconstruction loss weight constant;
- beta – KL divergence weight constant;
- name – scope;
- verbose – summary level.
Returns: tensor holding estimated loss; list of related tensorboard summaries.
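A sketch of the beta-VAE objective assembled from the parameters above, consistent with the cited papers (the real implementation's weighting details may differ):

    import tensorflow as tf

    def beta_vae_loss_sketch(targets, logits, d_kl, alpha=1.0, beta=1.0):
        # Weighted reconstruction error plus beta-weighted KL divergence.
        recon_loss = alpha * tf.reduce_mean(tf.square(targets - logits))
        return recon_loss + beta * tf.reduce_mean(d_kl)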

btgym.algorithms.nn.networks module

btgym.algorithms.nn.networks.conv_2d_network(x, ob_space, ac_space, conv_2d_layer_ref=<function conv2d>, conv_2d_num_filters=(32, 32, 64, 64), conv_2d_filter_size=(3, 3), conv_2d_stride=(2, 2), pad='SAME', dtype=tf.float32, name='conv2d', collections=None, reuse=False, keep_prob=None, **kwargs)
Stage1 network: from preprocessed 2D input to estimated features. Encapsulates convolutions + layer normalisation + nonlinearity. Can be shared.
Returns: tensor holding state features;

btgym.algorithms.nn.networks.conv_1d_network(x, ob_space, ac_space, conv_1d_num_layers=4, conv_1d_num_filters=32, conv_1d_filter_size=3, conv_1d_stride=2, pad='SAME', dtype=tf.float32, collections=None, reuse=False, **kwargs)
Stage1 network: from preprocessed 1D input to estimated features. Encapsulates convolutions, [possibly] skip-connections etc. Can be shared.
Returns: tensor holding state features;

btgym.algorithms.nn.networks.lstm_network(x, lstm_sequence_length, lstm_class=<class 'tensorflow.python.ops.rnn_cell_impl.BasicLSTMCell'>, lstm_layers=(256,), static=False, keep_prob=None, name='lstm', reuse=False, **kwargs)
Stage2 network: from features to flattened LSTM output. Defines [multi-layered] dynamic [possibly shared] LSTM network.
Returns: batch-wise flattened output tensor; lstm initial state tensor; lstm state output tensor; lstm flattened feed placeholders as tuple.
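A usage sketch unpacking the four documented return values; the feature and sequence-length shapes below are assumptions, not confirmed by the signature:

    import tensorflow as tf
    from btgym.algorithms.nn.networks import lstm_network

    features = tf.placeholder(tf.float32, [1, None, 64])  # assumed [batch, time, features]
    seq_len = tf.placeholder(tf.int32, [None])
    lstm_out, lstm_init_state, lstm_state_out, lstm_state_pl_flatten = lstm_network(
        x=features,
        lstm_sequence_length=seq_len,
        lstm_layers=(256,),
    )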

btgym.algorithms.nn.networks.dense_aac_network(x, ac_space_depth, name='dense_aac', linear_layer_ref=<function noisy_linear>, reuse=False)
Stage3 network: from LSTM flattened output to advantage actor-critic.
Returns: dictionary containing, for every space in the ac_space_shape dictionary, a tuple of:
- logits tensor;
- value function tensor;
- action sampling function.
Return type: dictionary of tuples

btgym.algorithms.nn.networks.dense_rp_network(x, linear_layer_ref=<function noisy_linear>)
Stage3 network: from shared convolutions to reward-prediction task output tensor.

btgym.algorithms.nn.networks.pixel_change_2d_estimator(ob_space, pc_estimator_stride=(2, 2), **kwargs)
Defines tf operation for estimating pixel change as subsampled absolute difference of two states.
Note: crops the input array by one pixel on either side; 1D signals are to be shaped as [signal_length, 3].

btgym.algorithms.nn.networks.duelling_pc_network(x, ac_space, duell_pc_x_inner_shape=(9, 9, 32), duell_pc_filter_size=(4, 4), duell_pc_stride=(2, 2), linear_layer_ref=<function noisy_linear>, reuse=False, **kwargs)
Stage3 network for 'pixel control' task: from LSTM output to Q-aux. features tensor.

btgym.algorithms.nn.layers module

btgym.algorithms.nn.layers.categorical_sample(logits, depth)
Given logits, returns a one-hot encoded categorical sample.
Parameters: - logits – tensor of categorical logits;
- depth – number of categories.
Returns: tensor of shape [batch_dim, logits_depth]
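One common implementation is the Gumbel-max trick; a sketch (not necessarily the exact code used here):

    import tensorflow as tf

    def categorical_sample_sketch(logits, depth):
        # Perturb logits with Gumbel noise and take the argmax...
        gumbel = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits))))
        sample = tf.argmax(logits + gumbel, axis=-1)
        # ...then one-hot encode the sampled category.
        return tf.one_hot(sample, depth)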

btgym.algorithms.nn.layers.linear(x, size, name, initializer=None, bias_init=0, reuse=False)
Linear network layer.

btgym.algorithms.nn.layers.noisy_linear(x, size, name, bias=True, activation_fn=<function identity>, reuse=False, **kwargs)
Noisy Net linear network layer using factorised Gaussian noise; code by Andrew Liao, https://github.com/andrewliao11/NoisyNet-DQN
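A sketch of factorised Gaussian noise as defined in the NoisyNet paper (https://arxiv.org/abs/1706.10295): one noise vector per input and per output unit, combined by outer product. Variable names and initialisers below are illustrative, not btgym's exact code:

    import tensorflow as tf

    def noisy_linear_sketch(x, size, name):
        in_size = x.get_shape().as_list()[-1]
        # Noise scaling function f(e) = sign(e) * sqrt(|e|).
        f = lambda e: tf.sign(e) * tf.sqrt(tf.abs(e))
        eps_in = f(tf.random_normal([in_size, 1]))
        eps_out = f(tf.random_normal([1, size]))
        with tf.variable_scope(name):
            mu_w = tf.get_variable('mu_w', [in_size, size])
            sigma_w = tf.get_variable(
                'sigma_w', [in_size, size], initializer=tf.constant_initializer(0.017))
            mu_b = tf.get_variable('mu_b', [size], initializer=tf.zeros_initializer())
            sigma_b = tf.get_variable(
                'sigma_b', [size], initializer=tf.constant_initializer(0.017))
            # Factorised noise: outer product via broadcasting.
            w = mu_w + sigma_w * (eps_in * eps_out)
            b = mu_b + sigma_b * tf.squeeze(eps_out, axis=0)
        return tf.matmul(x, w) + b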

btgym.algorithms.nn.layers.conv2d(x, num_filters, name, filter_size=(3, 3), stride=(1, 1), pad='SAME', dtype=tf.float32, collections=None, reuse=False)
2D convolution layer.

btgym.algorithms.nn.layers.deconv2d(x, output_channels, name, filter_size=(4, 4), stride=(2, 2), dtype=tf.float32, collections=None, reuse=False)
Deconvolution layer, paper: http://www.matthewzeiler.com/wp-content/uploads/2017/07/cvpr2010.pdf

btgym.algorithms.nn.ae module

btgym.algorithms.nn.ae.conv2d_encoder(x, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', name='encoder', reuse=False)
Defines convolutional encoder.
Parameters: - x – input tensor
- layer_config – first to last nested layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]
- pad – str, padding scheme: ‘SAME’ or ‘VALID’
- name – str, name scope
- reuse – bool
Returns: list of tensors holding encoded features for every layer, outer to inner; level-wise list of encoding layer shapes, first to last.

btgym.algorithms.nn.ae.conv2d_decoder(z, layer_shapes, layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))), pad='SAME', resize_method=0, name='decoder', reuse=False)
Defines convolutional decoder.
Parameters: - z – tensor holding encoded state
- layer_shapes – level-wise list of matching encoding layers shapes, last to first.
- layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]
- pad – str, padding scheme: ‘SAME’ or ‘VALID’
- resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
- name – str, name scope
- reuse – bool
Returns: list of tensors holding decoded features for every layer, inner to outer.

btgym.algorithms.nn.ae.conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='base_conv2d_autoencoder', reuse=False, **kwargs)
Basic convolutional autoencoder. Hidden state is passed through dense linear layer.
Parameters: - inputs – input tensor
- layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list defines the encoder part of the autoencoder bottleneck, the decoder part is inferred symmetrically
- resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
- pad – str, padding scheme: ‘SAME’ or ‘VALID’
- linear_layer_ref – linear layer class to use
- name – str, name scope
- reuse – bool
Returns: - list of tensors holding encoded features, layer-wise from outer to inner;
- tensor holding batch-wise flattened hidden state vector;
- list of tensors holding decoded features, layer-wise from inner to outer;
- tensor holding reconstructed output;
- None value.
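A usage sketch unpacking the five documented return values; the input shape and layer_config below are illustrative assumptions:

    import tensorflow as tf
    from btgym.algorithms.nn.ae import conv2d_autoencoder

    inputs = tf.placeholder(tf.float32, [None, 30, 1, 4])  # assumed [batch, h, w, channels]
    encoded, hidden, decoded, reconstruction, _ = conv2d_autoencoder(
        inputs=inputs,
        layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1))),
    )
    recon_loss = tf.reduce_mean(tf.square(inputs - reconstruction))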

btgym.algorithms.nn.ae.cw_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='cw_conv2d_autoencoder', reuse=False, **kwargs)
Channel-wise convolutional autoencoder. Hidden state is passed through dense linear layer. Painfully slow; do not use.
Parameters: - inputs – input tensor
- layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list defines the encoder part of the autoencoder bottleneck, the decoder part is inferred symmetrically
- resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
- pad – str, padding scheme: ‘SAME’ or ‘VALID’
- linear_layer_ref – linear layer class to use
- name – str, name scope
- reuse – bool
Returns: - per-channel list of lists of tensors holding encoded features, layer-wise from outer to inner;
- tensor holding batch-wise flattened hidden state vector;
- per-channel list of lists of tensors holding decoded features, layer-wise from inner to outer;
- tensor holding reconstructed output;
- None value.

btgym.algorithms.nn.ae.beta_var_conv2d_autoencoder(inputs, layer_config, resize_method=0, pad='SAME', linear_layer_ref=<function linear>, name='vae_conv2d', max_batch_size=256, reuse=False)
Variational autoencoder.
Papers:
- https://arxiv.org/pdf/1312.6114.pdf
- https://arxiv.org/pdf/1606.05908.pdf
- http://www.matthey.me/pdf/betavae_iclr_2017.pdf
Parameters: - inputs – input tensor
- layer_config – layers configuration list: [layer_1_config, layer_2_config,…], where: layer_i_config = [num_filters(int), filter_size(list), stride(list)]; this list defines the encoder part of the autoencoder bottleneck, the decoder part is inferred symmetrically
- resize_method – up-sampling method, one of supported tf.image.ResizeMethod’s
- pad – str, padding scheme: ‘SAME’ or ‘VALID’
- linear_layer_ref – linear layer class; not used
- name – str, name scope
- max_batch_size – int, dynamic batch size should be no greater than this value
- reuse – bool
Returns: - list of tensors holding encoded features, layer-wise from outer to inner;
- tensor holding batch-wise flattened hidden state vector;
- list of tensors holding decoded features, layer-wise from inner to outer;
- tensor holding reconstructed output;
- tensor holding estimated KL divergence.

class btgym.algorithms.nn.ae.KernelMonitor(conv_input, layer_output)
Visualises convolution filters learnt for a specific layer. Source: https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
Parameters: - conv_input – convolution stack input tensor
- layer_output – tensor holding output of layer of interest from stack

fit(sess, kernel_index, step=0.001, num_steps=40)
Learns an input signal that maximizes the activation of the given kernel.
Parameters: - sess – tf.Session object
- kernel_index – filter number of interest
- step – gradient ascent step size
- num_steps – number of steps to fit
Returns: learnt signal as np.array
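A usage sketch, reusing the inputs and encoded tensors from the conv2d_autoencoder example above; kernel_index 0 is arbitrary:

    import tensorflow as tf
    from btgym.algorithms.nn.ae import KernelMonitor

    monitor = KernelMonitor(conv_input=inputs, layer_output=encoded[-1])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Gradient-ascend an input pattern that maximises activation of filter 0.
        signal = monitor.fit(sess, kernel_index=0, step=0.001, num_steps=40)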