btgym.research.gps.aac module¶
class btgym.research.gps.aac.GuidedAAC(expert_loss=<function guided_aac_loss_def_0_3>, aac_lambda=1.0, guided_lambda=1.0, guided_decay_steps=None, runner_config=None, aux_render_modes=None, name='GuidedA3C', **kwargs)[source]¶
Actor-critic framework augmented with expert actions imitation loss: L_gps = aac_lambda * L_a3c + guided_lambda * L_im.
This implementation is loosely referred to as 'guided policy search' after the algorithm described by S. Levine and P. Abbeel in 'Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics',
in the sense that it exploits the idea of fitting a 'local' (here: single-episode) oracle for an environment with generally unknown dynamics, and uses the actions demonstrated by that oracle to optimize the trajectory distribution of the agent being trained.
Note that this particular implementation of the expert does not provide a complete action-state space trajectory for the agent to follow. Instead, it estimates an advised categorical distribution over actions conditioned on external (i.e. price dynamics) state observations only.
- Papers:
- Levine et al., ‘Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics’
- https://people.eecs.berkeley.edu/~svlevine/papers/mfcgps.pdf
- Brys et al., ‘Reinforcement Learning from Demonstration through Shaping’
- https://www.ijcai.org/Proceedings/15/Papers/472.pdf
- Wiewiora et al., ‘Principled Methods for Advising Reinforcement Learning Agents’
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.6412&rep=rep1&type=pdf
Parameters: - expert_loss – callable returning tensor holding on_policy imitation loss graph and summaries
- aac_lambda – float, main on_policy a3c loss lambda
- guided_lambda – float, imitation loss lambda
- guided_decay_steps – number of steps over which guided_lambda is annealed to zero
- name – str, name scope
- **kwargs – see BaseAAC kwargs
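For orientation, here is a minimal plain-Python sketch (not btgym internals; the linear annealing schedule is an assumption) of how aac_lambda, guided_lambda and guided_decay_steps combine the two loss terms named above:

def guided_loss(l_a3c, l_im, aac_lambda=1.0, guided_lambda=1.0,
                guided_decay_steps=None, global_step=0):
    """L_gps = aac_lambda * L_a3c + lambda_t * L_im, with lambda_t annealed towards zero."""
    if guided_decay_steps is not None:
        # Assumed schedule: linearly decay the imitation weight over guided_decay_steps.
        lambda_t = guided_lambda * max(0.0, 1.0 - global_step / guided_decay_steps)
    else:
        lambda_t = guided_lambda
    return aac_lambda * l_a3c + lambda_t * l_im

print(guided_loss(0.5, 0.2, guided_decay_steps=10000, global_step=0))      # 0.7: expert advice active
print(guided_loss(0.5, 0.2, guided_decay_steps=10000, global_step=10000))  # 0.5: pure A3C objective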
btgym.research.gps.policy module¶
class btgym.research.gps.policy.GuidedPolicy_0_0(conv_2d_layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (64, (3, 1), (2, 1)), (64, (3, 1), (2, 1))), lstm_class_ref=<class 'tensorflow.contrib.rnn.python.ops.rnn_cell.LayerNormBasicLSTMCell'>, lstm_layers=(256, 256), lstm_2_init_period=50, linear_layer_ref=<function noisy_linear>, **kwargs)[source]¶
Guided policy: simple configuration wrapper around Stacked LSTM architecture.
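A hedged configuration example (argument names are taken from the signature above; interpreting each conv_2d_layer_config entry as (filters, kernel, stride) is an assumption, and wiring the policy into a trainer is omitted):

from btgym.research.gps.policy import GuidedPolicy_0_0

# Sketch only: a shallower conv stack and smaller LSTM layers than the defaults.
policy_class_ref = GuidedPolicy_0_0
policy_kwargs = dict(
    conv_2d_layer_config=(
        (32, (3, 1), (2, 1)),  # (filters, kernel, stride) per layer -- assumed meaning
        (64, (3, 1), (2, 1)),
    ),
    lstm_layers=(128, 128),
    lstm_2_init_period=50,
)
# Both would typically be handed to the trainer configuration; the exact trainer
# argument names are not covered by this page.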
btgym.research.gps.loss module¶
btgym.research.gps.loss.guided_aac_loss_def_0_0(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]¶
Cross-entropy imitation loss on expert actions.
Parameters: - pi_actions – tensor holding policy action logits
- expert_actions – tensor holding the expert action probability distribution
- name – loss op name scope
Returns: tensor holding estimated imitation loss; list of related tensorboard summaries.
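A minimal TensorFlow 1.x sketch of a cross-entropy imitation loss of this shape (illustrative only, not the btgym source; tensor names follow the parameters above):

import tensorflow as tf

def cross_entropy_imitation_loss(pi_actions, expert_actions, name='on_policy/aac'):
    # Penalize divergence of the policy's action distribution from the expert-advised one.
    with tf.name_scope(name + '/guided'):
        ce = tf.nn.softmax_cross_entropy_with_logits_v2(
            labels=tf.stop_gradient(expert_actions),  # expert advice is a fixed target
            logits=pi_actions,
        )
        loss = tf.reduce_mean(ce)
        summaries = [tf.summary.scalar('imitation_loss', loss)]
    return loss, summaries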
btgym.research.gps.loss.guided_aac_loss_def_0_1(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]¶
Cross-entropy imitation loss on {buy, sell} subset of expert actions.
Parameters: - pi_actions – tensor holding policy action logits
- expert_actions – tensor holding the expert action probability distribution
- name – loss op name scope
Returns: tensor holding estimated imitation loss; list of related tensorboard summaries.
btgym.research.gps.loss.guided_aac_loss_def_0_3(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]¶
MSE imitation loss on {buy, sell} subset of expert actions.
Parameters: - pi_actions – tensor holding policy action logits
- expert_actions – tensor holding the expert action probability distribution
- name – loss op name scope
Returns: tensor holding estimated imitation loss; list of related tensorboard summaries.
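For comparison with the cross-entropy variants, a hedged TensorFlow 1.x sketch of an MSE imitation loss restricted to the buy/sell components (action indices 1 and 2 follow the Oracle convention below; illustrative only, not the btgym source):

import tensorflow as tf

def mse_imitation_loss_buy_sell(pi_actions, expert_actions, name='on_policy/aac'):
    with tf.name_scope(name + '/guided'):
        pi_probs = tf.nn.softmax(pi_actions)
        # Compare only the buy (1) and sell (2) components of the two distributions:
        diff = pi_probs[..., 1:3] - tf.stop_gradient(expert_actions[..., 1:3])
        loss = tf.reduce_mean(tf.square(diff))
        summaries = [tf.summary.scalar('imitation_mse_loss', loss)]
    return loss, summaries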
btgym.research.gps.strategy module¶
btgym.research.gps.oracle module¶
class btgym.research.gps.oracle.Oracle(action_space=(0, 1, 2, 3), time_threshold=5, pips_threshold=10, pips_scale=0.0001, kernel_size=5, kernel_stddev=1)[source]¶
Irresponsible financial adviser.
Parameters: - action_space – actions to advise: 0 - hold, 1 - buy, 2 - sell, 3 - close
- time_threshold – how many points (in environment timesteps) on each side to use for the comparison when deciding whether comparator(n, n+x) is True
- pips_threshold – int, minimal peak difference in pips required to consider comparator(n, n+x) to be True
- pips_scale – actual single pip value wrt signal value
- kernel_size – gaussian convolution kernel size (used to compute distribution over actions)
- kernel_stddev – gaussian kernel standard deviation
filter_by_margine(lst, threshold)[source]¶
Filters out peaks whose value difference falls within the given tolerance. Filtering proceeds from first to last index: every succeeding element of the list is removed if its value differs from the currently kept value by less than the given threshold.
Parameters: - lst – list of tuples; each tuple is (value, index)
- threshold – value filtering threshold
Returns: filtered list of tuples
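The filtering rule restated as a plain-Python sketch (hypothetical helper, not the library source):

def filter_by_margin_sketch(lst, threshold):
    # Walk the (value, index) tuples left to right; drop every succeeding peak
    # whose value is closer than `threshold` to the last kept one.
    if not lst:
        return []
    kept = [lst[0]]
    for value, index in lst[1:]:
        if abs(value - kept[-1][0]) >= threshold:
            kept.append((value, index))
    return kept

print(filter_by_margin_sketch([(10, 0), (12, 3), (20, 7)], 5))  # [(10, 0), (20, 7)]: 12 is too close to 10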
estimate_actions(episode_data)[source]¶
Estimates hold/buy/sell signals based on local peaks filtered by time horizon and signal amplitude.
Parameters: episode_data – 1D np.array of unscaled [but possibly resampled] price values in OHL[CV] format
Returns: 1D vector of signals of same length as episode_data
adjust_signals(signal)[source]¶
Adds a simple heuristic (based on examining the learnt policy's action distribution): repeat the same buy or sell signal kernel_size - 1 times.
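The heuristic restated as a numpy sketch (assumed semantics: the original signal plus kernel_size - 1 repeats; not the library source):

import numpy as np

def adjust_signals_sketch(signal, kernel_size=5, buy=1, sell=2):
    out = np.asarray(signal).copy()
    for i, s in enumerate(signal):
        if s in (buy, sell):
            # Hold the buy/sell advice for kernel_size steps in total.
            out[i:i + kernel_size] = s
    return out

print(adjust_signals_sketch([0, 1, 0, 0, 0, 0, 0, 2, 0, 0], kernel_size=3))
# -> [0 1 1 1 0 0 0 2 2 2]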
fit(episode_data, resampling_factor=1)[source]¶
Estimates the advised action probability distribution based on the data received.
Parameters: - episode_data – 1D np.array of unscaled price values in OHL[CV] format
- resampling_factor – factor by which to resample given data by taking min/max values inside every resampled bar
Returns: np.array of shape [resampled_data_size, action_space_size] holding probabilities of advised actions, where resampled_data_size = int(len(episode_data) / resampling_factor) + 1 or + 0 (depending on whether the division leaves a remainder)
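A usage sketch following the signatures above (the synthetic random-walk series is purely illustrative, and passing a plain 1-D price array is an assumption based on the episode_data description):

import numpy as np
from btgym.research.gps.oracle import Oracle

oracle = Oracle(action_space=(0, 1, 2, 3), time_threshold=5,
                pips_threshold=10, pips_scale=0.0001, kernel_size=5)

# Stand-in for an episode of unscaled price observations:
prices = 1.10 + 0.0001 * np.cumsum(np.random.randn(2000))

advice = oracle.fit(episode_data=prices, resampling_factor=4)
# Expected shape: [~len(prices) / 4, len(action_space)], each row a distribution
# over hold/buy/sell/close.
print(advice.shape)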
resample_data(episode_data, factor=1)[source]¶
Resamples raw observations by the given factor (corresponding to the skip_frame value) and estimates the mean value of each newly formed bar.
Parameters: - episode_data – np.array of shape [episode_length, values]
- factor – scalar
Returns: np.array of median Hi/Lo observations of size [int(episode_length/skip_frame) + 1, 1]
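An illustrative numpy re-statement of this kind of resampling, keeping the per-bar mean of high/low values (hypothetical helper, not the library source):

import numpy as np

def resample_sketch(episode_data, factor):
    data = np.asarray(episode_data, dtype=float).ravel()
    n_bars = int(np.ceil(len(data) / factor))
    out = np.empty((n_bars, 1))
    for i in range(n_bars):
        bar = data[i * factor:(i + 1) * factor]
        out[i, 0] = (bar.max() + bar.min()) / 2.0  # mean of Hi/Lo within the bar
    return out

print(resample_sketch(np.arange(10), factor=4).ravel())  # [1.5 5.5 8.5]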
class btgym.research.gps.oracle.Oracle2(action_space=(0, 1, 2, 3), gamma=1.0, **kwargs)[source]¶
[Less] irresponsible financial adviser.
Parameters: - action_space – actions to advise: 0 - hold, 1 - buy, 2 - sell, 3 - close
- gamma – price movement horizon discount, in (0, 1]
fit(episode_data, resampling_factor=1)[source]¶
Estimates the advised action probability distribution based on the data received.
Parameters: - episode_data – 1D np.array of unscaled price values in OHL[CV] format
- resampling_factor – factor by which to resample given data by taking min/max values inside every resampled bar
Returns: np.array of shape [resampled_data_size, action_space_size] holding probabilities of advised actions, where resampled_data_size = int(len(episode_data) / resampling_factor) + 1 or + 0 (depending on whether the division leaves a remainder)
static resample_data(episode_data, factor=1)[source]¶
Resamples raw observations by the given factor (corresponding to the skip_frame value) and estimates the mean value of each newly formed bar.
Parameters: - episode_data – np.array of shape [episode_length, values]
- factor – scalar
Returns: np.array of median Hi/Lo observations of size [int(episode_length/skip_frame) + 1, 1]