btgym.research.gps.aac module

class btgym.research.gps.aac.GuidedAAC(expert_loss=<function guided_aac_loss_def_0_3>, aac_lambda=1.0, guided_lambda=1.0, guided_decay_steps=None, runner_config=None, aux_render_modes=None, name='GuidedA3C', **kwargs)[source]

Actor-critic framework augmented with an expert-action imitation loss: L_gps = aac_lambda * L_a3c + guided_lambda * L_im.

This implementation is loosely referred to as ‘guided policy search’, after the algorithm described by S. Levine and P. Abbeel in Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, in the sense that it exploits the idea of fitting a ‘local’ (here: single-episode) oracle for an environment with generally unknown dynamics, and uses the actions this oracle demonstrates to optimize the trajectory distribution used for training the agent.

Note that this particular expert implementation does not provide a complete state-action trajectory for the agent to follow. Instead, it estimates an advised categorical distribution over actions conditioned on external (i.e. price dynamics) state observations only.

Papers:
  • S. Levine, P. Abbeel, Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics.

Parameters:
  • expert_loss – callable returning a tensor holding the on-policy imitation loss graph and related summaries
  • aac_lambda – float, weight of the main on-policy A3C loss
  • guided_lambda – float, weight of the imitation loss
  • guided_decay_steps – number of steps over which guided_lambda is annealed to zero
  • name – str, name scope
  • **kwargs – see BaseAAC kwargs
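A minimal sketch of how the composite objective and the guided_lambda annealing could fit together; the helper below is purely illustrative (the names and the linear decay schedule are assumptions, not GuidedAAC internals):

    def guided_loss(aac_loss, imitation_loss, aac_lambda, guided_lambda,
                    global_step=None, guided_decay_steps=None):
        # L_gps = aac_lambda * L_a3c + guided_lambda * L_im
        if guided_decay_steps is not None and global_step is not None:
            # Anneal the imitation weight towards zero over guided_decay_steps.
            decay = max(0.0, 1.0 - float(global_step) / guided_decay_steps)
            guided_lambda = guided_lambda * decay
        return aac_lambda * aac_loss + guided_lambda * imitation_loss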
class btgym.research.gps.aac.VerboseGuidedAAC(runner_config=None, aux_render_modes=('action_prob', 'value_fn', 'lstm_1_h', 'lstm_2_h'), name='VerboseGuidedA3C', **kwargs)[source]

Extends parent GuidedAAC class with additional summaries.

btgym.research.gps.policy module

class btgym.research.gps.policy.GuidedPolicy_0_0(conv_2d_layer_config=((32, (3, 1), (2, 1)), (32, (3, 1), (2, 1)), (64, (3, 1), (2, 1)), (64, (3, 1), (2, 1))), lstm_class_ref=<class 'tensorflow.contrib.rnn.python.ops.rnn_cell.LayerNormBasicLSTMCell'>, lstm_layers=(256, 256), lstm_2_init_period=50, linear_layer_ref=<function noisy_linear>, **kwargs)[source]

Guided policy: a simple configuration wrapper around the stacked LSTM architecture.
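A hedged configuration sketch; the class_ref/kwargs wiring below follows the usual BTGym trainer-config pattern but is illustrative rather than definitive:

    from btgym.research.gps.policy import GuidedPolicy_0_0

    # Illustrative sketch: the policy class is typically handed to the trainer
    # via a config dict; exact launcher/trainer wiring may differ.
    policy_config = dict(
        class_ref=GuidedPolicy_0_0,
        kwargs=dict(
            lstm_layers=(256, 256),    # two stacked LSTM layers
            lstm_2_init_period=50,     # re-initialisation period of the upper LSTM state
        ),
    )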

btgym.research.gps.loss module

btgym.research.gps.loss.guided_aac_loss_def_0_0(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]

Cross-entropy imitation loss on expert actions.

Parameters:
  • pi_actions – tensor holding policy action logits
  • expert_actions – tensor holding the expert action probability distribution
  • name – loss op name scope
Returns:

tensor holding estimated imitation loss; list of related tensorboard summaries.
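A minimal TensorFlow 1.x sketch of the idea (not the exact BTGym implementation): cross-entropy between the expert's advised action distribution and the policy logits, plus a scalar summary:

    import tensorflow as tf

    def cross_entropy_imitation_loss(pi_logits, expert_probs, name='on_policy/aac'):
        # Sketch only: batch-averaged cross-entropy of policy logits against
        # the expert's advised action distribution.
        with tf.name_scope(name):
            ce = tf.nn.softmax_cross_entropy_with_logits_v2(
                labels=expert_probs, logits=pi_logits)
            loss = tf.reduce_mean(ce, name='imitation_loss')
            summaries = [tf.summary.scalar('imitation_loss', loss)]
        return loss, summaries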

btgym.research.gps.loss.guided_aac_loss_def_0_1(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]

Cross-entropy imitation loss on {buy, sell} subset of expert actions.

Parameters:
  • pi_actions – tensor holding policy action logits
  • expert_actions – tensor holding the expert action probability distribution
  • name – loss op name scope
Returns:

tensor holding estimated imitation loss; list of related tensorboard summaries.

btgym.research.gps.loss.guided_aac_loss_def_0_3(pi_actions, expert_actions, name='on_policy/aac', verbose=False, **kwargs)[source]

MSE imitation loss on {buy, sell} subset of expert actions.

Parameters:
  • pi_actions – tensor holding policy action logits
  • expert_actions – tensor holding the expert action probability distribution
  • name – loss op name scope
Returns:

tensor holding estimated imitation loss; list of related tensorboard summaries.
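A comparable TensorFlow 1.x sketch for the MSE variant; the buy/sell column indices (1 and 2) follow the Oracle action coding documented below, and the slicing is an assumption about how the subset is taken:

    import tensorflow as tf

    def mse_imitation_loss_buy_sell(pi_logits, expert_probs, name='on_policy/aac'):
        # Sketch only: mean squared error between policy and expert action
        # probabilities, restricted to the buy (1) and sell (2) columns.
        with tf.name_scope(name):
            pi_probs = tf.nn.softmax(pi_logits)
            diff = pi_probs[..., 1:3] - expert_probs[..., 1:3]
            loss = tf.reduce_mean(tf.square(diff), name='imitation_loss')
            summaries = [tf.summary.scalar('imitation_loss', loss)]
        return loss, summaries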

btgym.research.gps.strategy module

class btgym.research.gps.strategy.GuidedStrategy_0_0(**kwargs)[source]

Augments the observation state with expert action predictions estimated by accessing the entire episode data (i.e. cheating).

nextstart()[source]

Overrides the base method, augmenting it with expert action estimation before the actual episode starts.

class btgym.research.gps.strategy.ExpertObserver(*args, **kwargs)[source]

Keeps track of expert-advised actions. Single data_feed.

btgym.research.gps.oracle module

class btgym.research.gps.oracle.Oracle(action_space=(0, 1, 2, 3), time_threshold=5, pips_threshold=10, pips_scale=0.0001, kernel_size=5, kernel_stddev=1)[source]

Irresponsible financial adviser.

Parameters:
  • action_space – actions to advise: 0 - hold, 1 - buy, 2 - sell, 3 - close
  • time_threshold – how many points (in ENVIRONMENT timesteps) on each side to use for the comparison when deciding whether comparator(n, n+x) is True
  • pips_threshold – int, minimal peak difference in pips for comparator(n, n+x) to be considered True
  • pips_scale – value of a single pip with respect to the signal value
  • kernel_size – Gaussian convolution kernel size (used to compute the distribution over actions)
  • kernel_stddev – Gaussian kernel standard deviation
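A hedged usage sketch (the placeholder data below just stands in for a real episode's price array; see fit() and resample_data() for the expected layout):

    import numpy as np
    from btgym.research.gps.oracle import Oracle

    oracle = Oracle(
        action_space=(0, 1, 2, 3),   # hold, buy, sell, close
        time_threshold=5,
        pips_threshold=10,
        pips_scale=1e-4,
        kernel_size=5,
        kernel_stddev=1,
    )
    episode_data = np.random.rand(1000, 4)        # placeholder OHLC-like array
    advice = oracle.fit(episode_data, resampling_factor=1)
    # `advice` holds per-step probabilities over the four advised actions.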
filter_by_margine(lst, threshold)[source]

Filters out peaks whose ‘value’ difference lies within the given tolerance. Filtering is done from first to last index: every succeeding element of the list is removed if its value difference with the value in hand is less than the given threshold.

Parameters:
  • lst – list of tuples; each tuple is (value, index)
  • threshold – value filtering threshold
Returns:

filtered list of tuples
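The filtering rule reads roughly as the following illustrative re-implementation (note the library method itself is spelled filter_by_margine):

    def filter_by_margin(peaks, threshold):
        # peaks: list of (value, index) tuples, ordered by index.
        # Keep the first peak; drop every following peak whose value differs
        # from the last kept value by less than `threshold`.
        if not peaks:
            return []
        kept = [peaks[0]]
        for value, index in peaks[1:]:
            if abs(value - kept[-1][0]) >= threshold:
                kept.append((value, index))
        return kept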

estimate_actions(episode_data)[source]

Estimates hold/buy/sell signals based on local peaks filtered by time horizon and signal amplitude.

Parameters:
  • episode_data – 1D np.array of unscaled [but possibly resampled] price values in OHL[CV] format
Returns:

1D vector of signals of the same length as episode_data

adjust_signals(signal)[source]

Adds a simple heuristic (based on examining the learnt policy action distribution): repeat the same buy or sell signal kernel_size - 1 times.
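An illustrative sketch of that heuristic, assuming the 0 - hold / 1 - buy / 2 - sell / 3 - close action coding used by this class:

    import numpy as np

    def repeat_buy_sell(signals, kernel_size):
        # Whenever a buy (1) or sell (2) signal occurs, repeat it over the
        # following kernel_size - 1 steps (original signal plus repeats).
        adjusted = np.asarray(signals).copy()
        for t, s in enumerate(signals):
            if s in (1, 2):
                adjusted[t:t + kernel_size] = s
        return adjusted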

fit(episode_data, resampling_factor=1)[source]

Estimates the advised action probability distribution based on the data received.

Parameters:
  • episode_data – 1D np.array of unscaled price values in OHL[CV] format
  • resampling_factor – factor by which to resample the given data, taking min/max values inside every resampled bar
Returns:

np.array of shape [resampled_data_size, action_space_size] holding probabilities of advised actions, where resampled_data_size = int(len(episode_data) / resampling_factor) + 1 or + 0 (depending on whether the division leaves a remainder)

resample_data(episode_data, factor=1)[source]

Resamples raw observations according to the given factor (skip_frame) value and estimates the mean value of the newly formed bars.

Parameters:
  • episode_data – np.array of shape [episode_length, values]
  • factor – scalar
Returns:

np.array of median Hi/Lo observations of size [int(episode_length/skip_frame) + 1, 1]
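A rough NumPy sketch of the resampling step (illustrative only; the library's exact bar aggregation may differ):

    import numpy as np

    def resample_by_factor(episode_data, factor=1):
        # Group rows into bars of `factor` consecutive steps and summarise
        # each bar by the mean of its values, returning shape [n_bars, 1].
        n_bars = max(1, int(np.ceil(len(episode_data) / factor)))
        bars = np.array_split(episode_data, n_bars)
        return np.asarray([bar.mean() for bar in bars]).reshape(-1, 1)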

class btgym.research.gps.oracle.Oracle2(action_space=(0, 1, 2, 3), gamma=1.0, **kwargs)[source]

[Less] irresponsible financial adviser.

Parameters:
  • action_space – actions to advise: 0 - hold, 1 - buy, 2 - sell, 3 - close
  • gamma – price movement horizon discount, in (0, 1]
p_up(x, gamma=1.0)[source]

Discounted rise potential

p_down(x, gamma=1.0)[source]

Discounted fall potential
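One plausible reading of these potentials is sketched below; the exact discounting used by Oracle2 may differ, so treat this as an assumption rather than the library definition:

    import numpy as np

    def p_up(x, gamma=1.0):
        # Sketch: discounted sum of positive forward price increments in window x.
        dx = np.diff(np.asarray(x, dtype=float))
        discounts = gamma ** np.arange(len(dx))
        return float(np.sum(discounts * np.clip(dx, 0.0, None)))

    def p_down(x, gamma=1.0):
        # Sketch: discounted sum of absolute negative forward increments.
        dx = np.diff(np.asarray(x, dtype=float))
        discounts = gamma ** np.arange(len(dx))
        return float(np.sum(discounts * np.clip(-dx, 0.0, None)))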

fit(episode_data, resampling_factor=1)[source]

Estimates the advised action probability distribution based on the data received.

Parameters:
  • episode_data – 1D np.array of unscaled price values in OHL[CV] format
  • resampling_factor – factor by which to resample the given data, taking min/max values inside every resampled bar
Returns:

np.array of shape [resampled_data_size, action_space_size] holding probabilities of advised actions, where resampled_data_size = int(len(episode_data) / resampling_factor) + 1 or + 0 (depending on whether the division leaves a remainder)

static resample_data(episode_data, factor=1)[source]

Resamples raw observations according to the given factor (skip_frame) value and estimates the mean value of the newly formed bars.

Parameters:
  • episode_data – np.array of shape [episode_length, values]
  • factor – scalar
Returns:

np.array of median Hi/Lo observations of size [int(episode_length/skip_frame) + 1, 1]