btgym.strategy package
btgym.strategy.base module
class btgym.strategy.base.BTgymBaseStrategy(**kwargs)
Controls Environment inner dynamics and backtesting logic. Provides Gym-style (State, Action, Reward, Done, Info) data. Any state, reward and info computation logic can be implemented by subclassing BTgymBaseStrategy and overriding get_[mode]_state(), get_reward(), get_info(), is_done() and set_datalines() methods. One can always go deeper and override __init__() and next() methods for desired server cerebro engine behaviour, including order execution logic etc.
Note
- Base class supports single-asset iteration via the default data_line named 'base_asset'; see derived classes for multi-asset support.
- A bt.observers.DrawDown observer is automatically added to the BTgymBaseStrategy instance at runtime.
- Since it is a bt.Strategy subclass, refer to https://www.backtrader.com/docu/strategy.html for more information.
Keyword Arguments: params (dict) – parameters dictionary, see Note below.
Notes – Due to backtrader convention, any strategy arguments should be defined inside the params dictionary or passed as kwargs to bt.Cerebro() via the .addstrategy() method. The parameter dictionary should contain at least these keys:
state_shape: dict of Gym spaces describing the observation state; by convention, the first dimension of every Gym Box space is the time-embedding dimension
cash_name: str, name of the cash asset
asset_names: iterable of str, names of the assets
start_cash: float, broker starting cash
commission: float, broker commission value; .01 stands for 1%
leverage: float, broker leverage, default is 1.0
order_size: dict of fixed order stakes (floats); keys should match asset names
drawdown_call: finish episode when hitting this drawdown threshold, in percent
target_call: finish episode when reaching this profit target, in percent
portfolio_actions: possible agent actions
skip_frame: number of environment steps to skip before returning the next response, e.g. if set to 10, the agent interacts with the environment every 10th step; on every other step the agent action is assumed to be 'hold'
Default values are:
state_shape=dict(raw_state=spaces.Box(shape=(4, 4), low=0, high=0))
cash_name='default_cash'
asset_names=['default_asset']
start_cash=None
commission=None
leverage=1.0
drawdown_call=10
target_call=10
dataset_stat=None
episode_stat=None
portfolio_actions=('hold', 'buy', 'sell', 'close')
skip_frame=1
order_size=None
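Example (illustrative): in normal use BTgymEnv assembles the cerebro engine, but the same parameters can be passed by hand through backtrader's addstrategy(); the values below are arbitrary, not defaults:

    import backtrader as bt
    from gym import spaces
    from btgym.strategy.base import BTgymBaseStrategy

    cerebro = bt.Cerebro()
    # Per backtrader convention, all strategy parameters go through
    # addstrategy() as keyword arguments:
    cerebro.addstrategy(
        BTgymBaseStrategy,
        # 30-step time embedding, 4 signal features:
        state_shape=dict(raw_state=spaces.Box(low=-10, high=10, shape=(30, 4))),
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=100.0,
        commission=0.0001,
        leverage=1.0,
        order_size={'default_asset': 10.0},
        drawdown_call=10,   # finish episode at 10% drawdown
        target_call=10,     # finish episode at 10% profit
        portfolio_actions=('hold', 'buy', 'sell', 'close'),
        skip_frame=10,
    )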
next()
Default implementation of the built-in backtrader method. Defines the one-step environment routine and handles order execution logic according to the action received. Note that orders can only be submitted for data_lines present in the action space (assets). The self.action attribute is updated by btgym.server._BTgymAnalyzer, and None actions are emitted while doing the skip_frame loop.
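For custom execution logic, override next() in a subclass. A schematic sketch for the single default asset, assuming self.action holds the latest agent action as a plain string; this simplifies, and does not reproduce, the library's actual logic:

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def next(self):
            # None actions are emitted inside the skip_frame loop
            # and are treated as 'hold':
            if self.action in (None, 'hold'):
                return
            if self.action == 'buy' and not self.position:
                self.buy(size=self.p.order_size['default_asset'])
            elif self.action == 'sell' and not self.position:
                self.sell(size=self.p.order_size['default_asset'])
            elif self.action == 'close' and self.position:
                self.close()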
update_broker_stat()
Updates all sliding broker statistics deques with latest-step values, such as (a sketch of the deque pattern follows the list):
- normalized broker value
- normalized broker cash
- normalized exposure (position size)
- exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps
- normalized realized profit/loss for the last closed trade (zero if no position was closed within the last environment step)
- normalized profit/loss for the currently open trade (unrealized p/l)
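These statistics live in fixed-length sliding deques, one per statistic; a minimal sketch of the pattern (names and window length are hypothetical):

    from collections import deque

    time_dim = 30  # sliding window length, e.g. the time-embedding size
    broker_stat = {
        'value': deque(maxlen=time_dim),
        'cash': deque(maxlen=time_dim),
        'exposure': deque(maxlen=time_dim),
    }

    def update_broker_stat_sketch(value, cash, exposure):
        # Appending to a full deque silently drops the oldest entry,
        # so each deque always holds the latest `time_dim` step values:
        broker_stat['value'].append(value)
        broker_stat['cash'].append(cash)
        broker_stat['exposure'].append(exposure)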
get_broker_value(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized broker value.
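One plausible normalization, for illustration only (not necessarily the library's exact formula): map the value range spanned by the drawdown and target boundaries linearly onto [0, 1]:

    def norm_broker_value(current_value, start_cash, drawdown_call, target_call):
        # 0.0 at the drawdown boundary, 1.0 at the profit target:
        lower = start_cash * (1 - drawdown_call / 100)
        upper = start_cash * (1 + target_call / 100)
        return (current_value - lower) / (upper - lower)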
get_broker_realized_pnl(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized realized profit/loss for the last closed trade (zero if no position was closed within the last environment step).
get_broker_unrealized_pnl(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized profit/loss for the currently open trade (unrealized p/l).
get_broker_episode_step(**kwargs)
Returns: exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps.
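For illustration, one way to produce such an exponentially scaled, normalized step counter (the scale constant is an assumption):

    import numpy as np

    def norm_episode_step(step, max_steps, scale=5.0):
        # Progress ratio in [0, 1], exponentially warped so that the
        # value changes fastest toward the end of the episode:
        return np.exp(scale * step / max_steps) / np.exp(scale)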
set_datalines()
Default datalines are: Open, Low, High, Close, Volume. Any other custom data lines, indicators, etc. should be explicitly defined by overriding this method. Invoked once by Strategy.__init__().
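Example of overriding set_datalines() to register an extra indicator line via the standard backtrader indicator API:

    import backtrader.indicators as btind
    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def set_datalines(self):
            # Attach a 10-period simple moving average over the default
            # datafeed; custom state composers can then read it back:
            self.data.sma = btind.SimpleMovingAverage(self.datas[0], period=10)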
get_raw_state()
Default state observation composer.
Returns: and updates the time-embedded environment state observation as an [n, 4] numpy matrix, where n is the time-embedding length (== state_shape[0], set by user) and 4 is the number of signal features (== state_shape[1]).
Note
self.raw_state is used to render the environment in human mode and should not be modified.
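A sketch of how such a raw state can be assembled from the default OHLC datalines, consistent with the shape contract above (not necessarily the library's exact code):

    import numpy as np

    def get_raw_state(self):
        # n == time-embedding length, taken from the declared state shape:
        time_dim = self.p.state_shape['raw_state'].shape[0]
        # Stack the last `time_dim` values of the four price lines
        # into an [n, 4] matrix:
        self.raw_state = np.stack(
            [
                np.frombuffer(self.data.open.get(size=time_dim)),
                np.frombuffer(self.data.high.get(size=time_dim)),
                np.frombuffer(self.data.low.get(size=time_dim)),
                np.frombuffer(self.data.close.get(size=time_dim)),
            ],
            axis=-1,
        )
        return self.raw_state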
get_internal_state()
Composes the internal state tensor by calling all statistics from the broker_stat dictionary. Generally, this method should not be modified; implement the corresponding get_broker_[mode]() methods instead.
get_state()
Collects estimated values for every mode of the observation space by calling methods from the collection_get_state_methods dictionary. As a rule, this method should not be modified; override or implement the corresponding get_[mode]_state() methods, defining the necessary calculations and returning arbitrarily shaped tensors for every space mode.
Note
- 'data' refers to bt.Strategy datafeeds and should be treated as such.
- Datafeed lines that are not default to BTgymBaseStrategy should be explicitly defined in __init__() or set_datalines().
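Schematically, the dispatch is a plain dictionary lookup, assuming collection_get_state_methods maps mode names to bound get_[mode]_state methods:

    def get_state(self):
        # One composer per observation-space mode; each returns an
        # arbitrarily shaped tensor for its mode:
        self.state = {
            mode: method()
            for mode, method in self.collection_get_state_methods.items()
        }
        return self.state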
get_reward()
Shapes the reward function as normalized single-trade realized profit/loss, augmented with potential-based reward shaping functions of the form F(s, a, s') = gamma * FI(s') - FI(s); see the code sketch after the reference below.
- potential FI_1 is current normalized unrealized profit/loss;
- EXCLUDED: potential FI_2 is current normalized broker value;
- EXCLUDED: potential FI_3 penalizes exposure toward the end of the episode.
Paper: "Policy invariance under reward transformations: Theory and application to reward shaping" by A. Ng et al., 1999; http://www.robotics.stanford.edu/~ang/papers/shaping-icml99.pdf
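The shaping scheme in code form; a sketch where gamma, the potential values and the helper names are assumptions:

    def shaping_term(gamma, fi_prev, fi_next):
        # F(s, a, s') = gamma * FI(s') - FI(s); potential-based terms of
        # this form leave the optimal policy unchanged (Ng et al., 1999):
        return gamma * fi_next - fi_prev

    def reward_sketch(realized_pnl, unrealized_pnl_prev, unrealized_pnl_next, gamma=0.99):
        # Base term: normalized realized p/l of the last closed trade,
        # augmented with shaping on potential FI_1 = unrealized p/l:
        return realized_pnl + shaping_term(gamma, unrealized_pnl_prev, unrealized_pnl_next)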
get_info()
Composes the information part of the environment response; can be any object. Override to taste.
Note
Due to the skip_frame feature, the INFO part of the environment response transmitted by the server can be a list containing either all skipped frames' info objects, i.e. [info[-9], info[-8], …, info[0]], or just the latest one, [info[0]]. This behaviour is set inside the btgym.server._BTgymAnalyzer().next() method.
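On the receiving side this shape difference can be guarded against; a usage sketch against the standard Gym step API, assuming env is a constructed BTgymEnv instance:

    obs, reward, done, info = env.step(env.action_space.sample())
    # With skip_frame > 1, `info` may be a list ordered oldest-to-latest,
    # so the most recent frame's info is the last element:
    latest_info = info[-1] if isinstance(info, list) else info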