btgym.strategy package

btgym.strategy.base module

class btgym.strategy.base.BTgymBaseStrategy(**kwargs)[source]

Controls Environment inner dynamics and backtesting logic. Provides gym-style (State, Action, Reward, Done, Info) data. Any State, Reward and Info computation logic can be implemented by subclassing BTgymStrategy and overriding the get_[mode]_state(), get_reward(), get_info(), is_done() and set_datalines() methods. One can always go deeper and override the __init__() and next() methods to customize the server-side cerebro engine behaviour, including order execution logic.

Note

  • base class supports single-asset iteration via the default data_line named ‘base_asset’; see derived classes for multi-asset support
  • a bt.observers.DrawDown observer will be automatically added to the BTgymStrategy instance at runtime.
  • Since it is bt.Strategy subclass, refer to https://www.backtrader.com/docu/strategy.html for more information.
Keyword Arguments:
 
  • params (dict) – parameters dictionary, see Note below.

  • Notes – Due to backtrader convention, any strategy arguments should be defined inside the params dictionary or passed as kwargs to the bt.Cerebro() class via its .addstrategy() method. The parameter dictionary should contain at least these keys:

    state_shape:        Observation state shape is a dictionary of Gym spaces; by convention,
                        the first dimension of every Gym Box space is the time-embedding one;
    cash_name:          str, name for cash asset
    asset_names:        iterable of str, names for assets
    start_cash:         float, broker starting cash
    commission:         float, broker commission value, .01 stands for 1%
    leverage:           broker leverage, default is 1.0
    order_size:         dict of fixed order stakes (floats); keys should match asset names.
    drawdown_call:      finish episode when hitting this drawdown threshold, in percent.
    target_call:        finish episode when reaching this profit target, in percent.
    portfolio_actions:  possible agent actions.
    skip_frame:         number of environment steps to skip before returning next response,
                        e.g. if set to 10 -- agent will interact with environment every 10th step;
                        every other step agent action is assumed to be 'hold'.
    

    Default values are:

    state_shape=dict(raw_state=spaces.Box(shape=(4, 4), low=0, high=0,))
    cash_name='default_cash'
    asset_names=['default_asset']
    start_cash=None
    commission=None
    leverage=1.0
    drawdown_call=10
    target_call=10
    dataset_stat=None
    episode_stat=None
    portfolio_actions=('hold', 'buy', 'sell', 'close')
    skip_frame=1
    order_size=None
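
Following the backtrader convention, parameters can be passed as kwargs via .addstrategy(); a minimal sketch (all values below are illustrative, not recommended defaults):

    import backtrader as bt
    from gym import spaces

    from btgym.strategy.base import BTgymBaseStrategy

    cerebro = bt.Cerebro()
    cerebro.addstrategy(
        BTgymBaseStrategy,
        state_shape={'raw_state': spaces.Box(low=-10, high=10, shape=(30, 4))},
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=100.0,
        commission=0.0001,
        leverage=1.0,
        order_size={'default_asset': 10.0},
        drawdown_call=10,
        target_call=10,
        portfolio_actions=('hold', 'buy', 'sell', 'close'),
        skip_frame=10,
    )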
    
next()[source]

Default implementation of the built-in backtrader method. Defines the one-step environment routine and handles order execution logic according to the action received. Note that orders can only be submitted for data_lines present in the action space (assets). The self.action attribute is updated by btgym.server._BTgymAnalyzer; None actions are emitted while doing the skip_frame loop.
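
For orientation, a hedged sketch of the kind of per-step order logic next() implements for a single asset (the action decoding below is a simplification, not the library's exact code):

    def next(self):
        # Skip on 'hold' or None (None actions are emitted during the skip_frame loop):
        if self.action in (None, 'hold'):
            return
        if self.action == 'buy':
            self.order = self.buy(size=self.p.order_size['default_asset'])
        elif self.action == 'sell':
            self.order = self.sell(size=self.p.order_size['default_asset'])
        elif self.action == 'close':
            self.order = self.close()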

update_broker_stat()[source]
Updates all sliding broker statistics deques with latest-step values, such as:
  • normalized broker value
  • normalized broker cash
  • normalized exposure (position size)
  • exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps
  • normalized realized profit/loss for the last closed trade (zero if no position closures occurred within the last environment step)
  • normalized profit/loss for the currently open trade (unrealized p/l); see the sketch below
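
Internally these are bounded sliding windows; a minimal sketch of the bookkeeping pattern, assuming a hypothetical window length time_dim:

    from collections import deque

    time_dim = 30  # hypothetical sliding window length

    # One bounded deque per statistic; appending past maxlen evicts the oldest value:
    broker_stat = {
        'value': deque(maxlen=time_dim),
        'cash': deque(maxlen=time_dim),
        'exposure': deque(maxlen=time_dim),
    }
    broker_stat['value'].append(0.42)  # latest-step normalized value
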
get_broker_value(current_value, **kwargs)[source]
Parameters:current_value – current portfolio value
Returns:normalized broker value.
get_broker_cash(**kwargs)[source]
Returns:normalized broker cash
get_broker_exposure(**kwargs)[source]
Returns:normalized exposure (position size)
get_broker_pos_direction(**kwargs)[source]
Returns:short/long/out position indicator
get_broker_realized_pnl(current_value, **kwargs)[source]
Parameters:current_value – current portfolio value
Returns:normalized realized profit/loss for the last closed trade (zero if no position closures occurred within the last environment step)
get_broker_unrealized_pnl(current_value, **kwargs)[source]
Parameters:current_value – current portfolio value
Returns:normalized profit/loss for current opened trade
get_broker_episode_step(**kwargs)[source]
Returns:exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps
get_broker_drawdown(**kwargs)[source]
Returns:current drawdown value
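
Each get_broker_[mode]() method is a single-statistic estimator, so customizing one is a matter of overriding it. A sketch with an assumed normalization (the clipping rule is illustrative, not the library's exact formula):

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def get_broker_value(self, current_value, **kwargs):
            # Hypothetical normalization: relative gain over starting cash, clipped to [-1, 1].
            norm = (current_value - self.p.start_cash) / self.p.start_cash
            return max(-1.0, min(1.0, norm))
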
set_datalines()[source]

Default datalines are: Open, Low, High, Close, Volume. Any other custom data lines, indicators, etc. should be explicitly defined by overriding this method. Invoked once by Strategy.__init__().
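
A typical override adds indicator lines on top of the defaults; a sketch (indicator choice and period are illustrative):

    import backtrader.indicators as btind

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def set_datalines(self):
            # Hypothetical custom dataline: 30-period simple moving average of close.
            self.data.sma_30 = btind.SimpleMovingAverage(self.datas[0], period=30)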

get_raw_state()[source]

Default state observation composer.

Returns:
    time-embedded environment state observation as an [n, 4] numpy matrix, where:
        n - time-embedding length == state_shape[0] == <set by user>;
        4 - number of signal features == state_shape[1].

Note

self.raw_state is used to render environment human mode and should not be modified.

get_internal_state()[source]

Composes the internal state tensor by calling all statistics from the broker_stat dictionary. Generally, this method should not be modified; implement the corresponding get_broker_[mode]() methods instead.

get_state()[source]

Collects estimated values for every mode of the observation space by calling methods from the collection_get_state_methods dictionary. As a rule, this method should not be modified; override or implement the corresponding get_[mode]_state() methods instead, defining the necessary calculations and returning arbitrarily shaped tensors for every space mode.
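
For instance, adding an extra mode key to state_shape and implementing the matching getter might look like this (the mode name and transform are hypothetical):

    import numpy as np

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def get_my_mode_state(self):
            # Hypothetical 'my_mode' observation: sign-preserving log scaling of the raw state;
            # the returned shape must match state_shape['my_mode'].
            return np.sign(self.raw_state) * np.log1p(np.abs(self.raw_state))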

Note

  • ‘data’ refers to bt.Strategy datafeeds and should be treated as such.
    Datafeed lines that are not default to BTgymStrategy should be explicitly defined in
    __init__() or set_datalines().
get_reward()[source]

Shapes reward function as normalized single-trade realized profit/loss, augmented with potential-based reward shaping functions of the form: F(s, a, s') = gamma * FI(s') - FI(s);

  • potential FI_1 is current normalized unrealized profit/loss;

Excluded from the current implementation:
  • potential FI_2: current normalized broker value;
  • potential FI_3: penalizing exposure toward the end of episode.

Paper:
“Policy invariance under reward transformations:
Theory and application to reward shaping” by A. Ng et al., 1999; http://www.robotics.stanford.edu/~ang/papers/shaping-icml99.pdf
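
In code the shaping term is a one-line difference of potentials; a minimal sketch, assuming fi_prev and fi_next hold the unrealized p/l potential at consecutive steps and gamma is the discount factor:

    # Potential-based shaping: F(s, a, s') = gamma * FI(s') - FI(s)
    shaping = gamma * fi_next - fi_prev
    reward = realized_pnl + shaping  # both terms assumed already normalized
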
get_info()[source]

Composes the information part of the environment response; can be any object. Override to your own taste.

Note

Due to the ‘skip_frame’ feature, the INFO part of the environment response transmitted by the server can be a list containing either all skipped frames’ info objects, i.e. [info[-9], info[-8], …, info[0]], or just the latest one, [info[0]]. This behaviour is set inside the btgym.server._BTgymAnalyzer().next() method.
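
Client code should therefore handle both shapes; a hedged usage sketch, assuming env is a BTgym environment instance:

    obs, reward, done, info = env.step(action)
    # info is a list: either all skipped frames' info objects or just the latest one;
    # in both cases the last element is the most recent frame.
    latest_info = info[-1]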

get_done()[source]

Episode termination estimator; defines any trading-logic conditions upon which an episode stop is called, e.g. <OMG! Stop it, we became too rich!>. It is just a structural convention method. The default method is empty.

Expected to return:
tuple (<is_done, type=bool>, <message, type=str>).
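
A sketch of a custom termination rule (the profit condition is invented for illustration):

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def get_done(self):
            # Hypothetical early stop: terminate once broker value doubles starting cash.
            if self.broker.get_value() >= 2 * self.p.start_cash:
                return True, 'Stop it, we became too rich!'
            return False, '-'
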
notify_order(order)[source]

Shamelessly taken from the backtrader tutorial. TODO: better multi-data support

btgym.strategy.observers module

class btgym.strategy.observers.Reward(*args, **kwargs)[source]

Keeps track of reward values.

class btgym.strategy.observers.Position(*args, **kwargs)[source]

Keeps track of position size.

class btgym.strategy.observers.NormPnL(*args, **kwargs)[source]

Keeps track of PnL stats.
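
All three are standard backtrader observers and can be attached via cerebro.addobserver(); a minimal sketch:

    import backtrader as bt

    from btgym.strategy.observers import Reward, Position, NormPnL

    cerebro = bt.Cerebro()
    cerebro.addobserver(Reward)
    cerebro.addobserver(Position)
    cerebro.addobserver(NormPnL)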