btgym.strategy package
btgym.strategy.base module
class btgym.strategy.base.BTgymBaseStrategy(**kwargs)
Controls Environment inner dynamics and backtesting logic. Provides Gym-style (State, Action, Reward, Done, Info) data. Any state, reward and info computation logic can be implemented by subclassing BTgymBaseStrategy and overriding get_[mode]_state(), get_reward(), get_info(), is_done() and set_datalines() methods. One can always go deeper and override __init__() and next() methods for desired server cerebro engine behaviour, including order execution logic etc.
Note
- Base class supports single-asset iteration via the default data_line named 'base_asset'; see derived classes for multi-asset support.
- A bt.observers.DrawDown observer is automatically added to the BTgymBaseStrategy instance at runtime.
- Since it is a bt.Strategy subclass, refer to https://www.backtrader.com/docu/strategy.html for more information.
Keyword Arguments: params (dict) – parameters dictionary, see Note below.
Notes – Due to backtrader convention, any strategy arguments should be defined inside the params dictionary or passed as kwargs to bt.Cerebro() via the .addstrategy() method. The parameter dictionary should contain at least these keys:
state_shape: dict of Gym spaces describing the observation state; by convention, the first dimension of every Gym Box space is the time-embedding dimension
cash_name: str, name of the cash asset
asset_names: iterable of str, names of the assets
start_cash: float, broker starting cash
commission: float, broker commission value; .01 stands for 1%
leverage: float, broker leverage, default is 1.0
order_size: dict of fixed order stakes (floats); keys should match asset names
drawdown_call: finish episode when hitting this drawdown threshold, in percent
target_call: finish episode when reaching this profit target, in percent
portfolio_actions: possible agent actions
skip_frame: number of environment steps to skip before returning the next response, e.g. if set to 10, the agent interacts with the environment every 10th step; on every other step the agent action is assumed to be 'hold'
Default values are:
state_shape=dict(raw_state=spaces.Box(shape=(4, 4), low=0, high=0))
cash_name='default_cash'
asset_names=['default_asset']
start_cash=None
commission=None
leverage=1.0
drawdown_call=10
target_call=10
dataset_stat=None
episode_stat=None
portfolio_actions=('hold', 'buy', 'sell', 'close')
skip_frame=1
order_size=None
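Example (illustrative): in normal use BTgymEnv assembles the cerebro engine, but the same parameters can be passed by hand through backtrader's addstrategy(); the values below are arbitrary, not defaults:

    import backtrader as bt
    from gym import spaces
    from btgym.strategy.base import BTgymBaseStrategy

    cerebro = bt.Cerebro()
    # Per backtrader convention, all strategy parameters go through
    # addstrategy() as keyword arguments:
    cerebro.addstrategy(
        BTgymBaseStrategy,
        # 30-step time embedding, 4 signal features:
        state_shape=dict(raw_state=spaces.Box(low=-10, high=10, shape=(30, 4))),
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=100.0,
        commission=0.0001,
        leverage=1.0,
        order_size={'default_asset': 10.0},
        drawdown_call=10,   # finish episode at 10% drawdown
        target_call=10,     # finish episode at 10% profit
        portfolio_actions=('hold', 'buy', 'sell', 'close'),
        skip_frame=10,
    )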
next()
Default implementation of the built-in backtrader method. Defines the one-step environment routine and handles order execution logic according to the action received. Note that orders can only be submitted for data_lines present in the action space (assets). The self.action attribute is updated by btgym.server._BTgymAnalyzer, and None actions are emitted while doing the skip_frame loop.
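For custom execution logic, override next() in a subclass. A schematic sketch for the single default asset, assuming self.action holds the latest agent action as a plain string; this simplifies, and does not reproduce, the library's actual logic:

    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def next(self):
            # None actions are emitted inside the skip_frame loop
            # and are treated as 'hold':
            if self.action in (None, 'hold'):
                return
            if self.action == 'buy' and not self.position:
                self.buy(size=self.p.order_size['default_asset'])
            elif self.action == 'sell' and not self.position:
                self.sell(size=self.p.order_size['default_asset'])
            elif self.action == 'close' and self.position:
                self.close()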
update_broker_stat()
Updates all sliding broker statistics deques with latest-step values, such as (a sketch of the deque pattern follows the list):
- normalized broker value
- normalized broker cash
- normalized exposure (position size)
- exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps
- normalized realized profit/loss for the last closed trade (zero if no position was closed within the last environment step)
- normalized profit/loss for the currently open trade (unrealized p/l)
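These statistics live in fixed-length sliding deques, one per statistic; a minimal sketch of the pattern (names and window length are hypothetical):

    from collections import deque

    time_dim = 30  # sliding window length, e.g. the time-embedding size
    broker_stat = {
        'value': deque(maxlen=time_dim),
        'cash': deque(maxlen=time_dim),
        'exposure': deque(maxlen=time_dim),
    }

    def update_broker_stat_sketch(value, cash, exposure):
        # Appending to a full deque silently drops the oldest entry,
        # so each deque always holds the latest `time_dim` step values:
        broker_stat['value'].append(value)
        broker_stat['cash'].append(cash)
        broker_stat['exposure'].append(exposure)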
get_broker_value(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized broker value.
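One plausible normalization, for illustration only (not necessarily the library's exact formula): map the value range spanned by the drawdown and target boundaries linearly onto [0, 1]:

    def norm_broker_value(current_value, start_cash, drawdown_call, target_call):
        # 0.0 at the drawdown boundary, 1.0 at the profit target:
        lower = start_cash * (1 - drawdown_call / 100)
        upper = start_cash * (1 + target_call / 100)
        return (current_value - lower) / (upper - lower)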
get_broker_realized_pnl(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized realized profit/loss for the last closed trade (zero if no position was closed within the last environment step).
get_broker_unrealized_pnl(current_value, **kwargs)
Parameters: current_value – current portfolio value
Returns: normalized profit/loss for the currently open trade (unrealized p/l).
get_broker_episode_step(**kwargs)
Returns: exponentially scaled episode duration in steps, normalized w.r.t. maximum possible episode steps.
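For illustration, one way to produce such an exponentially scaled, normalized step counter (the scale constant is an assumption):

    import numpy as np

    def norm_episode_step(step, max_steps, scale=5.0):
        # Progress ratio in [0, 1], exponentially warped so that the
        # value changes fastest toward the end of the episode:
        return np.exp(scale * step / max_steps) / np.exp(scale)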
set_datalines()
Default datalines are: Open, Low, High, Close, Volume. Any other custom data lines, indicators, etc. should be explicitly defined by overriding this method. Invoked once by Strategy.__init__().
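Example of overriding set_datalines() to register an extra indicator line via the standard backtrader indicator API:

    import backtrader.indicators as btind
    from btgym.strategy.base import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):

        def set_datalines(self):
            # Attach a 10-period simple moving average over the default
            # datafeed; custom state composers can then read it back:
            self.data.sma = btind.SimpleMovingAverage(self.datas[0], period=10)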
get_raw_state()
Default state observation composer.
Returns: and updates the time-embedded environment state observation as an [n, 4] numpy matrix, where n is the time-embedding length (== state_shape[0], set by user) and 4 is the number of signal features (== state_shape[1]).
Note
self.raw_state is used to render the environment in human mode and should not be modified.
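A sketch of how such a raw state can be assembled from the default OHLC datalines, consistent with the shape contract above (not necessarily the library's exact code):

    import numpy as np

    def get_raw_state(self):
        # n == time-embedding length, taken from the declared state shape:
        time_dim = self.p.state_shape['raw_state'].shape[0]
        # Stack the last `time_dim` values of the four price lines
        # into an [n, 4] matrix:
        self.raw_state = np.stack(
            [
                np.frombuffer(self.data.open.get(size=time_dim)),
                np.frombuffer(self.data.high.get(size=time_dim)),
                np.frombuffer(self.data.low.get(size=time_dim)),
                np.frombuffer(self.data.close.get(size=time_dim)),
            ],
            axis=-1,
        )
        return self.raw_state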
get_internal_state()
Composes the internal state tensor by calling all statistics from the broker_stat dictionary. Generally, this method should not be modified; implement the corresponding get_broker_[mode]() methods instead.
get_state()
Collects estimated values for every mode of the observation space by calling methods from the collection_get_state_methods dictionary. As a rule, this method should not be modified; override or implement the corresponding get_[mode]_state() methods, defining the necessary calculations and returning arbitrarily shaped tensors for every space mode.
Note
- 'data' refers to bt.Strategy datafeeds and should be treated as such.
- Datafeed lines that are not default to BTgymBaseStrategy should be explicitly defined in __init__() or set_datalines().
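Schematically, the dispatch is a plain dictionary lookup, assuming collection_get_state_methods maps mode names to bound get_[mode]_state methods:

    def get_state(self):
        # One composer per observation-space mode; each returns an
        # arbitrarily shaped tensor for its mode:
        self.state = {
            mode: method()
            for mode, method in self.collection_get_state_methods.items()
        }
        return self.state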
get_reward()
Shapes the reward function as normalized single-trade realized profit/loss, augmented with potential-based reward shaping functions of the form F(s, a, s') = gamma * FI(s') - FI(s); see the code sketch after the reference below.
- potential FI_1 is current normalized unrealized profit/loss;
- EXCLUDED: potential FI_2 is current normalized broker value;
- EXCLUDED: potential FI_3 penalizes exposure toward the end of the episode.
Paper: "Policy invariance under reward transformations: Theory and application to reward shaping" by A. Ng et al., 1999; http://www.robotics.stanford.edu/~ang/papers/shaping-icml99.pdf
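The shaping scheme in code form; a sketch where gamma, the potential values and the helper names are assumptions:

    def shaping_term(gamma, fi_prev, fi_next):
        # F(s, a, s') = gamma * FI(s') - FI(s); potential-based terms of
        # this form leave the optimal policy unchanged (Ng et al., 1999):
        return gamma * fi_next - fi_prev

    def reward_sketch(realized_pnl, unrealized_pnl_prev, unrealized_pnl_next, gamma=0.99):
        # Base term: normalized realized p/l of the last closed trade,
        # augmented with shaping on potential FI_1 = unrealized p/l:
        return realized_pnl + shaping_term(gamma, unrealized_pnl_prev, unrealized_pnl_next)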
get_info()
Composes the information part of the environment response; can be any object. Override to taste.
Note
Due to the skip_frame feature, the INFO part of the environment response transmitted by the server can be a list containing either all skipped frames' info objects, i.e. [info[-9], info[-8], …, info[0]], or just the latest one, [info[0]]. This behaviour is set inside the btgym.server._BTgymAnalyzer().next() method.
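On the receiving side this shape difference can be guarded against; a usage sketch against the standard Gym step API, assuming env is a constructed BTgymEnv instance:

    obs, reward, done, info = env.step(env.action_space.sample())
    # With skip_frame > 1, `info` may be a list ordered oldest-to-latest,
    # so the most recent frame's info is the last element:
    latest_info = info[-1] if isinstance(info, list) else info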