Package Description
-------------------

**BTgym** is an OpenAI Gym-compatible environment for the Backtrader backtesting/trading library,
designed to provide a gym-integrated framework for running reinforcement learning experiments
in [close to] real-world algorithmic trading environments.

**[experimental]:** Besides the core environment, the package includes implementations of several
deep RL algorithms, tuned [to attempt] to solve this particular type of task.

**Backtrader** is an open-source algorithmic trading library:
GitHub: http://github.com/mementum/backtrader
Documentation and community: http://www.backtrader.com/

**OpenAI Gym** is..., well, everyone knows Gym:
GitHub: http://github.com/openai/gym
Documentation and community: https://gym.openai.com/

**DISCLAIMER:** This package is neither an out-of-the-box moneymaker, nor does it provide
ready-to-converge RL solutions. Rather, it is a framework for setting up experiments with complex,
non-stationary, time-series based environments. I have no idea what kind of algorithm and setup
will solve it [if any]. Explore on your own!

Installation
------------

It is highly recommended to run BTgym in a designated virtual environment.

Clone or copy the btgym repository to local disk, cd to it and run `pip install -e .`
to install the package and all dependencies::

    git clone https://github.com/Kismuz/btgym.git
    cd btgym
    pip install -e .

To update to the latest version::

    cd btgym
    git pull
    pip install --upgrade -e .

Note: BTgym requires Matplotlib version 2.0.2; downgrade your installation if you have version 2.1::

    pip install matplotlib==2.0.2

Quickstart
----------

Making a gym environment with all parameters set to defaults is as simple as::

    from btgym import BTgymEnv

    MyEnvironment = BTgymEnv(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',)

Adding more controls may look like::

    from btgym import BTgymEnv
    from gym import spaces

    MyEnvironment = BTgymEnv(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
                             episode_duration={'days': 2, 'hours': 23, 'minutes': 55},
                             drawdown_call=50,
                             state_shape={'raw': spaces.Box(low=0, high=1, shape=(20, 4))},
                             port=5555,
                             verbose=1,
                             )

The same, but registering the environment in the Gym-preferred way::

    import gym
    from gym import spaces
    from btgym import BTgymEnv

    env_params = dict(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
                      episode_duration={'days': 2, 'hours': 23, 'minutes': 55},
                      drawdown_call=50,
                      state_shape={'raw': spaces.Box(low=0, high=1, shape=(20, 4))},
                      port=5555,
                      verbose=1,
                      )

    gym.envs.register(id='backtrader-v5555',
                      entry_point='btgym:BTgymEnv',
                      kwargs=env_params,
                      )

    MyEnvironment = gym.make('backtrader-v5555')

Maximum environment flexibility is achieved by explicitly defining and passing `Dataset` and
`Cerebro` instances::

    from gym import spaces
    import backtrader as bt
    from btgym import BTgymDataset, BTgymBaseStrategy, BTgymEnv

    MyCerebro = bt.Cerebro()
    MyCerebro.addstrategy(BTgymBaseStrategy,
                          state_shape={'raw': spaces.Box(low=0, high=1, shape=(20, 4))},
                          skip_frame=5,
                          state_low=None,
                          state_high=None,
                          drawdown_call=50,
                          )

    MyCerebro.broker.setcash(100.0)
    MyCerebro.broker.setcommission(commission=0.001)
    MyCerebro.addsizer(bt.sizers.SizerFix, stake=10)
    MyCerebro.addanalyzer(bt.analyzers.DrawDown)

    MyDataset = BTgymDataset(filename='../examples/data/DAT_ASCII_EURUSD_M1_2016.csv',
                             start_weekdays=[0, 1, 2, 4],
                             start_00=True,
                             episode_duration={'days': 0, 'hours': 23, 'minutes': 55},
                             time_gap={'hours': 5},
                             )

    MyEnvironment = BTgymEnv(dataset=MyDataset,
                             engine=MyCerebro,
                             port=5555,
                             verbose=1,
                             )
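Once created, the environment is driven through the standard Gym interface. A minimal
interaction sketch (assuming the default discrete action space and the `MyEnvironment`
instance defined above; `env.get_stat()` is described in the workflow section below)::

    # Run a single episode with random actions:
    obs = MyEnvironment.reset()
    done = False

    while not done:
        action = MyEnvironment.action_space.sample()  # 0: hold, 1: buy, 2: sell, 3: close
        obs, reward, done, info = MyEnvironment.step(action)

    print(MyEnvironment.get_stat())  # statistics of the finished episode
    MyEnvironment.close()            # shut down the environment server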
Problem definition
------------------

- **Discrete actions setup:** consider a setup with one riskless asset acting as broker account
  cash and K (by default, one) risky assets. For every risky asset there exists a track of
  historic price records referred to as a `data-line`. Apart from the asset data lines there may
  [optionally] exist a number of exogenous data lines holding information and statistics
  considered relevant to decision-making, e.g. economic indexes, encoded news, macroeconomic
  indicators, weather forecasts etc.

  It is supposed for this setup that:

  1. there are no interest rates for any asset;
  2. broker actions are fixed-size market orders (`buy`, `sell`, `close`); short selling is permitted;
  3. transaction costs are modelled via broker commission;
  4. 'market liquidity' and 'capital impact' assumptions are met;
  5. time indexes match for all data lines provided;

  The problem is modelled as a discrete-time, finite-horizon, partially observable Markov decision
  process for equity/currency trading:

  - *for every asset traded*, the agent action space is discrete:
    `(0: hold [do nothing], 1: buy, 2: sell, 3: close [position])`;
  - the environment is episodic: maximum episode duration and episode termination conditions are set;
  - for every timestep of the episode the agent is given an environment state observation as a
    tensor of the last `m` time-embedded, preprocessed values for every data-line included, and
    emits actions according to some stochastic policy;
  - the agent's goal is to maximize expected cumulative capital by learning an optimal policy;

- **Continuous actions setup [BETA]:** this setup closely relates to the continuous portfolio
  optimisation problem definition; it differs from the setup above in that:

  1. base broker actions are real numbers: `a[i] in [0,1], 0<=i<=K, SUM{a[i]} = 1` for the `K`
     risky assets added; each action is a market target order to adjust the portfolio so that the
     `i`-th asset holds the share `a[i]*100%`;
  2. the entire single-step broker action is a dictionary of the form:
     `{cash_name: a[0], asset_name_1: a[1], ..., asset_name_K: a[K]}`;
  3. short selling is not permitted;

  For the RL model this implies a continuous action space as a `K+1`-dimensional vector.
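For illustration, the two action conventions side by side (asset and cash names below are
hypothetical and chosen for this sketch only)::

    # Discrete setup: one integer action per risky asset.
    HOLD, BUY, SELL, CLOSE = 0, 1, 2, 3
    discrete_action = BUY

    # Continuous setup [BETA]: dictionary of target portfolio shares, summing to 1.
    continuous_action = {
        'usd': 0.3,      # riskless asset (broker account cash)
        'eurusd': 0.5,   # risky asset 1
        'gbpusd': 0.2,   # risky asset 2
    }
    assert abs(sum(continuous_action.values()) - 1.0) < 1e-6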
Environment engine description
------------------------------

BTgym uses the Backtrader framework for the actual environment computations; for extensive
documentation see: https://www.backtrader.com/docu/index.html.

In short:

- The user defines backtesting engine parameters by composing a `Backtrader.Cerebro()` subclass,
  provides a historic price dataset as a `BTgymDataset()` instance and passes them as arguments
  when making the BTgym environment. See https://www.backtrader.com/docu/concepts.html for general
  Backtrader concept descriptions.

- The environment starts a separate server process responsible for rendering gym environment
  queries like `env.reset()` and `env.step()` by repeatedly sampling episodes from the given
  dataset and running the backtesting `Cerebro` engine on them. See the OpenAI Gym documentation
  for details: https://gym.openai.com/docs

****

.. image:: btgym_env_operation.png
   :scale: 60 %
   :alt: btgym environment operation

**Workflow sample**:

1. Define a backtesting `BTgymStrategy(bt.Strategy)`, which will control environment inner
   dynamics and backtesting logic.

   - For the RL-specific part, any `STATE`, `REWARD`, `DONE` and `INFO` computation logic can be
     implemented by overriding the `get_state()`, `get_reward()`, `get_info()`, `is_done()` and
     `set_datalines()` methods (a rough sketch of such a subclass is given after the server loop
     below).
   - For the broker/trading-specific part, custom order execution logic, stake sizing and
     analytics tracking can be implemented as for a regular `bt.Strategy()`.

2. Instantiate `Cerebro()`, add `BTgymStrategy()` and backtrader `Sizers`, `Analyzers` and
   `Observers` (if needed).

3. Define the dataset by passing a CSV datafile and parameters to a `BTgymDataset()` instance.

   - `BTgymDataset()` is simply a `Backtrader.feeds` class wrapper which pipes
     `CSV` [source] -> `pandas` [for efficient sampling] -> `bt.feeds` and implements random
     episode data sampling.

4. Initialize (or register and `make()`) the gym environment with `Cerebro()` and `BTgymDataset()`
   along with other kwargs.

5. Run your favorite RL algorithm:

   - start an episode by calling `env.reset()`;
   - advance one step of the episode by calling `env.step()`, performing agent training or testing;
   - after a single episode is finished, retrieve agent performance statistics via `env.get_stat()`.

****

**Server operation details:**

The backtrader server starts when the `env.reset()` method is called for the first time, runs as a
separate process, follows a simple Request/Reply pattern (every request should be paired with a
reply message) and operates in one of two modes:

- Control mode: initial mode; accepts only `_reset`, `_stop` and `_getstat` messages. Any other
  message is ignored and replied to with a simple info message. The server shuts down upon
  receiving `_stop` via the `env._stop_server()` method, goes to episode mode upon `_reset` (via
  `env.reset()`) and sends the statistics of the last run episode (if any) upon `_getstat` via
  `env.get_stat()`.

- Episode mode: runs an episode according to `BTgymStrategy()` logic. Accepts `action` messages
  and returns the `tuple`: `([state observation], [reward], [is_done], [aux.info])`. Finishes the
  episode upon receiving `action` == `_done` or according to strategy termination rules, then
  falls back to control mode.

  - Before every episode start, the BT server samples episode data and adds it to the
    `bt.Cerebro()` instance along with a specific `_BTgymAnalyzer`. The service of this hidden
    analyzer is twofold:

    - it enables strategy-environment communication by calling the RL-related `BTgymStrategy`
      methods: `get_state()`, `get_reward()`, `get_info()` and `is_done()` [see below];
    - it controls episode termination conditions.

  - Episode runtime: after preparing the environment's initial state by running the
    `BTgymStrategy` `start()` and `prenext()` methods, the server halts and waits for an incoming
    agent `action`. Upon receiving an `action`, the server performs all necessary `next()`
    computations (e.g. issues orders, computes broker values etc.), composes the environment
    response and sends it back to the agent (via `_BTgymAnalyzer`). Actually, since 'no market
    impact' is assumed, all state computations are performed one step ahead:

****

**Server loop** (pseudocode)::

    Initialize by receiving engine [bt.Cerebro()] and dataset [BTgymDataset()]
    Repeat until received message '_stop':
        Wait for incoming message
        If message is '_getstat':
            send episode statistics
        If message is '_reset':
            Randomly sample episode data from BTgymDataset
            Add episode data to bt.Cerebro()
            Add service _BTgymAnalyzer() to bt.Cerebro()
            Add DrawDown observer to bt.Cerebro(), if not already present
            Prepare BTgymStrategy initial state
            Set agent to 'hold'
            Repeat until episode termination conditions are met:
                Issue and process orders according to received agent action
                Perform all backtesting engine computations
                Estimate state observation
                Estimate env. reward
                Compose aux. information
                Check episode termination conditions
                Wait for incoming message
                Send (state, reward, done, info) response
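The RL-related methods called inside this loop are exactly the ones a user overrides in workflow
step 1 above. A rough sketch of such a subclass follows; the attribute names used here
(`self.raw_state`, the Broker observer lines) and the normalisation are illustrative assumptions,
not BTgym's actual defaults::

    import numpy as np
    from btgym import BTgymBaseStrategy

    class MyStrategy(BTgymBaseStrategy):
        """Illustrative override of the RL-related strategy methods."""

        def get_state(self):
            # Return the current observation; `self.raw_state` is assumed to hold the last
            # `time_dim` rows of (O, H, L, C) price data.
            state = np.asarray(self.raw_state)
            # Hypothetical min-max scaling to [0, 1]:
            return {'raw': (state - state.min()) / (state.ptp() + 1e-10)}

        def get_reward(self):
            # Illustrative reward: one-step change of broker portfolio value, read from
            # backtrader's standard Broker observer lines ([0]: current bar, [-1]: previous).
            return float(self.stats.broker.value[0] - self.stats.broker.value[-1])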
Data flow structure
-------------------

.. image:: data_domain_iteration.png
   :scale: 50 %
   :alt: BTgym data domain structure

A3C framework description
-------------------------

BTgym can be thought of as a two-part package: one part is the environment itself, the other is a
collection of RL algorithms tuned for solving algo-trading tasks. Below is the BTgym A3C training
framework operation diagram. Three advantage actor-critic style algorithms are implemented: A3C
itself, its UNREAL extension and PPO.

Note that while the base training framework itself is somewhat stable, the exact algorithm
implementations and corresponding BTgym strategies, state and reward shaping methods, data
providers etc. are subject to experiments and changes.

.. image:: btgym_a3c_framework.png
   :scale: 60 %
   :alt: A3C framework workflow image

****

**Stacked LSTM Agent**

Based on the NAV_A3C agent from the `LEARNING TO NAVIGATE IN COMPLEX ENVIRONMENTS
<https://arxiv.org/abs/1611.03673>`_ paper by Mirowski et al.

Modifications to the original paper architecture:

- split Policy/Value outputs: the Policy head is taken off the first LSTM layer, the Value head
  off the second;
- LSTM state initialisation: the first RNN layer context (Policy) is initialised on every episode
  start, while the second (Value) is reset either at the beginning of every Trial (future work) or
  every N episodes, motivated by the RL^2 approach of Duan et al.,
  `FAST REINFORCEMENT LEARNING VIA SLOW REINFORCEMENT LEARNING <https://arxiv.org/abs/1611.02779>`_;
- inner/external observation state split: the external (market) state is encoded via convolution
  layers and fed to the first LSTM layer, while the inner (broker) state is fed into the second
  LSTM layer and can optionally be encoded via a separate convolution block (this doesn't seem to
  improve results much, though);
- optional Value Replay loss (an `UNREAL` feature) improves sample efficiency, but is
  computationally more expensive;

Other details:

- all convolution and LSTM layers are layer-normalized, see the `Layer Normalization
  <https://arxiv.org/abs/1607.06450>`_ paper by Jimmy Ba et al.;
- the A3C option `time_flat` is ON by default; it improves training stability and reduces
  computation costs, see the `Base_AAC` class note in the package documentation for details;

.. image:: a3c_stacked_lstm_agent.png
   :scale: 50 %
   :alt: A3C stacked LSTM agent architecture
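To make the wiring above concrete, here is a conceptual sketch of the stacked-LSTM agent in
`tf.keras` notation. Layer sizes, input shapes and the use of Keras are illustrative assumptions
for this sketch only and do not reflect BTgym's actual implementation (which additionally
layer-normalizes the convolution and LSTM layers and manages the LSTM context resets described
above)::

    import tensorflow as tf

    def build_stacked_lstm_agent(time_dim=30, market_channels=4, broker_dim=5,
                                 num_actions=4, lstm_size=64):
        # Two observation streams: external (market) state and inner (broker) state.
        market = tf.keras.Input(shape=(time_dim, market_channels), name='external_state')
        broker = tf.keras.Input(shape=(time_dim, broker_dim), name='inner_state')

        # External observation is encoded by convolution layers
        # ('same' padding keeps the time dimension aligned for stacking below).
        x = tf.keras.layers.Conv1D(32, 3, padding='same', activation='relu')(market)
        x = tf.keras.layers.Conv1D(32, 3, padding='same', activation='relu')(x)

        # First LSTM layer: the Policy head is taken off its last output.
        h1 = tf.keras.layers.LSTM(lstm_size, return_sequences=True)(x)
        policy_logits = tf.keras.layers.Dense(num_actions, name='policy')(h1[:, -1, :])

        # Second LSTM layer sees first-layer features plus the broker stream;
        # the Value head is taken off it.
        h2 = tf.keras.layers.LSTM(lstm_size)(
            tf.keras.layers.Concatenate(axis=-1)([h1, broker]))
        value = tf.keras.layers.Dense(1, name='value')(h2)

        return tf.keras.Model(inputs=[market, broker], outputs=[policy_logits, value])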