btgym.datafeed package¶

btgym.datafeed.base module¶

btgym.datafeed.base.DataSampleConfig = {'get_new': True, 'sample_type': 0, 'timestamp': None, 'b_alpha': 1, 'b_beta': 1}¶: dict – Conventional sampling configuration template to pass to data class sample() method – `sample = my_data.sample(**DataSampleConfig)`

btgym.datafeed.base.EnvResetConfig = {'episode_config': {'get_new': True, 'sample_type': 0, 'timestamp': None, 'b_alpha': 1, 'b_beta': 1}, 'trial_config': {'get_new': True, 'sample_type': 0, 'timestamp': None, 'b_alpha': 1, 'b_beta': 1}}¶: dict – Conventional reset configuration template to pass to environment reset() method – `observation = env.reset(**EnvResetConfig)`
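
Since both templates are plain dictionaries, one way to use them is to copy a template, adjust the copy, and unpack it into the call. The sketch below assumes nothing about the library beyond the key names shown above; the `sample` function is a hypothetical stand-in for a data class method, not the library's implementation.

```python
import copy

# Templates as documented above (values copied from this page).
DataSampleConfig = dict(get_new=True, sample_type=0, timestamp=None, b_alpha=1, b_beta=1)
EnvResetConfig = dict(
    episode_config=copy.deepcopy(DataSampleConfig),
    trial_config=copy.deepcopy(DataSampleConfig),
)


def sample(get_new=True, sample_type=0, timestamp=None, b_alpha=1, b_beta=1, **kwargs):
    """Hypothetical stand-in: only reports which subset would be sampled."""
    return 'test' if sample_type else 'train'


# Copy the template before editing so the shared default stays untouched.
test_config = copy.deepcopy(DataSampleConfig)
test_config['sample_type'] = 1

print(sample(**DataSampleConfig))  # train
print(sample(**test_config))       # test
```

Copying first matters because the templates are module-level objects: editing them in place would change the defaults seen by every other caller.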

class btgym.datafeed.base.BTgymBaseData(filename=None, dataframe=None, parsing_params=None, sampling_params=None, name='base_data', data_names=('default_asset', ), task=0, frozen_time_split=None, log_level=13, _config_stack=None, **kwargs)[source]¶

Base BTgym data provider class. Provides core data loading, sampling, splitting and converting functionality. Do not use directly.

Enables Pipe:

CSV[source data]-->pandas[for efficient sampling]-->bt.feeds

Parameters:

filename – Str or iterable of str, filenames holding csv data; should be given either here or when calling read_csv(), see Notes
dataframe – pd.dataframe holding data; if this arg is given, it overrides the `filename` arg.
CSV to Pandas parsing (specific_params) –
sep – ‘;’
header – 0
index_col – 0
parse_dates – True
names – [‘open’, ‘high’, ‘low’, ‘close’, ‘volume’]
Pandas to BT.feeds conversion (specific_params) –
timeframe=1 – 1 minute.
datetime – 0
open – 1
high – 2
low – 3
close – 4
volume – -1
openinterest – -1
Sampling (specific_params) –
sample_class_ref – None - if not None, then sample() method will return an instance of the specified class, which itself must be a subclass of BaseBTgymDataset; otherwise it returns an instance of the base data class.
start_weekdays – [0, 1, 2, 3, ] - Only weekdays from the list will be used for sample start.
start_00 – True - sample start time will be set to first record of the day (usually 00:00).
sample_duration – {‘days’: 1, ‘hours’: 23, ‘minutes’: 55} - Maximum sample time duration in days, hours, minutes
time_gap – {‘days’: 0, ‘hours’: 5, ‘minutes’: 0} - Data omission threshold: maximum no-data time gap allowed within sample in days, hours, minutes. Thereby, if set to be < 1 day, samples containing weekend and holiday gaps will be rejected.
test_period – {‘days’: 0, ‘hours’: 0, ‘minutes’: 0} - setting this param to non-zero duration forces instance.data split to train / test subsets with test subset duration equal to test_period with time_gap tolerance. Train data always precedes test one: [0_record<-train_data->split_point_record<-test_data->last_record].
sample_expanding – None, reserved for child classes.
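
The effect of time_gap and test_period can be illustrated with a small stdlib-only sketch. It is a simplification under stated assumptions: `max_gap` and `split_index` are hypothetical helper names, the timestamps are made up, and the time_gap tolerance that the class applies to the split point is ignored here.

```python
from datetime import datetime, timedelta


def max_gap(timestamps):
    """Largest pause between consecutive records (timestamps sorted ascending)."""
    return max((b - a for a, b in zip(timestamps, timestamps[1:])), default=timedelta(0))


def split_index(timestamps, test_period):
    """Index of the first test record; everything before it is train data."""
    if not test_period:  # zero duration: no test subset
        return len(timestamps)
    split_time = timestamps[-1] - test_period
    return next(i for i, t in enumerate(timestamps) if t > split_time)


start = datetime(2017, 1, 2)
# One record per hour for two days, then a 7-hour pause, then six more hours.
stamps = [start + timedelta(hours=h) for h in range(48)]
stamps += [stamps[-1] + timedelta(hours=7 + h) for h in range(6)]

time_gap = timedelta(days=0, hours=5, minutes=0)
print(max_gap(stamps[:48]) <= time_gap)  # True: sample would be accepted
print(max_gap(stamps) <= time_gap)       # False: sample would be rejected

cut = split_index(stamps, timedelta(days=1))
print(len(stamps[:cut]), len(stamps[cut:]))  # train records, test records
```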

Note

CSV file can contain duplicate records, checks will be performed and all duplicates will be removed;
CSV file should be properly sorted by date_time in ascending order, no sorting checks performed.
When supplying a list of file names, the files should also be listed in ascending order of their time periods; correct sampling is not possible otherwise.
Default parameters are source-specific and made to correctly parse 1 minute Forex generic ASCII data files from www.HistData.com. Tune according to your data source.
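
To make the expected file layout concrete, here is a stdlib-only sketch that reads semicolon-separated rows and drops duplicate records. The class itself goes through pandas; this sketch does not. The sample rows are made up, and the `YYYYMMDD HHMMSS` timestamp layout with no header row is an assumption about the data source.

```python
import csv
import io
from datetime import datetime

# Made-up rows in the assumed layout; the second row is duplicated on purpose.
raw = io.StringIO(
    "20170102 020000;1.05155;1.05197;1.05155;1.05190;0\n"
    "20170102 020100;1.05209;1.05209;1.05177;1.05179;0\n"
    "20170102 020100;1.05209;1.05209;1.05177;1.05179;0\n"
)
names = ['open', 'high', 'low', 'close', 'volume']

records = {}
for row in csv.reader(raw, delimiter=';'):
    stamp = datetime.strptime(row[0], '%Y%m%d %H%M%S')  # column 0 is the date-time index
    # Keying by timestamp drops repeated records; file order is preserved.
    records.setdefault(stamp, dict(zip(names, map(float, row[1:]))))

for stamp, values in records.items():
    print(stamp, values['close'])
```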

set_params(params_dict)[source]¶

Batch attribute setter.

Parameters:	params_dict – dictionary of parameters to be set as instance attributes.

set_logger(level=None, task=None)[source]¶

Sets logbook logger.

Parameters:

level – logbook.level, int
task – task id, int

reset(data_filename=None, **kwargs)[source]¶

Gets instance ready.

Parameters:

data_filename – [opt] string or list of strings.
kwargs – not used.

read_csv(data_filename=None, force_reload=False)[source]¶

Populates instance by loading data: CSV file –> pandas dataframe.

Parameters:

data_filename – [opt] csv data filename as string or list of such strings.
force_reload – ignore loaded data.

describe()[source]¶

Returns summary dataset statistics as a pandas dataframe, for every data column:

records count,
data mean,
data std dev,
min value,
25th percentile,
50th percentile,
75th percentile,
max value.
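
For orientation, the listed statistics can be reproduced for a single column with the standard library. This is only an illustration: the method returns a pandas dataframe, whereas `describe_column` is a hypothetical helper, and the row labels and the choice of sample standard deviation are assumptions.

```python
import statistics


def describe_column(values):
    """Summary of one data column, in the order listed above."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'count': len(values),
        'mean': statistics.mean(values),
        'std': statistics.stdev(values),  # sample standard deviation
        'min': min(values),
        '25%': q25,
        '50%': q50,
        '75%': q75,
        'max': max(values),
    }


close = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up prices
for key, value in describe_column(close).items():
    print(key, value)
```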

to_btfeed()[source]¶

Performs BTgymData–>bt.feed conversion.

Returns:	{data_line_name: bt.datafeed instance}.
Return type:	dict

_sample(get_new=True, sample_type=0, b_alpha=1.0, b_beta=1.0, force_interval=False, interval=None, **kwargs)[source]¶

Samples continuous subset of data.

Parameters:

get_new (bool) – sample new (True) or reuse (False) last made sample;
sample_type (int or bool) – 0 (train) or 1 (test) - get sample from train or test data subsets respectively.
b_alpha (float) – beta-distribution sampling alpha > 0, valid for train episodes.
b_beta (float) – beta-distribution sampling beta > 0, valid for train episodes.
force_interval (bool) – use exact sampling interval (should be given)
interval (iterable of int, len2) – exact interval to sample from when force_interval=True

Returns:

BTgymDataset instance with number of records ~ max_episode_len, where ~ tolerance is set by time_gap param, if no sample_class_ref param has been set;
otherwise, sample_class_ref instance with the same number of records.

Note

Train sample start position within interval is drawn from beta-distribution with default parameters b_alpha=1, b_beta=1, i.e. a uniform one. Beta-distribution makes skewed sampling possible, for instance to give recent episodes a higher probability of being sampled: b_alpha=10, b_beta=0.8. Test samples are always drawn uniformly.
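
A minimal sketch of this skewed sampling, using only the standard library. The helper `draw_start` and the way a draw is mapped onto row positions are assumptions made for illustration, not the library's code.

```python
import random


def draw_start(lower, upper, b_alpha=1.0, b_beta=1.0, rng=random):
    """Map a Beta(b_alpha, b_beta) draw from [0, 1] onto row positions [lower, upper]."""
    return lower + round(rng.betavariate(b_alpha, b_beta) * (upper - lower))


rng = random.Random(0)
uniform = [draw_start(0, 1000, rng=rng) for _ in range(5000)]
skewed = [draw_start(0, 1000, b_alpha=10, b_beta=0.8, rng=rng) for _ in range(5000)]

print(sum(uniform) / len(uniform))  # close to the middle of the interval
print(sum(skewed) / len(skewed))    # pulled toward the upper (most recent) end
```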

_sample_random(sample_type=0, timestamp=None, name='random_sample_', interval=None, force_interval=False, **kwargs)[source]¶

Randomly samples continuous subset of data.

Parameters:	name – str, sample filename id
Returns:	BTgymDataset instance with number of records ~ max_episode_len, where ~ tolerance is set by time_gap param.

_sample_interval(interval, b_alpha=1.0, b_beta=1.0, name='interval_sample_', force_interval=False, **kwargs)[source]¶

Samples continuous subset of data, such that entire episode records lie within positions specified by interval. Episode start position within interval is drawn from beta-distribution parametrised by b_alpha, b_beta. By default the distribution is uniform.

Parameters:

interval – tuple, list or 1d-array of integers of length 2: [lower_row_number, upper_row_number];
b_alpha – float > 0, sampling B-distribution alpha param, def=1;
b_beta – float > 0, sampling B-distribution beta param, def=1;
name – str, sample filename id
force_interval – bool, if true: force exact interval sampling

Returns:

BTgymDataset instance such that:
1. number of records ~ max_episode_len, subject to time_gap param;
2. actual episode start position is sampled from interval;

False if it is not possible to sample instance with set args.
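
The contract described above (an episode that fits inside the interval, or False) might be sketched as follows. `sample_interval` and its `episode_len` argument are hypothetical; in the class the episode length follows from the sampling parameters instead of being passed in.

```python
import random


def sample_interval(interval, episode_len, b_alpha=1.0, b_beta=1.0, rng=random):
    """Return (first_row, last_row) of an episode inside interval, or False."""
    lower, upper = interval
    room = upper - lower - episode_len  # free rows left for shifting the start
    if room < 0:
        return False  # the interval cannot hold a full episode
    start = lower + round(rng.betavariate(b_alpha, b_beta) * room)
    return start, start + episode_len


rng = random.Random(1)
print(sample_interval([100, 500], episode_len=120, rng=rng))
print(sample_interval([100, 150], episode_len=120, rng=rng))  # False
```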

_sample_aligned_interval(interval, align_left=False, b_alpha=1.0, b_beta=1.0, name='interval_sample_', force_interval=False, **kwargs)[source]¶

Samples continuous subset of data, such that entire episode records lie within positions specified by interval. Episode start position within interval is drawn from beta-distribution parametrised by b_alpha, b_beta. By default the distribution is uniform.

Parameters:

interval – tuple, list or 1d-array of integers of length 2: [lower_row_number, upper_row_number];
align_left – if True - try to align sample to beginning of interval;
b_alpha – float > 0, sampling B-distribution alpha param, def=1;
b_beta – float > 0, sampling B-distribution beta param, def=1;
name – str, sample filename id
force_interval – bool, if true: force exact interval sampling

Returns:

BTgymDataset instance such that:
1. number of records ~ max_episode_len, subject to time_gap param;
2. actual episode start position is sampled from interval;

False if it is not possible to sample instance with set args.

_sample_exact_interval(interval, name='interval_sample_', **kwargs)[source]¶

Samples exactly defined interval.

Parameters:

interval – tuple, list or 1d-array of integers of length 2: [lower_row_number, upper_row_number];
name – str, sample filename id
Returns:	BTgymDataset instance.

btgym.datafeed.base.random_beta()¶

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; a,b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalisation, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.

Parameters:

a (float or array_like of floats) – Alpha, non-negative.
b (float or array_like of floats) – Beta, non-negative.
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. If size is `None` (default), a single value is returned if `a` and `b` are both scalars. Otherwise, `np.broadcast(a, b).size` samples are drawn.
Returns:	out – Drawn samples from the parameterized beta distribution.
Return type:	ndarray or scalar
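
As a worked check of the density formula above, setting both parameters to 1 (the defaults used by the sampling methods on this page) gives the uniform distribution on the unit interval:

\[B(1, 1) = \int_0^1 t^{0} (1 - t)^{0} dt = 1, \qquad f(x; 1, 1) = \frac{1}{B(1, 1)} x^{0} (1 - x)^{0} = 1 \quad \text{for } 0 \le x \le 1.\]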

btgym.datafeed.derivative module¶

class btgym.datafeed.derivative.BTgymEpisode(filename=None, parsing_params=None, sampling_params=None, name=None, data_names=('default_asset', ), task=0, log_level=13, _config_stack=None)[source]¶

Low-level data class. Implements Episode object containing a single episode data sequence. Does not allow further sampling or data loading. Supposed to be converted to bt.datafeed object via .to_btfeed() method. Do not use directly.

class btgym.datafeed.derivative.BTgymDataTrial(filename=None, parsing_params=None, sampling_params=None, name=None, data_names=('default_asset', ), frozen_time_split=None, task=0, log_level=13, _config_stack=None)[source]¶

Intermediate-level data class. Implements the concept of a Trial object. Supports data train/test separation. Do not use directly.

Parameters:

filename – not used;
sampling_params – dict, sample retrieving options, see base class description for details;
task – int, optional;
parsing_params – csv parsing options, see base class description for details;
log_level – int, optional, logbook.level;
_config_stack – dict, holding configuration for nested child samples;

class btgym.datafeed.derivative.BTgymRandomDataDomain(trial_params, episode_params, filename=None, dataframe=None, parsing_params=None, target_period=None, use_target_backshift=False, frozen_time_split=None, name='RndDataDomain', task=0, data_names=('default_asset', ), log_level=13)[source]¶

Top-level data class. Implements one way data domains can be defined, namely when the source domain precedes the target one. Implements pipe:

Domain.sample() --> Trial.sample() --> Episode.to_btfeed() --> bt.Strategy

This particular class randomly samples Trials from provided dataset.

Parameters:

filename – Str or list of str, file_names containing CSV historic data;
dataframe – pd.dataframe or iterable of pd.dataframes containing historic data;
parsing_params – csv parsing options, see base class description for details;
trial_params – dict, describes trial parameters, should contain keys: {sample_duration, time_gap, start_00, start_weekdays, test_period, expanding};
episode_params – dict, describes episode parameters, should contain keys: {sample_duration, time_gap, start_00, start_weekdays};
target_period – dict, None or Int, domain target period, def={‘days’: 0, ‘hours’: 0, ‘minutes’: 0}; setting this param to non-zero duration forces separation to source/target domains (which can be thought of as creating top-level train/test subsets) with target data duration equal to target_period; if set to None - no target period assumed; if set to -1 - no source period assumed; Source data always precedes target one.
use_target_backshift – bool, if true - target domain is shifted back by the duration of trial train period, thus allowing training on part of target domain data, namely train part of the trial closest to source/target break point.
name – str, optional
task – int, optional
log_level – int, logbook.level

trial_class_ref¶: alias of BTgymDataTrial

episode_class_ref¶: alias of BTgymEpisode

class btgym.datafeed.derivative.BTgymDataset(filename=None, episode_duration=None, time_gap=None, start_00=False, start_weekdays=None, parsing_params=None, target_period=None, name='SimpleDataSet', data_names=('default_asset', ), log_level=13, **kwargs)[source]¶

Simple top-level data class, implements direct random episode sampling from data set induced by csv file, i.e. it is a special case for Trial=def=Episode. Supports source and target data domains separation with some caveat - see Note.

Note

Due to current implementation sampling test episode actually requires sampling test TRIAL. To be improved.

Parameters:

filename – Str or list of str, file_names containing CSV historic data;
episode_duration – dict, maximum episode duration in d:h:m, def={‘days’: 0, ‘hours’: 23, ‘minutes’: 55}, alias for sample_duration;
time_gap – dict, data time gap allowed within sample in d:h:m, def={‘days’: 0, ‘hours’: 6};
start_00 – bool, episode start point will be shifted back to first record of the day (usually 00:00), def=False;
start_weekdays – list, only weekdays from the list will be used for sample start, def=[0, 1, 2, 3, 4, 5, 6];
target_period – domain test(aka target) period. def={‘days’: 0, ‘hours’: 0, ‘minutes’: 0}; setting this param to non-zero duration forces data separation to train/test subsets. Train data always precedes test one.
parsing_params – csv parsing options, see base class description for details;
name – str, instance name;
log_level – int, logbook.level;
**kwargs – deprecated kwargs;

class BTgymSimpleTrial(filename=None, parsing_params=None, sampling_params=None, name=None, data_names=('default_asset', ), frozen_time_split=None, task=0, log_level=13, _config_stack=None)[source]¶

Truncated Trial without test period: always samples from train, sampled episode inherits train/test metadata of parent trial.

Parameters:

filename – not used;
sampling_params – dict, sample retrieving options, see base class description for details;
task – int, optional;
parsing_params – csv parsing options, see base class description for details;
log_level – int, optional, logbook.level;
_config_stack – dict, holding configuration for nested child samples;

trial_class_ref¶: alias of BTgymSimpleTrial

class btgym.datafeed.derivative.BTgymDataset2(filename=None, dataframe=None, episode_duration=None, time_gap=None, start_00=False, start_weekdays=None, parsing_params=None, target_period=None, name='SimpleDataSet2', data_names=('default_asset', ), log_level=13, **kwargs)[source]¶

Simple top-level data class, implements direct random episode sampling from data set induced by csv file, i.e. it is a special case for Trial=def=Episode.

Parameters:

filename – Str or list of str, file_names containing CSV historic data;
dataframe – pd.dataframe or iterable of pd.dataframes containing historic data;
episode_duration – dict, maximum episode duration in d:h:m, def={‘days’: 0, ‘hours’: 23, ‘minutes’: 55}, alias for sample_duration;
time_gap – dict, data time gap allowed within sample in d:h:m, def={‘days’: 0, ‘hours’: 6};
start_00 – bool, episode start point will be shifted back to first record of the day (usually 00:00), def=False;
start_weekdays – list, only weekdays from the list will be used for sample start, def=[0, 1, 2, 3, 4, 5, 6];
target_period – domain test(aka target) period. def={‘days’: 0, ‘hours’: 0, ‘minutes’: 0}; setting this param to non-zero duration forces data separation to train/test subsets. Train data always precedes test one.
parsing_params – csv parsing options, see base class description for details;
name – str, instance name;
log_level – int, logbook.level;
**kwargs –

btgym.datafeed.casual module¶

class btgym.datafeed.casual.BTgymCasualTrial(name='TimeTrial', **kwargs)[source]¶

Intermediate-level data class. Implements the concept of a Trial object. Supports exact data train/test separation by means of global_time. Do not use directly.

Parameters:

filename – not used;
sampling_params – dict, sample retrieving options, see base class description for details;
task – int, optional;
parsing_params – csv parsing options, see base class description for details;
log_level – int, optional, logbook.level;
_config_stack – dict, holding configuration for nested child samples;

set_global_timestamp(timestamp)[source]¶

Performs validity checks and sets current global_time.

Parameters:	timestamp – POSIX timestamp

get_global_index()[source]¶

Returns:	data row corresponding to current global_time
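
One plausible way to map a POSIX timestamp onto a data row is a binary search over the sorted record times. The sketch below is an assumption about the lookup, not the library's code: `global_index` is a hypothetical helper, the timestamps are made up, and choosing the last record at or before global_time is an illustrative convention.

```python
import bisect
from datetime import datetime, timezone

# Made-up POSIX timestamps of data rows, one per minute, sorted ascending.
first = datetime(2017, 1, 2, tzinfo=timezone.utc).timestamp()
row_times = [first + 60 * i for i in range(10)]


def global_index(row_times, global_time):
    """Row position of the last record at or before global_time."""
    if global_time < row_times[0]:
        raise ValueError('global_time precedes the first record')
    return bisect.bisect_right(row_times, global_time) - 1


print(global_index(row_times, first + 150))  # 2
```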

get_intervals()[source]¶

Estimates exact sampling intervals such that the test episode starts as close to the current global time point as data consistency allows, but no earlier.

Returns:	dict of train and test sampling intervals for current global_time point

sample(get_new=True, sample_type=0, timestamp=None, align_left=True, b_alpha=1.0, b_beta=1.0, **kwargs)[source]¶

Samples continuous subset of data.

Parameters:

get_new (bool) – sample new (True) or reuse (False) last made sample;
sample_type (int or bool) – 0 (train) or 1 (test) - get sample from train or test data subsets respectively.
timestamp – POSIX timestamp.
align_left – bool, if True: set test interval as close to current timepoint as possible.
b_alpha (float) – beta-distribution sampling alpha > 0, valid for train episodes.
b_beta (float) – beta-distribution sampling beta > 0, valid for train episodes.

class btgym.datafeed.casual.BTgymCasualDataDomain(filename, trial_params, episode_params, frozen_time_split=None, name='TimeDataDomain', data_names=('default_asset', ), **kwargs)[source]¶

Imitates online data stream by implementing the concept of a sliding current time point and enabling sampling control according to it.

Objective is to enable proper train/evaluation/test data split and prevent data leakage by: allowing training on known, past data only and testing on unknown, future data, providing realistic training cycle.

Source trials set is defined as all trials starting somewhere in the past and ending no later than the current time point. Target trials set is defined as the set of trials such that the trial train period starts somewhere in the past and ends at the current time point, and the trial test period starts from now on, for all time points in the available dataset range.

Sampling control is defined by:

- current time point is set arbitrarily and is stateful in the sense that it can only be increased (no backward time);
- source trials can be sampled from past (known) data multiple times;
- target trial can only be sampled once according to current time point or later (unknown data);
- as any sampled target trial is being evaluated by outer algorithm, current time should be incremented either by providing ‘timestamp’ arg. to sample() method or calling set_global_timestamp() method, to match last evaluated record (marking all evaluated data as already known and making it available for training).
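
The sampling rules above can be modelled with a small toy class. This is a sketch of the described behaviour under simplifying assumptions, not the library's implementation: `GlobalClock` is a hypothetical name, time is a plain number, and trials are represented by descriptive tuples.

```python
class GlobalClock:
    """Toy model of the sampling rules above; not the library's implementation."""

    def __init__(self, start, end):
        self.now, self.end = start, end
        self.target_taken_at = None  # time point of the last target sample

    def set_global_timestamp(self, timestamp):
        if timestamp < self.now:
            raise ValueError('global time can only move forward')
        self.now = min(timestamp, self.end)

    def sample(self, sample_type=0, timestamp=None):
        if timestamp is not None:
            self.set_global_timestamp(timestamp)
        if sample_type == 0:
            return ('source', 'ends no later than', self.now)  # repeatable
        if self.now >= self.end or self.target_taken_at == self.now:
            return None  # exhausted, or this time point was already used
        self.target_taken_at = self.now
        return ('target', 'test starts at', self.now)


clock = GlobalClock(start=0, end=100)
print(clock.sample(0), clock.sample(0))  # source trials may repeat
print(clock.sample(1))                   # first target trial at the start time
print(clock.sample(1))                   # None until time advances
print(clock.sample(1, timestamp=40))     # evaluated data is now treated as known
```

Keeping the clock monotonic is what prevents leakage in this model: once a target trial has been evaluated, its records can only reappear as source (training) data.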

Parameters:

filename – Str or list of str, file_names containing CSV historic data;
parsing_params – csv parsing options, see base class description for details;
trial_params – dict, describes trial parameters, should contain keys: {sample_duration, time_gap, start_00, start_weekdays, test_period, expanding};
episode_params – dict, describes episode parameters, should contain keys: {sample_duration, time_gap, start_00, start_weekdays};
name – str, optional
task – int, optional
log_level – int, logbook.level

trial_class_ref¶: alias of BTgymCasualTrial

episode_class_ref¶: alias of BTgymEpisode

set_global_timestamp(timestamp)[source]¶

Performs validity checks and sets current global_time.

Parameters:	timestamp – POSIX timestamp

get_global_index()[source]¶

Returns:	data row corresponding to current global_time

get_intervals()[source]¶

Estimates exact sampling intervals such that the train period of the target trial overlaps with known, up-to-date data.

Returns:	dict of train and test sampling intervals for current global_time point

sample(get_new=True, sample_type=0, timestamp=None, b_alpha=1.0, b_beta=1.0, **kwargs)[source]¶

Samples from sequence of Trials.

Parameters:

get_new (bool) – sample new (True) or reuse (False) last made sample; n/a for target trials
sample_type (int or bool) – 0 (train) or 1 (test) - get sample from source or target data subsets respectively;
timestamp – POSIX timestamp indicating current global time of training loop
b_alpha (float) – beta-distribution sampling alpha > 0, valid for train episodes.
b_beta (float) – beta-distribution sampling beta > 0, valid for train episodes.

Returns:

Trial as BTgymBaseDataTrial instance; None, if trial’s sequence is exhausted (global time is up).

btgym.datafeed.multi module¶

class btgym.datafeed.multi.BTgymMultiData(data_class_ref=None, data_config=None, name='multi_data', data_names=None, task=0, log_level=13, **kwargs)[source]¶

Multiple data streams wrapper.

Parameters:

data_class_ref – one of BTgym single-stream datafeed classes
data_config – nested dictionary of individual data stream sources, see notes below.
kwargs – shared parameters for all data streams, see base dataclass

Notes

Data_config specifies all data sources consumed by strategy:

data_config = {
    data_line_name_0: {
        filename: [source csv filename string or list of strings],
        [config: {optional dict of individual stream config. params},]
    },
    ...,
    data_line_name_n : {...}
}

Example:

data_config = {
    'usd': {'filename': '.../DAT_ASCII_EURUSD_M1_2017.csv'},
    'gbp': {'filename': '.../DAT_ASCII_EURGBP_M1_2017.csv'},
    'jpy': {'filename': '.../DAT_ASCII_EURJPY_M1_2017.csv'},
    'chf': {'filename': '.../DAT_ASCII_EURCHF_M1_2017.csv'},
}
It is the user's responsibility to correctly choose historic data conversion rates with respect to the cash currency (here - EUR).
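
Before passing such a dictionary on, it can help to check its shape. The helper below is a hypothetical sketch based only on the layout described in the notes (a required `filename` given as a string or list, and an optional `config` dict); it is not part of the package.

```python
def normalize_data_config(data_config):
    """Check the nested layout and return a copy with list-valued filenames."""
    normalized = {}
    for line_name, spec in data_config.items():
        if 'filename' not in spec:
            raise KeyError(f"stream '{line_name}' has no 'filename' entry")
        files = spec['filename']
        normalized[line_name] = {
            'filename': [files] if isinstance(files, str) else list(files),
            'config': dict(spec.get('config', {})),  # optional per-stream params
        }
    return normalized


data_config = {
    'usd': {'filename': '.../DAT_ASCII_EURUSD_M1_2017.csv'},
    'gbp': {'filename': ['.../DAT_ASCII_EURGBP_M1_2017.csv'], 'config': {'timeframe': 1}},
}
print(normalize_data_config(data_config))
```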

set_logger(level=None, task=None)[source]¶

Sets logbook logger.

Parameters:

level – logbook.level, int
task – task id, int

set_params(params_dict)[source]¶

Batch attribute setter.

Parameters:	params_dict – dictionary of parameters to be set as instance attributes.

btgym.datafeed.multi.random_beta()¶

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; a,b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalisation, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.

Parameters:

a (float or array_like of floats) – Alpha, non-negative.
b (float or array_like of floats) – Beta, non-negative.
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., `(m, n, k)`, then `m * n * k` samples are drawn. If size is `None` (default), a single value is returned if `a` and `b` are both scalars. Otherwise, `np.broadcast(a, b).size` samples are drawn.
Returns:	out – Drawn samples from the parameterized beta distribution.
Return type:	ndarray or scalar