Basic Usage

This example shows how to train and deploy a simple agent that trades EURUSD in a MetaQuotes demo account. The configuration for this agent is in agents/t00001, and a more detailed explanation can be found in the configuration documentation.

Note:
- This example only works Monday to Friday (GMT+2/3), because MetaQuotes does not allow data extraction or trading on demo accounts at weekends.
- Steps 4 onward are blocking processes, so open a new terminal for each to run them in parallel.

1) Connect to running docker container

If opening a new terminal, connect interactively to the running docker container:

docker exec -it releat /bin/bash

Note: This assumes that the running container is named releat.

2) Start services

Start the services needed to train, deploy and monitor the reinforcement learning trading agent:
- Aerospike: in-memory database that stores features and hyperparameters
- Ray: manages compute and provides RL training and inference logic
- MT5: downloads data and executes trades for forex and futures (depending on broker)
- Tensorboard: monitors RL training progress
- Redis: in-memory cache that stores RL predictions

The launch-mt5-api command launches a Flask API in Wine to interact with MetaTrader5. The command is broker-specific so that the architecture can support multi-broker strategies in the future.

IMPORTANT: If this is the first time the docker container has been started, or if it has been rebuilt, log in to your MT5 account manually and click the allow autotrading button. Otherwise, steps 3 onward will not work.

releat start
releat launch-mt5-api metaquotes general

Note: A demo MetaQuotes account can be found in releat/utils/configs/constants.py. Alternatively, you can create new credentials from the MetaQuotes website.

3) Build training data

Build the features defined in the feature_config.py script and upload them to Aerospike.

releat build-train-data t00001

The t00001 config creates two feature groups:
- one group for the 30s timeframe
- one group for the 5m timeframe

The features within the 30s timeframe feature group include:
- average price
- modified one-hot encoding of the different flag types
- average spread

The features within the 5m timeframe feature group include:
- average price
- min price
- max price
- gradient of ticks

Features are built from tick data and are scaled by:
- clipping by percentile
- power transformer
- linear scaling
- clipping for extreme values
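
The four scaling steps above can be sketched roughly as follows. This is a minimal illustration, not the code used by the project: the function name, the percentile bounds, the exponent and the final clip value are all assumptions, and a simple signed power stands in for whatever fitted power transformer the feature config actually applies.

```python
import numpy as np


def scale_feature(x, lower_pct=1.0, upper_pct=99.0, power=0.5, clip_value=1.0):
    """Scale a 1-D feature array in four steps (all parameters are illustrative)."""
    x = np.asarray(x, dtype=float)

    # 1) Clip by percentile to limit the influence of outliers.
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    x = np.clip(x, lo, hi)

    # 2) Power transform. ASSUMPTION: a signed power around the median is used
    #    here as a stand-in for a fitted power transformer.
    x = x - np.median(x)
    x = np.sign(x) * np.abs(x) ** power

    # 3) Linear scaling so the largest magnitude becomes 1.
    scale = np.max(np.abs(x))
    if scale > 0:
        x = x / scale

    # 4) Final clip as a guard against extreme values.
    return np.clip(x, -clip_value, clip_value)


# Example with synthetic prices (not real tick data).
rng = np.random.default_rng(0)
prices = rng.normal(1.1, 0.01, 1000)
scaled = scale_feature(prices)
```

In a real pipeline the percentile bounds and transformer parameters would be fitted once on training data and reused at inference time, rather than recomputed per batch as in this sketch.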

These features are then uploaded to Aerospike, where the key is (<environment>, <agent_version>, <integer>), e.g. ('prod', 't00001', 10000). The value for this example is:

{
    "date": str,
    "30s": list,
    "5m": list,
    "price": list
}

The price field holds the bid-ask prices for EURUSD over the next 2s. It is not used as an input to the model; rather, it is used to simulate slippage in the gym environment.
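
To make the record layout concrete, the sketch below builds one key and value in plain Python and shows one way the price field could drive a slippage simulation. The helper names are hypothetical. The sketch also assumes that price is a list of (bid, ask) pairs, and the worst-case fill rule is only one conservative choice. The actual gym environment may model slippage differently.

```python
def make_record_key(environment, agent_version, index):
    """Key layout from the text: (<environment>, <agent_version>, <integer>)."""
    return (environment, agent_version, index)


def make_record(date, feats_30s, feats_5m, price):
    """Value layout from the text; the contents of each list are illustrative."""
    return {
        "date": date,
        "30s": list(feats_30s),
        "5m": list(feats_5m),
        "price": list(price),
    }


def simulated_fill(price, side):
    """ASSUMPTION: price is a list of (bid, ask) pairs for the next 2s.

    Returns a worst-case fill: a buy pays the highest ask in the window,
    a sell receives the lowest bid.
    """
    if side == "buy":
        return max(ask for _, ask in price)
    if side == "sell":
        return min(bid for bid, _ in price)
    raise ValueError(f"unknown side: {side!r}")


key = make_record_key("prod", "t00001", 10000)
record = make_record(
    "2023-01-02 10:00:00",            # made-up timestamp
    [0.1, 0.2],                       # made-up 30s features
    [0.3],                            # made-up 5m features
    [(1.0650, 1.0651), (1.0649, 1.0652)],
)
buy_fill = simulated_fill(record["price"], "buy")
```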

4) Train model

In this step, the RL model is defined in agent_model.py:
- a class that extends Ray RLlib's TensorFlow model
- a simple example of a gated residual network
- allows arbitrary dict inputs
- allows action masking (i.e. blocking invalid actions, such as a close action when no positions are open)

The training process is defined in agent_config.py:
- the gym_env key defines the gym environment's hyperparameters and is uploaded to Aerospike so that it can be changed dynamically during training (if necessary). It also defines training hyperparameters such as the number of episodes and the training frequency.
- keys with the prefix rl_ map directly to Ray RLlib's training configurations
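
Action masking is commonly implemented by pushing the logits of invalid actions to negative infinity before the softmax, so those actions get zero probability. The NumPy sketch below illustrates the idea only; it is not the code in agent_model.py, and the four action names are assumptions made for the example.

```python
import numpy as np

# Hypothetical action set, for illustration only.
ACTIONS = ["hold", "open_long", "open_short", "close"]


def masked_softmax(logits, action_mask):
    """Softmax over logits where masked-out actions receive probability 0."""
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(action_mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be allowed")

    # Invalid actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()


# No position is open, so "close" is masked out even though its logit is largest.
probs = masked_softmax([0.1, 0.5, 0.2, 2.0], [1, 1, 1, 0])
best_action = ACTIONS[int(np.argmax(probs))]
```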

releat train t00001

Note:
- If you do not have a GPU, change agent_config['rl_resources']['num_gpus'] from 1 to 0 in the agent_config.py file.
- Model performance can be tracked in Tensorboard, where the default address is http://localhost:6006/
- Resource usage can be tracked in the Ray dashboard
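
For reference, the GPU change amounts to a single assignment. The dictionary below is a hypothetical excerpt that contains only the key named in the note; the real agent_config.py defines many more entries.

```python
# Hypothetical excerpt of agent_config.py; only the relevant key is shown.
agent_config = {
    "rl_resources": {
        "num_gpus": 1,
    },
}

# Train on CPU only when no GPU is available.
agent_config["rl_resources"]["num_gpus"] = 0
```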

5) Generate signal

Using the artifacts generated by the training process, the generate-signal process is deployed to continuously:
- extract data from MT5 (note: on the first run in a fresh install, manually log in to the MT5 account and click the enable algo trading button)
- build features
- make predictions by invoking the RL agent
- push predictions to Redis
- load the latest checkpoint

The frequency of predictions is controlled by the configs set in agent_config.py.

releat generate-signal t00001

6) Launch trader

The trader is agent-version agnostic (for now) and is deployed to:
- get the predictions from Redis (in the future it will be able to aggregate predictions from multiple RL agents)
- apply some risk logic (such as lot size scaling)
- apply other operational logic (e.g. minimum position hold time, forced close at session close)
- execute open or close actions for long or short positions

releat launch-trader
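
The risk and operational rules listed for the trader can be pictured with a small sketch. Everything below is hypothetical: the function names, the confidence-based sizing rule, the lot step and the hold time are assumptions chosen for illustration, not the trader's actual logic.

```python
import math
from dataclasses import dataclass


@dataclass
class Position:
    side: str         # "long" or "short"
    opened_at: float  # open time in seconds (any monotonic clock)


def scale_lot_size(base_lots, confidence, max_lots=1.0, step=0.01):
    """Scale the lot size by a confidence in [0, 1], cap it, round down to the lot step."""
    confidence = min(max(confidence, 0.0), 1.0)
    lots = min(base_lots * confidence, max_lots)
    steps = math.floor(lots / step + 1e-9)  # small epsilon guards against float error
    return round(steps * step, 8)


def may_close(position, now, min_hold_seconds=60.0, session_close=None):
    """A position may close after the minimum hold time; session close forces it."""
    if session_close is not None and now >= session_close:
        return True  # forced close at session close overrides the hold time
    return now - position.opened_at >= min_hold_seconds


position = Position(side="long", opened_at=0.0)
lots = scale_lot_size(base_lots=0.5, confidence=0.5)
```

Keeping these rules in the trader rather than in the agent means they can be changed without retraining, which fits the trader being agent-version agnostic.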