
Data Format Guide

StratEvo's evolution engine works with CSV files. This document describes the expected format and directory structure.

CSV Format

Required Columns

| Column | Type | Description |
|--------|------|-------------|
| date | string | Date/datetime (e.g. 2024-03-24 or 2024-03-24 00:00:00) |
| open | float | Opening price |
| high | float | High price |
| low | float | Low price |
| close | float | Closing price |
| volume | float | Trading volume |

Optional Columns

| Column | Type | Description |
|--------|------|-------------|
| code | string | Symbol/ticker code (for multi-stock files) |
| amount | float | Trading amount in currency |
| turn | float | Turnover rate |

Example: Crypto (hourly)

date,open,high,low,close,volume
2024-03-24 00:00:00,0.6232,0.634,0.6225,0.6333,395848.7683
2024-03-24 01:00:00,0.6331,0.6346,0.6302,0.6332,277089.9413

Example: A-Shares (daily)

date,code,open,high,low,close,volume,amount,turn
2024-03-01,sh.600030,20.22,20.35,20.03,20.24,72393608,1537955150.46,0.6369
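
To make the column requirements concrete, here is a minimal sketch that parses CSV text in the format above using only the Python standard library. The function name `parse_ohlcv` is hypothetical and is not part of StratEvo. The sketch assumes both date forms shown above can be read with `datetime.fromisoformat`.

```python
import csv
import io
from datetime import datetime

REQUIRED = ["date", "open", "high", "low", "close", "volume"]


def parse_ohlcv(text):
    """Parse CSV text into a list of dicts with typed values (illustrative helper)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for raw in reader:
        row = dict(raw)  # optional columns (code, amount, turn) stay as strings
        row["date"] = datetime.fromisoformat(raw["date"])
        for col in REQUIRED[1:]:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


sample = """date,open,high,low,close,volume
2024-03-24 00:00:00,0.6232,0.634,0.6225,0.6333,395848.7683
2024-03-24 01:00:00,0.6331,0.6346,0.6302,0.6332,277089.9413
"""
rows = parse_ohlcv(sample)
print(rows[0]["date"], rows[0]["close"])
```

The same function accepts the daily A-share layout, since extra columns are passed through untouched.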

Directory Structure

data/
├── crypto/              # One CSV per trading pair
│   ├── BTC_USDT.csv
│   ├── ETH_USDT.csv
│   └── ...
├── a_shares_elite/      # One CSV per stock (elite 100)
│   ├── sh.600030.csv
│   └── ...
└── a_shares_300/        # One CSV per stock (CSI 300)
    ├── sh.000001.csv
    └── ...

Each CSV file holds the data for exactly one symbol. The filename without the .csv extension is used as the symbol identifier (for example, BTC_USDT.csv becomes BTC_USDT).
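
The filename-to-symbol rule can be sketched with `pathlib`. This is an illustration only: `list_symbols` is a hypothetical helper, and the actual loader may discover files differently.

```python
from pathlib import Path


def list_symbols(data_dir):
    """Map symbol identifier -> CSV path for every .csv file directly in data_dir."""
    return {p.stem: p for p in sorted(Path(data_dir).glob("*.csv"))}


# Path.stem strips only the final suffix, so dotted symbols survive intact.
print(Path("data/a_shares_elite/sh.600030.csv").stem)  # sh.600030
```

Using `stem` rather than splitting on the first dot matters for A-share files, whose symbols contain a dot themselves.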

Data Preparation

Download Crypto Data

stratevo download-crypto --symbols BTC/USDT,ETH/USDT --timeframe 1h --days 365

Download A-Share Data

python scripts/download_a_shares.py

Using Your Own Data

  1. Create a directory with one CSV per symbol
  2. Ensure each CSV has at minimum: date, open, high, low, close, volume
  3. Sort rows by date ascending
  4. Ensure there are no NaN values in the OHLCV columns

Then run evolution:

stratevo evolve --market crypto --data-dir ./my_data/ --generations 100
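
Before running evolution on your own files, it can help to check the requirements above. The following is a rough pre-flight sketch using the standard library; `find_problems` is a hypothetical helper, not a StratEvo command, and it only reports issues without changing the file.

```python
import csv
import math
from datetime import datetime

OHLCV = ["open", "high", "low", "close", "volume"]


def find_problems(csv_path):
    """Return a list of human-readable problems found in one symbol's CSV."""
    problems = []
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in ["date"] + OHLCV if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing columns: {missing}"]
        previous = None
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            current = datetime.fromisoformat(row["date"])
            if previous is not None and current < previous:
                problems.append(f"line {line_no}: date out of order")
            previous = current
            for col in OHLCV:
                value = (row[col] or "").strip()
                # Non-numeric text raises ValueError here, which also signals bad data.
                if value == "" or math.isnan(float(value)):
                    problems.append(f"line {line_no}: {col} is empty or NaN")
    return problems
```

Calling `find_problems` on each file in your data directory and confirming that it returns an empty list is a quick way to catch ordering or missing-value issues early.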

Data Validation

The data loader automatically:

- Removes rows with NaN in any OHLCV field
- Removes duplicate dates (keeps first occurrence)
- Sorts by date ascending
- Warns if fewer than 30 data points (minimum required for backtesting)
- Skips symbols with insufficient data
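
The validation steps can be approximated as follows. This is a sketch of the described behavior, not the loader's actual code: the function name `clean_rows`, the order of the steps, and the row representation (a list of dicts with float OHLCV values) are all assumptions.

```python
import math
import warnings

OHLCV = ("open", "high", "low", "close", "volume")
MIN_ROWS = 30  # minimum number of data points stated in this guide


def clean_rows(rows, symbol="?"):
    """Apply the documented steps; return None when the symbol would be skipped."""
    # 1. Drop rows with NaN in any OHLCV field.
    rows = [r for r in rows if not any(math.isnan(r[c]) for c in OHLCV)]
    # 2. Drop duplicate dates, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        if r["date"] not in seen:
            seen.add(r["date"])
            unique.append(r)
    # 3. Sort by date ascending.
    unique.sort(key=lambda r: r["date"])
    # 4. Warn and skip when too few rows remain.
    if len(unique) < MIN_ROWS:
        warnings.warn(f"{symbol}: only {len(unique)} rows, need {MIN_ROWS}")
        return None
    return unique
```

Deduplicating before sorting means "first occurrence" refers to the row's position in the file, which is one plausible reading of the rule above.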