Efficient Methods for Ingesting Crypto Market Data into QuestDB

QuestDB is a high-performance time-series database designed for handling fast-moving financial data. It provides exceptional ingestion throughput, powerful SQL analytics capabilities, and hardware efficiency that reduces operational costs. As an open-source solution with support for open formats, QuestDB is particularly well-suited for managing tick data and other time-stamped financial information.

This guide explores three practical approaches for importing cryptocurrency market data into QuestDB for analysis and monitoring purposes.

Prerequisites

Before implementing any data ingestion method, you'll need to set up a QuestDB instance. Create a new directory and run the following command to start QuestDB using Docker:

mkdir cryptofeed-questdb
cd cryptofeed-questdb
docker run -p 9000:9000 -p 9009:9009 -p 8812:8812 -p 9003:9003 -v "$(pwd):/var/lib/questdb" questdb/questdb:8.1.0

This command starts QuestDB with the necessary ports exposed and creates a persistent volume for data storage.

Using Cryptofeed Library for Data Ingestion

The Cryptofeed Python library offers one of the simplest ways to collect market data from various cryptocurrency exchanges. This open-source tool establishes WebSocket connections to exchanges including Binance, OKX, Gemini, and Kraken, returning standardized trade, ticker, and order book data. Its native integration with QuestDB makes it an excellent choice for rapid data ingestion.

Setting Up Cryptofeed

Create a Python virtual environment (Python 3.8 or higher required) and install the necessary package:

python3 -m venv cryptofeed
source cryptofeed/bin/activate
pip install cryptofeed

Create a new Python file (questdb.py) and implement the following code to ingest trade data for BTC-USDT from OKX and Gemini:

from cryptofeed import FeedHandler
from cryptofeed.backends.quest import TradeQuest
from cryptofeed.defines import TRADES
from cryptofeed.exchanges import OKX, Gemini

QUEST_HOST = '127.0.0.1'
QUEST_PORT = 9009

def main():
    f = FeedHandler()
    f.add_feed(OKX(channels=[TRADES], symbols=['BTC-USDT'], 
                callbacks={TRADES: TradeQuest(host=QUEST_HOST, port=QUEST_PORT)}))
    f.add_feed(Gemini(channels=[TRADES], symbols=['BTC-USDT'], 
                callbacks={TRADES: TradeQuest(host=QUEST_HOST, port=QUEST_PORT)}))
    f.run()

if __name__ == '__main__':
    main()

This code establishes WebSocket connections to the exchanges' APIs and automatically pushes the normalized trades to QuestDB over ILP on port 9009. You can view the ingested data by opening the QuestDB web console at localhost:9000 and querying the relevant tables.
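
Beyond the web console, you can also query the ingested data programmatically through QuestDB's REST /exec endpoint, which returns results as JSON with `columns` and `dataset` fields. The sketch below shows one way to turn that payload into rows; the table name in the commented query is an assumption, so check the console for the table names Cryptofeed actually created:

```python
import json
import urllib.parse
import urllib.request

def query_questdb(sql, host="http://localhost:9000"):
    """Run a SQL query against QuestDB's REST /exec endpoint."""
    url = f"{host}/exec?" + urllib.parse.urlencode({"query": sql})
    with urllib.request.urlopen(url) as resp:
        return parse_exec_response(json.load(resp))

def parse_exec_response(payload):
    """Convert QuestDB's {columns, dataset} JSON into a list of dicts."""
    names = [col["name"] for col in payload["columns"]]
    return [dict(zip(names, row)) for row in payload["dataset"]]

# Hypothetical table name -- verify it in the web console first:
# rows = query_questdb("SELECT * FROM trades LIMIT 10")
```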

Customizing Data Structure

For more control over the ingested data structure, you can implement a custom callback handler. This allows you to modify table names, column structure, and data formatting according to your specific requirements. The Cryptofeed library uses InfluxDB Line Protocol (ILP) over socket connections to communicate with QuestDB, providing flexibility in how data is structured and transmitted.
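
At its core, such a handler formats each normalized trade as a single ILP line before writing it to the socket. The sketch below shows that formatting step in isolation; the table and column names are illustrative assumptions, not Cryptofeed's defaults:

```python
def trade_to_ilp(table, exchange, symbol, price, amount, ts_ns):
    """Format one trade as an InfluxDB Line Protocol (ILP) row.

    Symbols (tags) come before the first space, numeric fields after,
    and the designated timestamp (in nanoseconds) goes last.
    """
    tags = f"{table},exchange={exchange},symbol={symbol}"
    fields = f"price={float(price)},amount={float(amount)}"
    return f"{tags} {fields} {int(ts_ns)}\n"
```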

The primary advantage of using Cryptofeed is its extensive preconfigured integrations with numerous exchanges. The library handles data normalization, making ingestion into QuestDB straightforward. However, if you need more control over data formats or need to integrate with unsupported exchanges, you might consider building a custom solution.

Building a Custom Market Data Pipeline

When Cryptofeed doesn't meet your specific requirements—whether due to unsupported exchanges or a need for custom data processing—building your own ingestion pipeline provides maximum flexibility. QuestDB supports both the PostgreSQL wire protocol and InfluxDB Line Protocol, with ILP offering superior ingestion performance and schemaless ingestion capabilities.

Implementing a Custom Data Collector

The following example demonstrates how to use QuestDB's Python SDK with ILP to ingest price data from Binance and Gemini:

import requests
import time
from questdb.ingress import Sender, TimestampNanos

# HTTP transport on port 9000; use 'tcp::addr=localhost:9009;' for TCP ILP
conf = 'http::addr=localhost:9000;'

def get_binance_data():
    url = 'https://api.binance.us/api/v3/ticker/price'
    params = {'symbol': 'BTCUSDT'}
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()
    btc_price = float(data['price'])  # the API returns the price as a string
    print(f"BTC Price on Binance: {btc_price}")
    publish_to_questdb('Binance', btc_price)

def get_gemini_data():
    url = 'https://api.gemini.com/v1/pubticker/btcusdt'
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    btc_price = float(data['last'])  # last traded price, returned as a string
    print(f"BTC Price on Gemini: {btc_price}")
    publish_to_questdb('Gemini', btc_price)

def publish_to_questdb(exchange, price):
    print("Publishing BTC price to QuestDB...")
    with Sender.from_conf(conf) as sender:
        sender.row('prices',
                   symbols={'pair': 'BTCUSDT', 'exchange': exchange},
                   columns={'price': price},
                   at=TimestampNanos.now())
        # the context manager flushes on exit; no explicit flush() is needed

def job():
    print("Fetching BTC price...")
    get_binance_data()
    get_gemini_data()

if __name__ == '__main__':
    while True:
        job()
        time.sleep(5)

This implementation polls the REST endpoints of both exchanges at regular intervals and writes the retrieved data to a table named 'prices' in QuestDB. While this approach requires more development effort than using Cryptofeed, it provides complete control over data collection, processing, and ingestion logic.
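
In production, the polling loop should also survive transient API failures rather than crash on the first timeout. One minimal pattern (a sketch, not part of the QuestDB SDK) is a retry wrapper with exponential backoff around each fetch call:

```python
import time

def fetch_with_retry(fetch, retries=3, base_delay=1.0):
    """Call fetch(), retrying with exponential backoff on failure.

    Returns the first successful result, or re-raises the last
    exception once all retries are exhausted.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Each exchange fetch (e.g. `fetch_with_retry(get_binance_data)`) can then fail and recover independently without stopping the loop.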

Change Data Capture for Market Data Ingestion

Change Data Capture (CDC) offers an alternative approach for ingesting market data, particularly useful when you have existing data streams or databases that can be monitored for changes. This pattern is valuable when external teams publish price data on messaging platforms like Kafka or when updates need to be captured from relational databases.

CDC Architecture Example

A practical implementation of this architecture might involve a function that polls exchange APIs for latest price data and publishes this information to Kafka topics. The QuestDB Kafka Connector would then consume these messages and publish the data to QuestDB. This approach minimizes infrastructure burden by leveraging existing data pipelines and provides near-real-time data replication capabilities.

CDC is particularly valuable when you need to integrate QuestDB with existing data infrastructure without creating entirely new data collection processes. By listening to changes in source systems rather than polling directly, you reduce API calls and create a more efficient data flow.
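
As a concrete sketch of the sink side, a Kafka Connect configuration for the QuestDB connector might look like the following. The topic, table, and timestamp field names here are assumptions for this example; consult the connector's documentation for the authoritative property list:

```
name=questdb-prices-sink
connector.class=io.questdb.kafka.QuestDBSinkConnector
topics=btc-prices
host=localhost:9009
table=prices
timestamp.field.name=ts
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```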

Comparison of Ingestion Methods

Each data ingestion approach offers distinct advantages depending on your specific requirements:

- Cryptofeed library: fastest to set up, with prebuilt integrations and data normalization for many exchanges, but less control over table names and data formats.
- Custom pipeline (ILP): maximum flexibility over collection, processing, and schema, at the cost of the most development effort.
- Change Data Capture: minimal new infrastructure when price data already flows through Kafka or relational databases, but it depends on those existing pipelines.

The optimal approach depends on your specific use case, technical requirements, and existing infrastructure.

Frequently Asked Questions

What are the different levels of market data?

Level 1 market data provides basic information including current bid and ask prices, last traded price, and order sizes. Level 2 data offers market depth with multiple price levels (typically 5-10 bid/ask prices). Level 3 includes additional depth information (up to 20 price levels) along with custom order details.

How does an order book work in cryptocurrency trading?

An order book is a digital ledger that displays all current buy and sell orders for a specific cryptocurrency trading pair. It organizes these orders by price level, constantly updating as new orders are placed and existing orders are filled or canceled.
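
The price-level view of an order book is simple enough to sketch in a few lines of Python (a toy model tracking aggregate size per level, with no per-order detail):

```python
class OrderBook:
    """Toy order book: aggregate size per price level only."""

    def __init__(self):
        self.bids = {}  # price -> total size
        self.asks = {}

    def update(self, side, price, size):
        """Set the size at a price level; a size of 0 removes the level."""
        book = self.bids if side == "bid" else self.asks
        if size == 0:
            book.pop(price, None)
        else:
            book[price] = size

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None
```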

What are candlestick charts used for in market analysis?

Candlestick charts visually represent price movements over specific time periods. Each candle shows the opening, closing, high, and low prices for that period, helping traders identify patterns and make informed trading decisions.
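
The aggregation behind each candle can be sketched in a few lines (illustrative only), building one OHLC record from the trades in a period:

```python
def to_candle(trades):
    """Build one OHLC candle from (timestamp, price) trades.

    Trades must be sorted by timestamp; returns the open, high,
    low, and close prices for the period.
    """
    prices = [price for _, price in trades]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
    }
```

In QuestDB itself, the same aggregation is typically expressed in SQL with first()/max()/min()/last() combined with SAMPLE BY.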

What is an exchange API and how does it work?

An exchange API is an application programming interface that provides programmatic access to exchange data and functionality. These APIs typically offer market data, order book information, trading history, and sometimes allow for automated trading operations.

How does Change Data Capture benefit data ingestion?

CDC captures database changes as they occur, enabling near-real-time data replication without the overhead of periodic polling. This approach reduces latency and ensures that your QuestDB instance contains the most current market information.

What role does Kafka play in data pipelines?

Kafka is a distributed streaming platform that serves as a reliable message broker for high-volume data pipelines. It enables durable storage and processing of data streams, making it ideal for financial data applications requiring robustness and scalability.

Conclusion

QuestDB provides multiple efficient pathways for ingesting cryptocurrency market data, each suited to different technical requirements and infrastructure constraints. For most users starting with market data analysis, the Cryptofeed library offers the quickest implementation with support for numerous exchanges. Those requiring more customization can develop custom data pipelines using QuestDB's ILP support, while organizations with existing data infrastructure can leverage CDC patterns for efficient data replication.

By selecting the appropriate ingestion method for your specific use case, you can ensure efficient, reliable collection of market data for analysis, monitoring, and trading applications.