Cryptocurrency quotes. Collection and processing. What should a trader know?

Cryptocurrency quotes data. What nuances need to be considered when collecting information from crypto exchanges? Comparison of quotes – solutions.


  1. Introduction.
  2. About collecting data from cryptocurrency exchanges.
  3. The task of comparing two or more cryptocurrency quotes … What’s the catch?
  4. A way to solve the missing-data problem when comparing two or more cryptocurrency quotes.
  5. Conclusion.


How does trading on a crypto exchange begin? What comes first for a trader? Collecting information. Information is needed to analyze and assess the market situation: should we be long, short, or out of the market? Data is the fuel that drives decisions, which is why the data you rely on deserves due respect, whether you are building a trading strategy or assessing your current market position. This guide covers collection methods and the quality of quote data for cryptocurrency pairs.

About collecting data from cryptocurrency exchanges

In most cases, crypto exchanges provide data in two ways: through a request-based API (usually REST) and through WebSocket. Here is an analogy. Imagine you have two buddies. The first is laconic, but every phrase is worth its weight in gold; as they say, he doesn't waste words. You dial his number, get a chunk of information, and hang up. The second friend is a chatterbox. You have to stay on the line for hours to catch something valuable in his, uh… stream of words.

These are two fundamentally different modes of communication. The first is the API, the second is WebSocket. With the API, you make a call (a request to the crypto exchange server), and information arrives only when you (your bot, your program) ask for it. Exchanges limit how often you may call; typically you can make no more than a few requests per second. With WebSocket, a persistent connection is established between your computer and the exchange server, and the exchange keeps sending updates over it. It's like picking up the phone and staying on the line.
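A minimal sketch of polling an API while staying under a request-rate limit, using a client-side token bucket. `TokenBucket`, `poll` and the `fetch` callable are hypothetical names, and a rate of 2 requests per second is only an assumption; every exchange publishes its own limits, which differ by endpoint.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; False means the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def poll(fetch: Callable[[], dict], bucket: TokenBucket, n_requests: int,
         sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call a hypothetical REST `fetch` function n_requests times within the rate limit."""
    results: List[dict] = []
    while len(results) < n_requests:
        if bucket.try_acquire():
            results.append(fetch())
        else:
            sleep(1.0 / bucket.rate)  # wait roughly one token's worth of time
    return results
```

The `clock` and `sleep` parameters are injectable only so the sketch can be exercised without real waiting; in a live collector the defaults would be used.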

Which method is preferable? It depends on your trading style. If you are scalping, you cannot do without WebSocket.
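A WebSocket feed usually delivers individual trades rather than ready-made periods. A minimal sketch of turning such a stream into a Close price per 1-minute bar; the JSON message shape with `ts` (epoch milliseconds) and `price` fields is a hypothetical assumption, since each exchange defines its own schema, and `close_bars` is an illustrative name.

```python
import json
from typing import Dict, Iterable


def close_bars(messages: Iterable[str], bar_ms: int = 60_000) -> Dict[int, float]:
    """Map each bar start (epoch ms) to its Close: the price of the latest trade in it.

    Assumes hypothetical messages like {"ts": 1700000000123, "price": 37000.5}."""
    closes: Dict[int, float] = {}
    latest_ts: Dict[int, int] = {}
    for raw in messages:
        msg = json.loads(raw)
        ts, price = int(msg["ts"]), float(msg["price"])
        bar = ts - ts % bar_ms  # start of the bar this trade falls into
        # Messages may arrive out of order, so keep the trade with the latest ts.
        if bar not in latest_ts or ts >= latest_ts[bar]:
            latest_ts[bar] = ts
            closes[bar] = price
    return closes
```

A minute with no trades produces no key at all, which is exactly how gaps caused by a lack of liquidity show up in collected data.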

In general, quote data is an important component of a trader's strategy, and collecting it means solving a number of problems. Here are some guidelines if you intend to build your own cryptocurrency quote data store:

  • Choose a server. How powerful should the hardware be?
  • Choose a database. There are many options today, and you need to dig into how each of these systems works. Will it be MySQL, PostgreSQL, ClickHouse?
  • Host the data collectors on reliable servers (reliability here means more than just the technical side).
  • Consider the risk of blocking and loss of access. Not every jurisdiction is friendly toward cryptocurrencies.
  • Study the virtual server you intend to use as your data warehouse. How often does the provider carry out maintenance on it? How critical is downtime for you?
  • If possible, locate your virtual servers in the same region as your data source.
  • Protect your data from hackers. It matters.
  • Replicate your data servers. Backups will keep you safe in a critical situation.
  • Estimate your monthly infrastructure maintenance costs.
  • Expect to spend the first months of data collection catching bugs. Treat the data you receive with caution.
  • Plan for scaling your storage later.
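The warning about catching bugs in the first months suggests running automatic checks on incoming data. A minimal sketch of a few basic checks on OHLCV candles (duplicate bars, misaligned timestamps, inconsistent OHLC values, negative volume, missing bars); the `Candle` structure and the `validate` function are hypothetical, and a real collector would need checks tailored to its exchanges.

```python
from typing import List, NamedTuple, Optional


class Candle(NamedTuple):
    ts: int        # bar start time, epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float


def validate(candles: List[Candle], step: int) -> List[str]:
    """Return problems found in candles sorted by ts.

    `step` is the expected bar length in seconds (60 for 1-minute bars)."""
    problems: List[str] = []
    seen = set()
    prev_ts: Optional[int] = None
    for c in candles:
        if c.ts in seen:
            problems.append(f"duplicate bar at {c.ts}")
        seen.add(c.ts)
        if c.ts % step != 0:
            problems.append(f"misaligned bar at {c.ts}")
        # High must be the bar's maximum and Low its minimum.
        if c.high < max(c.open, c.close) or c.low > min(c.open, c.close):
            problems.append(f"inconsistent OHLC at {c.ts}")
        if c.volume < 0:
            problems.append(f"negative volume at {c.ts}")
        # A jump of more than one step between consecutive bars means missing bars.
        if prev_ts is not None and c.ts - prev_ts > step:
            missing = (c.ts - prev_ts) // step - 1
            problems.append(f"{missing} missing bar(s) after {prev_ts}")
        prev_ts = c.ts
    return problems
```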

The task of comparing two or more cryptocurrency quotes … What’s the catch?

A very common task for a trader (analyst) is comparing two or more cryptocurrency quotes. It arises in correlation analysis (we study the relationships between cryptocurrencies), cluster analysis (we study the behavior of groups of cryptocurrencies), regression analysis (we study the functional dependence of some cryptocurrencies on others), and many other approaches. A solution to the comparison problem can therefore serve as the basis for systematic trading in the cryptocurrency market. It matters little who acts on the result: a trading robot or a person.
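All of these methods ultimately compare values period by period. As an illustration, a minimal Pearson correlation of two series; it assumes x[i] and y[i] refer to the same period, which is exactly the assumption that gaps in the data break. `pearson` is a hypothetical helper; in practice libraries such as NumPy or pandas provide this computation.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two aligned series (x[i] and y[i] share a period)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length and at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)
```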

This task is complicated by one problem. Let's look at an example. Suppose there are two cryptocurrencies, Cryptocurrency A and Cryptocurrency B, and we consider 30 periods for each of them. Ideally, there is a complete set of 30 periods for both Cryptocurrency A and Cryptocurrency B:

By a period's value we most often mean either the Close price or the growth rate of Close. What is the absolute value of Close? … Nothing more than a marker (something like a camera flash): the value recorded at a certain point in time.
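The growth rate of Close for period t can be computed as r_t = Close_t / Close_(t-1) - 1. A minimal sketch, where `None` marks a missing Close (a representation chosen here for illustration) and `growth_rates` is a hypothetical helper:

```python
from typing import List, Optional


def growth_rates(closes: List[Optional[float]]) -> List[Optional[float]]:
    """Growth rate of Close between consecutive periods: Close_t / Close_(t-1) - 1.

    None marks a missing Close; any rate that needs it is also None."""
    rates: List[Optional[float]] = []
    for prev, cur in zip(closes, closes[1:]):
        if prev is None or cur is None:
            rates.append(None)
        else:
            rates.append(cur / prev - 1.0)
    return rates
```

Note that one missing Close makes two growth rates undefined: the one ending at the gap and the one starting from it.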

But… in practice, things are not so perfect! Some data is often simply missing. There may be any number of reasons for this, from unstable data transfer from the crypto exchange to the absence of trading at certain times (due to a lack of liquidity). Suppose periods 7, 14, 22 and 23 contain no Close data for Cryptocurrency B. Let's mark them in red. Then the data looks like this:

If we exclude the missing periods entirely, the data looks like this:
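Excluding missing periods amounts to keeping only the periods present in every series (listwise deletion). A minimal sketch reproducing the example; the Close values are made-up placeholders, and only the gap positions 7, 14, 22 and 23 come from the text. `common_periods` is a hypothetical helper.

```python
from typing import Dict, List


def common_periods(*series: Dict[int, float]) -> List[int]:
    """Periods present in every series: listwise deletion of incomplete periods."""
    if not series:
        return []
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    return sorted(common)


# Illustrative data: 30 periods, Cryptocurrency B lacks periods 7, 14, 22 and 23.
crypto_a = {p: 100.0 + p for p in range(1, 31)}
crypto_b = {p: 50.0 + p for p in range(1, 31) if p not in {7, 14, 22, 23}}
print(len(common_periods(crypto_a, crypto_b)))  # 26
```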

At first glance the situation is not so bad: out of 30 periods, we have data for 26. But… imagine you need to compare not two currencies, but five… ten… twenty. Each of the compared currencies has its own gaps in the data:

With this approach, the more cryptocurrencies we compare, the smaller the sample becomes:
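Why the sample shrinks so quickly can be seen under a simplifying assumption: if each period is missing independently with probability p in each of k series, the expected share of periods that survive is (1 - p)^k. With 30 periods and p = 4/30 (as for Cryptocurrency B), that is roughly 15 periods after 5 series and fewer than 2 after 20. The simulation below is a hypothetical sketch of that independence model, not of real exchange data, where a single event such as an exchange outage can create gaps in many pairs at once.

```python
import random
from typing import List


def shrinking_sample(n_periods: int, n_series: int, p_missing: float, seed: int = 0) -> List[int]:
    """Size of the common gap-free sample after adding each series in turn.

    Assumes every period is missing independently with probability p_missing
    (an illustrative simplification)."""
    rng = random.Random(seed)
    common = set(range(n_periods))
    sizes: List[int] = []
    for _ in range(n_series):
        present = {t for t in range(n_periods) if rng.random() >= p_missing}
        common &= present  # listwise deletion across all series so far
        sizes.append(len(common))
    return sizes


print(shrinking_sample(n_periods=30, n_series=20, p_missing=4 / 30))
```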

Holes in the data stream can significantly distort the subsequent analysis! Note that this problem is not new, and it is not unique to quotes of financial assets. Specialists in a wide variety of fields face similar challenges: ecologists, geologists, engineers. They have their own time series tied to their subject area, and naturally each field leaves its mark on how it patches information holes.

There are several classic approaches to this problem. Some researchers patch holes with simple arithmetic means. For example, if you have data for period 6 and period 8… um… why not calculate their average and write it into period 7?
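A minimal sketch of this neighbor-average patch; `None` marks a missing value and `fill_neighbor_mean` is a hypothetical helper. It only handles isolated gaps, because a run of missing periods (like 22 and 23 in the example) has no known value on one side.

```python
from typing import List, Optional


def fill_neighbor_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill isolated gaps with the mean of the two adjacent values.

    Gaps at the edges, or gaps longer than one period, are left as None."""
    out = list(values)
    for i in range(1, len(values) - 1):
        if values[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            out[i] = (values[i - 1] + values[i + 1]) / 2
    return out


print(fill_neighbor_mean([10.0, None, 12.0, None, None, 20.0]))
# [10.0, 11.0, 12.0, None, None, 20.0]
```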

Another method is interpolation between adjacent points: missing data is replaced with values lying on a straight line drawn between the points before and after the gap. In other words, if we draw a straight line on the chart between period 6 and period 8, its value at period 7 gives us the missing value.
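For a single missing period, linear interpolation gives exactly the neighbor average; the two methods differ only when several consecutive periods are missing. A minimal sketch that handles such runs; `fill_linear` is a hypothetical helper (pandas offers a ready-made equivalent in `Series.interpolate`).

```python
from typing import List, Optional


def fill_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each gap between two known points with values on the straight line
    joining them. Gaps at the edges have no line to use and stay None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        if right - left > 1:  # at least one missing period between them
            slope = (values[right] - values[left]) / (right - left)
            for i in range(left + 1, right):
                out[i] = values[left] + slope * (i - left)
    return out


print(fill_linear([100.0, None, None, 130.0]))  # [100.0, 110.0, 120.0, 130.0]
```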

Another method uses the average of N neighboring points. It is similar to the arithmetic mean, except that more than two adjacent points are taken into account.

A variant uses the median of N neighboring points. The median is often preferred because it is less sensitive to outliers and describes the data better when the distribution is not normal (you can read about the fact that cryptocurrency pair movements do not follow a normal distribution in our research “How and how to analyze the connections of cryptocurrency pairs?”; look at
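Both window-based variants can be sketched with one function: each gap is filled with the mean or the median of up to N observed values on each side. `fill_window` and its `method` parameter are hypothetical; using only originally observed values (not earlier fills) is a design choice made here so that patched gaps do not feed into each other.

```python
import statistics
from typing import List, Optional


def fill_window(values: List[Optional[float]], n: int, method: str = "median") -> List[Optional[float]]:
    """Fill each gap with the median (or mean) of up to n observed values on each side."""
    if method == "median":
        agg = statistics.median
    elif method == "mean":
        agg = statistics.fmean
    else:
        raise ValueError("method must be 'median' or 'mean'")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        neighbors = values[max(0, i - n):i] + values[i + 1:i + 1 + n]
        window = [x for x in neighbors if x is not None]
        if window:  # leave the gap if nothing nearby was observed
            out[i] = agg(window)
    return out


prices = [10.0, 11.0, 100.0, None, 12.0, 13.0, 14.0]
print(fill_window(prices, 2, "mean")[3])    # 34.0
print(fill_window(prices, 2, "median")[3])  # 12.5
```

The single outlier 100.0 pulls the mean up to 34.0, while the median stays at 12.5.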

Another approach: missing values in a time series can also be filled with values predicted by linear regression.
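The text does not say what the regression is fitted on. One possible variant, assumed here, regresses Cryptocurrency B's Close on Cryptocurrency A's Close over the periods where both exist and uses the fitted line to predict B's missing values; regressing on the period index would be another reading. `fill_by_regression` is a hypothetical helper using ordinary least squares.

```python
from typing import List, Optional


def fill_by_regression(a: List[Optional[float]], b: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps in b with predictions from an OLS fit b = alpha + beta * a.

    The fit uses only periods where both a and b are present."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete periods to fit a line")
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("a is constant over the complete periods")
    beta = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    alpha = my - beta * mx
    # Predict only where b is missing and a is available.
    return [alpha + beta * x if y is None and x is not None else y for x, y in zip(a, b)]
```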

It would seem there is no shortage of options for solving the missing-data problem. But… what characterizes the cryptocurrency market? Volatility. The variance of Close growth rates changes over time (goodbye, linear regression…), and there are sharp outliers. Filling gaps with averages, interpolation between neighboring points, or medians effectively smooths the time series of quotes. That runs completely contrary to the nature of cryptocurrency pair movements.
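A small hypothetical example of this smoothing effect: a series with one sharp spike loses that spike entirely when the spike's period goes missing and is patched with the neighbor average, and the variance of growth rates collapses with it. The prices are invented for illustration.

```python
import statistics
from typing import List


def return_variance(closes: List[float]) -> float:
    """Population variance of Close growth rates."""
    rates = [cur / prev - 1.0 for prev, cur in zip(closes, closes[1:])]
    return statistics.pvariance(rates)


# Hypothetical "true" series with a sharp spike in period 2.
true_closes = [100.0, 100.0, 130.0, 100.0, 100.0]
# Suppose period 2 was lost and then patched with the mean of its neighbours.
patched = true_closes[:2] + [(true_closes[1] + true_closes[3]) / 2] + true_closes[3:]

print(return_variance(true_closes))
print(return_variance(patched))  # 0.0: the spike has vanished
```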

A way to solve the missing-data problem when comparing two or more cryptocurrency quotes

So what should be done? We propose another way to solve the missing-data problem when comparing two or more cryptocurrency quotes. It is quite laborious for a human… but not for a robot.

Consider the situation above with five cryptocurrencies. But what if, to study the properties of Cryptocurrency A, we do not strictly need this particular set of cryptocurrencies (Cryptocurrency B, Cryptocurrency C, Cryptocurrency D, Cryptocurrency E)? What if, among thousands of crypto pairs, we can find pairs that have data for exactly the time periods we need, the same ones as the Cryptocurrency A pair?

If we find such crypto pairs, Cryptocurrency F, Cryptocurrency G, Cryptocurrency H, Cryptocurrency I and Cryptocurrency J, the dataset will be complete:

In other words, when there are many crypto pairs to choose from, the task becomes one of finding the necessary data: we look for crypto pairs that have the same set of timestamps as the target crypto pair (Cryptocurrency A).
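A minimal sketch of this search, assuming each pair's data is summarized as the set of timestamps for which it has a Close value. Here a candidate qualifies if it has data at every timestamp of the target (a superset check); requiring exactly equal sets (`ts == needed`) would be a stricter variant. `comparable_pairs` and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def comparable_pairs(target: str, timestamps: Dict[str, Set[int]]) -> List[str]:
    """Pairs that have data at every timestamp where the target pair has data."""
    needed = timestamps[target]
    return sorted(
        sym for sym, ts in timestamps.items()
        if sym != target and needed <= ts  # no timestamp of the target is missing
    )


data = {"A": {1, 2, 3, 4}, "B": {1, 2, 4}, "F": {1, 2, 3, 4}, "G": {0, 1, 2, 3, 4, 5}}
print(comparable_pairs("A", data))  # ['F', 'G']
```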


How does it work in practice? At the time of writing, we have historical data for more than 1000 crypto pairs. The most liquid of them make up no more than 6%, and this part is of the greatest interest. Note that, thanks to their liquidity, these pairs have practically no gaps in the data. Each crypto pair in this 6% takes its turn as Cryptocurrency A, and for it we search for the complete list of comparable crypto pairs: those that can provide as much data as the Cryptocurrency A time series contains.
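A minimal sketch of that loop: rank pairs by liquidity, take the top 6%, and let each of them act as Cryptocurrency A in turn. Measuring liquidity by traded volume is an assumption made here, and `comparable_lists` is a hypothetical helper; the percentage is kept as an integer to avoid floating-point rounding when computing the cut-off.

```python
from typing import Dict, List, Set


def comparable_lists(volume: Dict[str, float], timestamps: Dict[str, Set[int]],
                     top_percent: int = 6) -> Dict[str, List[str]]:
    """For each of the most liquid pairs, list the pairs whose data covers all its timestamps."""
    ranked = sorted(volume, key=lambda sym: volume[sym], reverse=True)
    n_top = max(1, len(ranked) * top_percent // 100)
    result: Dict[str, List[str]] = {}
    for target in ranked[:n_top]:  # each liquid pair takes its turn as "Cryptocurrency A"
        needed = timestamps[target]
        result[target] = sorted(
            sym for sym, ts in timestamps.items() if sym != target and needed <= ts
        )
    return result
```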

For example, take BTCUSDT. Consider the data over a certain period of time back from the current moment. Which crypto pairs have the same amount of data, with no losses? We run the corresponding query. The result (in the form of a file) is a complete list of comparable crypto pairs that can be analyzed together with BTCUSDT. A file with the relevant data is posted on
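A minimal sketch of such a query, restricted to a lookback window ending at the current moment and writing the result as CSV. `export_comparable`, the CSV columns and the sample data are hypothetical; the tickers serve only as labels for synthetic timestamps.

```python
import csv
import io
from typing import Dict, Set, TextIO


def export_comparable(target: str, timestamps: Dict[str, Set[int]],
                      now: int, lookback: int, out: TextIO) -> int:
    """Write, as CSV, the pairs that have data at every timestamp the target has
    within [now - lookback, now]. Returns the number of comparable pairs."""
    needed = {t for t in timestamps[target] if now - lookback <= t <= now}
    writer = csv.writer(out)
    writer.writerow(["target", "comparable_pair", "periods"])
    count = 0
    for sym in sorted(timestamps):
        if sym != target and needed <= timestamps[sym]:
            writer.writerow([target, sym, len(needed)])
            count += 1
    return count


# Synthetic 1-minute timestamps in seconds.
bars = set(range(0, 600, 60))
data = {"BTCUSDT": bars, "ETHUSDT": bars, "OLDUSDT": bars - {60}, "XYZUSDT": bars - {480}}
buf = io.StringIO()
print(export_comparable("BTCUSDT", data, now=540, lookback=300, out=buf))  # 2
print(buf.getvalue())
```

OLDUSDT is still comparable because its gap falls outside the lookback window, while XYZUSDT is dropped; moving the window changes the list. To save the result to disk, pass a file opened with `open(path, "w", newline="")` instead of `StringIO`.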

Naturally, the list of comparable crypto pairs can change over time. If a crypto pair has holes in the data over the period under consideration, it is dropped from the complete list of pairs comparable with Cryptocurrency A (and conversely, if its data is complete, it is included in the list).

Note also that each liquid crypto pair has its own complete list of comparable crypto pairs for the period under review.
