Chaos and Fractals in Financial Markets

Part 8: The Correlation Integral and the Correlation Dimension

by J. Orlin Grabbe


Dimension is a way of measuring things to get accurate probabilities. If we use the correct dimension, we get the right probabilities, and if we use the wrong dimension, we get the wrong probabilities.

Let's approach the concept of correlation dimension by throwing darts at a dart board. Stand ten feet away, aim at the center of the board, and let fly with a dart. Keep track of where the dart hits. Do this, say, ten thousand times. Now let's measure where the darts hit relative to the center of the board. Draw a one-inch circle around the center of the dart board, and count the number of points within the circle. Then draw a two-inch circle around the center of the board, and count the number of points within the two-inch circle. (Naturally, all the points in the one-inch circle are also in the two-inch circle.)

The area A of a circle is A = π r², where r is the radius of the circle, so the area grows with the square of r. Thus we might think the number of points is proportional to r². The area of the one-inch circle is A = π(1)² = π. The area of the two-inch circle is A = π(2)² = 4π. Taking the ratio of areas, 4π/π = 4, one might expect four times as many points in the two-inch circle. This would be true if our throws were totally random: independently and uniformly distributed over the face of the target. On the other hand, if our aim is very good, there might be only twice as many points in the two-inch circle as in the one-inch circle. In that case, the number of points grows with r, not r².
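The area argument can be checked with a short Python simulation. It is a sketch under assumptions that are not in the text: the board has a 10-inch radius, "totally random" means uniform over the board, and "very good aim" is modeled, purely hypothetically, as darts that always land on one horizontal line through the center. It uses 100,000 throws rather than ten thousand to reduce sampling noise; all function names are illustrative.

```python
import math
import random


def throw_uniform(n, board_radius, rng):
    """'Totally random' darts: uniform over a circular board, via rejection sampling."""
    points = []
    while len(points) < n:
        x = rng.uniform(-board_radius, board_radius)
        y = rng.uniform(-board_radius, board_radius)
        if x * x + y * y <= board_radius * board_radius:
            points.append((x, y))
    return points


def throw_on_line(n, board_radius, rng):
    """Hypothetical 'very good aim': every dart lands on a horizontal line through the center."""
    return [(rng.uniform(-board_radius, board_radius), 0.0) for _ in range(n)]


def count_within(points, r):
    """Number of hits strictly inside a circle of radius r around the center."""
    return sum(1 for x, y in points if math.hypot(x, y) < r)


def two_to_one_inch_ratio(points):
    """Hits inside the two-inch circle divided by hits inside the one-inch circle."""
    return count_within(points, 2.0) / count_within(points, 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    board = 10.0  # assumed board radius, in inches
    print(two_to_one_inch_ratio(throw_uniform(100_000, board, rng)))  # near 4
    print(two_to_one_inch_ratio(throw_on_line(100_000, board, rng)))  # near 2
```

The uniform board gives a ratio close to 4 (growth like r²), while the line-only throws give a ratio close to 2 (growth like r).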

Let C(r) be the number of points in a circle of radius r, divided by the total number of points (the total number of throws). Then, as the number of throws goes to infinity, C(r) becomes simply the probability that a given throw lands within a circle of radius r. Let's call C(r) the correlation integral. It's simply a measure of probability based on the radius r. C(r) is a (cumulative) probability distribution function: it never decreases as r grows, C(0) = 0, and C(infinity) = 1. No points will be found inside a circle of zero radius; all points will be found inside a circle of infinite radius.
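To see C(r) behave like a distribution function, here is a small Python sketch. It assumes (this is not in the text) that each throw's horizontal and vertical misses are independent normal errors with standard deviation sigma; under that assumption the exact limit of C(r) is 1 - exp(-r²/(2σ²)), the Rayleigh distribution function. The values sigma = 3 inches and 50,000 throws are arbitrary choices, and the function names are hypothetical.

```python
import math
import random


def gaussian_throws(n, sigma, rng):
    """Assumed aiming model: horizontal and vertical misses are independent N(0, sigma^2)."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def correlation_integral(points, r):
    """C(r): fraction of throws landing strictly inside a circle of radius r about the center."""
    return sum(1 for x, y in points if math.hypot(x, y) < r) / len(points)


def rayleigh_cdf(r, sigma):
    """Limiting C(r) under the Gaussian model: 1 - exp(-r^2 / (2 sigma^2))."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))


if __name__ == "__main__":
    rng = random.Random(1)
    sigma = 3.0  # assumed spread of the throws, in inches
    throws = gaussian_throws(50_000, sigma, rng)
    for r in (0.0, 1.0, 2.0, 5.0, 1e9):
        print(f"r={r:g}  empirical C(r)={correlation_integral(throws, r):.4f}  "
              f"limit={rayleigh_cdf(r, sigma):.4f}")
```

Near the center, 1 - exp(-x) ≈ x, so under this model C(r) ≈ r²/(2σ²) for small r: a scatter spread in both directions grows like r² close in.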

In general, it may be that, for a sufficiently small radius r, the proportion of points C(r) grows like r^D*:

(8.1) $C(r) \approx K\, r^{D^*}$ as $r \to 0$, for some constant $K$.

We will call D* the correlation dimension of the point distribution. It measures how fast points accumulate as the radius of the circle increases. Taking logarithms of (8.1) gives ln C(r) ≈ ln K + D* ln r, so D* is the slope of ln C(r) plotted against ln r. Dividing through by ln r, the term ln K / ln r vanishes as r goes to zero (since ln r goes to minus infinity), so:

(8.2) $D^* = \lim_{r \to 0} \frac{\ln C(r)}{\ln r}$.
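With finitely many throws there are no points at r = 0, so in practice one estimates D* as a slope of ln C(r) against ln r over a range of small radii; this also sidesteps the ln K / ln r term, which makes the plain ratio ln C(r) / ln r converge slowly. The Python sketch below does a least-squares fit, assuming a 10-inch board and radii 0.5 to 4 inches (both arbitrary), and reuses two hypothetical throwing models: uniform over the board, where C(r) = (r/10)² exactly, and darts confined to one line through the center, where C(r) = r/10.

```python
import math
import random


def uniform_disk(n, board_radius, rng):
    """Darts spread uniformly over the board (rejection sampling); exact C(r) = (r / R)^2."""
    pts = []
    while len(pts) < n:
        x = rng.uniform(-board_radius, board_radius)
        y = rng.uniform(-board_radius, board_radius)
        if x * x + y * y <= board_radius * board_radius:
            pts.append((x, y))
    return pts


def on_line(n, board_radius, rng):
    """Hypothetical tight aim: darts land on one line through the center; exact C(r) = r / R."""
    return [(rng.uniform(-board_radius, board_radius), 0.0) for _ in range(n)]


def correlation_integral(points, r):
    """Fraction of points strictly within distance r of the center."""
    return sum(1 for x, y in points if math.hypot(x, y) < r) / len(points)


def slope_estimate(points, radii):
    """Least-squares slope of ln C(r) on ln r: a finite-sample stand-in for the limit in (8.2)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(points, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(3)
    radii = [0.5, 1.0, 2.0, 4.0]  # assumed "small" radii on a 10-inch board
    print(slope_estimate(uniform_disk(100_000, 10.0, rng), radii))  # near 2
    print(slope_estimate(on_line(100_000, 10.0, rng), radii))       # near 1
```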

The correlation dimension D* is related to the fractal dimension D defined in an earlier part of this series. Grassberger and Procaccia [1] show that, in general, D* ≤ D.

Time-Series Data

Let's apply the concept of correlation dimension to time-series data. When we threw darts, we measured the distance between each point and the center of the dart board, and drew circles of radius r around the center of the dart board. But market prices and other economic observations, unfortunately, are not found on the wall near dart boards, so we have to measure distance by a different procedure. Suppose we have a time series of N price returns: x₁, x₂, ..., x_m, x_{m+1}, ..., x_N.

If we compare each of these N returns with each of the others, there are N(N-1) possible comparisons (ordered pairs (i, j) with i ≠ j). So in this case we define the correlation integral as

(8.3) $C(r) = \frac{\#\{(i, j) : i \neq j,\ |x_i - x_j| < r\}}{N(N-1)}$.

That is, C(r) is simply the proportion of those pairs whose absolute difference is less than r. Restated: all pairs of values of the time series are compared; we count the number of pairs within a distance r of each other; and then we divide by the total number of pairs to get C(r). As N goes to infinity, C(r) becomes the probability that two randomly selected values differ by less than r. As before, C(r) is a probability distribution function.
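Equation (8.3) translates directly into a double loop. The Python sketch below is a brute-force O(N²) version for illustration only; the function name correlation_integral_1d is hypothetical, and the data are iid uniform numbers on [0, 1] standing in for returns (an assumption, not market data). For such data the limiting value is 1 - (1 - r)² = 2r - r² for 0 ≤ r ≤ 1, which gives a convenient check.

```python
import random


def correlation_integral_1d(series, r):
    """C(r) as in (8.3): share of ordered pairs (i, j), i != j, with |x_i - x_j| < r."""
    n = len(series)
    close = 0
    for i in range(n):
        xi = series[i]
        for j in range(i + 1, n):
            if abs(xi - series[j]) < r:
                close += 1
    # Each unordered pair {i, j} appears twice among the N(N-1) ordered pairs.
    return 2 * close / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(5)
    returns = [rng.random() for _ in range(1000)]  # hypothetical iid uniform "returns"
    r = 0.1
    print(correlation_integral_1d(returns, r), 2 * r - r * r)  # both near 0.19
```

Counting each unordered pair once and doubling gives the same ratio as looping over all N(N-1) ordered pairs, at half the work.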

Finally, let's define C(r) in such a way that it looks at sets of m successive observations. That is, we look at consecutive vectors X₁ = (x₁, x₂, ..., x_m), X₂ = (x₂, x₃, ..., x_{m+1}), and in general X_i = (x_i, x_{i+1}, ..., x_{i+m-1}). The number of such vectors is N-m+1, and if we compare each vector with each of the others, the number of comparisons is (N-m+1)(N-m). So for m-vectors we define a correlation integral C_m(r) as

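(8.4) $C_m(r) = \frac{\#\{(i, j) : i \neq j,\ |x_{i+k} - x_{j+k}| < r \text{ for every } k = 0, 1, \dots, m-1\}}{(N-m+1)(N-m)}$,

where i and j run from 1 to N-m+1, so the numerator counts the pairs of vectors X_i, X_j in which each pair of corresponding components is less than r apart.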

As before, we can calculate the correlation dimension D* from (8.2), with C_m(r) in place of C(r).
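Here is a Python sketch of (8.4): it builds the delay vectors and counts the pairs whose components are all within r of each other (equivalently, whose largest component difference is below r). Function names are hypothetical, and the iid uniform data are an assumption used only as a check. For independent data the components of X_i and X_j are independent except when the two vectors overlap (a small share of pairs), so C_m(r) should be close to C_1(r)^m = (2r - r²)^m; taking logs, the slope of ln C_m(r) against ln r is then about m times the one-dimensional slope.

```python
import random


def embed(series, m):
    """Delay vectors X_i = (x_i, ..., x_{i+m-1}); there are N - m + 1 of them."""
    return [tuple(series[i:i + m]) for i in range(len(series) - m + 1)]


def correlation_integral_m(series, m, r):
    """C_m(r) as in (8.4): share of ordered pairs of m-vectors whose
    corresponding components are all less than r apart."""
    vectors = embed(series, m)
    n = len(vectors)  # n = N - m + 1
    close = 0
    for i in range(n):
        vi = vectors[i]
        for j in range(i + 1, n):
            if all(abs(a - b) < r for a, b in zip(vi, vectors[j])):
                close += 1
    return 2 * close / (n * (n - 1))  # n(n-1) = (N-m+1)(N-m)


if __name__ == "__main__":
    rng = random.Random(9)
    returns = [rng.random() for _ in range(800)]  # hypothetical iid uniform data
    r = 0.2
    for m in (1, 2, 3):
        print(m, correlation_integral_m(returns, m, r), (2 * r - r * r) ** m)
```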

In applying equation (8.4), the number m of successive observations we use is called the embedding dimension. The relationship between the embedding dimension m and the correlation dimension D* is very important. We can distinguish three cases.

Case 1: If we increase the embedding dimension m and the estimate of the correlation dimension D* stops increasing, then the correlation dimension is said to saturate. For example, suppose we look at m-tuples with m = 5, and our estimate of the correlation dimension is also D* = 5. But then we look at m-tuples with m = 30, and find our estimate of the correlation dimension is D* = 6. In that case the correlation dimension has saturated at 6. We can be fairly sure the system is chaotic. (See Brock, Hsieh, and LeBaron [2].) Thus, short-term prediction will be possible within the parameters given by the Lyapunov exponents, but long-term prediction will not be possible. Saturation equals chaos. Peters [3], for example, finds D* = 2.33 for the S&P 500 index. This indicates a low-order chaotic system that could be modeled with three variables (the smallest integer larger than D*).

Case 2: If we increase the embedding dimension, and the correlation dimension keeps increasing but remains well below the embedding dimension, then we can be pretty sure of one of three things. Either the observations are Gaussian and correlated (the Hurst exponent H is not equal to 0.5); or they come from an independent and identically distributed infinite-variance distribution (in which case the Hurst exponent H = 1/a, where a, with 0 < a < 2, is the characteristic exponent of the stable distribution, so that 1/2 < H < infinity); or else they are both correlated and come from an infinite-variance distribution (in which case H > 1/a for positive correlation, or H < 1/a for negative correlation, where a > 1 and 0 < H < 1). In any of these cases we should be able to make some long-term predictions.

Case 3: If we increase the embedding dimension, and the correlation dimension keeps increasing in line with the embedding dimension, and stays more or less equal to the embedding dimension, then the series follows a martingale process. In this case, no prediction is possible (there are no predictable variations around trend).
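The contrast between Case 1 and Case 3 can be illustrated with a toy comparison in Python. This is a sketch under stated assumptions, not a test for chaos in market data: the "chaotic" series is the doubling map x → 2x mod 1, whose correlation dimension is 1, and the "unpredictable" series is iid uniform noise. The radii 0.03 to 0.08 are an assumed scaling region, distances use the component-wise rule of (8.4), and all function names are hypothetical. As m goes from 1 to 3, the doubling-map estimate should stay near 1 (saturation), while the noise estimate should rise roughly in step with m.

```python
import math
import random
from bisect import bisect_left


def doubling_map_series(n, seed):
    """Chaotic toy series x_{k+1} = 2 x_k mod 1.

    Plain floats would collapse to 0 after about 53 doublings, so the state is a
    long random binary fraction held as an integer; doubling mod 1 is a left shift
    that drops the leading bit, and each x_k is read from the top 53 bits.
    """
    bits = n + 64
    state = random.Random(seed).getrandbits(bits)
    mask = (1 << bits) - 1
    out = []
    for _ in range(n):
        out.append((state >> (bits - 53)) / 2.0 ** 53)
        state = (state << 1) & mask
    return out


def sorted_pair_distances(series, m):
    """Largest component difference for every pair of delay vectors X_i, X_j (i < j), sorted."""
    vecs = [series[i:i + m] for i in range(len(series) - m + 1)]
    dists = []
    for i, vi in enumerate(vecs):
        for vj in vecs[i + 1:]:
            dists.append(max(abs(a - b) for a, b in zip(vi, vj)))
    dists.sort()
    return dists


def dimension_estimate(series, m, radii):
    """Least-squares slope of ln C_m(r) on ln r over the given radii."""
    dists = sorted_pair_distances(series, m)
    xs = [math.log(r) for r in radii]
    # bisect_left counts the distances strictly below r, matching "< r" in (8.4).
    ys = [math.log(bisect_left(dists, r) / len(dists)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


if __name__ == "__main__":
    n = 800
    radii = [0.03, 0.05, 0.08]  # assumed scaling region
    chaotic = doubling_map_series(n, seed=1)
    rng = random.Random(2)
    noise = [rng.random() for _ in range(n)]  # iid stand-in for unpredictable returns
    for m in (1, 2, 3):
        print(f"m={m}  chaotic D*={dimension_estimate(chaotic, m, radii):.2f}"
              f"  iid D*={dimension_estimate(noise, m, radii):.2f}")
```

Sorting the pairwise distances once lets each C_m(r) be read off with a single binary search instead of rescanning all pairs for every radius. The choice of radii matters: at larger r the later coordinates of nearby vectors stop moving together, and even a chaotic series can look higher-dimensional.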

References

[1] Grassberger, P., and I. Procaccia, Measuring the Strangeness of Strange Attractors, Physica, 9D, 1983, 189-208.

[2] Brock, William A., David A. Hsieh, and Blake LeBaron, Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence, The MIT Press, Cambridge MA, 1991.

[3] Peters, Edgar E., Chaos and Order in the Capital Markets, John Wiley & Sons, New York, 1991.

J. Orlin Grabbe is the author of International Financial Markets, and is an internationally recognized derivatives expert. He has recently branched out into cryptology, banking security, and digital cash. His home page is located at http://www.aci.net/kalliste//homepage.html .

from The Laissez Faire Electronic Times, Vol 2, No 32, August 18, 2003