README file for EBP H estimation Matlab code Owen Jones, 05 December 2003 Individuals are free to use this code for the purpose academic research, provided it is properly acknowledged. For any other use, permission must first be arranged with the author. Details of the algorithm are available in the paper "Estimating the Hurst index of a self-similar process via the crossing tree", Owen Jones and Yuan Shen, to appear in Signal Processing Letters. This code comprises 5 files get_hits.m calc_H.m calc_subX_stats.m (called by calc_H) calc_alpha_exponent.m (called by calc_subX_stats) plot_Xing.m To calculate an H estimate two steps are required. Firstly get_hits is used to calculate hitting times, hitting points, and subcrossing numbers. The results are saved in auxillary files. Secondly calc_H is used to give an H estimate based on a single crossing scale. Using get_hits -------------- We assume that we observe some continuous process X. Data is required in two files fname.dat and fname.tim. These must be of the same length, one entry per line. If x(i) is the i-th entry of fname.dat and t(i) the i-th entry of fname.tim, then x(i) is the positition of the (continuous) process X at time t(i). If t(i) = i then fname.tim can be omitted by setting the flag isTimeSeries = 1 when calling get_hits. If t(i) = i and x(i) = X(i) - X(i-1) then set flags isTimeSeries = 1 and isIncremental = 1. Note that if you only have observations at regular time points then at small scales not all crossings are known, which introduces a bias into the H estimate. Crossings are calculated at scales delta*2^n for each level n in levels. If delta is not specified then the standard deviation of X(t(i)) - X(t(i-1)) is used. levels must be a range of consecutive non-negative integers, for example [0:8]. The first apparent crossing is generally not a complete crossing, so the flag deleteFirst should usually be set to 1. For each specified level n, output is written to two files fname_n.hit and fname_n.sub. fname_n.hit contains two columns, the first gives hitting times and the second hitting points for crossings of size delta*2^n. fname_n.sub contains a single column which gives the number of subcrossings of size delta*2^(n-1) corresponding to each crossing of size delta*2^n. Note that if n0 is the smallest level for which hitting times are calculated, then subcrossing numbers are only available for levels n0+1 and above. Using calc_H ------------ calc_H uses the files calc_subX_stats and calc_alpha_exponent. To calculate H using level n crossings the data file fname_n.sub is required. Three methods are available for calculating confidence intervals for H. The method chosen will also affect the bias correction (which uses the same variance estimate used to calculate the confidence interval). The correct method to use depends on the correlation structure of the sequence of subcrossing numbers. Set biasCorrection = 1 to include a correction for the bias introduced by the transform H = log 2/ log mu. Method 'iid'. Use this if the subcrossing numbers are iid. This is the case for Brownian motion, otherwise is probably only reasonable for small H. Method 'srd'. Use this if the subcrossing numbers exhibit short-range dependence. A diagnostic plot is available in this case, which gives the sample autocorrelation and variance correction factor. If the varaince correction factor does not appear to converge, then the series may not be srd. Note that the sample autocorrelation is unreliable for large lags, so the variance correction factor should not be based on the whole sample autocorrelation. Use opt1 to specify what fraction of the sample autocorrelation to use; the default is 0.5. This method seems appropriate for moderate H. Note that errors will result if the variance correction factor falls below -1, and you should be distrustful of any situation in which it is below 1. Method 'lrd', Use this if the subcrossing numbers exhibit long-range dependence. Wavelets are used to estimate to decay exponent of the autocorrelation. The estimate is obtained from a weighted log-log regression of wavelet coefficients against scale. The regression points are available as a diagnostic plot. If there is not a reliable straight line fit to this data then the series is unlikely to be lrd. Two options are available for this method: opt1 specifies the degree of the wavelet basis used (default 10) and opt2 specifies the point at which the onset of scaling is observed (default 1). The regression is only taken from point opt2 onward. This method seems appropriate for large H (near 1). Note that the code does not check that the wavelet estimates fall within sensible bounds, so errors can result if this method is used in an inappropriate situation. Using plot_Xing --------------- plot_Xing is not needed to estimate H, but to visualise the crossing structure of a given sample. It should only be used for small samples. To use it you need to run get_hits first. The flags isTimeSeries and isIncremental are as for get_hits. The levels used should be a contiguous subset of the levels used in get_hits. Sample data ----------- The file fbm07.dat is a small sample of simulated fractional Brownian motion, observed at regular time points with H = 0.7. The standard deveiation of the increments is known to be 1. To get a plot of the crossings of this sample execute the following commands get_hits('fbm07', [3:5], 1, 1, 1, 0) plot_Xing('fbm07', [3:5], 1, 0) The files bc_paug89_1.dat and bc_paug89_1.tim contain the well known Bellcore traces from August 1989, in a format suitable for applying this code. For example, to estimate H using level 5 crossings do the following get_hits('bc_paug89_1', [4 5], 0, 1, 0, 0) [H, H_lb, H_ub] = calc_H('bc_paug89_1', 5, 'lrd', 1, 1, 10, 3)