Comparison of histograms in physical research

UDC: 53.088, 519.23

The review of methods of histograms comparison is presented. Possible approaches for the comparative analysis of histogram are considered.

The term “histogram” was coined by the famous statistician Karl Pearson to refer to a “common form of graphical representation” [1]. Histograms are very useful in their canonical visual representation, but today histograms are considered as purely mathematical objects.

Histograms are used in different scientific fields. Besides physics data analyses, histograms play a very important role in databases, image processing and computer vision [1]. Correspondingly, goals and methods of treatment of histograms are varied in dependence to the area of application. In this paper histograms are considered in frame of tasks related to physical experiments.

The paper presents some of the methods and results of the comparison of histograms. A comparison was made of three methods of the comparison of histograms: the Kolmogorov-Smirnov (KS) method, the Anderson-Darling (AD) method and the method for statistical comparison of histograms (SCH).

The dependence of the mean error in hypotheses testing of distinguishability of the reference data set and test data set on the difference in position parameter of samples: the Anderson-Darling and Kolmogorov-Smirnov criteria give the better result than SCH method. The dependence on the width parameter of samples: the SCH criterion gives the better result than AD and KS criteria.

Nevertheless, the SCH is a multidimensional method. It allows to include the any one-dimensional test statistic as an additional component of multidimensional test statistic in the frame of the method. For example, the including of the AD test statistic into SCH method as third component of the three dimensional test statistic will allow to reach the better coordinate resolution in the example which was considered above. Possible approaches for the comparative analysis of histogram are considered. As shown, there is no single best test for all applications. It means that before application any test must be checked with care. A good solution is a combined use of several tests.

Ссылки

Ioannidis Y., The history of histograms (abridged), Proceedings 2003 VLDB Conference, 2003, pp.19-30.
Cha S.-H., Srihari S.N., On measuring the distance between histograms, Pattern Recognition, 2002, v. 35, no. 6, pp. 1355-1370.
Kolmogorov A.N., Confidence limits for an unknown distribution function, Ann.Math.Stat., 1941, v. 12, no. 4, pp. 461-463.
Kullback S., Information Theory and Statistics, Wiley, New York, 1959.
Rosenthal J., Convergence rates for Markov chains, SIAM Rev., 1995, v. 37, pp. 387-485.
Cochran W., The chi-square test of goodness of fit, Ann.Math.Stat., 1952, v. 23, no. 3, pp. 315-342.
Kailath T., The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans.Commun.Technol, 1967, v.15, no. 1, pp. 52-60.
Hellinger E., Neue Begrundung der Theorie quadratischer Formen von unendlichvielen Veranderlichhen, Journal fur die reine und angewandte Mathematik, (in German), 1909, v. 1909, no. 136, p. 210.
Anderson T.W., Darling D.A., Asymptotic theory of certain «goodness of fit» criteria based on stochastic processes, Ann. Math. Statist., 1952, v. 23, no. 2, p. 193.
Gretton A., Borgwardt K., Rasch M.J., Scholkopf B., Smola A.J., A Kernel method for two-sample problem, arXiv:0805.2368, 2008.
Mann H.B., Whitney D.R., On a test of whether one of two random variables is stochastically larger than the other, Ann.Math.Stat, 1947, v.18, no. 1, p.50.
Bandemer H., Nather W., Fuzzy data analysis, Kluwer academic publishers, Dordrecht, 1992.
Luuka P., Collan M., Modulo similarity in comparing histograms, Proc. of IFSA-EUSFLAT2015, Eds. Alonso J.M., Bustince H., M. Reformat, Atlantis Press, 2015.
Bityukov S., Krasnikov N., Nikitenko A., Smirnova V., A method for statistical comparison of histograms, arXiv:1302.2651, 2013.
Krupanek B., Bogacz R., Comparison algorithm of multimodal histograms from wireless transmission, Przeglad Electrotechniczny, 2014, no. 11, p.32.
Cao Y., Petzold L., Accuracy limitations and the measurement of errors in the stochastic simulation of chemically reacting systems, J. of Computational Physics, 2006, v. 212, no. 1, pp. 6-24.
Xu K.-M., Using the bootstrap method for a statistical significance test of differences between summary histograms, NASA Technical Reports Server, ID: 20080015431, 2006.

Ea histogram the Monte Carlo method the event flow the test statistic

Link for citing the article: Bityukov S.I., Maksimushkina A.V., Smirnova V.V. Comparison of histograms in physical research. Izvestiya vuzov. Yadernaya Energetika. 2016, no. 1, pp. 81-90; DOI: https://doi.org/10.26583/npe.2016.1.09 (in Russian).

Open PDF file