Reproducible-Research
Codes and data to reproduce the results of research by P. Pernot and collaborators
The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error is unsatisfactory for several reasons linked to the non-normality of the error distributions and the presence of underlying trends. Complementary statistics have recently been proposed to mitigate these deficiencies, such as quantiles of the absolute error distribution or the mean prediction uncertainty. We introduce here a new score, the systematic improvement probability (SIP), based on the direct system-wise comparison of absolute errors. Whatever the scoring rule, the uncertainty of the statistics due to the incompleteness of the benchmark data sets is generally overlooked, although it is essential for assessing the robustness of rankings. In the present article, we develop two indicators based on robust statistics to address this problem: P_inv, the inversion probability between two values of a statistic, and P_r, the ranking probability matrix. We also demonstrate the essential contribution of the correlations between error sets to these score comparisons.
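The abstract does not give formulas for these scores, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that SIP can be estimated as the fraction of systems for which one method has a strictly smaller absolute error than the other, and that P_inv can be estimated by a paired bootstrap of the difference between two values of a statistic (here the Mean Unsigned Error). The function names `sip`, `mue` and `inversion_probability` are hypothetical.

```python
import random


def sip(errors_a, errors_b):
    """Assumed SIP estimate: fraction of systems where method A has a
    strictly smaller absolute error than method B (system-wise comparison)."""
    if len(errors_a) != len(errors_b):
        raise ValueError("error sets must cover the same systems")
    wins = sum(abs(a) < abs(b) for a, b in zip(errors_a, errors_b))
    return wins / len(errors_a)


def mue(errors):
    """Mean Unsigned Error of an error set."""
    return sum(abs(e) for e in errors) / len(errors)


def inversion_probability(errors_a, errors_b, stat=mue, n_boot=2000, seed=0):
    """Assumed P_inv estimate: share of paired bootstrap samples in which
    the sign of stat(A) - stat(B) is opposite to the observed one.

    The same resampled systems are used for both methods, so that the
    correlation between the two error sets is preserved.
    """
    rng = random.Random(seed)
    n = len(errors_a)
    observed = stat(errors_a) - stat(errors_b)
    flips = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = stat([errors_a[i] for i in idx]) - stat([errors_b[i] for i in idx])
        if diff * observed < 0:
            flips += 1
    return flips / n_boot


if __name__ == "__main__":
    # Made-up errors for two hypothetical methods on the same four systems.
    a = [0.1, -0.2, 0.3, 0.05]
    b = [0.2, 0.1, -0.5, 0.05]
    print("SIP(A over B):", sip(a, b))
    print("P_inv(MUE):", inversion_probability(a, b))
```

Resampling the system indices once per bootstrap draw, rather than resampling each error set independently, is the design choice that keeps the two error sets paired; independent resampling would ignore the correlations that the abstract identifies as essential.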