Source code for valjean.gavroche.stat_tests.bonferroni

# Copyright French Alternative Energies and Atomic Energy Commission
# Contributors: valjean developers
# valjean-support@cea.fr
#
# This software is a computer program whose purpose is to analyze and
# post-process numerical simulation results.
#
# This software is governed by the CeCILL license under French law and abiding
# by the rules of distribution of free software. You can use, modify and/ or
# redistribute the software under the terms of the CeCILL license as circulated
# by CEA, CNRS and INRIA at the following URL: http://www.cecill.info.
#
# As a counterpart to the access to the source code and rights to copy, modify
# and redistribute granted by the license, users are provided only with a
# limited warranty and the software's author, the holder of the economic
# rights, and the successive licensors have only limited liability.
#
# In this respect, the user's attention is drawn to the risks associated with
# loading, using, modifying and/or developing or reproducing the software by
# the user in light of its specific status of free software, that may mean that
# it is complicated to manipulate, and that also therefore means that it is
# reserved for developers and experienced professionals having in-depth
# computer knowledge. Users are therefore encouraged to load and test the
# software's suitability as regards their requirements in conditions enabling
# the security of their systems and/or data to be ensured and, more generally,
# to use and operate it in the same conditions as regards security.
#
# The fact that you are presently reading this means that you have had
# knowledge of the CeCILL license and that you accept its terms.

r'''
.. _family-wise: https://en.wikipedia.org/wiki/Family-wise_error_rate
.. _Bonferroni: https://en.wikipedia.org/wiki/Bonferroni_correction
.. _Holm-Bonferroni: https://en.wikipedia.org/wiki/Holm-Bonferroni_method

In some cases we would like to interpret not only one test (like on a generic
score in Tripoli-4) but multiple ones (in a spectrum case for example). Each
individual test is supposed to be successful but in most of the case we can
accept that some of them fail (in a given acceptance of course).

The Bonferroni correction and the Holm-Bonferroni method allow to treat such
cases. See for example `family-wise`_ error rate Wikipedia webpage for more
details on this kind    of tests.

Here are implemented two tests based on comparison of the p-value from the
chosen test and a significance level:

* Bonferroni correction: all individual p-values are compared to the same
  significance level, depending on the number of tests;
* Holm-Bonferroni method: p-values are first sorted then compared to different
  significance levels depending on number of tests and rank.


In both cases a first test is chosen, like Student test, to evaluate the
individual hypothesis. The p-value of the test is calculated and used in the
following tests. Like in other tests, we consider two-sided significance levels
here.

The null hypothesis is that the p-value is higher than a given significance
level, that depends on the overall required significance level, the number of
tests considered and the chosen method (Bonferroni or Holm-Bonferroni).


Bonferroni correction
`````````````````````

The Bonferroni_ correction is used for multiple comparisons. It is based on a
weighted comparison to the significance level α.

If we have :math:`m` different tests (for example a spectrum with :math:`m`
bins), the significance level α will be weighted by :math:`m`. This p-value of
the :math:`m` tests will then be compared to that value. As a consequence, the
null hypothesis will be rejected if

.. math::

    p_k \leq \frac{\alpha}{m}

The  individual significance level is usually largely smaller than the required
one and the sum of them obvioulsy give α.

Conclusion of the test depends on what we want:

* it can be a simple ``True`` if all hypothesis are accepted;
* it can be ``False`` if at least one null hypothesis was rejected;
* we can also choose a level of acceptance, for example "it is fine if 1 % of
  the null hypothesis are rejected"

Per default, in our case the test is rejected if there is at least one null
hypothesis rejected.


Holm-Bonferroni method
``````````````````````

The Holm-Bonferroni_ method is a variation of the Bonferroni correction.

The p-value from the chosen test are first sorted from lower to higher value.
Then each of them are compared to the significance level weighted by
:math:`m - k + 1` with :math:`m` the number of tests and :math:`k` the rank
(starting at 1). The null hypothesis is rejected if

.. math::

    p_k \leq \frac{\alpha}{m - k + 1}.

The test can be stopped at the first :math:`k` accepting the null hypothesis.

This test is quite conservative. It can be possible to get the proportion of
accepted and rejected hypothesis, like in the Bonferroni correction case.


Code implementation
```````````````````

In both cases, the test is initiliazed with another test, a Student test for
example. This test can be evaluated inside the new test
:meth:`~TestBonferroni.evaluate` method or outside of it. Only examples with
the full structure will be shown here.

TestResults are returned as ``False`` if one null hypothesis is rejected.
Access to the initial test is still provided.


Examples
````````

Let's do the usual imports then use the quick example from Student test, with
the p-value calculation:

>>> import numpy as np
>>> from valjean.eponine.dataset import Dataset
>>> from valjean.gavroche.stat_tests.student import TestStudent
>>> from valjean.gavroche.stat_tests.bonferroni import TestBonferroni
>>> from valjean.gavroche.stat_tests.bonferroni import TestHolmBonferroni


Successful test with Bonferroni correction and Holm-Bonferroni method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds1 = Dataset(np.array([5.2, 5.3, 5.25, 5.4, 5.5]),
...               np.array([0.2, 0.25, 0.1, 0.2, 0.3]))
>>> ds2 = Dataset(np.array([5.1, 5.6, 5.2, 5.3, 5.2]),
...               np.array([0.1, 0.3, 0.05, 0.4, 0.3]))
>>> tstudent_12 = TestStudent(ds1, ds2, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_12, alpha=0.05)
>>> tbonf.ntests
5
>>> tbonf.ntests == ds1.size
True
>>> tbonf.bonf_signi_level
0.005
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
True
>>> tbonf_res.nb_rejected
[0]

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_12, alpha=0.05)
>>> tholmbonf.alpha  # alpha is two-sided
0.025
>>> thb_res = tholmbonf.evaluate()
>>> bool(thb_res)
True
>>> thb_res.nb_rejected
[0]


Failing test with Bonferroni correction and Holm-Bonferroni method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds3 = Dataset(np.array([5.1, 5.9, 5.8, 5.3, 4.6]),
...               np.array([0.1, 0.1, 0.05, 0.4, 0.12]))
>>> tstudent_13 = TestStudent(ds1, ds3, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_13, alpha=0.05)
>>> tbonf.ntests
5
>>> tbonf.bonf_signi_level
0.005
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
False
>>> tbonf_res.nb_rejected
[1]
>>> tbonf_res.rejected_proportion  # in percentage
[20.0]

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_13, alpha=0.05)
>>> tholmbonf.alpha  # alpha is two-sided
0.025
>>> thb_res = tholmbonf.evaluate()
>>> bool(thb_res)
False
>>> thb_res.nb_rejected
[2]
>>> thb_res.rejected_proportion  # in percentage
[40.0]

For an easier comparison, p-values, associated significance levels and the
corresponding result from the test (rejection or not) can be printed. The
sorting order is also given for the Holm-Bonferroni method.

>>> it = np.nditer([thb_res.first_test_res.pvalue,
...                 tbonf.bonf_signi_level, tbonf_res.rejected_null_hyp,
...                 thb_res.sort_ordering, thb_res.alphas_i,
...                 thb_res.rejected_null_hyp],
...                flags=['multi_index'])
>>> while not it.finished:
...     if it.iterindex == 0:
...         print("index  | pvalue(B) |   α(B)    |  reject B | order(HB) |"
...               f"   α(HB)   | reject HB\n{'-':->77}")
...     print(f"{it.multi_index} | {it[0]:.3e} | {it[1]:.3e} | {it[2]!s:^9} | "
...           f"{it[3]:^7}   | {it[4]:.3e} |   {it[5]!s:<}")
...     _ = it.iternext()
index  | pvalue(B) |   α(B)    |  reject B | order(HB) |   α(HB)   | reject HB
-----------------------------------------------------------------------------
(0, 0) | 6.548e-01 | 5.000e-03 |   False   |    3      | 1.250e-02 |   False
(0, 1) | 2.608e-02 | 5.000e-03 |   False   |    2      | 8.333e-03 |   False
(0, 2) | 1.015e-06 | 5.000e-03 |   True    |    0      | 5.000e-03 |   True
(0, 3) | 8.231e-01 | 5.000e-03 |   False   |    4      | 2.500e-02 |   False
(0, 4) | 5.447e-03 | 5.000e-03 |   False   |    1      | 6.250e-03 |   True


The first index corresponds to the index of the compared dataset in the
``datasets`` container in the input test (compared to ``dsref``).

The Holm-Bonferroni test is uniformly more powerful than the Bonferroni test.
In this particular case, Holm-Bonferroni is able to reject an additional null
hypothesis with respect to plain Bonferroni.


Tests with multi-dimension datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds4 = Dataset(np.array([[5.2, 5.3, 5.25], [5.4, 5.5, 5.2]]),
...               np.array([[0.2, 0.25, 0.1], [0.2, 0.3, 0.25]]))
>>> ds5 = Dataset(np.array([[5.1, 5.9, 5.8], [5.3, 4.7, 5.5]]),
...               np.array([[0.1, 0.1, 0.05], [0.4, 0.05, 0.2]]))
>>> ds4.shape
(2, 3)
>>> ds4.size
6
>>> tstudent_45 = TestStudent(ds4, ds5, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_45, alpha=0.05)
>>> tbonf.ntests
6
>>> print(f'{tbonf.bonf_signi_level:.6f}')
0.004167
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
False
>>> tbonf_res.nb_rejected
[1]
>>> print(f'{tbonf_res.rejected_proportion[0]:.3f} %')
16.667 %

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_45, alpha=0.05)
>>> thb_res = tholmbonf.evaluate()
>>> tholmbonf.alpha
0.025
>>> bool(thb_res)
False
>>> thb_res.nb_rejected
[1]
>>> print(f'{thb_res.rejected_proportion[0]:.3f} %')
16.667 %

The detailed comparison also works for multi-dimensions arrays:

>>> it = np.nditer([thb_res.first_test_res.pvalue, tbonf.bonf_signi_level,
...                 tbonf_res.rejected_null_hyp, thb_res.sort_ordering,
...                 thb_res.alphas_i, thb_res.rejected_null_hyp],
...                flags=['multi_index'])
>>> while not it.finished:
...     if it.iterindex == 0:
...         print("index     | pvalue(B) |   α(B)   | reject B | order(HB) |"
...               f"   α(HB)  | reject HB\n{'-':->79}")
...     print(f"{it.multi_index} |  {it[0]:.2e} | {it[1]:.2e} | {it[2]!s:^7}  "
...           f"| {it[3]:^7}   | {it[4]:.2e} |   {it[5]!s:<}")
...     _ = it.iternext()
index     | pvalue(B) |   α(B)   | reject B | order(HB) |   α(HB)  | reject HB
-------------------------------------------------------------------------------
(0, 0, 0) |  6.55e-01 | 4.17e-03 |  False   |    4      | 1.25e-02 |   False
(0, 0, 1) |  2.61e-02 | 4.17e-03 |  False   |    2      | 6.25e-03 |   False
(0, 0, 2) |  1.01e-06 | 4.17e-03 |  True    |    0      | 4.17e-03 |   True
(0, 1, 0) |  8.23e-01 | 4.17e-03 |  False   |    5      | 2.50e-02 |   False
(0, 1, 1) |  8.66e-03 | 4.17e-03 |  False   |    1      | 5.00e-03 |   False
(0, 1, 2) |  3.49e-01 | 4.17e-03 |  False   |    3      | 8.33e-03 |   False

The first index corresponds to the index of the compared dataset in the
``datasets`` container in the input test (compared to ``dsref``). The next ones
correspond to the shape of the arrays.

The sorting order for Holm-Bonferroni considers all elements as equivalent (not
done per dimension).
'''

import numpy as np
from ..test import Test, TestResult


[docs]class TestResultBonferroni(TestResult): '''Test result from Bonferroni correction. The test returns ``True`` if the test is successful (null hypothesis was accepted) and ``False`` if it is rejected. '''
[docs] def __init__(self, test, first_test_res, rejected_null_hyp): '''Initialisation of :class:`~.TestResultBonferroni` :param test: the used Bonferroni test :type test: :class:`~.TestBonferroni` :param first_test_res: the test result used to obtain the p-values :type first_test_res: :class:`~valjean.gavroche.test.TestResult` child class :param reject_hyp: rejection of the null hypothesis for all bins :type reject_hyp: :class:`numpy.ndarray` (:obj:`bool`) ''' super().__init__(test) self.first_test_res = first_test_res self.rejected_null_hyp = rejected_null_hyp
[docs] def oracles(self): '''Final test (on list of tests, espacially in case of mulitple tests on common reference). :returns: list(bool) ''' return [not np.any(rnh) for rnh in self.rejected_null_hyp]
def __bool__(self): return not np.any(self.rejected_null_hyp) @property def nb_rejected(self): '''Number of rejected null hypothesis according Holm-Bonferroni test. :returns: list(int) or :class:`list` (:obj:`numpy.generic` (:obj:`int`)) ''' return [np.count_nonzero(rnh) for rnh in self.rejected_null_hyp] @property def rejected_proportion(self): '''Rejected proportion in percentage. :returns: :class:`list` (:class:`numpy.generic` (:obj:`float`)) ''' return [nr / rnh.size * 100 for nr, rnh in zip(self.nb_rejected, self.rejected_null_hyp)]
[docs]class TestBonferroni(Test): '''Bonferroni correction for multiple tests of the same hypothesis.'''
[docs] def __init__(self, *, name, description='', labels=None, test, alpha=0.01): '''Initialisation of :class:`TestBonferroni`. :param str name: local name of the test :param str description: specific description of the test :param dict labels: labels to be used for test classification in reports (for example category, input file name, type of result, ...) :param test: test from which pvalues will be extracted :type test: :class:`valjean.gavroche.test.Test` child class :param float alpha: required significance level The significance level is considered two-sided here, so divide by 2. It is the significance level for the whole test. ''' super().__init__(name=name, description=description, labels=labels) self.test = test self.alpha = np.float_(alpha / 2) self.bonf_signi_level = self.alpha / self.ntests
@property def ntests(self): '''Returns the number of hypotheses to test, i.e. number of bins here. ''' return self.test.dsref.size
[docs] @staticmethod def bonferroni_correction(pvalues, bonf_signi_level): '''Bonferroni correction. :param pvalues: p-values from the original test :type pvalues: :obj:`numpy.ndarray` (:obj:`float`) :param float bonf_signi_level: significance level for each test (α weighted by the number of tests) :returns: :obj:`numpy.ndarray` (:obj:`bool`) Compares the p-value to the "Bonferroni significance level": * if lower, null hypothesis is rejected for the bin * if higher, null hypothesis is accepted for the bin. ''' reject = pvalues <= bonf_signi_level return reject
[docs] def evaluate(self): '''Evaluate the Bonferroni correction. :returns: :class:`~TestResultBonferroni` ''' test_res = self.test.evaluate() null_hyp_reject = [ self.bonferroni_correction(pval, self.bonf_signi_level) for pval in test_res.pvalue] return TestResultBonferroni(self, test_res, null_hyp_reject)
[docs] def data(self): '''Generator yielding objects supporting the buffer protocol that (as a whole) represent a serialized version of `self`.''' yield from super().data() yield self.__class__.__name__.encode('utf-8') yield from self.test.data() yield float(self.alpha).hex().encode('utf-8')
[docs]class TestResultHolmBonferroni(TestResult): '''Result from the Holm-Bonferroni method.'''
[docs] def __init__(self, test, first_test_res, alphas_i, rejected_null_hyp): '''Initialisation of :class:`~.TestResultHolmBonferroni` :param test: the used Holm-Bonferroni test :type test: :class:`~TestHolmBonferroni` :param first_test_res: the test result used to obtain the p-values :type first_test_res: :class:`~valjean.gavroche.test.TestResult` child class :param sorted_indices: indices of the p-values sorted to get p-values in increasing order :type sorted_indices: :obj:`numpy.generic` (:obj:`int`) :param alphas_i: significance levels for sorted pvalues :type alphas_i: :obj:`numpy.ndarray` (:obj:`float`) :param rejected_hyp: rejection of the null hypothesis for all bins :type rejected_hyp: :obj:`numpy.ndarray` (:obj:`bool`) ''' super().__init__(test) self.alphas_i = alphas_i self.rejected_null_hyp = rejected_null_hyp self.first_test_res = first_test_res
[docs] def oracles(self): '''Final test (on list of tests, espacially in case of mulitple tests on common reference). :returns: list(bool) ''' return [not np.any(rnh) for rnh in self.rejected_null_hyp]
def __bool__(self): return not np.any(self.rejected_null_hyp) @property def nb_rejected(self): '''Number of rejected null hypothesis according Holm-Bonferroni test. :returns: list(int) or :class:`list` (:obj:`numpy.generic` (:obj:`int`)) ''' return [np.count_nonzero(rnh) for rnh in self.rejected_null_hyp] @property def rejected_proportion(self): '''Rejected proportion in percentage. :returns: :class:`list` (:class:`numpy.generic` (:obj:`float`)) ''' return [nr / rnh.size * 100 for nr, rnh in zip(self.nb_rejected, self.rejected_null_hyp)] @property def sort_ordering(self): '''Get the sorted pvalues. :returns: :class:`list` (:class:`numpy.generic` (:obj:`int`)) ''' sort_inds = [np.argsort(pvals.flatten()) for pvals in self.first_test_res.pvalue] inv_sort = [np.argsort(si) for si in sort_inds] return [iv.reshape(pvals.shape) for pvals in self.first_test_res.pvalue for iv in inv_sort]
[docs]class TestHolmBonferroni(Test): '''Holm-Bonferroni method for multiple tests of the same hypothesis.'''
[docs] def __init__(self, *, name, description='', labels=None, test, alpha=0.01): '''Initialisation of :class:`TestHolmBonferroni`. :param str name: local name of the test :param str description: specific description of the test :param dict labels: labels to be used for test classification in reports (for example category, jdd name, type of result, ...) :param test: test from which pvalues will be extracted :type test: :class:`valjean.gavroche.test.Test` child class :param float alpha: significance level The significance level is also considered two-sided here, so divide by 2. This is the overall significance level. ''' super().__init__(name=name, description=description, labels=labels) self.test = test self.alpha = alpha / 2
@property def ntests(self): '''Returns the number of hypotheses to test, i.e. number of bins here. ''' return self.test.dsref.size
[docs] @staticmethod def holm_bonferroni_method(pvalues, alpha): '''Holm-Bonferroni method. :param pvalues: array of pvalues :type pvalues: :obj:`numpy.ndarray` :param float alpha: significance level chosen for the Holm-Bonferroni method :returns: sorted indices, array of the bins significance level, array of rejection of the null hypothesis ''' flat_pvals = pvalues.flatten() sorted_inds = np.argsort(flat_pvals) alpha_i, rejected_hyp = [], [] for _i, pval in enumerate(flat_pvals[sorted_inds]): denom = flat_pvals.size - (_i+1) + 1 alpha_i.append(alpha / denom) rejected_hyp.append(pval < alpha_i[-1]) inv_sort = np.argsort(sorted_inds) return (np.array(alpha_i)[inv_sort].reshape(pvalues.shape), np.array(rejected_hyp)[inv_sort].reshape(pvalues.shape))
[docs] def evaluate(self): '''Evaluation of the used test and of the Holm-Bonferroni method. :returns: :class:`~.TestResultHolmBonferroni` ''' test_res = self.test.evaluate() hb_res = [self.holm_bonferroni_method(pvals, self.alpha) for pvals in test_res.pvalue] return TestResultHolmBonferroni(self, test_res, *zip(*hb_res))
[docs] def data(self): '''Generator yielding objects supporting the buffer protocol that (as a whole) represent a serialized version of `self`.''' yield from super().data() yield self.__class__.__name__.encode('utf-8') yield from self.test.data() yield float(self.alpha).hex().encode('utf-8')