Source code for valjean.gavroche.stat_tests.bonferroni

# Copyright French Alternative Energies and Atomic Energy Commission
# Contributors: valjean developers
# valjean-support@cea.fr
#
# This software is a computer program whose purpose is to analyze and
# post-process numerical simulation results.
#
# This software is governed by the CeCILL license under French law and abiding
# by the rules of distribution of free software. You can use, modify and/ or
# redistribute the software under the terms of the CeCILL license as circulated
# by CEA, CNRS and INRIA at the following URL: http://www.cecill.info.
#
# As a counterpart to the access to the source code and rights to copy, modify
# and redistribute granted by the license, users are provided only with a
# limited warranty and the software's author, the holder of the economic
# rights, and the successive licensors have only limited liability.
#
# In this respect, the user's attention is drawn to the risks associated with
# loading, using, modifying and/or developing or reproducing the software by
# the user in light of its specific status of free software, that may mean that
# it is complicated to manipulate, and that also therefore means that it is
# reserved for developers and experienced professionals having in-depth
# computer knowledge. Users are therefore encouraged to load and test the
# software's suitability as regards their requirements in conditions enabling
# the security of their systems and/or data to be ensured and, more generally,
# to use and operate it in the same conditions as regards security.
#
# The fact that you are presently reading this means that you have had
# knowledge of the CeCILL license and that you accept its terms.

r'''
.. _family-wise: https://en.wikipedia.org/wiki/Family-wise_error_rate
.. _Bonferroni: https://en.wikipedia.org/wiki/Bonferroni_correction
.. _Holm-Bonferroni: https://en.wikipedia.org/wiki/Holm-Bonferroni_method

In some cases we would like to interpret not only one test (like on a generic
score in Tripoli-4) but multiple ones (in a spectrum case for example). Each
individual test is supposed to be successful but in most of the case we can
accept that some of them fail (in a given acceptance of course).

The Bonferroni correction and the Holm-Bonferroni method allow to treat such
cases. See for example `family-wise`_ error rate Wikipedia webpage for more
details on this kind    of tests.

Here are implemented two tests based on comparison of the p-value from the
chosen test and a significance level:

* Bonferroni correction: all individual p-values are compared to the same
  significance level, depending on the number of tests;
* Holm-Bonferroni method: p-values are first sorted then compared to different
  significance levels depending on number of tests and rank.


In both cases a first test is chosen, like Student test, to evaluate the
individual hypothesis. The p-value of the test is calculated and used in the
following tests. Like in other tests, we consider two-sided significance levels
here.

The null hypothesis is that the p-value is higher than a given significance
level, that depends on the overall required significance level, the number of
tests considered and the chosen method (Bonferroni or Holm-Bonferroni).


Bonferroni correction
`````````````````````

The Bonferroni_ correction is used for multiple comparisons. It is based on a
weighted comparison to the significance level α.

If we have :math:`m` different tests (for example a spectrum with :math:`m`
bins), the significance level α will be weighted by :math:`m`. This p-value of
the :math:`m` tests will then be compared to that value. As a consequence, the
null hypothesis will be rejected if

.. math::

    p_k \leq \frac{\alpha}{m}

The  individual significance level is usually largely smaller than the required
one and the sum of them obvioulsy give α.

Conclusion of the test depends on what we want:

* it can be a simple ``True`` if all hypothesis are accepted;
* it can be ``False`` if at least one null hypothesis was rejected;
* we can also choose a level of acceptance, for example "it is fine if 1 % of
  the null hypothesis are rejected"

Per default, in our case the test is rejected if there is at least one null
hypothesis rejected.


Holm-Bonferroni method
``````````````````````

The Holm-Bonferroni_ method is a variation of the Bonferroni correction.

The p-value from the chosen test are first sorted from lower to higher value.
Then each of them are compared to the significance level weighted by
:math:`m - k + 1` with :math:`m` the number of tests and :math:`k` the rank
(starting at 1). The null hypothesis is rejected if

.. math::

    p_k \leq \frac{\alpha}{m - k + 1}.

The test can be stopped at the first :math:`k` accepting the null hypothesis.

This test is quite conservative. It can be possible to get the proportion of
accepted and rejected hypothesis, like in the Bonferroni correction case.


Code implementation
```````````````````

In both cases, the test is initiliazed with another test, a Student test for
example. This test can be evaluated inside the new test
:meth:`~TestBonferroni.evaluate` method or outside of it. Only examples with
the full structure will be shown here.

TestResults are returned as ``False`` if one null hypothesis is rejected.
Access to the initial test is still provided.


Examples
````````

Let's do the usual imports then use the quick example from Student test, with
the p-value calculation:

>>> import numpy as np
>>> from valjean.eponine.dataset import Dataset
>>> from valjean.gavroche.stat_tests.student import TestStudent
>>> from valjean.gavroche.stat_tests.bonferroni import TestBonferroni
>>> from valjean.gavroche.stat_tests.bonferroni import TestHolmBonferroni


Successful test with Bonferroni correction and Holm-Bonferroni method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds1 = Dataset(np.array([5.2, 5.3, 5.25, 5.4, 5.5]),
...               np.array([0.2, 0.25, 0.1, 0.2, 0.3]))
>>> ds2 = Dataset(np.array([5.1, 5.6, 5.2, 5.3, 5.2]),
...               np.array([0.1, 0.3, 0.05, 0.4, 0.3]))
>>> tstudent_12 = TestStudent(ds1, ds2, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_12, alpha=0.05)
>>> tbonf.ntests
5
>>> tbonf.ntests == ds1.size
True
>>> tbonf.bonf_signi_level
0.005
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
True
>>> tbonf_res.nb_rejected
[0]

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_12, alpha=0.05)
>>> tholmbonf.alpha  # alpha is two-sided
0.025
>>> thb_res = tholmbonf.evaluate()
>>> bool(thb_res)
True
>>> thb_res.nb_rejected
[0]


Failing test with Bonferroni correction and Holm-Bonferroni method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds3 = Dataset(np.array([5.1, 5.9, 5.8, 5.3, 4.6]),
...               np.array([0.1, 0.1, 0.05, 0.4, 0.12]))
>>> tstudent_13 = TestStudent(ds1, ds3, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_13, alpha=0.05)
>>> tbonf.ntests
5
>>> tbonf.bonf_signi_level
0.005
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
False
>>> tbonf_res.nb_rejected
[1]
>>> tbonf_res.rejected_proportion  # in percentage
[20.0]

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_13, alpha=0.05)
>>> tholmbonf.alpha  # alpha is two-sided
0.025
>>> thb_res = tholmbonf.evaluate()
>>> bool(thb_res)
False
>>> thb_res.nb_rejected
[2]
>>> thb_res.rejected_proportion  # in percentage
[40.0]

For an easier comparison, p-values, associated significance levels and the
corresponding result from the test (rejection or not) can be printed. The
sorting order is also given for the Holm-Bonferroni method.

>>> it = np.nditer([thb_res.first_test_res.pvalue,
...                 tbonf.bonf_signi_level, tbonf_res.rejected_null_hyp,
...                 thb_res.sort_ordering, thb_res.alphas_i,
...                 thb_res.rejected_null_hyp],
...                flags=['multi_index'])
>>> while not it.finished:
...     if it.iterindex == 0:
...         print("index  | pvalue(B) |   α(B)    |  reject B | order(HB) |"
...               f"   α(HB)   | reject HB\n{'-':->77}")
...     print(f"{it.multi_index} | {it[0]:.3e} | {it[1]:.3e} | {it[2]!s:^9} | "
...           f"{it[3]:^7}   | {it[4]:.3e} |   {it[5]!s:<}")
...     _ = it.iternext()
index  | pvalue(B) |   α(B)    |  reject B | order(HB) |   α(HB)   | reject HB
-----------------------------------------------------------------------------
(0, 0) | 6.548e-01 | 5.000e-03 |   False   |    3      | 1.250e-02 |   False
(0, 1) | 2.608e-02 | 5.000e-03 |   False   |    2      | 8.333e-03 |   False
(0, 2) | 1.015e-06 | 5.000e-03 |   True    |    0      | 5.000e-03 |   True
(0, 3) | 8.231e-01 | 5.000e-03 |   False   |    4      | 2.500e-02 |   False
(0, 4) | 5.447e-03 | 5.000e-03 |   False   |    1      | 6.250e-03 |   True


The first index corresponds to the index of the compared dataset in the
``datasets`` container in the input test (compared to ``dsref``).

The Holm-Bonferroni test is uniformly more powerful than the Bonferroni test.
In this particular case, Holm-Bonferroni is able to reject an additional null
hypothesis with respect to plain Bonferroni.


Tests with multi-dimension datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> ds4 = Dataset(np.array([[5.2, 5.3, 5.25], [5.4, 5.5, 5.2]]),
...               np.array([[0.2, 0.25, 0.1], [0.2, 0.3, 0.25]]))
>>> ds5 = Dataset(np.array([[5.1, 5.9, 5.8], [5.3, 4.7, 5.5]]),
...               np.array([[0.1, 0.1, 0.05], [0.4, 0.05, 0.2]]))
>>> ds4.shape
(2, 3)
>>> ds4.size
6
>>> tstudent_45 = TestStudent(ds4, ds5, name="comp",
...                           description="Comparison using Student test",
...                           alpha=0.05, ndf=1000)

>>> tbonf = TestBonferroni(name="bonf", description="Bonferroni correction",
...                        test=tstudent_45, alpha=0.05)
>>> tbonf.ntests
6
>>> print(f'{tbonf.bonf_signi_level:.6f}')
0.004167
>>> tbonf_res = tbonf.evaluate()
>>> bool(tbonf_res)
False
>>> tbonf_res.nb_rejected
[1]
>>> print(f'{tbonf_res.rejected_proportion[0]:.3f} %')
16.667 %

>>> tholmbonf = TestHolmBonferroni(name="holm-bonf",
...                                description="Holm-Bonferroni method",
...                                test=tstudent_45, alpha=0.05)
>>> thb_res = tholmbonf.evaluate()
>>> tholmbonf.alpha
0.025
>>> bool(thb_res)
False
>>> thb_res.nb_rejected
[1]
>>> print(f'{thb_res.rejected_proportion[0]:.3f} %')
16.667 %

The detailed comparison also works for multi-dimensions arrays:

>>> it = np.nditer([thb_res.first_test_res.pvalue, tbonf.bonf_signi_level,
...                 tbonf_res.rejected_null_hyp, thb_res.sort_ordering,
...                 thb_res.alphas_i, thb_res.rejected_null_hyp],
...                flags=['multi_index'])
>>> while not it.finished:
...     if it.iterindex == 0:
...         print("index     | pvalue(B) |   α(B)   | reject B | order(HB) |"
...               f"   α(HB)  | reject HB\n{'-':->79}")
...     print(f"{it.multi_index} |  {it[0]:.2e} | {it[1]:.2e} | {it[2]!s:^7}  "
...           f"| {it[3]:^7}   | {it[4]:.2e} |   {it[5]!s:<}")
...     _ = it.iternext()
index     | pvalue(B) |   α(B)   | reject B | order(HB) |   α(HB)  | reject HB
-------------------------------------------------------------------------------
(0, 0, 0) |  6.55e-01 | 4.17e-03 |  False   |    4      | 1.25e-02 |   False
(0, 0, 1) |  2.61e-02 | 4.17e-03 |  False   |    2      | 6.25e-03 |   False
(0, 0, 2) |  1.01e-06 | 4.17e-03 |  True    |    0      | 4.17e-03 |   True
(0, 1, 0) |  8.23e-01 | 4.17e-03 |  False   |    5      | 2.50e-02 |   False
(0, 1, 1) |  8.66e-03 | 4.17e-03 |  False   |    1      | 5.00e-03 |   False
(0, 1, 2) |  3.49e-01 | 4.17e-03 |  False   |    3      | 8.33e-03 |   False

The first index corresponds to the index of the compared dataset in the
``datasets`` container in the input test (compared to ``dsref``). The next ones
correspond to the shape of the arrays.

The sorting order for Holm-Bonferroni considers all elements as equivalent (not
done per dimension).
'''

import numpy as np
from ..test import Test, TestResult


[docs]class TestResultBonferroni(TestResult):
    '''Test result from Bonferroni correction.

    The test returns ``True`` if the test is successful (null hypothesis was
    accepted) and ``False`` if it is rejected.
    '''

[docs]    def __init__(self, test, first_test_res, rejected_null_hyp):
        '''Initialisation of :class:`~.TestResultBonferroni`

        :param test: the used Bonferroni test
        :type test: :class:`~.TestBonferroni`
        :param first_test_res: the test result used to obtain the p-values
        :type first_test_res: :class:`~valjean.gavroche.test.TestResult`
                              child class
        :param reject_hyp: rejection of the null hypothesis for all bins
        :type reject_hyp: :class:`numpy.ndarray` (:obj:`bool`)
        '''
        super().__init__(test)
        self.first_test_res = first_test_res
        self.rejected_null_hyp = rejected_null_hyp

[docs]    def oracles(self):
        '''Final test (on list of tests, espacially in case of mulitple tests
        on common reference).

        :returns: list(bool)
        '''
        return [not np.any(rnh) for rnh in self.rejected_null_hyp]

    def __bool__(self):
        return not np.any(self.rejected_null_hyp)

    @property
    def nb_rejected(self):
        '''Number of rejected null hypothesis according Holm-Bonferroni test.

        :returns: list(int) or
            :class:`list` (:obj:`numpy.generic` (:obj:`int`))
        '''
        return [np.count_nonzero(rnh) for rnh in self.rejected_null_hyp]

    @property
    def rejected_proportion(self):
        '''Rejected proportion in percentage.

        :returns: :class:`list` (:class:`numpy.generic` (:obj:`float`))
        '''
        return [nr / rnh.size * 100
                for nr, rnh in zip(self.nb_rejected, self.rejected_null_hyp)]


[docs]class TestBonferroni(Test):
    '''Bonferroni correction for multiple tests of the same hypothesis.'''

[docs]    def __init__(self, *, name, description='', labels=None, test, alpha=0.01):
        '''Initialisation of :class:`TestBonferroni`.


        :param str name: local name of the test
        :param str description: specific description of the test
        :param dict labels: labels to be used for test classification in
                            reports (for example category, input file name,
                            type of result, ...)
        :param test: test from which pvalues will be extracted
        :type test: :class:`valjean.gavroche.test.Test` child class
        :param float alpha: required significance level

        The significance level is considered two-sided here, so divide by 2.
        It is the significance level for the whole test.
        '''
        super().__init__(name=name, description=description, labels=labels)
        self.test = test
        self.alpha = np.float_(alpha / 2)
        self.bonf_signi_level = self.alpha / self.ntests

    @property
    def ntests(self):
        '''Returns the number of hypotheses to test, i.e. number of bins here.
        '''
        return self.test.dsref.size

[docs]    @staticmethod
    def bonferroni_correction(pvalues, bonf_signi_level):
        '''Bonferroni correction.

        :param pvalues: p-values from the original test
        :type pvalues: :obj:`numpy.ndarray` (:obj:`float`)
        :param float bonf_signi_level: significance level for each test (α
                                       weighted by the number of tests)
        :returns: :obj:`numpy.ndarray` (:obj:`bool`)

        Compares the p-value to the "Bonferroni significance level":

        * if lower, null hypothesis is rejected for the bin
        * if higher, null hypothesis is accepted for the bin.
        '''
        reject = pvalues <= bonf_signi_level
        return reject

[docs]    def evaluate(self):
        '''Evaluate the Bonferroni correction.

        :returns: :class:`~TestResultBonferroni`
        '''
        test_res = self.test.evaluate()
        null_hyp_reject = [
            self.bonferroni_correction(pval, self.bonf_signi_level)
            for pval in test_res.pvalue]
        return TestResultBonferroni(self, test_res, null_hyp_reject)

[docs]    def data(self):
        '''Generator yielding objects supporting the buffer protocol that (as a
        whole) represent a serialized version of `self`.'''
        yield from super().data()
        yield self.__class__.__name__.encode('utf-8')
        yield from self.test.data()
        yield float(self.alpha).hex().encode('utf-8')


[docs]class TestResultHolmBonferroni(TestResult):
    '''Result from the Holm-Bonferroni method.'''

[docs]    def __init__(self, test, first_test_res, alphas_i, rejected_null_hyp):
        '''Initialisation of :class:`~.TestResultHolmBonferroni`

        :param test: the used Holm-Bonferroni test
        :type test: :class:`~TestHolmBonferroni`
        :param first_test_res: the test result used to obtain the p-values
        :type first_test_res: :class:`~valjean.gavroche.test.TestResult`
                              child class
        :param sorted_indices: indices of the p-values sorted to get p-values
                               in increasing order
        :type sorted_indices: :obj:`numpy.generic` (:obj:`int`)
        :param alphas_i: significance levels for sorted pvalues
        :type alphas_i: :obj:`numpy.ndarray` (:obj:`float`)
        :param rejected_hyp: rejection of the null hypothesis for all bins
        :type rejected_hyp: :obj:`numpy.ndarray` (:obj:`bool`)
        '''
        super().__init__(test)
        self.alphas_i = alphas_i
        self.rejected_null_hyp = rejected_null_hyp
        self.first_test_res = first_test_res

[docs]    def oracles(self):
        '''Final test (on list of tests, espacially in case of mulitple tests
        on common reference).

        :returns: list(bool)
        '''
        return [not np.any(rnh) for rnh in self.rejected_null_hyp]

    def __bool__(self):
        return not np.any(self.rejected_null_hyp)

    @property
    def nb_rejected(self):
        '''Number of rejected null hypothesis according Holm-Bonferroni test.

        :returns: list(int) or
            :class:`list` (:obj:`numpy.generic` (:obj:`int`))
        '''
        return [np.count_nonzero(rnh) for rnh in self.rejected_null_hyp]

    @property
    def rejected_proportion(self):
        '''Rejected proportion in percentage.

        :returns: :class:`list` (:class:`numpy.generic` (:obj:`float`))
        '''
        return [nr / rnh.size * 100
                for nr, rnh in zip(self.nb_rejected, self.rejected_null_hyp)]

    @property
    def sort_ordering(self):
        '''Get the sorted pvalues.

        :returns: :class:`list` (:class:`numpy.generic` (:obj:`int`))
        '''
        sort_inds = [np.argsort(pvals.flatten())
                     for pvals in self.first_test_res.pvalue]
        inv_sort = [np.argsort(si) for si in sort_inds]
        return [iv.reshape(pvals.shape)
                for pvals in self.first_test_res.pvalue
                for iv in inv_sort]


[docs]class TestHolmBonferroni(Test):
    '''Holm-Bonferroni method for multiple tests of the same hypothesis.'''

[docs]    def __init__(self, *, name, description='', labels=None, test, alpha=0.01):
        '''Initialisation of :class:`TestHolmBonferroni`.

        :param str name: local name of the test
        :param str description: specific description of the test
        :param dict labels: labels to be used for test classification in
                            reports (for example category, jdd name, type of
                            result, ...)
        :param test: test from which pvalues will be extracted
        :type test: :class:`valjean.gavroche.test.Test` child class
        :param float alpha: significance level

        The significance level is also considered two-sided here, so divide by
        2. This is the overall significance level.
        '''
        super().__init__(name=name, description=description, labels=labels)
        self.test = test
        self.alpha = alpha / 2

    @property
    def ntests(self):
        '''Returns the number of hypotheses to test, i.e. number of bins here.
        '''
        return self.test.dsref.size

[docs]    @staticmethod
    def holm_bonferroni_method(pvalues, alpha):
        '''Holm-Bonferroni method.

        :param pvalues: array of pvalues
        :type pvalues: :obj:`numpy.ndarray`
        :param float alpha: significance level chosen for the
                                         Holm-Bonferroni method
        :returns: sorted indices, array of the bins significance level, array
                  of rejection of the null hypothesis
        '''
        flat_pvals = pvalues.flatten()
        sorted_inds = np.argsort(flat_pvals)
        alpha_i, rejected_hyp = [], []
        for _i, pval in enumerate(flat_pvals[sorted_inds]):
            denom = flat_pvals.size - (_i+1) + 1
            alpha_i.append(alpha / denom)
            rejected_hyp.append(pval < alpha_i[-1])
        inv_sort = np.argsort(sorted_inds)
        return (np.array(alpha_i)[inv_sort].reshape(pvalues.shape),
                np.array(rejected_hyp)[inv_sort].reshape(pvalues.shape))

[docs]    def evaluate(self):
        '''Evaluation of the used test and of the Holm-Bonferroni method.

        :returns: :class:`~.TestResultHolmBonferroni`
        '''
        test_res = self.test.evaluate()
        hb_res = [self.holm_bonferroni_method(pvals, self.alpha)
                  for pvals in test_res.pvalue]
        return TestResultHolmBonferroni(self, test_res, *zip(*hb_res))

[docs]    def data(self):
        '''Generator yielding objects supporting the buffer protocol that (as a
        whole) represent a serialized version of `self`.'''
        yield from super().data()
        yield self.__class__.__name__.encode('utf-8')
        yield from self.test.data()
        yield float(self.alpha).hex().encode('utf-8')