The nipy.algorithms.statistics.empirical_pvalue module contains a class that fits a Gaussian model to the central part of a histogram, following Schwartzman et al., 2009. This is typically necessary to estimate a false discovery rate (FDR) when one is not certain that the data behave as a standard normal under H_0.
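To make the idea of fitting a Gaussian to the central part of a histogram concrete, here is a minimal sketch using only the standard library. It is not nipy's implementation and does not reproduce the estimator of Schwartzman et al.; it is a simplified stand-in that relies on the fact that the log of a Gaussian density is a parabola, and fits that parabola by least squares to the log-counts of the central bins. The names fit_central_gaussian and det3, the central window [-1, 1) and the number of bins are all hypothetical choices made for illustration.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_central_gaussian(data, lo=-1.0, hi=1.0, n_bins=10):
    """Estimate (mu, sigma) from the histogram of data restricted to [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lo <= v < hi:
            # min() guards against a rounding error on the last bin edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    # Keep non-empty bins only, since log(0) is undefined.
    points = [(lo + (i + 0.5) * width, math.log(c))
              for i, c in enumerate(counts) if c > 0]
    if len(points) < 3:
        raise ValueError("need at least three non-empty central bins")
    # Normal equations for least squares on log(count) = a*x**2 + b*x + c.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    # Cramer's rule for the two coefficients that determine mu and sigma.
    a = det3([[rhs[i], m[i][1], m[i][2]] for i in range(3)]) / d
    b = det3([[m[i][0], rhs[i], m[i][2]] for i in range(3)]) / d
    if a >= 0:
        raise ValueError("central histogram is not concave; no Gaussian fit")
    # For a Gaussian, a = -1 / (2 * sigma**2) and b = mu / sigma**2.
    return -b / (2 * a), math.sqrt(-1 / (2 * a))


# Two-component mixture: a narrow central Gaussian and a wide one.
random.seed(0)
data = ([random.gauss(0, 1) for _ in range(10000)]
        + [random.gauss(0, 4) for _ in range(10000)])
mu, sigma = fit_central_gaussian(data)
print("estimated null: mu=%.2f, sigma=%.2f" % (mu, sigma))
```

Because the wide component also contributes values near zero, the fitted sigma should come out somewhat above 1, yet far below the standard deviation of the full sample. A narrower central window reduces this contamination at the cost of a noisier fit.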
The NormalEmpiricalNull class learns its null distribution from the data provided at initialisation. Two different methods can be used to set a threshold from the null distribution: the NormalEmpiricalNull.threshold() method returns the threshold for a given false discovery rate, and thus accounts for multiple comparisons within the given dataset; the NormalEmpiricalNull.uncorrected_threshold() method returns the threshold for a given uncorrected p-value, and as such does not account for multiple comparisons.
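The difference between the two kinds of threshold can be illustrated with a short standard-library sketch. It is not nipy's code: it assumes that the null is known to be a standard normal rather than learned from the data, it uses a one-sided test, and it estimates the false discovery rate with a Benjamini-Hochberg style step-up rule in which all values are conservatively counted as potential nulls. The function names are hypothetical, and the numbers it prints are not expected to match those returned by NormalEmpiricalNull.

```python
import math
import random
from statistics import NormalDist


def uncorrected_threshold(null, alpha):
    """One-sided threshold t such that P(X > t) = alpha under the null."""
    return null.inv_cdf(1.0 - alpha)


def fdr_threshold(values, null, alpha):
    """Smallest observed value whose estimated FDR is at most alpha.

    For a candidate threshold t, the expected number of null values
    above t is n * P(X >= t), and the observed number is the rank of t
    in the sample sorted in decreasing order.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    threshold = math.inf  # returned when no value passes
    for rank, t in enumerate(ordered, start=1):
        # Upper tail probability, written via symmetry for accuracy.
        tail = null.cdf(2 * null.mean - t)
        if n * tail / rank <= alpha:
            threshold = t  # keep the last (smallest) passing value
    return threshold


# Same kind of mixture as in the example: a narrow and a wide Gaussian.
random.seed(0)
values = ([random.gauss(0, 1) for _ in range(10000)]
          + [random.gauss(0, 4) for _ in range(10000)])
null = NormalDist(0, 1)  # assumed known here, not estimated
print("uncorrected:", uncorrected_threshold(null, 0.05))
print("FDR-based:  ", fdr_threshold(values, null, 0.05))
```

The uncorrected threshold depends only on the null distribution, whereas the FDR-based threshold also depends on how many values in the dataset exceed each candidate threshold, which is how it accounts for multiple comparisons.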
If we apply the empirical normal null estimator to a mixture of two Gaussians, a narrow central one and a wide one, it uses the central distribution as the null hypothesis and returns the threshold above which the data can be claimed to belong to the wide Gaussian:
# emacs: -*- mode: python; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
import numpy as np
from nipy.algorithms.statistics.empirical_pvalue import NormalEmpiricalNull
# size must be an integer: a float such as 1e4 raises a TypeError in recent NumPy
x = np.c_[np.random.normal(size=10000),
          np.random.normal(scale=4, size=10000)]
enn = NormalEmpiricalNull(x)
enn.threshold(verbose=True)
The threshold evaluated with the NormalEmpiricalNull.threshold() method is around 2.8 (using the default p-value of 0.05). The NormalEmpiricalNull.uncorrected_threshold() method returns, for the same p-value, a threshold of 1.9. Because the uncorrected threshold does not account for multiple comparisons, a more stringent (smaller) p-value is needed to obtain a comparably conservative threshold.