mstats_tests

Module Information

Module Group

src/stats1

Project Stage ID

42

Purpose

The purpose of this module is to let the user run the data sample comparison tests available in the statsmodels and scipy.stats libraries. The module takes lists (and, where required, a target parameter value) as input for statistical hypothesis testing; it is not aimed at dataframe-based data

Module Files

Here are the locations of the relevant files associated with the module

Activation Functions

A list of all available activation functions in the module mstats_tests

  •   its_ttest


    data: [list,list] targ:None

    Independent two-sample Student's t-test: This test is used to compare the means of two independent samples. It assumes that the data are (normally distributed) and that the (variances of the two groups are equal)

  •   paired_ttest


    data: [list,list] targ:None

    A paired Student's t-test is a statistical test used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.

  •   os_ttest


    data: list targ:popmean

    A one sample Student's t-test is a statistical test used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.

  •   utest


    data: [list,list] targ:None

    The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test

  •   kstest_twosample


    data: [list,list] targ:None

    The two-sample Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether two samples come from the same distribution. It compares the empirical cumulative distribution functions (ECDFs) of the two samples

  •   kstest_onesample_normal


    data: list targ:None

    The Kolmogorov-Smirnov test for a normal distribution is a statistical test that determines whether a sample of data comes from a normal distribution

  •   kstest_onesample_uniform


    data: list targ:None

    The Kolmogorov-Smirnov test for a uniform distribution is a statistical test that determines whether a sample of data comes from a uniform distribution

  •   kstest_onesample_exponential


    data: list targ:None

    The Kolmogorov-Smirnov test for an exponential distribution is a statistical test that determines whether a sample of data comes from an exponential distribution

  •   lilliefors_normal


    data: list targ:None

    The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, is a statistical test used to determine whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.

  •   shapirowilk_normal


    data: list targ:None

    The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution

  •   chi2_test


    data: [list,list] targ:None

    The chi-square goodness-of-fit test is a statistical test used to determine whether the observed frequencies of a categorical variable deviate significantly from the expected frequencies

  •   jarquebera_normal


    data: list targ:None

    The Jarque-Bera test is a statistical test used to determine whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data

  •   two_sample_anova


    data: [list,list] targ:None

    The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups

its_ttest

data: [list,list] targ:None

Independent two-sample Student's t-test: This test is used to compare the means of two independent samples. It assumes that the data are (normally distributed) and that the (variances of the two groups are equal)

Sample requests

To call the specific activation function, please make sure your base request contains:

  • two sample independent ttest
  • create a two sample independent ttest
  • students ttest
  • independent two sample ttest
  • compare means of two samples using independent ttest

code:

The function its_ttest takes in a dictionary containing two data sources, which must be specified by their reference names

# [independent two sample t-test]

# Student's t-test: This test is used to compare the means of (two independent samples) 
# It assumes that the data is (normally distributed) and that the (variances of the 
# two groups are equal)

def its_ttest(self,args:dict):

    statistic, p_value = stats.ttest_ind(args['data'][0], args['data'][1])

    print("T-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
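As a standalone usage sketch (illustrative sample values, assuming `scipy` is installed), the underlying `scipy.stats.ttest_ind` call behaves as follows:

```python
from scipy import stats

# Two illustrative independent samples (hypothetical measurements)
group_a = [4.1, 5.2, 6.3, 5.8, 4.9]
group_b = [7.0, 8.1, 7.5, 8.4, 7.9]

statistic, p_value = stats.ttest_ind(group_a, group_b)
print("T-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Here the group means clearly differ relative to the within-group spread, so the null hypothesis of equal means is rejected.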

paired_ttest

data: [list,list] targ:None

A paired Student's t-test is a statistical test used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.

code:

# [paired t-test]

# This test is used when you have paired or matched observations.
# It is used to determine if there is a significant difference between 
# the means of two related groups or conditions.

def paired_ttest(self,args:dict):

    print('[note] a paired two-sample t-test is used to compare the means of (two related groups)!')

    # Perform paired t-test
    statistic, p_value = stats.ttest_rel(args['data'][0], args['data'][1])

    print("T-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
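A minimal sketch of the underlying `scipy.stats.ttest_rel` call, with illustrative before/after measurements for the same five subjects (not data from the module):

```python
from scipy import stats

# Hypothetical paired measurements: same subjects before and after treatment
before = [72, 75, 78, 71, 74]
after  = [70, 72, 75, 70, 71]

statistic, p_value = stats.ttest_rel(before, after)
print("T-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Because every subject's value drops after treatment, the paired differences are consistently positive and the test rejects the null hypothesis of zero mean difference.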

os_ttest

data: list targ:popmean

A one sample Student's t-test is a statistical test used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.

code:

# [one sample t-test]

# This test is used when you want to compare the mean of a single group to a known population mean or a specific value.

def os_ttest(self,args:dict):

    if args['popmean'] is not None:

        # Perform one-sample t-test
        t_statistic, p_value = stats.ttest_1samp(args['data'], popmean=args['popmean'])

        print("t-statistic:", t_statistic)
        print("P-value:", p_value)

        # Compare p-value with alpha
        if p_value <= 0.05:
            print("Reject the null hypothesis")
        else:
            print("Fail to reject the null hypothesis")

    else:

        print('[note] please specify the population mean using popmean')
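A minimal sketch of the underlying `scipy.stats.ttest_1samp` call, using an illustrative sample whose mean equals the hypothesized population mean:

```python
from scipy import stats

# Hypothetical sample whose mean is exactly the hypothesized popmean
data = [9.8, 10.2, 10.1, 9.9, 10.0]
popmean = 10.0

t_statistic, p_value = stats.ttest_1samp(data, popmean=popmean)
print("t-statistic:", t_statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Since the sample mean coincides with `popmean`, the t-statistic is (numerically) zero and the test fails to reject the null hypothesis.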

utest

data: [list,list] targ:None

The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test

code:

# determine if there is a significant difference between the distributions

# A : [u-test]

# The [Mann-Whitney test], also known as the [Wilcoxon rank-sum test], 
# is a nonparametric statistical test used to determine whether there 
# is a significant difference between the distributions of two independent samples. 
# It is often used when the data does not meet the assumptions of parametric tests 
# like the t-test.

def utest(self,args:dict):

    # Perform Mann-Whitney U-test
    statistic, p_value = stats.mannwhitneyu(args['data'][0], args['data'][1])

    print("U-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
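A minimal sketch of the underlying `scipy.stats.mannwhitneyu` call, with two illustrative samples whose ranks do not overlap (the `alternative='two-sided'` argument is spelled out for clarity):

```python
from scipy import stats

# Hypothetical samples with completely separated ranks
sample_a = [1, 2, 3, 4, 5]
sample_b = [6, 7, 8, 9, 10]

statistic, p_value = stats.mannwhitneyu(sample_a, sample_b,
                                        alternative='two-sided')
print("U-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

With every value in `sample_a` below every value in `sample_b`, the U statistic for the first sample is 0 and the null hypothesis is rejected.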

kstest_twosample

data: [list,list] targ:None

The two-sample Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether two samples come from the same distribution. It compares the empirical cumulative distribution functions (ECDFs) of the two samples

code:

# [GENERAL] two-sample Kolmogorov-Smirnov test: do two samples come from the same distribution?

def kstest_twosample(self,args:dict):

    # Perform the KS test
    statistic, p_value = kstest(args['data'][0], args['data'][1])

    print('[KS] test whether two samples come from the same distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
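As a usage sketch with illustrative data: the same two-sample comparison can also be run via `scipy.stats.ks_2samp`, which is what `kstest` dispatches to when given a second sample:

```python
from scipy import stats

# Hypothetical samples drawn from clearly different ranges
sample_a = [0.1, 0.2, 0.3, 0.4, 0.5]
sample_b = [1.1, 1.2, 1.3, 1.4, 1.5]

statistic, p_value = stats.ks_2samp(sample_a, sample_b)
print("KS statistic:", statistic)
print("P-value:", p_value)
```

The two ECDFs never overlap, so the KS statistic takes its maximum value of 1.0 and the small p-value indicates the samples come from different distributions.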

kstest_onesample_normal

data: list targ:None

The Kolmogorov-Smirnov test for a normal distribution is a statistical test that determines whether a sample of data comes from a normal distribution

code:

# Perform Kolmogorov-Smirnov test for [normal] distribution

def kstest_onesample_normal(self,args:dict):

    statistic, p_value = kstest(args['data'], 'norm')

    print('[KS] test sample from (normal) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
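A usage sketch with illustrative data. Note that passing `'norm'` to `kstest` tests against the standard normal N(0, 1), so data on a different scale should be standardized first; unstandardized data far from zero is rejected regardless of its shape:

```python
from scipy import stats

# Hypothetical data far from N(0, 1): rejected because 'norm'
# means the *standard* normal, not a fitted normal
data = [100 + i for i in range(50)]

statistic, p_value = stats.kstest(data, 'norm')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

To test normality with estimated mean and variance, the Lilliefors test below is the more appropriate tool.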

kstest_onesample_uniform

data: list targ:None

The Kolmogorov-Smirnov test for a uniform distribution is a statistical test that determines whether a sample of data comes from a uniform distribution

code:

# Perform Kolmogorov-Smirnov test for [Uniform] distribution

def kstest_onesample_uniform(self,args:dict):

    statistic, p_value = kstest(args['data'], 'uniform')

    print('[KS] test sample from (uniform) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
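A usage sketch with illustrative data: `'uniform'` tests against the standard uniform distribution on [0, 1]. Evenly spaced points in that interval track the uniform CDF closely, so the test fails to reject:

```python
from scipy import stats

# Evenly spaced points in [0, 1]: 0.05, 0.15, ..., 0.95
data = [(2 * i - 1) / 20 for i in range(1, 11)]

statistic, p_value = stats.kstest(data, 'uniform')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

For these points the KS statistic is exactly 1/20 = 0.05, the smallest value achievable with ten observations.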

kstest_onesample_exponential

data: list targ:None

The Kolmogorov-Smirnov test for an exponential distribution is a statistical test that determines whether a sample of data comes from an exponential distribution

code:

# Perform Kolmogorov-Smirnov test for [Exponential] distribution

def kstest_onesample_exponential(self,args:dict):

    statistic, p_value = kstest(args['data'], 'expon')

    print('[KS] test sample from (exponential) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
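A usage sketch with constructed data: `'expon'` tests against the standard exponential distribution (scale 1, location 0). Using the theoretical quantiles of that distribution as the sample makes the ECDF hug the exponential CDF:

```python
from scipy import stats

# Theoretical quantiles of the standard exponential distribution
probs = [(2 * i - 1) / 20 for i in range(1, 11)]
data = stats.expon.ppf(probs).tolist()

statistic, p_value = stats.kstest(data, 'expon')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

As with the uniform example, the statistic is exactly 0.05 and the test fails to reject.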

lilliefors_normal

data: list targ:None

The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, is a statistical test used to determine whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.

code:

# Lilliefors Test to check if distribution is normal distribution

def lilliefors_normal(self,args:dict):

    # Perform the Lilliefors test
    statistic, p_value = lilliefors(args['data'])

    print("Lilliefors test statistic:", statistic)
    print("Lilliefors p-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
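A usage sketch, assuming `statsmodels` and `numpy` are installed. Strongly skewed (exponential) data gives the Lilliefors test plenty of power at this sample size; note that unlike `kstest(data, 'norm')`, Lilliefors estimates the mean and variance from the data:

```python
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

# Seeded, strongly skewed sample: clearly non-normal
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=200)

statistic, p_value = lilliefors(data)
print("Lilliefors test statistic:", statistic)
print("Lilliefors p-value:", p_value)
```

With the default table-based p-value method, statsmodels clips reported p-values to the range [0.001, 0.2].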

shapirowilk_normal

data: list targ:None

The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution

code:

# Shapiro-Wilk Test to check if distribution is normal

def shapirowilk_normal(self,args:dict):

    # Perform Shapiro-Wilk test
    statistic, p_value = shapiro(args['data'])

    # Print the test statistic and p-value
    print("Test Statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
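A usage sketch with seeded, clearly non-normal data, assuming `scipy` and `numpy` are installed:

```python
import numpy as np
from scipy.stats import shapiro

# Seeded exponential sample: strongly skewed, so not normal
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=100)

statistic, p_value = shapiro(data)
print("Test Statistic:", statistic)
print("P-value:", p_value)
```

The W statistic lies in (0, 1], with values near 1 supporting normality; skewed data like this pushes W well below 1 and the test rejects.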

chi2_test

data: [list,list] targ:None

The chi-square goodness-of-fit test is a statistical test used to determine whether the observed frequencies of a categorical variable deviate significantly from the expected frequencies

code:

# [Chi2 statistical test]

# Calculate a one-way chi-square (goodness-of-fit) test
# The test is used to determine whether the observed category
# frequencies deviate significantly from the expected frequencies.

# The chi-square statistic measures how much the observed frequencies deviate 
# from the expected frequencies. A higher value indicates a greater discrepancy.

def chi2_test(self,args:dict):

    # perform the chi-squared test
    statistic, p_value = chisquare(args['data'][0], f_exp=args['data'][1])

    print("Chi-squared statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
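A usage sketch of the underlying `scipy.stats.chisquare` call with illustrative counts. Note that `chisquare` requires the observed and expected totals to match:

```python
from scipy.stats import chisquare

# Hypothetical observed vs expected category counts (both sum to 100)
observed = [50, 30, 10, 10]
expected = [25, 25, 25, 25]

statistic, p_value = chisquare(observed, f_exp=expected)
print("Chi-squared statistic:", statistic)
print("P-value:", p_value)
```

Here the statistic is (25² + 5² + 15² + 15²)/25 = 44 on 3 degrees of freedom, so the observed counts deviate significantly from the uniform expectation.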

jarquebera_normal

data: list targ:None

The Jarque-Bera test is a statistical test used to determine whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data

code:

# [ Jarque-Bera test ]

# The Jarque-Bera test is a statistical test used to determine whether 
# a given dataset follows a normal distribution. It is based on the 
# skewness and kurtosis of the data. 

def jarquebera_normal(self,args:dict):

    # Perform the Jarque-Bera test
    statistic, p_value = stats.jarque_bera(args['data'])

    print('Statistic:', statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
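A usage sketch with seeded, skewed data, assuming `scipy` and `numpy` are installed. The Jarque-Bera test works best with larger samples, since it relies on sample skewness and kurtosis:

```python
import numpy as np
from scipy import stats

# Seeded exponential sample: skewness ~2, excess kurtosis ~6
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=500)

statistic, p_value = stats.jarque_bera(data)
print('Statistic:', statistic)
print("P-value:", p_value)
```

The pronounced skew and heavy tail inflate the JB statistic far beyond the chi-squared(2) critical value, so normality is rejected.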

two_sample_anova

data: [list,list] targ:None

The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups

code:

# [ ANOVA test ] (limited to two samples)

# ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the (means) of two or more groups

def two_sample_anova(self,args:dict):

    # Perform one-way ANOVA test
    statistic, p_value = stats.f_oneway(args['data'][0], args['data'][1])

    # Print the results
    print("Statistic:", statistic)
    print("p-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
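A usage sketch of the underlying `scipy.stats.f_oneway` call with illustrative samples. With exactly two groups, one-way ANOVA is equivalent to the independent two-sample t-test (F = t²), which can be checked directly:

```python
from scipy import stats

# Hypothetical groups with clearly different means
group_a = [4.1, 5.2, 6.3, 5.8, 4.9]
group_b = [7.0, 8.1, 7.5, 8.4, 7.9]

f_statistic, p_value = stats.f_oneway(group_a, group_b)
print("Statistic:", f_statistic)
print("p-value:", p_value)

# For two groups, F equals the square of the independent t-statistic
t_statistic, _ = stats.ttest_ind(group_a, group_b)
print("F vs t^2:", f_statistic, t_statistic ** 2)
```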

  1. Reference to the sub folder in src 

  2. Reference to the machine learning project phase identification defined here