mstats_tests

Module Information

Module Group

src/stats1

Project Stage ID

42

Purpose

The purpose of this module is to let the user run the data sample comparison tests available in the statsmodels and scipy.stats libraries. The module takes lists (and, where required, a target parameter value) as input for statistical hypothesis testing; it is not aimed at dataframe-based data

Module Files

Here are the locations of the relevant files associated with the module

Activation Functions

A list of all available activation functions in the module mstats_tests

  •   its_ttest


    data: [list,list] targ:None

    Independent two-sample Student's t-test: This test is used to compare the means of two independent samples. It assumes that the data are (normally distributed) and that the (variances of the two groups are equal)

  •   paired_ttest


    data: [list,list] targ:None

    A paired Student's t-test is a statistical test used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.

  •   os_ttest


    data: list targ:popmean

    A one sample Student's t-test is a statistical test used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.

  •   utest


    data: [list,list] targ:None

    The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test

  •   kstest_twosample


    data: [list,list] targ:None

    The two-sample Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether two samples come from the same distribution. It compares the empirical cumulative distribution functions (ECDFs) of the two samples

  •   kstest_onesample_normal


    data: list targ:None

    The Kolmogorov-Smirnov test for a normal distribution is a statistical test that determines whether a sample of data comes from a normal distribution

  •   kstest_onesample_uniform


    data: list targ:None

    The Kolmogorov-Smirnov test for a uniform distribution is a statistical test that determines whether a sample of data comes from a uniform distribution

  •   kstest_onesample_exponential


    data: list targ:None

    The Kolmogorov-Smirnov test for an exponential distribution is a statistical test that determines whether a sample of data comes from an exponential distribution

  •   lilliefors_normal


    data: list targ:None

    The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, is a statistical test used to determine whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.

  •   shapirowilk_normal


    data: list targ:None

    The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution

  •   chi2_test


    data: [list,list] targ:None

    The chi-square goodness-of-fit test is a statistical test used to determine whether the observed frequencies of a categorical variable deviate significantly from the expected frequencies

  •   jarquebera_normal


    data: list targ:None

    The Jarque-Bera test is a statistical test used to determine whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data

  •   two_sample_anova


    data: [list,list] targ:None

    The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups

its_ttest

data: [list,list] targ:None

Independent two-sample Student's t-test: This test is used to compare the means of two independent samples. It assumes that the data are (normally distributed) and that the (variances of the two groups are equal)

Sample requests

To call the specific activation function, please make sure your base request contains:

  • two sample independent ttest
  • create a two sample independent ttest
  • students ttest
  • independent two sample ttest
  • compare means of two samples using independent ttest

code:

The function its_ttest takes in a dictionary containing two data sources, which must be specified by their reference names

# [independent two sample t-test]

# Student's t-test: This test is used to compare the means of (two independent samples) 
# It assumes that the data is (normally distributed) and that the (variances of the 
# two groups are equal)

def its_ttest(self,args:dict):

    statistic, p_value = stats.ttest_ind(args['data'][0], args['data'][1])

    print("T-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
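As a standalone usage sketch (illustrative sample values, assuming `scipy` is installed), the underlying `scipy.stats.ttest_ind` call behaves as follows:

```python
from scipy import stats

# Two illustrative independent samples (hypothetical measurements)
group_a = [4.1, 5.2, 6.3, 5.8, 4.9]
group_b = [7.0, 8.1, 7.5, 8.4, 7.9]

statistic, p_value = stats.ttest_ind(group_a, group_b)
print("T-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Here the group means clearly differ relative to the within-group spread, so the null hypothesis of equal means is rejected.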

paired_ttest

data: [list,list] targ:None

A paired Student's t-test is a statistical test used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.

code:

# [paired t-test]

# This test is used when you have paired or matched observations.
# It is used to determine if there is a significant difference between 
# the means of two related groups or conditions.

def paired_ttest(self,args:dict):

    print('[note] a paired two-sample t-test is used to compare the means of (two related groups)!')

    # Perform paired t-test
    statistic, p_value = stats.ttest_rel(args['data'][0], args['data'][1])

    print("T-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
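A minimal sketch of the underlying `scipy.stats.ttest_rel` call, with illustrative before/after measurements for the same five subjects (not data from the module):

```python
from scipy import stats

# Hypothetical paired measurements: same subjects before and after treatment
before = [72, 75, 78, 71, 74]
after  = [70, 72, 75, 70, 71]

statistic, p_value = stats.ttest_rel(before, after)
print("T-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Because every subject's value drops after treatment, the paired differences are consistently positive and the test rejects the null hypothesis of zero mean difference.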

os_ttest

data: list targ:popmean

A one sample Student's t-test is a statistical test used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.

code:

# [one sample t-test]

# This test is used when you want to compare the mean of a single group to a known population mean or a specific value.

def os_ttest(self,args:dict):

    if args['popmean'] is not None:

        # Perform one-sample t-test
        t_statistic, p_value = stats.ttest_1samp(args['data'], popmean=args['popmean'])

        print("t-statistic:", t_statistic)
        print("P-value:", p_value)

        # Compare p-value with alpha
        if p_value <= 0.05:
            print("Reject the null hypothesis")
        else:
            print("Fail to reject the null hypothesis")

    else:

        print('[note] please specify the population mean using popmean')
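A minimal sketch of the underlying `scipy.stats.ttest_1samp` call, using an illustrative sample whose mean equals the hypothesized population mean:

```python
from scipy import stats

# Hypothetical sample whose mean is exactly the hypothesized popmean
data = [9.8, 10.2, 10.1, 9.9, 10.0]
popmean = 10.0

t_statistic, p_value = stats.ttest_1samp(data, popmean=popmean)
print("t-statistic:", t_statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Since the sample mean coincides with `popmean`, the t-statistic is (numerically) zero and the test fails to reject the null hypothesis.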

utest

data: [list,list] targ:None

The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test

code:

# determine if there is a significant difference between the distributions

# A : [u-test]

# The [Mann-Whitney test], also known as the [Wilcoxon rank-sum test], 
# is a nonparametric statistical test used to determine whether there 
# is a significant difference between the distributions of two independent samples. 
# It is often used when the data does not meet the assumptions of parametric tests 
# like the t-test.

def utest(self,args:dict):

    # Perform Mann-Whitney U-test
    statistic, p_value = stats.mannwhitneyu(args['data'][0], args['data'][1])

    print("U-statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
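A minimal sketch of the underlying `scipy.stats.mannwhitneyu` call, with two illustrative samples whose ranks do not overlap (the `alternative='two-sided'` argument is spelled out for clarity):

```python
from scipy import stats

# Hypothetical samples with completely separated ranks
sample_a = [1, 2, 3, 4, 5]
sample_b = [6, 7, 8, 9, 10]

statistic, p_value = stats.mannwhitneyu(sample_a, sample_b,
                                        alternative='two-sided')
print("U-statistic:", statistic)
print("P-value:", p_value)

# Compare p-value with alpha (0.05)
if p_value <= 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

With every value in `sample_a` below every value in `sample_b`, the U statistic for the first sample is 0 and the null hypothesis is rejected.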

kstest_twosample

data: [list,list] targ:None

The two-sample Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether two samples come from the same distribution. It compares the empirical cumulative distribution functions (ECDFs) of the two samples

code:

# [GENERAL] two-sample Kolmogorov-Smirnov test: do two samples come from the same distribution?

def kstest_twosample(self,args:dict):

    # Perform the KS test
    statistic, p_value = kstest(args['data'][0], args['data'][1])

    print('[KS] test whether two samples come from the same distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
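As a usage sketch with illustrative data: the same two-sample comparison can also be run via `scipy.stats.ks_2samp`, which is what `kstest` dispatches to when given a second sample:

```python
from scipy import stats

# Hypothetical samples drawn from clearly different ranges
sample_a = [0.1, 0.2, 0.3, 0.4, 0.5]
sample_b = [1.1, 1.2, 1.3, 1.4, 1.5]

statistic, p_value = stats.ks_2samp(sample_a, sample_b)
print("KS statistic:", statistic)
print("P-value:", p_value)
```

The two ECDFs never overlap, so the KS statistic takes its maximum value of 1.0 and the small p-value indicates the samples come from different distributions.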

kstest_onesample_normal

data: list targ:None

The Kolmogorov-Smirnov test for a normal distribution is a statistical test that determines whether a sample of data comes from a normal distribution

code:

# Perform Kolmogorov-Smirnov test for [normal] distribution

def kstest_onesample_normal(self,args:dict):

    statistic, p_value = kstest(args['data'], 'norm')

    print('[KS] test sample from (normal) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
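A usage sketch with illustrative data. Note that passing `'norm'` to `kstest` tests against the standard normal N(0, 1), so data on a different scale should be standardized first; unstandardized data far from zero is rejected regardless of its shape:

```python
from scipy import stats

# Hypothetical data far from N(0, 1): rejected because 'norm'
# means the *standard* normal, not a fitted normal
data = [100 + i for i in range(50)]

statistic, p_value = stats.kstest(data, 'norm')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

To test normality with estimated mean and variance, the Lilliefors test below is the more appropriate tool.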

kstest_onesample_uniform

data: list targ:None

The Kolmogorov-Smirnov test for a uniform distribution is a statistical test that determines whether a sample of data comes from a uniform distribution

code:

# Perform Kolmogorov-Smirnov test for [Uniform] distribution

def kstest_onesample_uniform(self,args:dict):

    statistic, p_value = kstest(args['data'], 'uniform')

    print('[KS] test sample from (uniform) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
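A usage sketch with illustrative data: `'uniform'` tests against the standard uniform distribution on [0, 1]. Evenly spaced points in that interval track the uniform CDF closely, so the test fails to reject:

```python
from scipy import stats

# Evenly spaced points in [0, 1]: 0.05, 0.15, ..., 0.95
data = [(2 * i - 1) / 20 for i in range(1, 11)]

statistic, p_value = stats.kstest(data, 'uniform')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

For these points the KS statistic is exactly 1/20 = 0.05, the smallest value achievable with ten observations.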

kstest_onesample_exponential

data: list targ:None

The Kolmogorov-Smirnov test for an exponential distribution is a statistical test that determines whether a sample of data comes from an exponential distribution

code:

# Perform Kolmogorov-Smirnov test for [Exponential] distribution

def kstest_onesample_exponential(self,args:dict):

    statistic, p_value = kstest(args['data'], 'expon')

    print('[KS] test sample from (exponential) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
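A usage sketch with constructed data: `'expon'` tests against the standard exponential distribution (scale 1, location 0). Using the theoretical quantiles of that distribution as the sample makes the ECDF hug the exponential CDF:

```python
from scipy import stats

# Theoretical quantiles of the standard exponential distribution
probs = [(2 * i - 1) / 20 for i in range(1, 11)]
data = stats.expon.ppf(probs).tolist()

statistic, p_value = stats.kstest(data, 'expon')
print("KS statistic:", statistic)
print("P-value:", p_value)
```

As with the uniform example, the statistic is exactly 0.05 and the test fails to reject.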

lilliefors_normal

data: list targ:None

The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, is a statistical test used to determine whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.

code:

# Lilliefors Test to check if distribution is normal distribution

def lilliefors_normal(self,args:dict):

    # Perform the Lilliefors test
    statistic, p_value = lilliefors(args['data'])

    print("Lilliefors test statistic:", statistic)
    print("Lilliefors p-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
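A usage sketch, assuming `statsmodels` and `numpy` are installed. Strongly skewed (exponential) data gives the Lilliefors test plenty of power at this sample size; note that unlike `kstest(data, 'norm')`, Lilliefors estimates the mean and variance from the data:

```python
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

# Seeded, strongly skewed sample: clearly non-normal
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=200)

statistic, p_value = lilliefors(data)
print("Lilliefors test statistic:", statistic)
print("Lilliefors p-value:", p_value)
```

With the default table-based p-value method, statsmodels clips reported p-values to the range [0.001, 0.2].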

shapirowilk_normal

data: list targ:None

The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution

code:

# Shapiro-Wilk Test to check if distribution is normal

def shapirowilk_normal(self,args:dict):

    # Perform Shapiro-Wilk test
    statistic, p_value = shapiro(args['data'])

    # Print the test statistic and p-value
    print("Test Statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
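A usage sketch with seeded, clearly non-normal data, assuming `scipy` and `numpy` are installed:

```python
import numpy as np
from scipy.stats import shapiro

# Seeded exponential sample: strongly skewed, so not normal
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=100)

statistic, p_value = shapiro(data)
print("Test Statistic:", statistic)
print("P-value:", p_value)
```

The W statistic lies in (0, 1], with values near 1 supporting normality; skewed data like this pushes W well below 1 and the test rejects.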

chi2_test

data: [list,list] targ:None

The chi-square goodness-of-fit test is a statistical test used to determine whether the observed frequencies of a categorical variable deviate significantly from the expected frequencies

code:

# [Chi2 statistical test]

# Calculate a one-way chi-square (goodness-of-fit) test
# The test is used to determine whether the observed category
# frequencies deviate significantly from the expected frequencies.

# The chi-square statistic measures how much the observed frequencies deviate 
# from the expected frequencies. A higher value indicates a greater discrepancy.

def chi2_test(self,args:dict):

    # perform the chi-squared test
    statistic, p_value = chisquare(args['data'][0], f_exp=args['data'][1])

    print("Chi-squared statistic:", statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
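A usage sketch of the underlying `scipy.stats.chisquare` call with illustrative counts. Note that `chisquare` requires the observed and expected totals to match:

```python
from scipy.stats import chisquare

# Hypothetical observed vs expected category counts (both sum to 100)
observed = [50, 30, 10, 10]
expected = [25, 25, 25, 25]

statistic, p_value = chisquare(observed, f_exp=expected)
print("Chi-squared statistic:", statistic)
print("P-value:", p_value)
```

Here the statistic is (25² + 5² + 15² + 15²)/25 = 44 on 3 degrees of freedom, so the observed counts deviate significantly from the uniform expectation.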

jarquebera_normal

data: list targ:None

The Jarque-Bera test is a statistical test used to determine whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data

code:

# [ Jarque-Bera test ]

# The Jarque-Bera test is a statistical test used to determine whether 
# a given dataset follows a normal distribution. It is based on the 
# skewness and kurtosis of the data. 

def jarquebera_normal(self,args:dict):

    # Perform the Jarque-Bera test
    statistic, p_value = stats.jarque_bera(args['data'])

    print('Statistic:', statistic)
    print("P-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
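A usage sketch with seeded, skewed data, assuming `scipy` and `numpy` are installed. The Jarque-Bera test works best with larger samples, since it relies on sample skewness and kurtosis:

```python
import numpy as np
from scipy import stats

# Seeded exponential sample: skewness ~2, excess kurtosis ~6
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=500)

statistic, p_value = stats.jarque_bera(data)
print('Statistic:', statistic)
print("P-value:", p_value)
```

The pronounced skew and heavy tail inflate the JB statistic far beyond the chi-squared(2) critical value, so normality is rejected.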

two_sample_anova

data: [list,list] targ:None

The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups

code:

# [ ANOVA test ] (limited to two samples)

# ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the (means) of two or more groups

def two_sample_anova(self,args:dict):

    # Perform one-way ANOVA test
    statistic, p_value = stats.f_oneway(args['data'][0], args['data'][1])

    # Print the results
    print("Statistic:", statistic)
    print("p-value:", p_value)

    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis") 
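A usage sketch of the underlying `scipy.stats.f_oneway` call with illustrative samples. With exactly two groups, one-way ANOVA is equivalent to the independent two-sample t-test (F = t²), which can be checked directly:

```python
from scipy import stats

# Hypothetical groups with clearly different means
group_a = [4.1, 5.2, 6.3, 5.8, 4.9]
group_b = [7.0, 8.1, 7.5, 8.4, 7.9]

f_statistic, p_value = stats.f_oneway(group_a, group_b)
print("Statistic:", f_statistic)
print("p-value:", p_value)

# For two groups, F equals the square of the independent t-statistic
t_statistic, _ = stats.ttest_ind(group_a, group_b)
print("F vs t^2:", f_statistic, t_statistic ** 2)
```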

  1. Reference to the sub folder in src 

  2. Reference to the machine learning project phase identification defined here