mstats_tests
Module Information¶
Module Group¶
src/stats1
Project Stage ID¶
42
Purpose¶
The purpose of this module is to give the user the ability to run the data sample comparison tests available in the statsmodels and scipy.stats libraries. The module expects list or parameter values for sample comparison when testing statistical hypotheses; it is not aimed at dataframe-based data.
Module Files¶
Here are the locations of the relevant files associated with the module
Activation Functions¶
A list of all available activation functions in the module mstats_tests
- its_ttest (data: [list, list], targ: None): Independent two sample Student's t-test. This test is used to compare the means of two independent samples. It assumes that the data is normally distributed and that the variances of the two groups are equal.
- paired_ttest (data: [list, list], targ: None): A paired Student's t-test is used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.
- os_ttest (data: list, targ: popmean): A one sample Student's t-test is used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.
- utest (data: [list, list], targ: None): The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test.
- kstest_twosample (data: [list, list], targ: None): The Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether a sample comes from a specific distribution. It compares the empirical cumulative distribution function (ECDF) of the sample to the cumulative distribution function (CDF) of the specified distribution.
- kstest_onesample_normal (data: list, targ: None): The Kolmogorov-Smirnov test for a normal distribution determines whether a sample of data comes from a normal distribution.
- kstest_onesample_uniform (data: list, targ: None): The Kolmogorov-Smirnov test for a uniform distribution determines whether a sample of data comes from a uniform distribution.
- kstest_onesample_exponential (data: list, targ: None): The Kolmogorov-Smirnov test for an exponential distribution determines whether a sample of data comes from an exponential distribution.
- lilliefors_normal (data: list, targ: None): The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, determines whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.
- shapirowilk_normal (data: list, targ: None): The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution.
- chi2_test (data: [list, list], targ: None): The chi-square test is used to determine if there is a significant association between two categorical variables.
- jarquebera_normal (data: list, targ: None): The Jarque-Bera test determines whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data.
- two_sample_anova (data: [list, list], targ: None): The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups.
its_ttest
data: [list, list] targ: None
Independent two sample Student's t-test: This test is used to compare the means of two independent samples. It assumes that the data is (normally distributed) and that the (variances of the two groups are equal)
Sample Requests¶
To call the specific activation function, please make sure your base request contains one of the following:
- two sample independent ttest
- create a two sample independent ttest
- students ttest
- independent two sample ttest
- compare means of two samples using independent ttest
code:
The function its_ttest takes in a dictionary containing two data sources, which must be specified by their reference names
# [independent two sample t-test]
# Student's t-test: This test is used to compare the means of (two independent samples)
# It assumes that the data is (normally distributed) and that the (variances of the
# two groups are equal)
# requires: from scipy import stats
def its_ttest(self, args: dict):
    statistic, p_value = stats.ttest_ind(args['data'][0], args['data'][1])
    print("T-statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
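As a quick standalone check of the underlying call, `scipy.stats.ttest_ind` can be exercised directly; the two sample lists below are illustrative values, not part of the module.

```python
from scipy import stats

# Illustrative independent samples (hypothetical values)
group_a = [14.1, 13.8, 15.2, 14.9, 13.5, 14.7]
group_b = [15.6, 16.1, 15.9, 16.4, 15.2, 16.0]

statistic, p_value = stats.ttest_ind(group_a, group_b)
print("T-statistic:", statistic)
print("P-value:", p_value)
```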
paired_ttest
data: [list, list] targ: None
A paired Student's t-test is a statistical test used to determine if there is a significant difference between the means of two related samples. It is used when the data sets are paired or matched in some way, such as when the same group of subjects is measured before and after a treatment or intervention.
code:
# [paired t-test]
# This test is used when you have paired or matched observations.
# It is used to determine if there is a significant difference between
# the means of two related groups or conditions.
def paired_ttest(self, args: dict):
    print('[note] a paired two-sample t-test is used to compare the means of (two related groups)!')
    # Perform paired t-test
    statistic, p_value = stats.ttest_rel(args['data'][0], args['data'][1])
    print("T-statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
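A direct call to `scipy.stats.ttest_rel` shows the paired form in isolation; the before/after measurements below are illustrative.

```python
from scipy import stats

# Illustrative paired measurements (hypothetical before/after values)
before = [72, 75, 70, 68, 74]
after = [70, 72, 69, 65, 73]

statistic, p_value = stats.ttest_rel(before, after)
print("T-statistic:", statistic)
print("P-value:", p_value)
```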
os_ttest
data: list targ: popmean
A one sample Student's t-test is a statistical test used to determine if there is a significant difference between the mean of a sample and a known or hypothesized population mean. It is used when you have one sample of data and want to compare its mean to a specific value.
code:
# [one sample t-test]
# This test is used when you want to compare the mean of a single group to a known population mean or a specific value.
def os_ttest(self, args: dict):
    if args['popmean'] is not None:
        # Perform one-sample t-test
        t_statistic, p_value = stats.ttest_1samp(args['data'], popmean=args['popmean'])
        print("t-statistic:", t_statistic)
        print("P-value:", p_value)
        # Compare p-value with alpha
        if p_value <= 0.05:
            print("Reject the null hypothesis")
        else:
            print("Fail to reject the null hypothesis")
    else:
        print('[note] please specify the population mean using popmean')
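The underlying `scipy.stats.ttest_1samp` call can be tried on its own; the sample values and population mean below are illustrative.

```python
from scipy import stats

# Illustrative sample, tested against a hypothesized population mean of 5.0
sample = [4.9, 5.1, 5.0, 5.2, 4.8, 5.1]

t_statistic, p_value = stats.ttest_1samp(sample, popmean=5.0)
print("t-statistic:", t_statistic)
print("P-value:", p_value)
```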
utest
data: [list, list] targ: None
The Mann-Whitney test, also known as the Wilcoxon rank-sum test, is a nonparametric statistical test used to determine whether there is a significant difference between the distributions of two independent samples. It is often used when the data does not meet the assumptions of parametric tests like the t-test
code:
# [u-test]
# The [Mann-Whitney test], also known as the [Wilcoxon rank-sum test],
# is a nonparametric statistical test used to determine whether there
# is a significant difference between the distributions of two independent samples.
# It is often used when the data does not meet the assumptions of parametric tests
# like the t-test.
def utest(self, args: dict):
    # Perform Mann-Whitney U-test
    statistic, p_value = stats.mannwhitneyu(args['data'][0], args['data'][1])
    print("U-statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
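Calling `scipy.stats.mannwhitneyu` directly illustrates the nonparametric comparison; the two samples below are illustrative values.

```python
from scipy import stats

# Illustrative independent samples with clearly separated ranks
x = [1.2, 3.4, 2.2, 2.8, 3.1]
y = [4.5, 5.1, 4.8, 5.6, 4.9]

statistic, p_value = stats.mannwhitneyu(x, y)
print("U-statistic:", statistic)
print("P-value:", p_value)
```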
kstest_twosample
data: [list, list] targ: None
The Kolmogorov-Smirnov test is a nonparametric statistical test that determines whether a sample comes from a specific distribution. It compares the empirical cumulative distribution function (ECDF) of the sample to the cumulative distribution function (CDF) of the specified distribution
code:
# [GENERAL] Kolmogorov-Smirnov two-sample test for distribution
# requires: from scipy.stats import kstest
def kstest_twosample(self, args: dict):
    # Perform the KS test
    statistic, p_value = kstest(args['data'][0], args['data'][1])
    print('[KS] test two samples from same distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
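The two-sample comparison can also be run via `scipy.stats.ks_2samp`, which is the dedicated two-sample form of the test; the generated samples below are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative samples drawn from two normal distributions with shifted means
rng = np.random.default_rng(0)
sample1 = rng.normal(0.0, 1.0, 200)
sample2 = rng.normal(0.5, 1.0, 200)

statistic, p_value = stats.ks_2samp(sample1, sample2)
print("KS statistic:", statistic)
print("P-value:", p_value)
```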
kstest_onesample_normal
data: list targ: None
The Kolmogorov-Smirnov test for a normal distribution is a statistical test that determines whether a sample of data comes from a normal distribution
code:
# Perform Kolmogorov-Smirnov test for [normal] distribution
def kstest_onesample_normal(self, args: dict):
    statistic, p_value = kstest(args['data'], 'norm')
    print('[KS] test sample from (normal) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
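One caveat worth knowing when using this test: `kstest(data, 'norm')` compares against the standard normal N(0, 1), so data on another scale should usually be standardized first. A small sketch with illustrative generated data:

```python
import numpy as np
from scipy import stats

# Illustrative data from a normal distribution with mean 10 and sd 2
rng = np.random.default_rng(1)
data = rng.normal(10.0, 2.0, 300)

# Standardize, since kstest(..., 'norm') tests against N(0, 1)
z = (data - data.mean()) / data.std(ddof=1)
statistic, p_value = stats.kstest(z, 'norm')
print("KS statistic:", statistic)
print("P-value:", p_value)
```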
kstest_onesample_uniform
data: list targ: None
The Kolmogorov-Smirnov test for a uniform distribution is a statistical test that determines whether a sample of data comes from a uniform distribution
code:
# Perform Kolmogorov-Smirnov test for [uniform] distribution
def kstest_onesample_uniform(self, args: dict):
    statistic, p_value = kstest(args['data'], 'uniform')
    print('[KS] test sample from (uniform) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
kstest_onesample_exponential
data: list targ: None
The Kolmogorov-Smirnov test for an exponential distribution is a statistical test that determines whether a sample of data comes from an exponential distribution
code:
# Perform Kolmogorov-Smirnov test for [exponential] distribution
def kstest_onesample_exponential(self, args: dict):
    statistic, p_value = kstest(args['data'], 'expon')
    print('[KS] test sample from (exponential) distribution')
    print("KS statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
lilliefors_normal
data: list targ: None
The Lilliefors test, also known as the Kolmogorov-Smirnov test for normality, is a statistical test used to determine whether a sample of data comes from a normal distribution. It is similar to the Kolmogorov-Smirnov test, but it is specifically designed for testing against a normal distribution.
code:
# Lilliefors test to check if the sample comes from a normal distribution
# requires: from statsmodels.stats.diagnostic import lilliefors
def lilliefors_normal(self, args: dict):
    # Perform the Lilliefors test
    statistic, p_value = lilliefors(args['data'])
    print("Lilliefors test statistic:", statistic)
    print("Lilliefors p-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
shapirowilk_normal
data: list targ: None
The Shapiro-Wilk test is another statistical test used to determine whether a sample of data comes from a normal distribution
code:
# Shapiro-Wilk test to check if distribution is normal
# requires: from scipy.stats import shapiro
def shapirowilk_normal(self, args: dict):
    # Perform Shapiro-Wilk test
    statistic, p_value = shapiro(args['data'])
    # Print the test statistic and p-value
    print("Test Statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
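A standalone `scipy.stats.shapiro` call shows the expected output shape; the generated data below is illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative sample drawn from a standard normal distribution
rng = np.random.default_rng(2)
data = rng.normal(size=100)

statistic, p_value = stats.shapiro(data)
print("Test Statistic:", statistic)
print("P-value:", p_value)
```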
chi2_test
data: [list, list] targ: None
The chi-square test is a statistical test used to determine if there is a significant association between two categorical variables
code:
# [Chi2 statistical test]
# Calculate a one-way chi-square test
# The chi-square test is a statistical test used to determine
# if there is a significant association between two categorical variables.
# The chi-square statistic measures how much the observed frequencies deviate
# from the expected frequencies. A higher value indicates a greater discrepancy.
# requires: from scipy.stats import chisquare
def chi2_test(self, args: dict):
    # Perform the chi-squared test
    statistic, p_value = chisquare(args['data'][0], f_exp=args['data'][1])
    print("Chi-squared statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
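Note that `scipy.stats.chisquare` is a one-way goodness-of-fit test, and it requires the observed and expected counts to sum to the same total. A direct call with illustrative frequencies:

```python
from scipy import stats

# Illustrative category counts; both lists sum to 80, as chisquare requires
observed = [18, 22, 20, 20]
expected = [20, 20, 20, 20]

statistic, p_value = stats.chisquare(observed, f_exp=expected)
print("Chi-squared statistic:", statistic)
print("P-value:", p_value)
```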
jarquebera_normal
data: list targ: None
The Jarque-Bera test is a statistical test used to determine whether a given dataset follows a normal distribution. It is based on the skewness and kurtosis of the data
code:
# [ Jarque-Bera test ]
# The Jarque-Bera test is a statistical test used to determine whether
# a given dataset follows a normal distribution. It is based on the
# skewness and kurtosis of the data.
def jarquebera_normal(self, args: dict):
    # Perform the Jarque-Bera test
    statistic, p_value = stats.jarque_bera(args['data'])
    print("Statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
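A direct call to `scipy.stats.jarque_bera` shows the statistic/p-value pair the method unpacks; the generated data below is illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative sample from a standard normal distribution
rng = np.random.default_rng(3)
data = rng.normal(size=500)

statistic, p_value = stats.jarque_bera(data)
print("Statistic:", statistic)
print("P-value:", p_value)
```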
two_sample_anova
data: [list, list] targ: None
The ANOVA (Analysis of Variance) test is used to determine if there are any statistically significant differences between the means of two or more groups
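code:
The code listing for this section is cut off here. By analogy with the other activation functions, a minimal sketch using scipy.stats.f_oneway could look like the following; this is an assumption, since the module's actual implementation is not shown, and the sample data is illustrative.

```python
from scipy import stats

# Minimal sketch (assumed implementation, mirroring the other activation functions)
def two_sample_anova(args: dict):
    # Perform one-way ANOVA across the supplied groups
    statistic, p_value = stats.f_oneway(args['data'][0], args['data'][1])
    print("F-statistic:", statistic)
    print("P-value:", p_value)
    # Compare p-value with alpha (0.05)
    if p_value <= 0.05:
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")
    return statistic, p_value

# Illustrative call with two hypothetical groups
statistic, p_value = two_sample_anova({'data': [[85, 90, 88, 92, 87],
                                                [78, 82, 80, 79, 81]]})
```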