# ttest()

Returns data tables as Pandas DataFrames with the information relevant to the statistical test conducted. Two DataFrames are returned so that all information can easily be exported; the exception is the Wilcoxon signed-rank test, for which only one DataFrame is returned.

DataFrame 1 (all tests except the Wilcoxon signed-rank test) contains summary statistics: variable name, number of non-missing observations, mean, standard deviation, standard error, and the 95% confidence interval. This is the same information returned by the summary_cont() method.

DataFrame 2 (all tests except the Wilcoxon signed-rank test) contains the test results, including the effect size measures r, Cohen’s d, Hedges’s g, and Glass’s $$\Delta$$ for the independent sample t-test, paired sample t-test, and Welch’s t-test.

For the Wilcoxon signed-rank test, the returned DataFrame contains the mean for both comparison points, the T-value, the Z-value, the two-sided p-value, and the effect size measure r.

This method can perform the following tests:

• Independent sample t-test

• Paired sample t-test

• Welch’s t-test

• Wilcoxon signed-rank test

## Arguments

ttest(group1, group2, group1_name= None, group2_name= None, equal_variances= True, paired= False, correction= None)

• group1 and group2: the data to compare; each must be a Pandas Series.

• group1_name and group2_name: optional names that override the Series names in the output.

• equal_variances: whether equal variances are assumed. If False, Welch’s t-test is used when the data are unpaired, and the Wilcoxon signed-rank test is used when the data are paired. The default is True.

• paired: whether the data are paired. If the data are paired and equal variances are assumed, a paired sample t-test is conducted; if the data are paired and equal variances are not assumed, a Wilcoxon signed-rank test is conducted. The default is False.
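The way the two flags combine can be summarized as a small lookup. The sketch below only restates the rules described in this argument list; `select_test` is a hypothetical helper written for illustration and is not part of researchpy.

```python
def select_test(equal_variances=True, paired=False):
    """Return the name of the test described for a flag combination.

    Hypothetical helper for illustration; it is not part of researchpy.
    """
    if paired:
        # Paired data: t-test if equal variances are assumed, else Wilcoxon.
        if equal_variances:
            return "Paired samples t-test"
        return "Wilcoxon signed-rank test"
    # Unpaired data: classic t-test if equal variances are assumed, else Welch.
    if equal_variances:
        return "Independent t-test"
    return "Welch's t-test"


# Print the test chosen for each of the four flag combinations.
for equal_variances in (True, False):
    for paired in (False, True):
        name = select_test(equal_variances, paired)
        print(f"equal_variances={equal_variances}, paired={paired}: {name}")
```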

returns

• 2 Pandas DataFrames as a tuple:

• The first returned DataFrame contains the summary statistics.

• The second returned DataFrame contains the test results.

• For the Wilcoxon signed-rank test, only 1 DataFrame is returned.
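Because the number of returned objects depends on the test, code that calls ttest() with varying flags may want to normalize the result first. This is a minimal sketch, assuming the return value is either a 2-tuple or a single DataFrame as described here. `split_result` is a hypothetical helper, and plain strings stand in for the DataFrames so the snippet runs without researchpy installed.

```python
def split_result(result):
    """Normalize a ttest()-style return value to a (summary, results) pair.

    Hypothetical helper: `summary` is None when only one table came back,
    which this document describes for the Wilcoxon signed-rank test.
    """
    if isinstance(result, tuple):
        summary, results = result
        return summary, results
    return None, result


# Stand-ins for the two possible return shapes.
two_tables = ("summary table", "results table")
one_table = "wilcoxon results table"

print(split_result(two_tables))
print(split_result(one_table))
```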

Note

Wilcoxon signed-rank test: pairs with a 0 difference between the 2 groups are discarded from the calculation. This corresponds to the ‘wilcox’ zero-handling method of scipy.stats.wilcoxon.
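To make the effect of discarding zero differences concrete, here is a pure-Python sketch of the signed-rank statistic. It assumes the usual two-sided definition, in which T is the smaller of the positive and negative rank sums. It is not the scipy or researchpy implementation, and `signed_rank_statistic` is a hypothetical name.

```python
def signed_rank_statistic(x, y):
    """Return (T, n_used) for paired samples, dropping zero differences."""
    # Pairs with no difference are discarded before ranking.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)


# Two of the five pairs are identical, so only three differences are ranked.
x = [5, 3, 8, 6, 7]
y = [5, 1, 9, 2, 7]
t_value, n_used = signed_rank_statistic(x, y)
print(f"T = {t_value}, pairs used = {n_used}")
```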

### Effect size measure formulas

#### Cohen’s $$d_s$$ (between subjects design)

Cohen’s $$d_s$$ for a between groups design is calculated with the following equation:

$d_s = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)SD^2_1 + (n_2 - 1)SD^2_2}{n_1 + n_2 - 2}}}$
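As a worked check, the equation can be evaluated from summary statistics alone. The sketch below uses the means, standard deviations, and group sizes from the independent t-test example in this document, and should land close to the reported Cohen's d of 0.1459. `cohens_ds` is a hypothetical helper, not a researchpy function.

```python
import math


def cohens_ds(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d_s from group summary statistics (pooled SD in the denominator)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Summary statistics copied from the independent t-test example.
d_s = cohens_ds(4.59, 4.16, 2.749086, 3.132495, 100, 100)
print(f"Cohen's d_s = {d_s:.4f}")
```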

#### Hedges’s $$g_s$$ (between subjects design)

Cohen’s $$d_s$$ gives a biased estimate of the population effect size, and Hedges and Olkin provide an unbiased estimate. The difference between Hedges’s g and Cohen’s d is negligible when sample sizes are above 20, but it is still preferable to report Hedges’s g. Hedges’s $$g_s$$ is calculated using the following formula:

$\text{Hedges's g}_s = \text{Cohen's d}_s \times \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right)$
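The correction is a single multiplication. This sketch uses the standard small-sample correction factor $1 - \frac{3}{4(n_1 + n_2) - 9}$, applied to a Cohen's $d_s$ recomputed from the summary statistics of the independent t-test example. With that factor the result agrees, to within rounding, with the Hedge's g of 0.1454 shown in the example output. `hedges_gs` is a hypothetical helper name.

```python
import math


def hedges_gs(d_s, n1, n2):
    """Apply the small-sample correction 1 - 3 / (4(n1 + n2) - 9) to Cohen's d_s."""
    return d_s * (1 - 3 / (4 * (n1 + n2) - 9))


# Cohen's d_s recomputed from the example's summary statistics. The groups are
# the same size, so the pooled variance is the plain average of the variances.
d_s = (4.59 - 4.16) / math.sqrt((2.749086 ** 2 + 3.132495 ** 2) / 2)
g_s = hedges_gs(d_s, 100, 100)
print(f"Cohen's d_s = {d_s:.4f}, Hedges's g_s = {g_s:.4f}")
```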

#### Glass’s $$\Delta$$ (between or within subjects design)

Glass’s $$\Delta$$ is the mean difference between the two groups divided by the standard deviation of the control group. When used in a within subjects design, it is recommended to use the pre-measurement standard deviation in the denominator. The following formula, in which $$SD_1$$ is the standard deviation of the first group, is used to calculate Glass’s $$\Delta$$:

$\Delta = \frac{(\bar{x}_1 - \bar{x}_2)}{SD_1}$
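A quick numerical check, using the summary statistics from the examples in this document (means 4.59 and 4.16, first-group standard deviation 2.749086), gives a value close to the reported Glass's delta of 0.1564. `glass_delta` is a hypothetical helper used only for this illustration.

```python
def glass_delta(mean1, mean2, sd1):
    """Glass's delta: mean difference scaled by the first group's SD."""
    return (mean1 - mean2) / sd1


# Values copied from the 'healthy' and 'non-healthy' rows of the example output.
delta = glass_delta(4.59, 4.16, 2.749086)
print(f"Glass's delta = {delta:.4f}")
```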

#### Cohen’s $$d_z$$ (within subject design)

Another version of Cohen’s d is used in within subject designs; it is denoted by the subscript “z”. The formula for Cohen’s $$d_z$$ is as follows:

$d_z = \frac{M_{diff}}{\sqrt{\frac{\sum (X_{diff} - M_{diff})^2}{N - 1}}}$
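In words, $d_z$ is the mean of the paired differences divided by their sample standard deviation. Here is a small sketch using only the standard library. The `pre` and `post` values are made-up illustration data, and `cohens_dz` is a hypothetical helper.

```python
import statistics


def cohens_dz(first, second):
    """Cohen's d_z: mean paired difference over the SD of the differences."""
    diffs = [a - b for a, b in zip(first, second)]
    # statistics.stdev uses the N - 1 denominator, matching the formula.
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Made-up paired measurements, for illustration only.
pre = [10, 12, 9, 11, 13]
post = [8, 11, 9, 10, 10]
print(f"Cohen's d_z = {cohens_dz(pre, post):.4f}")
```

The same ratio can be read off the paired samples example in this document: the `diff` row reports a mean of 0.43 and a standard deviation of 4.063275, and 0.43 / 4.063275 is approximately 0.1058, the reported Cohen's d.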

#### Pearson correlation coefficient r (between or within subjects design)

Rosenthal provided the following formula to calculate the Pearson correlation coefficient r using the t-value and degrees of freedom:

$r = \sqrt{\frac{t^2}{t^2 + df}}$
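The formula can be checked against the example output in this document: the independent t-test reports t = 1.0317 with 198 degrees of freedom and r = 0.0731, and Welch's t-test reports the same t with 194.7181 degrees of freedom and r = 0.0737. `r_from_t` is a hypothetical helper name.

```python
import math


def r_from_t(t, df):
    """Effect size r from a t-value and its degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


# t and df copied from the independent and Welch's t-test examples.
r_independent = r_from_t(1.0317, 198)
r_welch = r_from_t(1.0317, 194.7181)
print(f"independent: r = {r_independent:.4f}")
print(f"Welch:       r = {r_welch:.4f}")
```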

Rosenthal also provided the following formula to calculate r using the Z-value and N. This formula is used to calculate the r coefficient for the Wilcoxon signed-rank test.

$r = \frac{Z}{\sqrt{N}}$
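For the Z-based effect size, r is Z divided by the square root of N, so r keeps the sign of Z. The sketch below plugs in the Z value from the Wilcoxon example in this document. Taking N = 200 (all observations across both groups) is an inference from the reported r of -0.0681, not something the source states. `r_from_z` is a hypothetical helper.

```python
import math


def r_from_z(z, n):
    """Effect size r from a Z-value and the number of observations N."""
    return z / math.sqrt(n)


# Z copied from the Wilcoxon example; N = 200 is an assumption (see above).
r = r_from_z(-0.9638, 200)
print(f"r = {r:.4f}")
```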

## Examples

import numpy, pandas, researchpy

numpy.random.seed(12345678)

df = pandas.DataFrame(numpy.random.randint(10, size= (100, 2)),
                      columns= ['healthy', 'non-healthy'])

# Independent t-test

# If you don't store the 2 returned DataFrames, it outputs as a tuple and
# is displayed
researchpy.ttest(df['healthy'], df['non-healthy'])

(      Variable      N   Mean        SD        SE  95% Conf.  Interval
0      healthy  100.0  4.590  2.749086  0.274909   4.044522  5.135478
1  non-healthy  100.0  4.160  3.132495  0.313250   3.538445  4.781555
2     combined  200.0  4.375  2.947510  0.208420   3.964004  4.785996,
Independent t-test   results
0             Difference (healthy - non-healthy) =     0.4300
1                             Degrees of freedom =   198.0000
2                                              t =     1.0317
3                          Two side test p value =     0.3035
4                         Difference < 0 p value =     0.8483
5                         Difference > 0 p value =     0.1517
6                                      Cohen's d =     0.1459
7                                      Hedge's g =     0.1454
8                                  Glass's delta =     0.1564
9                                              r =     0.0731)

# Otherwise you can store them as objects
des, res = researchpy.ttest(df['healthy'], df['non-healthy'])

des

      Variable      N   Mean        SD        SE  95% Conf.  Interval
0      healthy  100.0  4.590  2.749086  0.274909   4.044522  5.135478
1  non-healthy  100.0  4.160  3.132495  0.313250   3.538445  4.781555
2     combined  200.0  4.375  2.947510  0.208420   3.964004  4.785996

res

                     Independent t-test   results
0  Difference (healthy - non-healthy) =    0.4300
1                  Degrees of freedom =  198.0000
2                                   t =    1.0317
3               Two side test p value =    0.3035
4              Difference < 0 p value =    0.8483
5              Difference > 0 p value =    0.1517
6                           Cohen's d =    0.1459
7                           Hedge's g =    0.1454
8                       Glass's delta =    0.1564
9                                   r =    0.0731

# Paired samples t-test
des, res = researchpy.ttest(df['healthy'], df['non-healthy'],
                            paired= True)

des

      Variable      N  Mean        SD        SE  95% Conf.  Interval
0      healthy  100.0  4.59  2.749086  0.274909   4.044522  5.135478
1  non-healthy  100.0  4.16  3.132495  0.313250   3.538445  4.781555
2         diff  100.0  0.43  4.063275  0.406327  -0.376242  1.236242

res

                  Paired samples t-test   results
0  Difference (healthy - non-healthy) =    0.4300
1                  Degrees of freedom =   99.0000
2                                   t =    1.0583
3               Two side test p value =    0.2925
4              Difference < 0 p value =    0.8537
5              Difference > 0 p value =    0.1463
6                           Cohen's d =    0.1058
7                           Hedge's g =    0.1054
8                       Glass's delta =    0.1564
9                                   r =    0.1058

# Welch's t-test
des, res = researchpy.ttest(df['healthy'], df['non-healthy'],
                            equal_variances= False)

des

      Variable      N   Mean        SD        SE  95% Conf.  Interval
0      healthy  100.0  4.590  2.749086  0.274909   4.044522  5.135478
1  non-healthy  100.0  4.160  3.132495  0.313250   3.538445  4.781555
2     combined  200.0  4.375  2.947510  0.208420   3.964004  4.785996

res

                         Welch's t-test   results
0  Difference (healthy - non-healthy) =    0.4300
1                  Degrees of freedom =  194.7181
2                                   t =    1.0317
3               Two side test p value =    0.3035
4              Difference < 0 p value =    0.8483
5              Difference > 0 p value =    0.1517
6                           Cohen's d =    0.1459
7                           Hedge's g =    0.1454
8                       Glass's delta =    0.1564
9                                   r =    0.0737

# Wilcoxon signed-rank test
researchpy.ttest(df['healthy'], df['non-healthy'],
                 equal_variances= False, paired= True)

   Wilcoxon signed-rank test    results
0         Mean for healthy =     4.5900
1     Mean for non-healthy =     4.1600
2                  T value =  1849.5000
3                  Z value =    -0.9638
4        Two sided p value =     0.3347
5                        r =    -0.0681

# Exporting descriptive table (des) and result table (res) to same
# csv file
des, res = researchpy.ttest(df['healthy'], df['non-healthy'])

des.to_csv("C:\\Users\\...\\test.csv", index= False)
res.to_csv("C:\\Users\\...\\test.csv", index= False, mode= 'a')