# anova()

## Description

Performs analysis of variance (ANOVA) and analysis of covariance (ANCOVA).

## Parameters

### Input

anova(formula_like, data = {}, sum_of_squares = 3)

• formula_like : A valid formula which will parse the data into a design matrix.

• data : The dataframe which contains the data to be analyzed.

• sum_of_squares : The type of sum of squares desired; the default is Type 3.

### Returns

Returns an object with class “anova”; this object has accessible methods which are described below.

#### anova methods

• results(return_type = "Dataframe", decimals = 4, pretty_format = True)

  • return_type : The type of data structure in which the results are returned. Supported options are "Dataframe", which returns a Pandas DataFrame, and "Dictionary", which returns a dictionary.

  • decimals : The number of decimal places to which the values are rounded.

  • pretty_format : Whether pretty formatting is applied. This adds extra empty spaces to the returned data structure to make the results easier to read.

• regression_table(return_type = "Dataframe", decimals = 4, conf_level = 0.95)

  • return_type : The type of data structure in which the results are returned. Supported options are "Dataframe", which returns a Pandas DataFrame, and "Dictionary", which returns a dictionary.

  • decimals : The number of decimal places to which the values are rounded.

  • conf_level : The confidence level used for the confidence intervals.

• predict(estimate = None)

  • estimate : Desired estimate. Available options are:

    • "y" or "xb" : Linear prediction

    • "residuals", "res", or "r" : Residuals

    • "standardized_residuals", "standardized_r", or "r_std" : Standardized residuals

    • "studentized_residuals", "student_r", or "r_stud" : Studentized (jackknifed) residuals

    • "leverage" or "lev" : Leverage of the observation (diagonal of the H matrix)

See predict() for formula information.
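To make the estimate names above concrete, the following is a minimal sketch of the standard textbook definitions of these quantities for a simple regression with one predictor. It is not Researchpy's implementation: the function name `ols_diagnostics` and the toy data are hypothetical, and the formulas used by `predict()` itself should be taken from its own documentation.

```python
import math

def ols_diagnostics(x, y):
    """Textbook diagnostics for y = b0 + b1*x (illustrative, not Researchpy code)."""
    n = len(x)
    p = 2  # number of estimated coefficients (intercept and slope)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    xb = [b0 + b1 * xi for xi in x]                      # linear prediction
    res = [yi - fi for yi, fi in zip(y, xb)]             # residuals
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # diagonal of H
    sse = sum(e ** 2 for e in res)
    mse = sse / (n - p)

    # Standardized residuals use the full-sample error variance.
    r_std = [e / math.sqrt(mse * (1 - h)) for e, h in zip(res, lev)]

    # Studentized (jackknifed) residuals use the error variance
    # re-estimated with observation i left out.
    r_stud = []
    for e, h in zip(res, lev):
        mse_i = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        r_stud.append(e / math.sqrt(mse_i * (1 - h)))

    return {
        "xb": xb,
        "residuals": res,
        "leverage": lev,
        "standardized_residuals": r_std,
        "studentized_residuals": r_stud,
    }

# Hypothetical toy data, used only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 13.5]
diag = ols_diagnostics(x, y)
```

In this sketch the leverages sum to the number of estimated coefficients and the residuals sum to zero, which are quick sanity checks when comparing against any other implementation.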

## Effect Size Measures Formulas

By default, this method returns $$R^2$$, $$\text{Adj. }R^2$$, $$\eta^2$$, and $$\omega^2$$. Note that for the factor terms the reported $$\eta^2$$ and $$\omega^2$$ are partial, i.e. $$\eta^2_p$$ and $$\omega^2_p$$ respectively. Additionally, $$R^2$$ and $$\eta^2$$ are the same quantity; they have different names because they come from different frameworks that use different terminology. The formulas for these effect sizes come from [1].

### Eta-squared ($$\eta^2$$) and $$R^2$$

$\eta^2 = \frac{\text{SS}_{model}}{\text{SS}_{total}}$

### Adjusted $$R^2$$

$\text{Adj. }R^2 = 1 - \frac{\text{df}_{total}}{\text{df}_{error}} \times \frac{\text{SS}_{error}}{\text{SS}_{total}}$

### Partial Eta-squared ($$\eta^2_p$$)

$\eta^2_p = \frac{\text{SS}_{effect}}{\text{SS}_{effect} + \text{SS}_{error}}$

### Omega-squared ($$\omega^2$$)

$\omega^2 = \frac{\text{SS}_{effect} - (\text{df}_{effect} \times \text{MS}_{error})}{\text{SS}_{total} + \text{MS}_{error}}$

### Partial Omega-squared ($$\omega^2_p$$)

$\omega^2_p = \frac{\text{SS}_{effect} - (\text{df}_{effect} \times \text{MS}_{error})}{\text{SS}_{effect} + (N - \text{df}_{effect}) \times \text{MS}_{error}}$

Where N is the total number of observations included in the model.
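As a rough check of the formulas in this section, the sketch below evaluates them in plain Python using the sums of squares and degrees of freedom reported for the 'systolic' model in the Examples section of this page. The helper names `effect_sizes` and `adjusted_r_squared` are hypothetical and are not part of Researchpy.

```python
def effect_sizes(ss_effect, df_effect, ss_error, df_error, ss_total, n_obs):
    """Evaluate the effect-size formulas of this section for one source of variation."""
    ms_error = ss_error / df_error
    return {
        "eta_sq": ss_effect / ss_total,
        "partial_eta_sq": ss_effect / (ss_effect + ss_error),
        "omega_sq": (ss_effect - df_effect * ms_error) / (ss_total + ms_error),
        "partial_omega_sq": (ss_effect - df_effect * ms_error)
        / (ss_effect + (n_obs - df_effect) * ms_error),
    }

def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """Adjusted R-squared as defined in this section."""
    return 1 - (df_total / df_error) * (ss_error / ss_total)

# Values copied from the ANOVA table in the Examples section.
SS_ERROR, DF_ERROR = 5080.8167, 46
SS_TOTAL, DF_TOTAL = 9340.1552, 57
N_OBS = 58

# Whole model: eta squared (= R-squared) and omega squared.
model = effect_sizes(4259.3385, 11, SS_ERROR, DF_ERROR, SS_TOTAL, N_OBS)
# Factor term "drug": the partial versions are the ones reported for factors.
drug = effect_sizes(2997.4719, 3, SS_ERROR, DF_ERROR, SS_TOTAL, N_OBS)
adj_r2 = adjusted_r_squared(SS_ERROR, DF_ERROR, SS_TOTAL, DF_TOTAL)

print(round(model["eta_sq"], 4), round(model["omega_sq"], 4))
print(round(drug["partial_eta_sq"], 4), round(drug["partial_omega_sq"], 4))
print(round(adj_r2, 4))
```

The model-level values reproduce the 0.4560 and 0.3221 shown in the Model row of the example table, and the partial values for drug reproduce the 0.3711 and 0.2939 shown in the drug row.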

## Examples

First, load the libraries required for this example. The example data set is loaded using statsmodels.datasets; it is a data set available through Stata called 'systolic'.

import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets

systolic = statsmodels.datasets.webuse('systolic')


Now let’s get some quick information regarding the data set.

systolic.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 58 entries, 0 to 57
Data columns (total 3 columns):
#   Column    Non-Null Count  Dtype
---  ------    --------------  -----
0   drug      58 non-null     int16
1   disease   58 non-null     int16
2   systolic  58 non-null     int16


The output above indicates that there are no missing observations and that each variable is stored as an integer. Next, take a look at the descriptive statistics of the data.

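rp.summarize(systolic["systolic"])

|   | Name | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval |
|---|------|---|------|--------|----------|----|----|--------------------|
| 0 | systolic | 58 | 18.8793 | 21 | 163.862 | 12.8009 | 1.6808 | [15.5135, 22.2451] |

rp.crosstab(systolic["disease"], systolic["drug"])

|   | Variable | Outcome | Count | Percent |
|---|----------|---------|-------|---------|
| 0 | drug | 4 | 16 | 27.59 |
| 1 |  | 2 | 15 | 25.86 |
| 2 |  | 1 | 15 | 25.86 |
| 3 |  | 3 | 12 | 20.69 |
| 4 | disease | 3 | 20 | 34.48 |
| 5 |  | 2 | 19 | 32.76 |
| 6 |  | 1 | 19 | 32.76 |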

Now conduct the ANOVA; by default, Type 3 sums of squares are used. There are a few ways to conduct an ANOVA using Researchpy. The suggested approach is to assign the ANOVA model to an object so that its built-in methods can be used. Alternatively, the model can be run and its results displayed in a single line; the output will be returned as a tuple. The suggested approach is shown in this example.

m = rp.anova("systolic ~ C(drug) + C(disease) + C(drug):C(disease)", data = systolic, sum_of_squares = 3)

desc, table = m.results()
print(desc, table, sep = "\n"*2)


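Note: Effect size values for factors are partial.

| Statistic | Value |
|-----------|-------|
| Number of obs | 58.0000 |
| Root MSE | 10.5096 |
| R-squared | 0.4560 |

| Source | Sum of Squares | Degrees of Freedom | Mean Squares | F value | p-value | Eta squared | Omega squared |
|--------|----------------|--------------------|--------------|---------|---------|-------------|---------------|
| Model | 4,259.3385 | 11 | 387.2126 | 3.5057 | 0.0013 | 0.4560 | 0.3221 |
| drug | 2,997.4719 | 3.0000 | 999.1573 | 9.0460 | 0.0001 | 0.3711 | 0.2939 |
| disease | 415.8730 | 2.0000 | 207.9365 | 1.8826 | 0.1637 | 0.0757 | 0.0295 |
| drug:disease | 707.2663 | 6.0000 | 117.8777 | 1.0672 | 0.3958 | 0.1222 | 0.0069 |
| Residual | 5,080.8167 | 46 | 110.4525 |  |  |  |  |
| Total | 9,340.1552 | 57 | 163.8624 |  |  |  |  |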
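The columns of the ANOVA output are related by simple arithmetic, which the following sketch reproduces from the reported sums of squares and degrees of freedom. It assumes the usual definitions (mean square = sum of squares / degrees of freedom, F = mean square of the source / residual mean square, Root MSE = square root of the residual mean square); the variable names are illustrative only.

```python
import math

# (sum of squares, degrees of freedom) copied from the output above.
sources = {
    "Model": (4259.3385, 11),
    "drug": (2997.4719, 3),
    "disease": (415.8730, 2),
    "drug:disease": (707.2663, 6),
}
ss_residual, df_residual = 5080.8167, 46

# Residual mean square and its square root (reported as Root MSE).
ms_residual = ss_residual / df_residual
root_mse = math.sqrt(ms_residual)

# Mean square and F value for each source of variation.
mean_squares = {name: ss / df for name, (ss, df) in sources.items()}
f_values = {name: ms / ms_residual for name, ms in mean_squares.items()}

print(f"Root MSE = {root_mse:.4f}")
for name in sources:
    print(f"{name:<13} MS = {mean_squares[name]:10.4f}  F = {f_values[name]:.4f}")
```

Up to rounding of the inputs, the computed values agree with the Mean Squares, F value, and Root MSE entries in the output.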

If it’s of interest, one can also access the underlying regression table.

m.regression_table()

| systolic | Coef. | Std. Err. | t | p-value | 95% Conf. Interval |
|----------|-------|-----------|---|---------|--------------------|
| Intercept | 29.3333 | 4.2905 | 6.8367 | 0.0000 | [20.6969, 37.9697] |
| drug |  |  |  |  |  |
| 1 | (reference) |  |  |  |  |
| 2 | -1.3333 | 6.3639 | -0.2095 | 0.8350 | [-14.1432, 11.4765] |
| 3 | -13.0000 | 7.4314 | -1.7493 | 0.0869 | [-27.9587, 1.9587] |
| 4 | -15.7333 | 6.3639 | -2.4723 | 0.0172 | [-28.5432, -2.9235] |
| disease |  |  |  |  |  |
| 1 | (reference) |  |  |  |  |
| 2 | -1.0833 | 6.7839 | -0.1597 | 0.8738 | [-14.7387, 12.572] |
| 3 | -8.9333 | 6.3639 | -1.4038 | 0.1671 | [-21.7432, 3.8765] |
| drug:disease |  |  |  |  |  |
| 2:2 | 6.5833 | 9.7839 | 0.6729 | 0.5044 | [-13.1107, 26.2774] |
| 2:3 | -0.9000 | 8.9999 | -0.1000 | 0.9208 | [-19.0159, 17.2159] |
| 3:2 | -10.8500 | 10.2435 | -1.0592 | 0.2950 | [-31.4692, 9.7692] |
| 3:3 | 1.1000 | 10.2435 | 0.1074 | 0.9150 | [-19.5192, 21.7192] |
| 4:2 | 0.3167 | 9.3017 | 0.0340 | 0.9730 | [-18.4066, 19.04] |
| 4:3 | 9.5333 | 9.2022 | 1.0360 | 0.3056 | [-8.9897, 28.0564] |
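One way to read the regression table is to add up the coefficients that apply to a given combination of drug and disease, which gives the linear prediction for that cell. The sketch below does this by hand. It assumes, as the "(reference)" rows suggest, that level 1 of each factor is the reference category, and that interaction labels such as "4:3" mean drug 4 with disease 3; the dictionary layout and the function `predicted_mean` are hypothetical and not part of Researchpy.

```python
# Coefficients copied from the regression table above.
coef = {
    "Intercept": 29.3333,
    "drug": {2: -1.3333, 3: -13.0000, 4: -15.7333},
    "disease": {2: -1.0833, 3: -8.9333},
    "drug:disease": {
        (2, 2): 6.5833, (2, 3): -0.9000,
        (3, 2): -10.8500, (3, 3): 1.1000,
        (4, 2): 0.3167, (4, 3): 9.5333,
    },
}

def predicted_mean(drug, disease):
    """Linear prediction for one drug/disease cell.

    Reference levels (drug 1, disease 1) contribute nothing beyond the intercept.
    """
    xb = coef["Intercept"]
    xb += coef["drug"].get(drug, 0.0)
    xb += coef["disease"].get(disease, 0.0)
    xb += coef["drug:disease"].get((drug, disease), 0.0)
    return xb

print(round(predicted_mean(1, 1), 4))  # reference cell: the intercept alone
print(round(predicted_mean(4, 3), 4))  # intercept + drug 4 + disease 3 + interaction 4:3
```

Because the model includes both main effects and their full interaction, each linear prediction obtained this way corresponds to the mean of systolic in that drug and disease cell, up to rounding of the reported coefficients.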

## References

[1] R. J. Grissom and J. J. Kim. Effect Sizes for Research: Univariate and Multivariate Applications. Routledge, second edition, 2012. ISBN 978-0-415-87769-5.