

Performs analysis of variance (ANOVA) and analysis of covariance (ANCOVA).



anova(formula_like, data = {}, sum_of_squares = 3)

  • formula_like : A valid formula which will parse the data into a design matrix.

  • data : The dataframe which contains the data to be analyzed.

  • sum_of_squares : The type of sum of squares desired; the default is Type 3.


Returns an object with class “anova”; this object has accessible methods which are described below.

anova methods

  • results(return_type = "Dataframe", decimals = 4, pretty_format = True)

    • return_type : The type of data structure the results should be returned as. Supported options are "Dataframe", which returns a Pandas DataFrame, and "Dictionary", which returns a dictionary.

    • decimals : The number of decimal places the data should be rounded to.

    • pretty_format : Whether pretty formatting should be applied. This adds extra empty spaces in the returned data structure to aid visualization of the results.

  • regression_table(return_type = "Dataframe", decimals = 4, conf_level = 0.95)

    • return_type : The type of data structure the results should be returned as. Supported options are "Dataframe", which returns a Pandas DataFrame, and "Dictionary", which returns a dictionary.

    • decimals : The number of decimal places the data should be rounded to.

    • conf_level : The confidence level used for the confidence intervals.

  • predict(estimate = None)

    • estimate : Desired estimate. Available options are:

      • “y” or “xb” : Linear prediction

      • “residuals”, “res”, or “r” : Residuals

      • “standardized_residuals”, “standardized_r”, or “r_std” : Standardized residuals

      • “studentized_residuals”, “student_r”, or “r_stud” : Studentized (jackknifed) residuals

      • “leverage”, “lev” : Leverage of the observation (diagonal of the H matrix)

    See predict() for formula information.
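The estimates listed above are standard regression diagnostics. As a rough illustration only, the sketch below computes them for a one-predictor least-squares fit in plain Python. The function name `simple_ols_diagnostics` and the toy data are hypothetical, and the formulas are the usual textbook definitions; researchpy's own implementation is not shown here and may differ in detail.

```python
import math


def simple_ols_diagnostics(x, y):
    """Fit y = b0 + b1 * x by least squares and return per-observation diagnostics.

    Hypothetical helper for illustration; not part of researchpy.
    """
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    # Linear prediction ("y" / "xb") and raw residuals
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Leverage: diagonal of the hat (H) matrix, written out for one predictor
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Residual mean square from the full fit
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Standardized residuals use the full-sample residual variance
    standardized = [e / math.sqrt(mse * (1 - h)) for e, h in zip(residuals, leverage)]

    # Studentized (jackknifed) residuals use the variance with observation i left out
    studentized = []
    for e, h in zip(residuals, leverage):
        mse_without_i = ((n - p) * mse - e ** 2 / (1 - h)) / (n - p - 1)
        studentized.append(e / math.sqrt(mse_without_i * (1 - h)))

    return {
        "xb": fitted,
        "residuals": residuals,
        "standardized_residuals": standardized,
        "studentized_residuals": studentized,
        "leverage": leverage,
    }


# Toy data, made up for this sketch
diag = simple_ols_diagnostics(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 7.8, 10.1, 13.0],
)
for name, values in diag.items():
    print(name, [round(v, 4) for v in values])
```

In this setting the leverages sum to the number of estimated parameters (two here), which is a quick sanity check on any hat-matrix computation.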

Effect Size Measures Formulas

By default, this method will return the measures of \(R^2\), \(\text{Adj. }R^2\), \(\eta^2\), \(\epsilon^2\), and \(\omega^2\). Please note that for the factor terms, the reported effect sizes are partial, i.e., \(\eta^2_p\), \(\epsilon^2_p\), and \(\omega^2_p\) respectively. See Olejnik and Algina (2000) [1], Kelley and Preacher (2012) [2], and/or Grissom and Kim (2012) [3] for details.

Additionally, \(R^2\) and \(\eta^2\) are the same quantity; the names differ because they come from different frameworks that use different terminology. The formulas used to calculate these effect sizes come from Olejnik and Algina (2000) [1].

Eta-squared (\(\eta^2\)) and \(R^2\)

\[\eta^2 = \frac{\text{SS}_{model}}{\text{SS}_{total}}\]

Adjusted \(R^2\)

\[\text{Adj. }R^2 = 1 - \frac{\text{df}_{total}}{\text{df}_{error}} \times \frac{\text{SS}_{error}}{\text{SS}_{total}}\]

Partial Eta-squared (\(\eta^2_p\))

\[\eta^2_p = \frac{\text{SS}_{effect}}{\text{SS}_{effect} + \text{SS}_{error}}\]

Omega-squared (\(\omega^2\))

\[\omega^2 = \frac{\text{SS}_{effect} - (\text{df}_{effect} \times \text{MS}_{error})}{\text{SS}_{total} + \text{MS}_{error}}\]

Partial Omega-squared (\(\omega^2_p\))

\[\omega^2_p = \frac{\text{SS}_{effect} - (\text{df}_{effect} \times \text{MS}_{error})}{\text{SS}_{effect} + (\text{N} - \text{df}_{effect}) \times \text{MS}_{error}}\]

Where N is the total number of observations included in the model.
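As a minimal sketch, the formulas above can be written as small Python functions. The function names are hypothetical and are not part of researchpy; the input values are copied from the ANOVA table in the systolic example further down this page, so the results should agree with that table up to rounding.

```python
def eta_squared(ss_model, ss_total):
    """Eta-squared, identical to R-squared for the model."""
    return ss_model / ss_total


def adjusted_r_squared(ss_error, ss_total, df_error, df_total):
    return 1 - (df_total / df_error) * (ss_error / ss_total)


def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)


def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def partial_omega_squared(ss_effect, df_effect, ms_error, n_obs):
    return (ss_effect - df_effect * ms_error) / (
        ss_effect + (n_obs - df_effect) * ms_error
    )


# Values taken from the example ANOVA table (systolic data)
ss_model, df_model = 4259.3385, 11
ss_error, df_error = 5080.8167, 46
ss_total, df_total = 9340.1552, 57
ms_error = 110.4525
n_obs = 58
ss_drug, df_drug = 2997.4719, 3

r2 = eta_squared(ss_model, ss_total)
adj_r2 = adjusted_r_squared(ss_error, ss_total, df_error, df_total)
model_omega2 = omega_squared(ss_model, df_model, ms_error, ss_total)
drug_eta2_p = partial_eta_squared(ss_drug, ss_error)
drug_omega2_p = partial_omega_squared(ss_drug, df_drug, ms_error, n_obs)

print(round(r2, 4), round(adj_r2, 4), round(model_omega2, 4))
print(round(drug_eta2_p, 4), round(drug_omega2_p, 4))
```

The model row uses the non-partial formulas, while the factor rows (here, drug) use the partial ones, matching the note in the example output.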


First, load the libraries required for this example. The example data set is loaded using statsmodels.datasets; it is a data set available through Stata called ‘systolic’.

import researchpy as rp
import pandas as pd

# Used to load the example data
import statsmodels.datasets

systolic = statsmodels.datasets.webuse('systolic')

Now let’s get some quick information regarding the data set.

systolic.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 58 entries, 0 to 57
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   drug      58 non-null     int16
 1   disease   58 non-null     int16
 2   systolic  58 non-null     int16

The output indicates that there are no missing observations and that each variable is stored as an integer.

Now to take a look at the descriptive statistics of the univariate data.

       Name   N     Mean  Median  Variance       SD      SE  95% Conf. Interval
0  systolic  58  18.8793      21   163.862  12.8009  1.6808  [15.5135, 22.2451]

rp.crosstab(systolic["disease"], systolic["drug"])

   Variable  Outcome  Count  Percent
0      drug        4     16    27.59
1                  2     15    25.86
2                  1     15    25.86
3                  3     12    20.69
4   disease        3     20    34.48
5                  2     19    32.76
6                  1     19    32.76

Now to conduct the ANOVA; by default, Type 3 sums of squares are used. There are a few ways to conduct an ANOVA using Researchpy. The suggested approach is to assign the ANOVA model to an object so that its built-in methods can be used. Alternatively, one can fit the model and display the results in a single line; in that case the output is returned as a tuple. The suggested approach is shown in this example.

m = rp.anova("systolic ~ C(drug) + C(disease) + C(drug):C(disease)", data = systolic, sum_of_squares = 3)

# results() returns the model summary and the ANOVA table
desc, table = m.results()
print(desc, table, sep = "\n"*2)

Note: Effect size values for factors are partial.

Number of obs =  58.0000
Root MSE      =  10.5096
R-squared     =   0.4560
Adj R-squared =   0.3259

Source        Sum of Squares  Degrees of Freedom  Mean Squares  F value  p-value  Eta squared  Omega squared
Model             4,259.3385                  11      387.2126   3.5057   0.0013       0.4560         0.3221
drug              2,997.4719              3.0000      999.1573   9.0460   0.0001       0.3711         0.2939
disease             415.8730              2.0000      207.9365   1.8826   0.1637       0.0757         0.0295
drug:disease        707.2663              6.0000      117.8777   1.0672   0.3958       0.1222         0.0069
Residual          5,080.8167                  46      110.4525
Total             9,340.1552                  57      163.8624
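To make the columns of the ANOVA table easier to read, the following sketch recomputes a few of them from the sums of squares and degrees of freedom reported above. It uses only the printed (rounded) values, so small differences in the last decimal place are possible; the variable names are chosen for this illustration.

```python
import math

# (sum of squares, degrees of freedom) copied from the table above
effects = {
    "drug": (2997.4719, 3),
    "disease": (415.8730, 2),
    "drug:disease": (707.2663, 6),
}
ss_model, ss_residual, ss_total = 4259.3385, 5080.8167, 9340.1552
df_residual = 46

# Mean square = sum of squares / degrees of freedom
ms_residual = ss_residual / df_residual

# Root MSE is the square root of the residual mean square
root_mse = math.sqrt(ms_residual)

# F value = effect mean square / residual mean square
f_values = {name: (ss / df) / ms_residual for name, (ss, df) in effects.items()}

# Model and residual sums of squares add up to the total
ss_check = ss_model + ss_residual

# The individual effect rows, added together
ss_effects_sum = sum(ss for ss, _ in effects.values())

print(round(ms_residual, 4), round(root_mse, 4))
print({name: round(f, 4) for name, f in f_values.items()})
print(round(ss_check, 4), round(ss_effects_sum, 4))
```

Note that the effect rows add up to less than the model row here. With Type 3 sums of squares and unequal cell sizes, as in this data set, the effect sums of squares are not expected to partition the model sum of squares exactly.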

If it’s of interest, one can also access the underlying regression table.

systolic         Coef.  Std. Err.        t  p-value  95% Conf. Interval
Intercept      29.3333     4.2905   6.8367   0.0000  [20.6969, 37.9697]
drug
  1           (reference)
  2            -1.3333     6.3639  -0.2095   0.8350  [-14.1432, 11.4765]
  3           -13.0000     7.4314  -1.7493   0.0869  [-27.9587, 1.9587]
  4           -15.7333     6.3639  -2.4723   0.0172  [-28.5432, -2.9235]
disease
  1           (reference)
  2            -1.0833     6.7839  -0.1597   0.8738  [-14.7387, 12.572]
  3            -8.9333     6.3639  -1.4038   0.1671  [-21.7432, 3.8765]
drug:disease
  2:2           6.5833     9.7839   0.6729   0.5044  [-13.1107, 26.2774]
  2:3          -0.9000     8.9999  -0.1000   0.9208  [-19.0159, 17.2159]
  3:2         -10.8500    10.2435  -1.0592   0.2950  [-31.4692, 9.7692]
  3:3           1.1000    10.2435   0.1074   0.9150  [-19.5192, 21.7192]
  4:2           0.3167     9.3017   0.0340   0.9730  [-18.4066, 19.04]
  4:3           9.5333     9.2022   1.0360   0.3056  [-8.9897, 28.0564]
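One way to read the regression table is to rebuild the predicted mean for each drug-by-disease cell from the coefficients. The sketch below does this by hand, assuming treatment (dummy) coding with level 1 of each factor as the reference, as the "(reference)" rows suggest, and assuming the interaction labels are ordered drug:disease as in the model formula. The helper name `predicted_cell_mean` is hypothetical.

```python
# Coefficients copied from the regression table above
intercept = 29.3333
drug = {1: 0.0, 2: -1.3333, 3: -13.0000, 4: -15.7333}
disease = {1: 0.0, 2: -1.0833, 3: -8.9333}
interaction = {
    (2, 2): 6.5833, (2, 3): -0.9000,
    (3, 2): -10.8500, (3, 3): 1.1000,
    (4, 2): 0.3167, (4, 3): 9.5333,
}


def predicted_cell_mean(drug_level, disease_level):
    """Sum the coefficients that apply to one drug/disease combination.

    Interaction terms involving a reference level are zero (assumed coding).
    """
    return (
        intercept
        + drug[drug_level]
        + disease[disease_level]
        + interaction.get((drug_level, disease_level), 0.0)
    )


print(round(predicted_cell_mean(1, 1), 4))  # reference cell: the intercept, 29.3333
print(round(predicted_cell_mean(2, 2), 4))  # 33.5
print(round(predicted_cell_mean(4, 3), 4))  # 14.2

# Each t statistic is the coefficient divided by its standard error
t_drug_4 = drug[4] / 6.3639
print(round(t_drug_4, 4))
```

Because the model includes every main effect and interaction, these predicted values correspond to the fitted mean of each cell, up to the rounding of the printed coefficients.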
