# anova()

## Description

Performs an analysis of variance (ANOVA) or an analysis of covariance (ANCOVA).

## Parameters

### Input

**anova(formula_like, data = {}, sum_of_squares = 3)**

formula_like: A valid formula, which is used to parse the data into a design matrix.

data: The dataframe which contains the data to be analyzed.

sum_of_squares: The type of sum of squares to use; the default is Type 3.

### Returns

Returns an object with class "anova"; this object has accessible methods which are described below.

#### anova methods

results(return_type = "Dataframe", decimals = 4, pretty_format = True)

return_type: The type of data structure the results should be returned as. Supported options are ‘Dataframe’ which will return a Pandas DataFrame or ‘Dictionary’ which will return a dictionary.

decimals: The number of decimal places the data should be rounded to.

pretty_format: If pretty formatting should be applied. This adds extra empty spaces in the returned data structure for visualization of the results.

regression_table(return_type = "Dataframe", decimals = 4, conf_level = 0.95)

return_type: The type of data structure the results should be returned as. Supported options are 'Dataframe', which will return a Pandas DataFrame, or 'Dictionary', which will return a dictionary.

decimals: The number of decimal places the data should be rounded to.

conf_level: The confidence level used for the confidence intervals (for example, 0.95 for 95% intervals).

predict(estimate = None)

estimate: Desired estimate. Available options are:

"y" or "xb": Linear prediction

"residuals", "res", or "r": Residuals

"standardized_residuals", "standardized_r", or "r_std": Standardized residuals

"studentized_residuals", "student_r", or "r_stud": Studentized (jackknifed) residuals

"leverage" or "lev": Leverage of the observation (diagonal of the H matrix)

See predict() for formula information.
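To make the estimates listed above concrete, here is a minimal sketch for a simple linear regression with one predictor. It assumes the usual textbook definitions (standardized residuals use the full-sample error variance, studentized residuals re-estimate it with the observation left out); the function name `regression_diagnostics` and the toy data are hypothetical, and researchpy's own implementation may differ in detail.

```python
import math


def regression_diagnostics(x, y):
    """Fit y = b0 + b1 * x by least squares and return common diagnostics.

    Hypothetical helper for illustration only; not part of researchpy.
    """
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    xb = [b0 + b1 * xi for xi in x]                      # linear prediction
    res = [yi - fi for yi, fi in zip(y, xb)]             # residuals
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # diagonal of H

    sse = sum(e ** 2 for e in res)
    mse = sse / (n - p)

    # Standardized residuals: scale by the full-sample error variance.
    r_std = [e / math.sqrt(mse * (1 - h)) for e, h in zip(res, lev)]

    # Studentized (jackknifed) residuals: error variance is re-estimated
    # with observation i removed.
    r_stud = []
    for e, h in zip(res, lev):
        mse_without_i = (sse - e ** 2 / (1 - h)) / (n - p - 1)
        r_stud.append(e / math.sqrt(mse_without_i * (1 - h)))

    return {"xb": xb, "res": res, "r_std": r_std, "r_stud": r_stud, "lev": lev}


# Toy data, made up for this sketch.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.3, 9.8, 12.4, 13.7, 16.2]
diag = regression_diagnostics(x, y)
for key, values in diag.items():
    print(key, [round(v, 4) for v in values])
```

Two quick sanity checks on any such output: the leverages sum to the number of estimated coefficients, and the residuals sum to zero when the model includes an intercept.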

## Effect Size Measures Formulas

By default, this method returns \(R^2\), \(\text{Adj. }R^2\), \(\eta^2\), \(\epsilon^2\), and \(\omega^2\). Note that for the factor terms the reported effect sizes are partial, i.e., \(\eta^2_p\), \(\epsilon^2_p\), and \(\omega^2_p\), respectively. See Olejnik and Algina (2000) [1], Kelley and Preacher (2012) [2], and/or Grissom and Kim (2012) [3].

Additionally, \(R^2\) and \(\eta^2\) are the same quantity; the names differ because they come from different frameworks which use different terminology. The formulas for calculating these effect sizes come from Olejnik and Algina (2000) [1].

### Eta-squared (\(\eta^2\)) and \(R^2\)

### Adjusted \(R^2\)

### Partial Eta-squared (\(\eta^2_p\))

### Omega-squared (\(\omega^2\))

### Partial Omega-squared (\(\omega^2_p\))

Where N is the total number of observations included in the model.
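For reference, the measures named in the headings above are commonly written as follows. This is a sketch of the standard definitions in the form given by Olejnik and Algina (2000), not a transcription of the library's source; the subscripted symbols (SS for sum of squares, MS for mean square, df for degrees of freedom) are notation assumed here. For the overall model, the "effect" terms are the model's sum of squares and degrees of freedom.

```latex
\begin{align*}
\eta^2 = R^2 &= \frac{SS_{\text{effect}}}{SS_{\text{total}}} \\[4pt]
\text{Adj. } R^2 &= 1 - \left(1 - R^2\right)\frac{N - 1}{N - df_{\text{model}} - 1} \\[4pt]
\eta^2_p &= \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{residual}}} \\[4pt]
\omega^2 &= \frac{SS_{\text{effect}} - df_{\text{effect}}\, MS_{\text{residual}}}{SS_{\text{total}} + MS_{\text{residual}}} \\[4pt]
\omega^2_p &= \frac{df_{\text{effect}}\left(MS_{\text{effect}} - MS_{\text{residual}}\right)}{df_{\text{effect}}\, MS_{\text{effect}} + \left(N - df_{\text{effect}}\right) MS_{\text{residual}}}
\end{align*}
```

Applied to the sums of squares in the ANOVA table in the Examples section, these expressions give the same values as the reported \(R^2\), \(\text{Adj. }R^2\), partial \(\eta^2\), and \(\omega^2\) columns to four decimals.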

## Examples

First, load the libraries required for this example. The example data set is loaded using statsmodels.datasets; it is a data set available through Stata called 'systolic'.

```
import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets
systolic = statsmodels.datasets.webuse('systolic')
```

Now let’s get some quick information regarding the data set.

```
systolic.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 58 entries, 0 to 57
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   drug      58 non-null     int16
 1   disease   58 non-null     int16
 2   systolic  58 non-null     int16
```

The output above indicates that there are no missing observations and that each variable is stored as an integer. Next, take a look at the descriptive statistics, starting with the outcome variable and then the two factors.

```
rp.summarize(systolic["systolic"])
```

| | Name | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval |
|---|---|---|---|---|---|---|---|---|
| 0 | systolic | 58 | 18.8793 | 21 | 163.862 | 12.8009 | 1.6808 | [15.5135, 22.2451] |

```
rp.summary_cat(systolic[["drug", "disease"]])
```

| | Variable | Outcome | Count | Percent |
|---|---|---|---|---|
| 0 | drug | 4 | 16 | 27.59 |
| 1 | | 2 | 15 | 25.86 |
| 2 | | 1 | 15 | 25.86 |
| 3 | | 3 | 12 | 20.69 |
| 4 | disease | 3 | 20 | 34.48 |
| 5 | | 2 | 19 | 32.76 |
| 6 | | 1 | 19 | 32.76 |

Now conduct the ANOVA; by default, Type 3 sums of squares are used. There are a few ways to conduct an ANOVA using Researchpy. The suggested approach is to assign the ANOVA model to an object so that the built-in methods can be used. Alternatively, one can fit the model and display the results in one line; the output will be returned as a tuple. The suggested approach is shown in this example.

```
m = rp.anova("systolic ~ C(drug) + C(disease) + C(drug):C(disease)", data = systolic, sum_of_squares = 3)
desc, table = m.results()
print(desc, table, sep = "\n"*2)
```

Note: Effect size values for factors are partial.

| Number of obs = | 58.0000 |
|---|---|
| Root MSE = | 10.5096 |
| R-squared = | 0.4560 |
| Adj R-squared = | 0.3259 |

| Source | Sum of Squares | Degrees of Freedom | Mean Squares | F value | p-value | Eta squared | Omega squared |
|---|---|---|---|---|---|---|---|
| Model | 4,259.3385 | 11 | 387.2126 | 3.5057 | 0.0013 | 0.4560 | 0.3221 |
| drug | 2,997.4719 | 3.0000 | 999.1573 | 9.0460 | 0.0001 | 0.3711 | 0.2939 |
| disease | 415.8730 | 2.0000 | 207.9365 | 1.8826 | 0.1637 | 0.0757 | 0.0295 |
| drug:disease | 707.2663 | 6.0000 | 117.8777 | 1.0672 | 0.3958 | 0.1222 | 0.0069 |
| Residual | 5,080.8167 | 46 | 110.4525 | | | | |
| Total | 9,340.1552 | 57 | 163.8624 | | | | |
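The F values and effect sizes in this table can be checked by hand from the sums of squares and degrees of freedom. The sketch below copies those inputs from the table and applies the standard formulas (partial eta-squared and partial omega-squared for the factor terms, and \(R^2\), adjusted \(R^2\), and omega-squared for the model). The variable and function names are illustrative only, and the formulas are the usual textbook forms rather than code taken from researchpy.

```python
# Inputs copied from the ANOVA table (already rounded to 4 decimals there).
N = 58
SS_TOTAL = 9340.1552
SS_RESID, DF_RESID = 5080.8167, 46
SS_MODEL, DF_MODEL = 4259.3385, 11
MS_RESID = SS_RESID / DF_RESID

FACTORS = {
    "drug": (2997.4719, 3),
    "disease": (415.8730, 2),
    "drug:disease": (707.2663, 6),
}


def factor_row(ss_effect, df_effect):
    """Return the F value and partial effect sizes for one factor term."""
    ms_effect = ss_effect / df_effect
    f_value = ms_effect / MS_RESID
    # Partial eta-squared: SS_effect / (SS_effect + SS_residual)
    eta_p = ss_effect / (ss_effect + SS_RESID)
    # Partial omega-squared:
    # df * (MS_effect - MS_resid) / (df * MS_effect + (N - df) * MS_resid)
    omega_p = (df_effect * (ms_effect - MS_RESID)) / (
        df_effect * ms_effect + (N - df_effect) * MS_RESID
    )
    return {"F": f_value, "eta_p": eta_p, "omega_p": omega_p}


partial = {name: factor_row(ss, df) for name, (ss, df) in FACTORS.items()}

# Model-level measures.
r_squared = SS_MODEL / SS_TOTAL
adj_r_squared = 1 - (1 - r_squared) * (N - 1) / (N - DF_MODEL - 1)
omega_model = (SS_MODEL - DF_MODEL * MS_RESID) / (SS_TOTAL + MS_RESID)

for name, row in partial.items():
    print(name, {k: round(v, 4) for k, v in row.items()})
print("model", round(r_squared, 4), round(adj_r_squared, 4), round(omega_model, 4))
```

Because the inputs are themselves rounded, agreement with the table should be expected only to about four decimal places.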

If it’s of interest, one can also access the underlying regression table.

```
m.regression_table()
```

| systolic | Coef. | Std. Err. | t | p-value | 95% Conf. Interval |
|---|---|---|---|---|---|
| Intercept | 29.3333 | 4.2905 | 6.8367 | 0.0000 | [20.6969, 37.9697] |
| drug | | | | | |
| 1 | (reference) | | | | |
| 2 | -1.3333 | 6.3639 | -0.2095 | 0.8350 | [-14.1432, 11.4765] |
| 3 | -13.0000 | 7.4314 | -1.7493 | 0.0869 | [-27.9587, 1.9587] |
| 4 | -15.7333 | 6.3639 | -2.4723 | 0.0172 | [-28.5432, -2.9235] |
| disease | | | | | |
| 1 | (reference) | | | | |
| 2 | -1.0833 | 6.7839 | -0.1597 | 0.8738 | [-14.7387, 12.572] |
| 3 | -8.9333 | 6.3639 | -1.4038 | 0.1671 | [-21.7432, 3.8765] |
| drug:disease | | | | | |
| 2:2 | 6.5833 | 9.7839 | 0.6729 | 0.5044 | [-13.1107, 26.2774] |
| 2:3 | -0.9000 | 8.9999 | -0.1000 | 0.9208 | [-19.0159, 17.2159] |
| 3:2 | -10.8500 | 10.2435 | -1.0592 | 0.2950 | [-31.4692, 9.7692] |
| 3:3 | 1.1000 | 10.2435 | 0.1074 | 0.9150 | [-19.5192, 21.7192] |
| 4:2 | 0.3167 | 9.3017 | 0.0340 | 0.9730 | [-18.4066, 19.04] |
| 4:3 | 9.5333 | 9.2022 | 1.0360 | 0.3056 | [-8.9897, 28.0564] |
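One way to read this table: with reference (treatment) coding, the intercept is the predicted mean for the reference cell (drug 1, disease 1), and any other cell mean is the intercept plus the relevant main-effect and interaction coefficients. The sketch below does that arithmetic with coefficients copied from the table. It assumes the interaction labels follow the term order drug:disease, so "4:3" means drug 4 with disease 3; the `cell_mean` helper is hypothetical and not a researchpy function.

```python
# Coefficients copied from the regression table; reference levels are 0.
INTERCEPT = 29.3333
DRUG = {1: 0.0, 2: -1.3333, 3: -13.0000, 4: -15.7333}
DISEASE = {1: 0.0, 2: -1.0833, 3: -8.9333}
# Keys are (drug, disease); cells involving a reference level have no term.
INTERACTION = {
    (2, 2): 6.5833, (2, 3): -0.9000,
    (3, 2): -10.8500, (3, 3): 1.1000,
    (4, 2): 0.3167, (4, 3): 9.5333,
}


def cell_mean(drug, disease):
    """Predicted systolic value for one drug-by-disease cell."""
    return (
        INTERCEPT
        + DRUG[drug]
        + DISEASE[disease]
        + INTERACTION.get((drug, disease), 0.0)
    )


for drug in DRUG:
    row = [round(cell_mean(drug, disease), 4) for disease in DISEASE]
    print(f"drug {drug}: {row}")

# The t column is the coefficient divided by its standard error,
# shown here for the drug 4 row.
t_drug4 = DRUG[4] / 6.3639
print(round(t_drug4, 4))
```

Since the model includes every main effect and the full interaction, these predicted values coincide with the observed cell means, up to rounding in the printed coefficients.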

## References

- [1] Stephen Olejnik and James Algina. Measures of effect size for comparative studies: applications, interpretations, and limitations. *Contemporary Educational Psychology*, 25:241–286, 2000. 10.1006/ceps.2000.1040.
- [2] Ken Kelley and Kristopher Preacher. On effect size. *Psychological Methods*, 17(2):137–152, 2012. 10.1037/a0028086.
- [3] R. J. Grissom and J. J. Kim. *Effect Sizes for Research: Univariate and Multivariate Applications*. Routledge, second edition, 2012. ISBN 978-0-415-87769-5.