# predict()¶

## Description¶

Calculates linear predictions, various residuals, leverage statistics, distance statistics, and others. Included in classes which support this method, it’s also available as separate method using -predict()-.

### Classes where predict included¶

ols

anova

## Parameters¶

### Input¶

**predict(mdl_data = {}, estimate = None)**

mdl_data: Class object from ols or anova

estimate: Desired estimate. Available options are:

“y”or“xb”: Linear prediction

“residuals”,“res”, or“r”: Residuals

“standardized_residuals”,“standardized_r”, or“rstand”: Standardized residuals

“studentized_residuals”,“student_r”, or“rstud”: Studentized (jackknifed) residuals

“leverage”,“lev”: Leverage of the observation (diagonal of the H matrix)

### Returns¶

Numpy array with n x 1 dimensions; where n is the number of observations

## Formulas¶

Note

Y is the dependent variable array/vector

X is the independent variable design matrix

H is the hat matrix

\(^T\) indicates a transpose, i.e. \(X^T\) is the transpose of X

### Linear prediction 1¶

The linear prediction is calculated as:

### Residuals 1¶

Residuals are calculated as:

### Standardized residuals 1¶

Standardized residuals are calculated using the following formula:

### Studentized (jackknifed) residuals 1¶

Studentized (jackknifed) residuals are calculated using the following formula:

where:

n = number of observations

k = number of predictors

r = standardized residual

### Leverage 1¶

Calculate the leverage of the observation; the leverage is the diagonal of the H (hat) matrix. The H matrix is calculated using:

## Examples¶

First to load required libraries for this example. Below, an example data set will be loaded in using statsmodels.datasets; the data loaded in is a data set available through Stata called ‘systolic’.

```
import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets
systolic = statsmodels.datasets.webuse('systolic')
```

Now let’s get some quick information regarding the data set.

```
systolic.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 58 entries, 0 to 57
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 drug 58 non-null int16
1 disease 58 non-null int16
2 systolic 58 non-null int16
```

Now to take a look at the descriptive statistics of the univariate data. The output indicates that there are no missing observations and that each variable is stored as an integer.

```
rp.summarize(systolic["systolic"])
```

Name | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval | |
---|---|---|---|---|---|---|---|---|

0 | systolic | 58 | 18.8793 | 21 | 163.862 | 12.8009 | 1.6808 | [15.5135, 22.2451] |

```
rp.crosstab(systolic["disease"], systolic["drug"])
```

Variable | Outcome | Count | Percent | |
---|---|---|---|---|

0 | drug | 4 | 16 | 27.59 |

1 | 2 | 15 | 25.86 | |

2 | 1 | 15 | 25.86 | |

3 | 3 | 12 | 20.69 | |

4 | disease | 3 | 20 | 34.48 |

5 | 2 | 19 | 32.76 | |

6 | 1 | 19 | 32.76 |

```
m = anova("systolic ~ C(drug) + C(disease) + C(drug):C(disease)", data = systolic, sum_of_squares = 3)
desc, table = m.results()
print(desc, table, sep = "\n"*2)
```

Note: Effect size values for factors are partial.

Number of obs = | 58.0000 |
---|---|

Root MSE = | 10.5096 |

R-squared = | 0.4560 |

Adj R-squared = | 0.3259 |

Source | Sum of Squares | Degrees of Freedom | Mean Squares | F value | p-value | Eta squared | Omega squared |
---|---|---|---|---|---|---|---|

Model | 4,259.3385 | 11 | 387.2126 | 3.5057 | 0.0013 | 0.4560 | 0.3221 |

drug | 2,997.4719 | 3.0000 | 999.1573 | 9.0460 | 0.0001 | 0.3711 | 0.2939 |

disease | 415.8730 | 2.0000 | 207.9365 | 1.8826 | 0.1637 | 0.0757 | 0.0295 |

drug:disease | 707.2663 | 6.0000 | 117.8777 | 1.0672 | 0.3958 | 0.1222 | 0.0069 |

Residual | 5,080.8167 | 46 | 110.4525 | ||||

Total | 9,340.1552 | 57 | 163.8624 |

```
m.predict(estimate="r")[:10]
```

```
array([[ 12.6667],
[ 14.6667],
[ 6.6667],
[-16.3333],
[-10.3333],
[ -7.3333],
[ 4.75 ],
[ -2.25 ],
[ 4.75 ],
[ -7.25 ]])
```