# summarize()¶

## Description¶

Calculates univariate descriptive statistics and returns the information as either a Pandas DataFrame object (the default) or Python dictionary object.

The following calculations are available:

• count of non-missing (N),

• the average (Mean),

• the median (Median),

• variance (Variance),

• standard deviation (SD),

• standard error (SE),

• confidence interval (CI),

• minimum value (Min),

• maximum value (Max),

• range (Range),

• the kurtosis (Kurtosis), and

• the skew (Skew).

If no univariate descriptive statistics are specified, by default the N, Mean, Median, Variance, SD, SE, and 95% confidence interval are calculated.

## Parameters¶

### Input¶

summarize(data = {}, name = None, stats = [], ci_level = 0.95, decimals = 4, return_type = “Dataframe”)

• data : The Pandas DataFrame or array_like object which contains the data to be analyzed.

• stats : The univariate statistics to be calculated entered as a list; the default is [“N”, “Mean”, “Median”, “Variance”, “SD”, “SE”, “CI”].

• ci_level : The confidence interval to be calculated; the default is 0.95, i.e., 95% confidence interval.

• decimals : The rounding to be applied to the data; the default is 4.

• return_type : The data structure to be returned; available options are “Dataframe” or “Dictionary” with the default being “Dataframe”.

### Returns¶

Pandas DataFrame or Python dictionary object containing the univariate descriptive statistics.

## Examples¶

First to load required libraries for this example. Below, an example data set will be loaded in using statsmodels.datasets; the data loaded in is a data set available through Stata called ‘auto’.

```import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets

auto = statsmodels.datasets.webuse('auto')
auto.info()
```
```<class 'pandas.core.frame.DataFrame'>
Int64Index: 74 entries, 0 to 73
Data columns (total 12 columns):
#   Column        Non-Null Count  Dtype
---  ------        --------------  -----
0   make          74 non-null     object
1   price         74 non-null     int16
2   mpg           74 non-null     int16
3   rep78         69 non-null     float64
4   headroom      74 non-null     float32
5   trunk         74 non-null     int16
6   weight        74 non-null     int16
7   length        74 non-null     int16
8   turn          74 non-null     int16
9   displacement  74 non-null     int16
10  gear_ratio    74 non-null     float32
11  foreign       74 non-null     category
dtypes: category(1), float32(2), float64(1), int16(7), object(1)
memory usage: 3.5+ KB
```

### Single Variable¶

First demonstration will show how to get descriptive statistics for a single variable.

```rp.summarize(auto.price)
```
Name N Mean Median Variance SD SE 95% Conf. Interval
price 74 6,165.2568 5,006.5000 8,699,525.9743 2,949.4959 342.8719 [5481.914, 6848.5995]

### Two Variables¶

Now let’s get information from 2 variables at the same time.

```rp.summarize(auto[["price", "mpg"]])
```
Name N Mean Median Variance SD SE 95% Conf. Interval
price 74 6,165.2568 5,006.5000 8,699,525.9743 2,949.4959 342.8719 [5481.914, 6848.5995]
mpg 74 21.2973 20.0000 33.4720 5.7855 0.6726 [19.9569, 22.6377]

### Pandas Groupby Objects¶

This method also supports calculations for Pandas Series and Pandas DataFrame Groupby objects.

#### Pandas Series Groupby Object¶

```rp.summarize(auto.groupby("foreign")["price"])
```
foreign N Mean Median Variance SD SE 95% Conf. Interval
Domestic 52 6,072.4231 4,782.5000 9,592,054.9155 3,097.1043 429.4911 [5210.1837, 6934.6624]
Foreign 22 6,384.6818 5,759.0000 6,874,438.7035 2,621.9151 558.9942 [5222.1898, 7547.1738]

#### Pandas Dataframe Groupby Object¶

```rp.summarize(auto.groupby(["foreign"])[["price", "mpg"]])
```
foreign price mpg
N Mean Median Variance SD SE 95% Conf. Interval N Mean Median Variance SD SE 95% Conf. Interval
Domestic 52 6,072.4231 4,782.5000 9,592,054.9155 3,097.1043 429.4911 [5210.1837, 6934.6624] 52 19.8269 19.0000 22.4989 4.7433 0.6578 [18.5064, 21.1475]
Foreign 22 6,384.6818 5,759.0000 6,874,438.7035 2,621.9151 558.9942 [5222.1898, 7547.1738] 22 24.7727 24.5000 43.7078 6.6112 1.4095 [21.8415, 27.704]