summarize()
Description
Calculates univariate descriptive statistics and returns the information as either a Pandas DataFrame object (the default) or Python dictionary object.
The following calculations are available:
count of non-missing (N),
the average (Mean),
the median (Median),
variance (Variance),
standard deviation (SD),
standard error (SE),
confidence interval (CI),
minimum value (Min),
maximum value (Max),
range (Range),
the kurtosis (Kurtosis), and
the skew (Skew).
If no statistics are specified, the N, Mean, Median, Variance, SD, SE, and 95% confidence interval are calculated by default.
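As a rough illustration of what most of these statistics mean, the sketch below computes them with Python's standard library only. It is not researchpy's implementation: the helper name `describe` and the sample values are hypothetical, and the use of the sample (n - 1) variance is an assumption that happens to agree with the example output shown later on this page. Kurtosis, skew, and the confidence interval are left out because several conventions exist for them.

```python
import math
import statistics


def describe(values):
    """Return common sample statistics for a sequence of numbers.

    Hypothetical helper for illustration; missing values (None or NaN)
    are dropped before anything is computed.
    """
    clean = [v for v in values if v is not None and not math.isnan(v)]
    n = len(clean)
    variance = statistics.variance(clean)  # sample variance, n - 1 denominator (assumed)
    sd = math.sqrt(variance)
    return {
        "N": n,
        "Mean": statistics.fmean(clean),
        "Median": statistics.median(clean),
        "Variance": variance,
        "SD": sd,
        "SE": sd / math.sqrt(n),  # standard error of the mean
        "Min": min(clean),
        "Max": max(clean),
        "Range": max(clean) - min(clean),
    }


# Made-up sample with one missing value.
sample = [2, 4, 4, 4, 5, 5, 7, 9, float("nan")]
result = describe(sample)
print(result)
```

The missing value is excluded from N, which mirrors the "count of non-missing" definition above.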
Parameters
Input
summarize(data = {}, name = None, stats = [], ci_level = 0.95, decimals = 4, return_type = "Dataframe")
data : The Pandas DataFrame or array_like object which contains the data to be analyzed.
stats : The univariate statistics to be calculated, entered as a list; the default is ["N", "Mean", "Median", "Variance", "SD", "SE", "CI"].
ci_level : The confidence level of the interval to be calculated; the default is 0.95, i.e., a 95% confidence interval.
decimals : The number of decimal places to which the results are rounded; the default is 4.
return_type : The data structure to be returned; available options are "Dataframe" or "Dictionary", with the default being "Dataframe".
Returns
Pandas DataFrame or Python dictionary object containing the univariate descriptive statistics.
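The call below is a minimal sketch of how the non-default arguments from the signature above might be combined. The `scores` data and the `options` dictionary are hypothetical, the statistic names are taken from the list in the Description, and the import is guarded only so that the snippet still runs where researchpy is not installed. No output is shown because it depends on the installed version.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the call.
scores = pd.DataFrame({"score": [71.0, 84.5, 90.0, 66.5, 78.0]})

# Keyword arguments named as in the documented signature.
options = {
    "stats": ["N", "Mean", "Min", "Max", "Range"],  # subset of the available statistics
    "ci_level": 0.99,                               # only relevant when "CI" is requested
    "decimals": 2,                                  # round results to 2 decimal places
    "return_type": "Dictionary",                    # dictionary instead of a DataFrame
}

try:
    import researchpy as rp
except ImportError:
    rp = None  # researchpy is not available; skip the call

if rp is not None:
    summary = rp.summarize(scores["score"], **options)
    print(summary)
```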
Examples
Loading Packages and Data
First, load the libraries required for this example. The example data set is then loaded using statsmodels.datasets; it is the 'auto' data set available through Stata.
import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets
auto = statsmodels.datasets.webuse('auto')
auto.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 74 entries, 0 to 73
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 make 74 non-null object
1 price 74 non-null int16
2 mpg 74 non-null int16
3 rep78 69 non-null float64
4 headroom 74 non-null float32
5 trunk 74 non-null int16
6 weight 74 non-null int16
7 length 74 non-null int16
8 turn 74 non-null int16
9 displacement 74 non-null int16
10 gear_ratio 74 non-null float32
11 foreign 74 non-null category
dtypes: category(1), float32(2), float64(1), int16(7), object(1)
memory usage: 3.5+ KB
Single Variable
The first demonstration shows how to get descriptive statistics for a single variable.
rp.summarize(auto.price)
Name | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval |
---|---|---|---|---|---|---|---|
price | 74 | 6,165.2568 | 5,006.5000 | 8,699,525.9743 | 2,949.4959 | 342.8719 | [5481.914, 6848.5995] |
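The figures in this row can be cross-checked against each other with a few lines of arithmetic. The sketch below copies the values from the table and assumes, without confirmation from the library itself, that SD is the square root of the variance, that SE is SD divided by the square root of N, and that the interval is a Student's t interval with N - 1 degrees of freedom. The critical value 1.9930 is an approximate table value for 73 degrees of freedom; `scipy.stats.t.ppf(0.975, 73)` would give it more precisely.

```python
import math

# Values copied from the "price" row of the table above.
n = 74
mean = 6165.2568
variance = 8699525.9743
sd = 2949.4959
se = 342.8719

# Relationships between the reported statistics (assumed, see lead-in).
sd_from_variance = math.sqrt(variance)
se_from_sd = sd / math.sqrt(n)

# Approximate 97.5th percentile of Student's t with n - 1 = 73 degrees of freedom.
t_crit = 1.9930
ci_lower = mean - t_crit * se
ci_upper = mean + t_crit * se

print(round(sd_from_variance, 4), round(se_from_sd, 4))
print(round(ci_lower, 2), round(ci_upper, 2))
```

The recomputed bounds agree with the reported interval [5481.914, 6848.5995] to within rounding of the critical value, which is consistent with a t-based interval.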
Two Variables
Next, get descriptive statistics for two variables at the same time.
rp.summarize(auto[["price", "mpg"]])
Name | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval |
---|---|---|---|---|---|---|---|
price | 74 | 6,165.2568 | 5,006.5000 | 8,699,525.9743 | 2,949.4959 | 342.8719 | [5481.914, 6848.5995] |
mpg | 74 | 21.2973 | 20.0000 | 33.4720 | 5.7855 | 0.6726 | [19.9569, 22.6377] |
Pandas Groupby Objects
summarize() also accepts Pandas Series Groupby and Pandas DataFrame Groupby objects, in which case the statistics are calculated for each group.
Pandas Series Groupby Object
rp.summarize(auto.groupby("foreign")["price"])
foreign | N | Mean | Median | Variance | SD | SE | 95% Conf. Interval |
---|---|---|---|---|---|---|---|
Domestic | 52 | 6,072.4231 | 4,782.5000 | 9,592,054.9155 | 3,097.1043 | 429.4911 | [5210.1837, 6934.6624] |
Foreign | 22 | 6,384.6818 | 5,759.0000 | 6,874,438.7035 | 2,621.9151 | 558.9942 | [5222.1898, 7547.1738] |
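For readers who want to cross-check a grouped summary like the one above, most of the same columns can be approximated with plain pandas aggregation. This is only a sketch with a small made-up DataFrame (`cars`, `origin`, and the prices are hypothetical); it does not reproduce the confidence interval column and is not how researchpy computes its output.

```python
import pandas as pd

# Made-up data standing in for a grouping column and a numeric column.
cars = pd.DataFrame({
    "origin": ["Domestic", "Domestic", "Domestic", "Foreign", "Foreign"],
    "price": [4000, 5000, 6000, 7000, 9000],
})

# count ignores missing values; var, std and sem use the n - 1 denominator by default.
grouped = cars.groupby("origin")["price"].agg(
    ["count", "mean", "median", "var", "std", "sem"]
)
print(grouped)
```

Each row of `grouped` corresponds to one group, in the same layout as the N, Mean, Median, Variance, SD, and SE columns of the table above.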
Pandas DataFrame Groupby Object
rp.summarize(auto.groupby(["foreign"])[["price", "mpg"]])
foreign | price N | price Mean | price Median | price Variance | price SD | price SE | price 95% Conf. Interval | mpg N | mpg Mean | mpg Median | mpg Variance | mpg SD | mpg SE | mpg 95% Conf. Interval |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Domestic | 52 | 6,072.4231 | 4,782.5000 | 9,592,054.9155 | 3,097.1043 | 429.4911 | [5210.1837, 6934.6624] | 52 | 19.8269 | 19.0000 | 22.4989 | 4.7433 | 0.6578 | [18.5064, 21.1475] |
Foreign | 22 | 6,384.6818 | 5,759.0000 | 6,874,438.7035 | 2,621.9151 | 558.9942 | [5222.1898, 7547.1738] | 22 | 24.7727 | 24.5000 | 43.7078 | 6.6112 | 1.4095 | [21.8415, 27.704] |