*************
summarize()
*************

Description
===========

Calculates univariate descriptive statistics and returns them as either a
Pandas DataFrame object (the default) or a Python dictionary object. The
following calculations are available:

* count of non-missing values (N),
* the average (Mean),
* the median (Median),
* variance (Variance),
* standard deviation (SD),
* standard error (SE),
* confidence interval (CI),
* minimum value (Min),
* maximum value (Max),
* range (Range),
* the kurtosis (Kurtosis), and
* the skew (Skew).

If no univariate descriptive statistics are specified, the N, Mean, Median,
Variance, SD, SE, and 95% confidence interval are calculated by default.

Parameters
==========

Input
-----

**summarize(data = {}, name = None, stats = [], ci_level = 0.95, decimals = 4, return_type = "Dataframe")**

* **data** : The Pandas DataFrame or array_like object which contains the data to be analyzed.
* **stats** : The univariate statistics to be calculated, entered as a list; the default is ["N", "Mean", "Median", "Variance", "SD", "SE", "CI"].
* **ci_level** : The confidence level of the interval to be calculated; the default is 0.95, i.e., a 95% confidence interval.
* **decimals** : The number of decimal places used for rounding; the default is 4.
* **return_type** : The data structure to be returned; available options are "Dataframe" or "Dictionary", with the default being "Dataframe".

Returns
-------

Pandas DataFrame or Python dictionary object containing the univariate
descriptive statistics.

Examples
========

Loading Packages and Data
-------------------------

First, load the libraries required for this example. The example data set is
then loaded using statsmodels.datasets; it is the 'auto' data set available
through Stata.

.. code:: python

    import researchpy as rp
    import pandas as pd

    # Used to load example data
    import statsmodels.datasets

    auto = statsmodels.datasets.webuse('auto')

    auto.info()

.. parsed-literal::

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 74 entries, 0 to 73
    Data columns (total 12 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   make          74 non-null     object
     1   price         74 non-null     int16
     2   mpg           74 non-null     int16
     3   rep78         69 non-null     float64
     4   headroom      74 non-null     float32
     5   trunk         74 non-null     int16
     6   weight        74 non-null     int16
     7   length        74 non-null     int16
     8   turn          74 non-null     int16
     9   displacement  74 non-null     int16
     10  gear_ratio    74 non-null     float32
     11  foreign       74 non-null     category
    dtypes: category(1), float32(2), float64(1), int16(7), object(1)
    memory usage: 3.5+ KB

Single Variable
---------------

The first demonstration shows how to get descriptive statistics for a single
variable.

.. code:: python

    rp.summarize(auto.price)

.. raw:: html

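
    <table>
      <thead>
        <tr><th>Name</th><th>N</th><th>Mean</th><th>Median</th><th>Variance</th><th>SD</th><th>SE</th><th>95% Conf. Interval</th></tr>
      </thead>
      <tbody>
        <tr><td>price</td><td>74</td><td>6,165.2568</td><td>5,006.5000</td><td>8,699,525.9743</td><td>2,949.4959</td><td>342.8719</td><td>[5481.914, 6848.5995]</td></tr>
      </tbody>
    </table>
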
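The SD, SE, and confidence interval columns are related, and the relationship can be checked by hand. The sketch below uses only the values printed in the price row above and assumes, without confirmation from the library's source, that SE is computed as SD / sqrt(N) and that the interval is symmetric around the mean. The implied multiplier comes out slightly above the normal-distribution value of 1.96, which is what a Student's t interval with N - 1 degrees of freedom would produce.

```python
import math
from statistics import NormalDist

# Values copied from the price row of the output above.
n, mean, sd = 74, 6165.2568, 2949.4959
ci_low, ci_high = 5481.914, 6848.5995

# Assumption: standard error of the mean is SD / sqrt(N).
se = sd / math.sqrt(n)

# Assumption: the interval is mean +/- multiplier * SE.
half_width = (ci_high - ci_low) / 2
midpoint = (ci_high + ci_low) / 2
multiplier = half_width / se

# Two-sided 95% critical value of the standard normal, for comparison.
z_95 = NormalDist().inv_cdf(0.975)

print("SE:", round(se, 4))
print("Interval midpoint:", round(midpoint, 4))
print("Implied multiplier:", round(multiplier, 3))
print("Normal critical value:", round(z_95, 3))
```

If the recomputed SE and midpoint agree with the SE and Mean columns to within rounding, the assumptions hold for this row.
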
Two Variables
-------------

Now let's get information from two variables at the same time.

.. code:: python

    rp.summarize(auto[["price", "mpg"]])

.. raw:: html

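
    <table>
      <thead>
        <tr><th>Name</th><th>N</th><th>Mean</th><th>Median</th><th>Variance</th><th>SD</th><th>SE</th><th>95% Conf. Interval</th></tr>
      </thead>
      <tbody>
        <tr><td>price</td><td>74</td><td>6,165.2568</td><td>5,006.5000</td><td>8,699,525.9743</td><td>2,949.4959</td><td>342.8719</td><td>[5481.914, 6848.5995]</td></tr>
        <tr><td>mpg</td><td>74</td><td>21.2973</td><td>20.0000</td><td>33.4720</td><td>5.7855</td><td>0.6726</td><td>[19.9569, 22.6377]</td></tr>
      </tbody>
    </table>
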
Pandas Groupby Objects
----------------------

This method also supports calculations for Pandas Series and Pandas DataFrame
Groupby objects.

Pandas Series Groupby Object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    rp.summarize(auto.groupby("foreign")["price"])

.. raw:: html

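
    <table>
      <thead>
        <tr><th>foreign</th><th>N</th><th>Mean</th><th>Median</th><th>Variance</th><th>SD</th><th>SE</th><th>95% Conf. Interval</th></tr>
      </thead>
      <tbody>
        <tr><td>Domestic</td><td>52</td><td>6,072.4231</td><td>4,782.5000</td><td>9,592,054.9155</td><td>3,097.1043</td><td>429.4911</td><td>[5210.1837, 6934.6624]</td></tr>
        <tr><td>Foreign</td><td>22</td><td>6,384.6818</td><td>5,759.0000</td><td>6,874,438.7035</td><td>2,621.9151</td><td>558.9942</td><td>[5222.1898, 7547.1738]</td></tr>
      </tbody>
    </table>
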
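To make the grouped output easier to read, here is a minimal standard-library sketch of what a per-group summary computes conceptually. It is not researchpy's implementation. The toy ``rows`` data are hypothetical, and the sketch assumes the sample (n - 1) formulas for variance and standard deviation.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev, variance

# Hypothetical toy data as (group, value) pairs; not taken from the auto data set.
rows = [("A", 10.0), ("A", 12.0), ("A", 14.0), ("B", 20.0), ("B", 26.0)]

# Collect the values belonging to each group.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# Compute one row of descriptive statistics per group.
summary = {}
for key, values in groups.items():
    n = len(values)
    sd = stdev(values)  # sample SD (n - 1 denominator), an assumption
    summary[key] = {
        "N": n,
        "Mean": mean(values),
        "Median": median(values),
        "Variance": variance(values),  # sample variance (n - 1 denominator)
        "SD": sd,
        "SE": sd / math.sqrt(n),
    }

for key, stats in summary.items():
    print(key, stats)
```

Each key of ``summary`` corresponds to one row of the grouped table, and each inner key to one column.
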
Pandas DataFrame Groupby Object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

    rp.summarize(auto.groupby(["foreign"])[["price", "mpg"]])

.. raw:: html

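
    <table>
      <thead>
        <tr><th rowspan="2">foreign</th><th colspan="7">price</th><th colspan="7">mpg</th></tr>
        <tr><th>N</th><th>Mean</th><th>Median</th><th>Variance</th><th>SD</th><th>SE</th><th>95% Conf. Interval</th><th>N</th><th>Mean</th><th>Median</th><th>Variance</th><th>SD</th><th>SE</th><th>95% Conf. Interval</th></tr>
      </thead>
      <tbody>
        <tr><td>Domestic</td><td>52</td><td>6,072.4231</td><td>4,782.5000</td><td>9,592,054.9155</td><td>3,097.1043</td><td>429.4911</td><td>[5210.1837, 6934.6624]</td><td>52</td><td>19.8269</td><td>19.0000</td><td>22.4989</td><td>4.7433</td><td>0.6578</td><td>[18.5064, 21.1475]</td></tr>
        <tr><td>Foreign</td><td>22</td><td>6,384.6818</td><td>5,759.0000</td><td>6,874,438.7035</td><td>2,621.9151</td><td>558.9942</td><td>[5222.1898, 7547.1738]</td><td>22</td><td>24.7727</td><td>24.5000</td><td>43.7078</td><td>6.6112</td><td>1.4095</td><td>[21.8415, 27.704]</td></tr>
      </tbody>
    </table>
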
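A quick way to sanity-check grouped results is to confirm that they are consistent with the ungrouped ones: the group counts should add up to the overall N, and the N-weighted average of the group means should reproduce the overall mean. The sketch below does this arithmetic with values copied from the tables on this page; the variable names are illustrative only.

```python
# Values copied from the grouped and ungrouped tables above.
groups = {
    "Domestic": {"N": 52, "price": 6072.4231, "mpg": 19.8269},
    "Foreign": {"N": 22, "price": 6384.6818, "mpg": 24.7727},
}
overall = {"N": 74, "price": 6165.2568, "mpg": 21.2973}

# The group counts should sum to the overall count.
total_n = sum(g["N"] for g in groups.values())

# The N-weighted average of the group means should match the overall mean.
pooled = {}
for var in ("price", "mpg"):
    weighted_sum = sum(g["N"] * g[var] for g in groups.values())
    pooled[var] = weighted_sum / total_n

print("Total N:", total_n, "expected:", overall["N"])
for var, value in pooled.items():
    print(var, "pooled mean:", round(value, 4), "reported:", overall[var])
```

Small differences in the last decimal place are expected, because the inputs are already rounded to four decimals.
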