summary_cont()

Returns a Pandas DataFrame that summarizes each variable: the variable name, the number of non-missing observations (N), the mean, the standard deviation, the standard error, and the confidence interval (95% by default). It accepts Pandas Series, DataFrame, and GroupBy objects.
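
As a rough illustration of how the N, Mean, SD, and SE columns relate to each other, here is a minimal standard-library sketch. The helper name `describe_values` is hypothetical and is not part of researchpy. It assumes the sample standard deviation (n - 1 denominator) and SE = SD / sqrt(N), which agrees with the example output in this section (2.749086 / sqrt(100) = 0.274909).

```python
import math
import statistics


def describe_values(values):
    """Return N, mean, SD, and SE for the non-missing entries of `values`.

    Hypothetical helper for illustration only; not researchpy's code.
    """
    # Drop missing entries (None or NaN) so N counts only observed values.
    observed = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    n = len(observed)
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)  # assumption: sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)           # standard error of the mean
    return {"N": n, "Mean": mean, "SD": sd, "SE": se}


# Two of the ten entries are missing, so N is 8.
stats = describe_values([2, 4, 4, None, 4, 5, 5, float("nan"), 7, 9])
print(stats)
```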

Arguments

summary_cont(group1, conf = 0.95, decimals = 4)

  • group1: a Pandas Series, or a DataFrame with the columns to summarize selected. GroupBy objects are also accepted (see Examples).

  • conf: the confidence level, entered as a decimal. The default, 0.95, gives a 95% confidence interval.

  • decimals: the number of decimal places to which the output table is rounded.

Returns

  • Pandas DataFrame
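
To make the `conf` and `decimals` arguments concrete, the sketch below shows how a confidence level given as a decimal maps to a two-sided critical value, and how rounding affects the reported bounds. The function `normal_interval` is hypothetical, and it assumes a normal (z) critical value. This section does not state which critical value researchpy uses, so the numbers here are not expected to reproduce the tables in the Examples.

```python
from statistics import NormalDist


def normal_interval(mean, se, conf=0.95, decimals=4):
    """Two-sided interval mean +/- z * se, rounded to `decimals` places.

    Hypothetical helper for illustration only; not researchpy's code.
    """
    if not 0 < conf < 1:
        raise ValueError("conf must be a decimal between 0 and 1, e.g. 0.95")
    # conf = 0.95 leaves 2.5% in each tail, so the upper quantile is 0.975.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return round(mean - z * se, decimals), round(mean + z * se, decimals)


# Defaults: 95% level, rounded to 4 decimal places.
low95, high95 = normal_interval(5.0, 0.5)

# A higher confidence level widens the interval; decimals=2 rounds harder.
low99, high99 = normal_interval(5.0, 0.5, conf=0.99, decimals=2)

print(low95, high95)
print(low99, high99)
```

Passing a percentage such as `conf=95` raises an error in this sketch, which mirrors the requirement that the level be entered in decimal format.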

Examples

import numpy, pandas, researchpy

numpy.random.seed(12345678)

df = pandas.DataFrame(numpy.random.randint(10, size= (100, 2)),
                  columns= ['healthy', 'non-healthy'])
# .loc slices are label-based and include both endpoints
df['tx'] = ""
df.loc[0:49, 'tx'] = "Placebo"
df.loc[50:99, 'tx'] = "Experimental"

df['dose'] = ""
df.loc[0:25, 'dose'] = "10 mg"
df.loc[26:50, 'dose'] = "25 mg"
df.loc[51:75, 'dose'] = "10 mg"
df.loc[76:99, 'dose'] = "25 mg"
# Summary statistics for a Series (single variable)
researchpy.summary_cont(df['healthy'])
   Variable      N  Mean        SD        SE  95% Conf.  Interval
0   healthy  100.0  4.59  2.749086  0.274909   4.044522  5.135478
# Summary statistics for multiple variables (DataFrame columns)
researchpy.summary_cont(df[['healthy', 'non-healthy']])
      Variable      N  Mean        SD        SE  95% Conf.  Interval
0      healthy  100.0  4.59  2.749086  0.274909   4.044522  5.135478
1  non-healthy  100.0  4.16  3.132495  0.313250   3.538445  4.781555
# Results are easy to export: assign them to a variable, which will be
# a Pandas DataFrame
results = researchpy.summary_cont(df[['healthy', 'non-healthy']])

results.to_csv("results.csv", index= False)
# This works with GroupBy objects as well
researchpy.summary_cont(df['healthy'].groupby(df['tx']))
               N  Mean        SD        SE  95% Conf.  Interval
tx
Experimental  50  4.66  2.560373  0.362091   3.943096  5.376904
Placebo       50  4.52  2.950199  0.417221   3.693944  5.346056
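# It also works with a GroupBy object that has a hierarchical index.
# Select several columns with a list (double brackets); tuple-style
# selection raises an error in pandas 2.0 and later.
researchpy.summary_cont(df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']])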
                    healthy                                                   non-healthy
                    count      mean       std       sem  95% Conf.  Interval  count      mean       std       sem  95% Conf.  Interval
tx           dose
Experimental 10 mg     25  4.360000  2.514624  0.502925   3.374267  5.345733     25  4.160000  3.197395  0.639479   2.906621  5.413379
             25 mg     25  4.960000  2.621704  0.524341   3.932292  5.987708     25  4.240000  3.205204  0.641041   2.983560  5.496440
Placebo      10 mg     26  4.115385  2.984318  0.585273   2.968250  5.262520     26  3.961538  3.143002  0.616393   2.753407  5.169670
             25 mg     24  4.958333  2.911434  0.594294   3.793517  6.123150     24  4.291667  3.168859  0.646841   3.023859  5.559474
# The layout above is the default. To stack the variables above/below
# each other for easier comparison, use .apply()

df.groupby(['tx', 'dose'])[['healthy', 'non-healthy']].apply(researchpy.summary_cont)
                         Variable     N      Mean        SD        SE  95% Conf.  Interval
tx           dose
Experimental 10 mg 0      healthy  25.0  4.360000  2.514624  0.502925   3.322014  5.397986
                   1  non-healthy  25.0  4.160000  3.197395  0.639479   2.840180  5.479820
             25 mg 0      healthy  25.0  4.960000  2.621704  0.524341   3.877814  6.042186
                   1  non-healthy  25.0  4.240000  3.205204  0.641041   2.916957  5.563043
Placebo      10 mg 0      healthy  26.0  4.115385  2.984318  0.585273   2.909992  5.320777
                   1  non-healthy  26.0  3.961538  3.143002  0.616393   2.692052  5.231024
             25 mg 0      healthy  24.0  4.958333  2.911434  0.594294   3.728942  6.187724
                   1  non-healthy  24.0  4.291667  3.168859  0.646841   2.953575  5.629758