summary_cat()
=============
Returns a Pandas DataFrame containing the count and percentage of each
category. Missing data coded as numpy.nan are excluded from the counts.
However, missing data coded as a string are included as their own category.
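The difference between the two kinds of missing data can be sketched with
plain Pandas. This only illustrates the behaviour described above and is not
researchpy's implementation; the helper name count_and_percent and the toy
data are assumptions made for the example.

.. code:: python

    import numpy
    import pandas


    def count_and_percent(series):
        # Hypothetical helper, not part of researchpy.
        # value_counts() drops numpy.nan by default, so values coded as
        # numpy.nan never enter the counts or the percentage denominator.
        counts = series.value_counts()
        percents = (counts / counts.sum() * 100).round(2)
        return pandas.DataFrame({"Count": counts, "Percent": percents})


    # Missing value coded as numpy.nan: excluded, so percentages are
    # computed over the 3 remaining observations
    with_nan = count_and_percent(pandas.Series([0, 1, 1, numpy.nan]))

    # Missing value coded as a string: counted as its own category, so
    # percentages are computed over all 4 observations
    with_string = count_and_percent(pandas.Series([0, 1, 1, ""]))

In this sketch, with_nan has two rows and with_string has three, because the
empty string is treated like any other category.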
Arguments
---------
**summary_cat(group1, ascending= False)**

* **group1**, a Pandas Series, or a Pandas DataFrame with multiple columns selected
* **ascending**, whether to sort the categories of each variable by count in ascending order. The default, False, sorts in descending order.

**returns**

* Pandas DataFrame
Examples
--------
.. code:: python

    import numpy, pandas, researchpy

    numpy.random.seed(123)

    df = pandas.DataFrame(numpy.random.randint(2, size= (101, 2)),
                          columns= ['disease', 'treatment'])

.. code:: python

    # Handles a single Pandas Series
    researchpy.summary_cat(df['disease'])
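
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>0</td><td>53</td><td>52.48</td></tr>
        <tr><th>1</th><td></td><td>1</td><td>48</td><td>47.52</td></tr>
      </tbody>
    </table>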

.. code:: python

    # Can handle multiple Series, although the output is not pretty
    researchpy.summary_cat(df[['disease', 'treatment']])
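
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>0</td><td>53</td><td>52.48</td></tr>
        <tr><th>1</th><td></td><td>1</td><td>48</td><td>47.52</td></tr>
        <tr><th>2</th><td>treatment</td><td>1</td><td>52</td><td>51.49</td></tr>
        <tr><th>3</th><td></td><td>0</td><td>49</td><td>48.51</td></tr>
      </tbody>
    </table>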
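
.. code:: python

    # If missing is coded as a string, it will show up as its own category.
    # Cast to object so the integer column can hold a string, and assign
    # with .loc to avoid chained assignment.
    df['disease'] = df['disease'].astype(object)
    df.loc[0, 'disease'] = ""

    researchpy.summary_cat(df['disease'])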
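
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>0</td><td>52</td><td>51.49</td></tr>
        <tr><th>1</th><td></td><td>1</td><td>48</td><td>47.52</td></tr>
        <tr><th>2</th><td></td><td></td><td>1</td><td>0.99</td></tr>
      </tbody>
    </table>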

.. code:: python

    # However, if missing is coded as numpy.nan, it will be excluded from the counts
    df.loc[0, 'disease'] = numpy.nan

    researchpy.summary_cat(df['disease'])
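
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>0</td><td>52</td><td>52.0</td></tr>
        <tr><th>1</th><td></td><td>1</td><td>48</td><td>48.0</td></tr>
      </tbody>
    </table>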
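
If string-coded missing values should be excluded as well, one option is to
recode them to numpy.nan before summarizing. The sketch below uses only
Pandas; the toy column and the choice of an empty string as the missing code
are assumptions made for the example.

.. code:: python

    import numpy
    import pandas

    # Toy column where one missing value was coded as an empty string
    disease = pandas.Series([0, 1, "", 1, 0], name="disease")

    # mask() replaces entries where the condition is True, here with numpy.nan
    recoded = disease.mask(disease == "", numpy.nan)

    # The recoded value now counts as missing and is dropped by value_counts()
    counts = recoded.value_counts()

The recoded Series could then be passed to summary_cat() in place of the
original column.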

.. code:: python

    # Results can easily be exported using many methods including the default
    # Pandas exporting methods
    results = researchpy.summary_cat(df['disease'])

    results.to_csv("summary_cats.csv", index= False)
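
Because the result is an ordinary Pandas DataFrame, other built-in exporters
work the same way. The sketch below uses a hand-built stand-in for the
returned DataFrame (an assumption, so that it runs without researchpy) and
writes to memory rather than to disk.

.. code:: python

    import io
    import pandas

    # Stand-in for the DataFrame returned by summary_cat(); the column
    # names follow the output tables shown in this section
    results = pandas.DataFrame({
        "Variable": ["disease", ""],
        "Outcome": [0, 1],
        "Count": [52, 48],
        "Percent": [52.0, 48.0],
    })

    # Write CSV to an in-memory buffer instead of a file
    buffer = io.StringIO()
    results.to_csv(buffer, index=False)
    csv_text = buffer.getvalue()

    # Other Pandas exporters that return a string when no path is given
    json_text = results.to_json(orient="records")
    html_text = results.to_html(index=False)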

.. code:: python

    # This is the default, shown for comparison with the example immediately below
    researchpy.summary_cat(df['disease'], ascending= False)
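
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>0</td><td>52</td><td>52.0</td></tr>
        <tr><th>1</th><td></td><td>1</td><td>48</td><td>48.0</td></tr>
      </tbody>
    </table>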

.. code:: python

    researchpy.summary_cat(df['disease'], ascending= True)
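
.. raw:: html

    <table>
      <thead>
        <tr><th></th><th>Variable</th><th>Outcome</th><th>Count</th><th>Percent</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>disease</td><td>1</td><td>48</td><td>48.0</td></tr>
        <tr><th>1</th><td></td><td>0</td><td>52</td><td>52.0</td></tr>
      </tbody>
    </table>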
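
The ordering in the two tables above matches what Pandas' own value_counts()
produces with its ascending parameter. The sketch below reproduces the same
counts with a toy Series; it is an illustration under that assumption and
does not show how researchpy sorts internally.

.. code:: python

    import pandas

    # Toy data with the same counts as the tables above: 52 zeros, 48 ones
    outcomes = pandas.Series([0] * 52 + [1] * 48, name="disease")

    # Largest count first, as with ascending= False
    descending = outcomes.value_counts(ascending=False)

    # Smallest count first, as with ascending= True
    ascending = outcomes.value_counts(ascending=True)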