signrank()

Description

Conducts the Wilcoxon signed-rank test for paired-sample data. Data can be supplied either through the formula_like argument or by passing two array-like objects; both approaches are demonstrated in the examples. The results can be returned as Pandas DataFrame objects (default) or as Python dictionary objects.

The data are passed to signrank, and the conduct method is then called on the resulting object. This method returns a tuple of three data objects.

Parameters

Input

signrank(formula_like = None, data = {}, group1 = None, group2 = None, zero_method = "pratt", correction = False, mode = "auto")

  • formula_like : A valid formula which will parse the data into a design matrix.

  • data : The dataframe which contains the data to be analyzed; required if using formula_like.

  • group1 : An array-like object containing the first measurement of each pair.

  • group2 : An array-like object containing the second measurement of each pair.

  • zero_method : How to handle the zero-differences in the ranking process. Available options are (see scipy.stats.wilcoxon):

    • “pratt” : Includes zero-differences in the ranking process, but drops the ranks of the zeros (default).

    • “wilcox” : Discards all zero-differences.

  • correction : Boolean value indicating if the continuity correction should be applied; see scipy.stats.wilcoxon for more information.

  • mode : Method used to calculate the p-value; see scipy.stats.wilcoxon for more information. Options are:

    • “auto” : Use the exact distribution if there are no more than 25 observations and no ties, otherwise a normal approximation will be used (default).

    • “exact” : Use the exact distribution; this can be used if there are no more than 25 observations and no ties.

    • “approx” : Use a normal approximation.
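
To make the two zero_method options concrete, the ranking step can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name signed_rank_sums is hypothetical, and the input is just the first five pairs of the fuel data displayed in the Examples section.

```python
def signed_rank_sums(group1, group2, zero_method="pratt"):
    """Return (positive rank sum, negative rank sum) of the paired differences.

    Hypothetical helper for illustration; not part of researchpy or scipy.
    """
    diffs = [a - b for a, b in zip(group1, group2)]
    if zero_method == "wilcox":
        # Discard zero-differences before ranking.
        diffs = [d for d in diffs if d != 0]
    elif zero_method != "pratt":
        raise ValueError("zero_method must be 'pratt' or 'wilcox'")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    # Under "pratt" the zeros were ranked above, but their ranks are dropped here.
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


mpg1 = [20, 23, 21, 25, 18]
mpg2 = [24, 25, 21, 22, 23]
print(signed_rank_sums(mpg1, mpg2, zero_method="pratt"))   # (3.0, 11.0)
print(signed_rank_sums(mpg1, mpg2, zero_method="wilcox"))  # (2.0, 8.0)
```

With "pratt" the single zero-difference occupies rank 1, so every non-zero difference is ranked one position higher than under "wilcox"; this is why the two options can give different rank sums for the same data.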

Returns

Returns an object with class “signrank”; this object has an accessible method which is described below.

signrank methods

  • conduct(return_type = "Dataframe", effect_size = [])

    • return_type : The type of data structure the results should be returned as. Supported options are ‘Dataframe’ which will return a Pandas DataFrame or ‘Dictionary’ which will return a dictionary.

    • effect_size : A list object which indicates which effect size measures should be calculated. Available options are:

      • pd : Calculates the Rank-Biserial r coefficient.

      • pearson : Calculates the Pearson r coefficient.

    The conduct method returns a tuple of three objects. The first object provides descriptive information regarding the ranks, the second object contains the variance adjustment information, and the third object contains the test results.

Effect size measures formulas

By default, no effect size measures are calculated. The Rank-Biserial r calculation is from Kerby (2012) [1], while the Pearson r calculation is from Fritz, Morris, and Richler (2012) [2].

Rank-Biserial r

\[\text{Rank-Biserial r = } \frac{\sum{Ranks}_{+} - \sum{Ranks}_{-}}{\sum{Ranks}_{total}}\]

Pearson r

\[\text{Pearson r = } \frac{Z}{\sqrt{N}}\]

Where N is the total number of observations included in the model.
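
As a worked illustration, the two formulas can be evaluated by hand on the figures reported in the Examples section. This is a sketch, not the library's code: the function names are hypothetical, and it assumes that the "all" rank sum (78) is the denominator of the Rank-Biserial r and that N is the number of pairs (12). The library may define either quantity differently.

```python
import math


def rank_biserial_r(sum_ranks_pos, sum_ranks_neg, sum_ranks_total):
    """Rank-Biserial r: difference of the signed rank sums over the total."""
    return (sum_ranks_pos - sum_ranks_neg) / sum_ranks_total


def pearson_r(z, n):
    """Pearson r: the z statistic divided by the square root of N."""
    return z / math.sqrt(n)


# Values copied from the example output; the total and N are assumptions.
rb = rank_biserial_r(13.5, 63.5, 78.0)
r = pearson_r(-1.9726, 12)

print(round(rb, 3))  # -0.641
print(round(r, 3))   # -0.569
```

Both measures are negative here because the negative ranks dominate, which matches the sign of the z statistic.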

Examples

Loading Packages and Data

First, load the libraries required for this example. The example data set is then loaded using statsmodels.datasets; it is a data set available through Stata called ‘fuel’.

import researchpy as rp
import pandas as pd
# Used to load example data #
import statsmodels.datasets

fuel = statsmodels.datasets.webuse('fuel')
fuel["id"] = range(1, fuel.shape[0] + 1)
fuel.head()
mpg1 mpg2 id
20.0000 24.0000 1
23.0000 25.0000 2
21.0000 21.0000 3
25.0000 22.0000 4
18.0000 23.0000 5

The data are currently in a wide structure, where the columns mpg1 and mpg2 each hold one of the two measurements for the same id. This format is supported by signrank. The long format is also supported through the formula_like approach; the data are converted to long format here so that they are ready for that demonstration.

fuel2 = pd.melt(fuel, id_vars = "id",
                value_vars = ["mpg1", "mpg2"],
                var_name = "mpg")

fuel2.head()
id mpg value
1 mpg1 20.0000
2 mpg1 23.0000
3 mpg1 21.0000
4 mpg1 25.0000
5 mpg1 18.0000

Signrank using Wide Structured Datasets

Since the test returns three data objects, this demonstration assigns each one to a variable. This is not required, but it makes the output look cleaner.

desc, var_adj, res = rp.signrank(group1 = fuel.mpg1, group2 = fuel.mpg2).conduct()

print(desc, var_adj, res, sep = "\n"*2)
sign obs sum ranks expected
positive 3 13.5000 38.5000
negative 8 63.5000 38.5000
zero 1 1.0000 1.0000
all 12 78.0000 78.0000
unadjusted variance adjustment for ties adjustment for zeros adjusted variance
162.5000 -1.6250 -0.2500 160.6250
z w pval
-1.9726 13.5000 0.0485

If one does not assign each object to a variable, the output is still readable.

rp.signrank(group1 = fuel.mpg1, group2 = fuel.mpg2).conduct()

( sign obs sum ranks expected
0 positive 3 13.5000 38.5000
1 negative 8 63.5000 38.5000
2 zero 1 1.0000 1.0000
3 all 12 78.0000 78.0000,
unadjusted variance adjustment for ties adjustment for zeros adjusted variance
0 162.5000 -1.6250 -0.2500 160.6250,
z w pval
0 -1.9726 13.5000 0.0485)

Signrank using Long Structured Datasets

desc, var_adj, res = rp.signrank("value ~ C(mpg)", fuel2).conduct()

print(desc, var_adj, res, sep = "\n"*2)
sign obs sum ranks expected
positive 3 13.5000 38.5000
negative 8 63.5000 38.5000
zero 1 1.0000 1.0000
all 12 78.0000 78.0000
unadjusted variance adjustment for ties adjustment for zeros adjusted variance
162.5000 -1.6250 -0.2500 160.6250
z w pval
-1.9726 13.5000 0.0485
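
To see how the three tables fit together, the reported figures can be recombined by hand. The sketch below assumes a normal approximation in which the expected positive rank sum and its variance are each reduced for zero-differences and ties; the numbers in the output are consistent with that, but that the library computes them exactly this way is an assumption. The tie adjustment is copied from the table because it depends on the raw tie counts.

```python
import math

n = 12            # all paired observations ("all" row)
n_zero = 1        # zero-differences ("zero" row)
w_positive = 13.5  # sum of the positive ranks (reported as w)

# Expected positive rank sum once the ranks of the zeros are dropped.
expected = n * (n + 1) / 4 - n_zero * (n_zero + 1) / 4

# Variance of the rank sum and its adjustments.
unadjusted_variance = n * (n + 1) * (2 * n + 1) / 24
zero_adjustment = -n_zero * (n_zero + 1) * (2 * n_zero + 1) / 24
tie_adjustment = -1.6250  # copied from the table; needs the raw tie counts
adjusted_variance = unadjusted_variance + tie_adjustment + zero_adjustment

z = (w_positive - expected) / math.sqrt(adjusted_variance)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

# Compare with z = -1.9726 and pval = 0.0485 in the results table.
print(f"expected = {expected}, adjusted variance = {adjusted_variance}")
print(f"z = {z:.4f}, p = {p_value:.4f}")
```

The intermediate values (38.5, 162.5, -0.25, and 160.625) match the descriptive and adjustment tables, so any single entry can be cross-checked against the others.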

References