*************
ols()
*************

Description
===========
Conducts linear regression using the ordinary least squares approach.

Parameters
==========

Input
-----
**ols(formula_like, data = {})**

* **formula_like** : A valid formula which is used to parse the data into a design matrix.
* **data** : The dataframe which contains the data to be analyzed.

Returns
-------
Returns an object with class "ols"; this object has accessible methods which are described below.

ols methods
^^^^^^^^^^^^^
* **results(return_type = "Dataframe", decimals = 4, pretty_format = True, conf_level = 0.95)**

  * **return_type** : The type of data structure the results should be returned as. Supported options are 'Dataframe', which returns a Pandas DataFrame, or 'Dictionary', which returns a dictionary.
  * **decimals** : The number of decimal places the data should be rounded to.
  * **pretty_format** : Whether pretty formatting should be applied. This adds extra empty spaces in the returned data structure for visualization of the results.
  * **conf_level** : The confidence level used for the reported confidence intervals.

  ``results`` returns 3 objects: (1) the summary information, (2) the model table, and (3) the regression table.

* **predict(estimate = None)**

  * **estimate** : Desired estimate. Available options are:

    * *"y"* or *"xb"* : Linear prediction
    * *"residuals"*, *"res"*, or *"r"* : Residuals
    * *"standardized_residuals"*, *"standardized_r"*, or *"r_std"* : Standardized residuals
    * *"studentized_residuals"*, *"student_r"*, or *"r_stud"* : Studentized (jackknifed) residuals
    * *"leverage"* or *"lev"* : Leverage of the observation (diagonal of the H matrix)

See :ref:`predict` for formula information.

Effect Size Measures Formulas
=============================
By default, this method will return the measures of :math:`R^2`, :math:`\text{Adj. }R^2`, :math:`\eta^2`, :math:`\epsilon^2`, and :math:`\omega^2`.
Please note that for the factor terms, the reported effect sizes are partial, i.e., :math:`\eta^2_p`, :math:`\epsilon^2_p`, and :math:`\omega^2_p` respectively.
For details, see Olejnik and Algina (2000) :footcite:p:`Olejnik&Algina2000`, Kelley and Preacher (2012) :footcite:p:`Kelly&Preacher2012`, and/or Grissom and Kim (2012) :footcite:p:`Grissom&Kim2012`.

Eta-squared (:math:`\eta^2`) and :math:`R^2`
----------------------------------------------
.. math::

    \eta^2 = \frac{\text{SS}_{model}}{\text{SS}_{total}}

Adjusted :math:`R^2`
---------------------
.. math::

    \text{Adj. }R^2 = 1 - \frac{\text{df}_{total}}{\text{df}_{error}} \times \frac{\text{SS}_{error}}{\text{SS}_{total}}

Omega-squared (:math:`\omega^2`)
-----------------------------------
.. math::

    \omega^2 = \frac{\text{SS}_{effect} - (\text{df}_{effect} \times \text{MS}_{error})}{\text{SS}_{total} + \text{MS}_{error}}

Examples
========
First, load the required libraries for this example. The example data set is loaded
using statsmodels.datasets; it is a data set available through Stata called 'systolic'.

.. code:: python

    import researchpy as rp
    import pandas as pd

    # Used to load the example data
    import statsmodels.datasets

    systolic = statsmodels.datasets.webuse('systolic')

Now let's get some quick information regarding the data set.

.. code:: python

    systolic.info()

.. parsed-literal::

    Int64Index: 58 entries, 0 to 57
    Data columns (total 3 columns):
     #   Column    Non-Null Count  Dtype
    ---  ------    --------------  -----
     0   drug      58 non-null     int16
     1   disease   58 non-null     int16
     2   systolic  58 non-null     int16

The output indicates that there are no missing observations and that each variable
is stored as an integer. Next, take a look at the descriptive statistics of the
outcome variable.

.. code:: python

    rp.summarize(systolic["systolic"])

.. parsed-literal::

       Name       N     Mean  Median  Variance       SD      SE  95% Conf. Interval
    0  systolic  58  18.8793      21   163.862  12.8009  1.6808  [15.5135, 22.2451]
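
Now for the frequencies of the categorical variables.

.. code:: python

    # summary_cat produces the Variable/Outcome/Count/Percent table shown below
    rp.summary_cat(systolic[["drug", "disease"]])

.. parsed-literal::

       Variable  Outcome  Count  Percent
    0  drug            4     16    27.59
    1                  2     15    25.86
    2                  1     15    25.86
    3                  3     12    20.69
    4  disease         3     20    34.48
    5                  2     19    32.76
    6                  1     19    32.76
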
Now fit the linear regression model; sample syntax is below.

.. code:: python

    # Main effects of drug and disease plus their interaction
    m = rp.ols("systolic ~ C(drug) + C(disease) + C(drug):C(disease)", data = systolic)

    # results() returns the summary information, the model table, and the regression table
    desc, mod, table = m.results()

    print(desc, mod, table, sep = "\n"*2)

.. parsed-literal::

    Number of obs =  58.0000
    Root MSE      =  10.5096
    R-squared     =   0.4560
    Adj R-squared =   0.3259

    Source      Sum of Squares  Degrees of Freedom  Mean Squares  F value  p-value  Eta squared  Omega squared
    Model            4259.3385                  11      387.2126   3.5057   0.0013        0.456         0.3221
    Residual         5080.8167                  46      110.4525
    Total            9340.1552                  57      163.8624

    systolic            Coef.  Std. Err.        t  p-value  95% Conf. Interval
    Intercept         29.3333     4.2905   6.8367   0.0000  [20.6969, 37.9697]
    drug
      1           (reference)
      2               -1.3333     6.3639  -0.2095   0.8350  [-14.1432, 11.4765]
      3              -13.0000     7.4314  -1.7493   0.0869  [-27.9587, 1.9587]
      4              -15.7333     6.3639  -2.4723   0.0172  [-28.5432, -2.9235]
    disease
      1           (reference)
      2               -1.0833     6.7839  -0.1597   0.8738  [-14.7387, 12.572]
      3               -8.9333     6.3639  -1.4038   0.1671  [-21.7432, 3.8765]
    drug:disease
      2:2              6.5833     9.7839   0.6729   0.5044  [-13.1107, 26.2774]
      2:3             -0.9000     8.9999  -0.1000   0.9208  [-19.0159, 17.2159]
      3:2            -10.8500    10.2435  -1.0592   0.2950  [-31.4692, 9.7692]
      3:3              1.1000    10.2435   0.1074   0.9150  [-19.5192, 21.7192]
      4:2              0.3167     9.3017   0.0340   0.9730  [-18.4066, 19.04]
      4:3              9.5333     9.2022   1.0360   0.3056  [-8.9897, 28.0564]

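
As a quick sanity check, the effect size formulas given earlier can be applied by hand to the sums of squares and degrees of freedom reported in the model table. The sketch below is illustrative only and is not part of researchpy: the variable names are made up for this example, the inputs are the rounded values copied from the output, and :math:`\text{SS}_{effect}` is taken to be :math:`\text{SS}_{model}` because the whole model is treated as the effect.

.. code:: python

    import math

    # Rounded values copied from the model table above (illustrative inputs).
    ss_model, df_model = 4259.3385, 11
    ss_error, df_error = 5080.8167, 46
    ss_total, df_total = 9340.1552, 57

    # Mean squares are sums of squares divided by their degrees of freedom.
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error

    # R-squared; identical to eta-squared for the overall model.
    r_squared = ss_model / ss_total

    # Adjusted R-squared, following the formula in this document.
    adj_r_squared = 1 - (df_total / df_error) * (ss_error / ss_total)

    # Omega-squared, with the whole model treated as the effect (assumption).
    omega_squared = (ss_model - df_model * ms_error) / (ss_total + ms_error)

    # F value and root mean squared error, as shown in the output.
    f_value = ms_model / ms_error
    root_mse = math.sqrt(ms_error)

    for name, value in [
        ("R-squared", r_squared),
        ("Adj R-squared", adj_r_squared),
        ("Omega squared", omega_squared),
        ("F value", f_value),
        ("Root MSE", root_mse),
    ]:
        print(f"{name:<14} = {value:.4f}")

Because the inputs are already rounded to 4 decimal places, the results should agree with the reported values only up to rounding.
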

References
==========
.. footbibliography::