What are the limitations of using robust Poisson regression to obtain risk ratios for binomial outcomes in a cohort study?
Is it considered okay, and is it okay for outcomes ranging from 1% to 50%?
I'm quite new to this, so easy-to-follow answers would be gratefully appreciated.
Very thankful for answers.
I have used Poisson regression on cohort data in order to get the relative risk for a binomial outcome (I have googled this and it seems fine?). Log-binomial regression does not work for my dataset ("Error: no valid set of coefficients has been found: please supply starting values"). Perhaps this is because I need to adjust for many variables.
I have understood that in order to do this I should make the Poisson regression robust, so as not to get overly wide confidence intervals.
How do I get robust confidence intervals (not just standard errors) for a Poisson regression in R?
Very thankful for answers
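For what it's worth, a common route in R (often called "modified Poisson" regression) is to fit the Poisson GLM and then compute sandwich standard errors and Wald intervals. A minimal sketch, where `outcome`, `exposure`, `age`, `sex`, and `cohort` are placeholder names:

```r
library(sandwich)  # robust (sandwich) covariance estimators
library(lmtest)    # coeftest() and coefci()

fit <- glm(outcome ~ exposure + age + sex,   # binary outcome coded 0/1
           family = poisson(link = "log"), data = cohort)

# Robust (HC0) standard errors and Wald confidence intervals
rob_vcov <- vcovHC(fit, type = "HC0")
coeftest(fit, vcov. = rob_vcov)
ci <- coefci(fit, vcov. = rob_vcov)

# Exponentiate to get risk ratios with robust CIs
exp(cbind(RR = coef(fit), ci))
```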
I somehow managed to completely forget how the two are related. I'm trying to review material for linear probability models but am stumped by my notes on robust standard errors for this topic. An ELI5 would be much appreciated, since I'm having a tough time tying the two together.
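The connection, briefly: if y is 0/1 and E[y|x] = p(x), then Var(y|x) = p(x)(1 - p(x)), which changes with x, so an LPM's errors are heteroskedastic by construction, and heteroskedasticity-consistent ("robust") standard errors are the standard fix. A minimal sketch in R, with made-up data and variable names:

```r
library(sandwich)
library(lmtest)

lpm <- lm(employed ~ educ + exper, data = wage_data)  # employed coded 0/1

summary(lpm)                                      # conventional SEs
coeftest(lpm, vcov. = vcovHC(lpm, type = "HC1"))  # heteroskedasticity-robust SEs
```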
I have a dataset of admissions to emergency departments over a two-year period. The outcome is binary and "rare-event-ish" (10% of the total). We're mainly interested in whether certain patient characteristics predict the outcome. We have 10 variables of interest (all categorical/binary), ~20,000 patients, and ~32,000 ED admissions. Some patients have more than one admission in this period, so a basic logistic regression at the encounter level would violate the assumption of independence. Though this wasn't a major interest of ours initially, there is a site variable (11 different emergency departments) that could be used to look at between/within-site effects.
I've done some MLM, though it has been years since grad school. Any tips or papers that might help are appreciated! I'm using SAS and R.
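Two common routes in R, sketched below with hypothetical variable names (`outcome`, `char1`, `char2`, `patient_id`, `site`, data frame `ed`): a mixed-effects (MLM) model with random intercepts, or GEE for population-averaged estimates with clustering on patient. In SAS, PROC GLIMMIX and PROC GENMOD (with a REPEATED statement) cover the same two routes.

```r
library(lme4)     # mixed-effects (MLM) route
library(geepack)  # GEE route (population-averaged estimates)

# Random intercepts for patient and for site
m1 <- glmer(outcome ~ char1 + char2 + (1 | patient_id) + (1 | site),
            family = binomial, data = ed)

# GEE clustering on patient with an exchangeable working correlation
ed <- ed[order(ed$patient_id), ]   # geeglm expects rows sorted by cluster
m2 <- geeglm(outcome ~ char1 + char2, id = patient_id,
             family = binomial, corstr = "exchangeable", data = ed)
```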
I've been thinking a bit about least squares, and how one way to get a robust regression alternative is to use Least Absolute Deviations (LAD). One thing that seems potentially problematic is that there can be multiple solutions, as in the example on Wikipedia:
https://upload.wikimedia.org/wikipedia/en/8/89/Least_absolute_deviations_regression_method_diagram.gif
Linked from: https://en.wikipedia.org/wiki/Least_absolute_deviations
Would a fairly simple solution to this be to use a near-1 power? That is, instead of minimizing Σ|residual|, you could minimize Σ|residual|^p for p = 1.01, 1.1, 1.0001, or some other value less than 2. At p = 2, of course, you're back at least squares and no longer reducing the effect of outliers, so I'm thinking of values between 1 and 2, most likely near 1. Is this common and I'm just not aware of what it's called? Would it be a reasonable approach to robust regression that gets unique solutions, unlike LAD?
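For what it's worth: for any p > 1 the objective Σ|rᵢ|^p is strictly convex (given a full-rank design), so the minimizer is unique; this family is sometimes discussed under the name Lp-norm regression. A quick sketch in R of how you might try it, with simulated data (everything here is illustrative):

```r
# Fit by minimizing sum(|residual|^p) with a general-purpose optimizer.
lp_fit <- function(x, y, p = 1.1) {
  obj <- function(beta) sum(abs(y - beta[1] - beta[2] * x)^p)
  optim(coef(lm(y ~ x)), obj)$par   # OLS coefficients as starting values
}

set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
y[20] <- y[20] + 15                 # one gross outlier
rbind(ols = coef(lm(y ~ x)), lp_1.1 = lp_fit(x, y, p = 1.1))
```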
Hi all,
I'm running a very simple bivariate linear regression using robust methods, using the pbcor() function in the WRS2 package in R. This function provides a "robust correlation coefficient" equal to, say, 0.63. I'm wondering if this could be reported as "r = 0.63", or if there is another symbol / term by which I should refer to the robust correlation coefficient?
Thanks!
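For context, pbcor() computes the percentage bend correlation, so one common convention is to report it under that name (e.g. r_pb = .63, noting the bend constant) rather than as a plain Pearson r. A minimal sketch of the call, with arbitrary simulated data:

```r
library(WRS2)

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

pbcor(x, y)   # prints the percentage bend correlation, test statistic, and p-value
```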
Is there an R package that estimates R^2 and p-value equivalents for robust regression that isn't the MASS package? I'm publishing in psychology, and reviewers are trained to expect p-values rather than model fit alone.
Also, I can't find this online: do variables need to be scaled before using them in a robust regression? When I throw scaled variables into rlm() within MASS the model won't converge, but it will if I don't scale them.
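On the first question, one option (not the only one) is lmrob() in the robustbase package, whose summary includes coefficient p-values and a robust R-squared. A sketch with placeholder names:

```r
library(robustbase)

fit <- lmrob(y ~ x1 + x2, data = dat)  # MM-type estimator by default
summary(fit)  # coefficient table with p-values, plus a robust R-squared
```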
Hi everyone
I have frequently been asked by my students and colleagues what to do when the assumptions of traditional regression are violated (e.g. violations of the normality and homogeneity-of-variance assumptions).
I have written a tutorial on robust regression using R and StatsNotebook.
R code and step-by-step instructions for StatsNotebook are provided.
Robust Regression tutorial using R and StatsNotebook
I would love to hear your feedback!
The title says it all. I think it might have to do with the fact that XᵀX is always full rank, but I don't know what is happening in more depth.
Thanks
Interestingly, the lookback window is a very important factor when doing rolling regression for forecasting, yet I haven't really seen any robust procedure for determining the optimal lookback window.
e.g. Given a daily financial variable Y(t+n) that is being regressed on X(t), how do you determine the lookback window?
Should it be static or dynamic? What would be the selection criterion: the window with the highest parameter stability? The highest R-squared?
How about the window with the lowest mean squared error?
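One hedged way to operationalize the MSE idea is to score each candidate window by one-step-ahead out-of-sample error rather than in-sample fit. A rough sketch in R, assuming aligned daily series `y` and `x` (the candidate windows are arbitrary):

```r
# Score candidate lookback windows by one-step-ahead out-of-sample MSE.
oos_mse <- function(y, x, window) {
  n <- length(y)
  errs <- sapply((window + 1):n, function(t) {
    idx <- (t - window):(t - 1)                  # trailing window only
    fit <- lm(y[idx] ~ x[idx])
    y[t] - (coef(fit)[1] + coef(fit)[2] * x[t])  # one-step-ahead error
  })
  mean(errs^2)
}

windows <- c(60, 125, 250, 500)                  # candidate windows (trading days)
setNames(sapply(windows, function(w) oos_mse(y, x, w)), windows)
```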
I've run a simple robust regression using fitlm with 'RobustOpts','on'. But I'm having trouble interpreting the results: the one predictor is insignificant (based on the t-test p-value), but the overall regression model is significant (based on the F-test p-value). How should I interpret this?
Thanks!!
https://preview.redd.it/eddypngv5st51.png?width=560&format=png&auto=webp&s=33a1e1f75d9b7e66360f52ef7f9cbc2b031fb0f0
https://preview.redd.it/jtpywa9u5st51.png?width=310&format=png&auto=webp&s=d92da57c0f0ca5eb5397226e0dbfc8d8c381aab6
I am wondering if anyone uses techniques like weighted least squares (WLS) or robust regression in their work. How do these models stack up against tree-based models, regularized models, or other ML models?
I also posted the question on Stack Exchange.
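For context, WLS in base R is just lm() with a weights argument. A minimal sketch with hypothetical names, assuming known precision weights:

```r
# Weighted least squares: observations with larger w get more influence.
fit_wls <- lm(y ~ x, data = dat, weights = w)  # w ~ 1/variance is a common choice
summary(fit_wls)
```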
I recently came across a new method for selecting independent variables / covariates for regression (purposeful selection by Bursac et al., 2008), which made me wonder what everyone in this thread may suggest for variable selection. Other available methods that I am aware of include hierarchical, forced entry, and stepwise.
With respect to conducting regressions from a medical standpoint, I would be inclined to choose variables based on biologic plausibility or support from previous literature. This reasoning may seem a bit flimsy, especially when there is limited literature or clinical knowledge on the variables I am exploring. What does everyone else think? I understand that the question is rather general. Thanks!
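For comparison with literature-driven selection, here is a minimal sketch of the automated stepwise route in base R (AIC-based, and worth using cautiously, since inference after selection is biased); the data and variable names are hypothetical:

```r
# Fit the full model, then let step() add/drop terms by AIC.
full <- glm(outcome ~ age + sex + bmi + smoker, family = binomial, data = dat)
step(full, direction = "both")
```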
I have some data for which I know the theoretical relationship: exponential decay. I can transform that into a linear relationship (natural log transform), and I've done a simple linear regression on the transformed data to test how good the theory is. The R^2 values tell me that this exponential decay model is very useful (>98% of variance explained), but technically I'm failing the assumption of normally distributed errors. My residuals (though small) are increasing (maybe quadratically) with the predictor. (residuals plot here)
What estimates of the linear regression are invalidated by this violated assumption? Can I still use my estimated beta, the beta confidence interval, and R^2?
edit: here's a plot of my data and fit.
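Two hedged options for the fit itself, sketched below: keep the log-scale regression but compute heteroskedasticity-robust confidence intervals for beta, or fit the decay directly on the original scale with nls(). The names (`y`, `t`, `dat`) and starting values are placeholders:

```r
library(sandwich)
library(lmtest)

fit_log <- lm(log(y) ~ t, data = dat)                   # log-linear decay fit
coefci(fit_log, vcov. = vcovHC(fit_log, type = "HC3"))  # robust CI for beta

# Alternatively, fit the exponential decay directly on the original scale
fit_nls <- nls(y ~ a * exp(-b * t), data = dat,
               start = list(a = 1, b = 0.1))            # starting values are guesses
```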
I'm doing my thesis on the effects of electromagnetic fields on humans (won't go into detail), and I have collected samples over the last 4-5 weeks.
It has been recommended that I use robust regression when dealing with those samples, but I am not familiar with that research/statistical technique.
I would appreciate any explanation, or any recommended material for further reading on this subject.
Edit: Thanks to everyone for your input.
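As a starting point, here's a minimal sketch of what a robust regression call looks like in R next to ordinary least squares; `response`, `exposure`, and `samples` are placeholders for your variables:

```r
library(MASS)

fit_ols <- lm(response ~ exposure, data = samples)   # ordinary least squares
fit_rob <- rlm(response ~ exposure, data = samples)  # M-estimation (Huber weights by default)

summary(fit_ols)
summary(fit_rob)   # similar output, but outlying observations are downweighted
```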
If there are real differences, could anyone explain the difference(s) in how they're calculated, in layman's terms?
I'm working on a project in which the data has highly influential points, and it looks like it would make sense to run a robust regression, something I'm unfamiliar with. After reading about it here: http://www.ats.ucla.edu/stat/r/dae/rreg.htm
I'm fairly comfortable running this analysis, and I like the fact that the output looks very similar to regular OLS. In fact, I noticed the following note in the article:
> When comparing the results of a regular OLS regression and a robust regression, if the results are very different, you will most likely want to use the results from the robust regression. Large differences suggest that the model parameters are being highly influenced by outliers.
Does this imply, then, that if the results are similar between regular and robust regression, the regular OLS assumptions are met (setting aside homoscedasticity)? If so, why don't we just run robust regression all the time to avoid the issue of meeting OLS assumptions?
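To illustrate the quoted note, a quick simulated sketch (using MASS::rlm, as in the linked article): the two fits agree on clean data and diverge once an influential point is added.

```r
library(MASS)

set.seed(42)
x <- 1:30
y <- 1 + 2 * x + rnorm(30)
cbind(ols = coef(lm(y ~ x)), robust = coef(rlm(y ~ x)))  # nearly identical

y[30] <- y[30] + 100                                     # inject an influential point
cbind(ols = coef(lm(y ~ x)), robust = coef(rlm(y ~ x)))  # now they diverge
```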
In my homework, I was given a set of data as two column vectors, X and Y, the horizontal and vertical components of a set of data points. After plotting it, I saw that there was an outlier muddying my regression line. The professor wants us to toss out the bad points. How can I write a loop that locates the data point with the largest-magnitude error? Can I use MATLAB's max function for this? I want to remove a point on each iteration, record the mean squared error of the fit, and finally plot it.
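A sketch of that loop's logic in R (the same idea ports directly to MATLAB using abs() and max()); the cap of five removals is arbitrary:

```r
# Repeatedly drop the point with the largest absolute residual,
# recording the fit's mean squared error at each iteration.
mse_trace <- numeric(0)
keep <- rep(TRUE, length(Y))
for (i in 1:5) {                        # remove up to 5 suspect points
  fit <- lm(Y[keep] ~ X[keep])
  res <- abs(residuals(fit))
  mse_trace <- c(mse_trace, mean(res^2))
  worst <- which(keep)[which.max(res)]  # original index of the largest |error|
  keep[worst] <- FALSE
}
plot(mse_trace, type = "b", xlab = "iteration", ylab = "MSE of fit")
```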