What are the limitations of using robust Poisson regression to obtain risk ratios for binomial outcomes in a cohort study?
Is it considered okay, and is it okay for outcomes ranging from 1% to 50%?
I'm quite new to this, so easy-to-follow answers would be gratefully appreciated.
Very thankful for answers.
I have used Poisson regression on cohort data in order to get the relative risk for a binomial outcome (I have googled this and it seems fine?). Log-binomial regression does not work for my dataset ("Error: no valid set of coefficients has been found: please supply starting values"). Perhaps this is because I need to adjust for many variables.
I have understood that in order to do this I should make the Poisson regression robust, so as not to get overly wide confidence intervals.
How do I get robust confidence intervals (not just standard errors) for a Poisson regression in R?
Very thankful for answers
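For what it's worth, a common route in R (often called "modified Poisson" regression) is to fit the Poisson GLM and then compute sandwich standard errors and Wald intervals. A minimal sketch, where `outcome`, `exposure`, `age`, `sex`, and `cohort` are placeholder names:

```r
library(sandwich)  # robust (sandwich) covariance estimators
library(lmtest)    # coeftest() and coefci()

fit <- glm(outcome ~ exposure + age + sex,   # binary outcome coded 0/1
           family = poisson(link = "log"), data = cohort)

# Robust (HC0) standard errors and Wald confidence intervals
rob_vcov <- vcovHC(fit, type = "HC0")
coeftest(fit, vcov. = rob_vcov)
ci <- coefci(fit, vcov. = rob_vcov)

# Exponentiate to get risk ratios with robust CIs
exp(cbind(RR = coef(fit), ci))
```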
I somehow managed to completely forget how the two are related. I'm trying to review material for linear probability models but am stumped by my notes on robust standard errors for this topic. An ELI5 would be much appreciated, since I'm having a tough time tying the two together.
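The connection, briefly: if y is 0/1 and E[y|x] = p(x), then Var(y|x) = p(x)(1 - p(x)), which changes with x, so an LPM's errors are heteroskedastic by construction, and heteroskedasticity-consistent ("robust") standard errors are the standard fix. A minimal sketch in R, with made-up data and variable names:

```r
library(sandwich)
library(lmtest)

lpm <- lm(employed ~ educ + exper, data = wage_data)  # employed coded 0/1

summary(lpm)                                      # conventional SEs
coeftest(lpm, vcov. = vcovHC(lpm, type = "HC1"))  # heteroskedasticity-robust SEs
```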
I have a dataset of admissions to emergency departments over a two-year period. The outcome is binary and "rare-event-ish" (10% of the total). We're mainly interested in whether certain patient characteristics predict the outcome. We have 10 variables of interest (all categorical/binary), ~20,000 patients, and ~32,000 ED admissions. Some patients have more than one admission in this period, so a basic logistic regression at the encounter level would violate the assumption of independence. Though this wasn't a major interest of ours initially, there is a site variable (11 different emergency departments) that could be used to look at between/within-site effects.
I've done some MLM, though it has been years since grad school. Any tips or papers that might help are appreciated! I'm using SAS and R.
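Two common routes in R, sketched below with hypothetical variable names (`outcome`, `char1`, `char2`, `patient_id`, `site`, data frame `ed`): a mixed-effects (MLM) model with random intercepts, or GEE for population-averaged estimates with clustering on patient. In SAS, PROC GLIMMIX and PROC GENMOD (with a REPEATED statement) cover the same two routes.

```r
library(lme4)     # mixed-effects (MLM) route
library(geepack)  # GEE route (population-averaged estimates)

# Random intercepts for patient and for site
m1 <- glmer(outcome ~ char1 + char2 + (1 | patient_id) + (1 | site),
            family = binomial, data = ed)

# GEE clustering on patient with an exchangeable working correlation
ed <- ed[order(ed$patient_id), ]   # geeglm expects rows sorted by cluster
m2 <- geeglm(outcome ~ char1 + char2, id = patient_id,
             family = binomial, corstr = "exchangeable", data = ed)
```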
I've been thinking a bit about least squares, and how one way to get a robust regression alternative is to use Least Absolute Deviations (LAD). One thing that seems potentially problematic is that there can be multiple solutions, as in the example on Wikipedia:
https://upload.wikimedia.org/wikipedia/en/8/89/Least_absolute_deviations_regression_method_diagram.gif
Linked from: https://en.wikipedia.org/wiki/Least_absolute_deviations
Would a fairly simple solution to this be to use a near-1 power? That is, instead of minimizing Σ|residual|, you could minimize Σ|residual|^p for p = 1.01, 1.1, 1.0001, or some other value less than 2. At p = 2, of course, you're back at least squares and no longer reducing the effect of outliers, so I'm thinking of values between 1 and 2, most likely near 1. Is this common and I'm just not aware of what it's called? Would it be a reasonable approach to robust regression that gets unique solutions, unlike LAD?
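For what it's worth: for any p > 1 the objective Σ|rᵢ|^p is strictly convex (given a full-rank design), so the minimizer is unique; this family is sometimes discussed under the name Lp-norm regression. A quick sketch in R of how you might try it, with simulated data (everything here is illustrative):

```r
# Fit by minimizing sum(|residual|^p) with a general-purpose optimizer.
lp_fit <- function(x, y, p = 1.1) {
  obj <- function(beta) sum(abs(y - beta[1] - beta[2] * x)^p)
  optim(coef(lm(y ~ x)), obj)$par   # OLS coefficients as starting values
}

set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
y[20] <- y[20] + 15                 # one gross outlier
rbind(ols = coef(lm(y ~ x)), lp_1.1 = lp_fit(x, y, p = 1.1))
```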
Hi all,
I'm running a very simple bivariate linear regression using robust methods, using the pbcor() function in the WRS2 package in R. This function provides a "robust correlation coefficient" equal to, say, 0.63. I'm wondering if this could be reported as "r = 0.63", or if there is another symbol / term by which I should refer to the robust correlation coefficient?
Thanks!
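For context, pbcor() computes the percentage bend correlation, so one common convention is to report it under that name (e.g. r_pb = .63, noting the bend constant) rather than as a plain Pearson r. A minimal sketch of the call, with arbitrary simulated data:

```r
library(WRS2)

set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)

pbcor(x, y)   # prints the percentage bend correlation, test statistic, and p-value
```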
Is there an R package that estimates R^2 and p-value equivalents for robust regression that isn't the MASS package? I'm publishing in psychology, and reviewers are trained to expect p-values rather than model fit alone.
Also, I can't find this online: do variables need to be scaled before using them in a robust regression? When I throw scaled variables into rlm() within MASS the model won't converge, but it will if I don't scale them.
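On the first question, one option (not the only one) is lmrob() in the robustbase package, whose summary includes coefficient p-values and a robust R-squared. A sketch with placeholder names:

```r
library(robustbase)

fit <- lmrob(y ~ x1 + x2, data = dat)  # MM-type estimator by default
summary(fit)  # coefficient table with p-values, plus a robust R-squared
```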
Hi everyone
I have frequently been asked by my students and colleagues what to do when the assumptions of traditional regression are violated (e.g. violations of the normality and homogeneity-of-variance assumptions).
I have written a tutorial on robust regression using R and StatsNotebook.
R code and step-by-step instructions for StatsNotebook are provided.
Robust Regression tutorial using R and StatsNotebook
I would love to hear your feedback!
The title says it all. I think it might have to do with the fact that XᵀX is always full rank, but I don't know what is happening in more depth.
Thanks
Interestingly, the lookback window is a very important factor when doing rolling regression for forecasting, yet I haven't really seen any robust procedure for determining the optimal lookback window.
e.g. Given a daily financial variable Y(t+n) that is being regressed on X(t), how do you determine the lookback window?
Should it be static or dynamic? What would be the selection criterion: the window with the highest parameter stability? The highest R-squared?
How about the window with the lowest mean squared error?
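One hedged way to operationalize the MSE idea is to score each candidate window by one-step-ahead out-of-sample error rather than in-sample fit. A rough sketch in R, assuming aligned daily series `y` and `x` (the candidate windows are arbitrary):

```r
# Score candidate lookback windows by one-step-ahead out-of-sample MSE.
oos_mse <- function(y, x, window) {
  n <- length(y)
  errs <- sapply((window + 1):n, function(t) {
    idx <- (t - window):(t - 1)                  # trailing window only
    fit <- lm(y[idx] ~ x[idx])
    y[t] - (coef(fit)[1] + coef(fit)[2] * x[t])  # one-step-ahead error
  })
  mean(errs^2)
}

windows <- c(60, 125, 250, 500)                  # candidate windows (trading days)
setNames(sapply(windows, function(w) oos_mse(y, x, w)), windows)
```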
I've run a simple robust regression using fitlm with 'RobustOpts','on'. But I'm having trouble interpreting the results: the one predictor is insignificant (based on the t-test p-value), but the overall regression model is significant (based on the F-test p-value). How should I interpret this?
Thanks!!
https://preview.redd.it/eddypngv5st51.png?width=560&format=png&auto=webp&s=33a1e1f75d9b7e66360f52ef7f9cbc2b031fb0f0
https://preview.redd.it/jtpywa9u5st51.png?width=310&format=png&auto=webp&s=d92da57c0f0ca5eb5397226e0dbfc8d8c381aab6
I am wondering if anyone uses techniques like weighted least squares (WLS) or robust regression in their work. How do these models stack up against tree-based models, regularized models, or other ML models?
I also posted the question on Stack Exchange.
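For context, WLS in base R is just lm() with a weights argument. A minimal sketch with hypothetical names, assuming known precision weights:

```r
# Weighted least squares: observations with larger w get more influence.
fit_wls <- lm(y ~ x, data = dat, weights = w)  # w ~ 1/variance is a common choice
summary(fit_wls)
```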
I recently came across a new method for selecting independent variables / covariates for regression (purposeful selection by Bursac et al., 2008), which made me wonder what everyone in this thread may suggest for variable selection. Other available methods that I am aware of include hierarchical, forced entry, and stepwise.
With respect to conducting regressions from a medical standpoint, I would be inclined to choose variables based on biologic plausibility or support from previous literature. This reasoning may seem a bit flimsy, especially when there is limited literature or clinical knowledge on the variables I am exploring. What does everyone else think? I understand that the question is rather general. Thanks!
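For comparison with literature-driven selection, here is a minimal sketch of the automated stepwise route in base R (AIC-based, and worth using cautiously, since inference after selection is biased); the data and variable names are hypothetical:

```r
# Fit the full model, then let step() add/drop terms by AIC.
full <- glm(outcome ~ age + sex + bmi + smoker, family = binomial, data = dat)
step(full, direction = "both")
```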
I have some data for which I know the theoretical relationship: exponential decay. I can transform that into a linear relationship (natural log transform), and I've done a simple linear regression on the transformed data to test how good the theory is. The R^2 values tell me that this exponential decay model is very useful (>98% of variance explained), but technically I'm failing the assumption of normally distributed errors. My residuals (though small) are increasing (maybe quadratically) with the predictor. (residuals plot here)
What estimates of the linear regression are invalidated by this violated assumption? Can I still use my estimated beta, the beta confidence interval, and R^2?
edit: here's a plot of my data and fit.
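Two hedged options for the fit itself, sketched below: keep the log-scale regression but compute heteroskedasticity-robust confidence intervals for beta, or fit the decay directly on the original scale with nls(). The names (`y`, `t`, `dat`) and starting values are placeholders:

```r
library(sandwich)
library(lmtest)

fit_log <- lm(log(y) ~ t, data = dat)                   # log-linear decay fit
coefci(fit_log, vcov. = vcovHC(fit_log, type = "HC3"))  # robust CI for beta

# Alternatively, fit the exponential decay directly on the original scale
fit_nls <- nls(y ~ a * exp(-b * t), data = dat,
               start = list(a = 1, b = 0.1))            # starting values are guesses
```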
I'm doing my thesis on the effects of electromagnetic fields on humans (won't go into detail), and I have collected samples over the last 4-5 weeks.
It has been recommended that I use robust regression when dealing with those samples, but I am not familiar with that research/statistical technique.
I would appreciate any explanation, or any recommended material for further reading on this subject.
Edit: Thanks to everyone for your input.
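As a starting point, here's a minimal sketch of what a robust regression call looks like in R next to ordinary least squares; `response`, `exposure`, and `samples` are placeholders for your variables:

```r
library(MASS)

fit_ols <- lm(response ~ exposure, data = samples)   # ordinary least squares
fit_rob <- rlm(response ~ exposure, data = samples)  # M-estimation (Huber weights by default)

summary(fit_ols)
summary(fit_rob)   # similar output, but outlying observations are downweighted
```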
If there are real differences, could anyone explain the difference(s) in how they're calculated, in layman's terms?
I'm working on a project in which the data has highly influential points, and it looks like it would make sense to run a robust regression, something I'm unfamiliar with. After reading about it here: http://www.ats.ucla.edu/stat/r/dae/rreg.htm
I'm fairly comfortable running this analysis, and I like the fact that the output looks very similar to regular OLS. In fact, I noticed the following note in the article:
> When comparing the results of a regular OLS regression and a robust regression, if the results are very different, you will most likely want to use the results from the robust regression. Large differences suggest that the model parameters are being highly influenced by outliers.
Does this imply, then, that if the results are similar between regular and robust regression, the regular OLS assumptions are met (setting aside homoscedasticity)? If so, why don't we just run robust regression all the time to avoid the issue of meeting OLS assumptions?
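To illustrate the quoted note, a quick simulated sketch (using MASS::rlm, as in the linked article): the two fits agree on clean data and diverge once an influential point is added.

```r
library(MASS)

set.seed(42)
x <- 1:30
y <- 1 + 2 * x + rnorm(30)
cbind(ols = coef(lm(y ~ x)), robust = coef(rlm(y ~ x)))  # nearly identical

y[30] <- y[30] + 100                                     # inject an influential point
cbind(ols = coef(lm(y ~ x)), robust = coef(rlm(y ~ x)))  # now they diverge
```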
In my homework, I was given a set of data as two column vectors, X and Y, the horizontal and vertical components of a set of data points. After plotting it, I saw that there was an outlier muddying my regression line. The professor wants us to toss out the bad points. How can I write a loop that locates the data point with the largest-magnitude error? Can I use MATLAB's max function for this? I want to remove a point on each iteration, record the mean squared error of the fit, and finally plot it.
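A sketch of that loop's logic in R (the same idea ports directly to MATLAB using abs() and max()); the cap of five removals is arbitrary:

```r
# Repeatedly drop the point with the largest absolute residual,
# recording the fit's mean squared error at each iteration.
mse_trace <- numeric(0)
keep <- rep(TRUE, length(Y))
for (i in 1:5) {                        # remove up to 5 suspect points
  fit <- lm(Y[keep] ~ X[keep])
  res <- abs(residuals(fit))
  mse_trace <- c(mse_trace, mean(res^2))
  worst <- which(keep)[which.max(res)]  # original index of the largest |error|
  keep[worst] <- FALSE
}
plot(mse_trace, type = "b", xlab = "iteration", ylab = "MSE of fit")
```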