31 Hilarious Clustered standard errors Puns

When to Use Robust vs. Clustered Standard Errors?

I more or less understand each one from a technical perspective, but I still can't tell when one is better than the other, what the main differences between the two techniques are, or why both are needed beyond addressing a violation of the constant-variance assumption.

Could someone clarify the concept with examples, or point to a source that explains the difference between the two and their applications?
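
One way to see the difference concretely is to compute both on the same data. The sketch below is illustrative only: it assumes a single regressor with an intercept, simulated data, and the uncorrected HC0 and CR0 variants; the function name `slope_standard_errors` is hypothetical. Heteroskedasticity-robust errors let each observation have its own error variance but still treat observations as independent, while cluster-robust errors also allow arbitrary correlation among observations in the same cluster.

```python
import math
import random


def slope_standard_errors(x, y, cluster_ids):
    """OLS slope of y on x (with intercept) plus three standard errors for it."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * yi for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: independent errors with one common variance.
    classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

    # Robust (HC0): every observation contributes its own squared score.
    hc0 = math.sqrt(sum((d * e) ** 2 for d, e in zip(x_dev, resid))) / sxx

    # Clustered (CR0): scores are summed within each cluster before squaring,
    # so within-cluster correlation is not assumed away.
    scores = {}
    for d, e, g in zip(x_dev, resid, cluster_ids):
        scores[g] = scores.get(g, 0.0) + d * e
    clustered = math.sqrt(sum(s * s for s in scores.values())) / sxx

    return {"slope": slope, "classical": classical, "hc0": hc0, "clustered": clustered}


# Simulated data (an assumption for illustration): 50 clusters of 20 observations,
# a regressor that varies only across clusters, and a shock shared within cluster.
rng = random.Random(0)
x, y, ids = [], [], []
for g in range(50):
    x_g = rng.gauss(0, 1)
    u_g = rng.gauss(0, 1)
    for _ in range(20):
        x.append(x_g)
        y.append(1.0 + 0.5 * x_g + u_g + rng.gauss(0, 1))
        ids.append(g)

result = slope_standard_errors(x, y, ids)
print(result)
```

In a design like this one, the robust and classical errors tend to be close to each other and the clustered error noticeably larger, because only the clustered version accounts for the shared shock. When every observation is its own cluster, the clustered formula reduces to the robust one.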

👍︎ 12

👤︎ u/Sinsiski

📅︎ Nov 27 2021

Clustered Standard Errors vs Standard Errors

Hi,

I am running a random-effects regression with log wage as my dependent variable and year dummies, race dummies, and years of education as my control variables. I am clustering on the id variable. However, I cannot make anything of the results, as they are quite ambiguous. In theory, the clustered standard errors should be smaller. I would really appreciate it if someone could explain the rule of thumb for which quantities to look at when making the comparison.

Non-clustering standard errors results:

Random-effects GLS regression        Number of obs      = 4,360
Group variable: nr                   Number of groups   = 545

R-sq:                                Obs per group:
  within  = 0.1625                     min = 8
  between = 0.1296                     avg = 8.0
  overall = 0.1448                     max = 8

                                     Wald chi2(10)      = 819.51
corr(u_i, X) = 0 (assumed)           Prob > chi2        = 0.0000

lwage     Coef.       Std. Err.   z       P>z     [95% Conf. Interval]
d81       .1193902    .021487     5.56    0.000   .0772765    .1615039
d82       .1781901    .021487     8.29    0.000   .1360764    .2203038
d83       .2257865    .021487     10.51   0.000   .1836728    .2679001
d84       .2968181    .021487     13.81   0.000   .2547044    .3389318
d85       .3459333    .021487     16.10   0.000   .3038196    .388047
d86       .4062418    .021487     18.91   0.000   .3641281    .4483555
d87       .4730023    .021487     22.01   0.000   .4308886    .515116
educ      .0770943    .009177     8.40    0.000   .0591076    .0950809
black     -.1225637   .0496994    -2.47   0.014   -.2199728   -.0251546
hispan    .024623     .0446744    0.55    0.582   -.0629371   .1121831
_cons     .4966384    .1122718    4.42    0.000   .2765897    .7166871

sigma_u   .34337144
sigma_e   .35469771
rho       .48377912   (fraction of variance due to u_i)

Results from clustering:

Random-effects GLS regression        Number of obs      = 4,360
Group variable: nr                   Number of groups   = 545

R-sq:                                Obs per group:
  within  = 0.1625                     min = 8
  between = 0.1296                     avg = 8.0
  overall = 0.1448                     max = 8

                                     Wald chi2(10)      = 494.13
corr(u_i, X) = 0 (assumed)           Prob > chi2        = 0.0000

(Std. Err. adjusted for 545 clusters in nr)

                      Robust
lwage     Coef.       Std. Err.   z       P>z     [95% Conf. Interval]
d81       .1193902    .0

...

👍︎ 5

👤︎ u/HammU420

📅︎ Aug 02 2021

Firm Fixed Effect and Clustered Standard Error

I ran OLS and got significant results, but a reviewer is asking for fixed effects. When I add fixed effects, my result disappears. There is also an endogeneity problem: I ran GMM, but the reviewer said it is not a convincing fix for endogeneity. What should I do?

👍︎ 3

👤︎ u/Evening_Whereas6121

📅︎ Dec 17 2020

Standard errors Clustered

Hello, I have a question regarding clustered standard errors, which I need to use for my research. I have a dataset containing observations for different firms over different years. What commands should I use to get clustered standard errors?

👍︎ 2

👤︎ u/peanutbutterzzz

📅︎ Nov 06 2020

How to Cluster Standard Errors with a Small Number of Clusters?

Hi All,

I have a quick econometrics question. What is the best solution for clustering standard errors when you have few (N < 50 or even N < 25) clusters? Is it better to use a small-cluster error adjustment matrix (i.e. HC2 or HC3)? Or is it better to bootstrap standard errors? If bootstrapping, does it matter if it is pairwise/xy or "wild"?

This is for a scattered difference in difference BTW (panel data with unit-level clusters), not clustered treatment (i.e. randomization at village level), if that matters. It's for my thesis, not homework. My advisors did not have very useful advice on this question, so I'm asking here. I'm using "wild" bootstrapped SEs for my paper now, but it is taking an eternity to run the models and adjust the errors (because bootstrapping is a slow process), and I'm wondering if there is a better way to do it. However, I'm not sure if there are fundamental differences across these solutions when it comes to adjusting for few clusters (sorry, I need to brush up on my quant fundamentals).

Thanks in advance for any help!
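
For readers unfamiliar with the "wild" option mentioned in the question, here is a minimal sketch of a wild cluster bootstrap-t. It makes several simplifying assumptions that are not from the post: one regressor plus an intercept, the null hypothesis slope = 0 imposed when generating bootstrap samples, Rademacher (plus or minus one) weights drawn once per cluster, and an uncorrected CR0 variance. The function names are hypothetical, and a pure-Python loop like this is slow; it only illustrates the steps.

```python
import math
import random


def slope_and_cluster_se(x, y, cluster_ids):
    """OLS slope of y on x (with intercept) and its CR0 cluster-robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * yi for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    scores = {}
    for d, xi, yi, g in zip(x_dev, x, y, cluster_ids):
        resid = yi - intercept - slope * xi
        scores[g] = scores.get(g, 0.0) + d * resid
    se = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return slope, se


def wild_cluster_bootstrap_p(x, y, cluster_ids, reps=999, seed=0):
    """Two-sided bootstrap p-value for H0: slope = 0."""
    rng = random.Random(seed)
    slope, se = slope_and_cluster_se(x, y, cluster_ids)
    t_obs = slope / se
    # Under H0 the restricted model is intercept-only, so the restricted
    # residuals are simply y minus its mean.
    y_bar = sum(y) / len(y)
    restricted_resid = [yi - y_bar for yi in y]
    clusters = sorted(set(cluster_ids))
    extreme = 0
    for _ in range(reps):
        # One sign flip per cluster keeps the within-cluster dependence intact.
        flip = {g: rng.choice((-1.0, 1.0)) for g in clusters}
        y_star = [y_bar + flip[g] * r for g, r in zip(cluster_ids, restricted_resid)]
        slope_star, se_star = slope_and_cluster_se(x, y_star, cluster_ids)
        if abs(slope_star / se_star) >= abs(t_obs):
            extreme += 1
    return extreme / reps


# Simulated example (assumed data): 12 clusters of 10 observations, true slope 2.
rng = random.Random(42)
x, y, ids = [], [], []
for g in range(12):
    u_g = rng.gauss(0, 1)
    for _ in range(10):
        x_i = rng.gauss(0, 1)
        x.append(x_i)
        y.append(1.0 + 2.0 * x_i + u_g + rng.gauss(0, 1))
        ids.append(g)

p_value = wild_cluster_bootstrap_p(x, y, ids, reps=499, seed=1)
print(p_value)
```

The design choice that distinguishes this from a pairs (xy) bootstrap is that the regressors and cluster structure stay fixed in every replication; only the signs of the residuals change, cluster by cluster.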

👍︎ 3

👤︎ u/LA2Oaktown

📅︎ Dec 07 2020

How to run a paired t-test with clustered standard errors

Hi,

I have a data set where people are buying a quantity of a good at a range of prices, under two different conditions. I've been instructed to compare the conditions by running a paired t-test on the average demand, clustering standard errors at the subject level.

Unfortunately, I'm somewhat out of my depth (probably in terms of stats and R knowledge) and would greatly appreciate some help.

In the image below, Column A is the subject ID, Column B is the condition (high and low dose), Row 1 is price, C3:M26 is the demand.

https://preview.redd.it/9yw3cqdqq6631.png?width=1766&format=png&auto=webp&s=1514c1d085df39a0d8ae62201c919361d35061c5

Thanks
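
One possible reading of that instruction, sketched below, is to collapse each subject's demand to one average per condition and then run the paired t-test across subjects; collapsing to one pair per subject is a simple way to keep repeated measurements on the same person from being treated as independent. The data layout (a dict keyed by subject, with "high" and "low" lists of quantities, one per price), the numbers, and the function name are all hypothetical, since the original spreadsheet is only shown as an image.

```python
import math
from statistics import mean, stdev


def paired_t_on_subject_means(demand):
    """Paired t statistic on per-subject average demand (high minus low).

    demand maps subject id -> {"high": [...], "low": [...]}.
    Returns the t statistic and its degrees of freedom.
    """
    # One difference per subject, so each subject counts once.
    diffs = [mean(d["high"]) - mean(d["low"]) for d in demand.values()]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Made-up numbers for three subjects and two prices.
demand = {
    "s1": {"high": [4, 6], "low": [3, 5]},
    "s2": {"high": [7, 9], "low": [5, 7]},
    "s3": {"high": [10, 12], "low": [7, 9]},
}
t_stat, df = paired_t_on_subject_means(demand)
print(t_stat, df)
```

The t statistic would then be compared against a t distribution with the returned degrees of freedom.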

👍︎ 6

👤︎ u/mindstewpodcast

📅︎ Jun 23 2019

Do I need to bootstrap two-way clustered standard errors?

Hi all,

I have a model including a regressor generated by another model and I cluster the standard errors by firm and year. Do I still need to bootstrap my standard errors to overcome the generated regressor problem?

Many thanks!

👍︎ 2

👤︎ u/coles_corner

📅︎ Jan 11 2019

Clustered Standard Error

Hi all,

I'm doing a regression on the effect of voter ID laws. I have been informed I need to use clustered standard errors, but frankly am way, way out of my depth on this one.

Is anyone familiar enough with the concept to help me through it? I'm using R.

Thanks a bunch.

👍︎ 2

👤︎ u/P4L1M1N0

📅︎ Apr 07 2019

Clustered/Robust Standard Errors in SAS

I was asked to cluster my standard errors in SAS models. The person I am working with uses Stata and showed me the cluster command that he uses at the end of his models. My SAS/Stata translation guide is not helpful here. All I am finding online is the surveyreg procedure, which presents robust standard errors (I am assuming robust and clustered are the same thing, or similar, based on what I am reading). However, the surveyreg procedure is not effective when I have models with dichotomous outcome variables.

👍︎ 2

👤︎ u/Kennyv777

📅︎ Oct 14 2016

At what level should I cluster my standard errors, and what’s the intuition behind it?

👍︎ 8

👤︎ u/quinoba

📅︎ Nov 25 2021

Attempting to create a clustered bar chart with 3 variables over 3 individuals and specific standard deviations for each variable/individual combo.

I am trying to make a bar graph with clustered bars. I measured three variables repeatedly across 3 participants. I would like to graph the average value for each participant and each variable with error bars showing the standard deviation of each, clustered by the variable name.

My data is organized as follows:

	AVG1	SD1	AVG2	SD2	AVG3	SD3
Person 1	10	2	15	2	56	7
Person 2	25	3	45	4	76	10
Person 3	30	1	35	5	23	3

So ideally I would want three clusters of bars. The first cluster would have 3 bars of height 10, 25, and 30, with error bars ranging 8-12, 22-28, and 29-31 respectively. Then I would want the second and third variables included in the chart.

The way my data is organized, I get 6 clusters instead of 3 and I cannot adjust the error bars to reflect the differences between individuals/variables.

How can I organize my data to achieve my goal? I have all the raw data still in the same document.

I saw I am supposed to include my excel version on here but I do not know what it is. I am on a PC running Windows 10 though.

👍︎ 2

👤︎ u/awsfhie2

📅︎ Dec 17 2021

Do you use cluster-robust standard errors in a first difference model? What about a random effects model?

Hi All,

A relatively straightforward question that I could not find an answer to online: are cluster-robust standard errors needed when analyzing panel data using a first difference model? I know you should cluster SEs at the unit level when analyzing panel data with FEs, but doing so with a first difference seems wrong, since the difference in the errors is unlikely to be serially correlated at the unit level. I'm presenting relatively soon and don't want to get this wrong :/

I'm unlikely to use a REs model but since I'm asking about FD models.... might as well learn as much as possible from you stats geniuses!

Thanks!

👍︎ 11

👤︎ u/LA2Oaktown

📅︎ Mar 26 2021

industry and year fixed effects with firm level standard error clustering (plm?)

Hi everyone,

I'm having problems replicating some research, where observations are firm-year level and the author has industry & time fixed effects, but standard errors clustered on firm level.

To simplify, my data looks something like this:

Firm	Year		Industry	Y	X1	X2	X3
1		2000		3			0.3	0	0	0.78
1		2001		3			0.4	0	0	0.70
2		2000		1			0.3	0	0	0.78
2		2001		1			0.3	1	0	0.78
3		2000		3			0.3	0	1	0.78
3		2001		3			0.3	0	1	0.78

I cannot add industry fixed effects by doing

FE <- plm(Y ~ X1 + X2 + X3, data = panel, index = c("Industry", "Year"), model = "within", effect = "twoways")

as multiple firms have the same industry in the same year. I saw some recommendations to add the industry as a dummy via factor, which I've tried to implement like this:

FE <- plm(Y ~ X1 + X2 + X3 + factor(Industry), data = panel, index = c("Year"), model = "within", effect = "individual")

Is this equivalent to doing industry and time fixed effects? If so, how would I go about adding clustering of standard errors at the firm level?

If my approach to the fixed effects is wrong, how could I do both the fixed effects and the clustering?

Thanks for any help! (crossposted this in /r/AskStatistics as both seem relevant)

👍︎ 7

👤︎ u/Dafe8

📅︎ Apr 27 2020

Clustering standard errors by hand using python apithymaxim.wordpress.com…

👍︎ 9

👤︎ u/cautiousbiker

📅︎ Aug 12 2020

My first 2 tubs were trial and error, but these latest 2 tubs seem to be much more clustered. I'm working on better FAE (every 2-3 hours). How do they look? reddit.com/gallery/j3f7xm

👍︎ 5

👤︎ u/thisismyshroomac

📅︎ Oct 01 2020

Anyone know why the orange dots (tankers) are clustered offshore near NYC? Is this standard practice or pandemic-related? imgur.com/UCY0QO8

👍︎ 21

👤︎ u/rbromblin

📅︎ Apr 15 2020

[Q] In linear regression, why is standard error high when predictors are correlated?

Practically, I can simulate it and see it on my own. However, I'm looking for an analytical explanation of this phenomenon.

Any reference book/pdf is also appreciated.

Secondly, does that increase the error on the test set, or does the error stay the same? Unfortunately I came across both answers (sources of which I can't recall right now). Any pointers are appreciated.
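
A standard textbook result gives the analytical explanation asked for here. It is stated under assumptions that are not in the post: OLS with an intercept and homoskedastic errors with variance σ². The symbols SST_j and R_j² are notation introduced for this note.

```latex
\operatorname{Var}\bigl(\hat\beta_j \mid X\bigr)
  = \frac{\sigma^2}{\mathrm{SST}_j \,\bigl(1 - R_j^2\bigr)},
\qquad
\mathrm{SST}_j = \sum_{i=1}^{n} \bigl(x_{ij} - \bar{x}_j\bigr)^2
```

Here R_j² is the R-squared from regressing predictor j on all the other predictors. As predictor j becomes more correlated with the others, R_j² approaches 1, the denominator shrinks, and the standard error grows; the factor 1 / (1 - R_j²) is the variance inflation factor. On the second question, this result concerns the variance of individual coefficients; it does not by itself imply worse predictions for new data whose predictors follow the same correlation pattern as the training data.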

👍︎ 12

👤︎ u/statsIsImportant

📅︎ Jan 15 2022

Are arguments such as how to cluster standard errors, endogeneity, etc. ultimately all arguments that can't be tested but must be argued logically?

It seems to me that when thinking about endogeneity, or what the sampling distribution of the estimator looks like (e.g. clustering at the state level), these are all just arguments you have to justify verbally by appealing to logic and know-how, and ultimately are not questions that the data can answer for sure (since standard errors and the like all involve the error term, which is unobservable). Is this correct?

👍︎ 4

👤︎ u/Whynvme

📅︎ Jan 19 2020

HELP: In a regression, how to cluster standard errors two-ways using Stargazer?

Dear all,

I am running an OLS regression using Stargazer and I would like to cluster standard errors two-ways. I am not sure that I have specified the two-way cluster correctly. Could you confirm that the following is the correct way to do it?

stargazer(
  (coeftest(model1,vcovCL, cluster = ~ FundID1 + FundID2)),
  (coeftest(model2,vcovCL, cluster = ~ FundID1 + FundID2))
  ,type='latex', font.size = 'tiny'
)

The code works. Standard errors are greater when I cluster as I did ( cluster = ~ FundID1 + FundID2 ) than when clustering only with cluster = FundID1 or only with cluster = FundID2. However, I could not find anything like this online. Did I do it correctly?

Many thanks,

👍︎ 3

👤︎ u/BARDUCO

📅︎ Dec 02 2019

"We are building a society that resembles the Hunger Games, with elite college grads clustered in a few dense gated communities doing knowledge work and whose food, energy, safety, etc. is provided by the rest of the country filled with people who only went to State school, or high school." pairagraph.com/dialogue/e…

👍︎ 621

👤︎ u/SpringSprung33

📅︎ Jan 07 2022

In the formula used to calculate the standard error, we use the square root of the sample size. Why is that? Why not use the sample size itself?

I just want to understand the formula and not memorize it just to solve questions.
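
The square root comes from how variances combine. A short derivation, assuming n independent observations X_1, ..., X_n that share a common variance σ² (assumptions stated for this note, not in the post):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\operatorname{SE}(\bar{X}) = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}
```

The sample size n divides the variance. The standard error is on the scale of a standard deviation, which is the square root of a variance, so n enters as its square root.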

👍︎ 9

👤︎ u/Nursestudent195

📅︎ Jan 01 2022

Is the standard error (SE) truly a standard deviation?

Based on an actual, fun debate I had last week:

Recall standard deviation is a measure of dispersion. Is the standard error (SE) truly a standard deviation? (There is only one correct answer and three false choices)

(same LI poll here https://www.linkedin.com/posts/bionicturtle_frm-activity-6887464252769206272-pCMY)

👍︎ 4

👤︎ u/davidharper2

📅︎ Jan 13 2022

Clustered around the radio in snowsuits, waiting with bated breath. You cheer, your mom groans.

👍︎ 474

👤︎ u/5_Frog_Margin

📅︎ Jan 08 2022

Some new 2022s, a red Liberty Walk GT-R, and a rather interesting Mario Standard Kart with an error from a Walmart holiday shipper! reddit.com/gallery/r55l1j

👍︎ 22

👤︎ u/WesternMaryland236

📅︎ Nov 29 2021

Remastered V-1999 MK.II is done! A bunch of people were asking for a bottom airlock to make wreck salvaging easier. Here you go! Plus, I added a bunch of cosmetics to make the sub more clustered, like a proper underwater coffin. And now there is an entirely new section: Living Quarters (with two TVs!) reddit.com/gallery/rvxvjg

👍︎ 236

👤︎ u/WORTOKUA

📅︎ Jan 04 2022

Just finished "Clustered: Extended Survey". I had to sit and do nothing for 60 real life minutes while the scans completed. That's flawed design.

Seriously. One scanner is 6 hours, all four scanners is 1 hour. So you deploy them all and just sit...for an entire 60 minutes, doing other things than the game. Or dick around and poke around in some caves out of curiosity for exotics. That's it.

So I sat here for an entire hour of not playing a game. To me that's deeply flawed game design, requiring you to just sit there and NOT play it. I get that the game wanted me to repair the shelters the antennas sit in from severe weather, but in the end that just ended up not really mattering. Build the four shelters for antennas, wait an entire hour.

I think it's time for me to step away from this game for a few months until changes are made. I just landed on a prospect and ...got right back in the pod and took off again because what's the point -- to unlock some more workshop items? Nah. I think I've seen enough and waited around / grinded enough. 130hours in, I've got a feel for it all.

I hope the devs see this and watch their declining player counts carefully.

👍︎ 63

👤︎ u/Eldrake

📅︎ Jan 11 2022

Hello, noob question here: when I slice in Chitubox I get an estimated time of 1-2 hours, but when I put the USB in the printer (Photon Mono with Anycubic standard grey resin) I get a much longer time. Why is that? Also I often get the error message shown in the third pic, but the 'door' reddit.com/gallery/r14tbx

👍︎ 5

👤︎ u/Etren88

📅︎ Nov 24 2021

Getting this error message on the instrument cluster of my 3-year-old Audi A4. Any ideas what could be the reason?

👍︎ 6

👤︎ u/Old_Celebration279

📅︎ Jan 11 2022

[Q] Determining the formula for standard error of coefficients in Poisson Regression?

I see that with logistic regression the standard error can be computed as in How to compute the standard errors of a logistic regression's coefficients, which amounted to taking

`[;\sqrt{(X^TVX)^{-1}};]` where V is a diagonal matrix whose diagonal entries are `[;\pi_a (1-\pi_a);]`, with `[;\pi_a;]` the probability of being in class A.

___

Looking at the same for linear regression (based on my understanding of Standard errors for multiple regression coefficients?), we can compute the standard error of the coefficients by

`[;\sqrt{\sigma^2(X^TX)^{-1}};]`

where `[;\sigma^2;]` is the variance of the residuals (as per my understanding of

___

From the above I have 2 questions:

1. It seems like from the above we are using more or less the same form (square root of the inverse of something). Am I on to something? How do we determine that "something"? In logistic regression it was V, a diagonal matrix, and in linear regression it is the variance of the residuals. It seems like we're encompassing a notion of "how wrong" our prediction is compared to some label.
2. How might I derive the same for a Poisson regression?

___

Normally I'd just use R or statsmodels, but I'm building a custom library for encrypted ML/stats and I need to build all of this from scratch
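
The common form the question is circling can be written down for generalized linear models with their usual (canonical) links. The block below is a sketch in notation introduced for this note: W is a diagonal weight matrix and μ_i is the fitted Poisson mean. It is the standard maximum-likelihood result, based on the inverse Fisher information, and it assumes the model is correctly specified.

```latex
\widehat{\operatorname{Cov}}(\hat\beta) = \bigl(X^{T} W X\bigr)^{-1},
\qquad
\operatorname{SE}(\hat\beta_j) = \sqrt{\Bigl[\bigl(X^{T} W X\bigr)^{-1}\Bigr]_{jj}}

\text{Linear (Gaussian):}\quad W = \tfrac{1}{\sigma^2} I
  \;\Rightarrow\; \widehat{\operatorname{Cov}}(\hat\beta) = \sigma^2 \bigl(X^{T} X\bigr)^{-1}

\text{Logistic:}\quad W = \operatorname{diag}\bigl(\hat\pi_i (1 - \hat\pi_i)\bigr)

\text{Poisson (log link):}\quad W = \operatorname{diag}(\hat\mu_i),
  \qquad \hat\mu_i = \exp\bigl(x_i^{T} \hat\beta\bigr)
```

In each case the diagonal entry of W is tied to the variance of the response at the fitted value, which is why the Poisson case uses the fitted mean itself: a Poisson variable's variance equals its mean.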

👍︎ 24

👤︎ u/iamquah

📅︎ Nov 14 2021
