I have a data set of about 2 million records and I need to run a Wilcoxon test on it. How do I decide the size of the sample that I should be running the test on?
If I run it on, say, a sample of size 1000, the test result seems fine, but as I increase the size of the sample the p-value tends to 0, which makes no sense to me.
Can someone tell me how to decide the optimum size of the sample to run the test on?
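For illustration, a minimal sketch of what happens to the p-value as the subsample grows, assuming a two-sample Wilcoxon rank-sum comparison via SciPy's mannwhitneyu (the data below are simulated stand-ins, not the real set):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Stand-ins for the real data: two groups with a tiny difference in location.
group_a = rng.normal(loc=0.00, scale=1.0, size=1_000_000)
group_b = rng.normal(loc=0.02, scale=1.0, size=1_000_000)

for n in (1_000, 10_000, 100_000, 1_000_000):
    stat, p = mannwhitneyu(group_a[:n], group_b[:n], alternative="two-sided")
    print(f"n = {n:>9}: p = {p:.3g}")

# The p-value shrinks toward 0 as n grows even though the underlying
# difference (0.02 SD) is practically negligible, which is why the usual
# advice is to report an effect size alongside the test rather than to
# hunt for an "optimal" subsample size.
```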
REQUIREMENTS: I am only going to hire people who will do at least 1 sample question with me (each question has multiple parts... about 4-5 parts per question).
Do not message me asking for sample questions and saying that you will get back to me later with the answer... clearly you are just going to Chegg or another website/source to find the answer.
Please contact me with your Discord if interested and I will reply within 1 business day.
Good afternoon, Reddit!
I'm going over testing a hypothesis in my Stats class and am attempting to re-create the "t-Test: Paired Two Samples for Means" Excel function, except with actual functions instead of re-running the Excel test through the Data>Data Analysis tool. Our class places a heavy emphasis on Excel and having the formulas auto-populate when I input data in the spreadsheet would be invaluable. Unfortunately, the formulas may be beyond my reach... have any of you done something similar to this before? I am specifically struggling with recreating the formulas for Degrees of Freedom and t-Stat when your two sets of data are not equal. This is what I have so far (you'll have to scroll down to the t-Test portion).
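If "not equal" means the two columns have different numbers of observations, the paired tool no longer applies and the unequal-variances (Welch) two-sample formulas are the ones to rebuild. Here is a rough sketch of those formulas, written in Python only so the algebra is explicit; each step maps onto Excel's AVERAGE, VAR.S and COUNT functions, and the data are placeholders:

```python
import numpy as np
from scipy import stats

x = np.array([10.1, 9.8, 10.4, 10.0, 9.7, 10.2])  # sample 1 (placeholder data)
y = np.array([9.5, 9.9, 9.6, 9.8, 9.4])           # sample 2 (placeholder data)

n1, n2 = len(x), len(y)                  # COUNT
m1, m2 = x.mean(), y.mean()              # AVERAGE
v1, v2 = x.var(ddof=1), y.var(ddof=1)    # VAR.S (sample variances)

se2 = v1 / n1 + v2 / n2
t_stat = (m1 - m2) / np.sqrt(se2)
# Welch-Satterthwaite degrees of freedom:
df = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, df, p_two_sided)
# Cross-check against the built-in Welch test:
print(stats.ttest_ind(x, y, equal_var=False))
```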
(Disclaimer: I am learning about hypothesis testing, so my questions are probably stupid and/or confusing, but that's because I am confused, so any light shed on them would be highly appreciated.)
Let's say I want to test a claim about a population, but I don't have a problem statement with the values of the mean, error, etc.; instead I have a data set. For example:
I work at a Campbell's factory and I have a database of 100M produced canned soups. I want to test a claim about the mean X of the population (say, the mean content of soup per can), given that I can't run my calculation on the whole data set but just on a sample (say, 100k). My questions are:
Independently of the previous answers, let's say I have tested a claim about the population where population = every canned soup produced since the beginning of time:
Bonus: if you know of a good learning resource that can help me understand this topic (or where I can find the answers to my questions), please let me know, as most lectures I have found go directly to the calculations given a clear statement with well-defined parameters, and that's not what I want to read about (at least for now).
Thanks!
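For what it's worth, a minimal sketch of the kind of calculation involved, assuming the claim concerns a mean content per can; the 330 ml figure and the simulated sample below are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
claimed_mean = 330.0                      # hypothetical claimed content (ml)
sample = rng.normal(331.0, 5.0, 100_000)  # stand-in for a random sample of 100k cans

# One-sample t-test of H0: population mean = claimed_mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=claimed_mean)
print(t_stat, p_value)
# The test only speaks for the population the sample was randomly drawn from;
# the full 100M-record database is never needed, only a genuinely random sample.
```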
Hi guys, I am trying to understand the intuition behind the derivation of the two-tailed minimum sample size proof.
Attached Image of derivation from the textbook, workings in pencil on the side. [But I have transposed it to text below in the bolded portion] https://imgur.com/a/JKnherC
In the one-tailed case the intuition is clear-cut: since only one side is constrained, to satisfy both alpha and beta the Z axis (under H0) and the Z' axis (under H1) must share a common critical value of X-bar,
such that
U + Zc(sigma/sqrt(n)) = U' + Z1'(sigma/sqrt(n))
Solving for n gives n = ((Z1' - Zc)^2 * sigma^2) / (U - U')^2
In the two-tailed case beta is constrained, but we also have to account for the area P(X > X2), which is required to find n. [In the one-tailed case this area is not excluded; it is part of beta.] We just need to find a value of n that satisfies alpha, beta and the unknown P(X > X2) component.
The initial portion is the same, where X1 corresponds to Zc and Z1', and X2 corresponds to Zd and Z2':
∴ U + Zc(sigma/sqrt(n)) = U' + Z1'(sigma/sqrt(n))
∴ U + Zd(sigma/sqrt(n)) = U' + Z2'(sigma/sqrt(n))
I got stuck as to how to proceed because of the P(X>X2) term
From my textbook, the next step is the assumption that,
for a symmetric distribution, 2U = X1 + X2.
∴ 2U = U' + U' + Z1'(sigma/sqrt(n)) + Z2'(sigma/sqrt(n))
Can any math pro explain the intuition: why does he bring up 2U = X1 + X2?
If the test statistic formula is Zc or Z1' = (X1 - U)/(sigma/sqrt(n)),
rearranging to make X the subject, wouldn't it be
X1 = U + Zc(sigma/sqrt(n))
X1 + X2 = U + Zc(sigma/sqrt(n)) + U + Zd(sigma/sqrt(n))
and NOT X1 + X2 = 2U.
an explanation of this step would be greatly appreciated. Thank You.
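Not from the textbook, just a cleaned-up restatement of the algebra above: in the two-tailed case the two critical values are placed symmetrically about U under H0, so Zd = -Zc and the (Zc + Zd) term in the rearrangement above vanishes, which is where 2U = X1 + X2 comes from. In LaTeX form:

```latex
% One-tailed case: the same cutoff \bar{x}_c must satisfy the size condition
% under H0 (mean U) and the power condition under H1 (mean U'):
\[
\bar{x}_c \;=\; U + Z_c\,\frac{\sigma}{\sqrt{n}} \;=\; U' + Z_1'\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n \;=\; \frac{(Z_1' - Z_c)^2\,\sigma^2}{(U - U')^2}
\]

% Two-tailed case: with cutoffs symmetric about U (i.e. Z_d = -Z_c),
\[
X_1 + X_2 \;=\; \Bigl(U + Z_c\,\tfrac{\sigma}{\sqrt{n}}\Bigr)
            + \Bigl(U + Z_d\,\tfrac{\sigma}{\sqrt{n}}\Bigr)
\;=\; 2U + (Z_c + Z_d)\,\tfrac{\sigma}{\sqrt{n}} \;=\; 2U
\]
```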
I would like to use a statistical test to test the hypotheses for my bachelor thesis, but I'm not sure which one to use. I have a sample size of 115 and would like to compare two data sets of the same size. These are dependent samples. The data are not normally distributed, and normality is a prerequisite for the t-test. However, the sample is large enough that the t-test could arguably be used because of the central limit theorem. I did both tests, and they give me different results.
Wilcoxon test: p-value=0.001927
T-test: p-value=0.1052
I am now unsure which test is more robust or delivers the more reliable result. In the literature the Wilcoxon test does not perform very well, but I find the result of the Wilcoxon test more plausible. Does one of the tests give me a more reliable result, or how can I find out which result is the right one?
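For what it's worth, a sketch of running both paired tests side by side; the data below are simulated, not the thesis data, and SciPy's ttest_rel and wilcoxon are assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
before = rng.exponential(scale=2.0, size=115)             # skewed, non-normal data
after = before + rng.exponential(scale=0.5, size=115) - 0.3

print(stats.ttest_rel(before, after))   # paired t-test (mean of the differences)
print(stats.wilcoxon(before, after))    # Wilcoxon signed-rank test (shift in location)

# With skewed differences the two tests answer slightly different questions
# (mean difference vs. a location shift of the differences), which is one
# reason their p-values can disagree on the same data.
```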
The DF formula from my lecture slides is ever so slightly different from what I'm seeing elsewhere, so I just wanted to know why this is. Is the formula that I've found online for a slightly different quantity?
The formula I have from my lecture slides has n1 & n2 as the very bottom denominator: https://imgur.com/gallery/k2YzfKK
Whereas the formulas that I have found online have n1-1 & n2-1 as the very bottom denominators: https://imgur.com/gallery/lQzQEWO
Sorry if this is a stupid question, but I appreciate the help!
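For reference, the degrees-of-freedom formula that usually appears online is the Welch-Satterthwaite approximation, which has n1 - 1 and n2 - 1 in the bottom denominators:

```latex
\[
\nu \;\approx\;
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^{2}}
     {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]
```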
My problem is: I have three poisons and 48 dead mice. I need to know which poison is the most efficient, i.e. which set of 16 mice has the smallest average living time after the poison has been given.
Furthermore, I know that the kind of poison given influences living time, because I performed a one-way ANOVA to find that out. I also know that the variances among the populations are equal (a basic ANOVA assumption), although I don't know their value.
So I have to perform three pairwise t-tests. If, for example, I start by comparing the first poison with the second, I'll set my hypotheses as follows:
H0 : the population mean of the lived time after poison 1 >= the population mean of ... poison 2;
H1 : the population mean of the lived time after poison 1 < the population mean of ... poison 2;
I get an observed t, I compare it with my critical value, and I decide, for example, not to reject H0. This means that the first poison is not more efficient than the second. OK. However, if I had instead set my hypotheses the other way round:
H0 : the population mean of the lived time after poison 2 >= the population mean of ... poison 1
H1 : the population mean of the lived time after poison 2 < the population mean of ... poison 1
I would have gotten the same observed t (just with the opposite sign) and the same critical value. However, this time not rejecting H0 would have meant that the second poison is not more efficient than the first. These two conclusions are incoherent. And if instead of failing to reject both times I had rejected H0 both times, it would have been even worse.
What am I missing? Is there any rule to set hypotheses?
Thank you all very much
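A sketch of the two mirrored one-sided comparisons above in code; the survival times are simulated placeholders, and SciPy >= 1.6 is assumed for the alternative= argument:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
poison1 = rng.gamma(shape=4.0, scale=1.0, size=16)   # survival times, group 1
poison2 = rng.gamma(shape=4.0, scale=1.2, size=16)   # survival times, group 2

# H1: mean survival under poison 1 < mean survival under poison 2
print(stats.ttest_ind(poison1, poison2, alternative="less"))
# H1: mean survival under poison 2 < mean survival under poison 1 (mirrored form)
print(stats.ttest_ind(poison2, poison1, alternative="less"))

# Failing to reject in both directions is not a contradiction: it only means
# neither one-sided difference is demonstrated at the chosen level, not that
# both null hypotheses are true.
```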
Hi,
I would like to know how to calculate the sample size for a hypothesis test for the difference between two means. If someone could let me know, I would really appreciate it.
Thanks,
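For the simplest textbook case (two independent groups of equal size, a common known standard deviation σ, two-sided level α, and power 1 - β to detect a true difference Δ), the usual formula is:

```latex
\[
n \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
\qquad \text{(per group)}
\]
```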
I have survey data which contain respondents' answers to several questions. As the survey contained a disproportionate number of people from certain demographic groups, the survey results are weighted by race, sex and age.
I have responses to the same questions for two years (eg. 2016 and 2017), and am trying to find out if the proportion of people who responded "yes" to a particular question has fallen. That is, I have calculated the weighted agreement rate to the question in 2016 (p1) and the weighted agreement rate to the question in 2017 (p2), and am trying to see if p1 - p2 = 0.
I think I have a good idea of how to perform the simple hypothesis test for a difference between two proportions (described at https://onlinecourses.science.psu.edu/stat414/node/268). However, I do not have a formal background in statistics (only some experience from introductory college courses and econometrics). Thus, I am wondering:
Whether weighting the samples by demographic variables has changed the standard error, so that a more complicated hypothesis test formula is required. If so, what is this formula?
Whether there are other methods, besides applying this possibly more complicated formula, to rigorously test for a difference between the two proportions. For instance, are there non-parametric hypothesis tests that could be used?
Thanks for your help!
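One rough-and-ready option, sketched below with placeholder data: weighting typically inflates the variance, and a simple adjustment is to replace each year's raw n with Kish's effective sample size, n_eff = (Σw)² / Σw², before plugging into the usual two-proportion z-test. Dedicated survey packages that model the design directly are the more rigorous route.

```python
import numpy as np
from scipy import stats

def weighted_prop_test(y1, w1, y2, w2):
    """Two-sided z-test for a difference in weighted proportions,
    using Kish's effective sample size as a rough design-effect fix."""
    p1 = np.sum(w1 * y1) / np.sum(w1)            # weighted agreement rate, year 1
    p2 = np.sum(w2 * y2) / np.sum(w2)            # weighted agreement rate, year 2
    n1_eff = np.sum(w1) ** 2 / np.sum(w1 ** 2)   # effective sample sizes
    n2_eff = np.sum(w2) ** 2 / np.sum(w2 ** 2)
    se = np.sqrt(p1 * (1 - p1) / n1_eff + p2 * (1 - p2) / n2_eff)
    z = (p1 - p2) / se
    return z, 2 * stats.norm.sf(abs(z))

# Placeholder data: 0/1 answers and survey weights for each year.
rng = np.random.default_rng(4)
y2016, w2016 = rng.integers(0, 2, 800), rng.uniform(0.5, 2.0, 800)
y2017, w2017 = rng.integers(0, 2, 900), rng.uniform(0.5, 2.0, 900)
print(weighted_prop_test(y2016, w2016, y2017, w2017))
```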
Let's say a study is to be done to check the effect of a drug on a variable (say BP, i.e. blood pressure) which is high in patients. The purpose is to show that the drug helps reduce the BP levels in patients.
The study was done on, say, 100 patients and their BP levels before the treatment were noted (these values are higher than normal because of their sickness). Then their BP levels are noted after the treatment with the drug. From this data I can get their mean and standard deviation before the treatment and after the treatment.
How do I approach this problem to show that there is at least a 50% decrease in BP levels after taking the drug (in the actual data the mean decreases by more than 60%)? What should my null hypothesis be, and what tests should I use to test it? And before all that: if such a study has to be designed, how should I select the sample size for a 95% confidence level and 20% relative precision, given that I want to show a decrease of 50% or more after the treatment?
It would also be helpful if you could suggest some material to help me understand this, or even better, provide a link to it.
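One possible framing, sketched with simulated numbers rather than the real study data: compute each patient's percentage reduction and test H0: mean reduction ≤ 50% against H1: mean reduction > 50% with a one-sided one-sample t-test (SciPy >= 1.6 assumed for the alternative= argument):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(160, 15, 100)             # simulated pre-treatment BP
after = before * rng.normal(0.38, 0.08, 100)  # simulated post-treatment BP (~62% lower)

pct_reduction = 100 * (before - after) / before   # per-patient % decrease
# H0: mean reduction <= 50%   vs   H1: mean reduction > 50%
t_stat, p_value = stats.ttest_1samp(pct_reduction, popmean=50, alternative="greater")
print(t_stat, p_value)
```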
Suppose I have n people who compete in m competitions in which they are awarded points n-1 to 0 and I calculate the average score for each person. Can I do a hypothesis test to see if a given person is statistically better than their competitors? How many people and competitions do I need to satisfy the central limit theorem?
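One distribution-free way to sidestep the CLT question is a permutation-style test: under the null that the person is interchangeable with their competitors, reassign the scores within each competition and see how often a random competitor's average matches or beats the observed average. A rough sketch with made-up scores:

```python
import numpy as np

rng = np.random.default_rng(6)
n_people, n_comps = 8, 20

# scores[i, j] = points of person i in competition j; each competition awards
# the distinct values n_people-1 ... 0 (placeholder random rankings here).
scores = np.array([rng.permutation(n_people) for _ in range(n_comps)]).T

person = 0
observed = scores[person].mean()

# Null: the person is interchangeable with the competitors within each
# competition, so their score in competition j is a uniform draw from column j.
n_sims = 10_000
sims = np.array([
    np.mean([scores[rng.integers(n_people), j] for j in range(n_comps)])
    for _ in range(n_sims)
])
p_value = np.mean(sims >= observed)   # one-sided: "better" means higher average
print(observed, p_value)
```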
Good Day everyone,
I am having a hard time learning Reading 5 to 7, subject names in the title.
Any recommendations on how to learn this material and make it stick? I would appreciate any external or supplemental material that has been useful for you or others.
P.S: I am using Kaplan currently as the study material (2nd time taking the Level 1 CFA exam)
Consider the scenario where the performance of two models (A and B) is measured on the same cross-validation folds. To evaluate the hypothesis that the average performance of model A is better than the average performance of model B, what type of hypothesis test should you use?
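Since both models are scored on the same folds, a paired test on the per-fold differences is the natural candidate; a rough sketch with made-up fold scores (SciPy >= 1.6 assumed):

```python
import numpy as np
from scipy import stats

# Per-fold scores for the same k folds (placeholder numbers).
model_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
model_b = np.array([0.78, 0.80, 0.81, 0.77, 0.80, 0.76, 0.82, 0.79, 0.78, 0.77])

# One-sided paired t-test: H1 is that model A's mean score is higher.
t_stat, p_value = stats.ttest_rel(model_a, model_b, alternative="greater")
print(t_stat, p_value)

# Caveat: CV folds share training data, so the independence assumption is only
# approximate; corrected variants (e.g. the Nadeau-Bengio corrected resampled
# t-test) are often preferred in practice.
```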
For a marketing analytics project, I have been using Student's t-tests to compare samples (~5-10m observations) to the population (~35m observations) by comparing means of continuous variables (revenue, basket size, etc.). I am doing this to make sure the samples aren't biased vis-a-vis the whole population.
I would like to understand how to apply hypothesis testing techniques to compare my sample to the population along categorical variables. E.g., does the sample have a similar distribution across states/zipcodes as the whole population? Is the sample distributed by gender similarly to the population?
I am not sure which hypothesis test to choose here, or how to apply it to comparisons across categorical variables. I am doing the analysis in R, so any code snippets would be helpful as well. Also, I would like some validation on whether I am using the t-test correctly; I would normally use it to compare two distinct populations, but in this case I am using it to compare a sample and its superset. Thanks!
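For categorical variables, a chi-square goodness-of-fit test of the sample's category counts against the population's category proportions is a standard choice (in R this is chisq.test with the p= argument). A sketch of the same idea in Python, with made-up region counts:

```python
import numpy as np
from scipy import stats

# Population distribution across (say) four regions, from the full 35m records.
pop_counts = np.array([12_000_000, 9_000_000, 8_000_000, 6_000_000])
pop_props = pop_counts / pop_counts.sum()

# Category counts in the sample being checked (placeholder numbers).
sample_counts = np.array([1_710_000, 1_290_000, 1_150_000, 850_000])

chi2, p_value = stats.chisquare(f_obs=sample_counts,
                                f_exp=sample_counts.sum() * pop_props)
print(chi2, p_value)

# Caveat: with samples this large, even tiny imbalances come out "significant";
# comparing observed vs. expected proportions directly (an effect size) is
# usually more informative than the p-value alone.
```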
So in my textbook, the problem presented is:
In each of the following problems, the sample sizes and population proportions are given. Find the mean, variance, and standard deviation of the estimator P-hat1 - P-hat2, and compute each probability.
Problem | n1 | n2 | p1 | p2 | P() |
---|---|---|---|---|---|
A | 645 | 650 | 0.24 | 0.26 | P(P-hat1 - P-hat2 >= 0.045) |
B | 250 | 270 | 0.37 | 0.33 | P(P-hat1 - P-hat2 <= -0.04) |
C | 144 | 156 | 0.87 | 0.86 | P(-0.05 < P-hat1 - P-hat2 < 0.05) |
This problem has the answers in the back, so I tried to use them to make sure I was doing it right:
Problem | Mean | Variance | Standard Dev | Probability |
---|---|---|---|---|
A | -0.020 | 0.000579 | 0.0241 | |
B | 0.040 | 0.001751 | 0.0418 | |
C | 0.010 | 0.001557 | 0.0395 | |
The mean, variance, and standard deviation were pretty simple for me to figure out, but the probability part does not make sense to me at all. I have never come close to figuring out how they get the numbers they get. I'm feeling really frustrated because nothing I'm trying from the book is working, and this is my last assignment for the semester. If you need any more info from me, let me know. I would just like to know how to do this.
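In case it helps, a sketch of the standard normal-approximation route for part A, reusing the mean and standard deviation already computed above: standardize the cutoff and take the tail area.

```python
from scipy import stats

# Part A: n1 = 645, n2 = 650, p1 = 0.24, p2 = 0.26.
mean = 0.24 - 0.26                           # E[P-hat1 - P-hat2] = -0.020
var = 0.24 * 0.76 / 645 + 0.26 * 0.74 / 650  # = 0.000579
sd = var ** 0.5                              # = 0.0241

z = (0.045 - mean) / sd                      # standardize the cutoff 0.045
p = stats.norm.sf(z)                         # P(P-hat1 - P-hat2 >= 0.045)
print(z, p)                                  # z ~ 2.70, p ~ 0.0035
```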
Part (a) of a question required that I test a hypothesis using the critical region method, but it gave me all the relevant data and the sample size was 5738. Part (b) asks me to answer the same questions as if there were only 25. Is it even possible to do the critical region method with only 25 samples? It really only asks: if the two-tailed p-value (the original test was left-tailed) were 0.062, would I still reject the original null hypothesis from part (a) at the 5% level of significance?
Hello, all. I'd appreciate any insight/pointers/assistance at all. I'm not quite sure of my hypotheses, and beyond that, I'm not sure how to interpret my results.
The question:
"You are being brought in as an expert witness in a class-action lawsuit - Tierney v. True Car Parts. Using your engineering background you are being asked to provide an argument as to the liability or not of True Car Parts for the design and production of their shafts used in fuel pumps currently in many automobiles from multiple manufacturers.
Shaft wear in excess of 3.50 microns could lead to catastrophic failures of a certain model fuel pump in extreme weather conditions. Engineers for the manufacturer of the shafts claim that the shaft wear is within acceptable limits. Lawyers representing a class action legal suit filed against the company feel that recent vehicle failures for vehicles with this shaft are due to faulty bearings causing abnormal wear and, thus, feel that the company should pay for the necessary vehicle repair and parts replacement.
The amount of shaft wear (in microns) after a simulated mileage of 250,000 miles was determined for each of n = 45 fuel pumps having copper lead as a bearing material, resulting in x̄ = 2.73 and s = 1.25. Use the appropriate hypothesis test at level .01 to determine if the shaft wear is within acceptable limits. Please state any assumption you have made, if necessary."
My attempt:
Assume: random sampling, approximate normality of the sample mean (n > 30), and independence (n ≤ 10% of the parent population).
Assign α = 0.01, n = 45, x̄ = 2.73, µ_0 = 3.50, s = 1.25.
H_0: µ ≤ 3.50
H_a: µ > 3.50
Perform a one-sample t-test: t = (x̄ - µ_0) / (s/sqrt(n)) = (2.73 - 3.50) / (1.25/sqrt(45)) = -4.132 (Is a negative number valid?)
Corresponding p-value (using a t-table from my textbook) p = .0005
p < α, thus, we reject H_0 and accept H_a.
My issue is this: because the sample mean (2.73) is so much smaller than 3.5 (4.132 standard deviations smaller, right?), why do my results tell me to reject my null hypothesis?
What am I missing, Reddit?
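A sketch of the same calculation in code, just reproducing the numbers above; the point it illustrates is that for the alternative µ > 3.50 the p-value is the upper-tail probability, which is where the negative t statistic changes the conclusion:

```python
import numpy as np
from scipy import stats

n, xbar, mu0, s = 45, 2.73, 3.50, 1.25
t_stat = (xbar - mu0) / (s / np.sqrt(n))   # = -4.132
df = n - 1

p_upper = stats.t.sf(t_stat, df)   # p-value for H_a: mu > 3.50
p_lower = stats.t.cdf(t_stat, df)  # p-value for H_a: mu < 3.50
print(t_stat, p_upper, p_lower)

# With H_a: mu > 3.50 the upper-tail p-value is close to 1, so H_0 is not
# rejected at alpha = 0.01; the tiny lower-tail area (on the order of 1e-4)
# is evidence that wear is *below* 3.50, which is what the negative t
# statistic is pointing at.
```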
Hey all,
I was wondering what sample size is large enough for a bootstrap hypothesis test.
Say, for example, I'm trying to test whether the means of two independent populations are equal or not. The two populations are not normally distributed, or I don't want to assume that they are, so standard parametric tests cannot be used.
How do I determine whether my initial sample size is large enough to perform a hypothesis test using a bootstrap approach? Or, after the bootstrap test has been done, how can I tell whether the initial sample size was large enough? And how is the probability of making a Type II error calculated in this case?
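A minimal sketch of one common bootstrap test for a difference in means (recentre both samples on the pooled mean so the null holds, then resample); the data below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=60)  # placeholder sample 1
y = rng.lognormal(mean=0.2, sigma=1.0, size=75)  # placeholder sample 2

observed = x.mean() - y.mean()

# Impose H0 (equal means) by recentring both samples on the pooled mean.
pooled = np.concatenate([x, y]).mean()
x0, y0 = x - x.mean() + pooled, y - y.mean() + pooled

n_boot = 10_000
diffs = np.empty(n_boot)
for b in range(n_boot):
    xb = rng.choice(x0, size=len(x0), replace=True)
    yb = rng.choice(y0, size=len(y0), replace=True)
    diffs[b] = xb.mean() - yb.mean()

p_value = np.mean(np.abs(diffs) >= abs(observed))  # two-sided bootstrap p-value
print(observed, p_value)

# Power / Type II error can be estimated the same way by simulation: generate
# data under a specific alternative, run the whole test many times, and count
# how often it fails to reject.
```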