I have a data set of about 2 million records and I need to run a Wilcoxon test on it. How do I decide the size of the sample that I should be running the test on?
If I run it on, say, a sample of size 1000, the test result seems fine, but as I increase the size of the sample the p-value tends to 0, which makes no sense to me.
Can someone tell me how to decide the optimum size of the sample to run the test on?
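For illustration, a minimal sketch of what happens to the p-value as the subsample grows, assuming a two-sample Wilcoxon rank-sum comparison via SciPy's mannwhitneyu (the data below are simulated stand-ins, not the real set):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Stand-ins for the real data: two groups with a tiny difference in location.
group_a = rng.normal(loc=0.00, scale=1.0, size=1_000_000)
group_b = rng.normal(loc=0.02, scale=1.0, size=1_000_000)

for n in (1_000, 10_000, 100_000, 1_000_000):
    stat, p = mannwhitneyu(group_a[:n], group_b[:n], alternative="two-sided")
    print(f"n = {n:>9}: p = {p:.3g}")

# The p-value shrinks toward 0 as n grows even though the underlying
# difference (0.02 SD) is practically negligible, which is why the usual
# advice is to report an effect size alongside the test rather than to
# hunt for an "optimal" subsample size.
```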
REQUIREMENTS: I am only going to hire people who will do at least 1 sample question with me (each question has multiple parts... about 4-5 parts per question).
Do not message me asking for sample questions and saying that you will get back to me later with the answer... clearly you are just going to Chegg or another website/source to find the answer.
Please contact me with your Discord if interested and I will reply within 1 business day.
Good afternoon, Reddit!
I'm going over testing a hypothesis in my Stats class and am attempting to re-create the "t-Test: Paired Two Samples for Means" Excel function, except with actual functions instead of re-running the Excel test through the Data>Data Analysis tool. Our class places a heavy emphasis on Excel and having the formulas auto-populate when I input data in the spreadsheet would be invaluable. Unfortunately, the formulas may be beyond my reach... have any of you done something similar to this before? I am specifically struggling with recreating the formulas for Degrees of Freedom and t-Stat when your two sets of data are not equal. This is what I have so far (you'll have to scroll down to the t-Test portion).
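If "not equal" means the two columns have different numbers of observations, the paired tool no longer applies and the unequal-variances (Welch) two-sample formulas are the ones to rebuild. Here is a rough sketch of those formulas, written in Python only so the algebra is explicit; each step maps onto Excel's AVERAGE, VAR.S and COUNT functions, and the data are placeholders:

```python
import numpy as np
from scipy import stats

x = np.array([10.1, 9.8, 10.4, 10.0, 9.7, 10.2])  # sample 1 (placeholder data)
y = np.array([9.5, 9.9, 9.6, 9.8, 9.4])           # sample 2 (placeholder data)

n1, n2 = len(x), len(y)                  # COUNT
m1, m2 = x.mean(), y.mean()              # AVERAGE
v1, v2 = x.var(ddof=1), y.var(ddof=1)    # VAR.S (sample variances)

se2 = v1 / n1 + v2 / n2
t_stat = (m1 - m2) / np.sqrt(se2)
# Welch-Satterthwaite degrees of freedom:
df = se2**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, df, p_two_sided)
# Cross-check against the built-in Welch test:
print(stats.ttest_ind(x, y, equal_var=False))
```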
(Disclaimer: I am learning about hypothesis testing, so my questions are probably stupid and/or confusing, but that's because I am confused, so any light shed on them would be highly appreciated.)
Let's say I want to test a claim about a population, but I don't have a problem statement with the values of the mean, error, etc.; instead I have a data set. For example:
I work at a Campbell's factory and I have a database of 100M produced canned soups. I want to test a claim about the mean X of the population (say, the mean content of soup per can), given that I can't run my calculation on the whole data set but just on a sample (say, 100k). My questions are:
Independently of the previous answers, let's say I have tested a claim about the population where population = every canned soup produced since the beginning of time:
Bonus: if you know of a good learning resource that can help me understand this topic (or where I can find the answers to my questions), please let me know, as most lectures I have found go directly to the calculations given a clear statement with well-defined parameters, and that's not what I want to read about (at least for now).
Thanks!
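For what it's worth, a minimal sketch of the kind of calculation involved, assuming the claim concerns a mean content per can; the 330 ml figure and the simulated sample below are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
claimed_mean = 330.0                      # hypothetical claimed content (ml)
sample = rng.normal(331.0, 5.0, 100_000)  # stand-in for a random sample of 100k cans

# One-sample t-test of H0: population mean = claimed_mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=claimed_mean)
print(t_stat, p_value)
# The test only speaks for the population the sample was randomly drawn from;
# the full 100M-record database is never needed, only a genuinely random sample.
```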
Hi guys, I am trying to understand the intuition behind the derivation of the two-tailed minimum sample size proof.
Attached Image of derivation from the textbook, workings in pencil on the side. [But I have transposed it to text below in the bolded portion] https://imgur.com/a/JKnherC
In the one-tailed case the intuition is clear-cut: since only one side is constrained, to satisfy both alpha and beta the Z axis (under H0) and the Z' axis (under H1) must share a common critical value of X-bar,
such that
U + Zc(sigma/sqrt(n)) = U' + Z1'(sigma/sqrt(n))
Solving for n gives n = ((Z1' - Zc)^2 * sigma^2) / (U - U')^2
In the two-tailed case beta is constrained, but we also have to account for the area P(X > X2), which is required to find n. [In the one-tailed case this area is not excluded; it is part of beta.] We just need to find a value of n that satisfies alpha, beta and the unknown P(X > X2) component.
The initial portion is the same, where X1 corresponds to Zc and Z1', and X2 corresponds to Zd and Z2':
∴ U + Zc(sigma/sqrt(n)) = U' + Z1'(sigma/sqrt(n))
∴ U + Zd(sigma/sqrt(n)) = U' + Z2'(sigma/sqrt(n))
I got stuck as to how to proceed because of the P(X>X2) term
From my textbook, the next step is the assumption that,
for a symmetric distribution, 2U = X1 + X2.
∴ 2U = U' + U' + Z1'(sigma/sqrt(n)) + Z2'(sigma/sqrt(n))
Can any math pro explain the intuition: why does he bring up 2U = X1 + X2?
If the test statistic formula is Zc or Z1' = (X1 - U)/(sigma/sqrt(n)),
rearranging to make X the subject, wouldn't it be
X1 = U + Zc(sigma/sqrt(n))
X1 + X2 = U + Zc(sigma/sqrt(n)) + U + Zd(sigma/sqrt(n))
and NOT X1 + X2 = 2U.
an explanation of this step would be greatly appreciated. Thank You.
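Not from the textbook, just a cleaned-up restatement of the algebra above: in the two-tailed case the two critical values are placed symmetrically about U under H0, so Zd = -Zc and the (Zc + Zd) term in the rearrangement above vanishes, which is where 2U = X1 + X2 comes from. In LaTeX form:

```latex
% One-tailed case: the same cutoff \bar{x}_c must satisfy the size condition
% under H0 (mean U) and the power condition under H1 (mean U'):
\[
\bar{x}_c \;=\; U + Z_c\,\frac{\sigma}{\sqrt{n}} \;=\; U' + Z_1'\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n \;=\; \frac{(Z_1' - Z_c)^2\,\sigma^2}{(U - U')^2}
\]

% Two-tailed case: with cutoffs symmetric about U (i.e. Z_d = -Z_c),
\[
X_1 + X_2 \;=\; \Bigl(U + Z_c\,\tfrac{\sigma}{\sqrt{n}}\Bigr)
            + \Bigl(U + Z_d\,\tfrac{\sigma}{\sqrt{n}}\Bigr)
\;=\; 2U + (Z_c + Z_d)\,\tfrac{\sigma}{\sqrt{n}} \;=\; 2U
\]
```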
I would like to use a statistical test to test the hypotheses for my bachelor thesis, but I'm not sure which one to use. I have a sample size of 115 and would like to compare two data sets of the same size. These are dependent samples. The data are not normally distributed, and normality is a prerequisite for the t-test. However, the sample is large enough that the t-test could arguably be used because of the central limit theorem. I did both tests, and they give me different results.
Wilcoxon test: p-value=0.001927
T-test: p-value=0.1052
I am now unsure which test is more robust or delivers the more reliable result. In the literature the Wilcoxon test does not perform very well, but I find the result of the Wilcoxon test more plausible. Does one of the tests give me a more reliable result, or how can I find out which result is the right one?
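For what it's worth, a sketch of running both paired tests side by side; the data below are simulated, not the thesis data, and SciPy's ttest_rel and wilcoxon are assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
before = rng.exponential(scale=2.0, size=115)             # skewed, non-normal data
after = before + rng.exponential(scale=0.5, size=115) - 0.3

print(stats.ttest_rel(before, after))   # paired t-test (mean of the differences)
print(stats.wilcoxon(before, after))    # Wilcoxon signed-rank test (shift in location)

# With skewed differences the two tests answer slightly different questions
# (mean difference vs. a location shift of the differences), which is one
# reason their p-values can disagree on the same data.
```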
The DF formula from my lecture slides is ever so slightly different from what I'm seeing elsewhere, so I just wanted to know why this is. Is the formula that I've found online for a slightly different quantity?
The formula I have from my lecture slides has n1 & n2 as the very bottom denominator: https://imgur.com/gallery/k2YzfKK
Whereas the formulas that I have found online have n1-1 & n2-1 as the very bottom denominators: https://imgur.com/gallery/lQzQEWO
Sorry if this is a stupid question, but I appreciate the help!
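For reference, the degrees-of-freedom formula that usually appears online is the Welch-Satterthwaite approximation, which has n1 - 1 and n2 - 1 in the bottom denominators:

```latex
\[
\nu \;\approx\;
\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^{2}}
     {\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]
```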
My problem is: I have three poisons and 48 dead mice. I need to know which poison is the most efficient, i.e. which set of 16 mice has the smallest average living time after the poison has been given.
Furthermore, I know that the kind of poison given influences living time, because I performed a one-way ANOVA to find that out. I also know that the variances among the populations are equal (a basic ANOVA assumption), although I don't know their value.
So I have to perform three pairwise t-tests. If, for example, I start by comparing the first poison with the second, I'll set my hypotheses as follows:
H0 : the population mean of the lived time after poison 1 >= the population mean of ... poison 2;
H1 : the population mean of the lived time after poison 1 < the population mean of ... poison 2;
I get an observed t, I compare it with my critical value, and I decide, for example, not to reject H0. This means that the first poison is not more efficient than the second. OK. However, if I had instead set my hypotheses the other way round:
H0 : the population mean of the lived time after poison 2 >= the population mean of ... poison 1
H1 : the population mean of the lived time after poison 2 < the population mean of ... poison 1
I would have gotten the same observed t (just with the opposite sign) and the same critical value. However, this time not rejecting H0 would have meant that the second poison is not more efficient than the first. These two conclusions are incoherent. And if instead of failing to reject both times I had rejected H0 both times, it would have been even worse.
What am I missing? Is there any rule to set hypotheses?
Thank you all very much
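A sketch of the two mirrored one-sided comparisons above in code; the survival times are simulated placeholders, and SciPy >= 1.6 is assumed for the alternative= argument:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
poison1 = rng.gamma(shape=4.0, scale=1.0, size=16)   # survival times, group 1
poison2 = rng.gamma(shape=4.0, scale=1.2, size=16)   # survival times, group 2

# H1: mean survival under poison 1 < mean survival under poison 2
print(stats.ttest_ind(poison1, poison2, alternative="less"))
# H1: mean survival under poison 2 < mean survival under poison 1 (mirrored form)
print(stats.ttest_ind(poison2, poison1, alternative="less"))

# Failing to reject in both directions is not a contradiction: it only means
# neither one-sided difference is demonstrated at the chosen level, not that
# both null hypotheses are true.
```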
Hi,
I would like to know how to calculate the sample size for a hypothesis test for the difference between two means. If someone could let me know, I would really appreciate it.
Thanks,
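For the simplest textbook case (two independent groups of equal size, a common known standard deviation σ, two-sided level α, and power 1 - β to detect a true difference Δ), the usual formula is:

```latex
\[
n \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
\qquad \text{(per group)}
\]
```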
I have survey data which contain respondents' answers to several questions. As the survey contained a disproportionate number of people from certain demographic groups, the survey results are weighted by race, sex and age.
I have responses to the same questions for two years (eg. 2016 and 2017), and am trying to find out if the proportion of people who responded "yes" to a particular question has fallen. That is, I have calculated the weighted agreement rate to the question in 2016 (p1) and the weighted agreement rate to the question in 2017 (p2), and am trying to see if p1 - p2 = 0.
I think I have a good idea of how to perform the simple hypothesis test for a difference between two proportions (described at https://onlinecourses.science.psu.edu/stat414/node/268). However, I do not have a formal background in statistics (only some experience from introductory college courses and econometrics). Thus, I am wondering:
Whether weighting the samples by demographic variables has changed the standard error, so that a more complicated hypothesis test formula is required. If so, what is this formula?
Whether there are other methods, besides applying this possibly more complicated formula, to rigorously test for a difference between the two proportions. For instance, are there non-parametric hypothesis tests that could be used?
Thanks for your help!
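One rough-and-ready option, sketched below with placeholder data: weighting typically inflates the variance, and a simple adjustment is to replace each year's raw n with Kish's effective sample size, n_eff = (Σw)² / Σw², before plugging into the usual two-proportion z-test. Dedicated survey packages that model the design directly are the more rigorous route.

```python
import numpy as np
from scipy import stats

def weighted_prop_test(y1, w1, y2, w2):
    """Two-sided z-test for a difference in weighted proportions,
    using Kish's effective sample size as a rough design-effect fix."""
    p1 = np.sum(w1 * y1) / np.sum(w1)            # weighted agreement rate, year 1
    p2 = np.sum(w2 * y2) / np.sum(w2)            # weighted agreement rate, year 2
    n1_eff = np.sum(w1) ** 2 / np.sum(w1 ** 2)   # effective sample sizes
    n2_eff = np.sum(w2) ** 2 / np.sum(w2 ** 2)
    se = np.sqrt(p1 * (1 - p1) / n1_eff + p2 * (1 - p2) / n2_eff)
    z = (p1 - p2) / se
    return z, 2 * stats.norm.sf(abs(z))

# Placeholder data: 0/1 answers and survey weights for each year.
rng = np.random.default_rng(4)
y2016, w2016 = rng.integers(0, 2, 800), rng.uniform(0.5, 2.0, 800)
y2017, w2017 = rng.integers(0, 2, 900), rng.uniform(0.5, 2.0, 900)
print(weighted_prop_test(y2016, w2016, y2017, w2017))
```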
Let's say a study is to be done to check the effect of a drug on a variable (say BP, i.e. blood pressure) which is high in patients. The purpose is to show that the drug helps reduce the BP levels in patients.
The study was done on, say, 100 patients and their BP levels before the treatment were noted (these values are higher than normal because of their sickness). Then their BP levels are noted after the treatment with the drug. From this data I can get their mean and standard deviation before the treatment and after the treatment.
How do I approach this problem to show that there is at least a 50% decrease in BP levels after taking the drug (in the actual data the mean decreases by more than 60%)? What should my null hypothesis be, and what tests should I use to test it? And before all that: if such a study has to be designed, how should I select the sample size for a 95% confidence level and 20% relative precision, given that I want to show a decrease of 50% or more after the treatment?
It would also be helpful if you could suggest some material to help me understand this, or even better, provide a link to it.
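One possible framing, sketched with simulated numbers rather than the real study data: compute each patient's percentage reduction and test H0: mean reduction ≤ 50% against H1: mean reduction > 50% with a one-sided one-sample t-test (SciPy >= 1.6 assumed for the alternative= argument):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(160, 15, 100)             # simulated pre-treatment BP
after = before * rng.normal(0.38, 0.08, 100)  # simulated post-treatment BP (~62% lower)

pct_reduction = 100 * (before - after) / before   # per-patient % decrease
# H0: mean reduction <= 50%   vs   H1: mean reduction > 50%
t_stat, p_value = stats.ttest_1samp(pct_reduction, popmean=50, alternative="greater")
print(t_stat, p_value)
```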
Suppose I have n people who compete in m competitions in which they are awarded points n-1 to 0 and I calculate the average score for each person. Can I do a hypothesis test to see if a given person is statistically better than their competitors? How many people and competitions do I need to satisfy the central limit theorem?
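One distribution-free way to sidestep the CLT question is a permutation-style test: under the null that the person is interchangeable with their competitors, reassign the scores within each competition and see how often a random competitor's average matches or beats the observed average. A rough sketch with made-up scores:

```python
import numpy as np

rng = np.random.default_rng(6)
n_people, n_comps = 8, 20

# scores[i, j] = points of person i in competition j; each competition awards
# the distinct values n_people-1 ... 0 (placeholder random rankings here).
scores = np.array([rng.permutation(n_people) for _ in range(n_comps)]).T

person = 0
observed = scores[person].mean()

# Null: the person is interchangeable with the competitors within each
# competition, so their score in competition j is a uniform draw from column j.
n_sims = 10_000
sims = np.array([
    np.mean([scores[rng.integers(n_people), j] for j in range(n_comps)])
    for _ in range(n_sims)
])
p_value = np.mean(sims >= observed)   # one-sided: "better" means higher average
print(observed, p_value)
```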
Good Day everyone,
I am having a hard time learning Reading 5 to 7, subject names in the title.
Any recommendations on how to learn this material and make it stick? I would appreciate any external or supplemental material that has been useful for you or others.
P.S: I am using Kaplan currently as the study material (2nd time taking the Level 1 CFA exam)
Consider the scenario where the performance of two models (A and B) is measured on the same cross-validation folds. To evaluate the hypothesis that the average performance of model A is better than the average performance of model B, what type of hypothesis test should you use?
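Since both models are scored on the same folds, a paired test on the per-fold differences is the natural candidate; a rough sketch with made-up fold scores (SciPy >= 1.6 assumed):

```python
import numpy as np
from scipy import stats

# Per-fold scores for the same k folds (placeholder numbers).
model_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
model_b = np.array([0.78, 0.80, 0.81, 0.77, 0.80, 0.76, 0.82, 0.79, 0.78, 0.77])

# One-sided paired t-test: H1 is that model A's mean score is higher.
t_stat, p_value = stats.ttest_rel(model_a, model_b, alternative="greater")
print(t_stat, p_value)

# Caveat: CV folds share training data, so the independence assumption is only
# approximate; corrected variants (e.g. the Nadeau-Bengio corrected resampled
# t-test) are often preferred in practice.
```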
For a marketing analytics project, I have been using Student's t-tests to compare samples (~5-10m observations) to the population (~35m observations) by comparing means of continuous variables (revenue, basket size, etc.). I am doing this to make sure the samples aren't biased vis-a-vis the whole population.
I would like to understand how to apply hypothesis testing techniques to compare my sample to the population along categorical variables. E.g., does the sample have a similar distribution across states/zipcodes as the whole population? Is the sample distributed by gender similarly to the population?
I am not sure which hypothesis test to choose here, or how to apply it to comparisons across categorical variables. I am doing the analysis in R, so any code snippets would be helpful as well. Also, I would like some validation on whether I am using the t-test correctly; I would normally use it to compare two distinct populations, but in this case I am using it to compare a sample and its superset. Thanks!
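For categorical variables, a chi-square goodness-of-fit test of the sample's category counts against the population's category proportions is a standard choice (in R this is chisq.test with the p= argument). A sketch of the same idea in Python, with made-up region counts:

```python
import numpy as np
from scipy import stats

# Population distribution across (say) four regions, from the full 35m records.
pop_counts = np.array([12_000_000, 9_000_000, 8_000_000, 6_000_000])
pop_props = pop_counts / pop_counts.sum()

# Category counts in the sample being checked (placeholder numbers).
sample_counts = np.array([1_710_000, 1_290_000, 1_150_000, 850_000])

chi2, p_value = stats.chisquare(f_obs=sample_counts,
                                f_exp=sample_counts.sum() * pop_props)
print(chi2, p_value)

# Caveat: with samples this large, even tiny imbalances come out "significant";
# comparing observed vs. expected proportions directly (an effect size) is
# usually more informative than the p-value alone.
```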
So in my textbook, the problem presented is:
In each of the following problems, the sample sizes and population proportions are given. Find the mean, variance, and standard deviation of the estimator P-hat1 - P-hat2, and compute each probability.
Problem | n1 | n2 | p1 | p2 | P() |
---|---|---|---|---|---|
A | 645 | 650 | 0.24 | 0.26 | P(P-hat1 - P-hat2 >= 0.045) |
B | 250 | 270 | 0.37 | 0.33 | P(P-hat1 - P-hat2 <= -0.04) |
C | 144 | 156 | 0.87 | 0.86 | P(-0.05 < P-hat1 - P-hat2 < 0.05) |
This problem has the answers in the back, so I tried to use them to make sure I was doing it right:
Problem | Mean | Variance | Standard Dev | Probability |
---|---|---|---|---|
A | -0.020 | 0.000579 | 0.0241 | |
B | 0.040 | 0.001751 | 0.0418 | |
C | 0.010 | 0.001557 | 0.0395 | |
The mean, variance, and standard deviation were pretty simple for me to figure out, but the probability part does not make sense to me at all. I have never come close to figuring out how they get the numbers they get. I'm feeling really frustrated because nothing I'm trying from the book is working, and this is my last assignment for the semester. If you need any more info from me, let me know. I would just like to know how to do this.
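In case it helps, a sketch of the standard normal-approximation route for part A, reusing the mean and standard deviation already computed above: standardize the cutoff and take the tail area.

```python
from scipy import stats

# Part A: n1 = 645, n2 = 650, p1 = 0.24, p2 = 0.26.
mean = 0.24 - 0.26                           # E[P-hat1 - P-hat2] = -0.020
var = 0.24 * 0.76 / 645 + 0.26 * 0.74 / 650  # = 0.000579
sd = var ** 0.5                              # = 0.0241

z = (0.045 - mean) / sd                      # standardize the cutoff 0.045
p = stats.norm.sf(z)                         # P(P-hat1 - P-hat2 >= 0.045)
print(z, p)                                  # z ~ 2.70, p ~ 0.0035
```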
Part (a) of a question required that I test a hypothesis using the critical region method, but it gave me all the relevant data and the sample size was 5738. Part (b) asks me to answer the same questions as if there were only 25. Is it even possible to do the critical region method with only 25 samples? It really only asks: if the two-tailed p-value (the original test was left-tailed) were 0.062, would I still reject the original null hypothesis from part (a) at the 5% level of significance?
Hello, all. I'd appreciate any insight/pointers/assistance at all. I'm not quite sure of my hypotheses, and beyond that, I'm not sure how to interpret my results.
The question:
"You are being brought in as an expert witness in a class-action lawsuit - Tierney v. True Car Parts. Using your engineering background you are being asked to provide an argument as to the liability or not of True Car Parts for the design and production of their shafts used in fuel pumps currently in many automobiles from multiple manufacturers.
Shaft wear in excess of 3.50 microns could lead to catastrophic failures of a certain model fuel pump in extreme weather conditions. Engineers for the manufacturer of the shafts claim that the shaft wear is within acceptable limits. Lawyers representing a class action legal suit filed against the company feel that recent vehicle failures for vehicles with this shaft are due to faulty bearings causing abnormal wear and, thus, feel that the company should pay for the necessary vehicle repair and parts replacement.
The amount of shaft wear (in microns) after a simulated mileage of 250,000 miles was determined for each of n = 45 fuel pumps having copper lead as a bearing material, resulting in x̄ = 2.73 and s = 1.25. Use the appropriate hypothesis test at level .01 to determine if the shaft wear is within acceptable limits. Please state any assumption you have made, if necessary."
My attempt:
Assume: random sampling, approximate normality of the sample mean (n > 30), and independence (n ≤ 10% of the parent population).
Assign α = 0.01, n = 45, x̄ = 2.73, µ_0 = 3.50, s = 1.25.
H_0: µ ≤ 3.50
H_a: µ > 3.50
Perform a one-sample t-test: t = (x̄ - µ_0) / (s/sqrt(n)) = (2.73 - 3.50) / (1.25/sqrt(45)) = -4.132 (Is a negative number valid?)
Corresponding p-value (using a t-table from my textbook) p = .0005
p < α, thus, we reject H_0 and accept H_a.
My issue is this: because the sample mean (2.73) is so much smaller than 3.5 (4.132 standard deviations smaller, right?), why do my results tell me to reject my null hypothesis?
What am I missing, Reddit?
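A sketch of the same calculation in code, just reproducing the numbers above; the point it illustrates is that for the alternative µ > 3.50 the p-value is the upper-tail probability, which is where the negative t statistic changes the conclusion:

```python
import numpy as np
from scipy import stats

n, xbar, mu0, s = 45, 2.73, 3.50, 1.25
t_stat = (xbar - mu0) / (s / np.sqrt(n))   # = -4.132
df = n - 1

p_upper = stats.t.sf(t_stat, df)   # p-value for H_a: mu > 3.50
p_lower = stats.t.cdf(t_stat, df)  # p-value for H_a: mu < 3.50
print(t_stat, p_upper, p_lower)

# With H_a: mu > 3.50 the upper-tail p-value is close to 1, so H_0 is not
# rejected at alpha = 0.01; the tiny lower-tail area (on the order of 1e-4)
# is evidence that wear is *below* 3.50, which is what the negative t
# statistic is pointing at.
```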
Hey all,
I was wondering what sample size is large enough for a bootstrap hypothesis test.
Say, for example, I'm trying to test whether the means of two independent populations are equal or not. The two populations are not normally distributed, or I don't want to assume that they are, so standard parametric tests cannot be used.
How do I determine whether my initial sample size is large enough to perform a hypothesis test using a bootstrap approach? Or, after the bootstrap test has been done, how can I tell whether the initial sample size was large enough? And how is the probability of making a Type II error calculated in this case?
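A minimal sketch of one common bootstrap test for a difference in means (recentre both samples on the pooled mean so the null holds, then resample); the data below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=1.0, size=60)  # placeholder sample 1
y = rng.lognormal(mean=0.2, sigma=1.0, size=75)  # placeholder sample 2

observed = x.mean() - y.mean()

# Impose H0 (equal means) by recentring both samples on the pooled mean.
pooled = np.concatenate([x, y]).mean()
x0, y0 = x - x.mean() + pooled, y - y.mean() + pooled

n_boot = 10_000
diffs = np.empty(n_boot)
for b in range(n_boot):
    xb = rng.choice(x0, size=len(x0), replace=True)
    yb = rng.choice(y0, size=len(y0), replace=True)
    diffs[b] = xb.mean() - yb.mean()

p_value = np.mean(np.abs(diffs) >= abs(observed))  # two-sided bootstrap p-value
print(observed, p_value)

# Power / Type II error can be estimated the same way by simulation: generate
# data under a specific alternative, run the whole test many times, and count
# how often it fails to reject.
```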