[introductory statistics] help me understand how p-values, goodness-of-fit, chi squared, pearson's test, confidence intervals can be used to turn frequentist observation into a probability?

I have three examples here:

  1. With a die thrown 6 times you would expect an average of (1, 1, 1, 1, 1, 1), and thrown 6,000 times you expect (1000, 1000, 1000, 1000, 1000, 1000) for the counts.
  2. With 1,000 numbers generated by a process that follows Benford's law, you expect first-digit counts of 301, 176, 125, 97, 79, 67, 58, 51, 46.
  3. With heights drawn from a mix of two normal distributions based on two genders, you expect a figure like this

There are 3 different processes involved here. Now suppose we change the counts to get an unlikely result, which we want to use to reject the respective null hypothesis; in the heights case, the null hypothesis is that heights follow a mix of two normal distributions, with a mean male height of 5 feet 9 inches (standard deviation 4 inches) and a mean female height of 5 feet 4 inches (standard deviation 3.5 inches).

For example I make these changes to the data:

  1. For the die throws, instead of 1000, 1000, 1000, 1000, 1000, 1000, I add 200 to the first count so it's 1200, 1000, 1000, 1000, 1000, 1000.

  2. For the Benford distribution I add 200 to the second count, so 176 becomes 376 and the counts go from 301, 176, 125, 97, 79, 67, 58, 51, 46 to 301, 376, 125, 97, 79, 67, 58, 51, 46.

  3. For the heights distribution I add a 20-foot tall dude to the mix, like this: https://imgur.com/a/skNlRDF

I think each of these three changes is enough for me to reject the respective null hypothesis. (In the first one I reject the hypothesis that the counts came from a fair die, in the second one I reject the hypothesis that they came from a process that exactly follows Benford's law, and in the third one I reject the hypothesis that the heights came from the mix of two normal distributions with the listed means and standard deviations.)

Could you walk me through how I would actually get the exact p-values in the three different cases? They are three different distributions and three different probabilities.
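A minimal sketch of how the first two p-values could be computed with Pearson's chi-squared goodness-of-fit test, assuming SciPy is available. Note that the expected counts are rescaled to the new totals (6,200 and 1,200), since adding 200 observations changes the totals:

    # Pearson's chi-squared goodness-of-fit for the modified die and Benford counts.
    import numpy as np
    from scipy.stats import chisquare

    # Example 1: die counts after inflating the first face by 200.
    die_obs = np.array([1200, 1000, 1000, 1000, 1000, 1000])
    die_exp = np.full(6, die_obs.sum() / 6)         # fair-die expectation given the new total of 6,200
    chi2_die, p_die = chisquare(die_obs, f_exp=die_exp)

    # Example 2: Benford counts after inflating the second digit by 200.
    ben_obs = np.array([301, 376, 125, 97, 79, 67, 58, 51, 46])
    ben_probs = np.log10(1 + 1 / np.arange(1, 10))  # Benford first-digit probabilities
    ben_exp = ben_probs * ben_obs.sum()             # rescaled to the new total of 1,200
    chi2_ben, p_ben = chisquare(ben_obs, f_exp=ben_exp)

    print(f"die: chi2 = {chi2_die:.1f}, p = {p_die:.3g}")
    print(f"benford: chi2 = {chi2_ben:.1f}, p = {p_ben:.3g}")

The heights example does not reduce to a count table in the same way: you can either bin the heights and apply the same goodness-of-fit machinery against the mixture model, or note that the tail probability of a 20-foot observation under that mixture is effectively zero.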

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/mathsizfun3
πŸ“…︎ Feb 04 2021
🚨︎ report
Are there any assumptions for Fisher's exact test that aren't shared by Pearson's 2x2 Chi-squared test?

Besides requiring more computational power, isn't Fisher's exact test just a superior version of the Chi-squared test?
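A quick way to see the two side by side; the 2x2 counts below are purely hypothetical, chosen small so the expected counts dip below 5:

    # Hypothetical small 2x2 table; with counts this small the two tests can disagree noticeably.
    from scipy.stats import chi2_contingency, fisher_exact

    table = [[8, 2],
             [1, 5]]

    odds_ratio, p_fisher = fisher_exact(table)               # exact test, conditions on the margins
    chi2, p_chi2, dof, expected = chi2_contingency(table)    # asymptotic, Yates-corrected for 2x2 by default

    print(f"Fisher exact p = {p_fisher:.3f}")
    print(f"Chi-squared p  = {p_chi2:.3f} (smallest expected count {expected.min():.2f})")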

πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/PainIntoPower
πŸ“…︎ Sep 01 2018
🚨︎ report
How can I test whether a die is fair? - Pearson's chi-squared test explained very nicely. rpg.stackexchange.com/a/7…
πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/idanh
πŸ“…︎ Jun 16 2017
🚨︎ report
Need help finding an appropriate statistical test similar to Pearson's Chi Squared test.

So Pearson's Chi-Squared Test tests whether differences in observations between categories (say A and B) occur by chance. The way I interpret one of the assumptions of the test is that A and B must be drawn from the same sample or from samples of the same size.

I need to do a similar test, but the samples that A and B are drawn from are different sizes (they are both drawn randomly, so there shouldn't be sampling bias). A and B can't be observed at the same time because of the technique. My initial thought is that I can use Fisher's Exact Test, but I don't know how this would be set up (not that familiar with the exact test families). Or is there a completely different test or adjustment to Pearson's X^2 test?
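If the data can be arranged as counts of "hits" out of each sample's total, one common setup is a 2x2 table with one row per sample; a sketch with invented numbers (the totals are deliberately unequal):

    # Invented setup: sample A has 120 observations with 30 "hits",
    # sample B has 45 observations with 20 "hits". Unequal totals are fine;
    # each row is one sample, each column is hit / no hit.
    from scipy.stats import chi2_contingency, fisher_exact

    n_a, hits_a = 120, 30
    n_b, hits_b = 45, 20

    table = [[hits_a, n_a - hits_a],
             [hits_b, n_b - hits_b]]

    chi2, p_chi2, dof, expected = chi2_contingency(table)
    _, p_fisher = fisher_exact(table)
    print(f"chi-squared p = {p_chi2:.3f}, Fisher exact p = {p_fisher:.3f}")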

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/icantfindadangsn
πŸ“…︎ Mar 23 2017
🚨︎ report
Need help with chi-square, Pearson's correlation coefficient, Student's t-test...

I have an exercise where I have to say whether the statistical method used is the correct one and why, but I don't really get when I have to use one or the other. These are the exercises:

https://imgur.com/gallery/BOHhDzn

https://imgur.com/gallery/RaFLmqA

https://imgur.com/gallery/lrjCoKX

Any help is greatly appreciated!

πŸ‘︎ 11
πŸ’¬︎
πŸ‘€︎ u/Papayemo
πŸ“…︎ Mar 03 2021
🚨︎ report
Chi-squared test of contingency. What information should be included?

Trying to get a better understanding of how chi-squared tests are reported (what information should be included). I understand how to compute a chi-squared test: write a hypothesis, calculate an appropriate test statistic, find the p-value, then reject or fail to reject the null hypothesis and state the results. But generally speaking, what information should absolutely be included when reporting the results? Aside from addressing the hypothesis, what other information is expected to be included?

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/arcaresenal
πŸ“…︎ Nov 05 2021
🚨︎ report
Pearson's Chi-square or Fisher's exact test?

I have a data set of 176 cases and need to examine the significance of an intervention which has two discrete outcomes. So I run it in SPSS and get this result: Pearson chi-square gives 0.049, the continuity correction gives 0.075, and Fisher's exact test gives 0.056 (2-sided) and 0.038 (1-sided). No cells have expected counts less than 5. So do I take the Pearson chi-square and report the test results as significant, or do I take the continuity correction and Fisher's exact test and report them as not significant? From a biomedical point of view, what would be the ideal p-value to report here: Pearson chi-square, continuity correction, or Fisher's exact test?
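For what it's worth, all three numbers in that SPSS output come from the same 2x2 table; a sketch with invented counts standing in for the real data (which aren't given here):

    # Invented 2x2 table standing in for the poster's 176 cases; shows where the
    # three p-values in the SPSS output come from.
    from scipy.stats import chi2_contingency, fisher_exact

    table = [[52, 36],
             [66, 22]]

    _, p_pearson, _, _ = chi2_contingency(table, correction=False)   # "Pearson Chi-Square"
    _, p_yates, _, _ = chi2_contingency(table, correction=True)      # "Continuity Correction"
    _, p_fisher = fisher_exact(table, alternative="two-sided")       # "Fisher's Exact Test (2-sided)"

    print(f"Pearson p = {p_pearson:.3f}, corrected p = {p_yates:.3f}, Fisher p = {p_fisher:.3f}")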

πŸ‘︎ 4
πŸ’¬︎
πŸ‘€︎ u/valium1986
πŸ“…︎ Oct 29 2019
🚨︎ report
How often do you guys use Ttest, chi-squared, linear regression, or pearson correlation in either the analysis or presentation of data?

How often do you guys use Ttest, chi-squared, linear regression, or pearson correlation in either the analysis or presentation of data (MBB, Big 4, Boutiques)?

If you have a PhD or master's in either statistics or comp sci, don't answer, since you probably use these all the time.

πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/serebros
πŸ“…︎ Feb 12 2016
🚨︎ report
Hello! In need of advice regarding a violated assumption of Pearson's Chi Square Test. SPSS.

Hello! I am wondering what the correct course of action is when the assumption for a Chi Square test, that less than 20% of the cells in a cross-tabulation should have an expected count less than 5, is broken. I am using SPSS.

I have found conflicting answers on what to do next. Can I simply use the likelihood ratio that is listed below the Chi Square test instead? If so, can I also use the Phi or Cramer's V tests (for measuring the strength of the association) that are derived from the data that violated the assumption in the first place? If not, how can I measure the strength of the association?

Also, I am confused as to the significance of the degrees of freedom and how they relate to the critical value. Is the critical value simply the level of significance, aka 0.05? The degrees of freedom is 6 (for both chi square and likelihood ratio); how does that change my findings? Should the degrees of freedom somehow be included in how I interpret or report the data?
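For reference, both the likelihood ratio and Cramer's V can be reproduced directly from the cross-tabulation; a sketch with an invented table (assumes SciPy 1.7+ for scipy.stats.contingency.association):

    # Invented cross-tabulation, just to show where the likelihood ratio (G statistic)
    # and Cramer's V come from; swap in the real observed counts.
    import numpy as np
    from scipy.stats import chi2_contingency
    from scipy.stats.contingency import association

    table = np.array([[12, 3, 7, 9],
                      [ 4, 2, 9, 6]])

    g_stat, p_g, dof, _ = chi2_contingency(table, lambda_="log-likelihood")  # SPSS "Likelihood Ratio"
    cramers_v = association(table, method="cramer")                          # strength of association

    print(f"G = {g_stat:.2f}, df = {dof}, p = {p_g:.3f}, Cramer's V = {cramers_v:.2f}")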

Hope someone can help! Thanks :)

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/IamNowUpsideDown
πŸ“…︎ Apr 24 2018
🚨︎ report
Chi-Squared Test Statistics on FRM level 1 exam?

Hello all, I'm currently studying the Quantitative Analysis chapter "Hypothesis Testing". I understand z and t tests, but will there be any other tests such as Chi-Squared and F-Test on the FRM level 1 exam?

πŸ‘︎ 4
πŸ’¬︎
πŸ“…︎ Aug 20 2021
🚨︎ report
Variable order according to Pearson's chi squared

Hello guys, I'm doing segmentation of the population of Portugal for my master's thesis on GIS and I've come across this passage in a segmentation paper:

"Graphical descriptions of the 12 clusters according to the original variables. On the graph of each variable, the cluster sections and remaining sections were compared. The bar representing the frequency of each modality has been given a more intense colour when the frequency in the sections belonging to the cluster is greater than that of those that don't belong to the cluster, to make it easier to understand. In this respect, the order of variables has been determined according to Pearson's Ξ§^2 statistical test, used to measure homogeneity."

After some research I wasn't able to find (or understand) the method for knowing the order of importance that each variable had to the clustering (if that makes sense).

Statistics isn't my field so if someone can point me in the right direction or advice some works i'd appreciate it, thanks!
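One plausible reading of that passage (a guess, not something taken from the paper): for each original variable, cross-tabulate it against cluster membership, compute Pearson's χ² statistic, and order the variables by that statistic. A sketch with invented data:

    # For each variable, cross-tabulate it against the cluster labels, compute
    # Pearson's chi-squared, and sort variables by the statistic (all data invented).
    import pandas as pd
    from scipy.stats import chi2_contingency

    df = pd.DataFrame({
        "cluster":  ["A", "A", "B", "B", "B", "C", "C", "A", "B", "C"],
        "land_use": ["urban", "rural", "rural", "rural", "urban", "urban", "rural", "urban", "rural", "urban"],
        "income":   ["low", "low", "high", "high", "low", "high", "high", "low", "high", "low"],
    })

    scores = {}
    for var in ["land_use", "income"]:
        table = pd.crosstab(df["cluster"], df[var])
        chi2, p, dof, _ = chi2_contingency(table)
        scores[var] = chi2

    # Variables ordered from most to least associated with the clustering.
    order = sorted(scores, key=scores.get, reverse=True)
    print(order, scores)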

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/StingerOo
πŸ“…︎ Jun 26 2018
🚨︎ report
[Q] There’s a one in four chance that five chi squared tests give at least two positive associations. Is this correct?

I am researching the effect of an educational policy. A piece of research I encountered measured the effect of that policy on 5 different variables using a chi-squared test for independence at the 5% level. Two of these tests came back as significant.

A binomial calculation suggests that if you run 5 tests at the 5% level, the probability of 2 or more coming back as significant by random chance is about 1 in 4.
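A quick way to check that binomial arithmetic, assuming the five tests are independent and all of the null hypotheses are actually true:

    # Probability of k or more "significant" results out of 5 independent tests
    # at the 5% level when every null hypothesis is true.
    from scipy.stats import binom

    p_at_least_1 = 1 - binom.cdf(0, 5, 0.05)   # about 0.23, roughly 1 in 4
    p_at_least_2 = 1 - binom.cdf(1, 5, 0.05)   # about 0.023, roughly 1 in 44

    print(f"P(>=1 significant) = {p_at_least_1:.3f}")
    print(f"P(>=2 significant) = {p_at_least_2:.3f}")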

Is this a legitimate criticism of the statistics? Is it safe to accept that this policy was effective in 2 out of 5 metrics?

Thanks in advance.

πŸ‘︎ 9
πŸ’¬︎
πŸ“…︎ Jul 19 2021
🚨︎ report
[Q] Chi-Squared Test with Limited Variation

I am doing a Pearson's Chi-Squared Test to check whether the upload speeds of my new router differ from those of my old router. However, the issue is that there is usually limited variation in cases like this, because data speeds also depend on things like your internet package. I am getting an average of 21.5 Mbps on my old router, but my observed values are hovering around 25.8 Mbps, and the resulting probability is something like 20%. Is there any way to augment the test to account for situations that may have less variation than others? Thank you.

πŸ‘︎ 5
πŸ’¬︎
πŸ“…︎ Jun 22 2021
🚨︎ report
Can I ignore the values of a chi squared table/test if my expected and observed values are 0?

This is for a maths IA where I'm calculating chi squared, but some of my expected and observed values are 0. In order to calculate chi squared, I'd need to divide by the expected values (0), which is obviously not possible. Would ignoring these values be justifiable?

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/Toehaha
πŸ“…︎ Jun 28 2021
🚨︎ report
Hey r/statistics, I have a few questions about using Pearson's chi-square test for goodness of fit.
  1. What is the recommended minimum total sample size for the distribution being tested? I see most of the examples online have a sample size of 100, but I wasn't sure if fewer samples could be used (the sample size of the population I'm testing is 51, and the "expected/control" sample size is 43).

  2. What is the recommended minimum sample size of each category or class being tested? For example, when breaking down those two populations into the available categories for distribution, some are fewer than five to a category. I have heard that five is a good cutoff, in which case I would lump categories together appropriately to increase categorical sample size.

  3. If either of the first two issues causes the chi-square test to be invalidated, particularly the sample size, what test or approach should I look into?

Thanks for any feedback.

Edit:

Here is a text file with some raw output results using chi-squared goodness-of-fit.

So we have this cohort of tumors that also have samples of the associated normal tissue with them. For this particular analysis, I have divided these normal-tissue/tumor pairs into two groups: those that had the expression of a particular protein increase from the normal tissue to the tumor, and those that had the expression of the same protein decrease. From there, I'm looking at the distribution of other categorical characteristics based on these two sub-cohorts and trying to find any distinctions.

As for the text file, these sub-cohorts are indicated by the "Up" and "Down" in the column headers, and the different categories are the row headers. The numbers are the "counts" of each category in the cohort. For the analyses that happen to have cells with fewer than "5" in a category, a separate "truncated" analysis was conducted.

If anyone actually takes the time to get their head wrapped around it and has some suggestions for further statistical analysis (other, possibly better and/or more robust, tests in particular), I would greatly appreciate the input. Thanks again.
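Since the "Up" and "Down" columns are two cohorts compared across the same set of categories, one way this is often framed is as a chi-squared test of homogeneity on the contingency table; a sketch with invented counts (totals chosen to match the 51 and 43 mentioned above), checking the expected counts along the way:

    # Invented Up/Down table (rows = categories, columns = cohorts). The expected
    # counts show whether the "at least 5 expected per cell" rule of thumb is at risk.
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[18, 16],   # category 1: Up, Down
                      [22, 13],   # category 2
                      [ 7,  8],   # category 3
                      [ 4,  6]])  # category 4 (small; a candidate for lumping)

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
    print("smallest expected count:", expected.min().round(2))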

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/tdyo
πŸ“…︎ Jun 07 2011
🚨︎ report
[Question] Can you measure interaction in a Chi-squared test of independence?

I have statistical data where I want to look at differences of frequencies with a Chi-squared test of independence. My main hypothesis includes 2 independent variables (gender, marital status), and I want to add another variable (country of origin) and look at its effect.

If this were a continuous variable I would use ANOVA and check for a statistically significant interaction, but since I am looking at frequencies I use Chi-squared. Is there a way to measure interaction in Chi-squared tests? Or should I use another test?
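One approach that often gets suggested for interactions among categorical variables is a log-linear model: treat the cell counts of the full cross-tabulation as the outcome of a Poisson GLM and test an interaction by comparing nested models. A rough sketch, assuming statsmodels is available; all variable names and counts below are invented:

    # Log-linear (Poisson) model on invented cell counts; the three-way interaction
    # is tested with a likelihood-ratio comparison of nested models.
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # One row per cell of the gender x marital-status x country table.
    cells = pd.DataFrame({
        "gender":  ["m", "m", "m", "m", "f", "f", "f", "f"],
        "married": ["yes", "yes", "no", "no", "yes", "yes", "no", "no"],
        "country": ["x", "y", "x", "y", "x", "y", "x", "y"],
        "count":   [30, 22, 18, 25, 40, 15, 12, 28],
    })

    full = smf.glm("count ~ gender * married * country",
                   data=cells, family=sm.families.Poisson()).fit()
    reduced = smf.glm("count ~ (gender + married + country)**2",
                      data=cells, family=sm.families.Poisson()).fit()

    # Likelihood-ratio test for the three-way interaction.
    lr = reduced.deviance - full.deviance
    dof = reduced.df_resid - full.df_resid
    print(f"LR = {lr:.2f}, df = {dof}, p = {chi2.sf(lr, dof):.3f}")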

Thanks

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/Mr-Jab
πŸ“…︎ Jun 18 2021
🚨︎ report
[Research] Can we perform chi-squared goodness of fit tests on LASSO or ridge regression estimates?

In the examples I have seen in the literature the chi-squared goodness of fit test is usually used on least-squares estimates.

The use of this test seems to hinge on the observation that the least-squares estimate vector will, by definition, minimize the sum of squares of the residuals for each feature.

I believe shrinkage estimators like LASSO and ridge regression by definition do not always reach the minimum SSE (LASSO doesn't even have a closed form solution and is the solution of an optimization problem), but their objectives are still to minimize the sum of squared errors subject to their typical corresponding constraints on the norm of the estimate vector.

Therefore does it still make sense to perform a chi-squared goodness of fit test on a LASSO or ridge regression estimate? Or should another test be used?

There is a small discussion here, but they don't quite seem to come to an answer specifically.

πŸ‘︎ 52
πŸ’¬︎
πŸ‘€︎ u/steezytang
πŸ“…︎ Jan 11 2021
🚨︎ report
AP Stats. How to distinguish between chi-squared test for homogeneity v independence??
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/peppermintyco
πŸ“…︎ Apr 22 2021
🚨︎ report
[Q] Is the Chi-Squared test appropriate to determine statistical significance for grouped continuous data?

For instance, if I wanted to find out if the age distribution of people at an event was reflective of population age distribution, could I group people by age (eg 0-4, 5-9, 10-14 etc) and then perform a chi-squared test based on the number of people in each age category? If not, what would be a better statistical significance test for this kind of application?
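That setup is essentially a chi-squared goodness-of-fit test of the binned attendee counts against the population's age proportions; a sketch where both the attendee counts and the population shares are invented:

    # Invented example: attendee counts per age band vs. population shares for
    # the same bands, rescaled to the attendee total.
    import numpy as np
    from scipy.stats import chisquare

    observed = np.array([14, 22, 31, 40, 28, 15])                 # attendees per age band
    pop_share = np.array([0.10, 0.12, 0.15, 0.25, 0.23, 0.15])    # population proportions (sum to 1)
    expected = pop_share * observed.sum()

    chi2, p = chisquare(observed, f_exp=expected)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")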

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/Flyinghigh11111
πŸ“…︎ Apr 28 2021
🚨︎ report
[Research] T-test vs Chi Squared

I’m doing an undergrad research project on vaccination rates of certain communities. I’m unsure if I can use a t-test for this since it’s more or less binary data (Y/N).

Would a chi squared test be more appropriate?

Essentially, I would take vaccination rates of Population 1 and 2 and see if there is a statistically significant difference.
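One common way to set up that comparison with yes/no data is a 2x2 table (vaccinated or not, by population); a sketch with invented counts:

    # Invented counts: vaccinated / not vaccinated in each population.
    from scipy.stats import chi2_contingency

    table = [[480, 120],   # population 1: vaccinated, not vaccinated
             [350, 210]]   # population 2

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p:.3g}")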

Sorry if this is a bad question I’m a post bac student and haven’t taken statistics in over 8 years.

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/MyWifeJustLeftMe
πŸ“…︎ Mar 18 2021
🚨︎ report
A light-weight package that helps you run Chi-Squared tests in JavaScript

Link to the implementation: https://github.com/neeraj3029/chi-sq-test

With this package one can run Chi-Squared goodness-of-fit tests on numerical data, or compute chi-squared statistics to check independence between two datasets. 100% JavaScript.

Feel free to write issues in case of any questions/suggestions, or maybe share/star the repo so that it reaches more people!

πŸ‘︎ 2
πŸ’¬︎
πŸ“…︎ May 23 2021
🚨︎ report
Help on how to do a Chi Squared Test? (Simple)

So I would like to do a Chi Square test on the data set I have below:

Group   Did not have visit   Had visit
1       50                   10
2       40                   20
3       30                   80
4       10                   100
5       2                    150

I want to see how significant the difference is for each group, but not sure how to go about it. My experience with R is very limited, so apologies if I'm missing information or not explaining correctly. Appreciate any help I could get.
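In R, the table above can be passed directly to chisq.test() as a matrix; here is a sketch of the equivalent in Python/SciPy, plus a crude per-group look via Pearson residuals:

    # The table as rows = groups, columns = (no visit, visit).
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[50, 10],
                      [40, 20],
                      [30, 80],
                      [10, 100],
                      [ 2, 150]])

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"overall: chi2 = {chi2:.1f}, df = {dof}, p = {p:.3g}")

    # Pearson residuals, (observed - expected) / sqrt(expected), hint at which
    # groups drive the overall result.
    residuals = (table - expected) / np.sqrt(expected)
    print(residuals.round(1))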

πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/CodeRed1234
πŸ“…︎ Mar 01 2021
🚨︎ report
Which 2x2 correlation test to use - Chi Squared or McNemar's?

Cross-posting this from r/statistics, I've realized that this sub might be more appropriate!

I have a research data set that consists of a number of videos, which we've manually coded using Themes and Sentiments, e.g. "Positive", "Advertisement", "Authority" (gov. official, medical professional, etc.). Each video can have any number of themes and sentiments coded to it.

We then, for each theme, did a pairwise, binary comparison with each other theme, to see if a theme has correlative value with another theme. In different terms, we took the set of videos that match a theme and the set of videos that do not match a theme, and paired them with the same binary sets on a different theme. Here's a made-up example matrix for one of these comparisons:

             Authority   Authority'
Positive     20          60
Positive'    30          260

My understanding is that we can find either the Phi Coefficient or the McNemar's Test Statistic to compare this data. I'm confused on which to use. My understanding is that:

Phi Coefficient is appropriate for unpaired, nominal data.

McNemar's is appropriate for paired, nominal, dichotomous data.

The comparisons above are nominal and dichotomous, but are they considered paired data? I'm having trouble reasoning about why they might be considered paired or unpaired.

As far as results go, McNemar's is giving results that match our hypothesis -- for instance, "Positive" and "Advertisement" show strong, positive correlation. When we take the phi coefficient, we are getting much weaker results (values closer to zero). The direction of the correlation is not different between any of our sets for the McNemar's and Chi Squared (Phi Coefficient) test.

p.s. McNemar's gives a directionless value from 0-1, and Chi Squared gives a directional value from -1 to 1, but we were able to multiply the McNemar's result by a direction derived from our results to show positive or negative correlation. I can explain how we did that if that's relevant to any answers here.
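For reference, both statistics can be computed from the made-up 2x2 above; a sketch assuming SciPy and statsmodels are available. Note that the two tests ask different questions: the phi/chi-squared route asks whether the two themes are associated, while McNemar's (built from the off-diagonal 60 and 30) asks whether the two themes occur at the same overall rate.

    # Rows = Positive yes/no, columns = Authority yes/no, each video counted once.
    import numpy as np
    from scipy.stats import chi2_contingency
    from statsmodels.stats.contingency_tables import mcnemar

    table = np.array([[20, 60],
                      [30, 260]])

    chi2, p_assoc, _, _ = chi2_contingency(table, correction=False)
    a, b = table[0]
    c, d = table[1]
    phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

    mc = mcnemar(table, exact=True)   # marginal homogeneity, uses only the 60 and 30

    print(f"phi = {phi:.3f}, chi-squared p = {p_assoc:.4f}")
    print(f"McNemar p = {mc.pvalue:.4f}")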

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/lamarf
πŸ“…︎ Apr 23 2021
🚨︎ report
Chi-Squared Goodness of Fit Tests

Hi All,

I have some questions regarding chi-squared goodness of fit tests, and the use of the quantiles for x^2 distributions table. I have a question that uses chi-squared goodness of fit tests. The test statistic value is 10.42 and the calculated degrees of freedom value is also 10.

From the question I can see that I look up the 10th row in the quantiles for x^2 distributions table to find which quantile my test statistic falls between. My question is, what drives the lookup against the quantiles for x^2 distributions table: is it the degrees of freedom value, or the test statistic value? Since both values are roughly 10 (10 and 10.42), this is not obvious from the question.

I also don't see what the relevance is for calculating the degrees of freedom. This suggests to me that something is lacking from my understanding... :-) What am I missing here Reddit?
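A small sketch of the same lookup done numerically, which makes the two roles explicit: the degrees of freedom select which chi-squared distribution to use (the table row), and the test statistic is the value you look up within that distribution.

    # df = 10 selects the distribution (the table row); 10.42 is looked up within it.
    from scipy.stats import chi2

    stat, df = 10.42, 10

    p_value = chi2.sf(stat, df)      # upper-tail area beyond the statistic (~0.40 here)
    crit_05 = chi2.ppf(0.95, df)     # 5% critical value for df = 10 (~18.31)

    print(f"p = {p_value:.3f}, 5% critical value = {crit_05:.2f}")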

Thanks as always!

πŸ‘︎ 7
πŸ’¬︎
πŸ‘€︎ u/lonza-cat
πŸ“…︎ Apr 05 2021
🚨︎ report
Brazilian teaching about chi-squared test. Not sure how this will sound for foreigners but let's see~ [soft spoken][clicking noises][male][8:44] youtube.com/watch?v=Z-veQ…
πŸ‘︎ 20
πŸ’¬︎
πŸ‘€︎ u/mooniech1ld
πŸ“…︎ Feb 26 2021
🚨︎ report
Chi-squared test help

I'm doing some statistical research and I have two groups (young vs. old couples) and 5 independent categories (5 levels of compatibility).

In my paper, I used the chi-square goodness-of-fit test to see how the observed vs. expected (from previously known data) frequencies varied and whether there was a significant difference.

Should I have used a chi-square test for independence to see whether there is a relationship? Or does it make sense that I used both?
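For what it's worth, the two tests answer different questions and can both be run on the same data; a sketch with invented counts and invented "previously known" proportions:

    # Goodness of fit: does one group follow the previously known distribution?
    # Independence: do young and old couples have different compatibility profiles?
    import numpy as np
    from scipy.stats import chisquare, chi2_contingency

    young = np.array([12, 20, 35, 22, 11])
    old   = np.array([18, 25, 30, 15,  7])
    known_props = np.array([0.10, 0.20, 0.35, 0.22, 0.13])   # "previously known" shares (invented here)

    chi2_gof, p_gof = chisquare(young, f_exp=known_props * young.sum())
    chi2_ind, p_ind, dof, _ = chi2_contingency(np.vstack([young, old]))

    print(f"goodness of fit: chi2 = {chi2_gof:.2f}, p = {p_gof:.3f}")
    print(f"independence:    chi2 = {chi2_ind:.2f}, df = {dof}, p = {p_ind:.3f}")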

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/mmmmiiiaaaaa
πŸ“…︎ Apr 11 2021
🚨︎ report
Urgent question about chi squared test

What happens if the chi squared value is greater than the critical value? Will the null hypothesis be accepted or the alternative hypothesis?

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/Fjdiejdxn3939
πŸ“…︎ Mar 01 2021
🚨︎ report
