For my senior project I am using machine learning to determine which variables are most likely to impact GPA. This is 100% anonymous and all data will be dumped after analysis. Thanks! Any major can be selected if yours isn't listed. (DM for code to verify) witsurvey.openwit.tech

👍︎ 56

👤︎ u/jimjim975

📅︎ Jul 16 2021

Living in an area with high air pollution worsens your memory to the same extent as ageing ten years. The analysis was adjusted for a large number of other variables that could affect people’s memory, including people’s age, health, level of education airqualitynews.com/2019/1…

👍︎ 77

👤︎ u/Wagamaga

📅︎ Oct 16 2019

[Andruzzi] I’ve tried and tested over 3,000 variables to predict quarterback success from collegiate QBs going into the NFL. (No tracking data unfortunately) Used box scores, play-by-play, combine data and text analysis from scouting reports using 3 main models on held out 2019-2020 QBs twitter.com/j_druzzi/stat…

👍︎ 55

👤︎ u/SerShanksALot

📅︎ Apr 16 2021

Thesis help needed: Justifying a novel research proposal involving moderation analysis where there is no existing literature on the relationship between two of the variables.

Hi all.

Trigger warning for suicide mentioned as variables in research.

I'm in my first thesis research unit for my honours degree in psychology. I have been allocated my topic and supervisor, and don't have flexibility in the variables I mentioned, but how I fashion the variables is up to me.

My initial research question looked at the impact of suicide literacy and secrecy on perceived stigma in those bereaved by suicide.

There is existing literature on the relationship between literacy and stigma (it decreases stigma), and existing literature on secrecy and stigma (it increases stigma), so I thought seeing how they overlap would be really interesting, and the literature points in that direction.

I went to break it down and "make it simple" to see how/why it's important and find the "so what" factors, and arrived at these points:

stigma > increased suicidality and mental health problems
secrecy > increases stigma
literacy > decreases stigma

This led me to think, okay well if secrecy leads to an increase in stigma, and literacy leads to a decrease in stigma, I wonder how literacy might moderate the relationship between secrecy and stigma. Maybe increasing suicide literacy will reduce the impact of secrecy on perceived stigma?

So I arrived at a second research question option, which is to look at the effect of suicide literacy on the relationship between secrecy and self-stigma in those bereaved by suicide. The only problem being there is no existing literature that links suicide literacy to secrecy that I can find. It would be completely new research, which makes things very difficult when it comes to providing evidence as to why it's important.

My supervisor has explained that there being a gap isn't enough to justify undertaking the research, nor does the fact I can justify it myself.

At the moment I'd like to pursue the second research question. Perhaps the way around it is to combine the two justifications: there's a bunch of research on literacy and stigma, and a bunch of research on secrecy and stigma. With that in mind, alongside the gap in the literature between literacy and secrecy, it's a worthwhile study. But in the end it feels more like: it COULD be important, so maybe we should look at it.

Can anyone point me in a direction that might help me justify why this research is important, aside from the gap in literature? Or suggest reasons why a gap in literature IS enough of a reason? I'm floundering a bit with this.

Thanks so


👍︎ 27

👤︎ u/rplct

📅︎ Jan 23 2021

Question on what analysis I should proceed with multiple independent variables and dependent variables they all share

Hey everyone, so I'm trying to compare the dependent variables of several independent variables between one another. For example, groups A), B), C) and D) all participated in a test that examines academic performance in academic categories E), F), G), H), I), J) and K). How do I evaluate the differences between each group's scores in each academic category? I'm assuming an ANOVA or a MANOVA test, but I'm not entirely sure which one and what type.

Thanks in advance guys! My brain melts on anything stats related and I'm trying to get my head around it.
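For orientation, here is a minimal sketch of what a one-way ANOVA computes for a single academic category, assuming each group's scores are available as a plain list. The function name `one_way_f` and the scores are made up for illustration. Running one such test per category treats the categories separately; a MANOVA would instead test all categories jointly.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic of a one-way ANOVA; groups is a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores of three groups in one academic category
scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_f(scores))
```

The F statistic still has to be compared against an F distribution with (df_between, df_within) degrees of freedom to get a p-value; a statistics package would normally handle that step.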

👍︎ 2

👤︎ u/kingkobrakiller

📅︎ Jun 07 2021

[Q] Problem of too many levels of categorical variable in regression analysis

I have conducted an experiment in which participants experienced a frustrating task, i.e., they were systematically prevented from achieving their goal. A central research question is whether or not facial expressions of an individual can predict whether or not the individual participant experienced frustration.

Facial expressions can be coded as Action Units. Subjective ratings of frustration were done post-hoc by the participants themselves. For the sake of my statistics questions, let us assume frustration induction was successful, and subjective rating and action unit-coding reliable.

I would like to show that

Action units (AUs) shown are not distributed evenly, i.e., some AUs are shown much more often than others.
AUs are not evenly distributed across subjects, i.e., individuals deviate significantly from the mean for this AU across subjects.
Given self-reported frustration, the count of AUs shows changes for every participant, but not which AUs are shown.

A regression framework seems appropriate, especially HLM (since AUs as well as frustration ratings are specific to individuals). As the predicted variable, we can use the count of an AU. Predictors would be subject, frustration rating, and AU. The null hypothesis is that knowing subject/frustration rating/AU does not help predict the count, which I would like to reject.

The issue: There are 89 action units. Treating AUs naively as categories in a regression framework is out of the question. The number of AUs can be brought down to roughly 40-50 when excluding very low frequency AUs, and doing overall dimension reduction (PCA, clustering). The underlying problems are a) the great variance in AUs shown across participants, and b) the importance of an AU even if only shown very briefly and rarely.

The question: How do I deal with the many levels of the categorical data? Is there any way to make the regression approach work?

👍︎ 13

👤︎ u/guayuba

📅︎ Dec 20 2020

[Discussion] Analysis of significance of independent variables for unequal dependent group sizes

I'm analyzing a dataset for academic research, but can't seem to find a proper technique / set of techniques. First off, I will describe my data:

The data:

Questionnaire replies, divided into five groups of varying skill level in a hobby
Skill level is the dependent variable, but sample sizes are unequal: lowest skill level and highest skill level have the fewest answers, mid-levels have the most answers (close to normal distribution for skill level)
Questions were multiple-choice, but in an unconventional manner. Respondents could choose as many options (at least one) as they felt applied to them. So it is nominal/categorical data.
I have coded the answers in my data so that each option is its own column, and has a value of 0/1 based on whether the option was chosen or not. (Of course, this procedure will be done for each question separately, and each question will be analyzed separately.)
(Optional: I am wondering if the data should be normalized so that each answer would "weigh" an equal amount. For example: if a person chose 4 options, the columns would get a value of 1/4=0.25, if they chose only one option, the column would get a value of 1/1=1.)
After shaping the answers into different columns, the data has categorical dependent variables and categorical independent variables (or continuous in a way if normalized)
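The coding and the optional normalization described in the list above can be sketched as follows. This is only an illustration: the function name `encode_multiselect` and the option labels are hypothetical, and whether normalizing is a good idea for the analysis is a separate question.

```python
def encode_multiselect(chosen, options, normalize=False):
    """Return one value per option for a single respondent.

    Without normalization each chosen option is coded 1, others 0.
    With normalization each chosen option is coded 1/k, where k is
    the number of options the respondent ticked.
    """
    if not chosen:
        raise ValueError("each respondent must choose at least one option")
    weight = 1 / len(chosen) if normalize else 1
    return [weight if opt in chosen else 0 for opt in options]

options = ["a", "b", "c", "d", "e"]   # hypothetical answer options
answer = {"a", "b", "c", "d"}         # a respondent who ticked four options

print(encode_multiselect(answer, options))
print(encode_multiselect(answer, options, normalize=True))
```

With normalization every respondent's row sums to 1, so someone who ticked four options contributes 0.25 to each of them.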

The problem:

I would like to find out if some factors correlate with skill level, and which factors are the most significant. This is done to explore the data and find out trends with skill level. However, the data being nominal, and having varying group sizes, I believe non-parametric techniques would be the only ones that can be applied. So the criteria for an analysis technique would be:

Categorical dependent variables, 5 levels
Categorical / continuous independent variables, >5 independent variables
Can take varying sample sizes for dependent variable
Can be applied with non-parametric data
Describes significance of independent variables
It may be possible that the correlation is not linear (might not be relevant)

If you can suggest a technique or a set of techniques that would help me tackle the problem, it would be greatly appreciated. I believe something along the lines of Kruskal-Wallis / Discriminant analysis / Logistic regression might be good, but I'm not sure how these could be applied to multiple dependent groups with multiple independent variables. Also, Mann-Whitne


👍︎ 11

👤︎ u/timoIjas

📅︎ Jan 27 2021

How to run meta-analysis using Metafor with continuous predictor variable

Hey all,

I've used the Metafor package several times before to run a single-paper meta-analysis, where I basically find the effect size across the studies in my paper.

However, I've always done this when the predictor variable was categorical (e.g. https://rstudio-pubs-static.s3.amazonaws.com/10913_5858762ec84b458d89b0f4a4e6dd5e81.html )

I'm trying to do the same thing but with a continuous predictor variable and am having some trouble figuring out how to do this.

The set-up is very straightforward: I have a predictor variable X that is continuous (values are between 0 and 200) and an outcome variable Y that is also continuous (values are between 0 and 10) for N studies.

I've tried searching for this, but have surprisingly been unable to find a tutorial or instructions (maybe I'm using the wrong search terms, or it could be because I'm not well-versed in meta-analysis).

Are there any example codes that I could use or some webpage that has an example tutorial for something like this?

Thank you very much for your time!
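One common way to handle two continuous variables is to use each study's Pearson correlation between X and Y as its effect size and pool the correlations on Fisher's z scale. The sketch below is plain Python rather than Metafor, assumes a simple fixed-effect model, and uses made-up (r, n) pairs; the function name `pool_correlations` is hypothetical. In Metafor the analogous steps should be `escalc(measure="ZCOR", ...)` followed by `rma(...)`, but check the package documentation for the exact arguments.

```python
import math

def pool_correlations(studies):
    """Fixed-effect pooling of Pearson correlations via Fisher's z.

    studies: list of (r, n) pairs, one per study, with n > 3.
    Returns the pooled r and an approximate 95% confidence interval.
    """
    # The sampling variance of Fisher's z is 1 / (n - 3),
    # so the inverse-variance weight is n - 3.
    weights = [n - 3 for _, n in studies]
    zs = [math.atanh(r) for r, _ in studies]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform from the z scale to the correlation scale
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))

# Hypothetical per-study correlations between X and Y with sample sizes
studies = [(0.30, 50), (0.45, 80), (0.25, 120)]
pooled_r, ci = pool_correlations(studies)
print(pooled_r, ci)
```

A random-effects model would add a between-study variance term to each weight, which is what a meta-analysis package typically estimates for you.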

👍︎ 2

👤︎ u/hangman86

📅︎ Mar 22 2021

Tech Focus: What Is VRS And Is It A Next-Gen Game-Changer? Variable Rate Shading Analysis! youtu.be/YMf2GDvT-aE

👍︎ 29

👤︎ u/Isnabajsja929

📅︎ Apr 30 2020

[D] is it ok to log transform variables when doing time series analysis?

I have daily income vs time (days). The income values are very large; would taking the log transform of the income values somehow "compromise" any further time series analysis?

I don't think so, but just wanted to make sure.

Thank you!
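As a small illustration of what the transform does, assuming every income value is strictly positive (the numbers are invented): the log is invertible, so nothing is lost, and differences of logs approximate day-to-day growth rates.

```python
import math

income = [12000.0, 12600.0, 13230.0, 12568.5]   # made-up daily income values

# The natural log requires every value to be > 0
log_income = [math.log(x) for x in income]

# Differences of consecutive logs are approximately relative changes
log_diff = [b - a for a, b in zip(log_income, log_income[1:])]

# Exponentiating recovers the original scale
recovered = [math.exp(v) for v in log_income]

print(log_diff)
print(recovered)
```

One thing to keep in mind is that a model fitted on the log scale describes multiplicative rather than additive effects, and forecasts have to be exponentiated back before they are read as income.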

👍︎ 21

👤︎ u/blueest

📅︎ Oct 27 2020

Would running a Multiple Correspondence Analysis on a data set this small actually tell me anything useful? I'm looking to see how strongly my dependent variable outcomes are influenced by the independent variables they are associated with.

👍︎ 3

👤︎ u/Infinite_Bae

📅︎ Feb 18 2021

How to assess temporal precedence between 2 variables? Path analysis?

Hello everyone,

I am currently working on a project on relationship beliefs and conflict resolution styles. We are doing a daily diary study and assessing relationship beliefs at baseline and follow up and conflict styles at daily diary (5 days - will be averaged). We want to know whether baseline beliefs could predict these conflict styles at daily diaries but we also want to know whether the conflict styles at daily diaries could predict beliefs at follow up. I’ve been looking at a path analysis but I’m not sure if that’s the right way to go.

I hope I made sense - still trying to improve my stats knowledge and skill. Any advice or resources would be greatly appreciated!

👍︎ 3

👤︎ u/jenem1015

📅︎ Mar 29 2021

[100% off]Dummy Variables in Regression Analysis freewebcart.com/udemy/dum…

👍︎ 3

👤︎ u/abjinternational

📅︎ Mar 20 2021

[Q] What are the implications of not including strata variable in latent class analysis of complex survey data? How can I include a strata variable in PROC LCA?

I would like to use PROC LCA in SAS to do a latent class analysis using complex survey data. Weight and cluster variables can be included in the procedure, but including a strata variable is not an option. For a sample that was designed with strata, and weights that were calculated using strata, how would not using a strata variable impact the LCA results? Is there a way to include the strata variable in the analysis, similar to the inclusion of strata in the proc surveyfreq procedure? Thank you for your help.

👍︎ 2

👤︎ u/small_bluedot

📅︎ Mar 11 2021

Good or bad idea to control for variables in analysis?

Hi there,

I am analysing some data where I have two groups, an experimental group and a control group. I have tried to make sure that the groups do not differ in terms of baseline measures but for two variables they are different, e.g. years of education.

Should I control for those variables in my analyses? I first decided to do it (did ANCOVA, regression and partial correlation), but I have since been advised against it. I didn't get a very good explanation for why not, other than that it's not recommended. I also know there are different opinions about it.

So, why should I control for those variables in the analyses, or why should I not? :)

Thanks!

👍︎ 14

👤︎ u/EmbarrasedBadger

📅︎ Aug 11 2020

Analysis of correlation between binary variables

Howdy y'all

I'm trying to find the best statistical measure for the correlation between two binary variables.

I've tried building a contingency table and running a chi-squared test (both values present: n11; both absent: n00), but I'm not sure this is the right analysis, since the high number of n00 cases outweighs n11 and makes it seem like there is a positive correlation when there are mostly no values.

What statistical analysis can I run on my data to get a good measure for the significance of the correlation between two binary variables?
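Two candidate measures can be computed directly from the four cell counts. This is a sketch with invented counts; the names n10 and n01 (only the first or only the second variable present) are assumptions that extend the n11/n00 notation above. The phi coefficient uses all four cells, while the Jaccard index ignores n00 entirely, which may matter when joint absences dominate.

```python
import math

def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table (Pearson correlation of two 0/1 variables)."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den else float("nan")

def jaccard(n11, n10, n01):
    """Share of cases with both present among cases with at least one present."""
    union = n11 + n10 + n01
    return n11 / union if union else float("nan")

# Made-up counts where joint absences (n00) dominate the table
n11, n10, n01, n00 = 5, 20, 20, 955
print(phi_coefficient(n11, n10, n01, n00))
print(jaccard(n11, n10, n01))
```

For a 2x2 table, phi squared times the sample size equals the uncorrected chi-squared statistic, so phi is an effect size that goes with that test rather than a replacement for it.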

👍︎ 3

👤︎ u/gurp-n-slurp

📅︎ Oct 23 2020

[Q] Difference in difference analysis with multiple controls and variables?

I am looking at the impacts that COVID-19 lockdowns have had on public transportation usage over a number of cities over the world. I was told that difference-in-difference would be the way to go about this but I'm confused as to how to actually implement it. Any guidance would be greatly appreciated.

My data is:

date	(city1)	(city1_lockdown)	(city2)	(city2_lockdown)
Mar 1	50	0	40	0
Mar 2	51	0	45	1
...	...	....	...	...
Mar 30	55	0	47	1

where (cityx) is something like daily ridership or % capacity or revenue, etc. cityx_lockdown is a dummy indicating if that city went into lockdown or not.

I tried the following with Python (don't have access to Stata):

    import statsmodels.formula.api as smf
    model = smf.ols(formula = 'city2 ~ city2_lockdown + city1 + city2_lockdown*city1', data=combo).fit()
    print(model.summary())

but this gives me a coeff of 0 for city2_lockdown (and city1*city2_lockdown by extension)

here I guess I am writing (variable) = f(intercept + control + control*variable_dummy).

So...questions:

is this the correct way to format my model to fit? It's not exactly clear from examples I've found online of DID
how do I extend this to include multiple control and variable cities? I have a few cities for both those that underwent lockdown and those that didn't. (I understand the assumption I'm making with DID that the variables would follow the same trends as controls if there were no interventions [lockdowns]).
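For reference, a minimal sketch of the basic two-group, two-period DID calculation, assuming the data are reshaped to long format with one row per city-day. The column meanings here are assumptions, not the layout above: `treated` is 1 for cities that ever locked down, `post` is 1 on or after a single common lockdown date, and the outcome values are made up.

```python
from statistics import mean

def did_estimate(rows):
    """Difference-in-differences from (treated, post, outcome) rows.

    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Hypothetical long-format data: (treated, post, ridership)
rows = [
    (0, 0, 50), (0, 0, 52),   # control city, before
    (0, 1, 54), (0, 1, 56),   # control city, after
    (1, 0, 40), (1, 0, 42),   # lockdown city, before
    (1, 1, 30), (1, 1, 32),   # lockdown city, after
]
print(did_estimate(rows))
```

In this 2x2 setting the estimate equals the coefficient on the interaction term in an OLS regression of the outcome on `treated`, `post` and `treated * post`, so adding more cities mostly means adding more rows to the long table. Lockdowns that start on different dates in different cities need more care than this sketch provides.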

Thanks!

👍︎ 6

👤︎ u/RidiculousMonster

📅︎ Nov 07 2020

How do I find the best explanatory variable using Regression Analysis?

I have a hypothesis that one particular variable (x1) plays a significant role in determining 'y'. How can I use linear regression analysis to identify the best explanatory variable?

👍︎ 5

👤︎ u/beaninacan__

📅︎ Oct 01 2020

Converting year variable to time indices for panel data analysis

I have a data set comprising voter turnout across countries. I have certain other variables as well. Now, since election years do not necessarily coincide across countries, how do I change this year variable into a time index?
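One possible reading of "time index" is an election counter within each country: the first observed election gets 1, the next gets 2, and so on. The sketch below assumes that interpretation and uses invented countries and years; the function name `election_index` is hypothetical.

```python
from collections import defaultdict

def election_index(records):
    """Map each (country, year) pair to that country's election number.

    records: iterable of (country, year) pairs in any order.
    The earliest election year of each country gets index 1.
    """
    years_by_country = defaultdict(set)
    for country, year in records:
        years_by_country[country].add(year)
    return {
        (country, year): i
        for country, years in years_by_country.items()
        for i, year in enumerate(sorted(years), start=1)
    }

# Hypothetical panel: election years differ between countries
records = [("A", 2001), ("A", 2005), ("B", 2002), ("B", 2006), ("A", 2009)]
index = election_index(records)
print(index[("A", 2009)], index[("B", 2002)])
```

The index then lines up the n-th election of every country, at the cost of no longer comparing the same calendar years; whether that trade-off is acceptable depends on the model.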

👍︎ 2

👤︎ u/sanket39

📅︎ Jan 23 2021

Have variables that are ranges and having trouble using them for Stat. Analysis

I have data on 1000 different PC games and I need to use the number of owners as a variable in a regression. The problem is that owners is an estimated range written as, for example, "10,000,000 .. 20,000,000" in each cell for that column. Is there a way to parse or edit this so that it’s an actual range that Excel recognizes and not a string (if that makes sense)? I would hate to have to go one by one and change the values directly...
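Outside Excel, the string can be split into two numbers with a few lines of Python. This sketch assumes every cell follows the "low .. high" pattern from the example; the function name `parse_owner_range` is hypothetical, and using the midpoint as the regression variable is only one possible choice.

```python
import re

def parse_owner_range(text):
    """Split a string like '10,000,000 .. 20,000,000' into (low, high, midpoint)."""
    parts = re.split(r"\s*\.\.\s*", text.strip())
    # Drop the thousands separators before converting to integers
    low, high = (int(part.replace(",", "")) for part in parts)
    return low, high, (low + high) / 2

print(parse_owner_range("10,000,000 .. 20,000,000"))
```

Cells that do not match the pattern raise a ValueError, which makes malformed rows easy to spot before running the regression.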

👍︎ 3

👤︎ u/supesdupes420

📅︎ Oct 07 2020

Boris Johnson 'privately accepts' up to 50,000 annual Covid deaths as an acceptable level. i understands Downing Street will consider a cost-benefit analysis on both saving lives and effect of deaths on the UK economy before implementing contingency plans for further lockdowns inews.co.uk/news/boris-jo…

👍︎ 267

👤︎ u/CaravanOfDeath

📅︎ Aug 27 2021

[Question] Correlation analysis for one binary variable and one ordinal variable

Hi y'all!

I have a question: if I have a binary variable (e.g., outgoing vs. introvert) and one ordinal variable (e.g., comfort level of doing something, on a Likert scale of 1-5), is there a way to test the correlation between them? I don't think either Pearson or Spearman works here.

Any help is appreciated. Thanks!
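One measure sometimes used for a binary grouping and an ordinal score is the rank-biserial correlation, the effect size that goes with the Mann-Whitney U test. A minimal sketch with invented Likert ratings follows; the function name `rank_biserial` is hypothetical.

```python
def rank_biserial(group_a, group_b):
    """Rank-biserial correlation between two groups of ordinal scores.

    Compares every score in group_a with every score in group_b and
    returns (pairs where a is higher - pairs where a is lower) / all pairs.
    Ties count for neither side.
    """
    wins = sum(a > b for a in group_a for b in group_b)
    losses = sum(a < b for a in group_a for b in group_b)
    return (wins - losses) / (len(group_a) * len(group_b))

# Made-up comfort ratings (1-5) for the two groups
outgoing = [4, 5, 3, 5]
introvert = [2, 3, 1, 4]
print(rank_biserial(outgoing, introvert))
```

The value ranges from -1 to 1, with 0 meaning neither group tends to score higher; a significance test would come from the Mann-Whitney U test itself.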

👍︎ 2

👤︎ u/sammilol7

📅︎ Oct 19 2020

Question about variables and correlation analysis

Hello everybody!

I need some help with a statistics assignment regarding variables and linear correlation.

My first question is which index would you choose if you wanted to create a policy that supports the financially weaker social classes: the standard deviation of income or the 15th percentile of income? (I think it is the second one but I am not sure)

In the link (http://www.the-crises.com/wp-content/uploads/2010/12/gini-index-usa.jpg) there is a chart that shows the evolution of the Gini index in the US. What correlation coefficient do you think better describes the correlation between the values of the Gini index and the years after World War Two: a positive one, a negative one, one close to zero, or two coefficients? The reason I am having trouble with this is that it doesn't specify the number of years after the war, so I am assuming it is until 2009, and I chose the last option.

Do you think my answers are correct or would you choose something different?

Thanks for the help!

👍︎ 2

👤︎ u/tzortzinak95

📅︎ Dec 03 2020

Regression Analysis: Using Dummy/Conditional/Boolean-like Variables youtu.be/T7NaDtg-4Gs

👍︎ 2

👤︎ u/FCFF35

📅︎ Jan 07 2021

Digital Foundry: What Is VRS And Is It A Next-Gen Game-Changer? Variable Rate Shading Analysis! youtube.com/watch?v=YMf2G…

👍︎ 57

👤︎ u/Lulcielid

📅︎ Apr 30 2020

Analysis of 3 categorical variables

I have 3 categorical variables, 2 of which are binary and one of which has 4 categories. What kind of test can I do to tell whether the distribution of one binary variable differs across the levels of the 4-level variable, split by the other binary variable? Is a log-linear model the answer?

👍︎ 3

👤︎ u/Omar_Town

📅︎ Mar 06 2021