A list of posts related to "Linear Regression"
I come from an academic background with a solid stats foundation. The phrase "machine learning" seems to have a much narrower definition in my field of academia than it does in industry circles. I'm going through an introductory machine learning text at the moment, and I am somewhat surprised and disappointed that most of the material is stuff that would be covered in an introductory applied stats course. Is linear regression really an example of machine learning? And are linear regression, clustering, PCA, etc. what jobs are looking for when they seek someone with ML experience? Perhaps unsupervised learning and deep learning are closer to my preconceived notions of what ML actually is, but the book I'm going through only briefly touches on those.
Linear regression models are easier to implement, do not require any complex statistics libraries (OLS only requires basic matrix operations), need much less training data, can be interpreted and improved much more easily, and are less likely to overfit. They can approximate non-linear relationships via polynomial regression. Using some very basic OLS regression on market data to forecast future market direction in MATLAB shows some very promising results. I understand machine learning is useful when you don't have a clear list of features, but with algo trading you have so many features with clear statistical power (TA indicators, moving averages, past x values, etc.) that using deep learning for trading seems like throwing away all the knowledge you already have and trying to reinvent it.
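As a concrete illustration of the "basic matrix operations" point, here is a minimal OLS fit via the normal equations in numpy, on fabricated toy data:

import numpy as np

# Fabricated data: three features with known coefficients plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Prepend an intercept column, then solve the normal equations
# (X'X) beta = X'y -- nothing beyond basic matrix algebra.
Xb = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(beta)  # intercept near 0, slopes near 1.5, -2.0, 0.5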
I've been studying linear regression using the Kutner, Nachtsheim & Neter book and found myself confused about these 2 tests they mention:
F-test for regression relation: https://ibb.co/xfHwNyn
F-test for lack of fit: https://ibb.co/PCSnJrB
Apart from their respective test-statistic expressions, I don't understand exactly how their hypothesis sets differ, which leads me to regard them as exchangeable.
Can you please help me to understand?
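In case the images don't load, here is the standard presentation of the two tests for simple linear regression, following Kutner et al. (where c is the number of distinct X levels):

\text{Regression relation:}\quad H_0: \beta_1 = 0 \;\text{ vs. }\; H_a: \beta_1 \neq 0, \qquad F^* = \frac{MSR}{MSE}

\text{Lack of fit:}\quad H_0: E\{Y\} = \beta_0 + \beta_1 X \;\text{ vs. }\; H_a: E\{Y\} \neq \beta_0 + \beta_1 X, \qquad F^* = \frac{SSLF/(c-2)}{SSPE/(n-c)}

The first asks whether there is any linear relation at all; the second takes the linear form itself as the null and asks whether a straight line is an adequate description, which is why it needs replicate observations at the same X levels. The hypothesis sets are genuinely different, so the tests are not exchangeable.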
I am reading an applied statistics book and I am still confused about when to add interaction terms to a model.
In the book, they present the following example:
The dependent variable is health (scale 0-10) and the independent variable is overweight (yes/no). Several covariates are investigated as potential confounders. One of these is gender.
The book says that if you are interested in knowing whether gender influences the relationship between health and overweight, you should add gender to the model. The regression coefficient turns out to be insignificant.
Then the book continues and says that if you wish to investigate possible effect modification, you can do this by adding an interaction term with the potential effect modifier to the model: "Suppose there is interest in the question whether the relationship between overweight and quality of life is different for males and females; to investigate this possible modification, the interaction between gender and overweight is added to the model."
It does look like overweight men have a different regression coefficient than overweight women (it ends up not being significant in the example).
What confuses me is that both ways investigate the effect of gender on the health-overweight relationship, yet one variable is added directly to the model and the other as an interaction term. The stated reasoning seems to be exactly the same, yet the method is different.
Could someone explain where exactly the difference lies? And does this mean that an interaction term between gender and the independent variable of interest must always be investigated?
Should you then also look at interaction terms between gender and age, gender and city, etc. (the other covariates that were investigated as potential confounders)?
Thanks in advance.
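To make the distinction concrete, here is a hedged sketch in Python with statsmodels; the file and column names are assumptions, not the book's data. Adding gender as a plain covariate only shifts the baseline for one group, while the interaction lets the overweight coefficient itself differ by gender:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with the variables from the example above.
df = pd.read_csv("health.csv")  # columns: health, overweight, gender (assumed)

# Gender as a confounder: the same overweight slope for both genders,
# plus a separate intercept shift for one of them.
m_confounder = smf.ols("health ~ overweight + gender", data=df).fit()

# Gender as an effect modifier: "overweight * gender" expands to both
# main effects plus their interaction, so the overweight coefficient
# is allowed to differ between males and females.
m_modifier = smf.ols("health ~ overweight * gender", data=df).fit()

print(m_confounder.params)
print(m_modifier.params)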
I'm tinkering with a new system and would like to bounce some ideas.
Edit: one of many, thanks for the engagement, I really appreciate it. So I'm a complete dummy at coding, but I can hack things together OK. How is linear regression used versus, say, stochastics? What does an example look like in execution compared to something a discretionary trader would use? I have about 50 or so different strategies that I have developed and trade, mostly variations of the same ideas. Examples would be mean reversion, trend following, and some scalping tactics, plus a lot of tape reading in discretionary trading. I have no idea how to test an idea with options; I can't find the data to backtest a strategy even if I had it written out in code.
Some other questions: how would I test and write out things like a gap fill or an opening range break? Those seem pretty straightforward, but is there a template out there that I can hack apart and start messing with the variables (a minimal sketch follows below)? Options data doesn't seem easy to come by. I'm also curious about execution speed. There are times when I get some news before the volume comes in, which leads me to think the algos are waiting for a confirmation, or something else is at play. There are also times when I can see a vol spike, pretty small overall, before the volume takes off, so it seems like algos are triggering more algos before retail/prop jumps in. I've also observed some weird things with option pricing, so I really have questions for days and am just trying to peer inside your world.
Again, thanks for the feedback. I really appreciate all the insight.
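On the opening-range-break question, a bare-bones template might look like the sketch below. The file name, column names, and session times are assumptions, and it deliberately handles a single trading day only:

import pandas as pd

# Load one day of intraday bars; "intraday.csv" and its columns
# (time, high, low, close) are hypothetical placeholders.
bars = pd.read_csv("intraday.csv", parse_dates=["time"], index_col="time")

# Define the opening range as the first 30 minutes of the session.
opening = bars.between_time("09:30", "10:00")
range_high = opening["high"].max()
range_low = opening["low"].min()

# Flag the bars after the opening range that escape it in either direction.
rest = bars.between_time("10:00", "16:00")
breakouts = rest[(rest["close"] > range_high) | (rest["close"] < range_low)]
print(breakouts.head())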
I am trying to predict the probability of a new post-harvest with a linear regression by looking at the post-harvest data in the columns 'total' and 'average'; the probabilities are calculated for the post-harvest and the median. The data points to a single post-harvest with a median of one post-harvest, and we are looking for the probability that the median is 0.7.
import numpy as np
import pandas as pd
import tensorflow as tf

# The original snippet was garbled, so this is a best-effort reconstruction
# of the apparent intent: fit a linear model on the 'total' and 'average'
# columns. The file name and target column are assumptions.
df = pd.read_csv("post_harvest.csv")
X = df[["total", "average"]].to_numpy(dtype="float32")
y = df["probability"].to_numpy(dtype="float32")

# A single Dense unit with no activation is an ordinary linear regression.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=100, verbose=0)

# Predict for a new observation.
print(model.predict(np.array([[1.0, 0.7]], dtype="float32")))
Hello. I am a computer science student working on an independent project, and I am looking for someone who can point me in the right direction. What I'm trying to do seems fairly straightforward, but I'm having trouble finding the right statistical tools because I don't know enough to know the name of what I'm looking for.
Here's the problem I'm trying to solve:
In the housing market, each house has a number of variables, such as number of bedrooms, number of bathrooms, square footage, and market price. Given a large sample of this kind of data, how can I then predict the market price of a new house if I know all of its other variables except market price?
Presumably, the relationship would be non-linear, so should I be looking for non-linear regression analysis? I also looked at principal component analysis, but I don't really understand how that's different. Also, if anyone has a recommendation of a Java library which includes these functions, that would be amazing. Thank you.
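You asked about a Java library, but to stay consistent with the Python used elsewhere in this collection, here is a hedged sketch with scikit-learn: polynomial features feed an ordinary linear regression, which covers mild non-linearity without a separate non-linear solver. The numbers are fabricated toy data, far too small for a real fit:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy rows: [bedrooms, bathrooms, square footage] -> market price.
X = np.array([[3, 2, 1500], [4, 3, 2200], [2, 1, 900],
              [5, 4, 3000], [3, 1, 1200], [4, 2, 1800]])
y = np.array([250_000, 340_000, 150_000, 500_000, 200_000, 290_000])

# Degree-2 polynomial features let the fit bend while the estimator
# itself remains a plain linear regression in the expanded features.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[3, 2, 1800]]))  # price estimate for an unseen house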
I am currently working on a basic machine learning project that revolves around predicting house prices given several different features of the house. Some of these features are of type float instead of type int. Examples of this are bathrooms (can be 1.5), floors (there are houses with 1.5 and 2.5 floors in this dataset), and bedrooms (can be 1.5, 2.5, ... in this dataset).
I was looking at other similar projects online and came across this:
"Remember that, it is essential to change float types to integer types because linear regression is supported only on integer type variables. It can be converted using the βastypeβ function in python."
Finally, my question is: do I have to convert floats to integers for a linear regression machine learning model, or can I use floats? I want to use floats because I feel that by converting to int type I will lose a lot of important data, e.g., a 1-bedroom house and a 1.5-bedroom house will be the same after the conversion.
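That quoted advice is simply wrong: ordinary least squares is defined for real-valued inputs, and scikit-learn's LinearRegression accepts floats directly, so no conversion is needed. A minimal check with fabricated numbers:

import numpy as np
from sklearn.linear_model import LinearRegression

# Float-valued features (bathrooms, floors) work without any conversion.
X = np.array([[1.5, 2.0], [2.5, 1.0], [3.0, 2.5], [1.0, 1.5]])
y = np.array([210_000.0, 250_000.0, 320_000.0, 180_000.0])

model = LinearRegression().fit(X, y)
print(model.predict([[1.5, 2.0]]))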
I am about to complete the IBM Data Science certificate, and I feel that I don't have a solid grasp of Regression, especially the maths behind it.
Are there any good, reputable, in-depth courses that focus specifically on regression, ideally using Python? I don't want to have a general overview again, I am looking for a course with depth.
Thanks!
Title explains it, but here's more detail. I've got a number of variables that I would like to use to predict multiple outcome/dependent variables. I assume running separate univariate multiple regressions for each dependent variable is a bad idea. I believe a multivariate multiple regression is what I'd need; however, I may be wrong. How can I calculate the sample size needed a priori?
I've done some reading but I've gotten irritated/frustrated so I'm taking a bit to cool down/post here.
Hello, I have gathered some data on a pendulum that looks like the following:
Number of full periods n | Time T(n) [s]
---|---
3 | 5.34
5 | 8.73
7 | 12.07
9 | 15.69
Now I want to calculate T for a single period. I tried it in two different ways:
1. Linear regression of T(n) against n, taking the slope as the period, which gives roughly 1.72 seconds.
2. [ T(3)/3 + T(5)/5 + T(7)/7 + T(9)/9 ] / 4 = 1.75 seconds.
Intuitively, I would think that the second method is the correct one, but I can't figure out why linear regression wouldn't work in that case. Any solutions to my problem would be appreciated!
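For what it's worth, both numbers can be reproduced with numpy from the table above. The discrepancy comes from the regression's intercept: since T(n)/n = a + b/n, any constant timing offset b (for example, reaction time when starting the stopwatch) gets smeared into the averaged estimates, while the regression slope absorbs only the per-period time:

import numpy as np

n = np.array([3, 5, 7, 9])
T = np.array([5.34, 8.73, 12.07, 15.69])

# Averaging the per-period estimates T(n)/n.
print((T / n).mean())                 # about 1.75 s

# Least-squares line T(n) = a*n + b; the slope a is the period and the
# intercept b absorbs any constant timing offset.
slope, intercept = np.polyfit(n, T, 1)
print(slope, intercept)               # slope about 1.72 s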
Hello guys, I need to run a linear model where the dependent variable is a stock (an ETF, precisely) price over a period of six months, daily. I need to include all the variables that, according to economic theory, are relevant to explaining the movement of stock prices. Could you suggest what these are?
Thank you so much
Would it be fitting to have an R² value for a quadratic? Excel gives a value for this, but I was under the impression you could only use R² values in linear regression. What about R values? I think that one is OK to use with non-linear regression, right?
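For reference, R² = 1 − SSE/SST is well defined for any fitted model, and a quadratic fit by least squares is still linear in its coefficients. A quick numpy sketch with fabricated data, computing it by hand:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 20)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=20)

# Fit a quadratic by least squares and evaluate it on the same x values.
coeffs = np.polyfit(x, y, 2)
y_hat = np.polyval(coeffs, x)

# R^2 = 1 - SSE/SST works for any fitted model, not only a straight line.
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)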
Hi again!
I've created a simple linear regression model with Y as the response and X as a predictor. I have to answer how well my model explains the variability (in percentage) in Y.
So I used summary(), but I am not sure which number I should look at: adjusted R-squared?
F statistic: F = MSR/MSE = (SSR/k) / (SSE/(n-k-1)). When calculating MSE, why do we take the denominator as n-k-1 and not just k, as is done in MSR?
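For reference, the degrees-of-freedom bookkeeping behind that ratio, with k predictors and n observations, written out in LaTeX:

F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}, \qquad \underbrace{n-1}_{\text{total}} = \underbrace{k}_{\text{regression}} + \underbrace{n-k-1}_{\text{error}}

The error sum of squares loses one degree of freedom for each of the k+1 estimated coefficients (k slopes plus the intercept), which is why its denominator is n-k-1 rather than k.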
Does anyone have any good recommendations for books on applied linear models/regression that would be accessible to undergrads with minimal math backgrounds? I'm talking about students who have taken some type of intro stats course but might not have much mastery of calculus, let alone matrix algebra. I've seen some books that technically don't require calculus but still have complicated algebraic derivations that I think many students would struggle with due to a general lack of mathematical maturity. I expect many will still be largely in the high-school mindset that math/stats means you plug numbers into formulas and compute other numbers.
I'm looking for something that can get at the concepts and applications of linear models and start pushing students to transition to a more mature mindset about statistics but doesn't totally throw them into the deep end right off the bat.
Thanks in advance!
I have come across various resources that cite the following as assumptions of linear regression:
Linearity, independence of errors, homoscedasticity, no autocorrelation, and normality (of errors).
My question is: when we diagnose the model fit of a linear regression, is there a particular order of importance for checking these assumptions? Some say normality is the one we should worry about least.
So, in my view, the order of importance is as follows (1 being most important, 5 least important):
What do you think?
Hello everyone, I'm new to stats and I just need clarification on two ideas that I can't understand. If anyone could explain them to me, I would greatly appreciate it.
Hello r/learnprogramming community. I love machine learning and I've been part of this community for a while. I've seen a number of people talking about machine learning, so I've decided to make an accessible machine learning course that I believe beginners will be able to follow. For example, I've started with a simple Introduction to Machine Learning: Linear Regression guide, written in Python. Please feel free to let me know if you have any questions; I'll be around for a bit and happy to answer them in the comments below!
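For illustration, a minimal sketch of the kind of first example such a guide might open with: a straight-line fit by gradient descent in plain numpy, with fabricated data:

import numpy as np

# Fabricated data from a known line (slope 3, intercept 4) plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 3 * x + 4 + rng.normal(scale=1.0, size=50)

# Gradient descent on mean squared error (the factor 2 in the gradient
# is folded into the learning rate).
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    err = (w * x + b) - y
    w -= lr * (err * x).mean()
    b -= lr * err.mean()

print(w, b)  # should land near 3 and 4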
I am attempting to develop a model to predict rates of internal capture of jobs using features related to housing characteristics. The dependent variable, the rate of internal capture, has a binomial distribution. So, should I simply standardize all variables and use standard linear regression (yes, I know a normal distribution is not a requirement for linear regression), or do I need to use a binomial regression? Is a binomial regression intuitively the same thing as a linear regression? Also, if we are trying to predict something continuous (like a rate), do we need to turn grouped variables into binary groups so as to treat them as Bernoulli processes? Thanks for your help!
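A hedged sketch of the binomial alternative with statsmodels; the file and column names are assumptions. A binomial GLM with the default logit link keeps fitted rates inside [0, 1], which ordinary linear regression on the raw rate does not guarantee:

import pandas as pd
import statsmodels.api as sm

# Hypothetical frame: 'capture_rate' is a proportion in [0, 1] and the
# remaining columns are housing features (all names are placeholders).
df = pd.read_csv("internal_capture.csv")
X = sm.add_constant(df[["density", "median_rent", "units"]])

model = sm.GLM(df["capture_rate"], X, family=sm.families.Binomial()).fit()
print(model.summary())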
So I won't lie, quant has never been my strong suit. However, Mark Meldrum has provided me with a smooth passage through the quant section... until I reached regression. This makes no sense to me, and I'm having a hard time finishing the video on it because the basics don't make any sense to me whatsoever. Anyone have any tips? Perhaps a simpler explanation on YouTube or something. Also, a quick question: how hard/easy is FRA compared to quant? I plan on starting that as soon as I finish regression.
While performing diagnostic checks on a linear regression model, should one check for autocorrelation when there is heteroscedasticity? I came across some materials in which they say heteroscedasticity implies no autocorrelation, and hence there is no need to run a separate test for autocorrelation.
Is my understanding correct ?
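Heteroscedasticity and autocorrelation are separate questions, so it is reasonable to test for both. A small statsmodels sketch with simulated data, assuming a fitted OLS model:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Simulated design and response; substitute your own data here.
rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)
res = sm.OLS(y, X).fit()

print(durbin_watson(res.resid))           # near 2 suggests little autocorrelation
print(het_breuschpagan(res.resid, X)[1])  # Breusch-Pagan LM p-value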
Need someone who can do my homework