Why do data scientists refer to traditional statistical procedures like linear regression and PCA as examples of machine learning?

I come from an academic background, with a solid stats foundation. The phrase 'machine learning' seems to have a much more narrow definition in my field of academia than it does in industry circles. Going through an introductory machine learning text at the moment, and I am somewhat surprised and disappointed that most of the material is stuff that would be covered in an introductory applied stats course. Is linear regression really an example of machine learning? And is linear regression, clustering, PCA, etc. what jobs are looking for when they are seeking someone with ML experience? Perhaps unsupervised learning and deep learning are closer to my preconceived notions of what ML actually is, which the book I'm going through only briefly touches on.

👍 321
👤 u/darkness1685
📅 Jan 13 2022
Is there any point in using machine learning models when you have linear regression?

Linear regression models are easier to implement, do not require any complex statistics libraries (OLS only requires basic matrix operations), need much less training data, can be interpreted and improved much more easily, and are less likely to overfit. They can approximate nonlinear relationships with polynomial regression. Using some very basic OLS regression on market data to forecast future market direction in MATLAB shows some very promising results. I understand machine learning is useful when you don't have a clear list of features, but with algo trading you have so many features you can use that have clear statistical power (TA indicators, moving averages, past x values, etc.) that using deep learning for trading seems like throwing away all the knowledge you already have and trying to reinvent it.
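
To make the "basic matrix operations" point concrete, here is a minimal sketch in Python with NumPy (the post itself mentions MATLAB). It fits a quadratic by ordinary least squares through the normal equations. The data are synthetic and noise-free, chosen only so the result is easy to check; nothing here comes from market data.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration): y = 1 + 2x + 3x^2
x = np.linspace(-1.0, 1.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Polynomial regression is still linear in the coefficients:
# the design matrix just gets extra columns for x and x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Normal equations (X'X) b = X'y, solved without forming an explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # approximately [1, 2, 3]
```

In practice `np.linalg.lstsq` is usually the safer choice, because the normal equations become ill-conditioned when columns are nearly collinear, which higher-degree polynomial terms tend to be.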

👍 105
👤 u/OSfrogs
📅 Dec 11 2021
[Q] In multiple linear regression: What's the difference between F-tests for lack of fit and for regression relation?

I've been studying linear regression using the Kutner, Nachtsheim & Neter book and found myself confused about these 2 tests they mention:

F-test for regression relation: https://ibb.co/xfHwNyn

F-test for lack of fit: https://ibb.co/PCSnJrB

Apart from their respective test statistic expressions, I don't understand exactly how their hypotheses differ, so I have been treating them as interchangeable.

Can you please help me to understand?

👍 21
📅 Jan 10 2022
[Q] When to add an interaction term to linear regression model

I am reading an applied statistics book and I am still confused about when to add interaction terms to a model.

In the book, they present the following example:
The dependent variable is health (scale 0-10) and the independent variable is overweight (yes/no). Several covariates are investigated as potential confounders. One of these is gender.

The book says that if you are interested in knowing whether gender influences the relationship between health and overweight, then you should add gender to the model. The regression coefficient turns out to be insignificant.

Then the book continues and says that if you wish to investigate possible effect modification, you can do this by adding an interaction term with the potential effect modifier to the model. "Suppose there is interest in the question whether the relationship between overweight and quality of life is different for males and females, to investigate this possible modification, the interaction between gender and overweight is added to model."

It does look like overweight men have a different regression coefficient than overweight women (it ends up not being significant in the example).

What confuses me is that both ways are investigating the effect of gender on health and overweight, yet one is added normally to the model and the other as an interaction term. The formulation of the "reasoning" seems to be exactly the same, yet the method is different.

Could someone explain where exactly the difference is, and does this mean that an interaction term between gender and the independent variable of interest must always be investigated?

Should you then also look at interaction terms between gender and age, gender and city, etc. (the other covariates which were investigated as potential confounders).

Thanks in advance.
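
One way to see the difference is to write out the two design matrices side by side. The sketch below uses made-up numbers and hypothetical 0/1 codings (`overweight`, `female`); it is not the book's data. In the first model gender only shifts the average health level, so there is a single overweight effect. In the second, the product column lets the overweight effect itself differ by gender.

```python
import numpy as np

# Made-up illustration data (not from the book): 0/1 indicators and a health score.
overweight = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
female = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
health = np.array([8.0, 7.0, 6.0, 5.0, 8.0, 8.0, 3.0, 4.0])

ones = np.ones_like(health)

# Model 1 (gender as a covariate): one overweight effect shared by both genders,
# adjusted for gender differences in average health.
X_main = np.column_stack([ones, overweight, female])
b_main, *_ = np.linalg.lstsq(X_main, health, rcond=None)

# Model 2 (effect modification): the product column allows a separate
# overweight effect for each gender.
X_int = np.column_stack([ones, overweight, female, overweight * female])
b_int, *_ = np.linalg.lstsq(X_int, health, rcond=None)

effect_men = b_int[1]               # overweight effect when female == 0
effect_women = b_int[1] + b_int[3]  # overweight effect when female == 1
print(b_main[1], effect_men, effect_women)
```

With these invented numbers the shared effect in the first model sits between the two gender-specific effects from the second, which is the sense in which adjusting for a covariate and testing effect modification answer different questions.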

👍 3
👤 u/pashtun92
📅 Jan 01 2022
Anybody running a linear regression model

I'm tinkering with a new system and would like to bounce some ideas.

Edit: one of many, thanks for the engagement, I really appreciate it. So I'm a complete dummy at coding but I can hack shit up OK. How is linear regression used vs, say, stochastic? What does an example look like in execution compared to something a discretionary trader would use? I have about 50 or so different strategies that I have developed and trade, mostly variations of the same. Examples would be mean reversion, trend following and some scalping tactics. Also a lot of tape reading in discretionary trading. I have no idea how to test an idea with options; I can't find the data to backtest a strat even if I had it written out in code.

Some other questions: how would I test and write out things like a gap fill or an opening range break? Those seem pretty straightforward, but is there a template out there that I can kinda hack apart and start messing with the variables? Options data doesn't seem easy to come by. I'm also curious about execution speed. There are times that I get some news before the volume comes in, so that would lead me to think that the algos are waiting for a confirmation or something is at play there. Also there are times where I can see a vol spike, but pretty small overall, before the volume takes off, so it seems like algos are triggering more algos, then retail/prop jumps in. I've also observed some weird things with option pricing, so I really have questions for days and am just trying to peer inside you guys' world.

Again, thanks for the feedback, I really appreciate all the insight.

👍 59
👤 u/FloridaMann_kg
📅 Nov 16 2021
How to predict the probability of a new post-harvest in a linear regression?

I am trying to predict the probability of a new post-harvest in a linear regression. The data points to a single post-harvest with a median of one post-harvest, and we are looking for the probability that the median is 0.7. For the mean, I have to look at the post-harvest data in the columns 'total' and 'average', and the probabilities are calculated for the post-harvest and the median.

I am trying to predict the probability of a new post-harvest in a linear regression by looking at the post-harvest data in the columns 'total' and 'average', and the probabilities are calculated for the post-harvest and the median.

import numpy as np
import pandas as pd


def fit_line(df, feature="total", target="average"):
    # Assumption: the post names the columns 'total' and 'average' but never
    # says which one is the response, so 'average' is regressed on 'total' here.
    x = df[feature].to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)
    # Design matrix with an intercept column, fitted by ordinary least squares.
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, slope]


# The post-harvest data is not shown in the post, so the call is left commented out:
# intercept, slope = fit_line(pd.DataFrame(df))
👍 3
👤 u/abstract_void_bot
📅 Jan 14 2022
Should I use non-linear regression modeling?

Hello. I am a computer science student working on an independent project, and I am looking for someone who can point me in the right direction. What I'm trying to do seems fairly straightforward, but I'm having trouble finding the right statistical tools because I don't know enough to know the name of what I'm looking for.

Here's the problem I'm trying to solve:

In the housing market, each house will have a number of variables, such as number of bedrooms, number of bathrooms, square footage, and market price. Given a large sample of this kind of data, how can I then predict the market price of a new house if I know all of its other variables except market price?

Presumably, the relationship would be non-linear, so should I be looking for non-linear regression analysis? I also looked at principal component analysis, but I don't really understand how that's different. Also, if anyone has a recommendation of a Java library which includes these functions, that would be amazing. Thank you.

👍 2
👤 u/Malchar2
📅 Jan 05 2022
Can I use features of type float in a linear regression machine learning model?

I am currently working on a basic machine learning project which revolves around predicting house prices given several different features of the house. Some of these features are of type float instead of type int. Examples of this are bathrooms (can be 1.5), floors (there are houses with 1.5 and 2.5 floors in this dataset), and bedrooms (can be 1.5, 2.5... in this dataset).

I was looking at other similar projects online and came across this:

"Remember that, it is essential to change float types to integer types because linear regression is supported only on integer type variables. It can be converted using the 'astype' function in python."

Finally, my question is, do I have to convert floats to integers for a linear regression machine learning model or can I use floats? I want to use floats because I feel like by converting to int type, I will lose a lot of important data. Ex- 1 bedroom house and 1.5 bedroom house will be the same after the conversion.
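
A quick check that least squares has no trouble with non-integer features, sketched with NumPy. The rows and prices below are invented for illustration; only the dtypes matter. The last lines show what the `astype(int)` conversion from the quoted advice would do to the bathroom column.

```python
import numpy as np

# Invented example rows: [bathrooms, floors], both stored as floats.
X = np.array([[1.0, 1.0],
              [1.5, 1.0],
              [2.0, 1.5],
              [2.5, 2.0],
              [3.0, 2.5]])
# Prices generated exactly from 50 + 40*bathrooms + 20*floors (in thousands).
price = 50.0 + 40.0 * X[:, 0] + 20.0 * X[:, 1]

A = np.column_stack([np.ones(len(X)), X])         # add an intercept column
coef, *_ = np.linalg.lstsq(A, price, rcond=None)  # ordinary least squares

# Truncating to int collapses 1.0 and 1.5 bathrooms into the same value.
X_int = X.astype(int)
n_float_levels = np.unique(X[:, 0]).size
n_int_levels = np.unique(X_int[:, 0]).size
print(coef, n_float_levels, n_int_levels)
```

The fit recovers the generating coefficients from the float columns, while the integer version of the bathroom column has fewer distinct values, which is the information loss the question worries about.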

👍 4
👤 u/rztxx
📅 Jan 01 2022
In depth course on regression (linear/non-linear) [E]

I am about to complete the IBM Data Science certificate, and I feel that I don't have a solid grasp of Regression, especially the maths behind it.

Are there any good, reputable, in-depth courses that focus specifically on regression, ideally using Python? I don't want to have a general overview again, I am looking for a course with depth.

Thanks!

👍 3
👤 u/jacques_413
📅 Jan 04 2022
Statistics Help: How can I find sample size required for Multivariate Multiple Linear Regression

Title explains it but here's more detail. I've got a number of variables that I would like to use to predict multiple outcome/dependent variables. I assume running separate univariate multiple regressions for each dependent variable is a bad idea. I believe a multivariate multiple regression is what I'd need, however I may be wrong. How can I calculate the sample size needed a priori?

I've done some reading but I've gotten irritated/frustrated so I'm taking a bit to cool down/post here.

👍 6
👤 u/HugsForThugs1
📅 Dec 24 2021
Finding the Period of a Pendulum using linear regression (does this work?)

Hello, I have gathered some data on a pendulum that looks like the following:

Number of full periods n    Time T(n) [s]
3                           5.34
5                           8.73
7                           12.07
9                           15.69

Now I want to calculate T for a single period. I tried it in two different ways:

  1. Linear regression of T on n: this yields T(1) = 1.86 seconds. [seems a little too high...]
  2. Calculating each period individually and taking the average:

[ T(3)/3 + T(5)/5 + T(7)/7 + T(9)/9 ] / 4 = 1.75 seconds.

Intuitively I would think that the second method is the correct one, but I can't figure out why linear regression wouldn't work in that case. Any solutions to my problem would be appreciated!
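
Both calculations can be reproduced from the four measurements in the post. In this NumPy sketch the fitted line is T = intercept + slope * n, so the value read off at n = 1 includes the intercept, whereas the slope on its own is the time added per extra period. Reading the intercept as a constant start/stop timing offset is an assumption, not something the post states.

```python
import numpy as np

n = np.array([3, 5, 7, 9], dtype=float)    # number of full periods
T = np.array([5.34, 8.73, 12.07, 15.69])   # measured total time in seconds

# Fit T = intercept + slope * n by least squares.
slope, intercept = np.polyfit(n, T, 1)

t_at_1 = intercept + slope * 1.0   # what "T(1)" from the fitted line means
mean_ratio = np.mean(T / n)        # method 2: average of T(n)/n

print(slope, intercept, t_at_1, mean_ratio)
```

For this data the slope comes out near 1.72 s and the intercept near 0.14 s, which together give the 1.86 s from method 1, while the averaged ratios give about 1.75 s. If there is a constant offset in each timing, dividing T(n) by n spreads it over n periods, so method 2 would sit slightly above the slope.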

👍 2
📅 Jan 12 2022
Independent variables for linear regression model (stock price)

Hello guys, I need to run a linear model where the dependent variable is a stock (an ETF, to be precise) price over a period of 6 months, daily. I need to include all the variables that, according to economic theory, are relevant to explaining movements in stock prices. Could you suggest what these are?

Thank you so much

👍 4
👤 u/leosomma
📅 Dec 30 2021
Interpreting linear regression results: please refer to the comment section. Help appreciated!!
👍 3
👤 u/dx1sy
📅 Dec 21 2021
Is this enough homoscedasticity for linear regression?
👍 33
👤 u/malachai926
📅 Nov 21 2021
Using R and R2 values in non-linear regressions (quadratic)

Would it be fitting to have an R2 value for a quadratic? Excel gives a value for this, but I was under the impression you could only use R2 values in linear regression. What about R values, I think that one is ok to use with non linear regression right?
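
For context, R2 computed as 1 - SS_res/SS_tot is defined for a quadratic fit too, because a quadratic is still fitted by least squares with coefficients that enter linearly. A minimal NumPy sketch on invented data (the x and y values are made up for illustration):

```python
import numpy as np

# Synthetic data (made up for illustration): a roughly quadratic trend.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 5.2, 9.8, 17.1, 26.0])

coeffs = np.polyfit(x, y, 2)   # least-squares quadratic
y_hat = np.polyval(coeffs, x)

ss_res = np.sum((y - y_hat) ** 2)      # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r_squared = 1.0 - ss_res / ss_tot

# For a least-squares fit with an intercept, R^2 equals the squared
# correlation between the observed and the fitted values.
r_obs_fit = np.corrcoef(y, y_hat)[0, 1]
print(r_squared, r_obs_fit ** 2)
```

The plain correlation R between x and y, by contrast, only measures straight-line association, so for a curved fit the correlation between observed and fitted values is the more meaningful quantity.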

👍 4
📅 Jan 10 2022
Linear Regression Model - variability

Hi again!

I've created a simple linear regression model with Y as the response and X as a predictor. I have to answer how well my model explains the variability (as a percentage) in Y.

So I used summary, but I am not sure which number I should look at. Adjusted R-squared?

👍 2
👤 u/Moonsea96
📅 Jan 07 2022
How true is this? Do Economists only use Linear Regression in data modeling? v.redd.it/8uql22tz6mb81
👍 4
👤 u/nownerds123
📅 Jan 14 2022
The very basics of linear regression. My first article. Please have a look and give input.
👍 2
👤 u/StandardFlat2561
📅 Dec 21 2021
Linear Regression

F statistic = MSR/MSE = (RSS/k) / (SSE/(n-k-1)). When calculating MSE, why do we take the denominator as n-k-1? Why not just k, as is done in MSR?
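
A small sketch of where the two denominators come from, using the post's setup of n observations and k predictors. The data are randomly generated for illustration and the names `ss_reg` and `ss_err` are hypothetical labels for the post's RSS and SSE. The regression sum of squares has k degrees of freedom, while the residuals lose one degree of freedom for each of the k + 1 estimated coefficients (k slopes plus the intercept), leaving n - k - 1. The two add up to the n - 1 of the total sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2                                # observations, predictors
X = rng.normal(size=(n, k))                 # synthetic predictors
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])        # k + 1 columns including the intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

ss_reg = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares, df = k
ss_err = np.sum((y - y_hat) ** 2)           # error sum of squares, df = n - k - 1
ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares, df = n - 1

msr = ss_reg / k
mse = ss_err / (n - k - 1)
f_stat = msr / mse
print(f_stat)
```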

👍 8
📅 Dec 09 2021
[Q] Books on applied linear models/regression for undergraduates with minimal math backgrounds

Does anyone have any good recommendations for books on applied linear models/regression that would be accessible to undergrads with minimal math backgrounds? I'm talking about students who have taken some type of intro stats course but might not even have much mastery of calculus, let alone matrix algebra. I've seen some books that technically don't require calculus, but still have complicated algebraic derivations that I think many students would struggle with due to a general lack of mathematical maturity. I expect many will still be largely in the high school mindset that math/stats means you plug numbers into formulas and compute other numbers.

I'm looking for something that can get at the concepts and applications of linear models and start pushing students to transition to a more mature mindset about statistics but doesn't totally throw them into the deep end right off the bat.

Thanks in advance!

👍 23
👤 u/FairPlayWes
📅 Nov 18 2021
[Question] Is there a particular order of importance when it comes to checking the assumptions of Linear Regression?

I have come across various resources which cite the below as assumptions of linear regression.

Linearity, Independence of errors, Homoscedasticity, Autocorrelation, Normality (of errors)

My question is, when we diagnose the model fit of a linear regression, is there a particular order of importance for checking these assumptions? Some say Normality is the least we should worry about.

So according to me, the order of importance is as follows (1 being most important, 5 least important):

  1. Linearity
  2. Independence of errors
  3. Homoscedasticity
  4. Autocorrelation
  5. Normality

What do you think?

👍 29
👤 u/venkarafa
📅 Nov 25 2021
[Q] Regarding my Linear Regression Model.

Hello everyone, I'm new to stats and I just need clarification on 2 ideas that I can't understand. If anyone could explain them to me I would greatly appreciate it.

  1. Why does adding an extra variable to my regression decrease my R^2 dramatically? This is not the model's adjusted R^2 but just R^2. I was under the impression that R^2 increases regardless and adjusted R^2 is the one that decreases if the extra estimator isn't adding much to the model.
  2. By adding a variable (beta_3), my beta_1 coefficient (the price of a commodity) went from -1200 to -5000, which is a massive change. What does that say? I'm confused because I can't tell if adding the variable is making my model biased or if it's the absence of the variable that's making it biased.
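
On point 1, a short NumPy sketch of the usual behavior: on the same rows of data, with an intercept in both models, ordinary least squares R^2 cannot go down when a column is added. The data are randomly generated for illustration and `r_squared` is a hypothetical helper. A drop like the one described often points to something else changing between the two fits, for example rows being dropped because the new variable has missing values; that is a guess about the situation, not something the post confirms.

```python
import numpy as np

def r_squared(A, y):
    """R^2 of an OLS fit of y on the columns of A (A includes the intercept)."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)   # an unrelated extra predictor
y = 3.0 - 2.0 * x1 + rng.normal(size=n)

small = np.column_stack([np.ones(n), x1])
large = np.column_stack([np.ones(n), x1, x2])

r2_small = r_squared(small, y)
r2_large = r_squared(large, y)
print(r2_small, r2_large)
```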
👍 2
👤 u/Sinsiski
📅 Dec 07 2021
A quick take on options (Scientist perspective). You want to reduce the number of variables not create additional. In any model we try a reduction in variables through linear regression. In other words, if buy, hold, drs has been the way then keep it.
👍 84
👤 u/Gorilli0naire
📅 Nov 16 2021
A simple introduction to Machine Learning: Linear Regression

Hello r/learnprogramming community. I love machine learning and I've been part of this community for a while, I've seen a number of people talking about machine learning so, I've decided that I would make an accessible machine learning course that I believe beginners should be able to implement. For example, I've started with this simple Introduction to Machine Learning: Linear Regression. This guide is written in Python. Please feel free to let me know if you have any questions, I'll be around for a bit and I'll be happy to answer them in the comments below!

👍 95
👤 u/help-me-grow
📅 Nov 04 2021
[Question] Binomial regression v linear with binomial variables

I am attempting to develop a model to predict rates of internal capture of jobs using features related to housing characteristics. The dependent variable, the rate of internal capture, has a binomial distribution. So, should I simply standardize all variables and use standard linear regression (yes, I know a normal distribution is not a requirement for linear regression), or do I need to use a binomial regression? Is a binomial regression intuitively the same thing as a linear regression? Also, if we are trying to predict something continuous (like a rate), do we need to turn group variables into binary groups so that they are Bernoulli processes? Thanks for your help!

👍 5
👤 u/pinus_mugo
📅 Dec 05 2021
Linear regression models indicated that celebrity worship was associated with lower performance on the cognitive tests bmcpsychology.biomedcentr…
👍 10
👤 u/nevergirls
📅 Jan 07 2022
Learning about basic regression and am confused on how the math logic in the bottom two lines makes sense. Also why do we assume that yi = xi for all i, wouldn't that mean that the data is perfectly linear?
👍 12
👤 u/BTDGoat
📅 Dec 21 2021
What to buy? LOOK! Bag some TLOS as it is starting to show an uptrend reversal on its 4h tf chart. We can see here that it has already had its 3-wave drop. What normally happens after this is a reversal. And the last candle is above the middle of the linear regression channel. Next targets would be $0.76 - 0.78 - 0.83.
👍 11
👤 u/No_Geologist_1826
📅 Dec 13 2021
t-test: when testing the estimated regression coefficients, "why" was it decided to use a t-test, and based on what information? I've read that we use a t-test whenever we want to compare the means of 2 groups; how does this apply to linear regression coefficients? Thank you.
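
For reference, a sketch of how the coefficient t-statistic is built, using NumPy on randomly generated data (the numbers are illustrative only). Each estimated coefficient is divided by its standard error; because the error variance has to be estimated from the residuals, the ratio follows a t distribution with n - p degrees of freedom under normally distributed errors, where p is the number of estimated coefficients. When the single predictor is a 0/1 group indicator, the slope is the difference in group means and this test coincides with the equal-variance two-sample t-test.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)       # synthetic data

A = np.column_stack([np.ones(n), x])         # intercept and slope columns
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

p = A.shape[1]                               # number of estimated coefficients
sigma2 = resid @ resid / (n - p)             # estimated error variance
cov_beta = sigma2 * np.linalg.inv(A.T @ A)   # estimated covariance of the estimates
se = np.sqrt(np.diag(cov_beta))              # standard errors

t_stats = beta / se                          # tests H0: coefficient = 0
print(t_stats)
```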
👍 16
📅 Nov 16 2021
[Code Link in Comment] Linear and Polynomial Regression understanding and visualization in Jupyter Notebook v.redd.it/cq2lc7fejk281
👍 75
👤 u/samrat1714
📅 Nov 29 2021
Linear regression is exhausting to understand

So I won't lie, quant has never been my strong suit. However, Marky Meldrum has provided me with a smooth passage in understanding the quant section... until I reached regression. This makes no sense to me and I'm having a hard time finishing the video on it because the basics don't make any sense to me whatsoever. Does anyone have any tips? Perhaps a simpler explanation on YouTube or something. Also a quick question: how hard/easy is FRA compared to Quant? I plan on doing that as soon as I finish regression.

👍 42
👤 u/agriospanther
📅 Oct 12 2021
[Question] In a linear regression model, does heteroscedasticity imply no autocorrelation?

While performing diagnostic checks on a linear regression model, should one check for autocorrelation when there is heteroscedasticity? I came across some materials that say heteroscedasticity implies no autocorrelation, and hence there is no need to run a separate test for autocorrelation.

Is my understanding correct?

👍 24
👤 u/venkarafa
📅 Nov 18 2021
[NEED HELP] on Linear Regression assignment

Need someone who can do my homework

👍 2
📅 Dec 17 2021
