Predicting Chances of Winning 2v2 Games with Non-scoreboard Statistics (Logistic Regression)

To better understand what skills are needed to improve and rank up in competitive doubles, I decided to create logistic regressions with non-scoreboard statistics using data found on ballchasing.com. Logistic regression models can be used to predict binary dependent variables. This regression’s dependent variable is β€œResult” (Win or Loss), and the independent variables are listed below. The models are meant to determine which variables, aside from goals, assists, saves, and shots, affect a team’s chance of winning in doubles. To provide relevant information to players of different skill levels, I split the data into 4 skill groups: Gold 1-3, Platinum 1-3, Diamond 1-3, and Champion 1-3. As expected, the importance of the independent variables changed as the ranks increased. Each data set contains 300 games, which I believe is more than sufficient to reflect the habits and abilities of each rank category.

Independent variables included in the regressions

- Shooting %

- Time defensive half

- Boost per minute

- Average boost amount

- Time on the ground

- Total distance traveled

- Amount of boost collected

- Amount of boost stolen

- Time with 0 boost

- Time with 100 boost

- Demos inflicted

- Demos taken

- Time slow speed

- Time boosting

- Time supersonic

- Powerslide count

- Amount of boost used while supersonic

Understanding The Regression Output

To start, direct your attention to the independent variables displayed on the bottom rows. If a variable’s coefficient is positive, then the higher that variable’s value, the higher the team’s chance of success. If the coefficient is negative, then the lower the variable’s value, the higher the team’s chance of winning. Next, look at the p-value for each variable. Since we are using a 95% confidence level, alpha is 0.05, meaning that any variable with a p-value below 0.05 has a statistically significant relationship with a team’s chance of winning. All the independent variables displayed in the models are statistically significant, though some are more significant than others: the closer the p-value is to 0, the stronger the evidence.
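
For anyone who wants to reproduce this kind of output, here is a minimal, hedged sketch of fitting such a model in Python with statsmodels; the file name and column names are placeholders, not the actual ballchasing.com export:

import pandas as pd
import statsmodels.api as sm

# Hypothetical export: one row per team per game, "result" coded 1 = win, 0 = loss
df = pd.read_csv("doubles_gold.csv")

features = ["shooting_pct", "time_defensive_half", "boost_per_minute",
            "avg_boost_amount", "demos_inflicted", "boost_used_while_supersonic"]

X = sm.add_constant(df[features])   # add an intercept column
y = df["result"]

model = sm.Logit(y, X).fit()
print(model.summary())              # coefficients, p-values, confidence intervals

# Sign of each coefficient: positive -> a higher value raises the win probability.
# P-value below 0.05 -> statistically significant at the 95% confidence level.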

Gold Model

https://preview.redd.it/0zg81nirx6781.png?width=656&format=png&auto=webp&s=69a8d0bba703986dda5608fc6a0573754770c759

Gold Model Interpretation

To win Doubles games in Gold, teams should…

- Stop wasting boost while supersonic

- Improve their shooting accuracy

- Get the ball out of thei

... keep reading on reddit ➑

πŸ‘︎ 60 · πŸ‘€︎ u/Ryan_Reddit7 · πŸ“…︎ Dec 23 2021
What are the steps you take when training a Logistic Regression model with highly imbalanced data?

My dependent variable is binary, with about 85% of the data being the negative class and 15% being the positive one. My base logistic regression model (no tuning or weighting) predicts largely the negative class, as expected. In addition, I'd also like to case on 3 separate treatment groups but I'm not sure how to go about doing that either.

I've looked at which features the base model found most important, but that hasn't really gotten me anywhere.

I'm new to statistics / data science / machine learning and I'd appreciate hearing some approaches that this community recommends. Thanks!
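
One common starting point, as a hedged sketch only (assuming scikit-learn and a pandas dataframe with a binary target column; names below are placeholders), is to reweight the classes and judge the model on something other than accuracy:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("data.csv")                      # hypothetical file
X, y = df.drop(columns="target"), df["target"]    # y is ~85% 0s, ~15% 1s

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=0)

# class_weight="balanced" up-weights the minority class in the loss,
# so the model is penalised more for missing positives.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Accuracy is misleading at an 85/15 split; look at per-class precision/recall instead.
print(classification_report(y_test, clf.predict(X_test)))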

πŸ‘︎ 74 · πŸ‘€︎ u/JosephKavalier · πŸ“…︎ Nov 27 2021
Multinomial logistic Regression: stata/SE failed to allocate matrix

I'm getting the below message from stata/se when I run multinomial logistic regression. The number of independent variables is 1024 - surely that is below the 11,000 limit of SE for rows/columns in a matrix ??? Or is it simply that there are too many variables ?? thanks

You have attempted to create a matrix with too many rows or columns or attempted to fit a model with too many variables.

You are using Stata/SE which supports matrices with up to 11000 rows or columns.

See limits for how many more rows and columns Stata/MP can support.

πŸ‘︎ 2 · πŸ‘€︎ u/apj2600 · πŸ“…︎ Jan 10 2022
Can a Logistic Regression Model handle nulls?

Hey all,

I'm creating a logistic regression model with a lot of categorical data. Unfortunately, the data I am using comes from many sources, so there are a bunch of combinations of filled and unfilled fields. Every single row contains at least one null value. I don't feel comfortable imputing, considering it would be some sort of mode imputation and I'd be filling anywhere from 20-60% of each respective field with its mode.

I've considered doing an imputation with another ML method but I am unfamiliar with that.

So, does Logistic Regression require no null values? If so, what is the efficacy of populating missing values with values based on the probability of obtaining one of the unique values? i.e some sort of weighted fill?

Should I scrap the model entirely with this many missing values? Any work arounds?

For more context, I am trying to build a model which predicts whether a customer might churn. Thanks!
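
Logistic regression itself cannot handle missing values, so something has to fill them. As one hedged sketch (assuming scikit-learn and hypothetical column names), a pipeline can treat "missing" as its own category for the categorical fields, which avoids inventing a mode:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("customers.csv")             # hypothetical churn data
X, y = df.drop(columns="churned"), df["churned"]

cat_cols = X.select_dtypes(include="object").columns

# Fill nulls with an explicit "missing" token, then one-hot encode, so the model
# can learn whether missingness itself predicts churn.
cat_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# remainder="passthrough" assumes any numeric columns have no nulls;
# otherwise give them their own imputer.
pre = ColumnTransformer([("cat", cat_pipe, cat_cols)], remainder="passthrough")
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X, y)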

πŸ‘︎ 2 · πŸ“…︎ Jan 05 2022
[Q] Alternatives to a logistic regression when independent variables are not mutually exclusive?

Context: In a magical mystery land, there are a list of 20 laws that fairies can apply whenever they encounter a law breaking incident. Each time they enforce the law they can apply multiple laws if applicable, which can lead to a few outcomes (eg., get fined/not get fined, get on record/not get on record, banishment/no banishment etc). Since there are 20 laws, the deity of the land would like to use metrics to inform them what laws are redundant and how to eliminate or combine different laws.

Specific Problem: I would like to examine the relationship between (IV) whether or not a law co-occurs with another law and (DV) the degree to which those co-occurrences lead to banishment or no banishment (I am interested in that one outcome). Since multiple laws can be applied at the same time (they are not mutually exclusive), I'm assuming that a logistic regression may not be applicable in this scenario despite the outcome variable being binary. I was wondering if there are alternative models that can examine this relationship?

Thank you thank you!

EDIT: I've taken university level statistics courses as a background and have applied things like ANOVAs and basic regression models, so only have some understanding of basic concepts

πŸ‘︎ 3 · πŸ‘€︎ u/thenoisewall · πŸ“…︎ Jan 08 2022
[Q] Am I going from logistic regression coefficients to percentages correctly?

Let's say I run a logistic regression. The intercept comes back as 1. I also have a binary predictor (let's call it experience vs. no experience) with a coefficient of 0.5.

How can I word this in terms of expected percent success?

e^1 = 2.72

e^.5 = 1.65

So, the base rate of success is:

2.72 / (1 + 2.72) = 73%

The increase in the base rate of success with experience is:

1.65 / (1 + 1.65) = 62%

I'm not sure how to translate this last finding into a difference in expected chance of success? Would it simply be:

Chance of success with experience =

e^(1 + .5) = 4.48
4.48 / (1 + 4.48) = 82%

While the chance of success with no experience is the base rate (73%), meaning that those with experience have an increase of about 9 percentage points in terms of expected success?
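
For what it's worth, the arithmetic above can be checked directly with the inverse-logit (sigmoid) function; this is just a sketch of the same numbers in Python:

from scipy.special import expit   # expit(z) = 1 / (1 + exp(-z))

intercept, beta_exp = 1.0, 0.5

p_no_exp = expit(intercept)                 # ~0.73, base rate of success
p_exp    = expit(intercept + beta_exp)      # ~0.82, success with experience

print(p_no_exp, p_exp, p_exp - p_no_exp)    # difference ~0.09 (9 percentage points)

# Note: expit(0.5) ~ 0.62 is NOT "the increase"; the coefficient only has a direct
# interpretation on the log-odds (odds-ratio) scale, so probabilities must be
# computed from the full linear predictor, as above.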

πŸ‘︎ 14 · πŸ‘€︎ u/UnderwaterDialect · πŸ“…︎ Jan 04 2022
logistic regression extremely slow on pytorch on gpu vs sklearn cpu

hello friends,

I'm trying to train a DNN on a dataset with 100k features and 300k entries. I want to predict about 30 categories (it's TF-IDF vectors of a text dataset).

To start, I wanted to train just a simple logistic regression to compare its speed with the sklearn logistic regression implementation.

https://gist.github.com/ziereis/bed30cd4db4b14e72b78d9777aa994ab

Here is my implementation of the logistic regression and the train loop.

Am I doing something terribly wrong, or why does training in pytorch take a day when sklearn takes 5 minutes?

I have a 5600X CPU and a 3070 GPU, if that's relevant.

any help is appreciated, thanks

πŸ‘︎ 8 · πŸ‘€︎ u/thomas999999 · πŸ“…︎ Dec 22 2021
10 Unique Machine Learning Interview Questions on Logistic Regression analyticsarora.com/10-uni…
πŸ‘︎ 7 · πŸ‘€︎ u/Sarjetion · πŸ“…︎ Jan 05 2022
Help with Power Analysis for Mixed-Model Multinomial Logistic Regression - Required Sample Size

I am trying to do a power analysis using G*Power to determine the necessary sample size, but not entirely sure how to go about it. If this is the right start, I am stuck on how to do the Odds Ratio. I would like to detect an effect size where the subjects choose the treatment (a repellent) less than half as often as the control.

I was also confused about the x dist and pretty much everything after that.

πŸ‘︎ 3 · πŸ‘€︎ u/gowiththephloem · πŸ“…︎ Dec 28 2021
Vaccine effectiveness and the Test Negative Case Control: Vaccine Effectiveness Turns Negative After Five Months When Analyzed Using Multivariable Logistic Regression (MLR) bartram.substack.com/p/va…
πŸ‘︎ 14 · πŸ‘€︎ u/stickdog99 · πŸ“…︎ Nov 13 2021
I want to recreate the logistic regression model shown, but I am unsure how to code the response variables as 1 for damaged and 0 for undamaged to create a single new response variable. What code is needed for this? I'm trying to learn the software a little better.
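
In case it helps, here is how that recode could look in Python with pandas (hypothetical column names, since the software and data aren't specified; the same idea translates to other packages):

import pandas as pd
import numpy as np

df = pd.DataFrame({"condition": ["damaged", "undamaged", "damaged"]})  # toy data

# New binary response column: 1 = damaged, 0 = undamaged
df["damaged"] = np.where(df["condition"] == "damaged", 1, 0)
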
πŸ‘︎ 3 · πŸ‘€︎ u/lyns2456 · πŸ“…︎ Dec 09 2021
One winner with logistic regression

Is there an easy way to use logistic regression to predict one winner from a dataset?

I have a dataset containing MLB team data from 1998-2019 and want to use logistic regression to try and predict which team won the World Series for a given year but I’m not sure how to tell the model to only choose one winner per year. Any help would be appreciated!
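
One hedged sketch of a post-processing approach (assuming scikit-learn and a hypothetical dataframe with a year column): fit the logistic regression on team-season rows, then pick the team with the highest predicted win probability within each year, rather than thresholding at 0.5:

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("mlb_team_seasons.csv")        # hypothetical: one row per team per year
features = ["wins", "run_diff", "era"]           # placeholder feature names

model = LogisticRegression(max_iter=1000)
model.fit(df[features], df["won_world_series"])

# Probability of winning the World Series for every team-season
df["p_champ"] = model.predict_proba(df[features])[:, 1]

# Exactly one predicted winner per year: the team with the highest probability
predicted = df.loc[df.groupby("year")["p_champ"].idxmax(), ["year", "team", "p_champ"]]
print(predicted)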

πŸ‘︎ 3 · πŸ‘€︎ u/sjsjzuaj · πŸ“…︎ Nov 17 2021
Trying to do logistic regression using python, but getting this error even though my dataset has the same sample size. reddit.com/gallery/r81xm4
πŸ‘︎ 5 · πŸ‘€︎ u/-justsomeone- · πŸ“…︎ Dec 03 2021
Alternative to simple ANOVA - multilevel logistic regression?

Hi! I am interested in learning a bit more about regression analysis, and after looking at some data I had an idea, but I'm not sure if it is valid or how to approach it, so I'm looking for some feedback or advice. I would normally just create a mean accuracy score and analyze it with a repeated-measures ANOVA, but I think it is quite crude to just use the means, so I want to explore a more sophisticated approach.

The design is as follows:

  • 30 subjects take part in a cognitive task
  • the task involves 3 types of condition (easy/medium/hard)
  • for each of these conditions, there are 50 trials (total of 150)
  • each trial has a binary ground truth (present/absent)
  • all participants complete every trial & condition (i.e. it is a within-subjects design)
  • participants make a binary response on each trial and follow it up with a rating of how confident they are (sure/maybe/guess)
  • they are told to respond quickly and RT is also collected

Predictor Variables:

  • response class (0,1)
  • confidence (0,1,2)
  • RT (continuous, likely heavily right-skewed, probably not normal)
  • subject (???)

Outcome:

  • ground truth (0,1)

My question then is: can we predict the ground truth for each trial using RESPONSE_CLASS, CONFIDENCE, RT, and SUBJECT? Since each trial can be either correct or incorrect, I'm assuming some sort of logistic regression, but the fact that it is within-individuals means that this needs to be factored in somehow, and I'm not sure how. Would appreciate any ideas for this!
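
As a hedged starting point (not a full mixed-effects model), one could fit a trial-level logistic regression in statsmodels and absorb the within-subject structure with a per-subject term; the dataframe and column names below are assumptions:

import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("trials.csv")   # hypothetical: one row per subject x trial

# ground_truth (0/1) predicted from response class, confidence, and RT, with
# C(subject) as a fixed effect per participant -- a simple way to account for
# the repeated-measures structure (a true multilevel model would instead use a
# random intercept per subject, i.e. a mixed-effects logistic regression).
m = smf.logit(
    "ground_truth ~ response + C(confidence) + rt + C(subject)",
    data=trials,
).fit()
print(m.summary())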

πŸ‘︎ 2 · πŸ‘€︎ u/_siggy__ · πŸ“…︎ Nov 28 2021
Text classification via keywords of news articles via Google news API - bayes vs. logistic regression vs. ???

Hello Everyone

Based on a set of keywords, I am using the Google News API to collect news articles. The newspaper3k python lib then gives me summaries and keywords for those articles.

This works fairly well, but I am of course getting false positives.
For example:

- one of my keywords is "pi" (as in Raspberry Pi), and I get hits on Magnum PI (the TV show)

- another is "docker", and I get hits on Docker Street (which I think is in Australia--also a football team).

I have added the idea of "anti-keywords", where if an article has my keyword "python", but /also/ has the anti-keyword/phrase "reticulated python" (like the snake), I ignore it.
This also works pretty well, but I'd like to further decrease my false positives and maybe learn something in the process. :-)

What is a good way to do this? I've been trying to research Bayes and logistic regression, but don't quite have my head wrapped around it. I think it's just text classification. I think I want to drop stopwords, lemmatize, and then pass the summary/keywords/URL to an algo, perhaps along with the keyword I am matching against. I then maybe get a score back? Then decide based on the score?

I've got a Redis docker container ready to go for data persistence..

I don't think this is just a simple spam/ham issue. Of a group of articles with "python", I might want some but not others, based on the context...

Can anyone provide guidance?

TIA

our_sole
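
For what it's worth, here is a minimal, hedged sketch of that kind of relevant/not-relevant text classifier with scikit-learn; the training texts and labels are placeholders you would build from articles you have already hand-sorted:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-labelled examples: 1 = relevant, 0 = false positive
texts = ["New Raspberry Pi 5 benchmarks released",
         "Magnum PI reboot gets second season",
         "Docker adds GPU support to containers",
         "Docker Street club wins local football final"]
labels = [1, 0, 1, 0]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(texts, labels)

# predict_proba gives the "score" you described; threshold it however you like
print(clf.predict_proba(["Raspberry Pi cluster running docker swarm"])[:, 1])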

πŸ‘︎ 2 · πŸ‘€︎ u/our_sole · πŸ“…︎ Dec 22 2021
[Article] Spatial Prediction and Digital Mapping of Soil Texture Classes in a Floodplain Using Multinomial Logistic Regression

doi: https://doi.org/10.1007/978-3-030-85577-2_55

link: https://link.springer.com/chapter/10.1007/978-3-030-85577-2_55#citeas

πŸ‘︎ 3 · πŸ‘€︎ u/Warm-Ad-2025 · πŸ“…︎ Nov 14 2021
Does it make sense to binary code for all fields in logistic regression model?

I'm attempting to encode for variable feature weight thresholds based on pre-defined tier groupings.

I've run a logistic regression model on my raw data fields, using a formula with my dependent binary variable Purchased/Not Purchased (0,1) and the other columns 'Estimated Salary', 'Age', and 'Gender' as predictors (I remapped the Female/Male categorical values of Gender to 0/1), like so:

https://preview.redd.it/djh2o6rv9vx71.png?width=608&format=png&auto=webp&s=69e531ae1443e57ff9338bd76f8a4380abb490f3

And ran the logit model which gave me this model with an accuracy rate of 86% at predicting Purchased/Not Purchased:

https://preview.redd.it/4xox2xwj9vx71.png?width=786&format=png&auto=webp&s=5675c7287807883f186a0145546c9462fb69bdf1

I've since tried to re-work the variables based on threshold distributions, one-hot encoding the Age and Salary variables according to clusters defined in a k-means model, like so:

https://preview.redd.it/zxb9zvugavx71.png?width=1220&format=png&auto=webp&s=a8e8185228811a685bd1261b9e56c8b69b68af3f

Now, when I attempt to run the model I get an 'MLE optimization failed to converge' error, and a model that spits out NaN for the Intercept field.

https://preview.redd.it/q3emlrkubvx71.png?width=840&format=png&auto=webp&s=36c401238e706d822b90c39f57604ea70dd0b109

I don't even know if it makes sense to do it this way. My hunch is that the Purchased values (my dependent variable) are not encoded the same way as the rest of the columns, and perhaps that's the cause of the error? Or perhaps it doesn't make sense to encode multiple categorical features this way, and they would be better suited to mean or frequency encoding, which I planned to test on the predefined Age and Salary groups to see if the model produces better results.
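
One common cause of exactly this symptom (a hedged guess, since the screenshots aren't reproduced here) is the dummy-variable trap: one-hot encoding every cluster while keeping an intercept makes the columns perfectly collinear, which can break MLE convergence. A sketch of the usual fix with pandas (hypothetical column names) is to drop one level per encoded variable:

import pandas as pd

df = pd.read_csv("purchases.csv")   # hypothetical columns: Age_Group, Salary_Group, Gender, Purchased

# drop_first=True removes one dummy column per variable, so the remaining
# dummies plus the intercept are no longer perfectly collinear.
X = pd.get_dummies(df[["Age_Group", "Salary_Group", "Gender"]],
                   columns=["Age_Group", "Salary_Group"],
                   drop_first=True)
y = df["Purchased"]

# X can now go into statsmodels Logit (after sm.add_constant) or sklearn.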

πŸ‘︎ 4 · πŸ‘€︎ u/Purple-Ad-3492 · πŸ“…︎ Nov 06 2021
Ideas for avoiding overfitting in simple logistic regression

I have a simple logistic regression equivalent classifier (I got it from online tutorials):

import torch.nn as nn
import torch.nn.functional as F

class MyClassifier(nn.Module):

    def __init__(self, num_labels, vocab_size):
        super(MyClassifier, self).__init__()
        self.num_labels = num_labels
        # a single linear layer: input vector -> one score (logit) per label
        self.linear = nn.Linear(vocab_size, num_labels)

    def forward(self, input_):
        # note: nn.CrossEntropyLoss applies log_softmax internally, so returning
        # the raw logits self.linear(input_) would work just as well here
        return F.log_softmax(self.linear(input_), dim=1)

Since there is only one layer, using dropout is not one of the options for reducing overfitting. My parameters and the loss/optimization functions are:

learning_rate = 0.01
num_epochs = 5

criterion = nn.CrossEntropyLoss(weight = class_weights)
optimizer = optim.Adam(model.parameters(), lr = learning_rate)

I need to mention that my training data is imbalanced, that's why I'm using class_weights.

My training epochs return the following (I compute validation performance at every epoch, as is the tradition):

Total Number of parameters:  98128
Epoch 1
train_loss : 8.941093041900183 val_loss : 9.984430663749626
train_accuracy : 0.6076273690389963 val_accuracy : 0.6575908660222202
==================================================
Epoch 2
train_loss : 8.115481783001984 val_loss : 11.780701822734605
train_accuracy : 0.6991507896001001 val_accuracy : 0.6662275931342518
==================================================
Epoch 3
train_loss : 8.045773667609911 val_loss : 13.179592760197878
train_accuracy : 0.7191923984562909 val_accuracy : 0.6701144928772814
==================================================
Epoch 4
train_loss : 8.059769958938631 val_loss : 14.473802320314771
train_accuracy : 0.731468294135531 val_accuracy : 0.6711249543086926
==================================================
Epoch 5
train_loss : 8.015543553590438 val_loss : 15.829670974340084
train_accuracy : 0.7383795859902959 val_accuracy : 0.6727273308589589
==================================================

Plots are:

https://preview.redd.it/1z09912otew71.png?width=1159&format=png&auto=webp&s=d2a37e322198e7c2337c00210dd990e9994867d4

The validation loss tells me that we're overfitting, right? How can I prevent that from happening, so that I can trust the actual classification results this trained model returns?
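
Since dropout is off the table with a single layer, the usual levers are L2 regularisation and early stopping. As a hedged sketch reusing the names from the snippets above (train_one_epoch and evaluate are hypothetical stand-ins for your existing train loop and validation pass), weight decay in Adam plus stopping when validation loss rises could look like:

import torch
import torch.optim as optim

# L2 regularisation via weight decay (the value is a guess; tune it against val loss)
optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-4)

best_val_loss, patience, bad_epochs = float("inf"), 2, 0
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, criterion)   # hypothetical helper: one pass over train data
    val_loss = evaluate(model, criterion)          # hypothetical helper: loss on the validation set

    # simple early stopping: keep the checkpoint with the lowest validation loss
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break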

πŸ‘︎ 6 · πŸ‘€︎ u/AcademicAlien · πŸ“…︎ Oct 29 2021
If I run a logistic regression for x and y, but switch the order in the regression formula, how does the relationship change?

For example:

glm(y ~ x), but instead I run glm(x ~ y). Will these be the inverse of each other?

πŸ‘︎ 2 · πŸ‘€︎ u/chckenfngr · πŸ“…︎ Nov 27 2021
Why do we need regularisation (L2 or L1 norm) in logistic regression?

While revising my logistic regression notes, I came across the loss-minimization interpretation of logistic regression, which is:

$\hat{w} = \arg\min_{w} \sum_{i=1}^{n} \log\!\left(1 + \exp(-z_i)\right) + \frac{\lambda}{2}\,\lVert w \rVert^2, \qquad \text{where } z_i = y_i \, w^{\top} x_i$

I know that the L2 regularisation term in the above optimization is used to find a balance between a good separating hyperplane (decision surface) and weight coefficients that are not too large (tending to infinity). What I can't intuitively understand is how regularisation works to balance the weight coefficients and avoid overfitting/underfitting. I might be misunderstanding something, but consider the loss part of the expression without any regularisation: to minimise it, the weights for correctly separated points should tend to infinity, so that z_i tends to infinity and log(1 + exp(-z_i)) tends to 0, minimizing the sum over correctly classified points. But for that same plane with infinitely large weights, any point that turns out to be incorrectly classified has a loss value that tends to infinity, which works against the optimisation. So the weights should get readjusted to smaller values such that the total loss is minimized, without needing a regularisation term. So I am really confused: do we even need regularisation in logistic regression, and if yes, how does the regularisation term work towards balancing the weights?
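
One way to see that the loss term alone does not keep the weights finite is to fit a logistic regression on perfectly separable data with weaker and weaker regularisation; this is a small illustrative sketch (scikit-learn's C is the inverse of lambda):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Perfectly separable 1-D toy data: every x < 0 is class 0, every x > 0 is class 1
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

for C in [1.0, 100.0, 1e6]:            # larger C = weaker L2 penalty = smaller lambda
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    print(C, np.linalg.norm(clf.coef_))

# With separable data, the unregularised loss keeps decreasing as ||w|| grows, so
# the coefficient norm blows up as C increases; the L2 term is what pins it down.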

πŸ‘︎ 8 · πŸ‘€︎ u/MrDeepThought2 · πŸ“…︎ Dec 01 2021
[Q] Definition of a unit in logistic regression.

I am following a guide on: stats.idre.ucla.edu/r/dae/logit-regression/

They talk about unit change/ odds but never define a unit. Is a unit in this case a whole number or a decimal or what?

(Sorry about the random subreddit in the URL, I don’t know how to prevent that on mobile)

πŸ‘︎ 4 · πŸ‘€︎ u/OccasionBest7706 · πŸ“…︎ Nov 15 2021
[Q] Validity of logistic regression with underlying continuous IVs after pairing/binning?

[Still seeking guidance] Howdy, all- logistic regression with pairing/binning question here.

The data: I have a dataset of objects with two continuous, known-value IVs and one binary, unknown-value DV. I can only take a limited (~2000) number of samples for the data, which is heavily skewed towards lower values for the IVs (higher values are rare).

The goal: The goal is to determine each IV's relationship with the DV, and whether one IV has a stronger relationship than the other.

Current approach: My current approach is: pair objects in bins; i.e.

  • consider a bin of [0, 0.1]. Get 20 objects in this bin by doing the following:
  1. Take an object with the first IV within [0, 0.1].
  2. Find an associated object with a similar second IV score.
  3. Repeat this 10 times.
  • consider a bin of [0, 0.2]. Get 20 objects by... etc.
  • Repeat until [1.1, 1.2] (so ~240 pairs of objects) to be tested with 4 people each for redundancy.

Then perform logistic regression on the underlying continuous IVs to determine each IV's influence, and perform a paired T-test to determine which influence is stronger.

Is this a valid use of logistic regression? I am worried about the binning and pairing violating something. The primary reason I'm setting it up this way is that I'm worried I'll miss effects at the higher end of the IV values if I randomly sample.

Should I just be completely randomly sampling (eschewing binning and pairing) and hoping I catch the higher-end effects regardless?

Thank you all for any help on this. Am happy to clarify or formalize anything if it makes the question clearer.

πŸ‘︎ 3 · πŸ‘€︎ u/DEMcKnight · πŸ“…︎ Nov 23 2021
Binary logistic regression: can you include a continuous independent variable that is a part of the definition of the dependent variable?

Let’s say a continuous variable M (scale 1 to 500) is how the dependent variable X is defined. That is: if M is =/<50 then variable X = 1 and if M >50 then X = 0.

I am running a (multivariable) binary logistic regression to figure out the associations between potential predictors and the increase/decrease % of X happening (i.e., X = 1).

I understand that it is self-explanatory that if M is lower, then X is more likely to happen (will happen). And vice versa if M is higher. But I would still like to include M in the model because I want to report the % decrease/increase of X happening if someone has M = 80 rather than M = 350. Is this stupid? Redundant? Inappropriate?

I understand that there are several assumptions that have to be met before you can perform a binary logistic regression. I know that correlation between continuous independent variables (multicollinearity) can be a problem, but in this case it is an independent variable and the dependent variable that are strongly associated, not two or more independent variables. I could not find any information on this particular scenario anywhere else, hence the question.

πŸ‘︎ 5 · πŸ‘€︎ u/assi9guts · πŸ“…︎ Nov 18 2021
How can I create a logistic regression sigmoidal (s-shaped) curve like in the picture using SPSS? I searched the whole internet and couldn't find any tutorials about it. My dependent variable is binary, the independent is continuous.
πŸ‘︎ 4 · πŸ‘€︎ u/Yeekess · πŸ“…︎ Nov 25 2021
Stepwise Logistic Regression

Does anyone know how I would go about performing stepwise logistic regression? I can perform stepwise linear regression, but am having trouble with the name value pairing when using stepwiseglm to build my logistic model.

πŸ‘︎ 3 · πŸ‘€︎ u/Jules_Delgado · πŸ“…︎ Dec 13 2021
Ordinal logistic regression - Amount of independent variables

Hello everyone,

I am planning on doing several ordinal logistic regressions on my Likert-scale dataset concerning assessments of different aspects of digitalization.

Which problems may I generally run into when I'm using too many independent variables to explain a dependent variable? Can this lead to "overfitting", and how can I find out how many/which variables make sense to use together?

πŸ‘︎ 3 · πŸ‘€︎ u/cedced892 · πŸ“…︎ Oct 28 2021
Linfa release 0.5.0 - now with nearest neighbor search, OPTICS, multinomial logistic regression and many other improvements rust-ml.github.io/linfa/n…
πŸ‘︎ 30 · πŸ‘€︎ u/bytesnake · πŸ“…︎ Oct 21 2021
Logistic Regression from scratch with (naive) BFGS?

Since sklearn uses Quasi-Newton methods to iteratively converge for weights, I was wondering if there was a resource that uses simple BFGS algorithm to optimize weights?

Sklearn provides different (more efficient) variants whereas I am looking for a more approachable solution, so sklearn's github is not an option for me.
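
In case it helps, a from-scratch fit with plain BFGS can be done in a few lines via scipy.optimize; this is a hedged sketch (binary labels in {0, 1}, toy data, intercept handled by a bias column):

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_likelihood(w, X, y):
    # standard binary cross-entropy for logistic regression
    p = expit(X @ w)
    eps = 1e-12
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad(w, X, y):
    # gradient: X^T (p - y)
    return X.T @ (expit(X @ w) - y)

# toy data with a bias column appended
rng = np.random.default_rng(0)
X = np.c_[rng.normal(size=(200, 2)), np.ones(200)]
true_w = np.array([1.5, -2.0, 0.3])
y = (expit(X @ true_w) > rng.uniform(size=200)).astype(float)

res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y),
               jac=grad, method="BFGS")
print(res.x)   # estimated weights, roughly recovering true_w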

πŸ‘︎ 5 · πŸ‘€︎ u/AchieveOrDie · πŸ“…︎ Nov 11 2021
[Q] Question about logistic regression coefficients

Am I correct that the coefficient in logistic regression for a binary predictor can be considered a "log odds ratio"? I'm reading a paper about converting different effect sizes and it refers to a log odds ratio. I believe this is just the coefficient in a logistic regression, but wanted to be sure.

πŸ‘︎ 26 · πŸ‘€︎ u/UnderwaterDialect · πŸ“…︎ Oct 18 2021
[Q] Logistic regression with rare events

If I have 100,000 observations, with 1,500 events and 3 predictor variables, will I run into any issues with my model under predicting? Or would 1,500 rare events contain enough information to negate that?

Edit: I fit the model as is without any regularization or data augmentation. It seemingly produced plausible probabilities, especially when the probabilities were sampled many times (I.e Monte Carlo simulation). Thanks, everyone.

πŸ‘︎ 13 · πŸ‘€︎ u/AdIntelligent9764 · πŸ“…︎ Sep 24 2021
Logistic regression in R

How would I change the model coefficients to be probabilities once I fit the logistic model normally?

https://preview.redd.it/vmkw4lqglt281.png?width=2880&format=png&auto=webp&s=496625636eafcf0851dbd60c438f1feae8736415

πŸ‘︎ 5 · πŸ“…︎ Nov 30 2021
