26 Hilarious Mean squared error Puns

Using root mean squared error with lead times for safety stock?

Demand variability isn’t an issue, but supply lead time is. Could actual vs. expected supply lead times be used in place of actual demand vs. forecast in the RMSE formula to generate a safety stock requirement?
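One way to try the idea is sketched below. It assumes the common textbook form for lead-time-only variability (safety stock = z × average daily demand × lead-time deviation) and uses the RMSE of actual vs. expected lead times as that deviation. All data values, the demand rate, and the service-level factor `z` are hypothetical.

```python
import math

def rmse(actual, expected):
    """Root mean squared error between paired observations."""
    squared = [(a - e) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: quoted vs. actual supplier lead times, in days.
expected_lead_times = [10, 10, 10, 10, 10]
actual_lead_times = [12, 9, 15, 10, 14]

avg_daily_demand = 40  # units per day, assumed roughly constant
z = 1.65               # assumed service-level factor (roughly 95%)

# RMSE of lead-time errors stands in for the lead-time standard deviation.
lead_time_rmse = rmse(actual_lead_times, expected_lead_times)
safety_stock = z * avg_daily_demand * lead_time_rmse

print(f"lead-time RMSE: {lead_time_rmse:.2f} days")
print(f"safety stock:   {safety_stock:.0f} units")
```

Unlike a standard deviation, the RMSE also picks up any systematic lateness (bias) in the supplier's quoted lead times, which may or may not be what you want the buffer to cover.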

👍︎ 9

👤︎ u/sdcrne

📅︎ Dec 22 2021

If the arithmetic mean is the least squared sum of errors, then what is the geometric mean?

Hello r/learnmath, I was thinking about this: If the arithmetic mean is the number which the dataset accumulates around (i.e. the value that minimizes the sum of squared errors), then what is the geometric mean?

I've read that it's the point of least distance from any of the numbers within a dataset.

Imagine a number increasing by 50%, 60%, and 70%. So, the geometric mean is the number the "increases" accumulate around. However, how can one think of this intuitively? I can't imagine "increases" accumulating, but imagining numbers in a dataset accumulating around a certain number is easy.
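One possible way to make this concrete: the geometric mean plays the same role as the arithmetic mean, but on a log scale. It is the value that minimizes the sum of squared differences between logarithms. The sketch below checks this by brute force for the growth factors 1.5, 1.6, and 1.7 (the 50%, 60%, 70% increases); the search grid is an arbitrary choice.

```python
import math

growth_factors = [1.5, 1.6, 1.7]  # increases of 50%, 60%, 70%

arithmetic_mean = sum(growth_factors) / len(growth_factors)
geometric_mean = math.prod(growth_factors) ** (1 / len(growth_factors))

def squared_error(center, values):
    """Sum of squared differences on the original scale."""
    return sum((v - center) ** 2 for v in values)

def squared_log_error(center, values):
    """Sum of squared differences on the log (ratio) scale."""
    return sum((math.log(v) - math.log(center)) ** 2 for v in values)

# Brute-force search over candidate centers between 1.4 and 1.8.
candidates = [1.4 + i * 0.0001 for i in range(4001)]
best_plain = min(candidates, key=lambda c: squared_error(c, growth_factors))
best_log = min(candidates, key=lambda c: squared_log_error(c, growth_factors))

print(f"arithmetic mean {arithmetic_mean:.4f}, minimizer of squared error {best_plain:.4f}")
print(f"geometric mean  {geometric_mean:.4f}, minimizer of squared log error {best_log:.4f}")
```

Applying the geometric mean three times gives the same total growth as applying 1.5, 1.6, and 1.7 in sequence, which is one way to picture "increases accumulating".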

👍︎ 4

👤︎ u/DamnatioAdCicadas

📅︎ Dec 09 2021

[Q] When do we use Mean Squared Prediction Error? Difference between MSE and MSPE?

What is the difference between MSE (Mean Squared Error) and MSPE (Mean Squared Prediction Error)? Do we use MSPE for classification and MSE for regression? Can someone with experience please elaborate with an example?
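Usage varies between textbooks, but one common convention is that both quantities apply to regression: MSE is computed on the data used to fit the model, while MSPE is the same formula evaluated on new observations the model has not seen. A minimal sketch under that assumption, with hypothetical data and a hand-rolled least-squares line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between observed y and the line's prediction."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: the model is fitted on the first set only.
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
new_x, new_y = [6, 7, 8], [12.3, 13.8, 16.4]

slope, intercept = fit_line(train_x, train_y)
mse = mean_squared_error(train_x, train_y, slope, intercept)  # in-sample
mspe = mean_squared_error(new_x, new_y, slope, intercept)     # out-of-sample

print(f"MSE  (fitted data): {mse:.4f}")
print(f"MSPE (new data):    {mspe:.4f}")
```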

👍︎ 7

👤︎ u/binary1ogic

📅︎ Nov 10 2021

Does the mean squared error suffer from the curse of dimensionality?

In many ML-related algorithms such as clustering, the l2 distance becomes less meaningful as a distance measure when the dimensionality is too high.

The mean squared error is the squared l2 distance between outputs and targets, up to a constant factor (1/n, or 1/2 in some conventions). Does using MSE as a loss function suffer from the curse of dimensionality? If yes, how does the curse of dimensionality manifest when training a model using MSE as the loss between very high-dimensional outputs and targets? Is there a rule of thumb for how high the dimensionality can be before problems arise? I am aware that the cosine similarity offers an alternative; I'm just asking to understand this better.
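To make the two ingredients of the question concrete, here is a small sketch. It checks that MSE equals the squared l2 distance divided by the number of dimensions, and it measures how the relative contrast between the nearest and farthest random point shrinks as the dimension grows. The uniform random data and the point counts are arbitrary assumptions, and the sketch does not by itself say whether this effect hurts training with an MSE loss.

```python
import math
import random

def mse(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        l2_distance(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

outputs, targets = [0.2, 0.9, 0.4], [0.0, 1.0, 0.5]
print("MSE:", mse(outputs, targets))
print("squared l2 / n:", l2_distance(outputs, targets) ** 2 / len(outputs))

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```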

👍︎ 3

👤︎ u/tfhwchoice

📅︎ Aug 29 2021

Tutorial on how to calculate the (root) mean squared error (MSE & RMSE) in R

Hey, I've created a tutorial on how to calculate the (root) mean squared error (MSE & RMSE) in the R programming language: https://statisticsglobe.com/root-mean-squared-error-in-r

👍︎ 6

👤︎ u/JoachimSchork

📅︎ May 25 2021

[Q] Struggling to understand how to differentiate the mean squared error (MSE) equation

I was just trying to follow the steps of optimizing multilinear regression using the normal equation, when I came upon this equivalence:

> ∂J/∂θ = ∂/∂θ [ (Xθ - y)^(T)(Xθ - y) ]
>
> ∂J/∂θ = 2X^(T)Xθ - 2X^(T)y

... where J(θ) is the MSE objective function, X is a matrix, and both θ and y are vectors.

Can someone show me the steps for how to get from the first to the second equation? I understand calculus, and I understand linear algebra, but my understanding of vector/matrix calculus (i.e., the intersection of calc and lin alg) is weak. In particular, how transposition interacts with differentiation is kind of a mystery. So I was following along just fine until I hit the step above, when I got lost.

By the way, for reference, this is the source whose steps I was consulting.
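For readers following along, the step can be filled in as below. The derivation uses two standard matrix-calculus identities, stated here as the working assumptions: the gradient of $a^{T}\theta$ with respect to $\theta$ is $a$, and the gradient of $\theta^{T}A\theta$ is $(A + A^{T})\theta$, which reduces to $2A\theta$ when $A$ is symmetric.

```latex
\begin{aligned}
J(\theta) &= (X\theta - y)^{T}(X\theta - y) \\
          &= \theta^{T}X^{T}X\theta - \theta^{T}X^{T}y - y^{T}X\theta + y^{T}y \\
          &= \theta^{T}X^{T}X\theta - 2\,y^{T}X\theta + y^{T}y
             \qquad \text{(since } \theta^{T}X^{T}y \text{ is a scalar, it equals its transpose } y^{T}X\theta\text{)} \\[4pt]
\frac{\partial J}{\partial \theta}
          &= \underbrace{2X^{T}X\theta}_{\text{from } \theta^{T}A\theta,\ A = X^{T}X \text{ symmetric}}
             \;-\; \underbrace{2X^{T}y}_{\text{from } a^{T}\theta,\ a = 2X^{T}y}
             \;+\; 0 \\
          &= 2X^{T}X\theta - 2X^{T}y
\end{aligned}
```

The transpose only matters when expanding the product; once the expression is written as a quadratic term, a linear term, and a constant, each piece is differentiated with its own identity.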

👍︎ 3

👤︎ u/synthphreak

📅︎ Dec 09 2020

If the fitness function in a genetic algorithm is the root mean squared error, how do I calculate the fitness values?
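A minimal sketch of one common approach: score each individual by the RMSE of its predictions, then map the error to a fitness that is higher for better individuals. The mapping 1 / (1 + RMSE) used here is one conventional choice, not the only one. The individuals (slope and intercept pairs for a line) and the data are hypothetical.

```python
import math

def rmse(predicted, observed):
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def fitness(individual, xs, ys):
    """Higher is better: an RMSE of 0 maps to a fitness of 1."""
    slope, intercept = individual
    predictions = [slope * x + intercept for x in xs]
    return 1.0 / (1.0 + rmse(predictions, ys))

# Hypothetical target data generated by y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

# Hypothetical population: each individual encodes (slope, intercept).
population = [(2.0, 1.0), (1.5, 1.5), (0.0, 4.0)]

scores = [fitness(individual, xs, ys) for individual in population]
best = max(population, key=lambda individual: fitness(individual, xs, ys))

for individual, score in zip(population, scores):
    print(individual, f"fitness={score:.3f}")
print("best individual:", best)
```

If the GA library minimizes its objective instead of maximizing it, the RMSE can be used directly as the value to minimize.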

👍︎ 2

👤︎ u/sidcasticly_yours

📅︎ Nov 01 2020

[University: Models - Errors] Mean squared error and its order of magnitude

I am performing a linear regression. With X data in the range 5-4600 and Y data in the range 0-280 I get a mean squared error of ~15000; with the same X data and Y data in the range 7-270 I get an error of ~2600. Is this realistic? I have more calculations ahead of me, but I don't trust these numbers at all; such large values worry me.
So can a relative (approximate) error be used here? Can I divide this mean squared error by... well, what, the average value of Y? Is this acceptable?
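One thing that may help: MSE is in squared units of Y, so taking the square root (RMSE) brings it back to the scale of Y before any comparison. A relative error can then be formed by dividing the RMSE by the range or the mean of Y. The sketch below uses the range, since the post gives the Y ranges but not the means; which normalizer to use is a judgment call.

```python
import math

def normalized_rmse(mse, y_min, y_max):
    """RMSE expressed as a fraction of the observed range of Y."""
    return math.sqrt(mse) / (y_max - y_min)

# Figures quoted in the post.
rmse_first = math.sqrt(15000)   # Y in 0-280
rmse_second = math.sqrt(2600)   # Y in 7-270

nrmse_first = normalized_rmse(15000, 0, 280)
nrmse_second = normalized_rmse(2600, 7, 270)

print(f"first fit:  RMSE={rmse_first:.1f}, as share of Y range={nrmse_first:.2f}")
print(f"second fit: RMSE={rmse_second:.1f}, as share of Y range={nrmse_second:.2f}")
```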

👍︎ 13

👤︎ u/Zacny_Los

📅︎ Jun 03 2020

Question on mean squared error

My understanding of MSE is that, in a nutshell, it tells your model that the further off a prediction is, the exponentially more it hurts your loss. I think this often results in your model caring exponentially more about large values than small values. Is this thinking correct? For my problem I do want large values to be exponentially more important than smaller values.

My other question about MSE is how values less than 1 are treated. If a prediction is off by less than 1 (which is the most common case, because the output is usually bounded to [0, 1]), squaring the error gives an even smaller value.

Which loss function is MOST sensitive to outliers?
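A small numeric sketch may help with both questions. The penalty in MSE grows quadratically with the error (not exponentially), and squaring shrinks errors below 1 while still preserving their order. The second part compares how much of the total MSE and MAE a single large residual accounts for; the residual values are hypothetical, and the comparison covers only these two losses.

```python
errors = [0.1, 0.5, 1.0, 2.0, 10.0]
for e in errors:
    # Doubling the error quadruples the squared penalty.
    print(f"error={e:5.1f}  absolute={abs(e):6.2f}  squared={e ** 2:7.2f}")

# Hypothetical residuals with one outlier (5.0).
residuals = [0.1, -0.2, 0.15, -0.1, 5.0]

mae = sum(abs(r) for r in residuals) / len(residuals)
mse = sum(r ** 2 for r in residuals) / len(residuals)

# Share of each total loss that comes from the outlier alone.
outlier_share_mae = abs(residuals[-1]) / sum(abs(r) for r in residuals)
outlier_share_mse = residuals[-1] ** 2 / sum(r ** 2 for r in residuals)

print(f"MAE={mae:.3f}, outlier share={outlier_share_mae:.3f}")
print(f"MSE={mse:.3f}, outlier share={outlier_share_mse:.3f}")
```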

👍︎ 7

👤︎ u/Yogi_DMT

📅︎ Oct 08 2019

In this video, I calculate Mean Square Displacement (MSD)(https://en.wikipedia.org/wiki/Mean_squared_error) of an atom and then its diffusion coefficient from the slope of the MSD vs time curve. The dump files shown were generated by NVT simulation on LAMMPS. youtu.be/W8TS_NucopA
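For readers who want the calculation in code form, here is a minimal sketch of the idea described: compute squared displacements from an initial position, fit the slope of MSD against time, and apply the Einstein relation MSD = 2·d·D·t, so D = slope / 6 in three dimensions. The positions, MSD values, and units (Å² and ps) are hypothetical and are not taken from the video or from any LAMMPS dump file.

```python
def squared_displacements(positions):
    """Squared distance of each frame's position from the first frame."""
    x0, y0, z0 = positions[0]
    return [(x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2 for x, y, z in positions]

def least_squares_slope(times, values):
    """Slope of the best-fit line through (time, value) pairs."""
    n = len(times)
    mean_t, mean_v = sum(times) / n, sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator

# Squared displacement of a single atom over three hypothetical frames.
example_track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print("squared displacements:", squared_displacements(example_track))

# Hypothetical MSD curve, already averaged, in A^2 at times in ps.
times_ps = [0.0, 1.0, 2.0, 3.0, 4.0]
msd_a2 = [0.0, 0.6, 1.2, 1.8, 2.4]

dimensions = 3
slope = least_squares_slope(times_ps, msd_a2)
diffusion_coefficient = slope / (2 * dimensions)  # A^2 per ps

print(f"slope={slope:.3f} A^2/ps, D={diffusion_coefficient:.3f} A^2/ps")
```

In practice the MSD is usually averaged over many atoms or time origins, and only the linear part of the curve is used for the fit.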

👍︎ 5

👤︎ u/msh_thakur

📅︎ Jul 04 2020

Using loss "mean_squared_error" vs "binary_crossentropy" in simple autoencoders

I have read the article on building a simple autoencoder on the Keras blog ("building autoencoder"). When compiling the model I used loss='binary_crossentropy', which did not go well and gave a high error. But when I used 'mean_squared_error', a single epoch gave a loss of 0.02... Is that OK?

Also, for the decoded representation of the input, do I always have to use the sigmoid activation in a simple autoencoder model?

# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)

Thank you in advance for your answer.

👍︎ 4

👤︎ u/__mathematics

📅︎ Feb 22 2020

Why is Mean Squared Error used? youtu.be/CeBe2wLI8x0

👍︎ 56

👤︎ u/VCubingX

📅︎ Dec 16 2018

How to go from "mean squared error" to "mean of squares minus square of mean"?

I'm reading http://www.inference.org.uk/mackay/itila/book.html :

On page 1 it makes the jump from the first expression given below to the second without any explanation. My own experiments lead me to believe they are equivalent expressions, but I don't know how to prove this.

mean((s_i - mean(s)) ^ 2)

mean(s^2) - mean(s)^2

where s is a set of values, mean is the mean function, and s_i represents each individual item in s.

The first equation is simply the "mean squared error", a.k.a. variance.

The second equation always seems to come up with the same answer when I perform the computation, but I don't know how to algebraically prove that they are equivalent.
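The equivalence follows from expanding the square and using the definition of the mean. In the sketch below, $\bar{s}$ denotes mean(s) and $N$ the number of values; both symbols are introduced here for the derivation only.

```latex
\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}\left(s_i - \bar{s}\right)^{2}
  &= \frac{1}{N}\sum_{i=1}^{N}\left(s_i^{2} - 2\,\bar{s}\,s_i + \bar{s}^{2}\right) \\
  &= \frac{1}{N}\sum_{i=1}^{N} s_i^{2}
     \;-\; 2\,\bar{s}\cdot\frac{1}{N}\sum_{i=1}^{N} s_i
     \;+\; \bar{s}^{2} \\
  &= \operatorname{mean}(s^{2}) - 2\,\bar{s}\cdot\bar{s} + \bar{s}^{2} \\
  &= \operatorname{mean}(s^{2}) - \operatorname{mean}(s)^{2}
\end{aligned}
```

The key step is the middle term: $\bar{s}$ is a constant, so it factors out of the sum, and what remains is the mean itself.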

👍︎ 2

👤︎ u/Buttons840

📅︎ Jan 09 2020

When fitting a non-linear model, should I aim to minimize the sum of squared errors (SSE) or root mean square error (RMSE)?

More importantly, will the difference between model fits from either approach blow up with a larger sample set?
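A quick way to see how the two objectives relate: RMSE is sqrt(SSE / n), which is a monotonically increasing function of SSE for a fixed sample size, so the same parameters minimize both. The sketch below checks this with a grid search on a hypothetical one-parameter model y = exp(k·x); the data and the grid are made up for illustration.

```python
import math

# Hypothetical observations, roughly following y = exp(0.5 * x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 1.3, 1.6, 2.2, 2.7]

def sse(k):
    """Sum of squared errors for the model y = exp(k * x)."""
    return sum((y - math.exp(k * x)) ** 2 for x, y in zip(xs, ys))

def rmse(k):
    """Root mean square error: a monotone transform of SSE for fixed n."""
    return math.sqrt(sse(k) / len(xs))

candidates = [i / 1000 for i in range(1001)]  # k from 0.000 to 1.000
best_by_sse = min(candidates, key=sse)
best_by_rmse = min(candidates, key=rmse)

print("best k by SSE: ", best_by_sse)
print("best k by RMSE:", best_by_rmse)
```

The sample size changes the value of each objective, but not which parameters minimize it, so the two fits should not drift apart as the sample grows. Numerical optimizers can still behave slightly differently on the two because the gradients are scaled differently.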

👍︎ 2

👤︎ u/ubersienna

📅︎ Oct 10 2019

Why is Kullback Leibler divergence a good way to compare two probability distributions? I'm unable to understand intuition behind the advantage/appropriateness it has over say, mean squared error or cross entropy.

Let's say you have a bag, with 100 balls and with 4 colors of balls, red, white, green, blue.

The actual probabilities for each color are

[0.36438797, 0.12192962, 0.19189483, 0.32178759]

But let's say my model comes up with two predictions for this distribution as follows, with the KLD and MSE values given

P(x) : [0.37300551, 0.12188121, 0.18509246, 0.32002081] KLD: 0.0002 MSE: 0.688

Q(x) : [0.33014522, 0.03053611, 0.30264458, 0.33667409] KLD: 0.1028 MSE: 0.1459

Why is P(x) better than Q(x) or vice versa ?
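For anyone who wants to reproduce the comparison, the sketch below computes KL(actual || model) with natural logarithms and a plain element-wise MSE for both predictions. The function names are made up for this example. The MSE this sketch computes need not match the figures quoted in the post, which may have been calculated differently.

```python
import math

actual = [0.36438797, 0.12192962, 0.19189483, 0.32178759]
model_p = [0.37300551, 0.12188121, 0.18509246, 0.32002081]
model_q = [0.33014522, 0.03053611, 0.30264458, 0.33667409]

def kl_divergence(true_dist, approx_dist):
    """KL(true || approx) in nats; terms with zero true probability contribute 0."""
    return sum(t * math.log(t / a) for t, a in zip(true_dist, approx_dist) if t > 0)

def mse(true_dist, approx_dist):
    return sum((t - a) ** 2 for t, a in zip(true_dist, approx_dist)) / len(true_dist)

kl_p, kl_q = kl_divergence(actual, model_p), kl_divergence(actual, model_q)
mse_p, mse_q = mse(actual, model_p), mse(actual, model_q)

print(f"P(x): KLD={kl_p:.4f}  MSE={mse_p:.6f}")
print(f"Q(x): KLD={kl_q:.4f}  MSE={mse_q:.6f}")
```

Most of Q(x)'s divergence comes from the second color, where it assigns about a quarter of the true probability. KL divergence penalizes that ratio heavily, while MSE only sees a small absolute difference.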

👍︎ 15

👤︎ u/bizzarebrains

📅︎ Sep 19 2018

[D] Cross-entropy vs. mean-squared error loss

Suppose I have a label [0, 1, 0, 0] and predictions [.1, .7, .1, .1] and [.01, .7, .01, .28]

Cross-entropy would score these two predictions as the same, while MSE would punish the 2nd one more.

Isn't the 2nd prediction "worse" because it wrongly gave a pretty high score to a wrong class (.28)? So why use cross-entropy as opposed to MSE?
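The claim in the post can be checked directly. With a one-hot label, categorical cross-entropy only looks at the probability given to the true class, so both predictions score the same, while MSE also counts how the remaining mass is spread. A minimal sketch using the numbers from the post (the function names are made up for this example):

```python
import math

label = [0, 1, 0, 0]
prediction_a = [0.1, 0.7, 0.1, 0.1]
prediction_b = [0.01, 0.7, 0.01, 0.28]

def cross_entropy(target, predicted):
    """Categorical cross-entropy in nats; only classes with target > 0 contribute."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def mse(target, predicted):
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)

ce_a, ce_b = cross_entropy(label, prediction_a), cross_entropy(label, prediction_b)
mse_a, mse_b = mse(label, prediction_a), mse(label, prediction_b)

print(f"prediction A: cross-entropy={ce_a:.4f}  MSE={mse_a:.5f}")
print(f"prediction B: cross-entropy={ce_b:.4f}  MSE={mse_b:.5f}")
```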

👍︎ 28

👤︎ u/ME_PhD

📅︎ May 11 2018

Why do autoencoders use binary_crossentropy loss and not mean squared error?

Crosspost from a tweet of mine that is yet to be answered

Can someone please explain to me why all the autoencoder tutorials I can find use binary_crossentropy loss and not mean squared error, surely it’s a regression problem not classification or am I missing something?

👍︎ 3

👤︎ u/geek_ki01100100

📅︎ Feb 23 2019

Do we need momentum for Mean Squared Error?

For a regression problem where we use the mean squared error, do we need the Adam optimizer? I mean, plain gradient descent might work like a charm here, because we don't need momentum for MSE since its loss function is convex. Am I right, or am I confusing myself?

👍︎ 2

👤︎ u/SanjivGautamOfficial

📅︎ Jun 12 2019

Residual standard error vs mean of squared residuals?

I’ve come across the first term before in linear regression, but the second is something I got as output from a random forest. I'm trying to compare the models but don’t know which metric to use for the comparison.
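The two quantities are built from the same residual sum of squares but differ in scale and divisor, which the sketch below illustrates. It assumes the usual definitions: the residual standard error is sqrt(RSS / (n − p)), with p the number of fitted coefficients, and the mean of squared residuals is RSS / n. The residuals and the coefficient count are hypothetical, and note that random forest implementations typically compute their residuals from out-of-bag predictions, which this sketch does not model.

```python
import math

# Hypothetical residuals (observed minus predicted) from some fitted model.
residuals = [1.2, -0.8, 0.5, -1.5, 0.9, -0.3]
n_coefficients = 2  # assumed: intercept plus one slope in the linear model

rss = sum(r ** 2 for r in residuals)
n = len(residuals)

mean_squared_residuals = rss / n                               # squared units of y
residual_standard_error = math.sqrt(rss / (n - n_coefficients))  # units of y
rmse = math.sqrt(mean_squared_residuals)                       # units of y

print(f"mean of squared residuals: {mean_squared_residuals:.4f}")
print(f"RMSE (its square root):    {rmse:.4f}")
print(f"residual standard error:   {residual_standard_error:.4f}")
```

Taking the square root of the mean of squared residuals puts it on the same scale as the residual standard error. For a fair model comparison, both numbers would ideally come from predictions on the same held-out data.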

👍︎ 5

👤︎ u/Frogad

📅︎ Jun 25 2019

[P] Why Mean Squared Error and L2 regularization? A probabilistic justification. aoliver.org/why-mse

👍︎ 41

👤︎ u/avitaloliver

📅︎ Mar 23 2017

[D] Question: Denormalize Mean Squared Error Output on Normalized Dataset?

Say I have a deep neural network, and the labels (a continuous variable) are normalized so that they fall between 0 and 1. I train the network with mean squared error as the loss function. Do I report the denormalized MSE value?
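If the normalization is min-max scaling, the conversion is mechanical: every error is stretched by (max − min) when mapped back, so the MSE in original units equals the normalized MSE times (max − min)². The sketch below checks this against a direct computation. The label values, the scaling bounds, and the predictions are hypothetical.

```python
# Hypothetical labels in original units and the bounds used for min-max scaling.
y_true = [120.0, 80.0, 200.0, 150.0]
y_min, y_max = 50.0, 250.0
scale = y_max - y_min

def normalize(y):
    return (y - y_min) / scale

def denormalize(y_norm):
    return y_norm * scale + y_min

def mse(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

# Hypothetical network outputs on the normalized scale.
predictions_norm = [0.37, 0.13, 0.70, 0.55]

mse_norm = mse([normalize(y) for y in y_true], predictions_norm)
mse_rescaled = mse_norm * scale ** 2                                  # shortcut
mse_direct = mse(y_true, [denormalize(p) for p in predictions_norm])  # check

print(f"normalized MSE:          {mse_norm:.5f}")
print(f"rescaled to real units:  {mse_rescaled:.2f}")
print(f"computed in real units:  {mse_direct:.2f}")
```

Which one to report depends on the audience: the normalized value is unit-free, while the denormalized value (or its square root) is easier to interpret in the units of the original variable.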

👍︎ 4

👤︎ u/Ayruai

📅︎ Feb 10 2019

On the Large-Sample Bias, Variance, and Mean Squared Error of the Conventional Noncentrality Parameter Estimator of Covariance Structure Models tandfonline.com/doi/abs/1…

👍︎ 2

👤︎ u/TrannyPornO

📅︎ Feb 09 2019

minimization of least squared errors in system identification

Attached here is a screenshot of the Underactuated Robotics course at MIT:

https://preview.redd.it/k18bpx399n381.png?width=2333&format=png&auto=webp&s=c934b32100b597b21596b202c620f544b5316bb2

Apparently, y[n] is the predicted output and y_n is the measured output. However, why are we minimizing the least squared errors between these two quantities? Shouldn't we minimize the least squared errors between the measured output and the measured input to identify a plant?

👍︎ 11

👤︎ u/Terminator_233

📅︎ Dec 05 2021

Great R-squared/Adjusted-R-Squared, awful RMSE or Mean Absolute Percent Error

Sorry guys this is more of a theory/concept question rather than an actual coding example.

I have built a couple of models with plain old OLS regression. They typically achieve adj-R2 of at least .95-.97. This to me is pretty damn good.

However, when I apply them to a test set, the RMSE or MAPE is generally bad, at least given the requirements of the model. I'm getting anywhere between 25-50% Mean Absolute Percent Error.

Yes, I have tried some forms of cross-validation and mixing the testing and training sets together so it's not biased against a given training/test set. However, that doesn't really help.

I'm just looking for some guidance or possible explanations for why I have what I understand to be a great adj-R-squared but such poor predictive ability. Really, any theories on why this might be happening.

Also, are my expectations too high, is this normal? What should I be saying to my coworkers? It just smells fishy to me that I am missing something...

Any wisdom is appreciated!!!

👍︎ 7

👤︎ u/sports89

📅︎ Dec 21 2015

[Q] What's the difference between cross-entropy and mean-squared-error loss functions?

When training neural networks one can often hear that cross entropy is a better cost function than mean squared error.

Why is it the case if both have the same derivatives and therefore lead to the same updates in weights?
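Whether the derivatives match depends on the output layer. A minimal sketch, assuming a single sigmoid output unit with logit z and target y: the gradient of binary cross-entropy with respect to z is p − y, while the gradient of the squared error ½(p − y)² is (p − y)·p·(1 − p), which shrinks when the unit saturates. The code checks both analytic forms against finite differences; all names here are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(z, y):
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def half_squared_error(z, y):
    return 0.5 * (sigmoid(z) - y) ** 2

def numeric_gradient(loss, z, y, eps=1e-6):
    """Central finite difference of the loss with respect to the logit z."""
    return (loss(z + eps, y) - loss(z - eps, y)) / (2 * eps)

target = 1.0
for z in (-6.0, -2.0, 0.0, 2.0):
    p = sigmoid(z)
    grad_ce = p - target                    # analytic, cross-entropy
    grad_se = (p - target) * p * (1 - p)    # analytic, squared error
    print(f"z={z:5.1f}  p={p:.4f}  dCE/dz={grad_ce:8.4f}  dSE/dz={grad_se:8.4f}")

# Confidently wrong prediction: logit -6 when the target is 1.
grad_ce_wrong = numeric_gradient(binary_cross_entropy, -6.0, target)
grad_se_wrong = numeric_gradient(half_squared_error, -6.0, target)
```

The two losses do give the same p − y form when MSE is paired with a linear output and cross-entropy with a sigmoid or softmax output, which may be the case the question has in mind.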

👍︎ 18

👤︎ u/__AndrewB__

📅︎ Sep 11 2015

Bank of Canada: Bootstrapping Mean Squared Errors of Robust Small-Area Estimators: Application to the Method-of-Payments Data (PDF) bankofcanada.ca/2018/06/s…

👍︎ 2

👤︎ u/Central_Bank_Bot

📅︎ Jun 26 2018
