I'm trying to make a model to predict a probability distribution conditional on a set of predictors. I've tried a Mixture Density Network, but the covariance doesn't seem to be captured - the bivariate distributions just look like two Gaussians plopped on top of each other, with no correlation. Is there a more appropriate model to use here? Or should an MDN work and I'm just implementing it wrong?
I did find this textbook that seems to suggest the answer is yes: http://www.statslab.cam.ac.uk/~nickl/Site/__files/FULLPDF.pdf
However, it's obviously quite technical; I'm currently an undergrad and I'm sure I haven't covered enough of the prerequisites to understand typical advanced treatments of the topic. I just find the topic really interesting.
But is there a simplified overview of this that's more accessible? Is there anything particularly interesting that happens when we switch from finitely many dimensions to infinitely many, for probability distributions in particular?
I am trying to better understand the conditional and marginal distributions of the normal probability distribution function: https://online.stat.psu.edu/stat505/lesson/6/6.1
"Any distribution for a subset of variables from a multivariate normal, conditional on known values for another subset of variables, is a multivariate normal distribution."
Suppose I have data corresponding to 3 variables: Var_1, Var_2, and Var_3. I am interested in predicting Var_3 using Var_1 and Var_2.
Suppose I fit a multivariate normal distribution to this data - doesn't the multivariate normal distribution have special properties such that the conditional distribution of any of the variables within the multivariate normal distribution will also form a normal distribution? Suppose I want to predict the value of Var_3 when Var_1 = a AND Var_2 = b.
Couldn't I just "fix" the values of the other two variables and construct the conditional distribution of the response variable, P(Var_3 | Var_1 = a and Var_2 = b)? Shouldn't P(Var_3 | Var_1 = a and Var_2 = b) have a normal distribution? Could I not then generate a distribution (e.g. a histogram) of acceptable values of this response variable given the "fixed" values of the other two variables? I think I should be able to sample from P(Var_3 | Var_1 = a and Var_2 = b), given that I have chosen a multivariate normal distribution. Then, I could take the expected value of P(Var_3 | Var_1 = a and Var_2 = b) to answer my question, e.g. when Var_1 = a and Var_2 = b, Var_3 is most likely to be equal to c? (For a normal distribution the mean and the mode coincide, so the expected value is also the most likely value.)
https://imgur.com/a/4aTDkR1
Would this be considered a "generative model"? Is this a correct strategy in general? Does it make mathematical sense?
Note: I know that I could just fit a regular regression model to this problem, but I am trying to better understand how probability distribution functions work.
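That reasoning is sound: conditioning a multivariate normal on a subset of its coordinates gives another (here univariate) normal. A minimal R sketch of the whole pipeline, with made-up data and hypothetical variable names:
#fit a trivariate normal, then condition Var_3 on Var_1 = a, Var_2 = b
#using the standard MVN conditioning formulas
set.seed(1)
x1 <- rnorm(500)
x2 <- rnorm(500)
x3 <- 0.5 * x1 - 0.3 * x2 + rnorm(500, sd = 0.5)
X  <- cbind(Var_1 = x1, Var_2 = x2, Var_3 = x3)
mu    <- colMeans(X)   #estimated mean vector
Sigma <- cov(X)        #estimated covariance matrix
a <- 1; b <- -0.5      #the "fixed" values
#partition: block 1 = (Var_1, Var_2), block 2 = Var_3
S11 <- Sigma[1:2, 1:2]
S21 <- Sigma[3, 1:2]
cond_mean <- drop(mu[3] + S21 %*% solve(S11, c(a, b) - mu[1:2]))
cond_var  <- drop(Sigma[3, 3] - S21 %*% solve(S11, Sigma[1:2, 3]))
#P(Var_3 | Var_1 = a, Var_2 = b) is univariate normal, so sampling and
#taking the expected value are straightforward
draws <- rnorm(10000, mean = cond_mean, sd = sqrt(cond_var))
hist(draws); cond_mean   #cond_mean is the "most likely" value c
And since the fitted MVN specifies the full joint distribution, you can sample new data from it, which is the usual sense in which this counts as a generative model.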
Thanks
In Differential Privacy, the required noise addition is often achieved by sampling values from the Laplace distribution (ie, the 'Laplace mechanism').
This means we usually think about the average relative error of a counting query as: [E|Lap(scale)|] / [number of records]. One thing to note is that if you divide both the scale and the number of records by the same amount (ie, applying a record partitioning or subsampling trick to reduce sensitivity), you shouldn't see any improvement in that error metric: [E|Lap(scale/k)|] / ([number of records]/k) = [E|Lap(scale)|] / [number of records], since E|Lap(b)| = b scales linearly in the scale parameter. (Variance would not behave this way: var(Lap(b)) = 2b^2 scales quadratically, so the variance-based ratio actually improves by a factor of k under the same trick.)
However, I've been wondering whether variance (or even mean absolute difference) is actually the correct way to think about the impact of noise addition. The Laplace distribution is a long-tailed distribution and, for small scale parameters, it's sharply peaked at the origin. Ie, lots of small noise values, a few very large noise values. This is interesting because these noisy query results feed into other algorithms that build models, post-process, etc, to produce a final privatized analytic or data product... and this post-processing may be more or less tolerant of different distributions of added noise. For example, if most noise values are very small, and only a few randomly sampled values are very large, it can be possible to use publicly known properties of the data space to do smoothing and reduce the impact of the large noise values. That same trick might not succeed with a less sharply peaked distribution of noise values, even if the average noise value stayed the same.
So hopefully that's enough interesting motivation to justify me posting a fairly mundane question to r/math: Does anyone know the equation for the median absolute difference of the Laplace distribution?
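In case it helps: if "median absolute difference" means the median absolute deviation (from the median), there is a simple closed form. For Lap(0, b), the absolute value |X| is Exponential with rate 1/b, so the median of |X| is b*ln(2). A quick Monte Carlo check in R:
#Laplace(0, b) sampled as the difference of two iid Exponential(1/b) draws
set.seed(42)
b <- 2
x <- rexp(1e6, rate = 1 / b) - rexp(1e6, rate = 1 / b)
median(abs(x - median(x)))   #simulated median absolute deviation
b * log(2)                   #closed form, ~1.3863 for b = 2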
I'm trying to expand my stats skills by predicting the basket share for a customer from a few predictor variables, and I want to do it the right way. I definitely do not want to grab the data, jam it through the algorithm without consideration (I'm using Excel), and belch the results.
The main problem I see at this point is that my data is (edit: updated to be "the residuals are", as I was imprecise in the original post) not normally distributed. When I look at the normal probability plot, it doesn't follow the 45-degree line. The distribution most closely resembles the heavy-tailed example from this site.
OK, assuming this is a heavy-tailed distribution, I don't know how to fix it. Googling how to fix a heavy-tailed distribution has led to a bunch of snarky, unhelpful "answers" on StackOverflow. I think I need to transform the variables somehow so that they are normally distributed. I did see a suggestion for a log transform.
This sorta intuitively makes sense. One variable I'm using is total sales, which has a few really large customers that may be considered outliers. Using a log transform dampens the effect of these big guys.
Aside from that, what else can I do? How do I determine if each predictor variable is normally distributed or not?
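On the last question, a sketch of the standard visual checks in R (the same plots can be built in Excel), using a made-up, right-skewed total-sales variable to show what the log transform does:
#total_sales is hypothetical: log-normal, i.e. skewed with a heavy right
#tail, like sales data with a few very large customers
set.seed(7)
total_sales <- rlnorm(500, meanlog = 10, sdlog = 1)
par(mfrow = c(2, 2))
hist(total_sales, main = "raw")
qqnorm(total_sales, main = "raw"); qqline(total_sales)
hist(log(total_sales), main = "log-transformed")
qqnorm(log(total_sales), main = "log-transformed"); qqline(log(total_sales))
One caveat: for regression, the normality assumption is about the residuals (as your edit notes), not the predictors themselves; transforming a skewed predictor is still often worthwhile for taming influential points.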
Thanks for any advice you can give.
I'm in a Bayesian modeling situation that I imagine is quite common, but I can't seem to find a mainstream distribution for this circumstance. Apologies if this is what they call a newbie question.
I want to set a prior on a random variable C which is a vector of binary-valued variables. I have a good understanding of Cov(C_i, C_j) as well as Var(C_i). I feel like there should be a pretty standard way of encoding this into a prior but the distributions I've found such as Multivariate Bernoulli seem pretty scarcely used, at least in the sense that it isn't implemented in any of the common MCMC packages.
My instinct is just to implement it myself, but I feel like by virtue of it being uncommon there is likely a better all around choice. Could be a cognitive bias tho
What would you do in this circumstance? Is there a common choice?
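One common construction here is a latent-Gaussian (multivariate probit / Gaussian copula) prior: put the correlation structure on a latent multivariate normal and threshold it to get correlated binaries. A rough R sketch, assuming the mvtnorm package; note the latent correlation is not numerically equal to the binary correlation, so it has to be calibrated against your target Cov(C_i, C_j):
library(mvtnorm)
#latent correlation matrix for two binary variables
R <- matrix(c(1, 0.6,
              0.6, 1), nrow = 2)
z <- rmvnorm(1e5, sigma = R)   #latent Gaussian draws
C <- (z > 0) * 1               #threshold at 0 to get binary variables
colMeans(C)                    #marginal P(C_i = 1), ~0.5 for threshold 0
cor(C)                         #induced binary correlation (less than 0.6)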
I am trying to teach myself about the multivariate normal distribution and I am struggling to understand some basic things about it.
To show my confusion, I use the famous Iris Flower dataset (I will use the R programming language for some basic scripts). The Iris Flower dataset has 5 columns and 150 rows. Each row contains the measurements for an individual flower (i.e. there are 150 flowers). The columns contain the measurements of the "Petal Width", the "Petal Length", the "Sepal Length" , the "Sepal Width" and the "Type of Flower" (three types of flowers, categorical variable).
Suppose I just take the Petal Length variable. I want to see if the Petal Length follows a (univariate) normal distribution. I think this can be easily done using different strategies (R code below):
#load the iris data and isolate the petal length
data(iris)
var1 = iris$Petal.Length
#visually check if the distribution of the petal length looks like a "bell curve"
plot(density(var1))
#look at the quantile-quantile plot
qqnorm(var1)
#use statistics (e.g. the Shapiro-Wilk test) to check for normality
shapiro.test(var1)
#if the data is normally distributed, we can find out the mean and the variance
mean(var1)
var(var1)
Similarly, I can repeat this for the remaining variables in the iris data. However, this task becomes a lot more complicated when you consider the multivariate distribution of the iris data : https://en.wikipedia.org/wiki/Multivariate_normal_distribution . When dealing with the multivariate distribution, there is now a "vector of means" and a "variance-covariance matrix". This means that there are more complex relationships within the data - some parts of the data might have a normal distribution whereas some parts of the data might not be normally distributed.
After spending some time researching how to determine if a dataset follows a multivariate normal distribution, I found out about something called the Mardia test, which apparently uses the "skewness" and the "kurtosis" to determine if the data is normally distributed (skewness and kurtosis that deviate significantly from those of a normal distribution indicate non-normality). I tried running the following code in R to perform the Mardia test on the iris data:
library(MVN)
data(iris)
data = iris[,-5]
result = mvn(data)
result
The results of this are confusing. I am not sure ...
I was just looking at a machine learning package for Python/R called 'Prophet' by Facebook which is making a bit of noise in the machine learning/data science world due to its simplicity, especially in Python. Here's a summary:
- Time series algorithm; automatically standardises time and the predicted variable
- Automatic de-trending on three components -- long term (over the entire data?), monthly/weekly/days of the week, and holidays (comes with holidays data for different regions)
- By default, places 25 potential changepoints for the (linear or logistic) trend over the first 80% of the data
- By default, assumes a Laplace distribution for the covariates
Prior to reading this, it always bothered me when papers don't mention what type of distribution each of the covariates has, unless, of course, it is shown graphically for each one (so something like a Poisson distribution is easier to tell). Now that I'm reading the above linked article, it took me by surprise that the package assumes a Laplace distribution for all the covariates by default.
So then my question is two-fold. Is it ok to assume a normal distribution if the paper doesn't mention anything for each of the covariates (some papers mention them for a few to many covariates, and I think some of them may be obvious so there's no need to mention them, though maybe not obvious to all)? On top of this, why would one choose to assume a Laplace instead of a normal distribution? What would be the advantages/disadvantages of such a decision, and what would be the effect on the estimates/errors/bias?
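On the shape question, a quick visual in R may help: compared to a normal with the same variance, the Laplace puts more mass both very close to 0 and far out in the tails. (This is the same contrast that makes a Laplace prior sparsity-inducing, as in the lasso, versus a normal prior, as in ridge.)
#compare Laplace and normal densities with equal variance (var = 1)
b <- 1 / sqrt(2)                       #Laplace scale: var(Lap(b)) = 2*b^2 = 1
dlap <- function(x) exp(-abs(x) / b) / (2 * b)
curve(dnorm(x), from = -5, to = 5, ylab = "density")
curve(dlap(x), add = TRUE, lty = 2)
legend("topright", c("Normal(0, 1)", "Laplace, same variance"), lty = 1:2)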
u(x,y) = 3x^(2)y - y^(3). Show that u_xx + u_yy = 0 (i.e., Δu = 0).
u(x,y) = ln(x^(2) + y^(2)). Show that Δu = 0.
Derive the Laplace operator in polar coordinates (in other words: show what Δu becomes for u(x,y) when [;x = r\cos(\theta), y = r\sin(\theta);]).
I am not planning on using this operator or any of these techniques on the test or quizzes, I am just wondering what these problems say about this operator and any interesting properties that one should know about it.
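For what it's worth, a quick check of the first one makes the pattern explicit: [;u_x = 6xy;] so [;u_{xx} = 6y;], and [;u_y = 3x^2 - 3y^2;] so [;u_{yy} = -6y;], hence [;\Delta u = u_{xx} + u_{yy} = 6y - 6y = 0;].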
Hi there!
Let's say, for instance, that we have a 3-dimensional normal distribution:
(X1, X2, X3) ~ N(mean, cov)
The pdf is:
f(x1, x2, x3), the density at the point (x1, x2, x3) (note that for a continuous distribution P(X1 = x1, X2 = x2, X3 = x3) is literally zero; the density is what plays that role)
and the cdf is:
F(x1, x2, x3) = P(X1 <= x1, X2 <= x2, X3 <= x3)
Functions to calculate the cdf and pdf of a multivariate normal distribution are readily available in most programming languages.
What I need however, is something more flexible. I need a function which I will call Flex, that can take both single possible values of a variable (like a pdf does) and a range of possible values (like a cdf does). Here is an example:
Flex(0.5 <=> 0.9, 0.2, 0.3 <=> 0.7) = P(0.5 <= X1 <= 0.9, X2 = 0.2, 0.3 <= X3 <= 0.7)
In this example, I want X1 and X3 to lie in a certain interval of possible values, and X2 to have exactly the value 0.2.
Now, I realize that the resulting value is not that interpretable, but that does not matter. I have a Gaussian mixture model, and I need to figure out which cluster an observation is most likely to belong to, when what we know about the observation is something like:
x1 is somewhere between 0.2 and 0.7, x2 is exactly 0.3 and so on.
If anyone knows whether something like this exists, or whether it is possible to implement, that would be very helpful!
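In case it helps, here is one way to build such a function for the trivariate case, assuming the mvtnorm package: condition on the exactly-observed coordinate, then integrate the conditional normal over the box for the interval-valued coordinates. (The function name flex and the example numbers are just for illustration.)
library(mvtnorm)
flex <- function(mu, Sigma, x2, lo, hi) {
  f <- c(1, 3)   #interval-valued coordinates (X1, X3)
  o <- 2         #exactly observed coordinate (X2)
  #conditional distribution of (X1, X3) given X2 = x2
  cmu <- mu[f] + Sigma[f, o] / Sigma[o, o] * (x2 - mu[o])
  cS  <- Sigma[f, f] - Sigma[f, o] %o% Sigma[o, f] / Sigma[o, o]
  #density of X2 at x2, times the conditional box probability
  dnorm(x2, mean = mu[o], sd = sqrt(Sigma[o, o])) *
    pmvnorm(lower = lo, upper = hi, mean = cmu, sigma = cS)
}
Sigma <- diag(3); Sigma[1, 3] <- Sigma[3, 1] <- 0.5
flex(mu = c(0, 0, 0), Sigma = Sigma, x2 = 0.2,
     lo = c(0.5, 0.3), hi = c(0.9, 0.7))
For the clustering use case, you would evaluate this under each component of the mixture, multiply by the component's mixing weight, and pick the cluster with the largest value.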
Hi, I want to take the partial derivative of this multivariate Gaussian cumulative distribution function with respect to beta_1 (a single element of the beta vector). X_1 is an n x z matrix, X_2 is a p x z matrix, beta is a z x 1 vector, H is a p x n matrix, F is a p x 1 vector, and T is a symmetric, positive-definite p x p matrix. In the univariate case the solution is straightforward with the chain rule, but I'm struggling a bit with the generalized chain rule in this case.
https://preview.redd.it/48vd98m2ong61.png?width=296&format=png&auto=webp&s=d2ab5a60151339408c958f2c16adfa9eec4f661b
Can someone please explain to me how this works?
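Without the exact expression from the screenshot, the generic building block is usually this: differentiating a multivariate normal CDF with respect to one coordinate peels off a univariate density times a conditional CDF, [;\partial \Phi_{\mu,\Sigma}(t_1,\dots,t_p) / \partial t_i = \phi(t_i; \mu_i, \Sigma_{ii}) \, \Phi(t_{-i} \mid t_i);], where the conditional CDF uses the usual MVN conditioning formulas (mean [;\mu_{-i} + \Sigma_{-i,i}\Sigma_{ii}^{-1}(t_i - \mu_i);], covariance [;\Sigma_{-i,-i} - \Sigma_{-i,i}\Sigma_{ii}^{-1}\Sigma_{i,-i};]). If the argument vector t depends on beta_1, the generalized chain rule then sums over coordinates: [;\partial \Phi / \partial \beta_1 = \sum_i (\partial \Phi / \partial t_i)(\partial t_i / \partial \beta_1);].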
Suppose I have 3 variables: earnings, savings, debt
I have these 3 variables recorded for 1000 people.
How do you check if this data is normally distributed? For a single variable, I could use the Kolmogorov-Smirnov test. But how would you check if these 3 variables are jointly normally distributed?
Assuming that this data is normally distributed, how do you calculate the joint multivariate normal distribution of this data? For a single variable, assuming a normal distribution, I could take all the observations:
Mu = sum(x_i) / n, summing over i = 1, ..., n
Sigma = sqrt( sum( (x_i - Mu)^2 ) / n ), summing over i = 1, ..., n
But if there are 3 variables:
Mu vector: (mu1, mu2, mu3)
Sigma (the covariance matrix):
( sig11 sig12 sig13 )
( sig21 sig22 sig23 )
( sig31 sig32 sig33 )
Is this how you would define the multivariate distribution for this example?
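Pretty much, yes. A sketch in R with made-up data (the MVN package's mvn() runs Mardia's multivariate normality test, which is one standard answer to the "jointly normal?" question; the numbers here are hypothetical):
library(MVN)
#hypothetical records for 1000 people
set.seed(3)
df <- data.frame(earnings = rnorm(1000, 50, 10),
                 savings  = rnorm(1000, 20, 5),
                 debt     = rnorm(1000, 10, 4))
mu_hat    <- colMeans(df)       #the mean vector (mu1, mu2, mu3)
Sigma_hat <- cov(df)            #the 3x3 variance-covariance matrix (n-1 denominator)
mvn(df, mvnTest = "mardia")     #joint (multivariate) normality test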
Thanks
What fraction of all points in a Euclidean space lie within (rather than outside of) the n-ball whose center is the origin and which is tangent to the point p represented by Cartesian coordinates:
vector(θ) = (θ^1, θ^2, θ^3, θ^4, θ^5, ..., θ^n)
representing sigmas in the multivariate normal distribution in n dimensions, as illustrated at the top of the link?
In hierarchical / pooled models there is the intuition that slope and intercept are often correlated within each sub-group.
So why does "simple" regression, i.e. one intercept and one slope (and no random intercepts), treat the intercept and slope as uncorrelated?
In other words, why do Bayesian approaches model regression as (for example):
y ~ normal(mean, sigma)
mean = intercept + slope * x
intercept ~ normal(0, 10)
slope ~ normal(0, 10)
sigma ~ exp(1)
instead of :
y ~ normal(mean, sigma)
mean = intercept + slope * x
[intercept, slope] ~ multivariatenormal(.....)
sigma ~ exp(1)
Thanks
Hi! So I know this question is pretty obvious to some, but suppose we know the distribution of
P(a = i, b = j, c = k) for i, j, k \in {0, 1}. Furthermore, suppose we know the marginal distribution P_a, which happens to be strictly positive. If we want to calculate the conditional distribution of b and c given a, i.e. P_{b, c | a}, we can just divide each value in our original joint distribution by the corresponding marginal value of a, right?
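Yes: that is exactly the definition of conditioning, and it is safe here since P_a is strictly positive. A quick sanity check in R with a random 2x2x2 table:
#joint P(a, b, c) as a 2x2x2 array, with dimension 1 indexing a
set.seed(5)
p    <- array(runif(8), dim = c(2, 2, 2))
p    <- p / sum(p)
p_a  <- apply(p, 1, sum)          #marginal P(a)
cond <- sweep(p, 1, p_a, "/")     #P(b, c | a) = P(a, b, c) / P(a)
apply(cond, 1, sum)               #each conditional slice sums to 1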
Hey all, is there a way to statistically interpolate between different multivariate Gaussian distributions? I think linear interpolation might work for the mean vectors, but I'm not sure about the covariance matrices. At the most basic level, given two distributions and two "weights" adding up to 1, I would like to find the "weighted mixture" of the two distributions. Can you point me to relevant research areas or papers? Thank you.
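If "weighted mixture" is meant literally (mix the two Gaussians with weights w and 1 - w), the mixture's overall mean and covariance have a closed form via moment matching, even though the mixture itself is no longer Gaussian; linearly interpolating the covariances is a different operation. A small R sketch with made-up inputs:
#first two moments of a two-component Gaussian mixture
merge_gaussians <- function(mu1, S1, mu2, S2, w) {
  mu <- w * mu1 + (1 - w) * mu2
  d  <- mu1 - mu2
  S  <- w * S1 + (1 - w) * S2 + w * (1 - w) * (d %o% d)
  list(mean = mu, cov = S)
}
merge_gaussians(mu1 = c(0, 0), S1 = diag(2),
                mu2 = c(2, 1), S2 = 0.5 * diag(2), w = 0.3)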
1. How to separate a mixture of two or more multivariate distributions? (See the sketch after this list.)
2. If a multivariate sample, which is a mixture of many distributions, also has some categorical columns, how can the components be separated in that case?
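For the Gaussian case, a common starting point is fitting a mixture by EM and reading off the component assignments; a sketch using the mclust package (for mixed categorical/continuous data, latent class style models are the usual extension):
#fit a 3-component multivariate Gaussian mixture by EM and inspect how
#the inferred components line up with a known grouping (iris species)
library(mclust)
fit <- Mclust(iris[, 1:4], G = 3)
table(fit$classification, iris$Species)   #rows: inferred components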
Hi, I'm trying to reimplement the Bayesian model from this paper. They mention in the Supplemental Information that they assume a multivariate prior on the weights -- I know how to deal with the mean vector, but they say that "The covariance matrix is defined by an Inverse-Gamma distribution with the two hyperparameters (a, b). The simulation sets the initial values of the two hyperparameters as (a0 = 1, b0 = 5)." I'm trying to do this in PyMC3, and I don't see how to define the covariance matrix with this distribution (is the inverse-Wishart really what I want?). I would also give PyStan a shot if someone knew how to do this there. This is my first foray into Bayesian modeling, so any help would be hugely appreciated.
Don't get me wrong guys, I made some progress but I need to get full marks.
https://preview.redd.it/caea33ipy0g51.png?width=677&format=png&auto=webp&s=e50a499696bacb5d2e7abca102f712b8b65415a7
https://preview.redd.it/nolfp0cqy0g51.png?width=654&format=png&auto=webp&s=6aa759469670a039e5f21d5bba2ba3e8f28cd00e
https://preview.redd.it/ulx0q69ry0g51.png?width=592&format=png&auto=webp&s=06540ab6cf5336f9b5f937a89da8f8fa9d50b48f
Can someone please explain to me how this works?
If we have X_1, X_2, ..., X_n, where all of them are univariate normally distributed and we also assume that they are independent, is the vector X = (X_1, ..., X_n) multivariate normally distributed?
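(The answer is yes: with independence, the joint density factorizes as [;\prod_{i=1}^n (2\pi\sigma_i^2)^{-1/2} \exp(-(x_i - \mu_i)^2 / (2\sigma_i^2));], which is exactly the multivariate normal density with mean vector [;(\mu_1, \dots, \mu_n);] and diagonal covariance [;\Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2);]. Note that marginal normality alone would not be enough; independence is what makes the factorization work.)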