Predicting Multivariate Conditional Probability Distribution

I’m trying to make a model to predict a probability distribution conditional on a set of predictors. I’ve tried a Mixture Density Network, but the covariance doesn’t seem to be captured - the bivariate distributions just look like two Gaussians plopped on top of each other, with no correlation. Is there a more appropriate model to use here? Or should an MDN work and I’m just implementing it wrong?
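A minimal sketch of the usual culprit and fix (an addition, assuming the MDN currently emits only per-dimension variances): with diagonal mixture components, correlation can only be faked by stacking blobs, so letting each component emit the d(d+1)/2 entries of a Cholesky factor gives it a full covariance. In base R, the parametrisation looks like this:

d <- 2                                   # dimension of the target
raw <- rnorm(d * (d + 1) / 2)            # stand-in for one component's covariance head
L <- matrix(0, d, d)
L[lower.tri(L, diag = TRUE)] <- raw      # unconstrained lower-triangular entries
diag(L) <- exp(diag(L))                  # exp (or softplus) keeps the diagonal positive
Sigma <- L %*% t(L)                      # symmetric, positive-definite covariance
Sigma                                    # off-diagonal entries carry the learned correlation

The same construction is applied to the network's output head in torch/keras; the rest of the MDN (mixture weights, means, negative log-likelihood loss) stays unchanged.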

👍 3
👤 u/still_tyler
📅 Jan 14 2022
Are there generalizations of the familiar univariate or multivariate distributions (like the normal or multivariate normal distributions) in countably many (or higher) dimensions?

I did find this textbook that seems to suggest the answer is yes: http://www.statslab.cam.ac.uk/~nickl/Site/__files/FULLPDF.pdf

However, it's obviously quite technical; I'm currently an undergrad and I'm sure I haven't covered enough of the prerequisites to understand typical advanced treatments of the topic. I just find the topic really interesting.

But is there a simplified overview of this that's more accessible? Is there anything particularly interesting that happens when we switch from finitely many dimensions to infinitely many, for probability distributions in particular?
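For a concrete entry point (an added sketch, not from the post): the most common infinite-dimensional generalisation is the Gaussian process / Gaussian measure, defined by requiring every finite subcollection of coordinates to be multivariate normal with consistent means and covariances (the Kolmogorov extension theorem does the gluing). Genuinely new phenomena do appear in infinite dimensions, e.g. there is no Lebesgue measure to write a density against. A finite marginal of such a process can be sampled directly in R:

library(MASS)
x <- seq(0, 1, length.out = 100)                                  # 100 of the infinitely many coordinates
K <- outer(x, x, function(a, b) exp(-(a - b)^2 / (2 * 0.1^2)))    # squared-exponential covariance kernel
f <- mvrnorm(1, mu = rep(0, length(x)), Sigma = K + 1e-8 * diag(length(x)))   # jitter for numerical stability
plot(x, f, type = "l")                                            # one draw from the process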

👍 7
👤 u/VankousFrost
📅 Oct 31 2021
[Question] If all the variables in a multivariate normal distribution are independent, can you write the pdf of the distribution as the product of the pdfs of the univariate normal distributions
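For reference (an added sketch): the answer is yes. Independence makes the covariance matrix diagonal, so the quadratic form in the exponent separates and the joint density factors:

f(x) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\!\Big(-\tfrac12 (x-\mu)^\top \Sigma^{-1} (x-\mu)\Big) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\Big(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\Big), \qquad \Sigma = \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2).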
👍 14
👤 u/super_saiyan1500
📅 Oct 04 2021
[Q] What's the best software for modelling multivariate gaussian distributions?
👍 8
👤 u/bakedpotatos136
📅 Sep 01 2021
[D] Properties of the Multivariate Normal Distribution

I am trying to better understand the conditional and marginal distributions of the normal probability distribution function: https://online.stat.psu.edu/stat505/lesson/6/6.1

"Any distribution for a subset of variables from a multivariate normal, conditional on known values for another subset of variables, is a multivariate normal distribution."

Suppose I have data corresponding to 3 variables : Var_1 , Var_2 and Var_3. I am interested in predicting Var_3 using Var_1 and Var_2.

Suppose I fit a multivariate normal distribution to this data - doesn't the multivariate normal distribution have special properties such that the conditional distribution of any of the variables within the multivariate normal distribution will also form a normal distribution? Suppose I want to predict the value of Var_3 when Var_1 = a AND Var_2 = b.

Couldn't I just "fix" the values of the other two variables and construct a conditional distribution for the response variable Prob (Var_3 | Var_1 = a and Var_2 = b) ? Shouldn't "Prob (Var_3 | Var_1 = a and Var_2 = b) " have a normal distribution? Could I not then generate a distribution (e.g. histogram) of acceptable values of this response variable given the "fixed" values of the other two variables? I think I should be able to sample from Prob (Var_3 | Var_1 = a and Var_2 = b) given that I have chosen a multivariate normal distribution? Then, I could take the Expected Value of " Prob (Var_3 | Var_1 = a and Var_2 = b) " to answer my question? E.g when "Var_1 = a and Var_2 = b", Var_3 is most likely to be equal to "c"?

https://imgur.com/a/4aTDkR1

Would this be considered a "generative model"? Is this a correct strategy in general? Does it make mathematical sense?
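A minimal sketch of exactly that strategy (with simulated stand-in data, since the real data isn't shown): estimate the mean vector and covariance matrix, then apply the standard conditioning formula; the conditional of Var_3 given the other two is again normal, and its mean is the natural point prediction.

set.seed(1)
X <- MASS::mvrnorm(1000, mu = c(0, 0, 0),
                   Sigma = matrix(c(1, .5, .3,
                                    .5, 1, .4,
                                    .3, .4, 1), 3))       # hypothetical Var_1, Var_2, Var_3
mu <- colMeans(X)                                         # fitted mean vector
S  <- cov(X)                                              # fitted covariance matrix
a <- 0.5; b <- -0.2                                       # the "fixed" values of Var_1 and Var_2
S31 <- S[3, 1:2]; S11 <- S[1:2, 1:2]
cond_mean <- mu[3] + S31 %*% solve(S11, c(a, b) - mu[1:2])   # E[Var_3 | Var_1 = a, Var_2 = b]
cond_var  <- S[3, 3] - S31 %*% solve(S11, S31)               # Var[Var_3 | Var_1 = a, Var_2 = b]
draws <- rnorm(5000, cond_mean, sqrt(cond_var))              # sample / histogram, as described above
hist(draws)

This is a generative model in the sense that the fitted joint distribution can simulate new (Var_1, Var_2, Var_3) triples; if only the conditional mean is needed, it reduces to linear regression, which matches the note below.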

Note: I know that I could just fit a regular regression model to this problem, but I am trying to better understand how probability distribution functions work.

Thanks

👍 3
👤 u/blueest
📅 Sep 23 2021
Median Absolute Deviation for the Laplace distribution

In Differential Privacy, the required noise addition is often achieved by sampling values from the Laplace distribution (ie, the 'Laplace mechanism').

This means we usually think about the average relative error of a counting query as: [var(Lap(scale))] / [number of records]. One thing to note is that if you divide both the scale and the number of records by the same amount (ie, applying a record partitioning or subsampling trick to reduce sensitivity), you shouldn't see any improvement in that error metric: [var(Lap(scale/k))] / ([number of records]/k) = [var(Lap(scale))] / [number of records].

However, I've been wondering whether variance (or even mean absolute difference) is actually the correct way to think about the impact of noise addition. The Laplace distribution is a long tail distribution and, for small scale parameters, it's sharply skewed towards the origin. Ie, lots of small noise values, a few very large noise values. This is interesting because these noisy query results feed into other algorithms that build models, post-process, etc, to produce a final privatized analytic or data product... and this post processing may be more or less tolerant of different distributions of added noise. For example, if most noise values are very small, and only a few randomly sampled values are very large, it can be possible to use publicly known properties of the data space to do smoothing and reduce the impact of the large noise values. While that same trick might not be successful with a less sharply skewed distribution of noise values, even if the average noise value stayed the same.

So hopefully that's enough interesting motivation to justify me posting a fairly mundane question to r/math: Does anyone know the equation for the median absolute deviation of the Laplace distribution?
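For what it's worth, here is the closed form plus a quick numerical check (an addition, not from the thread): if X ~ Laplace(mu, b), then |X - mu| is Exponential with rate 1/b, so the median absolute deviation is b * ln(2); it grows linearly in the scale, unlike the variance 2b^2.

b <- 2
b * log(2)                                               # closed-form MAD, ~1.386 for b = 2
sim <- rexp(1e6, rate = 1 / b) * sample(c(-1, 1), 1e6, replace = TRUE)   # Laplace(0, b) draws
median(abs(sim))                                         # empirical check, close to b * log(2)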

👍 3
👤 u/EmmyNoetherRing
📅 Oct 15 2021
ELI5: What is a multivariate normal distribution and what is it used for?
👍 7
👤 u/MrPeanutBeL
📅 Aug 11 2021
Fix a heavy-tail distribution for multivariate regression

I'm trying to expand my stats skills by predicting the basket share for a customer from a few predictor variables, and I want to do it the right way. I definitely do not want to grab the data, jam it through the algorithm without consideration (I'm using Excel), and belch the results.

The main problem I see at this point is that my data is (edit: updated to be "the residuals are" as I was imprecise in the original post) not normally distributed. When I look at the normal probability plot, it doesn't follow the 45-degree line. The distribution most closely resembles the heavy-tailedness example from this site.

Ok, assuming that this is a heavy-tail distribution, I don't know how to fix it. Googling how to fix a heavy tail distribution has led to a bunch of snarky, unhelpful "answers" on StackOverflow. I think I need to transform the variables somehow so that they are normally distributed. I did see a suggestion for a log transform.

This sorta intuitively makes sense. One variable I'm using is total sales, which has a few really large customers that may be considered outliers. Using a log transform dampens the effect of these big guys.

Aside from that, what else can I do? How do I determine if each predictor variable is normally distributed or not?
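A minimal sketch of the workflow (added here, with a hypothetical data frame standing in for the real sales data): strict normality is only needed for the residuals, but the same plots answer the per-predictor question too, and a log transform is the usual first move for a variable like total sales.

set.seed(1)
df <- data.frame(total_sales = rlnorm(200, meanlog = 10, sdlog = 1))      # right-skewed, like a few huge customers
df$basket_share <- 0.1 * log(df$total_sales) + rnorm(200, sd = 0.05)      # made-up response
fit <- lm(basket_share ~ log(total_sales), data = df)     # log transform dampens the big customers
qqnorm(resid(fit)); qqline(resid(fit))                    # residuals should hug the 45-degree line
shapiro.test(resid(fit))                                  # formal (if sensitive) normality test on residuals
hist(log(df$total_sales))                                 # per-predictor check, as asked above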

Thanks for any advice you can give.

👍 8
👤 u/babbocom
📅 Jun 14 2021
Is there a commonly-used multivariate distribution for a vector of binary random variables?

I'm in a Bayesian modeling situation that I imagine is quite common, but I can't seem to find a mainstream distribution for this circumstance. Apologies if this is what they call a newbie question.

I want to set a prior on a random variable C which is a vector of binary-valued variables. I have a good understanding of Cov(C_i, C_j) as well as Var(C_i). I feel like there should be a pretty standard way of encoding this into a prior but the distributions I've found such as Multivariate Bernoulli seem pretty scarcely used, at least in the sense that it isn't implemented in any of the common MCMC packages.

My instinct is just to implement it myself, but I feel like by virtue of it being uncommon there is likely a better all around choice. Could be a cognitive bias tho

What would you do in this circumstance? Is there a common choice?
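One common workaround (a sketch added here, not a claim about any particular package): threshold a latent multivariate normal, i.e. a multivariate-probit / Gaussian-copula construction, which most MCMC frameworks can express with standard MVN and comparison building blocks. Note the latent correlation maps monotonically, but not one-to-one, onto Cov(C_i, C_j), so it has to be calibrated to the covariances you know.

library(MASS)
p <- c(0.3, 0.6)                          # target marginals P(C_i = 1)
R <- matrix(c(1, 0.5, 0.5, 1), 2)         # latent correlation (a tuning choice)
Z <- mvrnorm(1e5, mu = c(0, 0), Sigma = R)
C <- 1 * sweep(Z, 2, qnorm(p), "<")       # C_i = 1 exactly when Z_i < qnorm(p_i)
colMeans(C)                               # ~ (0.3, 0.6): marginals are preserved
cor(C)                                    # positively correlated binaries induced by the latent normal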

👍 3
👤 u/shoegraze
📅 Aug 18 2021
[D] Confusion Surrounding the Multivariate Normal Distribution

I am trying to teach myself about the multivariate normal distribution and I am struggling to understand some basic things about it.

To show my confusion, I use the famous Iris Flower dataset (I will use the R programming language for some basic scripts). The Iris Flower dataset has 5 columns and 150 rows. Each row contains the measurements for an individual flower (i.e. there are 150 flowers). The columns contain the measurements of the "Petal Width", the "Petal Length", the "Sepal Length" , the "Sepal Width" and the "Type of Flower" (three types of flowers, categorical variable).

Suppose I just take the Petal Length variable. I want to see if the Petal Length follows a (univariate) normal distribution. I think this can be easily done using different strategies (R code below):

#load the iris data and isolate the petal length
data(iris)
var1 = iris$Petal.Length

#visually check if the distribution of the petal length looks like a "bell curve"
plot(density(var1))

#look at the quantile-quantile plot
qqnorm(var1)

#use statistics (e.g. the shapiro-wilks test) to check for normality
shapiro.test(var1)

#if the data is normally distributed, we can find out the mean and the variance
mean(var1)
var(var1)

Similarly, I can repeat this for the remaining variables in the iris data. However, this task becomes a lot more complicated when you consider the multivariate distribution of the iris data : https://en.wikipedia.org/wiki/Multivariate_normal_distribution . When dealing with the multivariate distribution, there is now a "vector of means" and a "variance-covariance matrix". This means that there are more complex relationships within the data - some parts of the data might have a normal distribution whereas some parts of the data might not be normally distributed.

After spending some time researching how to determine if a dataset follows a multivariate normal distribution, I found out about something called the Mardia test, which apparently uses the "skewness" and the "kurtosis" to determine if the data is normally distributed (high skewness and high kurtosis mean the data is not normally distributed). I tried running the following code in R to perform the Mardia test on the iris data:

library(MVN)
data(iris)
data = iris[,-5]
result = mvn(data)
result

The results of this are confusing. I am not sure

... keep reading on reddit ➡
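As a base-R companion to the mvn() call above (an added sketch): under multivariate normality the squared Mahalanobis distances of the rows behave like a chi-square with degrees of freedom equal to the number of variables, so a QQ plot against chi-square quantiles is a quick visual version of what Mardia's skewness/kurtosis statistics formalise.

d2 <- mahalanobis(data, colMeans(data), cov(data))        # squared distance of each flower from the centre
qqplot(qchisq(ppoints(nrow(data)), df = ncol(data)), d2,
       xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distance")
abline(0, 1)                                              # points near this line support multivariate normality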

👍 67
👤 u/ottawalanguages
📅 Jan 23 2021
[REDDIT DOES NOT RECOMMEND THIS!] (So of COURSE you should see it!) To use LaPlace Distribution and Normal Distribution to trade bitcoin and other volatile assets like cardano and other cryptocurrencies, then Watch This educational Video! youtube.com/watch?v=3zRcT…
👍 2
👤 u/janediscovers
📅 Aug 31 2021
If the paper doesn't mention anything about the assumption of the type of distribution for different covariates, is it safe to assume they're assuming normal distribution? And in what cases should one assume, for example, Laplace instead of normal distribution?

I was just looking at a machine learning package for Python/R called 'Prophet' by Facebook which is making a bit of noise in the machine learning/data science world due to its simplicity, especially in Python. Here's a summary:

  • Time series algorithm, automatically standardises time and predicted variable

  • Automatically de-trending on three components -- long term (over the entire data?), monthly/weekly/days of the week, and holidays (comes with holidays data for different regions)

  • By default, it fits 25 linear or logistic models over the first 80% time of the data

  • By default, assumes Laplace distribution for the covariates

Prior to reading this, it always bothered me when papers don't mention what type of distribution each of the covariates follows, unless of course it is shown graphically for each one (so something like a Poisson distribution is easier to spot). Now that I'm reading the above-linked article, it took me by surprise that the package assumes a Laplace distribution for all the covariates by default.

So then my question is two-fold. Is it ok to assume normal distribution if the paper doesn't mention anything for each of the covariates (some papers mention them for few to many covariates, and I think some of them may be obvious so no need to mention, though maybe not obvious to all)? On top of this, why would one choose to assume Laplace instead of normal distribution? What would be the advantages/disadvantages of such decision, and what would be the result on the estimates/errors/bias?

👍 8
👤 u/jinnyjuice
📅 May 15 2021
Multivariate Calculus teacher (calc III) gives us problems that involve the Laplace operator. For my own benefit, what can you tell me about this operator and how does it relate to these problems?
  1. u(x,y) = 3x^(2)y - y^(3). Show that u_xx + u_yy = 0 (i.e. ∆u = 0).

  2. u(x,y) = ln(x^(2) + y^(2)). Show that ∆u = 0.

  3. Derive the Laplace operator for polar coordinates (in other words, show that the expression below holds for u(x,y) when [;x = r\cos(\theta), y = r\sin(\theta);]).
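As a worked check of problem 1 (added here, not part of the original post), the Laplace operator in two variables is just the sum of the unmixed second partials, \Delta u = u_{xx} + u_{yy}:

u = 3x^2 y - y^3,\qquad u_x = 6xy,\ \ u_{xx} = 6y,\qquad u_y = 3x^2 - 3y^2,\ \ u_{yy} = -6y,\qquad \Delta u = 6y - 6y = 0.

Functions with \Delta u = 0 are called harmonic; they are the steady-state solutions of the heat equation and (in two variables) the real and imaginary parts of complex-analytic functions, which is why this operator keeps showing up.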

I am not planning on using this operator or any of these techniques on the test or quizzes, I am just wondering what these problems say about this operator and any interesting properties that one should know about it.

👍 2
👤 u/actualmfa
📅 Oct 07 2013
Somewhere between cdf and pdf for multivariate normal distribution

Hi there!

Let's say we have, for instance, a 3-dimensional normal distribution:

(X1, X2, X3) ~ N(mean, cov)

The pdf is:

f(x1, x2, x3), i.e. the density at the point (x1, x2, x3) (loosely written P(X1 = x1, X2 = x2, X3 = x3), which is strictly zero for a continuous distribution)

and the cdf is:

F(x1, x2, x3) = P(X1 <= x1, X2 <= x2, X3 <= x3)

Functions to calculate the cdf and pdf of a multivariate normal distribution are readily available in most programming languages.

What I need however, is something more flexible. I need a function which I will call Flex, that can take both single possible values of a variable (like a pdf does) and a range of possible values (like a cdf does). Here is an example:

Flex(0.5 <=> 0.9, 0.2, 0.3 <=> 0.7) = P(0.5 <= X1 < 0.9, X2 = 0.2, 0.3 <= X3 <= 0.7)

In this example, I want X1 and X3 to lie in certain intervals of possible values, and X2 to have exactly the value 0.2.

Now, I realize that the resulting value is not that interpretable, but that does not matter. I have a gaussian mixture model, and I need to figure out which cluster an observation is most likely to belong in, when what we know about the observation is something like:

x1 is somewhere between 0.2 and 0.7, x2 is exactly 0.3 and so on.

If anyone knows whether something like this exists, or whether it is possible to implement, that would be very helpful!
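One way to compute such a "Flex" value (a sketch added here, with made-up mean and covariance, using the mvtnorm package): take the normal density at the exactly-fixed components, and multiply by the rectangle probability of the conditional normal of the remaining components given those fixed values. For a Gaussian mixture, do this per component and weight by the mixture proportions.

library(mvtnorm)
mu <- c(0.5, 0.3, 0.4)                    # hypothetical mean of one mixture component
S  <- matrix(c(0.04, 0.01, 0.00,
               0.01, 0.03, 0.01,
               0.00, 0.01, 0.05), 3)      # hypothetical covariance
fixed <- 2; free <- c(1, 3); x2 <- 0.2    # X2 fixed at 0.2; X1 and X3 in intervals
cmu <- mu[free] + S[free, fixed] / S[fixed, fixed] * (x2 - mu[fixed])             # conditional mean
cS  <- S[free, free] - S[free, fixed] %*% t(S[free, fixed]) / S[fixed, fixed]     # conditional covariance
flex <- dnorm(x2, mu[fixed], sqrt(S[fixed, fixed])) *
        pmvnorm(lower = c(0.5, 0.3), upper = c(0.9, 0.7), mean = cmu, sigma = cS) # Flex(0.5<=>0.9, 0.2, 0.3<=>0.7)
flex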

👍 3
📅 Mar 04 2021
Partial derivative of multivariate gaussian cumulative distribution

Hi, I want to take the partial derivative of this multivariate Gaussian cumulative distribution function with respect to beta_1 (which is a single element of the beta vector). X_1 is an n x z matrix, X_2 is a p x z matrix, beta is a z x 1 vector, H is a p x n matrix, F is a p x 1 vector and T is a symmetric, positive-definite p x p matrix. In the univariate case the solution is straightforward with the chain rule, but I'm struggling a bit with the generalized chain rule in this case.

https://preview.redd.it/48vd98m2ong61.png?width=296&format=png&auto=webp&s=d2ab5a60151339408c958f2c16adfa9eec4f661b
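For reference (an added sketch of the general identity, since the expression in the image can't be reproduced here, and assuming the CDF's limits depend linearly on beta as a term like X_2 beta suggests): differentiating a multivariate normal CDF with respect to one of its upper limits peels off a univariate density times a conditional CDF,

\frac{\partial}{\partial u_k}\,\Phi_p(u;\mu,\Sigma) = \phi(u_k;\mu_k,\Sigma_{kk})\;\Phi_{p-1}\!\big(u_{-k};\ \mu_{-k}+\Sigma_{-k,k}\Sigma_{kk}^{-1}(u_k-\mu_k),\ \Sigma_{-k,-k}-\Sigma_{-k,k}\Sigma_{kk}^{-1}\Sigma_{k,-k}\big),

and the generalized chain rule then sums over every limit that depends on beta_1: \partial F/\partial\beta_1 = \sum_k (\partial\Phi_p/\partial u_k)\,(\partial u_k/\partial\beta_1).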

👍 21
👤 u/Margaux408
📅 Feb 10 2021
Gaussian Distribution, Gaussian Multivariate, Gaussian Surface, Gauss Elimination and so on...
👍 851
👤 u/ExperiencedSoup
📅 Jun 30 2020
Multivariate normal distribution

Can someone please explain to me how this works?

Suppose i have 3 variables: earnings, savings, debt

I have these 3 variables recorded for 1000 people.

  1. How do you check if this data is normally distributed? For a single variable, I could use the kolmogorov-smirnov test. But how would you check if these 3 variables are jointly normally distributed?

  2. Assuming that this data is normally distributed, how do you calculate the joint multivariate normal distribution of this data? For a single variable, assuming a normal distribution, i could take all the observations:

Mu = sum(x_i) / n, summing over i = 1, ..., n

Sigma = sqrt( sum((x_i - mu)^2) / n ), summing over i = 1, ..., n

But if there are 3 variables:

Mu vector: (mu1, mu2, mu3)

Sigma-covariance matrix: (sig11, sig12, sig13, sig21, sig22, sig23, sig31, sig32, sig33)

Is this how you would define the multivariate distribution for this example?

  3. Does the concept of a z-score still apply here? For a given point (x1, x2, x3), could you define a z-vector (z1, z2, z3), take its norm z = sqrt(z1^2 + z2^2 + z3^2), and use that to measure how far the point is from the center of the multivariate distribution, and thus how likely it is to be an outlier? (See the sketch below.)
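A minimal sketch of points 2 and 3 (added here, with simulated stand-in data since the real earnings/savings/debt records aren't available): the fitted multivariate normal is just the sample mean vector and covariance matrix, and the multivariate analogue of the squared z-score is the Mahalanobis distance, which, unlike the plain norm of the z-vector, accounts for the correlations between the variables.

set.seed(1)
X <- cbind(earnings = rnorm(1000, 50, 10),
           savings  = rnorm(1000, 20, 5),
           debt     = rnorm(1000, 15, 8))   # hypothetical records for 1000 people
mu_hat    <- colMeans(X)                    # (mu1, mu2, mu3)
Sigma_hat <- cov(X)                         # 3 x 3 variance-covariance matrix
d2 <- mahalanobis(X, mu_hat, Sigma_hat)     # squared "multivariate z-score" of each person
which(d2 > qchisq(0.999, df = 3))           # flag potential outliers against the chi-square reference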

Thanks

👍 9
👤 u/blueest
📅 Jan 14 2021
in Euclidean space defined by multivariate normal distribution, fraction of points inside n-ball tangent to point p

What fraction of all points in a Euclidean space lie within (rather than outside of) the n-ball whose center is the origin and which is tangent to the point p represented by Cartesian coordinates:

vector(θ) = (θ^1, θ^2, θ^3, θ^4, θ^5, ..., θ^n)

representing sigmas in the multivariate normal distribution in n dimensions as illustrated at the top of the link?
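A sketch of the standard answer (added here, under the assumption of a standard multivariate normal, i.e. identity covariance, so the coordinates really are in units of sigma): the squared distance of a draw from the origin is chi-square with n degrees of freedom, so the probability mass inside the ball of radius ||θ|| is pchisq(||θ||^2, n).

theta <- c(1, 2, 0.5, 1.5)                # hypothetical point p, in sigma units
pchisq(sum(theta^2), df = length(theta))  # fraction of the distribution's mass inside the tangent n-ball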

👍 3
👤 u/aputnamist
📅 Feb 14 2021
[ASAP] Effect of Linker Distribution in the Photocatalytic Activity of Multivariate Mesoporous Crystals

Journal of the American Chemical Society, DOI: 10.1021/jacs.0c09015

Belén Lerma-Berlanga, Carolina R. Ganivet, Neyvis Almora-Barrios, Sergio Tatay, Yong Peng, Josep Albero, Oscar Fabelo, Javier González-Platas, Hermenegildo García, Natalia M. Padial, and Carlos Martí-Gastaldo

https://ift.tt/2Lhfb9T

👍 2
👤 u/TomisMeMyselfandI
📅 Jan 12 2021
In Variational Autoencoders, does the generative model generate samples from latent variables which are sampled from a multivariate distribution? If yes, is this similar in the case of GANs?
👍 7
👤 u/HTKasd
📅 May 10 2020
Understanding Maths and Intuition for Multivariate Gaussian Distribution | Machine Learning Fundamentals youtu.be/6W2mkOfzitk
👍 27
👤 u/prakhar21
📅 Aug 10 2020
[Q] Why does simple Bayesian regression (i.e. y = Bx + c) not use a multivariate distribution prior for intercept and slope? Why are intercepts and slopes only treated as correlated in hierarchical/pooled/random models when modelling within-group variation?

In hierarchical / pooled models there is the intuition that slope and intercept are often correlated within each sub-group.
So why does "simple" regression, i.e one intercept and one slope (and no random intercepts) treat the intercept and slope as uncorrelated?

In other words, why do Bayesian approaches model regression as (for example):

y ~ normal(mean, sigma)
mean = intercept + slope* x
intercept ~ normal(0,10)
slope ~ normal(0,10)
sigma ~ exp(1)

instead of :

y ~ normal(mean, sigma)
mean = intercept + slope* x
[intercept, slope] ~ multivariatenormal(.....)
sigma ~ exp(1)

Thanks

👍 25
👤 u/gmgmgmgmgmgm
📅 Nov 25 2019
[Probabilities] Finding a conditional distribution in multivariate case

Hi! So I know this question is pretty obvious to some, but suppose we know the distribution of
P(a = i, b = j, c = k) for i, j, k \in {0, 1}. Furthermore suppose we know the marginal distribution of P_a that happens to be strictly positive. If we want to calculate the conditional distribution b and c given a, so P_{b, c | a}, we can just simply divide each value in our original joint distribution with the corresponding marginal distribution value of a, right?

👍 2
👤 u/wabhabin
📅 Sep 26 2020
Interpolation between multivariate Gaussian distributions

Hey all, is there a way to statistically interpolate between different multivariate Gaussian distributions? I think linear interpolation might work for the mean vectors, but I'm not sure about the covariance matrices. At the most basic level, given two distributions and two "weights" adding up to 1, I would like to find the "weighted mixture" of the two distributions. Can you point me to relevant research areas or papers? Thank you.
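One easy scheme (a sketch added here; the 2-Wasserstein / Bures geodesic between Gaussians is the more principled notion of interpolation, and "optimal transport" is the research area to search): interpolate the means linearly and the covariances on the matrix-logarithm scale, which keeps every intermediate matrix symmetric positive definite.

library(expm)                                       # for matrix logm() / expm()
interp_gauss <- function(mu0, S0, mu1, S1, w) {     # w in [0, 1]: weight on the second distribution
  mu <- (1 - w) * mu0 + w * mu1                     # linear interpolation of means
  S  <- expm((1 - w) * logm(S0) + w * logm(S1))     # log-Euclidean interpolation of covariances
  list(mean = mu, cov = S)
}
interp_gauss(c(0, 0), diag(2), c(1, 2), matrix(c(2, .5, .5, 1), 2), 0.3)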

👍 3
👤 u/cheeky_bastard__
📅 Feb 20 2020
Separate mixture of multivariate normal distributions

1. How do you separate a mixture of two or more multivariate normal distributions? (See the sketch below.)

2. If a multivariate sample, which is a mixture of many distributions, also has some categorical columns, how do you separate it in that case?
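A minimal sketch for point 1 (added here, with simulated two-component data, using the mclust package): fit a Gaussian mixture by EM and use the resulting classifications to split the sample. For point 2, a Gaussian mixture does not model categorical columns directly; common routes are latent-class / mixed-type mixture models, or fitting a separate Gaussian mixture within each category.

library(mclust)
library(MASS)
set.seed(1)
X <- rbind(mvrnorm(200, c(0, 0), diag(2)),
           mvrnorm(200, c(4, 4), matrix(c(1, .6, .6, 1), 2)))   # two overlapping components
fit <- Mclust(X, G = 2)                                # 2-component multivariate normal mixture via EM
table(fit$classification)                              # how many points each component claimed
parts <- split(as.data.frame(X), fit$classification)   # the separated sub-samples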

👍 5
👤 u/eyeswideshhh
📅 Apr 25 2020
Addendum: Multivariate Distribution Estimate of Prause et al
👍 5
👤 u/Attacksquad2
📅 Mar 15 2020
Example of Normal (Laplace-Gauss) Distribution in the gym
👍 367
👤 u/totalinfonet
📅 Jul 23 2019
Understanding Multivariate Gaussian Distribution | Machine Learning Fundamentals youtu.be/6W2mkOfzitk
👍 6
👤 u/prakhar21
📅 Aug 09 2020
Understanding Maths and Intuition behind Multivariate Gaussian Distribution | Machine Learning Fundamentals youtu.be/6W2mkOfzitk
👍 2
👤 u/prakhar21
📅 Aug 10 2020
Use inverse matrix gamma distribution as prior for covariance matrix of multivariate normal (in Python)

Hi, I'm trying to reimplement the Bayesian model from this paper. They mention in the Supplemental Information that they assume a multivariate prior on the weights -- I know how to deal with the mean vector, but they say that "The covariance matrix is defined by an Inverse-Gamma distribution with the two hyperparameters (a, b). The simulation sets the initial values of the two hyperparameters as (a0 = 1, b0 = 5)." I'm trying to do this in PyMC3, and I don't see how to define the covariance matrix with this distribution (is the inverse-wishart really what I want?)? I would also give PyStan a shot if someone knew how to do this there. This is my first foray into Bayesian modeling, so any help would be hugely appreciated.

👍 35
👤 u/squirreltalk
📅 Jun 09 2019
[University Grade Statistics: Multivariate Statistics] I need to derive distributions in multivariate statistics

Don't get me wrong guys, I made some progress, but I need to get full marks.

https://preview.redd.it/caea33ipy0g51.png?width=677&format=png&auto=webp&s=e50a499696bacb5d2e7abca102f712b8b65415a7

https://preview.redd.it/nolfp0cqy0g51.png?width=654&format=png&auto=webp&s=6aa759469670a039e5f21d5bba2ba3e8f28cd00e

https://preview.redd.it/ulx0q69ry0g51.png?width=592&format=png&auto=webp&s=06540ab6cf5336f9b5f937a89da8f8fa9d50b48f

👍 2
👤 u/yu2tu
📅 Aug 09 2020
[OC] Visualization of 1D slices of 2D Multivariate Normal Distribution v.redd.it/833o2cu1pva21
👍 62
👤 u/EricJEarley
📅 Jan 17 2019
[OC] Visualization of 1D slices of 2D Multivariate Normal Distribution (v2) v.redd.it/0dgo57eue0b21
👍 42
👤 u/EricJEarley
📅 Jan 17 2019
Multivariate Normal Distribution

If we have X_1, X_2, ..., X_n, where all of them are univariate normally distributed, and we also assume that they are independent, is the vector X = (X_1, ..., X_n) multivariate normally distributed?
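Yes: under independence the joint density is the product of the univariate normal densities, which is exactly a multivariate normal density with diagonal covariance (independence is what makes this safe; marginally normal but dependent variables need not be jointly normal). A quick numerical check (an added sketch using the mvtnorm package):

library(mvtnorm)
mu <- c(1, -2, 0); s <- c(1, 2, 0.5); x <- c(0.3, -1.5, 0.2)
dmvnorm(x, mean = mu, sigma = diag(s^2))  # joint density with diagonal covariance
prod(dnorm(x, mean = mu, sd = s))         # product of the univariate densities: same value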

👍 2
👤 u/Constant_Pitch801
📅 Nov 23 2020
