[D] What's the difference between a posterior distribution, predictive distribution and a posterior predictive distribution?

I'm trying to understand the difference between these three terms, and every time I read an answer online I'm back to square one.

👍 16 · 👤 u/pleasedontsayhey · 📅 Dec 12 2021
Trying to convert a model into a distribution? Struggling to form a likelihood and, consequently, a Bayesian posterior distribution reddit.com/gallery/rdyr3y
👍 2 · 👤 u/Local-Mess5683 · 📅 Dec 11 2021
How to calculate Bayesian posterior for Gaussian distribution

Hi everyone,

I wonder if you are aware of Python or R code that could help me understand and implement Bayesian posterior updating of a Gaussian distribution.

Thanks
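
A minimal base-R sketch of what I think is being asked for, assuming the simplest setting (known observation variance, normal prior on the mean); all numbers below are made up:

    # Conjugate normal-normal update of a mean, with known observation variance.
    update_gaussian <- function(prior_mean, prior_var, x, obs_var) {
      n <- length(x)
      post_var  <- 1 / (1 / prior_var + n / obs_var)                      # combine precisions
      post_mean <- post_var * (prior_mean / prior_var + sum(x) / obs_var) # precision-weighted mean
      c(mean = post_mean, var = post_var)
    }

    set.seed(1)
    x <- rnorm(20, mean = 2, sd = 1)   # simulated observations
    update_gaussian(prior_mean = 0, prior_var = 4, x = x, obs_var = 1)

The same function can be applied repeatedly as new batches of data arrive, feeding each posterior back in as the next prior.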

👍 3 · 👤 u/Kamran_A · 📅 Aug 23 2021
[Q] How to extract posterior distribution of two-sided hypothesis computed using hypothesis() from brms?

I want to extract the posterior distribution of a two-sided hypothesis that I computed using the hypothesis() function from brms, on a brm() model.

Any help appreciated!

Thanks in advance!
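
A hedged sketch of one way to get at the underlying draws, assuming a recent brms version; the model, data, and coefficient names below are invented for illustration:

    library(brms)

    d   <- data.frame(y = rnorm(100), x = rnorm(100))
    fit <- brm(y ~ x, data = d, chains = 2, iter = 1000, refresh = 0)

    draws <- as_draws_df(fit)                   # one row per posterior draw
    quantile(draws$b_x, c(0.025, 0.5, 0.975))   # posterior of the tested quantity

    # hypothesis() also keeps the draws it used; in the versions I have worked
    # with, they sit in the $samples element of the returned object:
    hyp <- hypothesis(fit, "x > 0")
    head(hyp$samples)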

👍 4 · 👤 u/city-of-the-rain · 📅 Jul 21 2021
[Q] Updating Time Series Posterior distribution

I have a distribution of probability densities by hour and minute of the day. I want to be able to update the distribution based on new information. The question I am ultimately trying to answer is: if the estimated wait time at 5:00 PM is 30 min, what is the new posterior distribution after 5 people had a wait time of 38 min?

If the density for exactly 5:00 PM is updated, I would expect the densities for times close to 5 PM to be updated as well, with the size of the update decaying as the distance from 5 PM increases.

The end goal would be to estimate the posterior distribution for each mapped location. Is a Bayesian approach the right way to tackle this problem? If so, could you point me to some resources on how to update the posterior using feedback?
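
One ad hoc sketch of that kind of locally-decaying update, not a standard named method: treat the mean wait at each minute as having its own normal prior, and down-weight the new observations by a Gaussian kernel in time so the update fades away from 5 PM. All numbers (prior mean and variance, observation variance, kernel width) are made up:

    minutes    <- 0:1439                          # minute of day
    prior_mean <- rep(30, length(minutes))        # prior mean wait (min)
    prior_var  <- rep(25, length(minutes))        # prior variance of the mean
    obs_var    <- 16                              # assumed observation variance

    obs        <- rep(38, 5)                      # five observed waits of 38 min
    obs_minute <- 17 * 60                         # observed at 5:00 PM

    w         <- exp(-(minutes - obs_minute)^2 / (2 * 30^2))  # 30-min kernel width
    eff_n     <- w * length(obs)                               # kernel-weighted sample size
    post_var  <- 1 / (1 / prior_var + eff_n / obs_var)
    post_mean <- post_var * (prior_mean / prior_var + eff_n * mean(obs) / obs_var)

A Gaussian process over time of day would be the more principled version of the same idea.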

👍 4 · 👤 u/da_chosen1 · 📅 Apr 22 2021
Computing a posterior distribution of a Bernoulli likelihood

Dear learnmath people,

My task is to compute the posterior distribution of a Bernoulli likelihood. I chose a conjugate prior. You can see my solution here. The exercise in question is exercise 6.3 on page 222 of the Mathematics for Machine Learning book (freely available).

Have I done this correctly?
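
For reference, the standard conjugate derivation this kind of exercise usually checks (generic $\mathrm{Beta}(a,b)$ prior; whether it matches the book's exact notation I can't confirm): with $N$ Bernoulli observations containing $s = \sum_n x_n$ successes,

$$p(\theta \mid x_{1:N}) \;\propto\; \theta^{s}(1-\theta)^{N-s}\,\theta^{a-1}(1-\theta)^{b-1} \;=\; \theta^{a+s-1}(1-\theta)^{b+N-s-1},$$

which is the kernel of a $\mathrm{Beta}(a+s,\, b+N-s)$ density, so the posterior is again a Beta distribution.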

👍 2 · 📅 Jun 04 2021
How is the posterior predictive distribution derived?

Here's what I think:

Suppose we have a posterior distribution over a range of values for p. Now, to form the posterior predictive distribution, we take the values of p and, for every one of them, run simulations and collect observations (sampling distributions). The average of these sampling distributions, weighted by the posterior, is the posterior predictive distribution. Like in this picture: https://ibb.co/9VkTKLk

Have I got this right?

I'm reading a book called Statistical Rethinking and the author also uses R code to teach the material. To calculate the predictions, he says: https://ibb.co/FxSWkmt

I'm not sure I understand exactly what the code w <- rbinom( 1e4 , size=9 , prob=samples ) does. I guess this is tied to my first question.
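
A minimal sketch of what that line does, assuming the globe-tossing setup from the book (6 waters in 9 tosses, grid-approximate posterior); the numbers are the book's example, not necessarily the ones in the linked image:

    p_grid     <- seq(0, 1, length.out = 1000)          # candidate values of p
    prior      <- rep(1, 1000)                           # flat prior
    likelihood <- dbinom(6, size = 9, prob = p_grid)
    posterior  <- likelihood * prior
    posterior  <- posterior / sum(posterior)

    samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

    # Each of the 1e4 simulated counts uses a different p drawn from the posterior,
    # so parameter uncertainty and sampling variation are mixed together:
    w <- rbinom(1e4, size = 9, prob = samples)
    table(w) / 1e4    # approximate posterior predictive distribution of the count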

👍 3 · 👤 u/MJORH · 📅 Mar 14 2021
For a posterior distribution of a probability with a credible interval ranging from almost 0 to almost 1, isn't it incorrect to say we know nothing about the probability of an event?

Let's assume the posterior peaks at, say, 0.6, with a credible interval ranging from 0.10 to 0.95. My conclusion given this posterior is that the event is more likely than not, but that there is considerable uncertainty, such that I am not confident I could accurately predict the outcome of the event. The long-run probability, however, would suggest that the event is more likely than not. By your estimation, is this a correct interpretation?

👍 3 · 👤 u/Frogmarsh · 📅 Dec 14 2020
[Q] Can I shortcut model-fitting across a posterior distribution of linear model parameters to get the mean and SD of expected outcomes to save time?

I am trying to fit a TONNE of linear models. My dataset consists of 813 species, and for each species I have a posterior distribution of linear model parameters from some binomial GLMs with 25 variables. These posteriors are defined by a vector of 26 means (intercept + 25 predictors) and a 26x26 covariance matrix.

I then have a bunch of matrices of new predictor values (scenarios of habitat reconfiguration for a set of locations). These are ~57000x25 matrices (~57000 locations, 25 variables). Applying a model to one of these gives me ~57000 predictions, which I then sum to obtain a single number for that model (the expected number of sites suitable for a species). What I am interested in is the mean and SD (allowing me to calculate confidence intervals) of this value across the posterior of parameter estimates.

I can sample sets of parameter estimates from this distribution (say 1000 samples), generate predictions for each of those 1000 samples, and then calculate the mean and SD of the outcomes. This mean and SD is what I am ultimately interested in, since it lets me calculate confidence intervals for the mean model estimate. That is 1000 linear model calculations.

The problem is that for each model (i.e. set of 1000 sampled parameters) I have 861 sets of predictor values to obtain predictions from, so one model will involve 1000 * 861 = 861000 calculations. I THEN have 813 of these sets of parameters, so in total 699993000 model evaluations. That is a TONNE of calculations, and given my ability with R and the available hardware it is prohibitively time-consuming to run them all like this.

What I am interested in is this: is there a way to shortcut the 1000 models step of this process so that I can obtain the expected mean and SD of model outcomes from my posterior of parameter estimates?

I am not sure I have explained this very well, so here is an example of something I have tried.

Full model-fitting approach. 1) I sample 1000 sets of parameters from the posterior of parameter estimates. 2) I fit my model using these 1000 sets of parameters, and get 1000 predictions. 3) I take the mean and standard deviation of these 1000 outputs, and I get mean = 21184.36, SD = 1512.7882 (therefore mean + SD = 22697.15 and mean - SD = 19671.58)

My attempt at a shortcut approach (note - this is CLEARLY wrong!) 1) Take the 1000 sets of parameters from the full model-fitting approach. 2) Calculate the mean of each parameter and the SD of each parameter. 3) Calcu

... keep reading on reddit ➡
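
Not the analytic shortcut being asked about, but a hedged sketch of how the 1000-draw step can usually be collapsed into one matrix multiplication, which removes most of the R-level looping cost; sizes and names below are made up and smaller than the real problem:

    set.seed(1)
    n_loc <- 5000; n_par <- 26; n_draw <- 1000
    mu    <- rnorm(n_par); Sigma <- diag(0.1, n_par)              # stand-in posterior
    B     <- MASS::mvrnorm(n_draw, mu, Sigma)                      # n_draw x n_par parameter draws
    X     <- cbind(1, matrix(rnorm(n_loc * (n_par - 1)), n_loc))   # intercept + covariates

    eta    <- X %*% t(B)              # n_loc x n_draw linear predictors, one column per draw
    totals <- colSums(plogis(eta))    # expected number of suitable sites, per draw
    c(mean = mean(totals), sd = sd(totals))

Whether a purely analytic shortcut exists depends on the link function: a sum of inverse-logit terms is not linear in the parameters, so in general the mean and SD of the outcome cannot be read directly off the parameter means and covariance.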

👍 14 · 👤 u/Apes_Ma · 📅 Nov 16 2020
[Q] Bayesian inference overfitting posterior distribution

I am trying to implement Bayesian inference to update the normal distribution of a process month over month as more data become available. I am running into an issue where relatively few new observations completely overpower the prior (example shown below). I would think that having vastly more prior observations would make the distribution slower to move/converge.

I assume I am missing some key concept here, but if not, is there some better method and/or tuning parameter to prevent massive shifts in the distribution from just a few observations?

Example:

                           Prior   Likelihood   Posterior
    Number of observations   601            3         604
    μ                       0.98            5        4.97
    σ                      17.40         1.73        1.41
    τ                      .0033          .17         .50

The posterior is generated using the following Bayesian update equations, found here: Normal Distribution Known Variance.

👍 5 · 👤 u/Thatsunbelizeable · 📅 Nov 24 2020
Why is the posterior distribution important?

In the context of machine learning, given a dataset, fitting a model usually comes down to finding point estimates (using MLE, for example). Lately, I have been reading about the Metropolis algorithm. It's easy to understand that we can find the posterior distribution, but what comes after that, and why do we need it in the first place? Is this why frequentists don't like the Bayesian approach? Are there any specific advantages to finding the distribution over finding a point estimate?

👍 3 · 👤 u/saladking99 · 📅 Aug 24 2020
Gaussian Processes for representing posterior distribution of time series prediction?

Imagine you're doing time series prediction with a Bayesian approach, using for instance a neural network + MCMC. Your prediction result is therefore a probability distribution in the space of functions of time in the near-future, from which you know how to sample.

What's more, it is known that Gaussian Processes can be used to describe probability distributions over the space of functions.

This gives me the intuition that one could do the following things:

  1. Sample predicted functions using your NN + MCMC algorithm; presumably this is expensive to do, so you want to do it once.
  2. Using this sample, approximate the predicted distribution using Gaussian Processes, yielding a compact result such as a parameterized kernel function + maybe some 'anchor points'.
  3. Continue working with this Gaussian Process approximation of your posterior distribution, which is presumably more efficient to use.

Has this strategy been pursued? Do you think it's sensible?

(The background I have on this topic is what you might find in Bishop's PRML and MacKay's ITILA)

👍 6 · 👤 u/vvvvalvalval · 📅 Dec 23 2019
Bayesian Q: What is a good reference for conjugate prior/posterior distributions?

I am interested in slice sampling and Gibbs sampling to estimate parameters for a complex HMM, and I am seeking a good reference on conjugate prior/posterior relationships to set up the samplers. I'm looking for something beyond the depth of the wiki page on the topic. What's your favorite go-to to look up conjugate priors?

👍 12 · 👤 u/Economist_hat · 📅 Aug 14 2018
[Q] Understanding posterior distribution

Reference

In the numerator, am I just multiplying two PDFs together? I've read online in some places that f(x | theta) is not just the PDF of each of the samples but a likelihood function instead. For example, if my samples were distributed as U[0, theta] and my prior distribution was U[0, 1], would the numerator be the pdf of U[0, theta] times the pdf of U[0, 1]?
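
A sketch of how that numerator is usually read, using $n$ i.i.d. samples: $f(x \mid \theta)$ stands for the joint density of all the data viewed as a function of $\theta$ (the likelihood), so the numerator is that product times the prior. For the uniform example, with $x_i \sim U[0,\theta]$ and a $U[0,1]$ prior,

$$f(x_{1:n} \mid \theta)\,\pi(\theta) \;=\; \theta^{-n}\,\mathbf{1}\{\max_i x_i \le \theta\} \cdot \mathbf{1}\{0 \le \theta \le 1\},$$

i.e. $\theta^{-n}$ for $\max_i x_i \le \theta \le 1$ and zero elsewhere.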

👍 2 · 👤 u/michael1999wang · 📅 Apr 07 2020
How do I form a posterior distribution, beginner Bayes problem

Venous thromboembolism (VTE) is a documented global health burden. Temporary immobilisation of a lower limb injury in a plaster cast or fitted boot is an important cause of potentially preventable VTE. Recent evidence suggests that thromboprophylaxis with anticoagulant drugs can reduce the risk of VTE, but it can only be justified if the benefits outweigh the risks and it is cost-effective relative to the standard of care.

A systematic review was conducted. Studies were eligible for inclusion if they met the following criteria: a) randomised controlled trial (RCT) which included a measurement of VTE; b) adults (aged over 16 years) requiring temporary immobilisation for an isolated lower limb injury. Three RCTs were identified comparing prophylaxis with low molecular weight heparin (LMWH) against standard of care (Table 1).

http://puu.sh/CdHCW/ed4ce718d0.png

Task

Generate the posterior distribution for the effect of LMWH versus standard of care and the predictive distribution for the effect in a new study.

Write a short report that describes your method of analysis and results, including any limitations and recommendations.

I'm familiar with the concepts of Bayesian statistics; however, I'm not sure how to apply them in this case.

If anyone has any advice on how to get started, or links to any useful textbooks or papers, it would be greatly appreciated.
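
A hedged sketch of one common starting point (not necessarily what the course intends): approximate each trial's log odds ratio as normal, then combine them under a fixed-effect model with a vague normal prior. The event counts below are invented, since the linked table is not reproduced here:

    events_lmwh <- c(5, 8, 3);   n_lmwh <- c(100, 150, 80)   # made-up LMWH arms
    events_soc  <- c(12, 15, 9); n_soc  <- c(100, 150, 80)   # made-up control arms

    log_or <- log((events_lmwh / (n_lmwh - events_lmwh)) /
                  (events_soc  / (n_soc  - events_soc)))
    var_or <- 1 / events_lmwh + 1 / (n_lmwh - events_lmwh) +
              1 / events_soc  + 1 / (n_soc  - events_soc)

    prior_mean <- 0; prior_var <- 100            # vague N(0, 10^2) prior on the pooled log OR
    post_prec  <- 1 / prior_var + sum(1 / var_or)
    post_mean  <- (prior_mean / prior_var + sum(log_or / var_or)) / post_prec
    c(post_mean = post_mean, post_sd = sqrt(1 / post_prec))

A random-effects model (with a between-trial variance term) would be the usual route to the predictive distribution for the effect in a new study; that part is left out of this sketch.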

👍 4 · 👤 u/Sohcratees · 📅 Dec 07 2018
Our recollection of dreams is a Bayesian process; we only recall a posterior distribution of our actual dreams.
👍 3 · 📅 Apr 15 2019
[Discussion] Why are the prior and posterior modelled as Gaussian distributions in Variational Autoencoders?

Hey

I just studied variational autoencoders, and more specifically the probability-distribution-based explanation from here: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/ . What I couldn't understand is this: we only have X, so how can we assume P(Z) and Q(Z|X) to be Gaussian, where X is the data and Z are the latent features?

Is the following intuition correct: we have X and we want to find a model that is similar to P(X). Z represents the factors that X depends on, which we do not know about. So to learn about Z we use an encoder network. But why, then, do we use P(Z) rather than Q(Z|X) at test time? Doesn't Q(Z|X) represent more about the data?

👍 10 · 👤 u/amil123123 · 📅 Jun 12 2018
Neural network: predicting a continuous but non-normal output, and obtaining its posterior distribution

I would like the output layer of my neural network to output the posterior distribution of y conditional on x, where y cannot be assumed to be normally distributed conditional on x (but could instead be a mixture of normals, or have a mass point somewhere). I not only care about having good point estimates for y, but also need to be able to infer confidence bounds.

I was thinking of bucketing y (into 50 or so buckets; y has a lower and an upper bound, which helps) and using a softmax activation. This gives me a nice probability distribution for 'free', but it feels inefficient. For example the standard loss function (categorical cross entropy) will fail to take into account the fact that predicting 'bucket 10' when the truth is 'bucket 9' is not as bad as predicting 'bucket 10' when the truth is 'bucket 3'.

Am I missing a simple, standard way of handling this problem?

👍 24 · 👤 u/timcar · 📅 Sep 05 2016
Posterior Auricular artery | Course | Relations | Branches & Distribution youtu.be/zUY_XR2kTs8
👍 2 · 👤 u/geethahari29 · 📅 Jul 25 2019
Pseudo-extended MCMC: A proposed approach for sampling from multi-modal posterior distributions

https://arxiv.org/pdf/1708.05239.pdf

👍 5 · 👤 u/AllezCannes · 📅 Feb 15 2019
How to do MCMC sampling on the posterior predictive distribution created by Prophet Library (python)

I'm using the Prophet package in Python 3.6 to evaluate the effects of a campaign on sales, product margin, and other ecommerce variables. I am training a model on daily data from the pre-period before the intervention (holding out a subset of the data at the end before the intervention and validating that it makes good predictions on that period and has reasonably calibrated uncertainty intervals), and then using it to forecast the counterfactual of how sales would have trended without the intervention accounting for trend/seasonality/holidays.

Someone has advised me to take MCMC draws from the posterior predictive distribution of the counterfactual sales trend and compare the cumulative actuals against those to get a distribution of lift attributable to the campaign. However, I am completely lost as to how to do this and need some serious help. I tried looking at the PyMC library, and it sort of went over my head.

If using prophet to make a prediction, how would I then get MCMC samples from the predicted distribution?

👍 4 · 👤 u/neuroguy6 · 📅 Jul 13 2018
Should the Kullback-Leibler divergence of the posterior distribution relative to the prior distribution be smaller than the prior relative to the posterior?

Let me clarify what I mean. I know the KL divergence is asymmetric. Following the paper A Bayesian Characterization of Relative Entropy, I'm toying around with a few models I ran in Stan. Let the prior = P and the posterior = Q for shorthand.

As I understand it, KL(Q || P) tells us the gain in information from describing a variable with Q instead of P. Yet every instance of the KL divergence I see has KL(Q || P) smaller than KL(P || Q). I know they are asymmetric, but if the divergence is the gain in information, shouldn't KL(Q || P) (representing the gain in information in the posterior relative to the prior) be the larger value?
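
A small numerical illustration with my own toy choices (flat Beta(1,1) prior P, concentrated Beta(12,4) posterior Q), using a plain Riemann sum so both directions are easy to compare:

    p <- function(x) dbeta(x, 1, 1)     # prior
    q <- function(x) dbeta(x, 12, 4)    # posterior

    x  <- seq(1e-6, 1 - 1e-6, length.out = 1e5)
    dx <- x[2] - x[1]
    sum(q(x) * log(q(x) / p(x))) * dx   # KL(Q || P)
    sum(p(x) * log(p(x) / q(x))) * dx   # KL(P || Q), typically larger here

KL(P || Q) tends to be the larger number in this situation because the broad prior puts mass in regions where the posterior density is nearly zero, and those regions are penalised heavily.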

👍 8 · 👤 u/ProfWiki · 📅 Aug 16 2018
How do you know a distribution should be called the posterior (in a VAE)?

In a VAE or regular autoencoder, we have an input x, which we map to the latent space represented as z, and then we decode it into a reconstruction of input x, as x'. So we want the AE to learn P(z | x) and P(x' | z) right?

I was reading a paper and they called P(z | x) the posterior. That makes sense, but how do you know when to assign that name to a distribution? Is P(x' | z) also a posterior? But then P(x | z) (the reverse of P(z | x)) must not be a posterior?

I'm just confused about how one assigns these names.

👍 7 · 👤 u/rasen58 · 📅 Dec 08 2017
Question about Bayesian Inference, Posterior Distribution

I have a posterior probability of $p_i$ which is based on a Beta prior and some data from a binomial distribution:

I have another procedure:

$P(E)=\prod_{i \in I} p_i^{k_i}(1-p_i)^{1-k_i}$

which gives me the probability of a specific configuration of successes and failures over the set $I$ in a model. Given the posterior distribution for $p_i$, how do I find $P(E)$?

UPDATE: I think the issue may be the notation. $P(E)$ should actually be $P(E \mid p_1, \ldots, p_i, \ldots, p_{|I|})$. Then, if we are looking for the marginal probability $P(E)$, we need to solve $\int_0^1 \cdots \int_0^1 P(E \mid p_1, \ldots, p_{|I|})\, P(p_1, \ldots, p_{|I|})\, dp_1 \cdots dp_{|I|}$. Because the $p_i$'s are all independent, we can probably simplify this a lot.
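
A possible closing step, assuming the $p_i$ have independent $\mathrm{Beta}(a_i, b_i)$ posteriors (those parameters are not written out above, so this is an assumption): the integral factorises into one Beta expectation per $i$, and since each $k_i$ is 0 or 1,

$$P(E) \;=\; \prod_{i \in I} \mathbb{E}\!\left[p_i^{k_i}(1-p_i)^{1-k_i}\right] \;=\; \prod_{i \in I} \left(\frac{a_i}{a_i+b_i}\right)^{k_i}\left(\frac{b_i}{a_i+b_i}\right)^{1-k_i}.$$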

👍 3 · 👤 u/newperson77777777 · 📅 Jul 22 2018
Explanation of Posterior and Prior Distributions?

I'm having trouble grasping the "distribution" part. I understand how the inference process works and how we use a prior to find a posterior and then keep repeating the process. I was hoping for an explanation of how a posterior distribution is represented (is it just a graph with points on it?) and how a posterior is different from a posterior distribution (is it just one vs. many?).

I'm kind of shooting in the dark here, but would it be correct to imagine that a posterior distribution is just ALL the recorded posterior values throughout the inference process? If this is the case, how is it represented on a graph (what are the x and y axes)? Are the values just held in arrays and not displayed visually at all? Just trying to wrap my head around what it is.
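
A tiny base-R illustration with made-up numbers (a Beta(1,1) prior updated with 6 successes out of 9 trials), mainly to show what ends up on the axes: the x-axis is the parameter value and the y-axis is posterior density:

    theta     <- seq(0, 1, length.out = 500)
    posterior <- dbeta(theta, 1 + 6, 1 + 3)    # conjugate Beta posterior

    plot(theta, posterior, type = "l",
         xlab = "theta (parameter value)", ylab = "posterior density")

With MCMC output you would instead have an array of sampled parameter values, and the corresponding picture is a histogram or density plot of those samples.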

👍 3 · 👤 u/Koalchemy · 📅 Jun 04 2018
[University Probability] (X-post /r/probabilitytheory) Understanding the derivation of a posterior distribution

Hi everyone,

I'm reading [this paper](https://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf), and... well, Equation 3 has me confused. It's supposed to be the definition of the log posterior distribution based on Equations 1 & 2. I've actually derived Equation 3, but only by ignoring a term. Here's what I did.

  1. I applied the rules of conditional probability to come up with the following expression:

P(U, V | R, σ^2, σᵥ^2, σᵤ^2) = P(R, U, V | σ^2, σᵥ^2, σᵤ^2) / P(R | σ^2, σᵥ^2, σᵤ^2).

  2. I used conditional independence to transform the numerator to: P(R | U, V, σ^2) * P(V | σᵥ^2) * P(U | σᵤ^2).

So the problem I'm having is that if I evaluate the (log version of the) numerator above, I end up with exactly Equation 3.

But of course, that's completely ignoring the denominator term. I'm thinking, therefore, that the denominator is somehow 1, but I'm not sure how to prove it.

Can someone give me a bit of guidance?

👍 3 · 👤 u/millenniumpianist · 📅 Jun 06 2017
Variance in Posterior Distribution and Sample Size

I'm currently working through McElreath's "Statistical Rethinking", which is a fantastic book on (not only) Bayesian statistics. (If you are interested in Bayesian statistics and are looking for a great introduction, read it!)

There is one thing I was wondering about and couldn't quite find a solution to: the general notion (from what I understood) is that with increasing sample size the likelihood will outweigh the prior and the posterior will become a narrower estimate of the parameter, i.e. -- all other things being equal -- with increasing sample size the variance of the posterior will decrease.

Is there any case where this is not true? Are there special cases of priors, likelihood functions, or data where the variance of the posterior distribution does not change, or even increases?

👍 7 · 👤 u/neurotroph · 📅 Oct 08 2016
What exactly makes a posterior beta distribution more precise?

So I've been doing a bit of work centred around the beta distribution and have a model with two sets of different parameters, say Beta(A,B) and Beta(C,D). I have the mean, median, and mode of these models, but I have no idea how to interpret which is the more precise beta model.
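
One small sketch, with made-up parameter values: the spread of a Beta(A, B) can be compared directly through its variance, A*B / ((A+B)^2 * (A+B+1)); for a fixed mean, a larger A + B means a tighter (more precise) distribution:

    beta_var <- function(a, b) a * b / ((a + b)^2 * (a + b + 1))
    beta_var(8, 4)     # mean 2/3, A + B = 12
    beta_var(20, 10)   # same mean, A + B = 30, noticeably smaller variance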

👍 7 · 👤 u/madaraishida · 📅 Dec 20 2015
Bayesian - Posterior Distribution (Beta, Binomial)

If I'm trying to estimate a parameter (say theta) with a binomial distribution as the likelihood and a Uniform(0,1) distribution as the prior, can I just use the binomial distribution as the posterior distribution as well?

I understand all the examples which transform the uniform prior into a Beta prior, which creates a Beta posterior. But I was wondering about an alternative.

Thanks in advance!

👍 2 · 👤 u/billum4 · 📅 Mar 16 2016
Testing Bayesian Concepts in R: using the Gaussian Conjugate Priors to compute the Posterior Distribution sandipanweb.wordpress.com…
👍 18 · 👤 u/SandipanDeyUMBC · 📅 Jun 09 2017
