A list of puns related to "Hierarchical Bayes Model"
The title says it all. To me, sequential, empirical, and hierarchical models all sound the same. I am looking only at binomial data. Please help me understand the differences with examples.
What are some good resources (books, courses, YouTube channels) to learn about Hierarchical models / Multilevel models.
Any suggestions will be helpful.
Hi!
I am reading about the Pareto/GGG BTYD Bayesian model, which represents an improvement on a previous model called Pareto/NBD.
I understand the rationale behind both methods and their assumptions, but Pareto/GGG claims to produce better estimates for individual-level parameters because of its use of MCMC. Quoting:
>To achieve the parameter estimation for the Pareto/GGG, we formulate a full hierarchical Bayesian model with hyperpriors for the heterogeneity parameters, then generate draws of the marginal posterior distributions using a Markov Chain Monte Carlo (MCMC) sampling scheme. This comes with additional computational costs and implementation complexity, compared with the maximum likelihood method available for Pareto/NBD, but we simultaneously gain the benefits of (1) estimated marginal posterior distributions rather than point estimates, (2) individual-level parameter estimates, and thus (3) straightforward simulations of customer-level metrics that are of managerial interest.
Which I don't understand.
I get that in hierarchical models you try to infer the behavior of a group (the heterogeneity parameters) given the observations you have at the individual level (that is, each individual's behavior is a sample from the group/heterogeneity distribution), but why does using MCMC to "generate draws of the marginal posterior distributions" help improve the estimates for the individual-level parameters?
I understand the meaning of "MCMC", but I am not well-versed in its workings, so if that's why I am not understanding this, it would be really helpful if someone could point me to a good learning resource (ideally one that addresses the claim about individual estimates). I found this article (which I have not fully read) explaining a bit of the intuition on the second page:
>A hierarchical model may have parameters for each individual that describe each individual's tendencies, and the distribution of individual parameters within a group is modeled by a higher-level distribution with its own parameters that describe the tendency of the group. The individual-level and group-level parameters are estimated simultaneously. Therefore, the estimate of each individual-level parameter is informed by all the other individuals via the estimate of the group-level distribution.
But it still confuses me.
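To build intuition for what "draws of the marginal posterior distributions" means in practice, here is a deliberately tiny sketch (not the Pareto/GGG model): a random-walk Metropolis sampler for a single binomial rate with a Beta prior. The prior values and data below are made up purely for illustration.

```python
import numpy as np

# Minimal Metropolis sampler for one binomial rate p with a Beta(a, b) prior.
# A toy illustration of MCMC output: draws from a posterior, not a point estimate.
rng = np.random.default_rng(0)
a, b = 2.0, 2.0          # prior hyperparameters (assumed for illustration)
x, n = 3, 10             # 3 successes in 10 trials (made-up data)

def log_post(p):
    # Unnormalized log posterior: Beta(x + a, n - x + b)
    if not 0 < p < 1:
        return -np.inf
    return (x + a - 1) * np.log(p) + (n - x + b - 1) * np.log(1 - p)

draws = []
p = 0.5                                         # starting value
for _ in range(20000):
    prop = p + rng.normal(0, 0.1)               # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop                                # accept the proposal
    draws.append(p)
draws = np.array(draws[5000:])                  # discard burn-in

print(draws.mean())                             # close to (x+a)/(n+a+b) = 5/14
print(np.percentile(draws, [2.5, 97.5]))        # 95% credible interval
```

Each retained draw is a plausible value of the rate, so you get a whole distribution (and credible intervals) rather than a single number; in a full hierarchical model like Pareto/GGG, the same machinery yields draws for every individual-level parameter at once.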
https://arxiv.org/abs/2110.13711
A team from Google, OpenAI, and the University of Warsaw proposes a new efficient Transformer architecture for language modeling, setting a new state of the art on ImageNet32 for autoregressive models.
When I asked a lecturer who taught an introductory class in the philosophy of science, he suggested that they may share a common history of thought. To me, they looked like they are presented in a very similar manner.
I'm interested in doing a class project on multitask learning, but the dataset I want to use isn't large enough to do an approach using a neural net with multiple outputs. Therefore I'd like to do a Hierarchical Bayes approach where I impose a shared prior on the weights for all of the tasks. I was wondering if anyone has a recommendation for a simple, clear paper that discusses how to do this for regression, for example. The papers I've been able to find seem overly complex for what I'd like to do.
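As a rough sketch of the idea (not from any particular paper), here is a MAP-style coordinate ascent for regression with a shared Gaussian prior on the per-task weights, w_t ~ N(mu, tau^2 I): alternate a ridge-like solve for each task that shrinks w_t toward the shared mean, then update the mean from the task weights. All dimensions, variances, and the synthetic data are assumptions for illustration.

```python
import numpy as np

# Multitask regression with a shared Gaussian prior on task weights:
#   w_t ~ N(mu, tau^2 I),  y_t = X_t w_t + noise.
# Coordinate ascent on the joint MAP: solve each task's penalized least
# squares (shrinking toward mu), then set mu to the mean of the task weights.
rng = np.random.default_rng(1)
T, n, d = 3, 20, 5                   # tasks, samples per task, features
true_mu = rng.normal(size=d)
Xs = [rng.normal(size=(n, d)) for _ in range(T)]
ys = [X @ (true_mu + 0.1 * rng.normal(size=d)) + 0.1 * rng.normal(size=n)
      for X in Xs]

sigma2, tau2 = 0.01, 0.01            # noise / prior variances (assumed known)
mu = np.zeros(d)
for _ in range(50):
    Ws = []
    for X, y in zip(Xs, ys):
        # argmin_w ||y - X w||^2 / sigma2 + ||w - mu||^2 / tau2
        A = X.T @ X / sigma2 + np.eye(d) / tau2
        rhs = X.T @ y / sigma2 + mu / tau2
        Ws.append(np.linalg.solve(A, rhs))
    mu = np.mean(Ws, axis=0)         # shared prior mean pools the tasks

print(np.linalg.norm(mu - true_mu))  # should be small
```

The point of the shared prior is that data-poor tasks get pulled toward mu, which is itself estimated from all tasks; a fully Bayesian treatment would put a hyperprior on (mu, tau^2) instead of fixing tau^2.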
Hi, r/askstatistics!
I'm reading Kevin Murphy's Machine Learning: A Probabilistic Perspective, and I have a question on the role of hyperparameters in hierarchical Bayes. I'll give an example to illustrate my question better:
Let's say that we're trying to estimate the cancer rate of city i that has N_i people, and its true cancer rate is r_i. Then, say that the amount of people who have cancer in city i at this point in time is a Binomial(N_i, r_i) random variable. Call this x_i.
Now, naively, we could simply estimate each r_i without using the information from the other cities by using the maximum likelihood estimators x_i/N_i. But from what Murphy says, we should use the information from the other cities' cancer rates to help infer each city's rate.
Now, let's say we put a prior on our rates r_i, say a Beta(a, b) prior. Here, (a, b) is our hyperparameter. Murphy then goes on to say: "Note that it is crucial that we infer (a,b) from the data; if we just clamp it to a constant, the r_i will be conditionally independent, and there will be no information flow between them. By contrast, by treating (a,b) as an unknown hidden variable, we allow the data-poor cities to borrow statistical strength from data-rich ones."
My question is simply: how does this work? I always thought the structure of hierarchical Bayes was that, given the value of the hyperparameter, the parameters generated from it are conditionally independent. On the flip side, I understand that we want the data to influence the value of the hyperparameter. For example, if a city is particularly smoggy, I want the hyperparameter to shift the weight of r_i's distribution towards a higher cancer rate. Moreover, I intuitively understand that I want to use the data from other smoggy cities to predict the cancer rate.
So, even though the r_i's may be conditionally independent given a fixed value of (a,b), they may not be independent when (a,b) is allowed to vary. How do we use the information from one city to infer the parameter r_i of another city?
How does hierarchical Bayes do this? Why aren't all the parameters r_i independent of one another even with a data-inferred hyperparameter? If (a, b) is allowed to vary, then it's a random variable, and so it has hyper-hyperparameters. What about those? Does the rabbit hole go on forever?
I hope my questions make sense. Please let me know if you need clarification.
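One way to see the "borrowing strength" mechanically is the empirical Bayes version of this example: fit (a, b) to the pooled city data by moment matching, then use the posterior mean (x_i + a) / (N_i + a + b) for each city. The counts below are invented for illustration.

```python
import numpy as np

# Empirical-Bayes illustration of shrinkage for the cancer-rate example.
# Fit Beta(a, b) to the observed city rates by moment matching, then shrink
# each city's MLE x_i/N_i toward the pooled mean via the posterior mean.
N = np.array([1000, 1000, 50, 2000])     # city populations (made up)
x = np.array([20, 18, 3, 45])            # cancer cases (made up)

rates = x / N                            # per-city MLEs
m, v = rates.mean(), rates.var()
# Method-of-moments estimates of the Beta hyperparameters (a, b):
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

posterior_mean = (x + a) / (N + a + b)   # shrunken per-city estimates

for i in range(len(N)):
    print(f"city {i}: MLE={rates[i]:.4f}  shrunk={posterior_mean[i]:.4f}")
```

This is the empirical Bayes shortcut (point-estimating (a, b) rather than integrating over it with a hyperprior), but it already shows the information flow Murphy describes: every city's data enters (a, b), and (a, b) enters every city's estimate, with the data-poor city (N = 50) shrunk hardest toward the pooled mean.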
Classification is the process of grouping items into categories. Classification problems can often be modeled hierarchically, typically as a tree or a directed acyclic graph (or some combination). Examples range from musical genre categorization to identifying viral sequences within metagenomic data sets and diagnosing COVID-19 from chest X-ray images.
A flat approach to hierarchical classification ignores the hierarchy between classes entirely, usually predicting only leaf nodes. While this can be quick and easy for problems without hierarchical structure, it becomes more difficult when multiple levels of grouping have to be taken into account. The hierarchy is often overlooked when training a model, yet exploiting it has been shown to lead to consistently better predictive results, which is why it is used in this research.
Quick Read: https://www.marktechpost.com/2021/12/14/hiclass-a-python-package-that-provides-implementations-of-popular-machine-learning-models-and-evaluation-metrics-for-local-hierarchical-classification/
Paper: https://arxiv.org/pdf/2112.06560v1.pdf
Gitlab: https://gitlab.com/dacs-hpi/hiclass
A team from the University of Warsaw, OpenAI and Google Research proposes Hourglass, a hierarchical transformer language model that operates on shortened sequences to alleviate transformers' huge computation burdens.
Here is a quick read: Warsaw U, OpenAI and Google's Hourglass Hierarchical Transformer Model Outperforms Transformer Baselines.
The paper Hierarchical Transformers Are More Efficient Language Models is on arXiv.
So say I have several employee_ids.
They happen to work in a place where the hierarchy of relationships between them needs to be flexible.
I thought of making this table to store the nature of the relationship between them.
id | manager_id | subordinate_id |
---|---|---|
1 | 15 | 10 |
2 | 16 | 15 |
3 | 16 | 12 |
For example, the above table shows that employee_id 15 is the manager of employee_id 10.
In turn, 16 is the manager of 15, meaning that if I wanted to query for all the subordinates under employee_id 16, I should get 15, 10 (indirectly), and also 12.
This is relevant because I have another table called tasks, which lists out the task_ids per employee.
E.g.
id | task_id | employee_id |
---|---|---|
1 | 1 | 10 |
2 | 2 | 15 |
3 | 3 | 16 |
I'm trying to make a portal where whenever a manager logs in, the portal shows them all the tasks assigned to them, as well as all the tasks assigned to any subordinate under them (directly or indirectly).
I can think of some issues that I would have to carefully plan for with the above method, e.g.:
id | manager_id | subordinate_id |
---|---|---|
1 | 15 | 10 |
2 | 10 | 15 |
(The above is invalid and must be validated against!)
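A common way to answer the "all subordinates, direct or indirect" query over a table like this is a recursive CTE. Here is a sketch using SQLite from Python; the table and column names follow the post, and everything else is illustrative.

```python
import sqlite3

# In-memory copy of the two tables from the post, plus a recursive CTE that
# collects a manager's tasks together with those of all subordinates,
# direct or indirect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (id INTEGER, manager_id INTEGER, subordinate_id INTEGER);
    INSERT INTO relationships VALUES (1, 15, 10), (2, 16, 15), (3, 16, 12);
    CREATE TABLE tasks (id INTEGER, task_id INTEGER, employee_id INTEGER);
    INSERT INTO tasks VALUES (1, 1, 10), (2, 2, 15), (3, 3, 16);
""")

rows = conn.execute("""
    WITH RECURSIVE subs(employee_id) AS (
        SELECT :mgr                              -- the manager themselves
        UNION
        SELECT r.subordinate_id                  -- walk down one level
        FROM relationships r
        JOIN subs s ON r.manager_id = s.employee_id
    )
    SELECT t.task_id FROM tasks t
    JOIN subs s ON t.employee_id = s.employee_id
    ORDER BY t.task_id;
""", {"mgr": 16}).fetchall()

print([r[0] for r in rows])   # tasks of 16, 15, 12 and 10 -> [1, 2, 3]
```

Using UNION (rather than UNION ALL) de-duplicates rows in the recursion, which also keeps the query from looping forever if an invalid cycle like (15 → 10, 10 → 15) slips in; you would still want to validate against such cycles on insert.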
Suppose you have "postal codes" and you want to use them as a variable in your data. Postal codes are 6 characters long - if two postal codes share the first 5 characters in common, they are probably pretty similar to one another (i.e. located closer to one another). If two postal codes only share the first character, then they are probably pretty dissimilar. And if they share the first 3 characters, then they are expected to be somewhat similar to each other.
My question: suppose you want to use postal codes as a variable in a statistical model, but you are not sure whether to include all 6 characters, or the 5 characters, or the first 4 characters, etc. Each one of these might include some potentially useful information.
If you decide to use a random forest model (which is known to be robust to redundant variables), would it make sense to turn the postal code into 6 separate variables? E.g.
var_1 = 1st character var_2 = 1st and 2nd character var_3 = 1st, 2nd and 3rd character var_4 = 1st, 2nd, 3rd and 4th character var_5 = 1st, 2nd, 3rd, 4th and 5th character var_6 = 1st, 2nd, 3rd, 4th, 5th and 6th character
(note, there are other variables in the model as well, e.g. height, age, weight, etc).
This way, the statistical model itself could decide which prefix length of the postal code results in the best outcome. Or is this approach bound to result in multicollinearity and overfitting?
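For what it's worth, here is a minimal sketch of that nested-prefix encoding in plain Python. The var_1..var_6 names follow the post; the postal codes and the ordinal encoding scheme are illustrative assumptions.

```python
# Turn each postal code into cumulative prefix features var_1..var_6,
# so a tree model can split at whatever granularity helps.
codes = ["M5V2T6", "M5V2T7", "M4C1B5", "K1A0B1"]   # made-up Canadian-style codes

rows = []
for code in codes:
    rows.append({f"var_{k}": code[:k] for k in range(1, 7)})

print(rows[0])   # {'var_1': 'M', 'var_2': 'M5', ..., 'var_6': 'M5V2T6'}

# A tree library would need these as categories; a simple shared ordinal
# encoding keyed by (prefix length, prefix value):
vocab = {}
encoded = [[vocab.setdefault((k, r[f"var_{k}"]), len(vocab))
            for k in range(1, 7)] for r in rows]
print(encoded)
```

A random forest can then split on whichever prefix length is informative. The var_k columns are redundant by construction, but tree ensembles tolerate that far better than linear models, where this layout would indeed invite multicollinearity.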
Thanks
Hello,
I have to get Bayes factors for my model. I am very new to Bayesian analysis. I checked this page and found some useful packages, but it is very confusing: https://rstudio-pubs-static.s3.amazonaws.com/358672_09291d0b37ce43f08cf001cfd25c16c2.html
My model is like this: lmer(value ~ time * group * condition + (1 | id)). So, it is a linear mixed-effects model for repeated measures. Some useful information: value is response time; time has two levels (pre/post intervention); group has three intervention groups (no control); and condition has two levels, neutral and reward. By the way, it is a within-subject study; not all participants had different interventions, but we compare the intervention methods based on reaction times.
I have to get Bayes factors for intervention group comparison, with the outcome of reaction times based on two conditions, neutral and reward.
My question is which package should I use? It was a little bit complicated from the link. Any idea regarding the code?
Thanks in advance.
Hi! I am interested in using HLM to test whether the type of interaction a student reports (in-person or remote) affects their reports of loneliness (reporting once a day for 14 days). I am only in undergrad and this HLM material is still new to me. However, I would like to broaden my horizons and try this method rather than, for example, a multiple regression (nothing wrong with regression, and I actually have a plan for how I could test it that way; I just want to push myself a bit because I think HLM could give me good results). Any suggestions/guidance/advice would be appreciated.
So I have this strategy that uses a naive Bayes model to link the features to a prediction of the sign of the future price change. I assume that the conditional distribution of the features given the current price change is Gaussian and that the features are independent of each other under this condition (i.e. naive Bayes). It seems profitable in backtests over 50,000+ data points (15-minute samples from January 2017 to last Friday, so over 4 years of data). I have also tried it live for a day, and it seems to capture the "mood" of the market in a way. I thought it might be better to refine this method, just in case.
But there are plenty of other distributions/approaches for naive Bayes. Before I go brute-forcing all of them, I wanted to ask if anyone has experience with this approach and whether there are any pointers/known wisdom about which kind of naive Bayes approach works better?
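For concreteness, here is a from-scratch Gaussian naive Bayes of the kind described above (per-class Gaussian features, conditional independence), run on synthetic data. This is a sketch of the technique only; the features and data are invented, not the author's actual strategy.

```python
import numpy as np

# Gaussian naive Bayes for a binary "sign of next price change" label:
# fit per-class priors and per-feature means/variances, then classify by
# comparing per-class log posteriors under the independence assumption.
rng = np.random.default_rng(42)
n = 2000
y = rng.integers(0, 2, size=n)                  # 1 = price up, 0 = down
# Two synthetic features whose class-conditional means differ:
X = rng.normal(loc=np.where(y[:, None] == 1, 0.5, -0.5), scale=1.0,
               size=(n, 2))

def fit(X, y):
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),          # class prior P(c)
                     Xc.mean(axis=0),           # per-feature means
                     Xc.var(axis=0) + 1e-9)     # per-feature variances
    return params

def predict(params, X):
    scores = []
    for c in (0, 1):
        prior, mu, var = params[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)
        scores.append(np.log(prior) + ll.sum(axis=1))  # sum = independence
    return (scores[1] > scores[0]).astype(int)

params = fit(X, y)
acc = (predict(params, X) == y).mean()
print(acc)      # well above chance on this synthetic data
```

Swapping the distributional assumption means swapping the per-feature log-likelihood term (e.g. a Student-t for heavier tails), so the structure above makes it cheap to compare variants before brute-forcing a library's full menu.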
Reading the wikipedia, I came across this part
> a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests.
Anyone tried those last two, i.e. boosted trees or random forests, for classification problem? Any notes, comments, ideas on how that works out?
There is also this bit from sklearn's page on naive Bayes:
> On the flip side, although naive Bayes is known as a decent classifier, it is known to be a bad estimator, so the probability outputs from predict_proba are not to be taken too seriously.
So that makes it imperative to try other methods.
Hello Reddit,
I am proud to present HTMOT, for Hierarchical Topic Modelling Over Time. This paper proposes a novel topic model able to extract topic hierarchies while also modelling their temporality. Modelling time provides more precise topics by separating lexically close but temporally distinct topics, while modelling hierarchy provides a more detailed view of the content of a document corpus.
https://arxiv.org/abs/2112.03104
The code is easily accessible on GitHub and a working interface provides the ability to navigate through the resulting topic tree with ease: https://github.com/JudicaelPoumay/HTMOT