Backpropagation is the workhorse of deep learning, but it only works for continuous functions amenable to the chain rule of differentiation. Discrete algorithms are piecewise constant, so their derivatives are zero almost everywhere and undefined at the jumps, and deep networks that contain them cannot be trained effectively with backpropagation. This paper presents a method for incorporating a large class of algorithms, formulated as discrete exponential-family distributions, into deep networks, and derives gradient estimates that plug directly into end-to-end backpropagation. This lets things like combinatorial optimizers natively be part of a network's forward pass. (A toy sketch of the perturb-and-MAP step appears after the links below.)
OUTLINE:
0:00 - Intro & Overview
4:25 - Sponsor: Weights & Biases
6:15 - Problem Setup & Contributions
8:50 - Recap: Straight-Through Estimator
13:25 - Encoding the discrete problem as an inner product
19:45 - From algorithm to distribution
23:15 - Substituting the gradient
26:50 - Defining a target distribution
38:30 - Approximating marginals via perturb-and-MAP
45:10 - Entire algorithm recap
56:45 - GitHub Page & Example
Paper: https://arxiv.org/abs/2106.01798
Code (TF): https://github.com/nec-research/tf-imle
Code (Torch): https://github.com/uclnlp/torch-imle
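To make the perturb-and-MAP step (38:30) concrete, here is a minimal NumPy sketch for the simplest possible case, a categorical distribution, where the MAP problem is just an argmax. The function name and defaults are mine; the paper generalizes this by swapping in an arbitrary combinatorial solver and structured noise (e.g., sum-of-gamma) in place of plain Gumbel noise:

```python
import numpy as np

def perturb_and_map_marginals(theta, num_samples=1000, seed=0):
    """Approximate the marginals of a categorical exponential-family
    distribution by averaging MAP solutions of noise-perturbed scores."""
    rng = np.random.default_rng(seed)
    counts = np.zeros_like(theta, dtype=float)
    for _ in range(num_samples):
        z = theta + rng.gumbel(size=theta.shape)  # perturb the parameters
        counts[np.argmax(z)] += 1                 # MAP here is just an argmax
    return counts / num_samples                   # -> softmax(theta) as samples grow

print(perturb_and_map_marginals(np.array([2.0, 1.0, 0.0])))
```

In this toy case the estimate converges to softmax(theta) exactly (the Gumbel-max trick); the point of the paper is that the same perturb-then-solve recipe still yields usable gradient estimates when the argmax is replaced by something like a shortest-path or top-k solver.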
(Arc ?, Interlude ?: Archcommander Varney)
(Note: Bargain Bin Superheroes is episodic; each part is self-contained. This story can be enjoyed without reading the previous sections.)
The National High Energy and Temperature Lab was abuzz. Professor Hale bustled into the main containment center, where the primordial plasma they'd been studying for the past ten years was evolving. He gave the Archcommander by his side a friendly nod as he passed.
"It's the most incredible thing," Professor Hale said. "The mass-energy equivalent just keeps going up exponentially! We're lucky the late--or should I say early--Alexandre Hubert wasn't a particularly heavy man; it's all we can do to contain the Hubert particles, given how much energy they're emitting right now."
Archcommander Varney grunted. "Hubert particles, eh? Is that what you eggheads are calling them?"
Professor Hale nodded ruefully. "We scientists, er... we're not great at names. They're often descriptors more than anything."
Archcommander Varney eyed the HEaT Lab name tag on Professor Hale's lapel. "Well, I appreciate your honesty. You said they're emitting energy--could we use them as power sources?"
Professor Hale hesitated. "Not... not yet. We... could try, but there are these discontinuous... jumps. It's impossible to track down everyone who has the Hubert gene--it's a good third of the population, by what we can tell--so we can't really control the rate at which the particles go back in time. We're expecting the Hubert particles to stabilize soon. But!" Professor Hale pointed to a large metal cylinder with several ominously groaning pipes leading out from it. "In the meantime! We're getting the most fascinating data about high-energy particles; we actually think we've figured out how materializer-type superhumans work. At these energies, we can actually observe higher-dimensional motion--"
Archcommander Varney held up a hand to cut him off. "I read as much in your report. You don't need to butter me up, Hale. Your department's grant has already been approved."
Professor Hale wilted slightly. "I--well, I wasn't after more money, Archcommander. It's simply fascinating how--"
"Professor! Professor!" A flushed, out-of-breath assistant ran up to the two of them. Archcommander Varney gave him a disapproving look, which he ignored. "The Hubert particles--they're--the cosmological dating results came back. We've figured out what time period they're from."
"Oh?" Professor Ha
... keep reading on reddit β‘I'm wondering if you guys learned generalized linear model and exponential family?
I'm learning the Stanford CS229 machine learning course (by Andrew Ng) on YouTube. I heard that it's quite different from the course on Coursera (also by Andrew Ng) in that it goes into a lot of mathematical detail. I thought that was OK, and I actually had no problem understanding the mathematics of linear regression and logistic regression. But I had a very hard time understanding generalized linear models and the exponential family and spent a lot of time on them. I've finally got the hang of it, but I've also come to wonder whether it was worth the time.
I graduated from college a long time ago. I like mathematics, but my current skills are just enough to follow the details. I mean, I believe following the discussion is the most I can do; I'll never be good enough to do something with the mathematics on my own. Given that, I wonder if it's still useful to pursue the mathematical details, or should I just take the Coursera course instead?
I wonder how you guys do it?
BTW, I googled a lot to understand generalized linear models and the exponential family and didn't find much discussion of them on sites like Reddit, Quora, Medium, etc. That gave me the impression that they're not required knowledge for machine learning engineers, hence the question.
Thanks for any suggestions.
I would like to understand the nature of a measure, using the exponential family of probability distributions as context (because I understand the latter well). I understand that we want equation (8.1) to integrate to 1 (this is the definition of a probability distribution). Therefore we set that integral to one, take the $e^{-A(\eta)}$ term out of the integral (since it is not a function of x), and rearrange to get equation (8.2).
My point is, I understand what's going on algebraically, but I have no idea what role the measure is playing conceptually. I have tried to learn measure theory many times (on my own), but I just don't seem to get what it is or its motivation. When the word measure is used, I have no idea what it refers to. The Lebesgue measure is supposed to be a generalization of length to sets that are more complicated than intervals. I understand it somehow relates to probability. But what is a measure? In the particular context below, for example, what does it contribute to the description of the exponential family?
Source: https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter8.pdf
[attached screenshot: equations (8.1)-(8.2) from the linked chapter]
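For anyone who can't load the screenshot, the equations under discussion, as I recall them from the linked chapter (paraphrased; notation may differ slightly), are

$$p(x\mid\eta)=h(x)\exp\{\eta^{\top}T(x)-A(\eta)\}\qquad(8.1)$$

$$A(\eta)=\log\int h(x)\exp\{\eta^{\top}T(x)\}\,\nu(dx)\qquad(8.2)$$

As for the conceptual role: the measure $\nu$ is what tells the integral sign how to integrate. With $\nu$ the Lebesgue measure, (8.2) is an ordinary integral and the family is continuous (Gaussian, exponential); with $\nu$ the counting measure, the integral becomes a sum over the support, and the same formula covers discrete families (Poisson, Bernoulli). So in this context the measure is mostly bookkeeping that lets one expression handle densities and probability mass functions uniformly.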
I am having trouble grasping this concept. How does the exponential family help me find sufficient statistics? Thanks!
By family business, I mean my father's company.
We have 5 employees and I've been managing it for a year with my father.
The company has been growing at an exponential rate since I started there a year ago, and it's becoming too much information to handle. I'm in college too, so my mind isn't in the healthiest state right now; I've been overthinking everything.
We do need to hire more people, but I'm afraid that in the future the workload will go down and I won't be able to pay everyone, or that people will just sit around doing nothing during this "standby period."
Any advice?
I was looking for a python implementation of general exponential family PCA (binomial in particular) but I couldn't find anything. Maybe I was using the wrong search terms. Sklearn has a bunch of PCA variants but none seem to be what I'm looking for.
Does anyone know of an implementation?
Thanks.
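I don't know of a packaged implementation either, but the Collins, Dasgupta & Schapire (2001) formulation is short enough to roll by hand: put a low-rank factorization on the matrix of natural parameters and do gradient descent on the exponential-family negative log-likelihood. A minimal, untuned NumPy sketch for the binomial case (all names and hyperparameters are mine):

```python
import numpy as np

def binomial_epca(X, n_trials, k, lr=0.5, iters=2000, seed=0):
    """Exponential-family PCA for binomial counts: fit a rank-k matrix of
    natural parameters (logits) Theta = U @ V.T by gradient descent on the
    binomial negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((d, k))
    for _ in range(iters):
        Theta = U @ V.T                         # natural parameters (logits)
        mu = n_trials / (1.0 + np.exp(-Theta))  # E[X] under the current fit
        G = (mu - X) / X.size                   # gradient of mean NLL w.r.t. Theta
        U, V = U - lr * (G @ V), V - lr * (G.T @ U)
    return U, V                                 # row scores and column loadings
```

The lr/iters are untuned (plain gradient descent on a bilinear model leaves the origin slowly), but the gradient is the standard exponential-family one, mean minus observation, since d/dθ of the binomial NLL with logit θ is n·σ(θ) − x.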
From an elementary probability course, distributions such as the Gaussian, Poisson, or exponential all have good motivation. After staring at the formula for the exponential family for a long time, I still don't get any intuition.
Can anyone help me understand why we need it in the first place? What are some advantages of modeling a response variable as an exponential family vs. normal?
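One way to build intuition is to take a distribution you already like and watch it fall into the general form. Standard worked example (Poisson):

$$p(x\mid\lambda)=\frac{\lambda^{x}e^{-\lambda}}{x!}=\frac{1}{x!}\exp\big(x\log\lambda-\lambda\big),$$

so $h(x)=1/x!$, $T(x)=x$, $\eta=\log\lambda$, and $A(\eta)=e^{\eta}=\lambda$. The payoff is shared machinery: for any member of the family $\mathbb{E}[T(X)]=A'(\eta)$ (here $e^{\eta}=\lambda$, as expected), the log-likelihood is concave in $\eta$, and $T$ is sufficient. Versus assuming normality, modeling the response as a general exponential-family member (as GLMs do) lets the variance depend on the mean and keeps counts, proportions, and skewed positive data on their natural scales.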
The millionaires have trust funds but why not the middle class? Compound interest has an exponential effect, so couldn't a lifestyle of moderate live-below-your-means habits result in your children never having to work? I suppose very few people do this because all or most of their saved money goes directly to their own retirement, so there's little left over for a separate fund.
Are there glaring financial reasons this wouldn't work? Could I start a fund now, contribute modestly but regularly, not spend it in retirement, and (after 60 years) make my family wealthy?
If yes, why don't people do this more often?
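The arithmetic does work out, at least nominally; here is a quick Python sketch where every number is a hypothetical input, not advice:

```python
# Future value of a stream of monthly contributions:
# FV = m * ((1 + r)^n - 1) / r, with monthly rate r and n months.
monthly, annual_rate, years = 300.0, 0.07, 60   # all assumed figures
r, n = annual_rate / 12, years * 12
fv = monthly * ((1 + r) ** n - 1) / r
print(f"${fv:,.0f}")  # ~ $3.3 million nominal
```

The catch is the word "nominal": at 2-3% inflation, 60 years divides real purchasing power by roughly a factor of three to six, and the plan requires the money to survive your own retirement, emergencies, and your heirs' restraint, which is where it usually fails in practice.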
A natural-parameter exponential family is defined as
$p(x\mid\eta)=h(x)\exp\big(\eta^{\top}T(x)-A(\eta)\big)$, where $T$ is the sufficient statistic and $A$ is the log-partition function.
We want to prove that the natural parameter space $N$, given by
$N=\{\eta:\int h(x)\exp(\eta^{\top}T(x))\,dx<\infty\}$ (equivalently, $\exp(A(\eta))<\infty$), is convex.
The proof rests on Hölder's inequality and is given here (I am attaching a picture for quick reference). I have looked at the definition of Hölder's inequality, but I am not really sure how the $1/\lambda$ and $1/(1-\lambda)$ end up in the denominators in eq. 8.35 when Hölder's inequality is applied in the proof.
Also, in eq. 8.36, how is the $e^{\lambda\eta^{\top}T(x)}$ term discarded from the integral?
Can someone please help explain these two points?
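Here is the Hölder step as I read it (my own sketch; the equation numbering may not line up exactly with the chapter). Take $\lambda\in(0,1)$ and $\eta_1,\eta_2\in N$, and apply Hölder's inequality with the conjugate exponents $p=1/\lambda$ and $q=1/(1-\lambda)$, which is where those denominators come from (note $1/p+1/q=\lambda+(1-\lambda)=1$):

$$\int h(x)\,e^{(\lambda\eta_1+(1-\lambda)\eta_2)^{\top}T(x)}\,dx
=\int\Big(h(x)e^{\eta_1^{\top}T(x)}\Big)^{\lambda}\Big(h(x)e^{\eta_2^{\top}T(x)}\Big)^{1-\lambda}dx
\le\Big(\int h(x)e^{\eta_1^{\top}T(x)}dx\Big)^{\lambda}\Big(\int h(x)e^{\eta_2^{\top}T(x)}dx\Big)^{1-\lambda}<\infty.$$

Nothing is really discarded in the second step: raising the first factor to $p=1/\lambda$ turns $\big(h\,e^{\eta_1^{\top}T}\big)^{\lambda}$ back into $h\,e^{\eta_1^{\top}T}$, so the $\lambda$ vanishes from the exponent inside the right-hand integrals. Both integrals are finite because $\eta_1,\eta_2\in N$, hence $\lambda\eta_1+(1-\lambda)\eta_2\in N$ and $N$ is convex.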
I'm pretty blown away by how re-parameterizing so many different distributions to an exponential family form gives you so many nice properties, and am curious about the historical development of this concept. I definitely don't think I would have come up with this, and I'm curious what the path towards its discovery looked like.
Also, are there "shitty" exponential families? I.e., distribution families defined prior to the discovery of the exponential family that have some of the nice properties, but not all?
Thanks.
So exponential families can be parameterized as $f(x\mid\theta)=h(x)\exp\big(T(x)\,\eta(\theta)-A(\eta)\big)$.
I'm trying to understand what the conditions on $h(x)$, $T(x)$, and $\eta(\theta)$ are for $f(x\mid\theta)$ to be a valid pdf. (I'm ignoring $A(\eta)$ for now, since it's just a normalizing factor.)
I know we need $\int_{\mathbb{R}} h(x)\exp\big(T(x)\,\eta(\theta)-A(\eta)\big)\,dx = 1$ overall, along with non-negativity, but are there any requirements on the three component functions individually?
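For what it's worth, my understanding is that the individual pieces are nearly unconstrained and everything reduces to one integrability condition. Given measurable $h(x)\ge 0$, any $T$, and any $\eta(\theta)$, define

$$Z(\theta)=\int h(x)\,e^{T(x)\,\eta(\theta)}\,dx,\qquad A(\eta)=\log Z(\theta).$$

Whenever $0<Z(\theta)<\infty$, the density $f(x\mid\theta)=h(x)\,e^{T(x)\eta(\theta)}/Z(\theta)$ is non-negative and integrates to 1 by construction, since subtracting $A$ in the exponent is exactly division by the normalizer. So the only real joint requirement on $h$, $T$, and $\eta$ is that this integral be finite (and positive) for the values of $\theta$ you care about.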
I hit on this lecture while looking for the Fisher information matrix, which is closely related to MLE (maximum likelihood estimation). The narrator (yes, it's a narrator) of this video is an esteemed professor at the University of Calcutta. I managed to watch it for 5 minutes.
This is the quality of education our universities are capable of imparting to our children. WTF is happening? Honestly, if I were the decision maker, I would NEVER have put this lecture online for the entire world to see as a mockery of our education system.
Hello all, I was wondering if someone could help me understand a concept that I just came across when reading a graphical model text by Wainwright and Jordan: http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf
What I'm confused about is how the mean parameters parameterize a distribution. Do they just replace the canonical parameters in the regular exponential family density function? This concept is introduced in section 3.4 of the text.
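My reading, from the standard exponential-family facts that section builds on: the mean parameters are the expectations of the sufficient statistics $\phi$, linked to the canonical parameters through the gradient of the log-partition function,

$$\mu=\mathbb{E}_{p_{\theta}}[\phi(X)]=\nabla A(\theta).$$

So you don't literally substitute $\mu$ for $\theta$ in the density. Rather, for a minimal family the map $\theta\mapsto\nabla A(\theta)$ is one-to-one, so each realizable $\mu$ picks out exactly one distribution in the family, and in that sense the mean parameters parameterize it; writing the density in terms of $\mu$ means inverting the map to recover $\theta(\mu)$ first.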
I'm experimenting with Restricted Boltzmann Machines to do some unsupervised learning so I can have an actual generative model of some data I'm working with. Also useful for pre-training of deep belief nets or things like that.
Anyway, a lot of the data I'm working with is real-valued data. Sometimes it's integers between, say, 1 and 20, and sometimes it's real values between 0 and 1,000,000 that are approximately exponentially distributed. I say "real values," but really it's rounded to the nearest 100th. It's distributed according to a real distribution, for all intents and purposes, but it doesn't require a great deal of precision in terms of decimal points.
So I was toying with the idea of setting up a so-called "Exponential-Family Harmonium," but I would need to use some kind of weird energy formulation with mixed data types, since some of my input-vector coordinates are binary, some are binomially distributed, and some are maybe exponential or Gaussian. But it occurred to me that I could convert all of my real-valued entries into binary digits and use a standard RBM.
My question: would that actually work? That is, would converting all non-binary numbers into binary (and then just using a normal Restricted Boltzmann Machine trained with Contrastive Divergence) work? What kind of internal representation of the data would my algorithm come up with? Would it actually make sense when I run the program backwards to generate new examples of data?
Has anyone tried this? If so, how did it work out?
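Not an answer from experience, but the encoding itself is cheap to prototype, so you could test it directly. A minimal sketch (function names mine) that quantizes each real value to fixed point and unpacks the code into binary units for a standard RBM:

```python
import numpy as np

def to_bits(x, lo, hi, n_bits=16):
    """Quantize reals in [lo, hi] to n_bits fixed-point levels, then unpack
    each code into an n_bits-long {0,1} vector (least significant bit first)."""
    levels = (1 << n_bits) - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels).astype(np.uint64)
    return ((q[..., None] >> np.arange(n_bits, dtype=np.uint64)) & 1).astype(np.float32)

def from_bits(bits, lo, hi):
    """Invert to_bits: reassemble the integer codes and rescale to [lo, hi]."""
    n_bits = bits.shape[-1]
    weights = 1 << np.arange(n_bits, dtype=np.uint64)
    q = (bits.round().astype(np.uint64) * weights).sum(axis=-1)
    return lo + q / float((1 << n_bits) - 1) * (hi - lo)

# Round trip on exponential-ish data rounded to the nearest 0.01; 27 bits
# gives ~0.0075 resolution over [0, 1e6], enough for two decimal places.
x = np.round(np.random.default_rng(0).exponential(1e5, size=4), 2)
print(x, from_bits(to_bits(x, 0.0, 1e6, n_bits=27), 0.0, 1e6))
```

One known wrinkle with plain positional codes: a high-order bit flip changes the value enormously while a low-order flip barely matters, so nearby values can have very different bit patterns and the RBM has to learn wildly unequal bit importances. Gray codes or thermometer (unary) encodings are the usual workarounds when people binarize continuous inputs like this, and may give you more sensible generative samples when you run the model backwards.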