Backpropagation is the workhorse of deep learning, but it only works for continuous functions amenable to the chain rule of differentiation. Discrete algorithms are piecewise constant, so their derivatives are zero almost everywhere and undefined at the jumps, and deep networks that contain them cannot be trained effectively with backpropagation. This paper presents a method for incorporating a large class of algorithms, formulated as discrete exponential-family distributions, into deep networks, and derives gradient estimates that plug directly into end-to-end backpropagation. This lets things like combinatorial optimizers natively be part of a network's forward pass. (A toy sketch of the perturb-and-MAP step appears after the links below.)
OUTLINE:
0:00 - Intro & Overview
4:25 - Sponsor: Weights & Biases
6:15 - Problem Setup & Contributions
8:50 - Recap: Straight-Through Estimator
13:25 - Encoding the discrete problem as an inner product
19:45 - From algorithm to distribution
23:15 - Substituting the gradient
26:50 - Defining a target distribution
38:30 - Approximating marginals via perturb-and-MAP
45:10 - Entire algorithm recap
56:45 - GitHub Page & Example
Paper: https://arxiv.org/abs/2106.01798
Code (TF): https://github.com/nec-research/tf-imle
Code (Torch): https://github.com/uclnlp/torch-imle
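To make the perturb-and-MAP step (38:30) concrete, here is a minimal NumPy sketch for the simplest possible case, a categorical distribution, where the MAP problem is just an argmax. The function name and defaults are mine; the paper generalizes this by swapping in an arbitrary combinatorial solver and structured noise (e.g., sum-of-gamma) in place of plain Gumbel noise:

```python
import numpy as np

def perturb_and_map_marginals(theta, num_samples=1000, seed=0):
    """Approximate the marginals of a categorical exponential-family
    distribution by averaging MAP solutions of noise-perturbed scores."""
    rng = np.random.default_rng(seed)
    counts = np.zeros_like(theta, dtype=float)
    for _ in range(num_samples):
        z = theta + rng.gumbel(size=theta.shape)  # perturb the parameters
        counts[np.argmax(z)] += 1                 # MAP here is just an argmax
    return counts / num_samples                   # -> softmax(theta) as samples grow

print(perturb_and_map_marginals(np.array([2.0, 1.0, 0.0])))
```

In this toy case the estimate converges to softmax(theta) exactly (the Gumbel-max trick); the point of the paper is that the same perturb-then-solve recipe still yields usable gradient estimates when the argmax is replaced by something like a shortest-path or top-k solver.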
(Arc ?, Interlude ?: Archcommander Varney)
(Note: Bargain Bin Superheroes is episodic; each part is self-contained. This story can be enjoyed without reading the previous sections.)
The National High Energy and Temperature Lab was abuzz. Professor Hale bustled into the main containment center, where the primordial plasma they'd been studying for the past ten years was evolving. He gave the Archcommander by his side a friendly nod as he passed.
"It's the most incredible thing," Professor Hale said. "The mass-energy equivalent just keeps going up exponentially! We're lucky the late--or should I say early--Alexandre Hubert wasn't a particularly heavy man; it's all we can do to contain the Hubert particles, given how much energy they're emitting right now."
Archcommander Varney grunted. "Hubert particles, eh? Is that what you eggheads are calling them?"
Professor Hale nodded ruefully. "We scientists, er... we're not great at names. They're often descriptors more than anything."
Archcommander Varney eyed the HEaT Lab name tag on Professor Hale's lapel. "Well, I appreciate your honesty. You said they're emitting energy--could we use them as power sources?"
Professor Hale hesitated. "Not... not yet. We... could try, but there are these discontinuous... jumps. It's impossible to track down everyone who has the Hubert gene--it's a good third of the population, by what we can tell--so we can't really control the rate at which the particles go back in time. We're expecting the Hubert particles to stabilize soon. But!" Professor Hale pointed to a large metal cylinder with several ominously groaning pipes leading out from it. "In the meantime! We're getting the most fascinating data about high-energy particles; we actually think we've figured out how materializer-type superhumans work. At these energies, we can actually observe higher-dimensional motion--"
Archcommander Varney held up a hand to cut him off. "I read as much in your report. You don't need to butter me up, Hale. Your department's grant has already been approved."
Professor Hale wilted slightly. "I--well, I wasn't after more money, Archcommander. It's simply fascinating how--"
"Professor! Professor!" A flushed, out-of-breath assistant ran up to the two of them. Archcommander Varney gave him a disapproving look, which he ignored. "The Hubert particles--they're--the cosmological dating results came back. We've figured out what time period they're from."
"Oh?" Professor Ha
... keep reading on reddit β‘I'm wondering if you guys learned generalized linear model and exponential family?
I'm learning the Stanford CS229 machine learning course (by Andrew Ng) on YouTube. I heard that it's quite different from the course on Coursera (also by Andrew Ng) in that it goes into a lot of mathematical detail. I thought that was OK, and I actually had no problem understanding the mathematics of linear regression and logistic regression. But I had a very hard time understanding generalized linear models and the exponential family and spent a lot of time on them. I've finally got the hang of it, but I've also come to wonder whether it was worth the time.
I graduated from college a long time ago. I like mathematics, but my current skills are just enough to follow the details. I mean, I believe following the discussion is the most I can do; I'll never be good enough to do something with the mathematics on my own. Given that, I wonder if it's still useful to pursue the mathematical details, or should I just take the Coursera course instead?
I wonder how you guys do it?
BTW, I googled a lot to understand generalized linear models and the exponential family and didn't find much discussion of them on sites like Reddit, Quora, Medium, etc. That gave me the impression that they're not required knowledge for machine learning engineers, hence the question.
Thanks for any suggestions.
I would like to understand the nature of a measure, using the exponential family of probability distributions as context (because I understand the latter well). I understand that we want equation (8.1) to integrate to 1 (this is the definition of a probability distribution). Therefore we set that integral to one, take the $e^{-A(\eta)}$ term out of the integral (since it is not a function of x), and rearrange to get equation (8.2).
My point is, I understand what's going on algebraically, but I have no idea what role the measure is playing conceptually. I have tried to learn measure theory many times (on my own), but I just don't seem to get what it is or its motivation. When the word measure is used, I have no idea what it refers to. The Lebesgue measure is supposed to be a generalization of length to sets that are more complicated than intervals. I understand it somehow relates to probability. But what is a measure? In the particular context below, for example, what does it contribute to the description of the exponential family?
Source: https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter8.pdf
[attached screenshot: equations (8.1)-(8.2) from the linked chapter]
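For anyone who can't load the screenshot, the equations under discussion, as I recall them from the linked chapter (paraphrased; notation may differ slightly), are

$$p(x\mid\eta)=h(x)\exp\{\eta^{\top}T(x)-A(\eta)\}\qquad(8.1)$$

$$A(\eta)=\log\int h(x)\exp\{\eta^{\top}T(x)\}\,\nu(dx)\qquad(8.2)$$

As for the conceptual role: the measure $\nu$ is what tells the integral sign how to integrate. With $\nu$ the Lebesgue measure, (8.2) is an ordinary integral and the family is continuous (Gaussian, exponential); with $\nu$ the counting measure, the integral becomes a sum over the support, and the same formula covers discrete families (Poisson, Bernoulli). So in this context the measure is mostly bookkeeping that lets one expression handle densities and probability mass functions uniformly.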
I am having trouble grasping this concept. How does the exponential family help me find sufficient statistics? Thanks!
By family business, I mean my father's company.
We have 5 employees and I've been managing it for a year with my father.
The company has been growing at an exponential rate since I started there a year ago, and it's becoming too much information to handle. I'm in college too, so my mind isn't in the healthiest state right now; I've been overthinking everything.
We do need to hire more people, but I'm afraid that in the future the workload will go down and I won't be able to pay everyone, or that people will just sit around doing nothing during this "standby period."
Any advice?
I was looking for a python implementation of general exponential family PCA (binomial in particular) but I couldn't find anything. Maybe I was using the wrong search terms. Sklearn has a bunch of PCA variants but none seem to be what I'm looking for.
Does anyone know of an implementation?
Thanks.
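I don't know of a packaged implementation either, but the Collins, Dasgupta & Schapire (2001) formulation is short enough to roll by hand: put a low-rank factorization on the matrix of natural parameters and do gradient descent on the exponential-family negative log-likelihood. A minimal, untuned NumPy sketch for the binomial case (all names and hyperparameters are mine):

```python
import numpy as np

def binomial_epca(X, n_trials, k, lr=0.5, iters=2000, seed=0):
    """Exponential-family PCA for binomial counts: fit a rank-k matrix of
    natural parameters (logits) Theta = U @ V.T by gradient descent on the
    binomial negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((d, k))
    for _ in range(iters):
        Theta = U @ V.T                         # natural parameters (logits)
        mu = n_trials / (1.0 + np.exp(-Theta))  # E[X] under the current fit
        G = (mu - X) / X.size                   # gradient of mean NLL w.r.t. Theta
        U, V = U - lr * (G @ V), V - lr * (G.T @ U)
    return U, V                                 # row scores and column loadings
```

The lr/iters are untuned (plain gradient descent on a bilinear model leaves the origin slowly), but the gradient is the standard exponential-family one, mean minus observation, since d/dθ of the binomial NLL with logit θ is n·σ(θ) − x.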
From an elementary probability course, distributions such as the Gaussian, Poisson, or exponential all have good motivation. After staring at the formula for the exponential family for a long time, I still don't get any intuition.
Can anyone help me understand why we need it in the first place? What are some advantages of modeling a response variable as an exponential family vs. normal?
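One way to build intuition is to take a distribution you already like and watch it fall into the general form. Standard worked example (Poisson):

$$p(x\mid\lambda)=\frac{\lambda^{x}e^{-\lambda}}{x!}=\frac{1}{x!}\exp\big(x\log\lambda-\lambda\big),$$

so $h(x)=1/x!$, $T(x)=x$, $\eta=\log\lambda$, and $A(\eta)=e^{\eta}=\lambda$. The payoff is shared machinery: for any member of the family $\mathbb{E}[T(X)]=A'(\eta)$ (here $e^{\eta}=\lambda$, as expected), the log-likelihood is concave in $\eta$, and $T$ is sufficient. Versus assuming normality, modeling the response as a general exponential-family member (as GLMs do) lets the variance depend on the mean and keeps counts, proportions, and skewed positive data on their natural scales.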
The millionaires have trust funds but why not the middle class? Compound interest has an exponential effect, so couldn't a lifestyle of moderate live-below-your-means habits result in your children never having to work? I suppose very few people do this because all or most of their saved money goes directly to their own retirement, so there's little left over for a separate fund.
Are there glaring financial reasons this wouldn't work? Could I start a fund now, contribute modestly but regularly, not spend it in retirement, and (after 60 years) make my family wealthy?
If yes, why don't people do this more often?
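The arithmetic does work out, at least nominally; here is a quick Python sketch where every number is a hypothetical input, not advice:

```python
# Future value of a stream of monthly contributions:
# FV = m * ((1 + r)^n - 1) / r, with monthly rate r and n months.
monthly, annual_rate, years = 300.0, 0.07, 60   # all assumed figures
r, n = annual_rate / 12, years * 12
fv = monthly * ((1 + r) ** n - 1) / r
print(f"${fv:,.0f}")  # ~ $3.3 million nominal
```

The catch is the word "nominal": at 2-3% inflation, 60 years divides real purchasing power by roughly a factor of three to six, and the plan requires the money to survive your own retirement, emergencies, and your heirs' restraint, which is where it usually fails in practice.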
A natural-parameter exponential family is defined as
$p(x\mid\eta)=h(x)\exp\big(\eta^{\top}T(x)-A(\eta)\big)$, where $T$ is the sufficient statistic and $A$ is the log-partition function.
We want to prove that the natural parameter space $N$, given by
$N=\{\eta:\int h(x)\exp(\eta^{\top}T(x))\,dx<\infty\}$ (equivalently, $\exp(A(\eta))<\infty$), is convex.
The proof rests on Hölder's inequality and is given here (I am attaching a picture for quick reference). I have looked at the definition of Hölder's inequality, but I am not really sure how the $1/\lambda$ and $1/(1-\lambda)$ end up in the denominators in eq. 8.35 when Hölder's inequality is applied in the proof.
Also, in eq. 8.36, how is the $e^{\lambda\eta^{\top}T(x)}$ term discarded from the integral?
Can someone please help explain these two points?
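Here is the Hölder step as I read it (my own sketch; the equation numbering may not line up exactly with the chapter). Take $\lambda\in(0,1)$ and $\eta_1,\eta_2\in N$, and apply Hölder's inequality with the conjugate exponents $p=1/\lambda$ and $q=1/(1-\lambda)$, which is where those denominators come from (note $1/p+1/q=\lambda+(1-\lambda)=1$):

$$\int h(x)\,e^{(\lambda\eta_1+(1-\lambda)\eta_2)^{\top}T(x)}\,dx
=\int\Big(h(x)e^{\eta_1^{\top}T(x)}\Big)^{\lambda}\Big(h(x)e^{\eta_2^{\top}T(x)}\Big)^{1-\lambda}dx
\le\Big(\int h(x)e^{\eta_1^{\top}T(x)}dx\Big)^{\lambda}\Big(\int h(x)e^{\eta_2^{\top}T(x)}dx\Big)^{1-\lambda}<\infty.$$

Nothing is really discarded in the second step: raising the first factor to $p=1/\lambda$ turns $\big(h\,e^{\eta_1^{\top}T}\big)^{\lambda}$ back into $h\,e^{\eta_1^{\top}T}$, so the $\lambda$ vanishes from the exponent inside the right-hand integrals. Both integrals are finite because $\eta_1,\eta_2\in N$, hence $\lambda\eta_1+(1-\lambda)\eta_2\in N$ and $N$ is convex.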
I'm pretty blown away by how re-parameterizing so many different distributions to an exponential family form gives you so many nice properties, and am curious about the historical development of this concept. I definitely don't think I would have come up with this, and I'm curious what the path towards its discovery looked like.
Also, are there "shitty" exponential families? I.e., distribution families defined prior to the discovery of the exponential family that have some of the nice properties, but not all?
Thanks.
So exponential families can be parameterized as $f(x\mid\theta)=h(x)\exp\big(T(x)\,\eta(\theta)-A(\eta)\big)$.
I'm trying to understand what the conditions on $h(x)$, $T(x)$, and $\eta(\theta)$ are for $f(x\mid\theta)$ to be a valid pdf. (I'm ignoring $A(\eta)$ for now, since it's just a normalizing factor.)
I know we need $\int_{\mathbb{R}} h(x)\exp\big(T(x)\,\eta(\theta)-A(\eta)\big)\,dx = 1$ overall, along with non-negativity, but are there any requirements on the three component functions individually?
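For what it's worth, my understanding is that the individual pieces are nearly unconstrained and everything reduces to one integrability condition. Given measurable $h(x)\ge 0$, any $T$, and any $\eta(\theta)$, define

$$Z(\theta)=\int h(x)\,e^{T(x)\,\eta(\theta)}\,dx,\qquad A(\eta)=\log Z(\theta).$$

Whenever $0<Z(\theta)<\infty$, the density $f(x\mid\theta)=h(x)\,e^{T(x)\eta(\theta)}/Z(\theta)$ is non-negative and integrates to 1 by construction, since subtracting $A$ in the exponent is exactly division by the normalizer. So the only real joint requirement on $h$, $T$, and $\eta$ is that this integral be finite (and positive) for the values of $\theta$ you care about.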
I hit on this lecture while looking for the Fisher information matrix, which is closely related to MLE (maximum likelihood estimation). The narrator (yes, it's a narrator) of this video is an esteemed professor at the University of Calcutta. I managed to watch it for 5 minutes.
This is the quality of education our universities are capable of imparting to our children. WTF is happening? Honestly, if I were the decision maker, I would NEVER have put this lecture online for the entire world to see as a mockery of our education system.
Hello all, I was wondering if someone could help me understand a concept that I just came across when reading a graphical model text by Wainwright and Jordan: http://www.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf
What I'm confused about is how the mean parameters parameterize a distribution. Do they just replace the canonical parameters in the regular exponential family density function? This concept is introduced in section 3.4 of the text.
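My reading, from the standard exponential-family facts that section builds on: the mean parameters are the expectations of the sufficient statistics $\phi$, linked to the canonical parameters through the gradient of the log-partition function,

$$\mu=\mathbb{E}_{p_{\theta}}[\phi(X)]=\nabla A(\theta).$$

So you don't literally substitute $\mu$ for $\theta$ in the density. Rather, for a minimal family the map $\theta\mapsto\nabla A(\theta)$ is one-to-one, so each realizable $\mu$ picks out exactly one distribution in the family, and in that sense the mean parameters parameterize it; writing the density in terms of $\mu$ means inverting the map to recover $\theta(\mu)$ first.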
I'm experimenting with Restricted Boltzmann Machines to do some unsupervised learning so I can have an actual generative model of some data I'm working with. Also useful for pre-training of deep belief nets or things like that.
Anyway, a lot of the data I'm working with is real-valued data. Sometimes it's integers between, say, 1 and 20, and sometimes it's real values between 0 and 1,000,000 that are approximately exponentially distributed. I say "real values," but really it's rounded to the nearest 100th. It's distributed according to a real distribution, for all intents and purposes, but it doesn't require a great deal of precision in terms of decimal points.
So I was toying with the idea of setting up a so-called "Exponential-Family Harmonium," but I would need to use some kind of weird energy formulation with mixed data types, since some of my input-vector coordinates are binary, some are binomially distributed, and some are maybe exponential or Gaussian. But it occurred to me that I could convert all of my real-valued entries into binary digits and use a standard RBM.
My question: would that actually work? That is, would converting all non-binary numbers into binary (and then just using a normal Restricted Boltzmann Machine trained with Contrastive Divergence) work? What kind of internal representation of the data would my algorithm come up with? Would it actually make sense when I run the program backwards to generate new examples of data?
Has anyone tried this? If so, how did it work out?
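Not an answer from experience, but the encoding itself is cheap to prototype, so you could test it directly. A minimal sketch (function names mine) that quantizes each real value to fixed point and unpacks the code into binary units for a standard RBM:

```python
import numpy as np

def to_bits(x, lo, hi, n_bits=16):
    """Quantize reals in [lo, hi] to n_bits fixed-point levels, then unpack
    each code into an n_bits-long {0,1} vector (least significant bit first)."""
    levels = (1 << n_bits) - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels).astype(np.uint64)
    return ((q[..., None] >> np.arange(n_bits, dtype=np.uint64)) & 1).astype(np.float32)

def from_bits(bits, lo, hi):
    """Invert to_bits: reassemble the integer codes and rescale to [lo, hi]."""
    n_bits = bits.shape[-1]
    weights = 1 << np.arange(n_bits, dtype=np.uint64)
    q = (bits.round().astype(np.uint64) * weights).sum(axis=-1)
    return lo + q / float((1 << n_bits) - 1) * (hi - lo)

# Round trip on exponential-ish data rounded to the nearest 0.01; 27 bits
# gives ~0.0075 resolution over [0, 1e6], enough for two decimal places.
x = np.round(np.random.default_rng(0).exponential(1e5, size=4), 2)
print(x, from_bits(to_bits(x, 0.0, 1e6, n_bits=27), 0.0, 1e6))
```

One known wrinkle with plain positional codes: a high-order bit flip changes the value enormously while a low-order flip barely matters, so nearby values can have very different bit patterns and the RBM has to learn wildly unequal bit importances. Gray codes or thermometer (unary) encodings are the usual workarounds when people binarize continuous inputs like this, and may give you more sensible generative samples when you run the model backwards.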