[P] Bayesian optimization book

I am in the process of finalizing a monograph on Bayesian optimization to be published next year by Cambridge University Press. The target audience is graduate students in machine learning, statistics, and related fields, but I hope practitioners will find it useful as well.

A major goal of the book is to build up modern Bayesian optimization algorithms β€œfrom scratch,” revealing unifying themes in their design.

I am making a draft available for initial commentary and erratum squashing:

https://bayesoptbook.com/

Once published, the book will remain freely available on the companion webpage.

I welcome feedback via issues created on the associated GitHub repository:

https://github.com/bayesoptbook/bayesoptbook.github.io

I hope the community will find this resource useful!

-Roman Garnett

πŸ‘︎ 340
πŸ’¬︎
πŸ‘€︎ u/romangarnett
πŸ“…︎ Oct 09 2021
🚨︎ report
Genetic Algorithms | Bayesian Optimization | Reinforcement Learning

Hello all,

I'm interested in learning more about Genetic Algorithms and Bayesian Optimization in the context of hyperparameter tuning in Machine Learning and Operations Research. I'm not interested in Medium articles; I want to dive in and understand the math. I am also interested in a good introduction to Reinforcement Learning.

Could you suggest good books/pedagogical articles about these three subjects?

πŸ‘︎ 26
πŸ’¬︎
πŸ‘€︎ u/quilograma
πŸ“…︎ Nov 21 2021
🚨︎ report
Neural Network Hyperparameter Tuning using Bayesian Optimization analyticsindiamag.com/neu…
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/analyticsindiam
πŸ“…︎ Nov 30 2021
🚨︎ report
"Bayesian Optimization Book" draft, Garnett 2021 bayesoptbook.com/
πŸ‘︎ 21
πŸ’¬︎
πŸ‘€︎ u/gwern
πŸ“…︎ Nov 15 2021
🚨︎ report
"Bayesian Optimization Book" draft, Garnett 2021 bayesoptbook.com/
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/gwern
πŸ“…︎ Nov 15 2021
🚨︎ report
[D] Which approaches for hyperparameter optimization with bayesian optimization can handle continuous and discrete variables?

So far it is clear that Gaussian Processes with one-hot encoding, as well as surrogate models such as Tree-structured Parzen Estimators and Random Forests, can naturally handle categorical as well as real-valued variables when used for hyperparameter optimization.

I want to optimize a search space of mixed variables. Which other approaches are there?

For Gaussian Processes I found the following helpful reference:

Dealing with categorical and integer-valued variables in Bayesian Optimization with Gaussian Processes

But how about approaches such as Neural Networks?
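
For concreteness, here is a minimal sketch of one such approach, assuming Optuna is available: its TPE sampler accepts mixed continuous, integer, and categorical spaces natively, with no one-hot encoding required. The objective below is a hypothetical stand-in for a real training run.

    # Sketch: a mixed search space handled natively by Optuna's TPE sampler.
    # `objective` is a hypothetical stand-in for a real training-and-score run.
    import optuna

    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)      # continuous
        layers = trial.suggest_int("layers", 1, 5)                # integer
        act = trial.suggest_categorical("act", ["relu", "tanh"])  # categorical
        # ... train a model with these settings, return a validation loss ...
        return (lr - 1e-3) ** 2 + 0.1 * layers + (0.0 if act == "relu" else 0.5)

    study = optuna.create_study(direction="minimize",
                                sampler=optuna.samplers.TPESampler(seed=0))
    study.optimize(objective, n_trials=50)
    print(study.best_params)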

πŸ‘︎ 17
πŸ’¬︎
πŸ‘€︎ u/MrAchillesTurtle
πŸ“…︎ Sep 26 2021
🚨︎ report
[P] Bayesian optimization book (/r/MachineLearning) reddit.com/r/MachineLearn…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/ContentForager
πŸ“…︎ Oct 14 2021
🚨︎ report
[Research] A Zero Maths Understanding of Bayesian Optimization

https://towardsdatascience.com/a-zero-maths-understanding-of-bayesian-optimization-e064a957a124

I was trying to convince my team to use Bayesian Optimization for hyperparameter optimization for a 4-level ensemble model. Before I could make a decision, I had to explain to one of the clients why it is necessary. I was thinking about the easiest way to communicate what we are trying to accomplish here.

The same discussion is in the form of a write-up in this post.

Please let me know your comments and critique the POV on Bayesian Optimization.

It's always interesting to see the different perspectives when it comes to conventional stats vs Bayesian analysis.

πŸ‘︎ 22
πŸ’¬︎
πŸ‘€︎ u/prashantmdgl9
πŸ“…︎ Sep 01 2021
🚨︎ report
[Discussion] Coffee and Bayesian Optimization

I was trying to convince my team to use Bayesian Optimization for hyperparameter optimization for a 4-level ensemble model. Before I could make a decision, I had to explain to one of the clients why it is necessary. I was thinking about the easiest way to communicate what we are trying to accomplish here.

So, I came up with a story.

I gifted you a sophisticated and hypothetical coffee machine and asked you to brew the best coffee for yourself by modulating the thousands of dials on the machine. You are an intelligent fellow and quickly realise that it is an optimization problem. You have two options:

  1. Change the settings of the dials umpteen times, brew different types of coffee, taste all of them, find your best brew, and then die from caffeine overdose.
  2. Try to find a function brew-quality = f(brew-styles) by quantifying the various factors involved, then find the global maximum via gradient ascent, taking advantage of the function's derivatives.

There are two things to note here:

  1. We don't really know what the function is; it is a black box. We brew coffee by modulating dials and we get coffee as the output; what happens inside the machine, we don't really know. The only information we have is how the coffee tastes at a particular setting of the machine.
  2. Even if we knew what the brew-quality function was, it would be an expensive one to evaluate, for we can't have thousands of cups of coffee, for obvious reasons of stomach capacity and health hazards.

So, what shall we do?

There is a framework that can solve this problem and that is Bayesian optimization.

Let’s assume the black-box function for the brew-quality is:

[figure by author: the true black-box brew-quality function]

The function is a black box and we can only evaluate it for different inputs (brew styles).

Let's say we want to find the best brew after sampling only 15 cups of coffee. What we'll do is brew a few cups of coffee (fewer than 15; in this case, we brewed 6) and fit an estimated function, shown below in red.

[figure by author: the estimated function, in red, fit to the 6 sampled brews]

Now, we’ll use this estimated function to determine where to evaluate next.

>This estimation of the original function from the estimated

... keep reading on reddit ➑
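
For readers who want to see the loop the story describes in code, here is a minimal sketch assuming scikit-learn and SciPy; brew_quality is a made-up stand-in for the black box. It fits a GP surrogate to the cups tasted so far, then picks the next brew setting by maximizing expected improvement.

    # Sketch: fit a GP surrogate to the tasted brews, then choose the next
    # setting by maximizing expected improvement over a grid of dial values.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def brew_quality(x):                     # hypothetical black box we "taste"
        return -(x - 0.6) ** 2 + 0.1 * np.sin(20 * x)

    X = np.random.default_rng(0).uniform(0, 1, (6, 1))   # 6 cups brewed so far
    y = brew_quality(X).ravel()

    gp = GaussianProcessRegressor().fit(X, y)            # the estimated (red) curve

    grid = np.linspace(0, 1, 500).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    next_setting = grid[np.argmax(ei)]                    # brew this cup next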

πŸ‘︎ 16
πŸ’¬︎
πŸ‘€︎ u/prashantmdgl9
πŸ“…︎ Sep 01 2021
🚨︎ report
[D] Minimizing the Acquisition Function During Bayesian Optimization

Has anyone worked with "Bayesian Optimization" before?

I was reading over the details of Bayesian Optimization and it looks quite interesting. It seems to be an effective way to optimize expensive loss functions using the Bayesian framework and Gaussian Processes. Essentially, an "acquisition function" is developed that "guides" the optimization algorithm on where to "search" next, balancing concepts such as "exploitation" (consider regions that provided desirable outputs) and "exploration" (consider areas that have not been thoroughly searched). In the end, Bayesian Optimization (like all optimization algorithms) provides the analyst with a set of inputs that attempt to minimize or maximize the objective function (e.g. a loss function in a neural network).

I had some general questions about the math involved in this:

  1. Does anyone know if there are any mathematical results that demonstrate why using an acquisition function to decide which points to select next is effective? Are there any theoretical results that show why modelling the real objective function using a Gaussian Process and acquisition functions is a good idea? Why is this meaningful? Does this approach have any foundations that suggest it might actually obtain convergence and find a minimum point? I.e., this is a stupid question, but what is the math that indicates to us that Bayesian Optimization is clearly better than "random search"? Or is there just a heuristic reason, i.e. considering regions that balance exploration and exploitation?

  2. Can Bayesian Optimization be considered "Derivative-Free Optimization"? Looking at the overall steps involved, it seems that the Bayesian Optimization algorithm does not evaluate any derivatives. Does this mean that Bayesian Optimization is well suited for functions that do not have derivatives (e.g. piecewise, non-smooth)?

  3. Ultimately, the Bayesian Optimization algorithm decides which point to consider next based on the minimization of the acquisition function. Does anyone know "how exactly is the acquisition function minimized"? Basically, it seems like if you have an acquisition function a(x) : you would need to evaluate a(x) over the entire range of the input variables to determine its minimum value (i.e. argmin of a(x)) , or perform an optimization algorithm on a(x) itself. Won't this be a cumbersome and exhaustive process? I briefly saw in some videos that evaluating the argmin of a(x) is challenging, but still worth the effo

... keep reading on reddit ➑
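
On question 3 specifically: the acquisition function is cheap to evaluate (it is computed from the surrogate, not the expensive objective), so in practice it is usually optimized with a multi-start local optimizer rather than an exhaustive sweep. A minimal sketch, with acquisition as a hypothetical stand-in for EI/UCB computed from a fitted GP:

    # Sketch: multi-start L-BFGS-B on a cheap acquisition function a(x).
    import numpy as np
    from scipy.optimize import minimize

    def acquisition(x):            # hypothetical surrogate-based score (minimize)
        return np.sin(3 * x[0]) + 0.5 * x[0] ** 2

    bounds = [(-2.0, 2.0)]
    rng = np.random.default_rng(0)
    starts = rng.uniform(-2, 2, size=(10, 1))       # 10 random restarts
    results = [minimize(acquisition, x0, method="L-BFGS-B", bounds=bounds)
               for x0 in starts]
    best = min(results, key=lambda r: r.fun)        # keep the best local optimum
    print(best.x, best.fun)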

πŸ‘︎ 97
πŸ’¬︎
πŸ‘€︎ u/jj4646
πŸ“…︎ Jul 15 2021
🚨︎ report
PSA: for those using Bayesian Optimization keep in mind alpha may need fine tuning

I feel like alpha isn't talked about nearly enough when it comes to BO. There are many optimization frameworks that use BO in the backend, and most rarely touch alpha after they decide on a value. From my experience, alpha is absolutely critical for most real-life use cases (in most scenarios where BO is useful these days, there is noise in the target function), and, more importantly, alpha can be a very sensitive parameter to set. For my particular problem I was getting completely invalid GPR fits when I was testing some log-spaced choices, e.g. (1e-3, 1e-4, 1e-5). It turns out that I needed something around pow(10, -1.75) to get something decent. Both 1e-1 and 1e-2 didn't work at all.

So for those who might not know: your BO framework might not be working at all if alpha isn't set correctly.
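
For context, in scikit-learn's GaussianProcessRegressor (the backend of several BO frameworks) alpha is added to the diagonal of the kernel matrix and acts as an assumed noise level. A minimal sketch of comparing alpha values via the log marginal likelihood; the data and values here are illustrative, not the poster's problem:

    # Sketch: how alpha changes a GP fit on noisy data, scored by
    # log marginal likelihood (higher is better).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (30, 1))
    y = np.sin(6 * X).ravel() + rng.normal(0, 0.2, 30)   # noisy target

    for alpha in [1e-5, 10 ** -1.75, 1e-1]:
        gp = GaussianProcessRegressor(alpha=alpha).fit(X, y)
        print(alpha, gp.log_marginal_likelihood_value_)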

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/Yogi_DMT
πŸ“…︎ Sep 26 2021
🚨︎ report
[P] Bayesian optimization book /r/MachineLearning/commen…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/shadiakiki1986
πŸ“…︎ Oct 10 2021
🚨︎ report
Bayesian Optimization for Chemical Synthesis and Chemical Process Development youtube.com/watch?v=hZpdA…
πŸ‘︎ 22
πŸ’¬︎
πŸ‘€︎ u/organiker
πŸ“…︎ May 18 2021
🚨︎ report
Coding Bayesian Optimization (Bayes Opt) with BOTORCH - Full code example youtube.com/watch?v=BQ4kV…
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/OptimizationGeek
πŸ“…︎ Aug 11 2021
🚨︎ report
"Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020", Turner et al 2021 arxiv.org/abs/2104.10201
πŸ‘︎ 39
πŸ’¬︎
πŸ‘€︎ u/gwern
πŸ“…︎ Apr 27 2021
🚨︎ report
[D] Best Bayesian Optimization Library in R?

I'm looking for an R library to optimize an arbitrary multivariate objective function with Bayesian Optimization (BO). In Python, I usually use Optuna (https://optuna.org/) for BO. Do you have any recommendations for equivalent libraries in R?

πŸ‘︎ 4
πŸ’¬︎
πŸ“…︎ Jun 25 2021
🚨︎ report
Automating Chemical Synthesis using Experimental Design by Bayesian Optimization reddit.com/r/Chempros/com…
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/organiker
πŸ“…︎ May 18 2021
🚨︎ report
[D] Selecting Hyperparameters Using Bayesian Optimization

Usually, hyperparameters for a machine learning algorithm (e.g. "learning rate" for a neural network) are selected through some sort of grid search method. This includes selecting a fixed range of hyperparameters (e.g. try 3 different learning rates: 0.01, 0.05, and 0.001) or randomly selecting hyperparameters (e.g. try 3 different learning rates between 0.01 and 0.05). We then select the hyperparameter which results in the machine learning algorithm having the highest accuracy (on the training set).

Recently, I started reading about a much more involved method of selecting hyperparameters called "Bayesian Optimization". The way I understand it: Bayesian Optimization treats the different hyperparameters (perhaps not the best example, but let's say random forest, where there are two hyperparameters: the number of trees and the number of variables used in making splits) along with the accuracy metric as a "functional space". We can imagine a 3-dimensional cube in which the axes correspond to "number of trees", "number of splitting variables", and "accuracy". Given the data we have observed (e.g. let's assume this is a supervised binary classification problem: predicting bankruptcy based on financial indicators), there is a hypothetical (not fully knowable) 3D plane that exists in this cube. This 3D plane has a corresponding function (we don't know its exact form); somewhere on this 3D plane there is a "highest point" or a "lowest point": whichever it is, the coordinates (i.e. the values of "number of trees" and "number of splitting variables") of this point will correspond to the optimal choice of hyperparameters for the random forest model, which in turn will yield the best accuracy.

As mentioned, the exact form of this function is not known. We can choose different values of "number of trees" and "number of splitting variables", build random forest models with our data for those values, and then record the accuracy on the training data. This will allow us to "recover" certain points on the surface of this 3D plane (at times, this can be a very computationally expensive process).

My understanding is: we can assume that the surface of this plane can be represented as a Gaussian Process - therefore, a Gaussian Process (defined by a choice of kernel function, the observed data, and recorded combinations of hyperparameter choices) is said to be a "surrogate" of the 3D plane. Using Bayesian Inference, we can then choose an "acquisition funct

... keep reading on reddit ➑
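
A minimal sketch of the setup described above, assuming scikit-optimize is available: gp_minimize builds the GP surrogate over the two random-forest hyperparameters and selects points with an acquisition function. The dataset is synthetic; the bankruptcy data would be dropped in the same way.

    # Sketch: GP-based BO over two random-forest hyperparameters.
    from skopt import gp_minimize
    from skopt.space import Integer
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    def objective(params):
        n_trees, max_feats = params
        clf = RandomForestClassifier(n_estimators=n_trees,
                                     max_features=max_feats, random_state=0)
        return -cross_val_score(clf, X, y, cv=3).mean()  # minimize = maximize accuracy

    space = [Integer(10, 300, name="n_estimators"),
             Integer(1, 20, name="max_features")]
    result = gp_minimize(objective, space, n_calls=20, random_state=0)
    print(result.x, -result.fun)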

πŸ‘︎ 11
πŸ’¬︎
πŸ‘€︎ u/SQL_beginner
πŸ“…︎ Mar 03 2021
🚨︎ report
How is Facebook using Bayesian Optimization with BoTorch? mathsgee.com/26206/how-is…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/Mathsgee
πŸ“…︎ Apr 10 2021
🚨︎ report
[News] Distill article on Bayesian Optimization

Our (@ApoorvAgnihotr2 and @nipun_batra) article on Bayesian Optimization was recently published at Distill, a top machine learning journal. Apart from being my first published article, it is the first one from India! Thank you for the amazing experience @distillpub.

I hope you all find the article useful. :)

https://distill.pub/2020/bayesian-optimization/

πŸ‘︎ 267
πŸ’¬︎
πŸ‘€︎ u/apoorvagni0
πŸ“…︎ May 15 2020
🚨︎ report
In this video we explain the basic methodology and show, based on a specific example, how it works. We focus especially on the acquisition function and on how the hyperparameters within Bayesian optimization affect its performance. Check out our channel and KEEP OPTIMIZING!! youtube.com/watch?v=M-NTk…
πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/OptimizationGeek
πŸ“…︎ Mar 09 2021
🚨︎ report
[P] Bayesian Hyperparameter Optimization with tune-sklearn in PyCaret

Just wanted to share a not widely known feature of PyCaret. By default, PyCaret's tune_model uses the tried and tested RandomizedSearchCV from scikit-learn. However, not everyone knows about the various advanced options tune_model() currently allows you to use, such as cutting-edge hyperparameter tuning techniques like Bayesian Optimization through libraries such as tune-sklearn, Hyperopt, and Optuna.

Here's a blog post with code snippets and performance benchmarks if you want to learn more.

πŸ‘︎ 5
πŸ’¬︎
πŸ‘€︎ u/mgalarny
πŸ“…︎ Mar 04 2021
🚨︎ report
[N] Bayesian Optimization (Bayes Opt): Easy explanation of popular hyperparameter tuning method

Bayesian Optimization is one of the most popular approaches to tuning hyperparameters in machine learning. Still, it can be applied in several other areas of single-objective black-box optimization. We created a video on our YT channel "Optimization Geeks" (link in the comments), where we explain the basic methodology and show, based on a specific example, how it works.

We focus especially on the acquisition function and on how the hyperparameters within Bayesian Optimization affect its performance. This video aims not only to give you a better understanding of Bayesian Optimization but also a better feeling for when, and in which way, it should be applied.

Check out the video and subscribe to the channel!! And never forget KEEP OPTIMIZING!!

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/OptimizationGeek
πŸ“…︎ Mar 09 2021
🚨︎ report
Who uses Bayesian optimization to tune trading algorithms?

I recall there was a post on using Bayesian optimization for indicator parameters, or maybe something like that. Just wondering if anyone has had success using Bayesian optimization for their algos.

πŸ‘︎ 17
πŸ’¬︎
πŸ‘€︎ u/wingchun777
πŸ“…︎ Dec 18 2020
🚨︎ report
Where can someone get the solutions for "Machine Learning: A Bayesian and Optimization Perspective" by Sergios Theodoridis?
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/yangtzech
πŸ“…︎ Dec 07 2020
🚨︎ report
Question: Using Bayesian Optimization with Domain Reduction

Hello,

I've been attempting to use BO to optimize a black box function, and started by using https://github.com/fmfn/BayesianOptimization

I then switched to the newer Facebook Ax library, thinking it would improve results. Unfortunately, it hasn't. In the BO Python package above, there is a concept of "Domain Reduction" which, when used with my black-box function, has been very successful in rapidly finding near-optimal ranges for 26 parameters, some Choice and some Int/Float Range. I've linked it here:

https://github.com/fmfn/BayesianOptimization/blob/master/bayes_opt/domain_reduction.py
https://github.com/fmfn/BayesianOptimization/blob/master/examples/domain_reduction.ipynb

I've tried running Ax on the same data, configured in a similar fashion without the "Domain Reduction" extra above, and gotten significantly worse results. My attempts were run with: Sobol 30 steps, 150 GPEI trials, as well as 250 trials. It never came close to optimizing the target.

I've been struggling to understand a few things:

  1. Is this somewhat already implemented (bounds tightening, domain reduction, in any sense) in the Ax project?
  2. If not, what would be my best path to adding something like this within Ax?
  3. Are there any better tools for my task?

Thanks everyone
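
For reference, the domain-reduction usage from the linked bayes_opt example looks roughly like this (a sketch; black_box here is a hypothetical stand-in for the poster's 26-parameter objective):

    # Sketch based on the linked bayes_opt domain-reduction example.
    from bayes_opt import BayesianOptimization, SequentialDomainReductionTransformer

    def black_box(x, y):                     # hypothetical stand-in objective
        return -x ** 2 - (y - 1) ** 2 + 1

    optimizer = BayesianOptimization(
        f=black_box,
        pbounds={"x": (-10, 10), "y": (-10, 10)},
        bounds_transformer=SequentialDomainReductionTransformer(),  # shrinks bounds over time
        random_state=1,
    )
    optimizer.maximize(init_points=5, n_iter=25)
    print(optimizer.max)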

πŸ‘︎ 7
πŸ’¬︎
πŸ“…︎ Sep 03 2020
🚨︎ report
[D] Bayesian Optimization: does it work?

I need to use Bayesian Optimization at work, and I have already done some research on the theory. Based on your experience:

  1. What are its pros and cons as a general method of optimization for expensive black-box functions (not for ML purposes; say, scheduling) in production? Did it do a good job in your project?
  2. Who are the key people/institutions/open-source projects developing it now?

πŸ‘︎ 24
πŸ’¬︎
πŸ‘€︎ u/dondonquixote
πŸ“…︎ Apr 01 2020
🚨︎ report
Fastest package for online Bayesian optimization/Gaussian processes in python?

Not sure if this is the best sub for it, but I figured I would give it a go since I can't find good benchmarks online.

I'm looking into using BO/GPs for a manufacturing application where the feedback is given by a human. The data is relatively low-dimensional, but I plan to run a lot of campaigns/trials, so I'm looking for something that will be very fast for relatively simple spaces, to minimize the time the human evaluator has to wait.

I've looked into GPy, GPyTorch, PyMC3, and Shogun, but I don't see many sources comparing these for speed. I could do so myself, but I was wondering if someone already has some idea of what might work better. So if anyone has a suggestion for a package I would greatly appreciate it.

Thanks
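
One way to answer the speed question empirically is to time a fit plus a prediction at the data sizes a human-in-the-loop campaign would actually reach. A sketch for scikit-learn's GP, chosen here only as an assumed baseline; the same harness applies to GPy, GPyTorch, and the rest:

    # Sketch: time GP fit + predict at small, human-feedback-scale data sizes.
    import time
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    rng = np.random.default_rng(0)
    for n in [10, 50, 200]:                  # plausible campaign sizes
        X = rng.uniform(0, 1, (n, 4))        # low-dimensional inputs, as described
        y = rng.normal(size=n)
        t0 = time.perf_counter()
        gp = GaussianProcessRegressor().fit(X, y)
        gp.predict(rng.uniform(0, 1, (1000, 4)), return_std=True)
        print(n, time.perf_counter() - t0, "seconds")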

πŸ‘︎ 16
πŸ’¬︎
πŸ‘€︎ u/TenSaiRyu
πŸ“…︎ May 27 2020
🚨︎ report
