I am in the process of finalizing a monograph on Bayesian optimization to be published next year by Cambridge University Press. The target audience is graduate students in machine learning, statistics, and related fields, but I hope practitioners will find it useful as well.
A major goal of the book is to build up modern Bayesian optimization algorithms "from scratch," revealing unifying themes in their design.
I am making a draft available for initial commentary and erratum squashing:
https://bayesoptbook.com/
Once published, the book will remain freely available on the companion webpage.
I welcome feedback via creating an issue on an associated GitHub repository:
https://github.com/bayesoptbook/bayesoptbook.github.io
I hope the community will find this resource useful!
-Roman Garnett
Hello all,
I'm interested in learning more about genetic algorithms and Bayesian optimization in the context of hyperparameter tuning in Machine Learning and Operations Research. I'm not interested in Medium articles; I want to dive in and understand the math. I am also interested in getting a good introduction to Reinforcement Learning.
Could you suggest good books/pedagogical articles about these three subjects?
So far it is clear that one-hot encoding with Gaussian Processes, or surrogate models such as Tree-structured Parzen Estimators and Random Forests, can naturally handle categorical as well as real-valued variables when used for hyperparameter optimization.
I want to optimize a search space of mixed variables. Which other approaches are there?
For Gaussian Processes I found the following helpful reference:
https://towardsdatascience.com/a-zero-maths-understanding-of-bayesian-optimization-e064a957a124
But how about approaches such as Neural Networks?
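To make the one-hot approach mentioned above concrete, here is a minimal sketch: the categorical variable is expanded into one-hot columns and a GP surrogate is fit on the combined categorical-plus-continuous inputs. The data, variable names, and values are invented for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.preprocessing import OneHotEncoder

# toy observations: (optimizer name, learning rate) -> validation score
optimizers = np.array([["adam"], ["sgd"], ["adam"], ["rmsprop"]])
learning_rates = np.array([[1e-3], [1e-2], [1e-1], [1e-3]])
scores = np.array([0.91, 0.84, 0.78, 0.88])

# one-hot encode the categorical column and stack it with the real-valued one
encoder = OneHotEncoder().fit(optimizers)
X = np.hstack([encoder.transform(optimizers).toarray(), learning_rates])

# fit a GP surrogate on the mixed (encoded) search space
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, scores)

# the surrogate can now score any (categorical, continuous) candidate
candidate = np.hstack([encoder.transform([["sgd"]]).toarray(), [[5e-3]]])
mean, std = gp.predict(candidate, return_std=True)
print(mean, std)
```

Tree-based surrogates such as TPE or random forests avoid the encoding step by handling the categorical dimension natively.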
I was trying to convince my team to use Bayesian Optimization for hyperparameter optimization for a 4-level ensemble model. Before I could make a decision, I had to explain to one of the clients why it is necessary. I was thinking about the easiest way to communicate what we are trying to accomplish here.
The same discussion is in the form of a write-up in this post.
Please let me know your comments and critique the POV on Bayesian Optimization.
It's always interesting to see the different perspectives when it comes to conventional stats vs Bayesian analysis.
I was trying to convince my team to use Bayesian Optimization for hyperparameter optimization for a 4-level ensemble model. Before I could make a decision, I had to explain to one of the clients why it is necessary. I was thinking about the easiest way to communicate what we are trying to accomplish here.
So, I came up with a story.
I gifted you a sophisticated and hypothetical coffee machine and asked you to brew the best coffee for yourself by adjusting the thousands of dials on the machine. You are an intelligent fellow and quickly realise that this is an optimization problem. You have two options:
There are two things to note here:
So, what shall we do?
There is a framework that can solve this problem and that is Bayesian optimization.
Let's assume the black-box function for the brew-quality is:
The function is a black box and we can only evaluate it for different inputs (brew styles).
Let's say we want to find the best brew after sampling only 15 cups of coffee. What we'll do is brew a few cups of coffee (fewer than 15; in this case, we brewed 6) and obtain an estimated function, shown below in red.
Now, we'll use this estimated function to determine where to evaluate next.
>This estimation of the original function from the estimated
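The write-up is cut off above, but the loop it describes can be sketched roughly as follows, with a single "brew setting" standing in for the thousands of dials. The brew_quality function, the UCB acquisition, and all numbers below are placeholders made up for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def brew_quality(x):                     # the black box we can only sample
    return -(x - 0.6) ** 2 + 0.05 * np.sin(20 * x)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(6, 1))       # the 6 cups brewed so far
y = brew_quality(X).ravel()

for cup in range(6, 15):                 # budget of 15 cups total
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                         # the "estimated function" in red

    grid = np.linspace(0, 1, 1000).reshape(-1, 1)
    mean, std = gp.predict(grid, return_std=True)
    ucb = mean + 2.0 * std               # favour good *and* uncertain brews
    x_next = grid[np.argmax(ucb)]        # where to brew next

    X = np.vstack([X, [x_next]])
    y = np.append(y, brew_quality(x_next))

print("best brew setting found:", X[np.argmax(y)])
```

Each iteration refits the surrogate and picks the point where the acquisition (here, mean plus two standard deviations) is highest, which is the balance between trying promising settings and trying uncertain ones.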
Has anyone worked with "Bayesian Optimization" before?
I was reading over the details of Bayesian Optimization and it looks quite interesting. It seems to be an effective way to optimize expensive loss functions using the Bayesian framework and Gaussian Processes. Essentially, an "acquisition function" is developed that "guides" the optimization algorithm on where to "search" next - balancing concepts such as "exploitation" (consider regions that provided desirable outputs) and "exploration" (consider areas that have not been thoroughly searched). In the end, Bayesian Optimization (like all optimization algorithms) provides the analyst with a set of inputs that attempt to minimize or maximize the objective function (e.g. a loss function in a neural network).
I had some general questions about the math involved in this:
Does anyone know if there are any mathematical results that demonstrate why using an acquisition function to decide which points to select next is effective? Are there any theoretical results that show why modelling the real objective function using a Gaussian Process and acquisition functions is a good idea? Why is this meaningful? Does this approach have any foundations that suggest it might actually converge and find a minimum point? I.e. this may be a stupid question, but what is the math that indicates Bayesian Optimization is clearly better than "random search"? Or is there just a heuristic justification, i.e. considering regions that balance exploration and exploitation?
Can Bayesian Optimization be considered as "Derivative-Free Optimization"? Looking at the overall steps involved, it seems that the Bayesian Optimization algorithm is not evaluating any derivatives. Does this mean that Bayesian Optimization is well suited for functions that do not have derivatives (e.g. piecewise, non-smooth)?
Ultimately, the Bayesian Optimization algorithm decides which point to consider next based on the minimization of the acquisition function. Does anyone know how exactly the acquisition function is minimized? Basically, it seems like if you have an acquisition function a(x), you would need to evaluate a(x) over the entire range of the input variables to determine its minimum (i.e. the argmin of a(x)), or perform an optimization algorithm on a(x) itself. Won't this be a cumbersome and exhaustive process? I briefly saw in some videos that evaluating the argmin of a(x) is challenging, but still worth the effo
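On the last question, a common answer is that the acquisition function is cheap to evaluate, because it only queries the GP surrogate rather than the expensive objective, so it is usually optimized with ordinary tools such as a random sample of candidates followed by multi-start gradient-based optimization. Here is a rough sketch under that assumption (expected improvement plus L-BFGS-B); all names are illustrative, not any particular library's API.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def expected_improvement(x, gp, y_best):
    """Negative EI at x (negated so scipy's minimizers can be used)."""
    mean, std = gp.predict(np.atleast_2d(x), return_std=True)
    std = np.maximum(std, 1e-12)
    z = (y_best - mean) / std            # assuming we are minimizing y
    ei = (y_best - mean) * norm.cdf(z) + std * norm.pdf(z)
    return -ei[0]

def propose_next(gp, bounds, y_best, n_starts=10, rng=None):
    # bounds: array of shape (n_dims, 2) with [low, high] per dimension
    if rng is None:
        rng = np.random.default_rng()
    dim = len(bounds)
    # multi-start: random starting points, local optimization from each
    starts = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_starts, dim))
    best_x, best_val = None, np.inf
    for x0 in starts:
        res = minimize(expected_improvement, x0, args=(gp, y_best),
                       bounds=bounds, method="L-BFGS-B")
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x
```

This also hints at the answer to the derivative-free question: no derivatives of the true objective are ever needed, only (optionally) derivatives of the surrogate-based acquisition.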
I feel like alpha isn't talked about nearly enough when it comes to BO. There are many optimization frameworks that use BO in the backend, and most rarely touch alpha after they decide on a value. From my experience, alpha is absolutely critical for most real-life use cases (in most scenarios where BO is useful these days, there is noise in the target function), and more importantly, alpha can be a very sensitive parameter to set. For my particular problem I was getting completely invalid GPR fits when I was testing some values on a log scale, e.g. 1e-3, 1e-4, 1e-5. It turns out that I needed something around pow(10, -1.75) to get a decent fit. Both 1e-1 and 1e-2 didn't work at all.
So, for those who might not know: your BO run/framework may not be working at all if alpha isn't set correctly.
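For anyone wondering what "alpha" refers to here: assuming it is the noise/regularization term added to the kernel diagonal in scikit-learn's GaussianProcessRegressor, a quick way to see its sensitivity is to refit the GP over a small grid of values and compare the fits. The data and grid below are made up for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(6 * X).ravel() + rng.normal(scale=0.1, size=20)   # noisy target

for alpha in [1e-1, 1e-2, 10 ** -1.75, 1e-3, 1e-4]:
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=alpha,
                                  normalize_y=True)
    gp.fit(X, y)
    # log marginal likelihood is one quick way to compare fits across alphas
    print(f"alpha={alpha:.5f}  LML={gp.log_marginal_likelihood_value_:.2f}")
```

An alternative to hand-tuning alpha is to add a WhiteKernel term to the kernel so the noise level is estimated from the data along with the other kernel hyperparameters.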
I'm looking for an R-library to optimize any multivariate objective function with Bayesian Optimization (BO). In python, I usually use Optuna (https://optuna.org/) for BO. Do you have any recommendations for equivalent libraries in R?
Usually, hyperparameters for a machine learning algorithm (e.g. the "learning rate" for a neural network) are selected through some sort of grid search method. This means either selecting a fixed set of hyperparameter values (e.g. try 3 different learning rates: 0.01, 0.05 and 0.001) or randomly selecting hyperparameters (e.g. try 3 different learning rates drawn between 0.01 and 0.05). We then select the hyperparameter values which result in the machine learning algorithm having the highest accuracy (on the training set).
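For reference, both baselines are short calls in scikit-learn; the model, candidate values, and data below are placeholders.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# grid search: a fixed, hand-picked set of candidate values
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [50, 100, 200],
                     "max_depth": [3, 5, None]},
                    cv=3).fit(X, y)

# random search: candidates drawn at random from specified ranges
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                          {"n_estimators": randint(50, 300),
                           "max_depth": randint(2, 10)},
                          n_iter=10, cv=3, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)
```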
Recently, I started reading about a much more involved method of selecting hyperparameters called "Bayesian Optimization". The way I understand it: Bayesian Optimization treats the different hyperparameters (perhaps not the best example, but let's say a random forest with two hyperparameters: the number of trees and the number of variables used in making splits) along with the accuracy metric as a "functional space". We can imagine a 3-dimensional cube in which the axes correspond to "number of trees", "number of splitting variables" and "accuracy". Given the data we have observed (let's assume this is a supervised binary classification problem: predicting bankruptcy from financial indicators), there is a hypothetical (not fully knowable) 3D plane that exists in this cube. This 3D plane has a corresponding function (we don't know its exact form) - somewhere on this 3D plane there is a "highest point" or a "lowest point": whichever it is, the coordinates of this point (i.e. the values of "number of trees" and "number of splitting variables") correspond to the optimal choice of hyperparameters for the random forest model, which in turn will yield the best accuracy.
As mentioned, the exact form of this function is not known. We can choose different values of "number of trees" and "number of splitting variables", build random forest models with our data for those values, and then record the accuracy on the training data. This allows us to "recover" certain points on the surface of this 3D plane (at times, this can be a very computationally expensive process).
My understanding is: we can assume that the surface of this plane can be represented as a Gaussian Process - therefore, a Gaussian Process (defined by a choice of kernel function, the observed data, and recorded combinations of hyperparameter choices) is said to be a "surrogate" of the 3D plane. Using Bayesian Inference, we can then choose an "acquisition funct
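The post is cut off here, but the loop it is building toward, a GP surrogate over the two random-forest hyperparameters plus an acquisition function choosing which combination to try next, can be sketched with scikit-optimize's gp_minimize (assuming that package is available; other libraries such as Optuna would work similarly). The data and ranges are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def objective(params):
    n_trees, n_split_vars = params
    model = RandomForestClassifier(n_estimators=n_trees,
                                   max_features=n_split_vars,
                                   random_state=0)
    # negate accuracy because gp_minimize minimizes its objective
    return -cross_val_score(model, X, y, cv=3).mean()

result = gp_minimize(objective,
                     [Integer(10, 300, name="n_estimators"),
                      Integer(1, 20, name="max_features")],
                     n_calls=25, random_state=0)
print("best hyperparameters:", result.x, "accuracy:", -result.fun)
```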
Our (@ApoorvAgnihotr2 and @nipun_batra) article on Bayesian Optimization was recently published at Distill, a top machine learning journal. Apart from being my first published article, it is the first one from India! Thank you for the amazing experience @distillpub.
I hope you all find the article useful. :)
Just wanted to share a not widely known feature of PyCaret. By default, PyCaret's tune_model uses the tried and tested RandomizedSearchCV from scikit-learn. However, not everyone knows that tune_model() currently allows more advanced options, such as cutting-edge hyperparameter tuning techniques like Bayesian Optimization through libraries such as tune-sklearn, Hyperopt, and Optuna.
Here's a blog post with code snippets and performance benchmarks if you want to learn more.
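For anyone who wants the shape of the call without clicking through, here is a minimal sketch assuming PyCaret's classification module; the toy DataFrame and column names are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from pycaret.classification import setup, create_model, tune_model

# toy data standing in for a real dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])
df["target"] = y

setup(data=df, target="target", session_id=0)
rf = create_model("rf")

# default behaviour: scikit-learn's RandomizedSearchCV
tuned_default = tune_model(rf)

# Bayesian optimization via Optuna, one of the libraries mentioned in the post
tuned_optuna = tune_model(rf, search_library="optuna", search_algorithm="tpe")
```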
Bayesian Optimization is one of the most popular approaches for tuning hyperparameters in machine learning, but it can also be applied in many other areas of single-objective black-box optimization. We created a video on our YouTube channel "Optimization Geeks" (link in the comments) where we explain the basic methodology and show, based on a specific example, how it works.
We focus especially on the acquisition function, and also on how the hyperparameters of the Bayesian optimization itself affect its performance. The video aims not only to give you a better understanding of Bayesian Optimization but also a better feel for when and how it should be applied.
Check out the video and subscribe to the channel!! And never forget KEEP OPTIMIZING!!
I recall there was a post on using Bayesian optimization for tuning indicator parameters, or maybe something like that. Just wondering if anyone has had success using Bayesian optimization on their algos.
Hello,
I've been attempting to use BO to optimize a black box function, and started by using https://github.com/fmfn/BayesianOptimization
I then switched to try to use the newer Facebook Ax library, thinking it would improve results. Unfortunately, it hasn't. In the BO python package above, there is a concept of "Domain Reduction", which when used with my black box function has been very successful in rapidly finding near optimal ranges for 26 parameters, some Choice and some Int/Float Range. I've linked it here:
https://github.com/fmfn/BayesianOptimization/blob/master/bayes_opt/domain_reduction.py
https://github.com/fmfn/BayesianOptimization/blob/master/examples/domain_reduction.ipynb
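For anyone comparing, the linked notebook uses the transformer roughly like this; the objective and bounds below are toy placeholders, not the 26-parameter problem described here.

```python
from bayes_opt import BayesianOptimization, SequentialDomainReductionTransformer

def black_box(x, y):
    return -x ** 2 - (y - 1) ** 2 + 1        # toy objective to maximize

pbounds = {"x": (-10, 10), "y": (-10, 10)}
bounds_transformer = SequentialDomainReductionTransformer()

optimizer = BayesianOptimization(
    f=black_box,
    pbounds=pbounds,
    random_state=1,
    bounds_transformer=bounds_transformer,   # shrinks the search box over time
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)
```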
I've tried running Ax on the same data, configured in a similar fashion without the "Domain Reduction" extra above, and gotten significantly worse results. My attempts were run with: Sobol 30 steps, 150 GPEI trials, as well as 250 trials. It never came close to optimizing the target.
I've been struggling to understand two things:
Thanks everyone
I need to use Bayesian Optimization at work. I did some research on the theory already. Based on your experience: 1) What are its pros and cons as a general method of optimization for expensive black-box functions (not for ML purposes; let's say scheduling) in production? Did it do a good job in your project? 2) Who are the key people/institutions/open-source projects developing it now?
Not sure if this is the best sub for it, but I figured I would give it a go since I can't find good benchmarks online.
I'm looking into using BO/GPs for a manufacturing application where the feedback is given by a human. The data is relatively low-dimensional, but I plan to run a lot of campaigns/trials, so I'm looking for something that will be very fast for relatively simple spaces, to minimize the time the human evaluator has to wait.
I've looked into GPy, GPyTorch, PyMC3 and Shogun, but I don't see many sources comparing these for speed. I could do so myself, but I was wondering if someone already has an idea of what might work better. So if anyone has a suggestion for a package, I would greatly appreciate it.
Thanks