37 Hilarious Hyperparameter Puns

[D] I'm new and scrappy. What tips do you have for better logging and documentation when training or hyperparameter training?

Hi,

I'm relatively new to the field. I'm self-taught (like most) and have a pretty scrappy approach to logging and documenting. I want to have a more organised approach.

I'm looking for tips, resources, or a discussion on what methods are available to be more systematic with the way I document my ML training and hyperparameter optimisation.

For example, when training a model for a specific project, do you save each hyperparameter configuration and output in a .json? Or should I use a .txt? Is one of these formats more suitable for hundreds of separate fittings?

How do you save your learning curves? Also a .json file?

Is there any reason to save each model you train?

Thanks in advance!

Edit: I use sklearn mostly. Is there a universally accepted way to log or is it package dependent?

👍︎ 205

💬︎

👤︎ u/MetalOrganicKneeJerk

📅︎ May 24 2021

🚨︎ report

Hyperparameters for DQN

How important are the hyperparameters for DQN? I'm trying to implement a DQN based on the Nature paper with the game Pong but since I don't have much computation power, I'm only training on 200,000 frames with all parameters scaled down. After training (without target network), my policy converged to only one action and the Q values were all negative. Not sure if this is due to a bug in my code or because of my hyperparameters.

Edit: https://colab.research.google.com/drive/1HKZmpVZApc7US4pkHC81X9juO5nWBwe7?usp=sharing

Some results: https://imgur.com/a/6USc4WW

👍︎ 2

💬︎

👤︎ u/Kewlwasabi

📅︎ Jul 05 2021

🚨︎ report

[D] How do people handle hyperparameter optimization?

Broadly speaking, does it make sense, or is it common for folks doing research in other areas of ML (for example computer vision), to be well read on modern hyperparameter optimization research? Or do they typically just use external libraries?

I'm currently working on my first research project, and am curious what the best path to go down is

👍︎ 8

💬︎

👤︎ u/CS_Student95

📅︎ Jul 01 2021

🚨︎ report

Introducing Sklearn-genetic-opt: Hyperparameters tuning using evolutionary algorithms [project]

I recently released a new open-source python library that makes it easy to fine tune scikit-learn models hyperparameters using evolutionary algorithms.

The package is called Sklearn-genetic-opt and provides several optimization algorithms, build in plots to understand the results, custom callbacks to control the iterations and more.

Check the documentation to get started

If you want to know more the details or contribute, you can check the Github repository

Install the package runing:

pip install sklearn-genetic-opt

I hope this can be useful to the general community and that the package keep growing to bring new features, any feedback is very welcome, still a lot of work to do!

👍︎ 77

💬︎

👤︎ u/rodrigo-arenas

📅︎ Jun 02 2021

🚨︎ report

"Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020", Turner et al 2021 arxiv.org/abs/2104.10201

👍︎ 36

💬︎

👤︎ u/gwern

📅︎ Apr 27 2021

🚨︎ report

Package for auto hyperparameters tuning of scikit-learn models

Hi everyone, I want to share with you this open source project that you can use to tune your supervised models from scikit-learn with some cool features.

Docs: https://sklearn-genetic-opt.readthedocs.io/ Repo: https://github.com/rodrigo-arenas/Sklearn-genetic-opt

Sklearn-genetic-opt uses evolutionary algorithms to choose the set of hyperparameters that optimizes (max or min) the cross-validation scores, it can be used for both regression and classification problems.

Currently it has these features:

GASearchCV: Principal class of the package, holds the evolutionary cross validation optimization routine.
Algorithms: Set of different evolutionary algorithms to use as optimization procedure.
Callbacks: Custom evaluation strategies to generate early stopping rules, logging (into TensorBoard, .pkl files, etc) or your custom logic.
Plots: Generate pre-defined plots to understand the optimization process.
MLflow: Build-in integration with mlflow to log all the hyperparameters, cv-scores and the fitted models.

Any feedback, suggestion, contribution or comments are very welcome!

👍︎ 5

💬︎

👤︎ u/rodrigo-arenas

📅︎ Jun 28 2021

🚨︎ report

Is cross-validation hyperparameter tuning sometimes not better than just setting reasonable values?

I have noticed this in practice like in Kaggle comps. Sometimes I have done no hyperparameter tuning by just setting whatever seems reasonable ish in a ballpark or using defaults and it ends up performing better than doing computationally intensive and tuning every little hyperparameter.

In the real world I also wonder whether cross validation for hyperparams can result in being more sensitive to things like data and concept drift. Because well if the future data doesn’t look like your validation set then the CV would have resulted in you overfitting the hyperparameters themselves.

👍︎ 9

💬︎

👤︎ u/ice_shadow

📅︎ May 23 2021

🚨︎ report

"A Generalizable Approach To Learning Optimizers", Almeida et al 2021 {OA} (RNN hyperparameter tuning) arxiv.org/abs/2106.00958

👍︎ 10

💬︎

👤︎ u/gwern

📅︎ Jun 03 2021

🚨︎ report

[D] Heuristics for initializing GP length scale hyperparameters?

I have been training a sparse Gaussian process using the matern-5/2 kernel and I am having trouble getting the objective function to converge, and I think it has to do with my initialization of length scale hyperparameters.

I am not training on actual function observations, but instead on summations of several function observations and derivatives of the function—due to the particular application.

Currently, I am initializing each length scales as the std. dev. over all training data inputs for the corresponding feature, but it doesn’t seem to be working well. Does anyone know of other heuristics?

👍︎ 4

💬︎

👤︎ u/Longjumping_Wrap_827

📅︎ Jun 03 2021

🚨︎ report

Hyperparameter tuning in multiple/sequential slurm jobs?

Hi,

I have question about fair, distributed hyperparameter tuning using slurm on a HPC cluster.

The problem I want to solve:

I need to do hyper parameter tuning which might take a while and will block a big chunk of the available GPUs. Additionally, we have quite strict rules about fair resource usage. Violation might lead to killed jobs and/or bans for a certain time.

Solution I thought of:

I Thought about using solutions like sklearns GridSearchCV (or similar substitutions). Ideally every "training step" would be spawned in a separate Slurm job, with results returned to the "parent" and aggregated in the end.

Question:

Can anybody provide a solution/guidance on how to do this?

👍︎ 2

💬︎

👤︎ u/fXb0XTC3

📅︎ Jun 07 2021

🚨︎ report

Hyperparameter Optimization (HPO) using AutoGluon

Hey Folks - I recently learned about AutoGluon (https://auto.gluon.ai) and was hoping to use it for HPO among other ML tasks! Using their quick guide, I can successfully use their TabularPredictor for my regression problem and get a number of models trained and have access to a number of details, e.g., performance, and hyperparameters used. However, using the same dataset I fail (with somewhat of a cryptic error message) when trying to do HPO, doesn anyone have any experience with HPO using AutoGluon?

👍︎ 2

💬︎

👤︎ u/Nickie_Niaki

📅︎ Jun 15 2021

🚨︎ report

5 ways Vertex Vizier hyperparameter tuning improves ML models cloud.google.com/blog/pro…

👍︎ 2

💬︎

👤︎ u/gcpblogbot

📅︎ Jun 03 2021

🚨︎ report

"Hyperparameter Selection for Imitation Learning", Hussenot et al 2021 {GB} arxiv.org/abs/2105.12034

👍︎ 9

💬︎

👤︎ u/gwern

📅︎ May 26 2021

🚨︎ report

[D] When do you start optimizing hyperparameters when trying out a new idea?

When implementing a new idea, I find it hard to decide how much time I should spend on getting it to work before moving on to the next one. So I would like to know how other people do it, do you optimize hyperparameters always before abandoning a new approach? Or is it more like the last thing to do if you already had some level of success?

👍︎ 6

💬︎

👤︎ u/aledinuso

📅︎ Apr 20 2021

🚨︎ report

[D] What is the best practice regarding hyperparameter tuning for baseline models?

Hey guys, I've developed a topic model that is a PGM so it doesn't have that many hyperparameters, (think something like LDA) so of course I've tuned them but not extensively, just trying different values to get it to converge, no grid search or anything (it works well for a range of hyperparameters anyways). Also I'm using just one set of hyperparameters for all 5 datasets.

I guess the question is how much hyperparameter tuning do I have to perform for the baseline models for a fair comparison? Right now all the baseline models work well after minor adjustments or even with default values, once again same set of hyperparameters for all 5 datasets, all except for the 'neural' topic models (like ProdLDA). In fact ProdLDA performs poorly on all 5 of the datasets I'm testing on (with the hyperparameters from their code), even though it performs well on the dataset that they use in their paper. So I do have a suspicion that the model may have been tuned specifically for that dataset.

How am I supposed to deal with this? Like I suppose I shouldn't give the neural topic models special treatment, but at the same time, do I have to perform grid search on all models? Or just report the really poor results?

👍︎ 70

💬︎

👤︎ u/fuqmebaby

📅︎ Jan 07 2021

🚨︎ report

Hyperparameter tuning in multiple/sequential slurm jobs? /r/learnmachinelearning/c…

👍︎ 2

💬︎

👤︎ u/fXb0XTC3

📅︎ Jun 07 2021

🚨︎ report

PyTorch Hyperparameter optimisation with Ray Tune

Hi,

I get the following error when using tune.run() and I do not know why? Could someone please advise me?

AttributeError                            Traceback (most recent call last)
&lt;ipython-input-145-af69a8390d75&gt; in &lt;module&gt;()
     13 best_trial = result.get_best_trial(metric = "loss", mode="min")
     14 
---&gt; 15 print("Best trial config: {}".format(best_trial.config))
     16 print("Best trial final validation loss: {}".format(
     17     best_trial.last_result["loss"]))

AttributeError: 'NoneType' object has no attribute 'config'

The code I have used is below. The error comes from the tune.run() returning a NoneType object and I have no idea why. I have spent the entire day trying to debug but cannot find whats wrong. My code:

checkpoint_dir = '/content/gdrive/MyDrive/Checkpoint_dir'
data_dir = '/content/gdrive/MyDrive/Data_dir'

epochs=5

def custom_train_part(config, checkpoint_dir=None, data_dir=None):
    model = LSTM(len(True_IMF_df.T), config["Hidden"], config["Layers"], 1)
    
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() &gt; 1:
            model = nn.DataParallel(model)
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    criterion = nn.MSELoss()

    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        model.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)
    
    for e in range(epochs):

        running_loss = 0.0
        epoch_steps = 0
        
        model.train()  # put model to training mode
        x = x_hht_train.to(device)
        y = y_hht_train.to(device)

        scores = model(x)
        loss = criterion(scores, y_hht_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        #print(f"Running loss: {running_loss}")
        epoch_steps += 1

        if e % 5 == 0:
            print(f'Epoch: {e}, loss = {loss.cpu().item()}')
            # check_accuracy(loader_val, model)
            print()
    
        # Validation loss
        val_los

... keep reading on reddit ➡

👍︎ 2

💬︎

👤︎ u/InvestingScientist

📅︎ May 20 2021

🚨︎ report

The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning – The Berkeley Artificial Intelligence Research Blog bair.berkeley.edu/blog/20…

👍︎ 30

💬︎

👤︎ u/_harias_

📅︎ Apr 20 2021

🚨︎ report

[P] StoRM: Mutation-based hyperparameter tuner

For those struggling to find a decent hyperparameter tuner (NN tuning for example), I have designed a tuner that attempts to remedy a lot of the issues associated with this type of parameter space (nested, categorical, conditional, etc.)

There is a runnable script in the examples folder that demonstrates StoRM's performance compared random tuning. Please feel free to post any feedback and let me know if it is useful for you.

https://github.com/ben-arnao/StoRM

👍︎ 7

💬︎

👤︎ u/Yogi_DMT

📅︎ Apr 19 2021

🚨︎ report

Ways to exactly replicate a random search when hyperparameter tuning?

Hey, I'm trying to optimize the hyperparameters of a multilayer perceptron regressor I've built on Keras using random search, because I don't have the computational power to do an exhaustive grid search. I have a held out validation set that the model's performance is assessed on with every randomly chosen hyperparameter combination. My question is, is there a way to repeat a random search? By that I mean is it possible to perform a second search, in which every hyperparameter combination that was randomly selected in my first random search is retested? The reason I want to do this is I want to change the amount of noise present in the validation set, and see the relative degradations in performance for each of the randomly tested hyperparameter combinations I tested in my random search? Really stuck on this so any help would be greatly, greatly appreciated. Thanks!

👍︎ 2

💬︎

👤︎ u/Snapdown_City

📅︎ Apr 01 2021

🚨︎ report

"Hyperparameter Selection for Imitation Learning", Hussenot et al 2021 {GB} arxiv.org/abs/2105.12034

👍︎ 2

💬︎

👤︎ u/research_mlbot

📅︎ May 26 2021

🚨︎ report

"Hyperparameter Optimisation" is the ultimate cheat code to buy your ML project more time.

I have found that this term impresses non-technical stakeholders and project managers a lot, and since it does improve your trained model performance it's a legit task to add to a project timeline. I've got a random search script that tests a thousand permutations in the background for me so I can parallel work on other tasks.

Only used it on one occasion to buy myself an extra week for a solo project to actually solve some stupidly complex data reconciliation problem that was only allocated half a day by PM.

👍︎ 481

💬︎

👤︎ u/joeycloud

📅︎ Oct 23 2020

🚨︎ report

[D] Hyperparameter tuning with a budget constraint on total number of model parameters

Is this possible to do with any of the hyperparameter frameworks out there? I have a convolutional network with the hypterparameters being kernel size, number of layers, and number of channels, and would like to search through those parameters while placing a limit on the total number of parameters present in the model due to where I'm deploying it. I found this paper but could not find any code implementing their strategy. Has anyone implemented something like this already?

Facebook's Ax allows something similar, but only with linear combinations of parameters which is tough since the number of parameters for a conv layer is superlinear (((m * n * d) + 1) * k). Seems like this should be a common problem, so curious why there isn't more out there on this as well.

👍︎ 3

💬︎

👤︎ u/Daddy_Long_Legs

📅︎ Apr 23 2021

🚨︎ report

There are so many hyperparameters that we consider while training NNs, but each one has a different scale of impact due to their varying sensitivity. Checkout Two Minute Concepts from Robofied to know exactly this!

👍︎ 2

💬︎

👤︎ u/robofied

📅︎ May 16 2021

🚨︎ report

DO more training on a trained RL model (SAC) with different hyperparameters

I am a newbie to the world of deep RL learning. Recently, I have been very heavily working with SAC for training a model I was given. I was curious if there is a way to load a trained model and do more training on it with DIFFERENT hyperparameters. I have been able to successfully load model and do additional training on them but I was curious to know if I could do training with hyperparameters different from the initial model parameters.

This might probably be a very dumb question but forgive me as I am still learning!

👍︎ 2

💬︎

👤︎ u/SpaceExp_NN

📅︎ Apr 05 2021

🚨︎ report

[D] Struggling to Understand the Difference between Parameters and Hyperparameters

https://imgur.com/a/3qAYHil

I find myself a bit confused regarding the topic of "parameters" and "hyperparameters".

A) For example (If I understand correctly), a model like a random forest is a non-parametric model - this is because the random forest model does not make any assumptions about the distribution of the data (source: https://stackoverflow.com/questions/13845816/are-decision-trees-e-g-c4-5-considered-nonparametric-learning) and there are potentially an infinite number of parameters in a random forest (source: https://sebastianraschka.com/faq/docs/parametric_vs_nonparametric.html - just a question: what are the model "parameters" in a random forest model?)

However, a random forest model has hyperparameters such as the "depth of each tree" and the "number of splitting variables". These hyperparameters can be fine tuned using grid search for example - different combinations of these hyperparameters are selected, a random forest model is made using an individual combination of these hyperparameters, and then the accuracy is recorded on the training data. In this picture (https://imgur.com/a/3qAYHil ), could the points in this 3 dimensional graph be considered as a "loss function" (e.g. imagine trying to "interpolate" a 3 dimensional plane over those points)? (for some reason, I don't think so)

B) If I also understand correctly, a neural network can be considered as a parametric model - mainly because a neural network has a fixed number of model parameters (the "weights" of the neural network, the exact number of weights depend on the number of layers and neurons). Examples of hyperparameters in the neural network could be the "learning rate" - a value that is decided by the user. Based on the training data, the learning rate and the number of weights - an optimization algorithm (e.g. gradient descent) tries to determine the optimal values of the weights, resulting in a the lowest error on the training data. The loss function of a neural network has roughly the same number of variables as the number of weights.

Is my understanding correct?

Thanks

👍︎ 6

💬︎

👤︎ u/blueest

📅︎ Mar 29 2021

🚨︎ report

Hyperparameter tuning examples

Hi. I'm a newbie in rl. I'm performing hyperparameter tuning on A2C. I would like to learn some good-style hyperparameter tuning code examples, such as for grid search. Could anyone help give any suggestions or links? Much appreciated!

👍︎ 4

💬︎

👤︎ u/vincent341

📅︎ Apr 05 2021

🚨︎ report

[D] For those in the industry - how much hyperparameter tuning do you typically end up doing?

I've been working in the ML/DL industry for 2.5 years now - purely on the application side, my job is to find the most appropriate model/ensemble of models out there and fine tune them for the application. In most cases though, they work great with the set hyperparameters out of the box and I've never really had to tweak them too much (apart from learning rate schedules and sometimes other model specific hyperparameters) to get my models to do what is expected of them. How common is it for all of you in the industry to tune these for your applications and retrain them to see improvements? I realize this is a subjective question and depends on application so I'd like to get a general sense of where its important and where it isn't. In most cases at my work, the cost of exploring the hyperparameter space and retraining exceeds the improvement in performance we get. I'd really like to expand my knowledge of the scene and know how this is for other parts of the industry.

EDIT: Woah, very overwhelmed by the number of responses from all across the industry. Its great to see the wide range of ways people work in different orgs based on their size/scope. Learnt a lot today, thank you all very much for taking the time to reply!

👍︎ 225

💬︎

👤︎ u/C0hentheBarbarian

📅︎ Dec 22 2020

🚨︎ report

[D] How to find the "best" hyperparameter in Neural network?

It's a simple question. How to find the parameter of the network which will be best. I have implemented some kind of Neural Network, which is used to generate some images. Now I want to find out the parameters to generate even more images. I don't know how to find the "best" parameters. How to do that? Please help me.

👍︎ 10

💬︎

👤︎ u/machinelearningGPT2

📅︎ Jan 08 2021

🚨︎ report

[Q] Which combination of "hyperparameters" to use for regression analysis?

I come primarily from a machine learning background, which I believe is what you'd say as a predictive form of analysis. However, I am now trying to publish some results in a paper about how different variables influence an outcome variable. For this, I believe inferential statistics is the way to go.

My problem setting

I have 107 rows of data with ~15 integer-valued continuous features (counts data), and a floating-point continuous column that I need to predict. I am trying to prove that these features are predictive of the continuous column (I'm not sure if using the word "predictive" is apt here because my objective is to persuade people that there is an underlying, real-world phenomenon that relates these features to the outcome, so like I mentioned, it's probably more inferential).

My doubt

I am using SAS JMP to do my analyses. When I do a "Fit Model" on my data, I am presented with a bunch of different options for modelling (which JMP terms as "Personalities"):

Standard Least Squares
Stepwise
Generalized Regression
Mixed model
Loglinear variance
Manova
Generalized Linear Model

I have a rough idea of what 1, 2, 3, and 7 are about (although I don't know the difference between 3 and 7).

Now, when I choose a specific model, say, Generalized Regression, I am further presented with options for "Estimation Method":

Lasso
Best Subset
Backward elimination
Forward elimination
Standard least squares
... and a few more

Then, there is also the choices for "Validation method":

AICc
BIC
KFold
Leave-One-Out

The reason I am overwhelmed by the number of choices is not because of the sheer number of combinations possible, but also because of the following observation:

When I run Generalized Regression + Standard Least Squares + AICc validation, I find that none of the features have a p-value < 0.05 (or even close). However, the moment I switch to Generalized Regression + Best subset + AICc validation, I suddenly have 3 features whose p-values are < 0.05. This makes me confused as I would have expected a best subset to be not that different from a standard least squares (at least for a small problem having 15 variables and 107 rows).

While I got excited by the low p-value from the Best Subset method, I want to make sure that I am using a standard method of analysis and not blindly going with the one that gives me the lowest p-values.

Am I in the right direction? What would be the "stand

... keep reading on reddit ➡

👍︎ 3

💬︎

👤︎ u/kinkhis

📅︎ Apr 01 2021

🚨︎ report

[P] Onepanel - open source, extensible deep learning platform, now includes an Ubuntu based deep learning desktop, hyperparameter tuning and a Python DSL for defining pipelines and workflows v.redd.it/6g0hzdy3kqw61

👍︎ 43

💬︎

👤︎ u/_rusht

📅︎ May 02 2021

🚨︎ report

[N] Facebook AI Introduces A New Self-Supervised Learning Framework For Model Selection And Hyperparameter Tuning For Large-Scale Forecasting

Researchers at Facebook AI have recently released a new self-supervised learning framework for model selection (SSL-MS) and hyperparameter tuning (SSL-HPT), which provides accurate forecasts with less computational time and resources. The SSL-HPT algorithm estimates hyperparameters 6-20x faster when compared with baseline (search-based) algorithms, producing accurate forecasting results in numerous applications.

At present, Forecasting is one of the significant data science and machine learning tasks performed. Therefore, it is crucial to have fast, reliable, and accurate forecasting results with large amounts of time series data for managing various businesses.

Time series analysis is used to find trends and forecast future values. A slight difference in hyperparameters in this type of analysis could lead to very different forecast results for a given model and have serious consequences. Therefore, it’s essential to select optimal hyperparameter values.

Summary: https://www.marktechpost.com/2021/04/06/facebook-ai-introduces-a-new-self-supervised-learning-framework-for-model-selection-and-hyperparameter-tuning-for-large-scale-forecasting/

Paper: https://arxiv.org/abs/2102.05740?

Facebook Source: https://ai.facebook.com/blog/large-scale-forecasting-self-supervised-learning-framework-for-hyper-parameter-tuning/

👍︎ 42

💬︎

👤︎ u/techsucker

📅︎ Apr 06 2021

🚨︎ report

Looking for Hyperparameters Tuning package with aggressive termination

Do you know any python package for hyperparameters tuning with aggressive termination in the case of totally wrong parameters? It would be nice to natively support CatBoost, LightGBM, Xgboost (I just love GBM for the performance)

👍︎ 2

💬︎

👤︎ u/pp314159

📅︎ Mar 04 2021

🚨︎ report

Where can I find documentation on stylegan2 hyperparameters? If it doesn't really exist, where can I go for information on GAN hyperparameters in general?

I can't find any official (or unofficial) documentation besides the github repository and the academic paper, neither of which are quite as useful as I would like them to be. If anybody can point me in the right direction for either that or just information on tuning GAN hyperparameters in general that would be awesome. Thanks!

👍︎ 2

💬︎

👤︎ u/typical_sasquatch

📅︎ May 15 2021

🚨︎ report

Fastest Hyperparameter Optimization techniques

I'm building a model using SVM and i tried using GridSearch and Randomized Search for my hyperparameter optimization and it takes days to finish (and still haven't finished by now). Is there a faster way besides using Gridsearch and Randomized search or is there a way that can speed up the process ?

👍︎ 3

💬︎

👤︎ u/notcutepotato

📅︎ Mar 24 2021

🚨︎ report

Weird RL with hyperparameter optimizers

https://themerge.substack.com/p/weird-rl-with-hyperparameter-optimizers

Hey all, I had great fun writing this post about hacking together a script using Optuna to optimize tiny policies. If you’ve got time and are interested, please check it out. Hope you enjoy it!

👍︎ 4

💬︎

👤︎ u/jcobp

📅︎ Mar 13 2021

🚨︎ report

Exploring hyperparameter meta-loss landscapes with Jax lukemetz.com/exploring-hy…

👍︎ 3

💬︎

👤︎ u/BatmantoshReturns

📅︎ Mar 23 2021

🚨︎ report