Hi,
I'm relatively new to the field. I'm self-taught (like most) and have a pretty scrappy approach to logging and documenting. I want to have a more organised approach.
I'm looking for tips, resources, or a discussion on what methods are available to be more systematic with the way I document my ML training and hyperparameter optimisation.
For example, when training a model for a specific project, do you save each hyperparameter configuration and output in a .json? Or should I use a .txt? Is one of these formats more suitable for hundreds of separate fittings?
How do you save your learning curves? Also a .json file?
Is there any reason to save each model you train?
Thanks in advance!
Edit: I use sklearn mostly. Is there a universally accepted way to log or is it package dependent?
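As far as I know there is no single universally accepted format; it is mostly convention and fairly package-independent. One lightweight pattern that scales fine to hundreds of fits is one JSON record per fit (hyperparameters, CV scores, learning-curve arrays) in a results folder, pickling a model only when you actually plan to reuse it. Dedicated trackers such as MLflow or Weights & Biases do the same thing with more tooling. A minimal sketch, assuming scikit-learn and plain JSON (the runs/ folder and the record layout are just my conventions):
import json
import time
from pathlib import Path
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, learning_curve

X, y = load_iris(return_X_y=True)
run_dir = Path("runs")
run_dir.mkdir(exist_ok=True)

params = {"n_estimators": 200, "max_depth": 5, "random_state": 0}
model = RandomForestClassifier(**params)

cv_scores = cross_val_score(model, X, y, cv=5)
# learning_curve returns plain arrays, so the curve is easy to store next to the config
train_sizes, train_scores, val_scores = learning_curve(model, X, y, cv=5)

record = {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "params": params,
    "cv_scores": cv_scores.tolist(),
    "learning_curve": {
        "train_sizes": train_sizes.tolist(),
        "train_scores": train_scores.tolist(),
        "val_scores": val_scores.tolist(),
    },
}
with open(run_dir / f"run_{int(time.time())}.json", "w") as f:
    json.dump(record, f, indent=2)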
How important are the hyperparameters for DQN? I'm trying to implement a DQN based on the Nature paper with the game Pong but since I don't have much computation power, I'm only training on 200,000 frames with all parameters scaled down. After training (without target network), my policy converged to only one action and the Q values were all negative. Not sure if this is due to a bug in my code or because of my hyperparameters.
Edit: https://colab.research.google.com/drive/1HKZmpVZApc7US4pkHC81X9juO5nWBwe7?usp=sharing
Some results: https://imgur.com/a/6USc4WW
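One thing worth ruling out before blaming the hyperparameters: without a target network the bootstrap target moves with every update, which can easily collapse the policy onto a single action on a small frame budget. A hedged sketch of the usual target-network pattern in PyTorch (q_net, the batch layout, and the update interval are placeholders, not taken from the notebook):
import copy
import torch
import torch.nn as nn

# Tiny stand-in Q-network; the notebook's own model would go here instead.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)   # frozen copy used only for TD targets
target_net.eval()

GAMMA = 0.99
TARGET_UPDATE_EVERY = 1000          # steps; order of magnitude only, tune for your budget

def td_target(rewards, next_states, dones):
    # Bootstrap from the frozen target network, not the online one.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + GAMMA * (1.0 - dones.float()) * next_q

# Inside the training loop, after each optimizer step:
# if step % TARGET_UPDATE_EVERY == 0:
#     target_net.load_state_dict(q_net.state_dict())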
Broadly speaking, does it make sense, or is it common for folks doing research in other areas of ML (for example computer vision), to be well read on modern hyperparameter optimization research? Or do they typically just use external libraries?
I'm currently working on my first research project, and am curious what the best path to go down is
I recently released a new open-source Python library that makes it easy to fine-tune scikit-learn models' hyperparameters using evolutionary algorithms.
The package is called Sklearn-genetic-opt and provides several optimization algorithms, built-in plots to understand the results, custom callbacks to control the iterations, and more.
Check the documentation to get started
If you want to know more details or want to contribute, you can check the GitHub repository
Install the package by running:
pip install sklearn-genetic-opt
I hope this can be useful to the general community and that the package keeps growing to bring new features. Any feedback is very welcome; there's still a lot of work to do!
Hi everyone, I want to share with you this open source project that you can use to tune your supervised models from scikit-learn with some cool features.
Docs: https://sklearn-genetic-opt.readthedocs.io/ Repo: https://github.com/rodrigo-arenas/Sklearn-genetic-opt
Sklearn-genetic-opt uses evolutionary algorithms to choose the set of hyperparameters that optimizes (maximizes or minimizes) the cross-validation score; it can be used for both regression and classification problems.
Currently it has these features:
Any feedback, suggestion, contribution or comments are very welcome!
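For anyone wondering what usage looks like, here is a rough sketch adapted from the documented examples; double-check the exact class and parameter names in the docs, and note that the dataset and search space below are only for illustration:
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Integer
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search space: each hyperparameter gets a space object instead of a fixed grid.
param_grid = {
    "n_estimators": Integer(50, 300),
    "max_depth": Integer(2, 20),
    "min_samples_split": Integer(2, 10),
    "max_features": Categorical(["sqrt", "log2"]),
}

evolved = GASearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
    population_size=10,
    generations=15,
    n_jobs=-1,
)
evolved.fit(X_train, y_train)
print(evolved.best_params_)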
I have noticed this in practice, like in Kaggle comps. Sometimes I have done no hyperparameter tuning, just setting whatever seems reasonable-ish in a ballpark or using defaults, and it ends up performing better than computationally intensive tuning of every little hyperparameter.
In the real world I also wonder whether cross-validation for hyperparameters can result in models being more sensitive to things like data and concept drift. If the future data doesn't look like your validation set, then the CV would have resulted in you overfitting the hyperparameters themselves.
I have been training a sparse Gaussian process using the Matérn-5/2 kernel and I am having trouble getting the objective function to converge, and I think it has to do with my initialization of the length-scale hyperparameters.
I am not training on actual function observations, but instead on summations of several function observations and derivatives of the function, due to the particular application.
Currently, I am initializing each length scale as the std. dev. over all training data inputs for the corresponding feature, but it doesn't seem to be working well. Does anyone know of other heuristics?
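A few other heuristics that come up often: standardise the inputs and start every length scale near 1, use a fraction of each feature's range, or use the median pairwise distance between inputs (the "median heuristic"), ideally combined with a few random restarts, keeping the best converged objective. A small NumPy sketch of candidate initialisations, independent of whichever GP library you are using:
import numpy as np
from scipy.spatial.distance import pdist

def lengthscale_candidates(X):
    """Common per-feature length-scale initialisations for stationary kernels."""
    X = np.asarray(X, dtype=float)
    per_feature_std = X.std(axis=0)                    # what the post currently uses
    per_feature_range = X.max(axis=0) - X.min(axis=0)  # a fraction of the range is another option
    median_dist = np.median(pdist(X))                  # "median heuristic" over all inputs
    return {
        "std": per_feature_std,
        "tenth_of_range": per_feature_range / 10.0,
        "median_pairwise_distance": np.full(X.shape[1], median_dist),
    }

# Try each candidate (plus a couple of random perturbations) and keep the
# initialisation whose optimisation run reaches the best converged objective.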
Hi,
I have a question about fair, distributed hyperparameter tuning using Slurm on an HPC cluster.
The problem I want to solve:
I need to do hyperparameter tuning, which might take a while and will block a big chunk of the available GPUs. Additionally, we have quite strict rules about fair resource usage. Violations might lead to killed jobs and/or bans for a certain time.
Solution I thought of:
I thought about using solutions like sklearn's GridSearchCV (or similar substitutes). Ideally, every "training step" would be spawned as a separate Slurm job, with the results returned to the "parent" and aggregated at the end.
Question:
Can anybody provide a solution/guidance on how to do this?
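One scheduler-friendly pattern is to skip GridSearchCV's internal parallelism and let a Slurm array job be the parallelism: each array task trains exactly one configuration and writes its score to disk, so the cluster's fair-share limits throttle you naturally, and a small parent script aggregates the results afterwards. A rough sketch (script name, grid, and model are placeholders); tools like submitit or Ray Tune also offer Slurm-aware launching with less plumbing:
# tune_one.py: run as a Slurm array job, e.g.
#   sbatch --array=0-11 --gres=gpu:1 --wrap="python tune_one.py"
# Each array task trains exactly one configuration, so the scheduler (not your code)
# decides how many run concurrently.
import json
import os
from pathlib import Path
from sklearn.datasets import load_digits
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

grid = list(ParameterGrid({"C": [0.1, 1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2], "kernel": ["rbf"]}))
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])  # which configuration this job handles
params = grid[task_id]

X, y = load_digits(return_X_y=True)
score = cross_val_score(SVC(**params), X, y, cv=5).mean()

out = Path("results")
out.mkdir(exist_ok=True)
with open(out / f"task_{task_id}.json", "w") as f:
    json.dump({"params": params, "cv_score": score}, f)

# A small "parent" script can then glob results/*.json and pick the best entry.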
Hey Folks - I recently learned about AutoGluon (https://auto.gluon.ai) and was hoping to use it for HPO, among other ML tasks! Using their quick guide, I can successfully use their TabularPredictor for my regression problem, get a number of models trained, and have access to a number of details, e.g., performance and the hyperparameters used. However, when trying to do HPO on the same dataset I fail with a somewhat cryptic error message. Does anyone have any experience with HPO using AutoGluon?
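Not sure about your specific error, but for reference, here is a hedged sketch of how HPO is usually switched on through hyperparameter_tune_kwargs in the docs; the file path and label are placeholders, and restricting the model families is an assumption of mine that often makes failures easier to localise:
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # placeholder path

predictor = TabularPredictor(label="target", problem_type="regression").fit(
    train_data,
    # Limit the model families so any HPO error points at a specific model type.
    hyperparameters={"GBM": {}, "XGB": {}},
    hyperparameter_tune_kwargs={
        "num_trials": 20,
        "scheduler": "local",
        "searcher": "auto",
    },
    time_limit=600,
)
print(predictor.leaderboard())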
When implementing a new idea, I find it hard to decide how much time I should spend on getting it to work before moving on to the next one. So I would like to know how other people do it: do you always optimize hyperparameters before abandoning a new approach? Or is it more like the last thing to do once you have already had some level of success?
Hey guys, I've developed a topic model that is a PGM (think something like LDA), so it doesn't have that many hyperparameters. Of course I've tuned them, but not extensively: just trying different values to get it to converge, no grid search or anything (it works well for a range of hyperparameters anyway). Also, I'm using just one set of hyperparameters for all 5 datasets.
I guess the question is how much hyperparameter tuning do I have to perform for the baseline models for a fair comparison? Right now all the baseline models work well after minor adjustments or even with default values, once again same set of hyperparameters for all 5 datasets, all except for the 'neural' topic models (like ProdLDA). In fact ProdLDA performs poorly on all 5 of the datasets I'm testing on (with the hyperparameters from their code), even though it performs well on the dataset that they use in their paper. So I do have a suspicion that the model may have been tuned specifically for that dataset.
How am I supposed to deal with this? Like I suppose I shouldn't give the neural topic models special treatment, but at the same time, do I have to perform grid search on all models? Or just report the really poor results?
Hi,
I get the following error when using tune.run() and I do not know why. Could someone please advise me?
AttributeError Traceback (most recent call last)
<ipython-input-145-af69a8390d75> in <module>()
13 best_trial = result.get_best_trial(metric = "loss", mode="min")
14
---> 15 print("Best trial config: {}".format(best_trial.config))
16 print("Best trial final validation loss: {}".format(
17 best_trial.last_result["loss"]))
AttributeError: 'NoneType' object has no attribute 'config'
The code I have used is below. The error comes from result.get_best_trial() returning None and I have no idea why. I have spent the entire day trying to debug but cannot find what's wrong. My code:
import os
import torch
import torch.nn as nn
from ray import tune

# LSTM, True_IMF_df, x_hht_train and y_hht_train are defined in earlier notebook cells.
checkpoint_dir = '/content/gdrive/MyDrive/Checkpoint_dir'
data_dir = '/content/gdrive/MyDrive/Data_dir'
epochs = 5

def custom_train_part(config, checkpoint_dir=None, data_dir=None):
    model = LSTM(len(True_IMF_df.T), config["Hidden"], config["Layers"], 1)

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)
    model.to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    criterion = nn.MSELoss()

    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        model.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    for e in range(epochs):
        running_loss = 0.0
        epoch_steps = 0
        model.train()  # put model to training mode

        x = x_hht_train.to(device)
        y = y_hht_train.to(device)

        scores = model(x)
        loss = criterion(scores, y)  # compare against the target already moved to `device`

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        # print(f"Running loss: {running_loss}")
        epoch_steps += 1

        if e % 5 == 0:
            print(f'Epoch: {e}, loss = {loss.cpu().item()}')
            # check_accuracy(loader_val, model)
            print()

        # Tune only knows about metrics that are explicitly reported. Without a report,
        # result.get_best_trial(metric="loss", mode="min") has nothing to rank and returns
        # None, which is exactly the AttributeError above. Reporting the validation loss
        # (computed below) would be even better than the training loss.
        tune.report(loss=running_loss / epoch_steps)

        # Validation loss
        val_los
For those struggling to find a decent hyperparameter tuner (NN tuning for example), I have designed a tuner that attempts to remedy a lot of the issues associated with this type of parameter space (nested, categorical, conditional, etc.).
There is a runnable script in the examples folder that demonstrates StoRM's performance compared to random tuning. Please feel free to post any feedback and let me know if it is useful for you.
Hey, I'm trying to optimize the hyperparameters of a multilayer perceptron regressor I've built in Keras using random search, because I don't have the computational power to do an exhaustive grid search. I have a held-out validation set that the model's performance is assessed on with every randomly chosen hyperparameter combination. My question is: is there a way to repeat a random search? By that I mean, is it possible to perform a second search in which every hyperparameter combination that was randomly selected in my first random search is retested? The reason I want to do this is that I want to change the amount of noise present in the validation set and see the relative degradation in performance for each of the hyperparameter combinations from my first random search. Really stuck on this, so any help would be greatly, greatly appreciated. Thanks!
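One way to make the search exactly repeatable without relying on Keras internals: draw the combinations yourself with a fixed seed (for example with sklearn's ParameterSampler), save them to disk, and then re-score the very same list against the noisier validation set in a second pass. A sketch, with a made-up search space:
import json
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

space = {
    "hidden_units": [32, 64, 128, 256],
    "dropout": [0.0, 0.2, 0.4],
    "learning_rate": loguniform(1e-4, 1e-1),
}

# A fixed random_state makes the draw reproducible; saving the configs means the
# second search can retest exactly the same combinations.
configs = list(ParameterSampler(space, n_iter=50, random_state=42))
with open("sampled_configs.json", "w") as f:
    json.dump(configs, f, indent=2, default=float)

# Pass 1: for cfg in configs, build/fit the Keras model and score it on the clean validation set.
# Pass 2: reload sampled_configs.json, add noise to the validation set, and score the same cfgs again.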
I have found that this term impresses non-technical stakeholders and project managers a lot, and since it does improve your trained model's performance, it's a legit task to add to a project timeline. I've got a random search script that tests a thousand permutations in the background for me so I can work on other tasks in parallel.
Only used it on one occasion to buy myself an extra week for a solo project to actually solve some stupidly complex data reconciliation problem that was only allocated half a day by PM.
Is this possible to do with any of the hyperparameter frameworks out there? I have a convolutional network with the hyperparameters being kernel size, number of layers, and number of channels, and would like to search through those parameters while placing a limit on the total number of parameters present in the model due to where I'm deploying it. I found this paper but could not find any code implementing their strategy. Has anyone implemented something like this already?
Facebook's Ax allows something similar, but only with linear combinations of parameters which is tough since the number of parameters for a conv layer is superlinear (((m * n * d) + 1) * k). Seems like this should be a common problem, so curious why there isn't more out there on this as well.
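One pragmatic workaround if you cannot find a framework with hard constraints built in: build the model inside the objective, count its parameters, and prune any trial over budget so no training time is spent on it. A sketch with Optuna and PyTorch; MAX_PARAMS, the toy CNN builder, and the returned score are placeholders for your own setup:
import optuna
import torch.nn as nn

MAX_PARAMS = 500_000  # hypothetical deployment budget

def build_cnn(n_layers, channels, kernel_size, in_ch=3):
    layers = []
    for _ in range(n_layers):
        layers += [nn.Conv2d(in_ch, channels, kernel_size, padding=kernel_size // 2), nn.ReLU()]
        in_ch = channels
    return nn.Sequential(*layers)

def objective(trial):
    n_layers = trial.suggest_int("n_layers", 1, 6)
    channels = trial.suggest_int("channels", 8, 128)
    kernel_size = trial.suggest_categorical("kernel_size", [3, 5, 7])

    model = build_cnn(n_layers, channels, kernel_size)
    n_params = sum(p.numel() for p in model.parameters())
    if n_params > MAX_PARAMS:
        # Treat over-budget architectures as infeasible instead of training them.
        raise optuna.TrialPruned()

    score = 0.0  # placeholder: replace with your real training + validation metric
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)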
I am a newbie to the world of deep RL. Recently, I have been working very heavily with SAC for training a model I was given. I was curious if there is a way to load a trained model and do more training on it with DIFFERENT hyperparameters. I have been able to successfully load models and do additional training on them, but I was curious to know if I could do training with hyperparameters different from the initial model's.
This is probably a very dumb question, but forgive me as I am still learning!
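If you happen to be using stable-baselines3 (the post does not say which library), load() accepts extra keyword arguments that override the values stored in the checkpoint, so you can resume training with different hyperparameters; things baked into the saved networks, such as the architecture, generally cannot be changed this way. A hedged sketch with placeholder paths and environment:
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")

# Keyword arguments passed to load() replace the corresponding saved values.
model = SAC.load(
    "sac_checkpoint",      # hypothetical path to the saved model
    env=env,
    learning_rate=1e-4,    # changed from the original run
    tau=0.02,
    gamma=0.98,
)
model.learn(total_timesteps=100_000, reset_num_timesteps=False)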
I find myself a bit confused regarding the topic of "parameters" and "hyperparameters".
A) For example (If I understand correctly), a model like a random forest is a non-parametric model - this is because the random forest model does not make any assumptions about the distribution of the data (source: https://stackoverflow.com/questions/13845816/are-decision-trees-e-g-c4-5-considered-nonparametric-learning) and there are potentially an infinite number of parameters in a random forest (source: https://sebastianraschka.com/faq/docs/parametric_vs_nonparametric.html - just a question: what are the model "parameters" in a random forest model?)
However, a random forest model has hyperparameters such as the "depth of each tree" and the "number of splitting variables". These hyperparameters can be fine tuned using grid search for example - different combinations of these hyperparameters are selected, a random forest model is made using an individual combination of these hyperparameters, and then the accuracy is recorded on the training data. In this picture (https://imgur.com/a/3qAYHil ), could the points in this 3 dimensional graph be considered as a "loss function" (e.g. imagine trying to "interpolate" a 3 dimensional plane over those points)? (for some reason, I don't think so)
B) If I also understand correctly, a neural network can be considered a parametric model, mainly because a neural network has a fixed number of model parameters (the "weights" of the neural network; the exact number of weights depends on the number of layers and neurons). An example of a hyperparameter in a neural network is the "learning rate", a value decided by the user. Based on the training data, the learning rate, and the number of weights, an optimization algorithm (e.g. gradient descent) tries to determine the optimal values of the weights, resulting in the lowest error on the training data. The loss function of a neural network has roughly the same number of variables as the number of weights.
Is my understanding correct?
Thanks
Hi. I'm a newbie in RL. I'm performing hyperparameter tuning on A2C. I would like to see some well-written hyperparameter tuning code examples, such as for grid search. Could anyone give any suggestions or links? Much appreciated!
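For a small grid, a plain loop is often the most readable starting point. A sketch assuming stable-baselines3 and a gymnasium environment (both assumptions, and the grid values are only illustrative); for anything larger, Optuna or the RL Baselines3 Zoo tuning scripts are worth a look:
import itertools
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

learning_rates = [1e-4, 7e-4, 3e-3]
n_steps_options = [5, 16, 64]
results = []

for lr, n_steps in itertools.product(learning_rates, n_steps_options):
    env = gym.make("CartPole-v1")
    model = A2C("MlpPolicy", env, learning_rate=lr, n_steps=n_steps, seed=0, verbose=0)
    model.learn(total_timesteps=50_000)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
    results.append({"lr": lr, "n_steps": n_steps, "mean_reward": mean_reward})
    env.close()

best = max(results, key=lambda r: r["mean_reward"])
print("Best config:", best)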
I've been working in the ML/DL industry for 2.5 years now - purely on the application side, my job is to find the most appropriate model/ensemble of models out there and fine-tune them for the application. In most cases though, they work great with the set hyperparameters out of the box and I've never really had to tweak them too much (apart from learning rate schedules and sometimes other model-specific hyperparameters) to get my models to do what is expected of them. How common is it for all of you in the industry to tune these for your applications and retrain them to see improvements? I realize this is a subjective question and depends on the application, so I'd like to get a general sense of where it's important and where it isn't. In most cases at my work, the cost of exploring the hyperparameter space and retraining exceeds the improvement in performance we get. I'd really like to expand my knowledge of the scene and know how this is for other parts of the industry.
EDIT: Woah, very overwhelmed by the number of responses from all across the industry. It's great to see the wide range of ways people work in different orgs based on their size/scope. Learnt a lot today, thank you all very much for taking the time to reply!
It's a simple question: how do I find the network parameters that will work best? I have implemented some kind of neural network, which is used to generate images. Now I want to find the parameters that will let it generate even better images, but I don't know how to find the "best" parameters. How do I do that? Please help me.
I come primarily from a machine learning background, which I believe you'd call a predictive form of analysis. However, I am now trying to publish some results in a paper about how different variables influence an outcome variable. For this, I believe inferential statistics is the way to go.
My problem setting
I have 107 rows of data with ~15 integer-valued continuous features (counts data), and a floating-point continuous column that I need to predict. I am trying to prove that these features are predictive of the continuous column (I'm not sure if using the word "predictive" is apt here because my objective is to persuade people that there is an underlying, real-world phenomenon that relates these features to the outcome, so like I mentioned, it's probably more inferential).
My doubt
I am using SAS JMP to do my analyses. When I do a "Fit Model" on my data, I am presented with a bunch of different options for modelling (which JMP terms as "Personalities"):
I have a rough idea of what 1, 2, 3, and 7 are about (although I don't know the difference between 3 and 7).
Now, when I choose a specific model, say, Generalized Regression, I am further presented with options for "Estimation Method":
Then, there is also the choices for "Validation method":
The reason I am overwhelmed by the number of choices is not just the sheer number of combinations possible, but also the following observation:
When I run Generalized Regression + Standard Least Squares + AICc validation, I find that none of the features have a p-value < 0.05 (or even close). However, the moment I switch to Generalized Regression + Best subset + AICc validation, I suddenly have 3 features whose p-values are < 0.05. This makes me confused as I would have expected a best subset to be not that different from a standard least squares (at least for a small problem having 15 variables and 107 rows).
While I got excited by the low p-value from the Best Subset method, I want to make sure that I am using a standard method of analysis and not blindly going with the one that gives me the lowest p-values.
Am I in the right direction? What would be the "stand
Researchers at Facebook AI have recently released a new self-supervised learning framework for model selection (SSL-MS) and hyperparameter tuning (SSL-HPT), which provides accurate forecasts with less computational time and resources. The SSL-HPT algorithm estimates hyperparameters 6-20x faster when compared with baseline (search-based) algorithms, producing accurate forecasting results in numerous applications.
Forecasting is currently one of the most significant data science and machine learning tasks. It is therefore crucial to have fast, reliable, and accurate forecasting results from large amounts of time series data for managing various businesses.
Time series analysis is used to find trends and forecast future values. A slight difference in hyperparameters in this type of analysis could lead to very different forecast results for a given model and have serious consequences. Therefore, it's essential to select optimal hyperparameter values.
Paper: https://arxiv.org/abs/2102.05740?
Facebook Source: https://ai.facebook.com/blog/large-scale-forecasting-self-supervised-learning-framework-for-hyper-parameter-tuning/
Do you know of any Python package for hyperparameter tuning with aggressive termination in the case of totally wrong parameters? It would be nice if it natively supported CatBoost, LightGBM, and XGBoost (I just love GBMs for the performance).
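Optuna is one candidate: its pruners terminate unpromising trials early, and there are integration callbacks for LightGBM, XGBoost, and CatBoost. A sketch with LightGBM (toy dataset and a tiny search space for illustration; check where the pruning callback lives in your Optuna version, since the integrations were moved to a separate optuna-integration package in recent releases):
import lightgbm as lgb
import optuna
from optuna.integration import LightGBMPruningCallback
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def objective(trial):
    params = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 8, 256, log=True),
    }
    dtrain = lgb.Dataset(X_tr, label=y_tr)
    dval = lgb.Dataset(X_val, label=y_val, reference=dtrain)
    # The callback reports the validation metric after every boosting round so the
    # pruner can kill clearly bad configurations early.
    booster = lgb.train(
        params,
        dtrain,
        num_boost_round=300,
        valid_sets=[dval],
        callbacks=[LightGBMPruningCallback(trial, "binary_logloss")],
    )
    return log_loss(y_val, booster.predict(X_val))

study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner(n_warmup_steps=20))
study.optimize(objective, n_trials=50)
print(study.best_params)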
I can't find any official (or unofficial) documentation besides the github repository and the academic paper, neither of which are quite as useful as I would like them to be. If anybody can point me in the right direction for either that or just information on tuning GAN hyperparameters in general that would be awesome. Thanks!
I'm building a model using an SVM and I tried using grid search and randomized search for my hyperparameter optimization, and it takes days to finish (and still hasn't finished by now). Is there a faster way besides grid search and randomized search, or a way that can speed up the process?
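Before reaching for a new tool, three things usually help: set n_jobs=-1, search on a subsample first, and try successive halving, which sklearn ships as HalvingRandomSearchCV (still marked experimental). A sketch on a toy dataset:
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from scipy.stats import loguniform

X, y = load_digits(return_X_y=True)

search = HalvingRandomSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-4, 1e0)},
    factor=3,        # keep roughly the best third of candidates at each rung
    cv=3,
    n_jobs=-1,       # parallelise the cross-validation fits
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)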
https://themerge.substack.com/p/weird-rl-with-hyperparameter-optimizers
Hey all, I had great fun writing this post about hacking together a script using Optuna to optimize tiny policies. If you've got time and are interested, please check it out. Hope you enjoy it!