A list of puns related to "Generalized Additive Model"
Hi!
I'm looking for a concise, relatively non-mathy introduction to Generalized Additive Mixed Models, similar in scope, language, and difficulty to the GLMM book by Finch et al. (2019). I understand the basic intuition behind GAMs and how they work, but there are some things I am struggling with, partly due to my insufficient knowledge of higher mathematics and partly due to the scarcity of annotated examples. I'm looking for something that does a good job of explaining how splines are constructed from basis functions, clarifies the differences between coding interactions via two-dimensional splines and tensor products (beyond one being isotropic), and clearly explains how random effects work and are interpreted in a GAM context. Preferably, I would like step-by-step examples with explanations of how to interpret the results.
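For concreteness, my current mental model of those three pieces, in mgcv terms, is the sketch below (a data frame d with response y, continuous covariates x and z, and a grouping factor g; all names made up). What I'm after is a text that explains what these lines actually do under the hood:

library(mgcv)
m_iso <- gam(y ~ s(x, z), data = d)                 # isotropic 2-D thin plate smooth
m_te  <- gam(y ~ te(x, z), data = d)                # tensor product of marginal bases
m_re  <- gam(y ~ s(x) + s(g, bs = "re"), data = d)  # random intercept for factor g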
Thank you for your recommendations!
I'm looking at implementing Generalized Additive Models so that the entire end-to-end process runs as fast as possible, so I started looking at C#'s ML.NET.
I haven't used C# since ~2014, so reading the code is a bit difficult, but the GAM implementation is part of the FastTree library and is clearly tree-based. I tested it with a simple y ~ sin(x) model and it was dreadful (the FastForest regressor is much better).
Does anyone have any insight into what's being used here, or references on the subject? I've used GAMs in R and Python and have never seen a non-spline-based implementation before.
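For reference, the spline-based behaviour I'm used to is easy to reproduce in R; a minimal mgcv sketch on the same kind of simulated test I used:

library(mgcv)
set.seed(1)
x <- runif(500, 0, 2 * pi)
y <- sin(x) + rnorm(500, sd = 0.2)
fit <- gam(y ~ s(x))  # penalized thin plate regression spline
plot(fit)             # the estimated smooth tracks sin(x) closely

If the ML.NET version does much worse than this on the same data, that would support my suspicion that it is fitting piecewise-constant shape functions per feature rather than smooth splines.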
I know this is a very specialized question, but I'll give some context. I have been notified by my graduate committee that I must pin down the topics I want to focus my thesis on within the coming month or so. I thought I'd focus on topics I thought were interesting in my ML course, but we hadn't really gone into much detail on them beyond a high-level overview.
For context on my background: I am somewhere between beginner and intermediate in ML, and I focus mostly on health-related research (most of my research has been math-based, but my thesis must be tied to an actual development project rather than pure math).
Any resources or recommendations would be useful! Thanks in advance!
A paper I read used 'exponential kernel regression' to model the impact of value estimates from a reinforcement learning model on observed choice behavior. I am not sure what the 'exponential' part of the kernel regression even means, and frankly, the internet hasn't provided much information on that specific combination of words, but I understand that kernel regression is a form of non-linear, non-parametric regression. However, I know you can also use generalized additive models for non-linear regression, as well as polynomials and splines.
I think I understand that the shortcomings of splines include having to choose the number and placement of the knots, whereas with polynomials you have to choose the degree and which terms to include. But when do you use kernel regression vs. generalized additive models for nonlinear regression? Under what conditions is one better suited than the other?
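For concreteness, here's the kind of head-to-head I have in mind, on made-up data (base R's ksmooth for Nadaraya-Watson kernel regression vs. an mgcv penalized-spline smooth):

library(mgcv)
set.seed(42)
x <- sort(runif(300, -3, 3))
y <- exp(-x^2) + rnorm(300, sd = 0.1)
ks <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5)  # bandwidth picked by hand
gm <- gam(y ~ s(x))                                      # smoothness estimated from the data
plot(x, y, col = "grey")
lines(ks$x, ks$y, col = "red")       # kernel regression estimate
lines(x, fitted(gm), col = "blue")   # GAM smooth

One difference this makes obvious to me: the kernel fit needs a bandwidth chosen per predictor, while the GAM's penalty is estimated from the data, but I'd like to know when that actually matters.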
Has deep learning made GAMs less relevant? They both seem like attractive ways to fit non-linear data, but I don't hear much about GAMs at all.
I'm having trouble determining whether my data would be better modelled with a generalized linear model or a generalized additive model... I am relatively new to all this, so please go easy on me. I am learning!
My data are measured over April-October over a 10-year period, in five different locations. I am interested in modelling bacteria abundance (so count data, from my understanding). I'm not that interested in the different locations, so I plan to treat location as a random effect. I plan to treat month as a fixed effect (I am interested in the seasonal trends in variation). When I did a basic plot, the data had a super funky shape, so I imagine a generalized additive model will provide more flexibility for this.
Am I correct in thinking generalized additive models will be best? Also, does anyone have any recommended R packages to do this? I have been having trouble finding one that I like. Any other tips and tricks would be great!!
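From my searching so far, mgcv seems to be the main candidate, and something like the sketch below is what I've pieced together, though I'm not confident about any of it (column names are mine; the negative binomial family is a guess for overdispersed counts):

library(mgcv)
m <- gam(abundance ~ s(month, k = 7) +    # seasonal smooth over April-October
           s(year, k = 10) +              # long-term trend across the 10 years
           s(location, bs = "re"),        # location as a random intercept (a factor)
         family = nb(),                   # negative binomial for overdispersed counts
         data = d)

Does that look like a reasonable starting point?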
I'm trying to understand the basics of GAMs. Wood's book "Generalized Additive Models: An Introduction with R" (1st edition) introduces GAMs via a cubic spline basis {b_i(x)} (see p. 122), where b_1(x) = 1, b_2(x) = x, and b_{i+2}(x) is defined via a certain function R(x, z). But there's definitely an error in the book's definition of R(x, z).
Could anyone suggest another simple but mathematically rigorous (and relatively comprehensive) introduction to GAMs?
Thanks in advance.
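For reference, here is the version of R(x, z) that I believe was intended, transcribed into R (please correct me if this is also wrong):

# R(x, z) for the cubic spline basis on [0, 1], as I understand it should read
rk <- function(x, z) {
  ((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12) / 4 -
    ((abs(x - z) - 0.5)^4 - 0.5 * (abs(x - z) - 0.5)^2 + 7/240) / 24
}
# so that b_1(x) = 1, b_2(x) = x, and b_{i+2}(x) = rk(x, xk[i]) for knots xk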
I am working on a paper that involves fitting a generalized additive mixed model to data that describes how far people travel on a daily basis. The main relationship I want to estimate is that between age and distance traveled per day. I expected there to be a non-linear relationship between these variables, so have been using a GAMM for this analysis (using the 'brms' and 'mgcv' R packages). I include home town and the identity of the person as random effects. When I describe how I've specified this model, I have been using text like this:
> "Models for the relationship between age and distance traveled were developed separately for each sex, and each model included age as a predictor variable and a varying intercept term for individual and home town. Both the mean and the variance of daily distance traveled were specified to vary as a function of age, individual, and home town. Smoothing functions were estimated for the age variable using penalized thin plate regression".
I am not feeling comfortable about this text because I don't know if I should call raw age a 'predictor variable' or whether, more formally, the actual predictor variable is the output of the smoothing function applied to age. I want to come up with a readable way of describing the way this model is specified -- and if anyone has examples or advice I'd love to hear it.
Thanks!
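For reference, the model in question looks roughly like this in brms syntax (variable names changed for the post):

library(brms)
f <- bf(distance ~ s(age, bs = "tp") + (1 | person) + (1 | town),  # mean model
        sigma    ~ s(age, bs = "tp") + (1 | person) + (1 | town))  # variance model
m <- brm(f, data = d, family = gaussian())

One convention I've seen, which may resolve my discomfort: call age the predictor (or covariate) and s(age) the smooth term, i.e. "distance traveled was modelled as a smooth function of age" -- but I'd welcome better phrasings.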
tl;dr: There are a variety of BART models for different kinds of data; this article reviews a bunch of them.
abstract: "Bayesian additive regression trees (BART) is a flexible prediction model/machine learning approach that has gained widespread popularity in recent years. As BART becomes more mainstream, there is an increased need for a paper that walks readers through the details of BART, from what it is to why it works. This tutorial is aimed at providing such a resource. In addition to explaining the different components of BART using simple examples, we also discuss a framework, the General BART model, that unifies some of the recent BART extensions, including semiparametric models, correlated outcomes, statistical matching problems in surveys, and models with weaker distributional assumptions. By showing how these models fit into a single framework, we hope to demonstrate a simple way of applying BART to research problems that go beyond the original independent continuous or binary outcomes framework."
paper: https://onlinelibrary.wiley.com/doi/full/10.1002/sim.8347
arxiv'd: https://arxiv.org/abs/1901.07504
Hello!
I have a repeated-measures study with one continuous and two categorical predictors (time for the repeated measures and group for the control/experimental groups). I checked the assumptions and realized there may be problems with going down the linear mixed-effects route, so I decided to switch to generalized linear mixed-effects models.
However, I cannot decide which family I should choose for the models in R. I have two models, one for each of two different continuous predictors. By the way, I checked the assumptions overall, not separately by group, which I hope is the better practice.
Model 1 = https://ibb.co/VLwTNq7 Model 2 = https://ibb.co/ZNFT8Hs
Just dependent: https://ibb.co/sWz44FW and its histogram: https://ibb.co/5n9mDf2
Which family should I choose?
Thanks in advance!
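In case it helps, here is how I've been thinking of checking: fit the candidate families and compare simulated residual diagnostics. Is something like this sketch (with glmmTMB and DHARMa; variable names changed) a sensible approach?

library(glmmTMB)
library(DHARMa)
m_gauss <- glmmTMB(dv ~ time * group + (1 | id), data = d)  # Gaussian baseline
m_gamma <- glmmTMB(dv ~ time * group + (1 | id), data = d,
                   family = Gamma(link = "log"))            # for positive, right-skewed outcomes
plot(simulateResiduals(m_gauss))  # QQ plot and residual checks on simulated residuals
plot(simulateResiduals(m_gamma))
AIC(m_gauss, m_gamma)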
I made a custom statistical model for a specific use, but now I'm struggling to explain it. I managed to make it work with SHAP, but SHAP doesn't explain non-additive models well. So, are there any known solutions like SHAP but for non-additive models?
I recently had a debate with a data scientist (with little statistical training) about GLMs. He believes that GLMs (such as logistic regression) are linear. I have some statistical training, and as far as I have heard, many of my peers don't consider GLMs to be linear.
I started probing further to substantiate my claim and came across this quote in a Quora answer.
Link to the post -> https://www.quora.com/Why-is-logistic-regression-considered-a-linear-model
> For the benefit of others, this answer is at odds with what statisticians have meant by "linear model" ever since the term "generalized linear model" was introduced. The answer a statistician would give to this question is "logistic regression *is not* a linear model." A statistician calls a model "linear" if the mean of the response is a linear function of the parameters, and this is clearly violated for logistic regression. Logistic regression is a *generalized linear model*. Generalized linear models are, despite their name, not generally considered linear models. They have a linear component, but the model itself is nonlinear due to the nonlinearity introduced by the link function.
I think this group has a significant number of statisticians, hence I wanted to ask you: do you consider GLMs to be linear? Do you agree with the quoted text above?
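To make sure I understand the quoted argument, I tried to see both scales in code (simulated data): the fitted values are an exactly linear function of x on the link (log-odds) scale, and clearly nonlinear on the probability scale.

set.seed(1)
x <- seq(-4, 4, length.out = 200)
y <- rbinom(200, 1, plogis(0.5 + 1.2 * x))  # simulate from a logistic model
fit <- glm(y ~ x, family = binomial())
eta <- predict(fit, type = "link")      # linear predictor: a straight line in x
mu  <- predict(fit, type = "response")  # after the inverse link: a sigmoid in x
par(mfrow = c(1, 2))
plot(x, eta, type = "l", main = "Link scale (linear)")
plot(x, mu,  type = "l", main = "Response scale (nonlinear)")

So both sides seem to have a point: the model is linear in the parameters on the link scale, but the mean of the response is not a linear function of the parameters, which is the criterion in the quote.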
I am estimating logit models with more than a few variables, and would like to neatly show average partial effects (APEs) for the models: basically, a table like the one the stargazer command would produce for any kind of lm or glm object, but with APEs instead of slope coefficients, and the APEs' standard errors rather than those of the slope coefficients.
My code goes something like this:
# Estimate the models (df46 is my analysis data frame)
fit1 <- glm(ctol ~ y16 + polscore + age,
            data = df46,
            family = quasibinomial(link = 'logit'))
fit2 <- glm(ctol ~ y16 * polscore + age,
            data = df46,
            family = quasibinomial(link = 'probit'))
fit3 <- glm(ctol ~ y16 + polscore + age + ed,
            data = df46,
            family = quasibinomial(link = 'logit'))
# Calculate average marginal effects with the 'margins' package
library(margins)
me_fit1 <- margins_summary(fit1)
me_fit2 <- margins_summary(fit2)
me_fit3 <- margins_summary(fit3)
The output of margins_summary, while itself a data.frame object, cannot just be passed to stargazer to produce the nice-looking output it gives for a glm object like fit1 in my code above.
> me_fit1
   factor     AME     SE       z      p   lower   upper
      age -0.0031 0.0005 -5.8426 0.0000 -0.0041 -0.0020
 polscore  0.0033 0.0031  1.0646 0.2871 -0.0028  0.0093
      y16  0.1184 0.0166  7.1271 0.0000  0.0859  0.1510
Trying to pass me_fit1 to stargazer simply prints the data.frame summary stats, as stargazer normally does with objects of this type.
> stargazer(me_fit1, type = 'text')
=========================================================
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
---------------------------------------------------------
AME 3 0.040 0.068 -0.003 0.0001 0.061 0.118
SE 3 0.007 0.009 0.001 0.002 0.010 0.017
z 3 0.783 6.489 -5.843 -2.389 4.096 7.127
p 3 0.096 0.166 0 0 0.1 0
lower 3 0.026 0.052 -0.004 -0.003 0.042 0.086
upper 3 0.053 0.085 -0.002 0.004 0.080 0.151
---------------------------------------------------------
I've tried using the coef and se options from stargazer to change the coefficients presented in stargazer(fit1) to APEs and their standard errors. While it's simple to show the APEs, trying to show their standard errors is problematic because it canno
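For reference, this is the direction I've been attempting, matching APEs to coefficients by name (whether stargazer lines everything up correctly, especially for the interaction model fit2, is exactly what I'm unsure about):

library(margins)
library(stargazer)
ape_se <- function(fit) {
  ms <- margins_summary(fit)
  list(coef = setNames(ms$AME, ms$factor),  # named APEs
       se   = setNames(ms$SE,  ms$factor))  # their standard errors
}
a1 <- ape_se(fit1); a2 <- ape_se(fit2); a3 <- ape_se(fit3)
stargazer(fit1, fit2, fit3,
          coef = list(a1$coef, a2$coef, a3$coef),
          se   = list(a1$se,   a2$se,   a3$se),
          omit = "Constant",  # APEs have no intercept row
          type = 'text')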
If you work a lot with GLMs, and you need a good review + explanation, I really like these two resources. They have a focus on application, but the second one will also walk you through some of the math of how these models work:
The UCLA IDRE data analysis examples page. When I was learning R + GLMs, this page was a lifesaver. I'd copy code from here to teach myself the workflow for various models: https://stats.idre.ucla.edu/other/dae/
Generalized Linear Models at Princeton's Woodrow Wilson School. A nice discussion of the math of various GLM examples (including survival and discrete choice models), and code in R and STATA to use them: https://data.princeton.edu/wws509/notes