Essentially, I have textual data that was originally coded to be multiclass, each instance in one of K=8 topics. What I have found is that there are sometimes "errors" in the sense that an instance repeats or has near-duplicates several times in the data, but is sometimes coded differently. "Errors" is in quotes because often, these discrepancies are "understandable," i.e. a document that discusses Hockey also happens to discuss Soccer, but there is no general "Sports" label, and so humans will rightly disagree on whether Hockey or Soccer is the dominant topic in that document.
To remedy this for future data, I wanted to train the model on this multiclass system, but have it output predictions in a multilabel fashion. Of course, the model can only be validated internally on its performance on the multiclass system; out-of-sample, though, I want to be able to print out both the multiclass label it decides on (if it's forced to) and whether it thinks multiple labels might be present.
The simplest way I thought to do this was to allow the model's uncertainty to reflect the potentially multi-label nature of an instance. In other words, in its multiclass prediction, it might assign Soccer Pr=0.45 and Hockey Pr=0.40, and all of the other 6 topics are squished into the remaining 0.15 from the softmax function. Normally we would pick the max(confidence) prediction to be the predicted label. Could I simply relax the rule and say "if Pr > 0.30, give it that label"? By pigeonhole this would mean documents get at most 3 labels. Is there a more robust way to do this / is it possible? Sorry if the question doesn't make sense.
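Concretely, the rule would look something like this minimal sketch (the function name and example logits are made up; the 0.30 cutoff is the one described above):

import torch

def multilabel_from_softmax(logits, threshold=0.30):
    """Return the forced single label plus every label whose probability clears the threshold."""
    probs = torch.softmax(logits, dim=-1)
    forced_label = int(probs.argmax())                                  # standard multiclass decision
    candidate_labels = (probs > threshold).nonzero(as_tuple=True)[0].tolist()
    return forced_label, candidate_labels

# Hypothetical logits over the 8 topics; the top two classes land close together
logits = torch.tensor([2.0, 1.9, 0.3, 0.2, 0.1, 0.0, -0.1, -0.2])
print(multilabel_from_softmax(logits))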
As the title states. If I have a multiclass classification task with 5 classes, with total instances for them being [1000,1000,500,250,100], then without any special care taken to address class imbalance, most methods would be far more concerned with learning about the larger classes, and might, for example, miss the 5th class entirely.
To address this, my understanding is that there are two main favored approaches. The first is to simply duplicate the undersampled (training set) instances until the training set is perfectly balanced, e.g. if the above example is my training set, I'd duplicate instances until I have [1000,1000,1000,1000,1000]. I'd then train like usual, and validate using the usual validation set, taking care that these instances are not twins of the training data.
The second approach is to augment the loss function such that the model is penalized relatively more for predictions away from ground-truth for the undersampled classes, e.g. in PyTorch's nn.CrossEntropyLoss, I'd add weight=[0.1,0.1,0.2,0.4,1]
such that being correct about 1 instance of the 100-instance class is as valuable as being correct about 10 instances of the 1,000-instance class.
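As a concrete sketch of the second approach (class counts taken from the example above, everything else illustrative):

import torch
import torch.nn as nn

# Inverse-frequency weights for class counts [1000, 1000, 500, 250, 100]
counts = torch.tensor([1000., 1000., 500., 250., 100.])
weights = counts.min() / counts                    # -> [0.1, 0.1, 0.2, 0.4, 1.0]
criterion = nn.CrossEntropyLoss(weight=weights)

# Hypothetical batch: logits for 4 examples over the 5 classes
logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 4, 4])
loss = criterion(logits, targets)                  # mistakes on class 4 weigh 10x those on class 0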
The main thing I'm getting from various sources is that in practice, these two approaches arrive at similar results. For already-large models that demand a lot of compute, if that's the case, would it not be more time-efficient to simply augment the loss function? Oversampling for the above example is nearly doubling the size of the training set.
On the other hand, I'm also seeing that the risk in augmenting the loss function is that if the batch size is small enough, it's still possible the model will not see enough examples of the undersampled classes to learn about them in a consistent way, even with them receiving a higher priority.
I guess my question is what are the concrete differences between them, and what works best in practice (in general, or task-specific)? Is there some other approach to handling class imbalance that's far better and I'm just missing it? Thanks for any insight you can offer!
Suppose there's a set of (multiclass) labeled documents, say 10 classes, and the team found intercoder reliability to be about 90%. I could account for this directly in PyTorch's nn.CrossEntropyLoss by defining label_smoothing=0.1
to give the model soft rather than hard targets.
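A minimal sketch of that setup (label_smoothing is the keyword in current PyTorch; the batch contents are made up):

import torch
import torch.nn as nn

# With smoothing 0.1 and 10 classes, the true class gets 1 - 0.1 + 0.1/10 = 0.91
# of the target mass and the other classes share the rest.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 10)                 # hypothetical batch of 8 documents, 10 classes
targets = torch.randint(0, 10, (8,))
loss = criterion(logits, targets)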
Beyond this though, what kind of performance should I interpret as "good enough"? Right now, with that stated intercoder reliability, my model is at approximately 89.8% accuracy / 90% F1 score. There are some examples of duplicate text entries in the training data that are coded differently, i.e. conditioned on two documents having the same input text, they get coded as different labels roughly 5.3% of the time.
Should I just expect that 90% is about as good as it gets, since the labels themselves are slightly noisier than would be the case in "pure" supervised learning?
I have a corpus of 30 years of labeled articles and am fitting a neural network to predict those labels by reading the instances' texts. To account for domain shift, I want to somehow tell the NN "yes, that article written 30 years ago still has some useful information, but it's far more important to understand the stuff written last year." I'm not sure how to do this with PyTorch's nn.CrossEntropyLoss function.
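For reference, one possible workaround (just a sketch; the exponential decay and half-life are arbitrary, illustrative choices): nn.CrossEntropyLoss only accepts per-class weights, so compute the unreduced loss and apply per-sample recency weights manually.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction="none")            # per-sample losses

def recency_weighted_loss(logits, targets, article_age_years, half_life=5.0):
    """Down-weight older articles with an exponential decay in age."""
    per_sample = criterion(logits, targets)                  # shape: (batch,)
    weights = 0.5 ** (article_age_years / half_life)         # 1.0 this year, 0.5 at 5 years old, ...
    return (weights * per_sample).sum() / weights.sum()

# Hypothetical batch: 4 articles, 8 classes, article ages in years
logits = torch.randn(4, 8)
targets = torch.tensor([0, 3, 7, 2])
ages = torch.tensor([0.5, 10.0, 25.0, 1.0])
loss = recency_weighted_loss(logits, targets, ages)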
I have a multiclass classifier but am getting low precision/recall/accuracy on only one class. The network is shallow (a baseline model). I just want to know what the cause might be (if this can't be verified, then never mind) and what I can do to improve it.
Hi - I have a multi-class problem with 168 classes - we are trying to predict the most likely class. We originally used XGBoost and got good results from an accuracy standpoint, but the model takes a while to run. We tried to use LGBM instead because it is faster than XGBoost in our experience. However, we found that while each round took less time, the error only ever increased. We tried using the `multi_logloss` objective as well as the `multi_error` objective and had a similar experience with both. We used a very similar set of parameters for the XGB and LGBM models.
Why might XGBoost always improve each round but LGBM always gets worse each round?
Hi Machine Learning Enthusiasts! I'd be super grateful if you could help me (a beginner) out on a few questions to get over my decision paralysis. I am currently in the planning/research phase of an App-Project (Flutter) in which I want to utilize IMUs (Inertial Measurement Units, "movement sensors") to analyze movements of athletes (kickboxers) and detect what techniques they perform. In order to achieve this I want to record samples of all techniques performed by a number of different athletes and train a model using that data; my use case thus seems to fall under 'Multiclass Classification using Supervised Learning/Labelled Data' (right?). I know some basic principles of Machine Learning but haven't worked with it yet, so I started out by reading a number of research papers about IMUs + Machine Learning to find out what techniques/models they employed. From that I was able to extract the following steps:
IMU Sensor Fusion using a Kalman or Complementary Filter (?)
Some kind of Feature-Extraction
Sliding Scanning Window to analyze continuous incoming data (I am thinking 250ms-500ms since techniques can be very fast)
Dynamic Time Warping in order to detect Techniques performed at varying speeds
Detection/Classification of Techniques using Machine Learning Model
Here are my questions:
Feature Extraction: Since I'm using 4 IMUs with 9 degrees of Freedom each at a Frequency of probably 50Hz, feeding the full dataset into the model is probably not a good idea. I was confronted with a variety of approaches here, going from simply deciding to use features like Accelerometer/Gyroscope min/max/mean/range/std, to using Principal Component Analysis (PCA) for Dimension Reduction & Linear Discriminant Analysis (LDA) for Feature Extraction, to even something fancy like using the full raw IMU-data to form a signal image which is then fed into a Convolutional Neural Network to extract a feature vector. What would be the advisable approach to take here?
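For reference, the simplest of those options looks roughly like this (a sketch; the window length, overlap and array shapes are assumptions based on the numbers above):

import numpy as np

def window_features(imu_window):
    """imu_window: (n_samples, 36) array, i.e. 4 IMUs x 9 DoF; ~25 samples covers 500 ms at 50 Hz."""
    return np.concatenate([
        imu_window.min(axis=0),
        imu_window.max(axis=0),
        imu_window.mean(axis=0),
        imu_window.std(axis=0),
        np.ptp(imu_window, axis=0),         # range per channel
    ])                                       # -> 180-dimensional feature vector

# Hypothetical continuous stream: 10 seconds at 50 Hz
stream = np.random.randn(500, 36)
window, step = 25, 12                        # 500 ms windows with roughly 50% overlap
features = np.stack([window_features(stream[i:i + window])
                     for i in range(0, len(stream) - window + 1, step)])
print(features.shape)                        # (number of windows, 180)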
ML Model: I am stuck on deciding whether to use a Support Vector Machine (SVM) or a Neural Net (NN). I am slightly leaning towards the SVM, as this seems to be the commonly recommended model for classification problems; it can also be trained very fast and its training problem is convex, so it converges reliably. On the other hand, it doesn't natively support multiclass classification but achieves this by breaking the problem down into several binary classification problems; I don't know if this is a big drawback or not, though.
I'm working on a binary classification task. I set the model output as softmax with 2 classes rather than sigmoid with 1 (softmax with 2 classes is, I think, equivalent to sigmoid?). Either way, this question remains:
The PyTorch Dataset that I have has items in __getitem__ that look like this:
{'input_ids': tensor([101, ..., 102]),
'attention_mask': tensor([1, ..., 1]),
'target': tensor(1)}
The target can equal 0 or 1, and input_ids and attention_mask both follow directly from a BertTokenizer object. The types and shapes of these items are:
item['input_ids'].shape # = torch.Size([512])
item['attention_mask'].shape # = torch.Size([512])
item['target'].shape # = torch.Size([])
I use a PyTorch LightningModule to house the BERT model; the forward pass sends in batches of items from the Dataset, as prepared in the training_step method of the LightningModule:
def training_step(self, batch, batch_idx):
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    target = batch["target"]
    loss, outputs = self(input_ids, attention_mask, target)
    self.log("train_loss", loss, prog_bar=True, logger=True)
    return {"loss": loss, "predictions": outputs, "target": target}
The forward method is just a simple linear layer atop a base BertModel. To illustrate my point, I've included three print() statements that aren't usually there:
def forward(self, input_ids, attention_mask, target=None):
    print(input_ids, input_ids.shape)
    print(attention_mask, attention_mask.shape)
    print(target, target.shape)
    bert_output = self.bert(input_ids, attention_mask=attention_mask)
    bert_output = self.text_1(bert_output.pooler_output)   # nn.Linear(768, 512)
    bert_output = self.bert_1_batchnorm(bert_output)
    bert_output = self.relu_activation(bert_output)
    bert_output = self.dropout(bert_output)
    bert_output = self.text_2(bert_output)                  # nn.Linear(512, n_classes=2)
    output = bert_output.squeeze(1)                         # no-op on a [batch, 2] tensor
    loss = 0
    if target is not None:
        loss = self.criterion(output, target)
    output = torch.softmax(output, dim=1)
    return loss, output
What I see from these print statements is... confusing:
print(input_ids, input_ids.shape)
# returns tensor([[101, ..., 0], ..., [101, ..., 102]]) and torch.Size([16,512])
print(attention_mask, attention_mask.shape)
Can I please get some multiclass classification ideas from you guys? Thank you!
Dear Members,
As I am not very comfortable with the backend functions of Keras, I would like to know if the block of code indicated below for calculating precision, recall and F1-score (and which can be found here and there in various threads) can be used as is for the case of multiclass classification.
I thank you in advance for your help.
from tensorflow.keras import backend as K   # or: from keras import backend as K

def recall_m(y_true, y_pred):
    # counts a class as predicted when its probability rounds to 1 (>= 0.5)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (all_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_score(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
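For what it's worth, a minimal way to wire these in (assuming one-hot targets and a softmax output; the toy model is purely illustrative). Keep in mind that Keras averages metric values over batches, so these are approximations of the epoch-level scores:

from tensorflow import keras

# Toy model just to show where the custom metrics plug in (10 classes, one-hot targets)
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy", precision_m, recall_m, f1_score],
)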
Can it be mathematically shown that binary classification is "easier" compared to multiclass classification and regression?
The way I see it: a binary classification problem is like a true/false exam, whereas a multiclass classification problem is like a multiple-choice exam. We all know that guessing on a true/false exam is easier than on a multiple-choice exam. In the same way, can we say that multiclass classification is inherently harder than binary classification? Are multiclass problems harder for statistical models/ML algorithms?
Similarly, when you have regression and your performance metric is MSE, "guessing" will punish your model even more. (E.g. on a math test where you actually have to calculate the answer, there is a much smaller ratio of correct answers compared to the set of all possible answers, and guessing the answer will seriously punish you.)
In the end, can we approach this comparison this way: an MSE problem (e.g. regression) will punish "guessing" (and mistakes in general) more than a multiclass problem, and a multiclass problem will punish "guessing" more than a binary classification problem? Can we say that, in general, it's harder to make multiclass classification models than binary classification models?
Are my conclusions correct? Are there any formal results that discuss this?
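One small concrete comparison (chance-level baselines only, not a formal result): a uniform random guesser's expected accuracy is 1/K and its cross-entropy is log K, so the baseline gets worse as K grows.

import math

for K in (2, 10):
    print(K, 1 / K, math.log(K))   # chance accuracy 0.5 vs 0.1; log-loss ~0.69 vs ~2.30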
Does anyone know where I can find data on the best-performing multiclass text classifiers? This is the only info I could find, and it seems it hasn't been updated since 2019.
I'm looking to use one of these for 3-class sentiment classification: negative, neutral, positive. Looking for data comparing the likes of:
Mpnet
Electra
RoBERTa
BERT
ALBERT
Or any other better models I haven't heard of.
On a side note, I see a lot of benchmarks such as SQuAD have ensembles of 2 or more models. How is this done? Do they get predictions from both and then take the highest output vector score between the two of them as the prediction?
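(Practices vary by leaderboard entry, but one common baseline is simply averaging the models' predicted probability vectors, i.e. soft voting; a sketch with made-up numbers:)

import numpy as np

# Hypothetical per-class probabilities from two models for one document
probs_model_a = np.array([0.10, 0.30, 0.60])    # negative, neutral, positive
probs_model_b = np.array([0.05, 0.55, 0.40])

ensemble = (probs_model_a + probs_model_b) / 2  # soft voting / probability averaging
prediction = ensemble.argmax()                   # -> 2 (positive) for these numbers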
Hi all,
I have been working on a project using multiclass classification with mostly tree-based models. The project was first implemented in Python and then implemented in R. I believe we ended up using the scikit-learn wrapper for XGBoost in Python, and when we switched to R, re-training native XGBoost on the same train/test split as in Python gave us better results overall (higher AUC and slightly higher accuracy) as well.
Now the problem that I am facing is, when I tried to import the same .model file into Python and run inference on the same test set again, just to check if the native implementation outperforms in both languages, I cannot seem to get the probabilities for all the classes. Can someone please help with this?
Model creation in R -
# Create XGBoost Dmatrix
X_train = subset(training, select = -c(DIAS_PAGO_FLAG))
y_train = subset(training, select = c(DIAS_PAGO_FLAG))
X_val = subset(validation, select = -c(DIAS_PAGO_FLAG))
y_val = subset(validation, select = c(DIAS_PAGO_FLAG))
dtrain = xgb.DMatrix(data = as.matrix(X_train), label = as.matrix(y_train))
dtest = xgb.DMatrix(data = as.matrix(X_val), label = as.matrix(y_val))
set.seed(0)
# XGBoost model training
num_class = 5
params = list(
  booster = "gbtree",
  # n_estimators = 250,
  max_depth = 15,
  gamma = 0,
  subsample = 1,
  colsample_bytree = 0.3,
  objective = "multi:softmax",
  eval_metric = "mlogloss",
  num_class = num_class,
  nthread = -1
)
model = xgboost(
  params = params,
  data = dtrain,
  nrounds = 100
)
y_pred = predict(model, dtest)
test_prediction = matrix(y_pred, nrow = num_class,
                         ncol = length(y_pred) / num_class) %>%
  t() %>%
  data.frame() %>%
  mutate(label = y_val + 1,
         max_prob = max.col(., "last"))
confusionMatrix(factor(test_prediction$max_prob),
                factor(test_prediction$label$DIAS_PAGO_FLAG))
roc_multi = multiclass.roc(test_prediction$max_prob, test_prediction$label$DIAS_PAGO_FLAG, direction = "auto")
print(roc_multi)
Multi-class area under the curve: 0.8931
# Saved model file to be reused in Python
xgb.save(model, "XGB_Model_R_v2.model")
So I got an AUC value of around 0.89. Running a similar implementation in Python throws an error -
# I already got the same train & test files loaded separat
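(For reference, loading the R-saved booster in Python looks roughly like this sketch; X_test is assumed to be the same test matrix used in R. Note that with objective multi:softmax, predict() returns class indices, while per-class probabilities generally require training with multi:softprob.)

import xgboost as xgb

bst = xgb.Booster()
bst.load_model("XGB_Model_R_v2.model")

dtest = xgb.DMatrix(X_test)          # X_test: same features and column order as in R
preds = bst.predict(dtest)           # class indices under multi:softmax;
                                     # an (n_samples, num_class) matrix under multi:softprob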
Hello all, I want to apply autoencoders for a multi-class classification problem. I am not able to find any sample code for the same. Can anyone please help me with a resource where autoencoders have been used for a similar use case? Dataset: https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps
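(No framework is specified in the post; as one common pattern, sketched here in PyTorch with made-up layer sizes, an autoencoder is trained for reconstruction and its encoder is then reused as a feature extractor for a softmax classifier.)

import torch
import torch.nn as nn

# Sizes below are placeholders, not taken from the Kaggle dataset
N_FEATURES, LATENT_DIM, N_CLASSES = 64, 16, 10

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_FEATURES, 128), nn.ReLU(),
                                     nn.Linear(128, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, N_FEATURES))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

ae = Autoencoder()
classifier = nn.Linear(LATENT_DIM, N_CLASSES)

# Step 1: train ae for reconstruction with nn.MSELoss().
# Step 2: train the classifier head on the latent codes with nn.CrossEntropyLoss(),
#         either on detached codes or by fine-tuning the encoder jointly.
x = torch.randn(32, N_FEATURES)              # hypothetical batch of flow features
recon, z = ae(x)
logits = classifier(z.detach())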
So I have been reading some papers, and it seems like for multi-class classification problems, all the ones I come across mainly use Precision, Recall and F1-score as the performance metrics. Using something like sklearn's classification report also shows those 3. So to me that suggests those are the 3 most widely used/informative ones.
I have also been doing some googling on metrics used for classification models and came across things like log loss and AUC, which can also be used to measure a model's performance.
So my question is, why are log loss & AUC not as 'popular' for multi-class classification? Is it because they just make much less sense when used for multi-class instead of binary classification? (Wouldn't something like a table/matrix just overcome this problem?) Or is it because of some other reasons?
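(Both metrics do extend to the multiclass case; a quick scikit-learn sketch with toy arrays:)

import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

y_true = np.array([0, 2, 1, 2, 0, 1])
# Hypothetical predicted probabilities over 3 classes (rows sum to 1)
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.6, 0.2],
                   [0.2, 0.2, 0.6],
                   [0.6, 0.3, 0.1],
                   [0.3, 0.5, 0.2]])

print(log_loss(y_true, y_prob))                           # multiclass log loss
print(roc_auc_score(y_true, y_prob, multi_class="ovr"))   # one-vs-rest, macro-averaged AUC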
Hello everybody
I have built a network for the classification of three classes. The network consists of a CNN followed by two fully-connected layers. The CNN consists of convolutional layers, followed by batch normalization, a ReLU activation, max pooling and dropout. The three classes are imbalanced (as can be seen in the confusion matrix below). I have optimized the parameters of the network to maximize AUC.
I'm calculating the AUC using macro- and micro-averaging. As can be seen in the ROC plot, the AUC is not that bad. On the other hand, the confusion matrix looks pretty bad, especially the first (low) class is badly predicted. The network tends to predict the majority class. As output of the network I'm getting a probability for each class. Then, I'm just taking the class according to the maximum probability for creating the confusion matrix.
I have tried using balanced class weights while training the network (in the fit method of Keras). This helped the network predict the minority class(es) more often, but on the other hand the AUC decreased.
Is there a way to infer probability thresholds from the ROC plot? I think for two classes the optimal probability threshold can be inferred from the ROC plot by taking max(TPR - FPR), but here I have three classes... Or is there another method?
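(One common extension, sketched below, is to treat each class one-vs-rest and pick a per-class threshold maximizing Youden's J = TPR - FPR; whether that is appropriate for heavily imbalanced classes is a separate question.)

import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import label_binarize

def per_class_thresholds(y_true, y_prob, classes=(0, 1, 2)):
    """One-vs-rest threshold per class via max(TPR - FPR)."""
    y_bin = label_binarize(y_true, classes=list(classes))
    thresholds = {}
    for i, c in enumerate(classes):
        fpr, tpr, thr = roc_curve(y_bin[:, i], y_prob[:, i])
        thresholds[c] = thr[np.argmax(tpr - fpr)]
    return thresholds

# Usage idea: flag a class whenever its probability exceeds its own threshold,
# or rescale each class's probability by its threshold before taking the argmax.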
I'm using scikit-learn and I don't really know too much about all this. I'm very new to it and am learning through trial and error. I'm trying to predict what crime is most likely to happen and where. I have three feature columns (state, district, year) and 5 other columns to predict (5 classes of crime). I get the error when I try to use rfc.score(x, y) and also when I try to use sklearn's classification report, etc. I understand sklearn's metrics can't be used here because multiclass-multioutput isn't supported. I'm not very sure how to proceed now.
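(One workaround, sketched with a tiny synthetic stand-in for the real data, is to score each output column separately, since sklearn's single-target metrics do work per column.)

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Synthetic stand-in for the (state, district, year) -> 5 crime-class setup
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 3))           # state, district, year (encoded)
Y = rng.integers(0, 4, size=(200, 5))            # 5 output columns, each its own multiclass target

rfc = RandomForestClassifier(n_estimators=50).fit(X, Y)    # handles multioutput natively
Y_pred = rfc.predict(X)                                    # shape (n_samples, 5)

for i in range(Y.shape[1]):                      # report metrics per output column
    print(f"output column {i}")
    print(classification_report(Y[:, i], Y_pred[:, i]))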
In trying to make my sizing and risk management functions, I was using the predict_proba function from sklearn in order to generate probabilities, but this doesn't work well for multiclass problems (especially because my dataset is imbalanced). Instead of a probability, it returns the mean vote of the trees, generally hovering between 0.4 and 0.6 for the majority class.
Because of this, I don't have a 'real' probability from my predictions, and I can't appropriately size my bets according to probability. What approach should I take here? I am aware that my cost function isn't linear. Should the system be prediction dependent or prediction independent?
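(One common remedy, sketched below on synthetic data, is to wrap the classifier in probability calibration so that predict_proba behaves more like a real probability; the random-forest base model is an assumption based on the "vote of the trees" description.)

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced 3-class stand-in for the real dataset
X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

base = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)         # rows sum to 1; more usable for bet sizing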
Hi,
I'm trying to perform face recognition as multi-class classification on a dataset composed of 10 classes. Each class is made of 21 photos (17 for training and 3 for validation). I'm building a CNN from scratch and, after many models where the loss curve didn't decrease at all, I got these results.
Given that this is a case of overfitting, what can I do to reduce it? I tried adding dropout or using data augmentation, but I got even worse results!
The architecture of my model is similar to VGGNet and I trained it for 30 epochs.
EDIT: I'm using this dataset (http://www.scface.org)