Can I convert a multiclass classification system to multilabel?

Essentially, I have textual data that was originally coded as multiclass, with each instance assigned to one of K=8 topics. What I have found is that there are sometimes "errors", in the sense that an instance appears several times in the data (as exact or near-duplicates) but is sometimes coded differently. "Errors" is in quotes because these discrepancies are often understandable: a document that discusses Hockey also happens to discuss Soccer, but there is no general "Sports" label, so humans will rightly disagree on whether Hockey or Soccer is the dominant topic in that document.

To remedy this for future data, I wanted to train the model on this multiclass system, but have it output predictions in a multilabel fashion. Of course, the model can only be validated internally on its performance on the multiclass system; out-of-sample, though, I want to be able to print out both the multiclass label it decides on (if it's forced to) and whether it thinks multiple labels might be present.

The simplest way I thought to do this was to allow the model's uncertainty to reflect the potentially multilabel nature of an instance. In other words, in its multiclass prediction it might assign Soccer Pr=0.45 and Hockey Pr=0.40, with the other 6 topics squeezed into the remaining 0.15 by the softmax. Normally we would pick the argmax (highest-confidence) prediction as the predicted label. Could I simply relax the rule and say "if Pr > 0.30, give it that label"? By the pigeonhole principle this would mean a document gets at most 3 topics. Is there a more robust way to do this / is it possible? Sorry if the question doesn't make sense.
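
For concreteness, the decision rule I have in mind looks roughly like this (a sketch; the probabilities and the 0.30 cutoff are placeholders, and the classifier is anything that returns a softmax distribution over the 8 topics):

import torch

# probs: softmax output for one document over the K = 8 topics
probs = torch.tensor([0.45, 0.40, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01])

forced_choice = int(probs.argmax())                              # the usual multiclass prediction
candidate_labels = (probs > 0.30).nonzero().flatten().tolist()   # multilabel reading of the same output

print(forced_choice, candidate_labels)   # 0, [0, 1]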

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/eadala
πŸ“…︎ Dec 16 2021
🚨︎ report
(Multiclass classification) Handling class imbalance with oversampling versus using class weights?

As the title states. If I have a multiclass classification task with 5 classes, total instances for them being [1000,1000,500,250,100], without any special care taken to address class imbalance, most methods would be far more concerned with learning about the larger classes, and might miss for example the 5th class entirely.

To address this, my understanding is there are two main favored approaches. The first is to simply duplicate the under-represented (training set) instances until the training set is perfectly balanced, e.g. if the above example is my training set, I'd duplicate instances until I have [1000,1000,1000,1000,1000]. I'd then train as usual and validate on the usual validation set, taking care that these instances are not twins of ones in the training data.
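
For reference, one way to get the same effect without physically duplicating rows is PyTorch's WeightedRandomSampler, which simply draws minority-class instances more often (the class counts are the ones above; train_labels and train_dataset are placeholders):

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

class_counts = torch.tensor([1000., 1000., 500., 250., 100.])
class_weights = 1.0 / class_counts                    # rarer class -> higher draw probability
sample_weights = class_weights[train_labels]          # train_labels: LongTensor of class indices, one per instance

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)     # sampling with replacement amounts to oversampling
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)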

The second approach is to augment the loss function such that the model is penalized relatively more for predictions away from ground-truth for the undersampled classes, e.g. in PyTorch's nn.CrossEntropyLoss, I'd add weight=[0.1,0.1,0.2,0.4,1] such that being correct about 1 instance of the 100-instance class is as valuable as being correct about 10 instances of the 1,000-instance class.
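
In code, that second approach is just the following (weights copied from above; model, inputs and targets are placeholders):

import torch
import torch.nn as nn

# inverse-frequency weights for class counts [1000, 1000, 500, 250, 100], scaled so the rarest class gets 1
weights = torch.tensor([0.1, 0.1, 0.2, 0.4, 1.0])
criterion = nn.CrossEntropyLoss(weight=weights)

logits = model(inputs)                    # shape [batch, 5]
loss = criterion(logits, targets)         # targets: LongTensor of shape [batch]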

The main thing I'm getting from various sources is that in practice, these two approaches arrive at similar results. For already-large models that demand a lot of compute, if that's the case, would it not be more time-efficient to simply augment the loss function? Oversampling for the above example is nearly doubling the size of the training set.

On the other hand, I'm also seeing that the risk in augmenting the loss function is if the batch size is small enough, it's still possible the model will not see enough examples of the undersampled classes to learn about them in a consistent way, even with them receiving a higher priority.

I guess my question is what are the concrete differences between them, and what works best in practice (in general, or task-specific)? Is there some other approach to handling class imbalance that's far better and I'm just missing it? Thanks for any insight you can offer!

πŸ‘︎ 7
πŸ’¬︎
πŸ‘€︎ u/eadala
πŸ“…︎ Nov 17 2021
🚨︎ report
(Supervised Learning; Multiclass Classification) If intercoder reliability is an issue, should it lower our expectations for model performance?

Suppose there's a set of (multiclass) labeled documents, say 10 classes, and the team found intercoder reliability to be about 90%. I could account for this directly in PyTorch's nn.CrossEntropyLoss by setting label_smoothing=0.1 to give the model soft rather than hard targets.
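
For reference, that setup is just the following (whether 10% disagreement really maps to a smoothing value of 0.1 is itself a judgment call):

import torch.nn as nn

# with label_smoothing = 0.1 and K = 10 classes, the true class target becomes 0.9 + 0.1/10 = 0.91
# and every other class gets 0.1/10 = 0.01
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)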

Beyond this though, what kind of performance should I interpret as "good enough"? Right now, with that stated intercoder reliability, my model is at approximately 89.8% accuracy / 90% F1 score. There are some examples of duplicate text entries in the training data that are coded differently, i.e. conditioned on two documents having the same input text, they get coded as different labels roughly 5.3% of the time.

Should I just expect that 90% is about as good as it gets, since the labels themselves are slightly weaker than they would be in "pure" supervised learning?

πŸ‘︎ 7
πŸ’¬︎
πŸ‘€︎ u/eadala
πŸ“…︎ Nov 13 2021
🚨︎ report
(Multiclass classification) Adding weights to the loss function to account for time

I have a corpus of 30 years of labeled articles and am fitting a neural network to predict those labels by reading the instances' texts. To account for domain shift, I want to in some way tell the NN "yes, that article written 30 years ago still has some useful information, but it's far more important to understand the stuff written last year." I'm not sure how to do it with PyTorch's nn.CrossEntropyLoss function, but:

  • Would it make sense to weight the loss function such that more recent observations receive higher weight? This would force the model to pay closer attention to those observations and worry less about older ones, without discounting them entirely (see the sketch after this list). If this makes sense,
  • Would it make more sense to apply an inverse-age weighting scheme (w ∝ 1/age), i.e. documents 1 year old are twice as valuable as 2-year-old documents, three times as valuable as 3-year-old ones, four times as valuable as 4-year-old ones, etc.?
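
nn.CrossEntropyLoss doesn't take per-sample weights directly, but reduction="none" gives one loss per instance that can then be averaged with recency weights; a minimal sketch, assuming each batch carries an age_years float tensor (all names are placeholders):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(reduction="none")        # one loss value per instance

def recency_weighted_loss(logits, targets, age_years):
    per_sample = criterion(logits, targets)              # shape [batch]
    w = 1.0 / age_years.clamp(min=1.0)                   # a 1-year-old doc weighs twice a 2-year-old one
    # alternative: w = torch.exp(-0.1 * age_years)       # exponential decay instead of 1/age
    return (w * per_sample).sum() / w.sum()              # weighted mean keeps the loss scale stable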
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/eadala
πŸ“…︎ Nov 10 2021
🚨︎ report
Multiclass classification low performance on one class

I have a multiclass classifier but I'm getting low precision/recall/accuracy on only one class. The network is shallow (a baseline model). I just want to know what the cause might be (if this can't be verified, then never mind) and what I can do to improve it?

πŸ‘︎ 9
πŸ’¬︎
πŸ“…︎ Sep 28 2021
🚨︎ report
[D] LGBM vs. XGBoost for Multiclass Classification: XGBoost error constantly decreases but LGBM only increases

Hi - I have a multi-class problem with 168 classes - we are trying to predict the most likely class. We originally used XGBoost and got good results from an accuracy standpoint, but the model takes a while to run. We tried to use LGBM instead because it is faster than XGBoost in our experience. However, we found that while each round took less time, the error only ever increased. We tried using the `multi_logloss` objective as well as the `multi_error` objective and had a similar experience with both. We used a very similar set of parameters for the XGB and LGBM models.

Why might XGBoost always improve each round but LGBM always gets worse each round?

πŸ‘︎ 11
πŸ’¬︎
πŸ‘€︎ u/jsxgd
πŸ“…︎ Aug 04 2021
🚨︎ report
IMU Multiclass Classification

Hi Machine Learning Enthusiasts πŸ˜„ I'd be super grateful if you could help me (a beginner) out on a few questions to get over my decision paralysis. I am currently in the planning/research phase of an app project (Flutter) in which I want to use IMUs (Inertial Measurement Units, "movement sensors") to analyze the movements of athletes (kickboxers) and detect which techniques they perform. To achieve this I want to record samples of all techniques performed by a number of different athletes and train a model on that data, so my use case seems to fall under 'Multiclass Classification using Supervised Learning/Labelled Data' (right?). I know some basic principles of machine learning but haven't worked with it yet, so I started by reading a number of research papers about IMUs + machine learning to find out what techniques/models they employed. From those I was able to extract the following steps:

  • IMU sensor fusion using a Kalman or complementary filter (?)
  • Some kind of feature extraction
  • A sliding scanning window to analyze continuous incoming data (I am thinking 250-500 ms, since techniques can be very fast)
  • Dynamic time warping to detect techniques performed at varying speeds
  • Detection/classification of techniques using a machine learning model

Here are my questions:

Feature Extraction: Since I'm using 4 IMUs with 9 degrees of Freedom each at a Frequency of probably 50Hz, feeding the full dataset into the model is probably not a good idea. I was confronted with a variety of approaches here, going from simply deciding to use features like Accelerometer/Gyroscope min/max/mean/range/std, to using Principal Component Analysis (PCA) for Dimension Reduction & Linear Discriminant Analysis (LDA) for Feature Extraction, to even something fancy like using the full raw IMU-data to form a signal image which is then fed into a Convolutional Neural Network to extract a feature vector. What would be the advisable approach to take here?
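
For concreteness, the simple end of that spectrum would look roughly like this (window length, channel count and the feature set are placeholders, not a recommendation over PCA/LDA or a CNN):

import numpy as np

def window_features(window):
    """window: array of shape [n_samples, n_channels],
    e.g. 25 samples x 36 channels (4 IMUs x 9 axes at 50 Hz over a 500 ms window)."""
    feats = [window.min(axis=0), window.max(axis=0),
             window.mean(axis=0), window.std(axis=0),
             window.max(axis=0) - window.min(axis=0)]   # per-channel range
    return np.concatenate(feats)                        # feature vector of length 5 * n_channels

def sliding_windows(signal, length=25, step=12):
    """Split a continuous recording [n_total, n_channels] into ~50%-overlapping windows."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]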

ML Model: I am stuck on deciding whether to use a Support Vector Machine (SVM) or a Neural Net (NN). I am slightly leaning towards an SVM, as this seems to be a commonly recommended model for classification problems; it can also be trained very quickly and its training problem is convex, so it reliably converges (to the optimum of its own objective, at least). On the other hand, it doesn't natively support multiclass classification but achieves it by breaking the problem down into several binary classification problems; I don't know whether that is a big drawback or not, though. Ne

... keep reading on reddit ➑

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/t0b1_alexander
πŸ“…︎ Aug 30 2021
🚨︎ report
(PyTorch & Transformers) Why don't outputs in a BERT model for binary / multiclass classification need to be reshaped to match inputs?

I'm working on a binary classification task. I set the model output as softmax with 2 classes rather than sigmoid with 1 (softmax with 2 I think is equivalent to sigmoid?). Either way, this question remains:

The PyTorch Dataset that I have has items in __getitem__ that look like this:

{'input_ids': tensor([101, ..., 102]),
'attention_mask': tensor([1, ..., 1]),
'target': tensor(1)}

The target can equal 0 or 1, and input_ids and attention_mask both follow directly from a BertTokenizer object. The types and shapes of these items are:

item['input_ids'].shape  # = torch.Size([512])
item['attention_mask'].shape  # = torch.Size([512])
item['target'].shape  # = torch.Size([])

I use a PyTorch LightningModule to house the BERT model; the forward pass sends in batches of items from the Dataset, as prepared in the training_step method of the LightningModule:

def training_step(self, batch, batch_idx):
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    target = batch["target"]
    loss, outputs = self(input_ids, attention_mask, target)
    self.log("train_loss", loss, prog_bar=True, logger=True)
    return {"loss": loss, "predictions": outputs, "target": target}

The forward method is just a simple linear layer atop a base BertModel. To illustrate my point, I've included three print() statements that aren't usually there:

def forward(self, input_ids, attention_mask, target=None):
    print(input_ids, input_ids.shape)
    print(attention_mask, attention_mask.shape)
    print(target, target.shape)
    bert_output = self.bert(input_ids, attention_mask=attention_mask)
    bert_output = self.text_1(bert_output.pooler_output)  # nn.Linear(768, 512)
    bert_output = self.bert_1_batchnorm(bert_output)
    bert_output = self.relu_activation(bert_output)
    bert_output = self.dropout(bert_output)
    bert_output = self.text_2(bert_output)  # nn.Linear(512, n_classes=2)

    output = bert_output.squeeze(1)  # no-op here: dim 1 has size n_classes=2, not 1
    loss = 0
    if target is not None:
        loss = self.criterion(output, target)
    output = torch.softmax(output, dim=1)

    return loss, output
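
For reference, my understanding of the expected shapes (assuming self.criterion is nn.CrossEntropyLoss, which is what the 2-class softmax setup suggests):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(16, 2)              # [batch, n_classes], like the output of text_2
targets = torch.randint(0, 2, (16,))     # [batch] integer class labels, like the batched 'target'
loss = criterion(logits, targets)        # accepted as-is, no reshaping needed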

What I see from these print statements is... confusing:

print(input_ids, input_ids.shape)  
# returns tensor([[101, ..., 0], ..., [101, ..., 102]]) and torch.Size([16,512])
print(attention_mask, attention_mas
... keep reading on reddit ➑

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/eadala
πŸ“…︎ Sep 01 2021
🚨︎ report
Multiclass classification ideas

Can I please get some multiclass classification ideas from you guys? Thank you!

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/emaaarrrr
πŸ“…︎ Jul 19 2021
🚨︎ report
"[Discussion]" Custom metrics with Keras: precision, recall and F1-score for imbalanced multiclass classification

Dear Members,

As I am not very comfortable with the backend functions of Keras, I would like to know if the block of code indicated below for calculating precision, recall and F1-score (and which can be found here and there in various threads) can be used as is for the case of multiclass classification.

I thank you in advance for your help.

def recall_m(y_true, y_pred):
    y_true = K.ones_like(y_true) 
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    
    recall = true_positives / (all_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    y_true = K.ones_like(y_true) 
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_score(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2*((precision*recall)/(precision+recall+K.epsilon()))
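
One thing I'm unsure about: the K.ones_like(y_true) lines appear to overwrite the ground truth with ones, which would make precision_m trivially close to 1. For comparison, the multiclass numbers I'm after can also be computed outside the training loop with scikit-learn's macro averaging (y_true and y_prob are placeholders for the labels and the model's predicted probabilities):

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_pred = np.argmax(y_prob, axis=1)       # y_prob: model.predict() output, shape [n_samples, n_classes]
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")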
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/lovalery
πŸ“…︎ Jul 02 2021
🚨︎ report
[D] can it be mathematically shown that binary classification is "easier" compared to multiclass classification and regression?

Can it be mathematically shown that binary classification is "easier" compared to multiclass classification and regression?

The way I see it: a binary classification problem is like a true/false exam, while a multiclass classification problem is like a multiple choice exam. We all know that guessing on a true/false exam is easier than on a multiple choice exam. In the same way, can we say that binary classification is inherently easier than multiclass classification? Are multiclass problems harder for statistical models/ML algorithms?

Similarly, when you have regression and your performance metric is MSE, "guessing" will punish your model even more. (E.g. on a math test where you actually have to calculate the answer, the set of correct answers is a much smaller fraction of all possible answers, and guessing will seriously punish you.)

In the end, can we approach the comparison this way: a regression problem scored with MSE will punish "guessing" (and mistakes in general) more than a multiclass problem, and a multiclass problem will punish "guessing" more than a binary classification problem? Can we say that, in general, it's harder to build multiclass classification models than binary classification models?

Are my conclusions correct? Are there any formal results that discuss this?
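
One way to make the "guessing" intuition precise, at least for uniform random guessing (a sketch, not a formal result):

\[
\mathbb{E}[\text{accuracy of a uniform guess}] = \frac{1}{K}, \qquad
\mathbb{E}\!\left[-\log p_y\right] = \log K \ \text{ for the uniform predictor } p_k = \tfrac{1}{K},
\]
\[
\min_{c}\ \mathbb{E}\!\left[(y - c)^2\right] = \operatorname{Var}(y), \ \text{ attained at } c = \mathbb{E}[y].
\]

So a blind guess scores 1/2 accuracy (log-loss log 2) for binary, only 1/K (log-loss log K) for K classes, and in regression even the best constant guess still pays the full variance of the target; whether that makes the learning problem itself "harder" is a separate question, since Bayes error and sample complexity also depend on the class structure.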

πŸ‘︎ 7
πŸ’¬︎
πŸ‘€︎ u/SQL_beginner
πŸ“…︎ Apr 15 2021
🚨︎ report
Current SOTA for Multiclass Text Classification?

Does anyone know where I can find data on the best-performing multiclass text classifiers? This is the only info I could find, and it seems it hasn't been updated since 2019.

I'm looking to use one of these for 3-class sentiment classification (negative, neutral, positive). Looking for data comparing the likes of:

Mpnet

Electra

RoBERTa

BERT

ALBERT

Or any other better models i haven't heard of.

On a side note, I see a lot of benchmarks such as SQuAD have ensembles of 2 or more models. How is this done? Do they get predictions from both and then take the highest output vector score between the two of them as the prediction?
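
(From what I can tell, a common recipe, though not necessarily what every leaderboard entry does, is soft voting: average the per-class probabilities from each model and take the argmax of the average, rather than comparing the two models' top scores directly. A toy example with made-up numbers:)

import numpy as np

probs_a = np.array([[0.6, 0.3, 0.1]])        # softmax output of model A, shape [n_examples, n_classes]
probs_b = np.array([[0.4, 0.5, 0.1]])        # softmax output of model B

ensemble_probs = (probs_a + probs_b) / 2     # simple soft-voting average
predictions = ensemble_probs.argmax(axis=1)  # class 0 here, since 0.5 > 0.4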

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/rpatel9
πŸ“…︎ Mar 26 2021
🚨︎ report
Multiclass classification XGBoost output

Hi all,

I have been working on a project using multiclass classification with mostly tree-based models. The project was first implemented in Python and then re-implemented in R. I believe we used the scikit-learn wrapper for XGBoost in Python; when we switched to R and re-trained native XGBoost on the same train/test split, we got better results overall (higher AUC and slightly higher accuracy).

Now the problem that I am facing is, when I tried to import the same .model file into Python and run inference on the same test set again, just to check if the native implementation outperforms in both languages, I cannot seem to get the probabilities for all the classes. Can someone please help with this?

Model creation in R -

# Create XGBoost Dmatrix
X_train = subset(training, select = -c(DIAS_PAGO_FLAG))
y_train = subset(training, select = c(DIAS_PAGO_FLAG))
X_val = subset(validation, select = -c(DIAS_PAGO_FLAG))
y_val = subset(validation, select = c(DIAS_PAGO_FLAG))

dtrain = xgb.DMatrix(data = as.matrix(X_train), label = as.matrix(y_train))
dtest = xgb.DMatrix(data = as.matrix(X_val), label = as.matrix(y_val))

set.seed(0)

# XGBoost model training

num_class = 5
params = list(
  booster="gbtree",
  # n_estimators = 250,
  max_depth=15,
  gamma=0,
  subsample= 1,
  colsample_bytree=0.3,
  objective="multi:softmax",
  eval_metric="mlogloss",
  num_class=num_class,
  nthread = -1
)

model = xgboost(
  params = params,
  data = dtrain,
  nrounds = 100
)

y_pred = predict(model, dtest)

test_prediction = matrix(y_pred, nrow = num_class,
                         ncol=length(y_pred)/num_class) %>%
  t() %>%
  data.frame() %>%
  mutate(label = y_val + 1,
         max_prob = max.col(., "last"))

confusionMatrix(factor(test_prediction$max_prob),
                factor(test_prediction$label$DIAS_PAGO_FLAG))

roc_multi = multiclass.roc(test_prediction$max_prob, test_prediction$label$DIAS_PAGO_FLAG, direction = "auto")
print(roc_multi)

Multi-class area under the curve: 0.8931

# Saved model file to be reused in Python
xgb.save(model, "XGB_Model_R_v2.model")
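
For the Python side, what I'm aiming for is roughly the following (a sketch; note that per-class probabilities generally require objective = "multi:softprob", since "multi:softmax" makes predict() return only the winning class label, and depending on the xgboost version the probabilities may come back flattened to length n_rows * num_class):

import xgboost as xgb

booster = xgb.Booster()
booster.load_model("XGB_Model_R_v2.model")    # file written by xgb.save() in R

dtest = xgb.DMatrix(X_test)                   # X_test: same feature columns, in the same order as in R
preds = booster.predict(dtest)                # multi:softprob -> per-class probabilities
                                              # multi:softmax  -> a single class label per row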

So I got an AUC of around 0.89 in R, but running a similar implementation in Python throws an error:

# I already got the same train & test files loaded separat
... keep reading on reddit ➑

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/goddySHO
πŸ“…︎ Dec 23 2020
🚨︎ report
Multiclass Classification using Autoencoders

Hello all, I want to apply autoencoders to a multi-class classification problem. I am not able to find any sample code for this. Can anyone point me to a resource where autoencoders have been used for a similar use case? Dataset: https://www.kaggle.com/jsrojas/ip-network-traffic-flows-labeled-with-87-apps
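
For context, the pattern I have in mind (a sketch; layer sizes, n_features and n_classes are placeholders to be set from the actual dataset) is to pretrain an autoencoder for reconstruction and then reuse its encoder as the backbone of a softmax classifier:

import torch.nn as nn

n_features, n_classes = 64, 87   # placeholders

encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU())
decoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, n_features))

autoencoder = nn.Sequential(encoder, decoder)                  # stage 1: train on X -> X with nn.MSELoss
classifier = nn.Sequential(encoder, nn.Linear(16, n_classes))  # stage 2: train with nn.CrossEntropyLoss
                                                               # (optionally freeze the encoder first)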

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/RoutineWin
πŸ“…︎ Jan 21 2021
🚨︎ report
[D] Question on performance metric for multiclass classification

So I have been reading some papers, and it seems like for multi-class classification problems, the ones I come across mainly use precision, recall and F1-score as the performance metrics. Using something like sklearn's classification report also shows those 3. So to me that suggests those are the 3 most widely used/informative ones.

I've also been googling metrics used for classification models and came across things like log loss and AUC, which can also be used to measure a model's performance.

So my question is, why are log loss and AUC not as 'popular' for multi-class classification? Is it because they just make much less sense when used for multi-class instead of binary classification? (Wouldn't something like a per-class table/matrix overcome this problem?) Or is it for some other reason?
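
For reference, the multiclass versions I came across look like this (y_true and y_prob are placeholders for the integer labels and the predicted probability matrix):

from sklearn.metrics import log_loss, roc_auc_score

# y_true: labels in {0, ..., K-1}; y_prob: predicted probabilities, shape [n_samples, K]
ll = log_loss(y_true, y_prob)                       # multiclass cross-entropy
auc = roc_auc_score(y_true, y_prob,
                    multi_class="ovr",              # one-vs-rest AUC per class...
                    average="macro")                # ...macro-averaged over the K classes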

πŸ‘︎ 4
πŸ’¬︎
πŸ‘€︎ u/Cedar_Wood_State
πŸ“…︎ Sep 11 2020
🚨︎ report
Calibrating probability thresholds for multiclass classification

Hello everybody

I have built a network for the classification of three classes. The network consists of a CNN followed by two fully-connected layers. The CNN consists of convolutional layers, followed by batch normalization, a ReLU activation, max pooling and dropout. The three classes are imbalanced (as can be seen in the confusion matrix below). I have optimized the parameters of the network to maximize AUC.

I'm calculating the AUC using macro- and micro-averaging. As can be seen in the ROC plot, the AUC is not that bad. On the other hand, the confusion matrix looks pretty bad; in particular, the first (low) class is predicted poorly. The network tends to predict the majority class. As output the network gives a probability for each class, and I'm just taking the class with the maximum probability when creating the confusion matrix.

I have tried using balanced class weights while training the network (in the fit method of Keras). This helped in that the network predicts the minority class(es) more often, but on the other hand the AUC decreased.

Is there a way to infer probability thresholds from the ROC plot? I think for two classes the optimal probability threshold can be inferred from the ROC plot by taking the max(TPR - FPR) but here I have three classes... Or is there another method?
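
For concreteness, the extension of max(TPR - FPR) I have in mind would treat each class one-vs-rest (a sketch; y_true and y_prob are placeholders, and whether this calibrates better than class weighting is something I'd still have to validate):

import numpy as np
from sklearn.metrics import roc_curve

# y_true: labels in {0, 1, 2}; y_prob: predicted probabilities, shape [n_samples, 3]
thresholds = []
for k in range(y_prob.shape[1]):
    fpr, tpr, thr = roc_curve((y_true == k).astype(int), y_prob[:, k])
    thresholds.append(thr[np.argmax(tpr - fpr)])     # Youden's J, one-vs-rest per class

# predict the class with the largest margin over its own threshold
margins = y_prob - np.array(thresholds)
y_pred = margins.argmax(axis=1)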

[Figures: ROC curve and confusion matrix]

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/BlackHawk1001
πŸ“…︎ Dec 21 2020
🚨︎ report
How to resolve the β€œmulticlass-multioutput is not supported” error when calculating score (random forest classification)? I’m a beginner to ML and I’m learning on my own.

I'm using scikit-learn and I don't really know too much about all this. I'm very new to it and am learning through trial and error. I'm trying to predict what crime is most likely to happen and where. I have three feature columns (state, district, year) and 5 other columns to predict (5 classes of crime). I get the error when I try to use rfc.score(x, y) and also when I try to use sklearn's classification report, etc. I understand sklearn's metrics can't be used here because multiclass-multioutput isn't supported. I'm not very sure how to proceed now.
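
One workaround consistent with that error message is to score each of the 5 output columns separately, since sklearn's metrics are defined for a single target at a time (a sketch; rfc, X_test and Y_test are placeholders, with Y_test a DataFrame holding the 5 crime columns):

from sklearn.metrics import accuracy_score, classification_report

Y_pred = rfc.predict(X_test)                   # shape [n_samples, 5], one column per crime class
for j, name in enumerate(Y_test.columns):
    print(name, accuracy_score(Y_test.iloc[:, j], Y_pred[:, j]))
    print(classification_report(Y_test.iloc[:, j], Y_pred[:, j]))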

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/Crimsonlamp
πŸ“…︎ Oct 21 2020
🚨︎ report
Using Deep Learning for End to End Multiclass Text Classification lionbridge.ai/articles/us…
πŸ‘︎ 6
πŸ’¬︎
πŸ‘€︎ u/Shirappu
πŸ“…︎ Oct 02 2020
🚨︎ report
Risk Management/Bet Sizing from multiclass classification model

In trying to build my sizing and risk management functions, I was using the predict_proba function from sklearn to generate probabilities, but this doesn't work well for multiclass problems (especially because my dataset is imbalanced). Instead of a probability, it returns the mean vote of the trees, generally hovering between 0.4 and 0.6 for the majority class.

Because of this, I don't have a 'real' probability from my predictions, and I can't appropriately size my bets according to probability. What approach should I take here? I am aware that my cost function isn't linear. Should the system be prediction dependent or prediction independent?
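
One standard option for getting scores that behave more like probabilities (worth validating on a held-out period before sizing anything with it) is to calibrate the classifier, e.g. with sklearn's CalibratedClassifierCV, which handles multiclass one-vs-rest internally; a sketch with placeholder names:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=500, class_weight="balanced")
calibrated = CalibratedClassifierCV(rf, method="isotonic", cv=5)   # method="sigmoid" for smaller datasets
calibrated.fit(X_train, y_train)

probs = calibrated.predict_proba(X_test)       # calibrated per-class probabilities for bet sizing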

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/ryohh
πŸ“…︎ Jun 30 2020
🚨︎ report
Using Deep Learning for End to End Multiclass Text Classification lionbridge.ai/articles/us…
πŸ‘︎ 22
πŸ’¬︎
πŸ‘€︎ u/LimarcAmbalina
πŸ“…︎ Apr 02 2020
🚨︎ report
FastAI With TPU In PyTorch For Multiclass Image Classification
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/analyticsindiam
πŸ“…︎ Aug 07 2020
🚨︎ report
Multiclass classification and Error Analysis on Jupyter notebook with scikit-learn. Let me know what you guys think youtu.be/5KyH6v8oKNQ
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/bazziapps
πŸ“…︎ Jun 17 2020
🚨︎ report
End to End Multiclass Image Classification Using Pytorch and Transfer Learning lionbridge.ai/articles/en…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/LimarcAmbalina
πŸ“…︎ Jun 02 2020
🚨︎ report
How to reduce overfitting for multiclass classification

Hi,

I'm trying to perform face recognition as multi-class classification on a dataset composed of 10 classes. Each class is made of 21 photos (17 for training and 3 for validation). I'm building a CNN from scratch and, after many models where the loss curve didn't decrease at all, I got these results.

Given that this is a case of overfitting, what can I do to reduce it? I tried adding dropout or using data augmentation, but I got even worse results!
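
For reference, by data augmentation I mean something along these lines (a torchvision-style sketch; the exact transforms and values are placeholders to tune):

from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics, common defaults
                         std=[0.229, 0.224, 0.225]),
])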

The architecture of my model is similar to vggnet and I trained it for 30 epochs.

EDIT: I'm using this dataset (http://www.scface.org)

πŸ‘︎ 7
πŸ’¬︎
πŸ‘€︎ u/spaceape__
πŸ“…︎ Apr 08 2019
🚨︎ report
Using Deep Learning for End to End Multiclass Text Classification lionbridge.ai/articles/us…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/LimarcAmbalina
πŸ“…︎ Apr 06 2020
🚨︎ report
