Toxicity Detection in Online Comments with Limited Data: A Comparative Analysis

We present a comparative study on toxicity detection, focusing on the problem of identifying toxicity types that have low prevalence or may even be unobserved at training time. To this end, we train our models on a dataset that contains only a weak type of toxicity and test whether they generalize to more severe toxicity types. We find that representation learning and ensembling outperform simple classifiers on toxicity detection, while also providing significantly better generalization and robustness. All models benefit from a larger training set, an effect that even extends to the toxicity types unseen during training.

  • Published in:
    European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
  • Type:
    Inproceedings
  • Authors:
    M. Lübbering, M. Pielka, K. Das, M. Gebauer, R. Ramamurthy, C. Bauckhage, R. Sifa
  • Year:
    2021

Citation information

M. Lübbering, M. Pielka, K. Das, M. Gebauer, R. Ramamurthy, C. Bauckhage, R. Sifa: Toxicity Detection in Online Comments with Limited Data: A Comparative Analysis. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2021. https://www.researchgate.net/publication/355249542_Toxicity_Detection_in_Online_Comments_with_Limited_Data_A_Comparative_Analysis