Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation

Dan Li, Zi Long Zhu, Janneke van de Loo, Agnés Masip Gómez, Vikrant Yadav, Georgios Tsatsaronis, Zubair Afzal

January 2023

Abstract

Extreme multi-label text classification is a prevalent task in industry, but from a machine learning perspective it frequently runs into challenges, including model limitations, data scarcity, and time-consuming evaluation. This paper aims to mitigate these issues by introducing three approaches. First, we propose a label ranking model as an alternative to the conventional SciBERT-based classification model, enabling efficient handling of large-scale label sets and accommodating new labels. Second, we present an active learning-based pipeline that addresses the scarcity of data for new labels when a classification system is updated. Finally, we use ChatGPT to assist with model evaluation. Our experiments demonstrate the effectiveness of these techniques for extreme multi-label text classification.

Type

Conference paper

Publication

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track

Dan Li

Data Scientist