Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels

Abstract

Label aggregation (LA) is the task of inferring a high-quality label for an example from multiple noisy labels generated by either human annotators or model predictions. Existing work on LA assumes a label generation process and designs a probabilistic graphical model (PGM) to learn latent true labels from observed crowd labels. However, the performance of PGM-based LA models is easily affected by the noise of the crowd labels. As a consequence, the performance of LA models differs on different datasets and no single LA model outperforms the rest on all datasets. We extend PGM-based LA models by integrating a GP prior on the true labels. The advantage of LA models extended with a GP prior is that they can take as input crowd labels, example features, and existing pre-trained label prediction models to infer the true labels, while the original LA can only leverage crowd labels. Experimental results on both synthetic and real datasets show that any LA models extended with a GP prior and a suitable mean function achieves better performance than the underlying LA models, demonstrating the effectiveness of using a GP prior.

Publication
In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.
Create your slides in Markdown - click the Slides button to check out the example.

Supplementary notes can be added here, including code, math, and images.

Dan Li
Dan Li
Data Scientist