Dan Li

Data Scientist

Elsevier

Hi, my name is Dan Li (李丹). I’m currently a data scientist at Elsevier, working on dense retrieval, extreme multi-label classification, and document-based question-answer generation using generative AI.

Before that, I did my PhD with Prof. Evangelos Kanoulas and Prof. Maarten de Rijke at the University of Amsterdam. My PhD research topic was Evaluation and Optimization of Information Retrieval (check out my PhD dissertation). I’m also interested in conversational search, crowdsourcing label denoising, evaluation of information retrieval, text-to-image generation, probabilistic graphical models, and Gaussian processes.

I’m currently a member of the European Laboratory for Learning and Intelligent Systems (ELLIS).

My hometown is in Inner Mongolia (China). I love dancing. I’m a big fan of languages. I speak Chinese and English as my daily and working languages. I also speak Japanese, Thai, and Dutch.

Interests
  • Information Retrieval
  • Natural Language Processing
  • Artificial Intelligence
Education
  • PhD in Information Retrieval, 2016-2020

    University of Amsterdam, the Netherlands

  • Research exchange, 2018

    Tsinghua University, China

  • Master in Computational Linguistics, 2013-2016

    Dalian University of Technology, China

  • Bachelor in Mathematics, 2007-2011

    Dalian University of Technology, China

Selected Papers

(2023). Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation. In EMNLP ‘23.

(2022). Unsupervised Dense Retrieval for Scientific Articles. In EMNLP ‘22.

PDF Poster Video

(2021). Paint4Poem: A dataset for artistic visualization of classical Chinese poems. In arXiv.

PDF Code

(2020). APS: An active PubMed search system for technology assisted reviews. In SIGIR ‘20.

PDF Video

(2020). Query resolution for conversational search with limited supervision. In SIGIR ‘20.

PDF Code Video

(2020). When to stop reviewing in technology-assisted reviews. In ACM TOIS.

PDF Code Slides Video

(2018). Bayesian optimization for optimizing retrieval systems. In WSDM ‘18.

PDF Slides

(2018). Studying topical relevance with evidence-based crowdsourcing. In CIKM ‘18.

PDF Code

(2018). T-Reader: A Multi-task Deep Reading Comprehension Model with Self-attention Mechanism. In Journal of Chinese Information Processing.

PDF

(2017). Active sampling for large-scale information retrieval evaluation. In CIKM ‘17.

PDF Code Slides

Working Notes Papers

(2019). CLEF 2019 technology assisted reviews in empirical medicine overview. In CLEF (Working Notes) 2019.

PDF Dataset

(2018). CLEF 2018 technology assisted reviews in empirical medicine overview. In CLEF (Working Notes) 2018.

PDF Dataset

(2017). CLEF 2017 technology assisted reviews in empirical medicine overview. In CLEF (Working Notes) 2017.

PDF Dataset

Mentoring

  • Thesis supervision

    • I’m open to supervising master’s theses in general IR and NLP. If you are a master’s student looking for an internship, feel free to email me.
    • At Elsevier, we have rich textual data and work on a range of IR/NLP tasks, such as dense retrieval, extreme multi-label text classification, question answering, and generative AI.
  • Past master’s thesis supervision

    • A comparative study of text to image generation methods for visualizing classical Chinese poems. 2022. Zeyou Niu. MSc Artificial Intelligence.
    • Automatic optimization techniques in machine learning pipelines. 2021. Simon Appelt. MSc Artificial Intelligence.
    • Modelling task and worker correlation for crowdsourcing label aggregation. 2020. Ioanna Sanida. MSc Artificial Intelligence.
    • Statistical question classification. 2019. Ruben Halfhide. MSc Data Science.
  • Past bachelor’s thesis supervision

    • Building a dataset for the visualization of classical Chinese poems. 2020. Elisha A. Nieuwburg. BSc Artificial Intelligence.
    • De-noise large-scale poem-image pairs for poem-to-image generation. 2020. Fengyuan Sun. BSc Artificial Intelligence. (Cum laude/outstanding bachelor thesis)
    • A representation of classical Chinese poetry for poem based image generation. 2020. River Vaudrin. BSc Artificial Intelligence.
    • Image generation for classical Chinese poems. 2020. Nina M. van Liebergen. BSc Artificial Intelligence.
    • Semantic visualization of classical Chinese poetry. 2020. Silvan Murre. BSc Artificial Intelligence.