(* indicates equal contribution)

2024

  1. Small Models are Valuable Plug-ins for Large Language Models Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley ACL 2024 (Findings)
    [arXiv]
  2. Automatic Pair Construction for Contrastive Post-training Canwen Xu, Corby Rosset, Ethan C. Chau, Luciano Del Corro, Shweti Mahajan, Julian McAuley, Jennifer Neville, Ahmed Hassan Awadallah, and Nikhil Rao NAACL 2024 (Findings)
    [arXiv]
  3. RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems Tianyang Liu, Canwen Xu, and Julian McAuley ICLR 2024
    [URL]

2023

  1. Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data Canwen Xu*, Daya Guo*, Nan Duan, and Julian McAuley EMNLP 2023
    [arXiv]
  2. Spoiler Detection as Semantic Text Matching Ryan Tran*, Canwen Xu*, and Julian McAuley EMNLP 2023
    [URL]
  3. LongCoder: A Long-Range Pre-trained Language Model for Code Completion Daya Guo*, Canwen Xu*, Nan Duan, Jian Yin, and Julian McAuley ICML 2023
    [arXiv]
  4. Mirror: A Natural Language Interface for Data Querying, Summarization, and Visualization Canwen Xu, Julian McAuley, and Penghan Wang WWW 2023 (Demo)
    [arXiv] [Code]
  5. A Survey on Model Compression and Acceleration for Pretrained Language Models Canwen Xu and Julian McAuley AAAI 2023
    [arXiv]
  6. A Survey on Dynamic Neural Networks for Natural Language Processing Canwen Xu and Julian McAuley EACL 2023 (Findings)
    [arXiv]

2022

  1. InforMask: Unsupervised Informative Masking for Language Model Pretraining Nafis Sadeq*, Canwen Xu*, and Julian McAuley EMNLP 2022
    [arXiv]
  2. Efficiently Tuned Parameters are Task Embeddings Wangchunshu Zhou*, Canwen Xu*, and Julian McAuley EMNLP 2022
    [arXiv]
  3. Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification Han Wang*, Canwen Xu*, and Julian McAuley NAACL 2022
    [arXiv]
  4. BERT Learns to Teach: Knowledge Distillation with Meta Learning Wangchunshu Zhou*, Canwen Xu*, and Julian McAuley ACL 2022
    [arXiv]
  5. LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval Canwen Xu*, Daya Guo*, Nan Duan, and Julian McAuley ACL 2022 (Findings)
    [arXiv]
  6. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts Hugging Face + Big Science ACL 2022 (Demo)
    [arXiv]
  7. Multitask Prompted Training Enables Zero-Shot Task Generalization Hugging Face + Big Science ICLR 2022 [Spotlight]
    [URL]
  8. Leashing the Inner Demons: Self-Detoxification for Language Models Canwen Xu, Zexue He, Zhankui He, and Julian McAuley AAAI 2022
    [arXiv]

2021

  1. Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Ke Xu, Julian McAuley, and Furu Wei EMNLP 2021
    [PDF] [arXiv] [URL]
  2. Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, and Furu Wei EMNLP 2021
    [PDF] [arXiv] [URL]
  3. Datasets: A Community Library for Natural Language Processing The Hugging Face Team EMNLP 2021 (Demo) [Best demo paper award]
    [PDF] [arXiv] [URL] [Code]
  4. Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Ke Xu, Julian McAuley, and Furu Wei NAACL-HLT 2021
    [PDF] [arXiv] [URL]

2020

  1. BERT Loses Patience: Fast and Robust Inference with Early Exit Wangchunshu Zhou*, Canwen Xu*, Tao Ge, Julian McAuley, Ke Xu, and Furu Wei NeurIPS 2020
    [PDF] [arXiv] [URL] [Code]
  2. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing Canwen Xu*, Wangchunshu Zhou*, Tao Ge, Furu Wei, and Ming Zhou EMNLP 2020
    [PDF] [arXiv] [URL] [Code]
  3. HuggingFace’s Transformers: State-of-the-art Natural Language Processing The Hugging Face Team EMNLP 2020 (Demo) [Best demo paper award]
    [PDF] [arXiv] [URL] [Code]
  4. MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization Canwen Xu*, Jiaxin Pei*, Hongtao Wu, Yiyu Liu, and Chenliang Li ACL 2020
    [PDF] [arXiv] [URL] [Video] [Code]
  5. Pre-train and Plug-in: Flexible Conditional Text Generation with Variational Auto-Encoders Yu Duan*, Canwen Xu*, Jiaxin Pei*, Jialong Han, and Chenliang Li ACL 2020
    [PDF] [arXiv] [URL] [Video] [Code]
  6. UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database Canwen Xu, Tao Ge, Chenliang Li, and Furu Wei AACL-IJCNLP 2020
    [PDF] [URL]

2019

  1. DLocRL: A Deep Learning Pipeline for Fine-Grained Location Recognition and Linking in Tweets Canwen Xu, Jing Li, Xiangyang Luo, Jiaxin Pei, Chenliang Li, and Donghong Ji WWW 2019
    [arXiv] [URL]