[Submitted on 27 Nov 2023 (this version, v1), latest version 4 Dec 2023 (v2)]
YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, Xi Chen, Bing Zhao, Wei Wang, Tong Yu, Rongguo Zhang, Jiahua Zhang, Chao Wang

In this work, Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. Based on LFA, we develop and release Yuan 2.0, a large language model with parameters ranging from 2.1 billion to 102.6 billion. A data filtering and generation method is presented to build high-quality pretraining and fine-tuning datasets. A distributed training method with non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is proposed, which greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Compared with existing models, Yuan 2.0 displays impressive ability in code generation, math problem solving, and chat. The latest version of Yuan 2.0, including model weights and source code, is accessible on GitHub.
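The abstract does not spell out how LFA injects locality into attention, so the following is a minimal sketch, assuming local dependencies are captured by a depthwise causal 1-D convolution applied along the token sequence before standard multi-head self-attention. The class name LocalizedFilteringAttention, the kernel size, and the layer arrangement are illustrative assumptions, not the released implementation.

# Minimal sketch of a localized-filtering step placed before attention.
# Assumption: neighboring tokens are mixed by a depthwise causal 1-D
# convolution, and the filtered features then go through ordinary
# causal multi-head self-attention.
import torch
import torch.nn as nn

class LocalizedFilteringAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise 1-D convolution over the sequence dimension; the
        # left-sided (causal) receptive field models local dependencies.
        self.local_filter = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size - 1, groups=d_model,
        )
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Convolve along the token axis, then trim the right-side padding
        # so no position attends to information from future tokens.
        local = self.local_filter(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        local = self.norm(local)
        # Standard causal self-attention over the locally filtered features.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        out, _ = self.attn(local, local, local, attn_mask=mask)
        return out

The design choice sketched here keeps the global attention machinery unchanged and only pre-mixes each token with its immediate left neighbors, which is one straightforward way to encode the local-dependency prior the abstract describes.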
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as: arXiv:2311.15786 [cs.CL]
(or arXiv:2311.15786v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2311.15786
Submission history
From: Tong Yu
[v1] Mon, 27 Nov 2023 13:01:59 UTC (1,242 KB)
[v2] Mon, 4 Dec 2023 10:20:57 UTC (1,245 KB)