From arxiv.org
Serving Large Language Models on Huawei CloudMatrix384
The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory...
on Tue, 9AM