
AI PCs and architectures

Jim Hsiao
DIGITIMES Research observes that on-device inference of large-scale AI models is determined not only by the computing performance of the xPU, but also by model compression and memory bandwidth, all of which affect the inference performance of AI PCs.
Abstract

As Microsoft, Meta, and others actively launch lightweight AI models, and notebook processor vendors introduce system architectures and designs that enhance AI computing performance, AI PCs launching in 2024 will be able to execute multiple generative AI tasks offline.

The original versions of large language models (LLM) or large vision models (LVM) cannot run on notebooks because of their sheer size and enormous computing requirements. Through compression techniques such as model pruning and knowledge distillation, AI models with tens of billions of parameters can be shrunk to roughly one-tenth of their original size. Quantizing the parameters, for example storing weights as 8-bit or 4-bit integers instead of 32-bit floats, compresses the model by a further factor of four or eight, making large models usable on notebooks while retaining acceptable accuracy.
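To make the quantization factor concrete, below is a minimal sketch of post-training symmetric INT8 weight quantization in NumPy. The function names and the per-tensor scaling scheme are illustrative assumptions, not the specific method used in any vendor's toolchain; production quantizers typically work per-channel or per-group and may use 4-bit formats for the 8x case.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: FP32 weights become int8 values
    plus one FP32 scale, cutting weight storage roughly 4x versus FP32."""
    scale = np.abs(weights).max() / 127.0               # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy usage: quantize a random weight matrix and check size and error.
w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)   # ~4x smaller
print("mean abs error:", float(np.abs(w - w_hat).mean()))
```

The same idea with 4-bit integer formats (two weights packed per byte) yields the roughly 8x reduction mentioned above, at the cost of more quantization error.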

The general matrix multiplication (GEMM) and general matrix-vector multiplication (GEMV) kernels, which account for most of the computation in large-scale AI models, are compute-bound and memory-bound respectively, as the rough arithmetic-intensity estimate below illustrates.
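A simple way to see the distinction is arithmetic intensity: FLOPs performed per byte of data moved. The sketch below assumes square matrices of size n and 2-byte (FP16) operands, and counts only one read/write per operand; these simplifying assumptions are mine, not figures from the report.

```python
# Rough arithmetic-intensity estimate (FLOPs per byte moved) for GEMM vs GEMV.

def gemm_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n**3                          # one multiply-add per inner-product term
    traffic = 3 * n**2 * bytes_per_elem       # read A and B, write C once each
    return flops / traffic

def gemv_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n**2
    traffic = (n**2 + 2 * n) * bytes_per_elem  # the matrix dominates data traffic
    return flops / traffic

for n in (1024, 4096):
    print(n, round(gemm_intensity(n), 1), round(gemv_intensity(n), 2))
```

GEMM intensity grows with n (hundreds of FLOPs per byte, so the xPU's compute throughput is the limit), while GEMV stays near 1 FLOP per byte regardless of n. Token-by-token LLM decoding is dominated by GEMV-like operations, which is why memory bandwidth, alongside model compression, governs on-device inference performance.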

Download full report (subscription required)

Published: June 11, 2024
