ChatGPT Has Exploded Worldwide: How Will the Chip Industry Benefit?
Recently, the ChatGPT craze has swept the world. "We're clarifying how ChatGPT's behavior is shaped and our plans for improving that behavior, allowing more user customization, and getting more public input into our decision-making in these areas," OpenAI wrote on its website.
ChatGPT (Chat Generative Pre-trained Transformer) is a conversational AI model launched by OpenAI in November 2022. It surpassed 100 million monthly active users within two months of release, making it one of the fastest-growing consumer applications in history.
As the carrier of this computing power, AI servers face a major development opportunity. The global AI server market is estimated to grow from US$12.2 billion in 2020 to US$28.8 billion in 2025, a compound annual growth rate of 18.8%.
ChatGPT performs reasoning, code writing, text creation, and more in a question-and-answer format, and both its user count and usage frequency are rising as it reaches scenarios such as smart speakers, content production, game NPCs, and robots. As end users interact with it more often and data traffic surges, the demands on servers' data-processing capability, reliability, and security rise accordingly.
According to public data, ChatGPT is optimized from GPT-3.5, which is itself a fine-tuned version of GPT-3.0. GPT-3.0's ability to store knowledge comes from its 175 billion parameters, and a single training run costs roughly US$4.6 million. GPT-3.5 was trained on Microsoft's Azure AI supercomputing infrastructure, consuming about 3,640 PF-days of compute (that is, at a sustained one quadrillion floating-point operations per second, the training would take 3,640 days).
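As a back-of-the-envelope check, the 3,640 PF-days figure can be converted into a total operation count. The sketch below uses only the numbers quoted above; the function and constant names are illustrative.

```python
# Convert a training budget quoted in PF-days into total operations.
# 1 PF-day = one petaflop/s sustained for one day.

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
PETAFLOP = 1e15                  # operations per second at 1 PF/s

def pf_days_to_flops(pf_days: float) -> float:
    """Total operations performed at a sustained 1 PF/s for `pf_days` days."""
    return pf_days * PETAFLOP * SECONDS_PER_DAY

total = pf_days_to_flops(3640)
print(f"{total:.2e} FLOPs")  # ~3.14e23 total operations
```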
ChatGPT has driven up both volume and price in the chip industry: it not only creates greater demand for the underlying AI chips, but also raises the bar for their computing power, pushing demand toward high-end parts. A top-end Nvidia GPU reportedly costs about 80,000 yuan, and a GPU server usually costs more than 400,000 yuan. The computing infrastructure supporting ChatGPT requires at least tens of thousands of Nvidia A100 GPUs, and the rapid rise in demand for high-end chips will further lift average chip prices.
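The scale of that investment can be sketched from the figures just quoted (80,000 yuan per top-end GPU, "tens of thousands" of A100s); the count below takes the lower bound and is illustrative only.

```python
# Back-of-the-envelope hardware cost for a ChatGPT-scale GPU fleet,
# using the per-unit price quoted in the text. Illustrative only.

GPU_PRICE_YUAN = 80_000
GPU_COUNT = 10_000   # lower bound of "tens of thousands"

total_yuan = GPU_PRICE_YUAN * GPU_COUNT
print(f"GPU hardware alone: {total_yuan:,} yuan")  # 800,000,000 yuan
```

Even before servers, networking, power, and cooling, the GPUs alone run to hundreds of millions of yuan at these assumptions.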
In terms of chip composition, AI servers pair a CPU with accelerator chips such as GPUs, FPGAs, and ASICs; the combination meets the need for high-throughput interconnection.
1. CPU
As the computing and control core of a computer system, the CPU is the final execution unit for information processing and program operation. Its strengths are large caches and complex logic-control units, which make it good at logic control and serial operations; its weakness is limited arithmetic throughput, so it is poorly suited to complex, highly parallel, repetitive computation. CPUs can therefore serve for inference/prediction in deep learning.
At present, server CPUs are moving toward more cores to meet demands for processing power and speed; the AMD EPYC 9004 series, for example, scales up to 96 cores. System performance, however, depends not only on core count but also on the operating system, scheduling algorithm, applications, drivers, and so on.
2. GPU
GPUs are well suited to building AI models: their parallel computing capability and suitability for both training and inference make them the most widely used accelerator chips today. Take the Nvidia A100 as an example. In training, GPUs solve problems at high speed: 2,048 A100s can handle a training workload such as BERT at scale within a minute. In inference, Multi-Instance GPU (MIG) technology lets multiple networks run simultaneously on a single A100, improving utilization of its compute resources. On top of the A100's other inference gains, structural sparsity support alone can deliver up to a 2X performance boost, and on advanced conversational AI models such as BERT, the A100 can raise inference throughput to up to 249 times that of a CPU.
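The structural sparsity mentioned above is a 2:4 pattern: in every group of four weights, only the two largest-magnitude values are kept, halving the math the tensor cores must do. A simplified pure-Python illustration of the pruning step, not NVIDIA's actual implementation:

```python
# Sketch of 2:4 structured sparsity: zero the 2 smallest-magnitude
# entries in each group of 4 weights. Illustrative only.

def prune_2_of_4(weights):
    """Return a copy with the 2 smallest-|w| values per group of 4 zeroed."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the 2 largest-magnitude entries in this group
        keep = sorted(range(len(group)), key=lambda j: -abs(group[j]))[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

w = [0.9, -0.1, 0.05, -0.8, 0.3, 0.2, -0.7, 0.01]
print(prune_2_of_4(w))  # [0.9, 0.0, 0.0, -0.8, 0.3, 0.0, -0.7, 0.0]
```

The regular 50% pattern is what makes the speedup possible in hardware: the tensor cores can skip the zeroed positions without irregular memory access.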
ChatGPT has set off a surge in GPU applications. Baidu will soon launch Wenxin Yiyan (ERNIE Bot). Apple has introduced the M2 series chips (M2 Pro and M2 Max), designed with built-in AI acceleration, for its new computers. And as ChatGPT usage surges, OpenAI needs more computing power to serve millions of users, increasing demand for NVIDIA GPUs.
AMD plans to launch its "Phoenix" series chips on TSMC's 4 nm process to compete with Apple's M2 series, as well as the "Alveo V70" AI chip built with chiplet technology. Both are slated for market launch this year, targeting consumer electronics and AI inference respectively.
3. FPGA
FPGAs offer high programmable flexibility, short development cycles, in-field reprogrammability, low latency, and convenient parallel computing, and they can support large models through deep learning combined with distributed-cluster data transmission.
4. ASICs
Compared with general-purpose integrated circuits produced in volume, ASICs offer smaller size, lower power consumption, higher reliability, better performance, stronger confidentiality, and lower cost, allowing further optimization of performance and power. As machine learning, edge computing, and autonomous driving generate massive data-processing workloads, the requirements on chips' computing efficiency, computing power, and performance per watt keep rising. ASICs paired with CPUs have therefore attracted wide attention, and leading manufacturers at home and abroad are positioning themselves for the arrival of the AI era.
Among them, Google's latest TPU v4 cluster, called a Pod, contains 4,096 v4 chips and delivers more than 1 exaflops of floating-point performance. NVIDIA's GPU + CUDA stack mainly targets large-scale, data-intensive HPC and AI applications, and Grace-based systems, tightly coupled with NVIDIA GPUs, are claimed to offer 10 times the performance of the NVIDIA DGX system. Baidu's second-generation Kunlun AI chip uses a leading 7 nm process and a self-developed second-generation XPU architecture, improving performance 2-3x over the first generation; the third-generation Kunlun chip is slated for mass production in early 2024.
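The quoted Pod figure can be sanity-checked from the chip count. The per-chip peak used below (~275 TFLOPS in bf16) is an assumption drawn from public TPU v4 descriptions, not from the text.

```python
# Rough consistency check of "4,096 chips, more than 1 exaflops".

CHIPS_PER_POD = 4096
PER_CHIP_TFLOPS = 275   # assumed bf16 peak per TPU v4 chip

pod_exaflops = CHIPS_PER_POD * PER_CHIP_TFLOPS / 1e6  # TFLOPS -> EFLOPS
print(f"{pod_exaflops:.2f} exaflops")  # ~1.13, consistent with ">1 exaflops"
```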
5. Optical module
At present, demand for model computing power in the AI era has far outpaced the growth rate of Moore's Law; since the advent of deep learning and large models, it is estimated to double every 5-6 months. Data transfer rates, however, have become an easily overlooked computing bottleneck. As data transmission volumes grow, demand rises accordingly for optical modules, the carriers of equipment interconnection inside the data center.
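The gap between those two growth rates compounds quickly. The sketch below compares compute demand doubling every ~5.5 months (the midpoint of the 5-6 months cited above) with a Moore's-law doubling period of ~24 months, an assumed conventional value; it is illustrative arithmetic only.

```python
# Compare exponential growth at two doubling periods over a 3-year horizon.

def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, doubling every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

horizon = 36  # months
ai_demand = growth_factor(horizon, 5.5)   # ~5-6 month doubling from the text
moores_law = growth_factor(horizon, 24)   # assumed ~2-year doubling
print(f"AI compute demand: {ai_demand:.0f}x; transistor budget: {moores_law:.1f}x")
```

In three years, demand grows by roughly two orders of magnitude while the transistor budget merely triples, which is why interconnect bandwidth (and hence optical modules) becomes the binding constraint.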
The rise of ChatGPT has spurred vigorous development of AI on the application side, placing unprecedented demands on the computing power of computing devices. Although AI chips, GPUs, CPU+FPGA combinations, and other parts already provide the underlying compute for existing models, in the face of potentially exponential growth in compute demand, chiplet-based heterogeneous integration can accelerate the deployment of application algorithms in the short term, while compute-in-memory chips (which reduce data movement on and off the chip) may become a path to upgrading computing power in the future.