Today’s artificial intelligence has a serious problem: it is too expensive. Running ResNet-152, a modern computer vision model, takes on the order of 10 billion floating-point operations per image, and that pales in comparison to modern language models.
Training OpenAI’s latest natural language model, GPT-3, is estimated to require roughly 3×10²³ floating-point operations, which would cost at least US$5 million on commercial GPUs. In contrast, the human brain can recognize faces, answer questions, and drive a car on roughly the energy of a cup of coffee.
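The dollar figure above follows from simple arithmetic. As a rough sketch (the throughput and price numbers below are illustrative assumptions, not measured values), dividing the total training compute by an assumed GPU throughput gives GPU-hours, and multiplying by a cloud rental price gives a cost:

```python
# Back-of-envelope estimate of GPT-3 training cost on commercial GPUs.
# All hardware figures are assumptions for illustration.
TOTAL_FLOPS = 3.14e23        # widely cited training compute for GPT-3
GPU_FLOPS_PER_S = 2.8e13     # assumed: ~28 TFLOP/s peak (e.g. a V100, mixed precision)
PRICE_PER_GPU_HOUR = 1.50    # assumed: low-end cloud price in USD

gpu_seconds = TOTAL_FLOPS / GPU_FLOPS_PER_S
gpu_hours = gpu_seconds / 3600
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR
print(f"{gpu_hours:,.0f} GPU-hours, ~${cost_usd / 1e6:.1f}M")
```

Even at peak theoretical throughput, an assumption no real training run achieves, this lands in the millions of dollars, which is why the estimates quoted for GPT-3 start at several million.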
How do we do it?
We have made great progress.
The earliest computers were special-purpose machines. In 1822, the British mathematician Charles Babbage designed the “difference engine” to compute polynomial functions. In 1958, Cornell University professor Frank Rosenblatt built the “Mark I”, a physical implementation of a single-layer perceptron for machine vision tasks. In those early days, hardware and algorithm were one and the same.
This unity of hardware and algorithm ended with the emergence of the von Neumann architecture: a chip design composed of a processing unit and a memory unit that stores both data and program instructions. This paradigm shift made it possible to build general-purpose machines that can be programmed for any task. The von Neumann architecture became the blueprint for modern digital computers.
But there is a catch. Data-intensive programs require heavy communication between the memory unit and the processing unit, which slows computation. This “von Neumann bottleneck” was one reason early attempts at artificial intelligence failed. Standard CPUs are inefficient at large-scale matrix multiplication, the core computational operation of deep neural networks. Constrained by the hardware of the time, early neural networks stalled and performed poorly.
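One way to see why matrix multiplication stresses the memory bus is to count operations against memory traffic. A minimal sketch, under the idealized assumption that each matrix crosses the bus exactly once:

```python
# Arithmetic intensity of an N x N matrix multiplication:
# 2*N^3 floating-point operations over ~3*N^2 values moved
# (read A, read B, write C), assuming 4-byte floats and that
# each matrix crosses the memory bus exactly once (an idealization).
def matmul_intensity(n: int, bytes_per_value: int = 4) -> float:
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_value
    return flops / bytes_moved  # FLOPs per byte of memory traffic

for n in (64, 1024, 16384):
    print(n, matmul_intensity(n))
```

The intensity grows linearly with N, so large matrix multiplications offer far more arithmetic per byte than a conventional CPU pipeline can absorb; hardware that cannot exploit that ratio ends up waiting on memory rather than computing.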
Interestingly, the solution came not from academia but from the gaming industry. Starting in the 1970s, GPUs built to accelerate video games parallelized data-intensive operations across thousands of computing cores. This parallelism is an effective way around the von Neumann bottleneck. GPUs made it possible to train deeper neural networks and became the workhorse hardware of modern artificial intelligence.
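The parallelism described above rests on a simple observation: each row of a matrix product can be computed independently. A toy sketch of that partitioning (Python threads stand in for GPU cores here, purely for illustration; a real GPU runs thousands of hardware cores over the same kind of split):

```python
# Sketch of the data-parallel idea behind GPU acceleration: each
# "core" (here, a worker thread) computes an independent slice of
# the output matrix.
from concurrent.futures import ThreadPoolExecutor

def matmul_row(a_row, b):
    # One output row: dot product of a_row with every column of b.
    return [sum(a * b[k][j] for k, a in enumerate(a_row))
            for j in range(len(b[0]))]

def parallel_matmul(a, b, workers=4):
    # Rows are independent, so they can be computed in any order,
    # on any worker, with no communication between workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: matmul_row(row, b), a))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(parallel_matmul(A, B))  # [[19, 22], [43, 50]]
```

Because no worker ever needs another worker’s partial result, the work scales out across cores, which is exactly the property the von Neumann bottleneck penalizes least.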
Success in artificial intelligence research owes a lot to luck. Google researcher Sara Hooker calls this the “hardware lottery”: early AI researchers were unlucky, trapped on slow CPUs. Researchers who happened to be in the field when GPUs arrived “won” the hardware lottery: by exploiting the efficient parallel acceleration of GPUs, they could train neural networks and make rapid progress.
The problem with the hardware lottery is that once an entire field has won it, exploring anything else becomes difficult. Hardware develops slowly, and chip makers must commit large up-front investments with uncertain returns. The safe bet is to keep optimizing matrix multiplication, which has become the status quo. In the long run, however, this lock-in to one particular combination of hardware and algorithm limits our choices.
Let us return to the opening question: why is artificial intelligence so expensive today? The answer may be that we do not yet have the right hardware. The hardware lottery and commercial incentives make it economically difficult to break away from the status quo.
A good example is Geoffrey Hinton’s capsule neural network, a novel computer vision method. Google researchers Paul Barham and Michael Isard found that it runs well on CPUs but poorly on GPUs and TPUs.
Why? Accelerators are optimized for the most frequent operations, such as standard matrix multiplication, but lack optimizations for capsule convolutions. Their conclusion (also the title of their paper): machine learning systems are stuck in a rut.
AI researchers may be “overfitting” to existing hardware, which could inhibit innovation in the field over the long run.
The way forward
“Achieving the next breakthrough may require a fundamentally different perspective: combining hardware, software and algorithms to model the world.”
In the human brain, memory and computation are not two separate parts. They happen in the same place: in neurons. Memory arises from the way neurons are wired together through synapses; computation arises from the way neurons fire and propagate information from sensory inputs. As in the earliest computers, hardware and algorithm are one and the same, and that is quite unlike the way we build artificial intelligence today.
Although deep neural networks powered by GPUs and TPUs perform well on many tasks today, they are not a long-term solution. They may be only a local optimum in the vast landscape of possible hardware-algorithm combinations.
Recognizing that algorithms alone are not enough is the start of the way forward. The next generation of artificial intelligence requires innovation in both hardware and algorithms. Before GPUs arrived, AI research was stalled; without further hardware breakthroughs, we may find ourselves stalled again.