Novel Hiddenite accelerator aims to offer dramatic improvements in deep-learning energy efficiency
Designed to keep as much of the computation on-chip as possible to reduce expensive calls to external memory, the prototype Hiddenite accelerator offers state-of-the-art performance based on the "lottery ticket" hypothesis.
Machine learning via neural networks has proven its worth across a broad swathe of problem domains, but its wide deployment brings a problem of its own: High computational demand, which translates into a need for large amounts of energy.
Dedicated accelerator hardware, better suited to the workload than the general-purpose processors on which it was previously run, could hold the answer — and one new accelerator, dubbed Hiddenite, is showing impressive results in silicon thanks to a novel three-pronged approach based on the concept of “hidden neural networks” and on keeping as much of the computation on-chip as possible.
Hidden knowledge
Hiddenite — short for the Hidden Neural Network Inference Tensor Engine — is, its creators claim, the first accelerator chip to target hidden neural networks (HNNs). The HNN concept, which aims to simplify neural network models without damaging accuracy, builds on Jonathan Frankle and Michael Carbin’s “lottery ticket hypothesis,” which suggests that a randomly initialized deep neural network already contains “subnetworks” able to match the accuracy of the fully-trained original. By pruning away the rest of the network and keeping only these “winning ticket” subnetworks, the model’s complexity is reduced without sacrificing accuracy.
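As a rough illustration of the idea, and not a description of the team’s implementation, the Python sketch below freezes a layer’s randomly initialized weights and uses a binary “supermask” to pick out a subnetwork; the layer sizes and mask density are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Randomly initialized weights for one layer. In a hidden neural network
# these are frozen and never updated by training.
weights = rng.standard_normal((256, 128)).astype(np.float32)

# A binary "supermask" selects the subnetwork (the "winning ticket").
# Training searches for this mask rather than adjusting the weights.
supermask = (rng.random((256, 128)) < 0.5).astype(np.float32)

# The effective layer is simply the masked random weights.
effective_weights = weights * supermask

def layer_forward(x):
    """Forward pass through the masked layer with a ReLU activation."""
    return np.maximum(x @ effective_weights.T, 0.0)

x = rng.standard_normal((1, 128)).astype(np.float32)
print(layer_forward(x).shape)  # (1, 256)
```

Because the weights never change, only the mask needs to be learned and stored — which is precisely the property Hiddenite exploits.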
The Hiddenite accelerator aims to improve the efficiency of calculating these hidden neural networks, lowering computational complexity and thus power requirements — and doing away, where possible, with expensive calls to off-chip memory which can otherwise hamper a promising chip design.
The Hiddenite accelerator aims to find and calculate "lottery ticket" subnetworks to boost the energy efficiency of neural networks.
“Reducing the external memory access is the key to reducing power consumption,” explains project co-lead Masato Motomura, of the Tokyo Institute of Technology. “Currently, achieving high inference accuracy requires large models. But this increases external memory access to load model parameters. Our main motivation behind the development of Hiddenite was to reduce this external memory access.”
The architecture of the accelerator chip is split into three key sections: A supermask expansion unit, which uses compression to reduce the size of the binary masks that select the subnetworks; a weight generator unit, which exploits the discovery that weights can be regenerated on-chip using a random number generator and a hash function, avoiding the need to store weights or seed values; and a high-density four-dimensional parallel processor which prioritizes data reuse for boosted efficiency.
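The article does not disclose the hash function or number format Hiddenite uses, so the snippet below is only a hedged sketch of the weight-generator idea: each weight is recomputed on demand as a pure function of its coordinates, so it never has to be stored or fetched from external memory. The mixing constants are arbitrary placeholders.

```python
def regenerate_weight(layer: int, row: int, col: int) -> float:
    """Deterministically regenerate a pseudo-random weight from its coordinates.

    Hypothetical stand-in: the real chip's hash function and number format
    are not public, so a simple integer mix is used here instead.
    """
    h = (layer * 73856093) ^ (row * 19349663) ^ (col * 83492791)
    h &= 0xFFFFFFFF              # keep 32 bits
    return (h / 2**31) - 1.0     # map to the range [-1, 1)

# Because each weight is a pure function of its indices, it never needs to be
# stored: the supermask alone decides whether it takes part in a computation.
for row in range(3):
    print([round(regenerate_weight(0, row, col), 3) for col in range(3)])
```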
Computationally efficient
To prove the concept, Motomura and colleagues fabricated a prototype Hiddenite processor on Taiwan Semiconductor Manufacturing Company (TSMC)'s 40nm process node — a considerably older and larger node than is typically used for state-of-the-art devices in the field of deep learning. Measuring 3×3mm (around 0.12×0.12"), the chip’s footprint is primarily taken up by memory — 8Mb of activation memory (AMEM), 256kb of supermask memory (SMEM), and 128kb of zero run-length memory (ZMEM) — with logic found at the center of the die.
The Hiddenite processor, produced on a 40nm node, has the bulk of its footprint dominated by its memories.
Key to the Hiddenite concept is performing as much of the work on-chip as possible. The weight generator means there’s no need to store and load weights from external memory, while the supermask expansion hardware means model parameters are less likely to exceed available on-chip memory, the team explains. The parallel processor, meanwhile, boosts efficiency by maximizing the reuse of data.
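The exact compression scheme is not described beyond the chip’s “zero run-length” memory, so the following is a minimal sketch assuming a simple zero run-length code in which each stored value counts the zeros before the next set bit; the real format may well differ.

```python
def expand_supermask(runs, total_bits):
    """Expand a zero run-length encoded supermask into a flat list of bits.

    Assumed encoding: each entry is the number of zeros preceding the next
    1-bit. Sparse masks compress well because long runs of zeros collapse
    into single counts.
    """
    mask = []
    for zeros in runs:
        mask.extend([0] * zeros)
        if len(mask) < total_bits:
            mask.append(1)
    mask.extend([0] * (total_bits - len(mask)))  # pad trailing zeros
    return mask

# Three counts stand in for a 12-bit mask: 0001 1 00000 1, padded with a zero.
print(expand_supermask([3, 0, 5], 12))
```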
“The first two factors are what set the Hiddenite chip apart from existing DNN inference accelerators,” says Motomura. “Moreover, we also introduced a new training method for hidden neural networks, called ‘score distillation,’ in which the conventional knowledge distillation weights are distilled into the scores because hidden neural networks never update the weights. The accuracy using score distillation is comparable to the binary model while being half the size.”
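With the paper not yet public, the details of score distillation are unavailable; the PyTorch sketch below is only a guess at the general shape of the idea, training per-weight scores, not weights, against a teacher’s outputs using a straight-through estimator of the kind commonly used for supermask learning. The layer sizes, keep ratio, teacher, and loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

weights = torch.randn(64, 32)                      # frozen random weights
scores = torch.randn(64, 32, requires_grad=True)   # trainable per-weight scores

def masked_forward(x, keep_ratio=0.5):
    """Keep the top-scoring weights; gradients reach the scores via a
    straight-through estimator (a common supermask-training trick)."""
    k = int(keep_ratio * scores.numel())
    threshold = scores.flatten().kthvalue(scores.numel() - k).values
    hard_mask = (scores > threshold).float()
    # Forward uses the hard 0/1 mask; backward treats it as the raw scores.
    mask = hard_mask + scores - scores.detach()
    return F.linear(x, weights * mask)

teacher = torch.nn.Linear(32, 64)   # stands in for a trained teacher network
optimizer = torch.optim.SGD([scores], lr=0.1)

for _ in range(100):
    x = torch.randn(16, 32)
    # The distillation signal updates the scores, since the weights
    # themselves are never trained in a hidden neural network.
    loss = F.mse_loss(masked_forward(x), teacher(x).detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```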
Hiddenite's efficiency offers stiff competition for rival accelerator designs, despite being built on a much larger process node.
Early testing certainly shows promise: The 40nm Hiddenite prototype was able to beat rival designs built on considerably smaller nodes, from 28nm down to 5nm, on the performance-per-watt metric, offering between 8.1 and 34.8 trillion operations per second (TOPS) per watt depending on voltage and model used — a state-of-the-art showing which doesn’t take into account the additional power-efficiency gains made by removing the off-chip memory accesses required by its contemporaries.
The team’s work was presented at the International Solid-State Circuits Conference 2022 (ISSCC 2022) as Session 15.4. No paper has yet been published for public consumption.
References
Kazutoshi Hirose, Jaehoon Yu, Kota Ando, Yasuyuki Okoshi, Ángel López García-Arias, Junnosuke Suzuki, Thiem Van Chu, Kazushi Kawamura, and Masato Motomura: Hiddenite: 4K-PE Hidden Network Inference 4D-Tensor Engine Exploiting On-Chip Model Construction Achieving 34.8-to-16.0 TOPS/W for CIFAR-100 and ImageNet, International Solid-State Circuits Conference 2022 (ISSCC 2022), Session 15.4.
Jonathan Frankle and Michael Carbin: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Computing Research Repository (CoRR), arXiv:1803.03635 [cs.LG].