We propose a partially-frozen neural network architecture that is optimized for efficient hardware implementation.
Unlike traditional layer-level freezing approaches, our method vertically freezes a portion of the weights, distributing the frozen weights across layers rather than freezing entire layers. This leaves room for adapting both to new tasks and to new kinds of data.
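As a rough illustration (not our actual implementation), the following minimal PyTorch sketch freezes a fixed fraction of the weights in every convolutional layer by zeroing their gradients with a per-layer mask; the random mask and the frozen ratio are assumptions made only for this example.

```python
import torch
import torch.nn as nn

def freeze_fraction(module: nn.Module, frozen_ratio: float = 0.5) -> None:
    """Freeze a fixed fraction of the weights in every conv layer.

    Unlike layer-level freezing, every layer keeps some trainable weights:
    a random boolean mask marks `frozen_ratio` of each weight tensor as
    frozen, and a gradient hook zeroes the corresponding gradients.
    (Illustrative sketch; the actual weight-selection scheme may differ.)
    """
    for layer in module.modules():
        if isinstance(layer, nn.Conv2d):
            mask = torch.rand_like(layer.weight) < frozen_ratio  # True = frozen
            layer.register_buffer("frozen_mask", mask)
            layer.weight.register_hook(
                lambda grad, m=mask: grad.masked_fill(m, 0.0)
            )

# Usage: the masked entries stop receiving gradient updates.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 16, 3, padding=1))
freeze_fraction(net, frozen_ratio=0.5)
```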
We named this architecture SemifreddoNets after the Italian dessert semifreddo, due to its partially-frozen nature.
Our system consists of one frozen core and
two parallel trainable cores.
The trainable cores selectively transfer and enrich frozen-core features using trainable alpha-blending parameters.
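A minimal sketch of this kind of alpha blending is given below; the per-channel parameterization and sigmoid squashing are illustrative assumptions rather than our exact formulation.

```python
import torch
import torch.nn as nn

class AlphaBlend(nn.Module):
    """Blend frozen-core features into a trainable core with learnable alphas.

    Each channel gets its own blending weight; a sigmoid keeps it in (0, 1),
    so the output interpolates between frozen and trainable features.
    (Hypothetical sketch of the alpha-blending idea.)
    """

    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, frozen_feat: torch.Tensor,
                trainable_feat: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.alpha)
        return a * frozen_feat + (1.0 - a) * trainable_feat

# Usage with two feature maps of matching shape.
blend = AlphaBlend(channels=16)
fused = blend(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32))
```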
An optional core shuffle module lets the two trainable cores exchange feature maps so that they can work together more effectively.
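One simple way to picture such an exchange is to swap half of the channels between the two cores' feature maps; the concrete swap pattern below is an illustrative assumption, not necessarily the shuffle we use.

```python
import torch

def core_shuffle(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """Exchange half of the channels between two trainable cores.

    Core A keeps its first channel half and receives core B's second half,
    and vice versa, so information can flow between the parallel cores.
    (Illustrative only; the actual shuffle pattern may differ.)
    """
    c = feat_a.shape[1] // 2
    new_a = torch.cat([feat_a[:, :c], feat_b[:, c:]], dim=1)
    new_b = torch.cat([feat_b[:, :c], feat_a[:, c:]], dim=1)
    return new_a, new_b

# Usage between two parallel trainable-core feature maps.
a, b = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
a, b = core_shuffle(a, b)
```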
Both the frozen and trainable cores have their topologies hard-wired in fully-pipelined hardware.
Fixing the topology and some of the weights
in hardware reduces the silicon area, logic
delay, and memory requirements, leading to
significant savings in cost and power consumption.
Furthermore, SemifreddoNets can implement
deeper and larger neural network architectures
by reusing the last blocks repeatedly in a
single inference pass.
This block modularity provides the flexibility
to find a reasonable balance between accuracy
and speed, without requiring any hardware
change.
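In software terms, this repeated-block scheme resembles weight-tied iteration, where the same tail block is applied several times within one forward pass; the block design and repeat count in the sketch below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RepeatedTailNet(nn.Module):
    """Emulate a deeper network by looping over the last block.

    The tail block's weights are shared across iterations, so effective
    depth grows without adding hardware or parameters. (Sketch only;
    the repeat count and block design are assumptions.)
    """

    def __init__(self, channels: int = 16, repeats: int = 3):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.tail = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()
        )
        self.repeats = repeats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        for _ in range(self.repeats):  # same block reused in one pass
            x = self.tail(x)
        return x

# Usage: a single inference pass traverses the tail block three times.
out = RepeatedTailNet()(torch.randn(1, 3, 32, 32))
```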
Thanks for watching, and check out our paper to learn more.
