Processing in edge devices
For NNs to do their job correctly, they first need to be trained. Typically, this is done ‘offline’ in the cloud and relies on powerful server hardware. Recognising patterns and objects with the trained network is known as inference and is done in real time: it involves deploying and running the trained neural network model. Today, this stage is also performed in the cloud, but moving forward, due to scalability issues and to fully realise AI’s potential, it will need to be done at the edge – for example, on mobile and embedded devices. The shift is also driven by the increasing need for AI-enabled devices to operate remotely and/or untethered, such as drones, smartphones and augmented reality smart glasses.
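The offline-training / on-device-inference split can be sketched in a few lines of code. This is purely illustrative: a tiny linear model stands in for a real neural network, and the function names (train_offline, export_model, run_inference) are hypothetical, not taken from any framework.

```python
# Illustrative sketch of the split described above: train 'offline' in the
# cloud, then deploy the frozen model to an edge device for real-time
# inference. A tiny linear model stands in for a real neural network.
import json

def train_offline(samples, epochs=200, lr=0.1):
    """'Cloud' stage: fit weights with simple per-sample gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}

def export_model(params):
    """Serialise the trained weights for deployment to the edge device."""
    return json.dumps(params)

def run_inference(model_blob, x):
    """'Edge' stage: load the frozen weights and predict locally."""
    p = json.loads(model_blob)
    return p["w"] * x + p["b"]

# Train in the 'cloud' on y = 2x + 1, then deploy and infer on-device.
data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
blob = export_model(train_offline(data))
print(run_inference(blob, 3.0))  # close to 7.0
```

The key point is that the expensive iterative loop lives only in the training stage; the deployed artefact is a fixed set of weights plus a cheap forward pass, which is what the edge device actually runs.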
Looking at connectivity in more detail, mobile networks – whether 3G, 4G or 5G – will not always be available, and the cost of streaming multiple simultaneous high-resolution video feeds can be prohibitive. Sending data to and from the cloud and expecting a decision in real time is therefore not realistic. As such, it is now time to move the processing and deployment of NNs to edge devices; running them over the network is simply impractical due to the issues highlighted earlier – scalability, latency, sporadic inaccessibility and a lack of suitable security.
Why dedicated hardware acceleration is needed
On the other hand, deploying and running NNs on edge devices brings its own unique challenges, such as limited compute resources, power and memory bandwidth.
To deliver the required level of performance within those constraints, dedicated silicon offering hardware acceleration for neural networks is needed. This will provide the necessary leap in performance at much-reduced power consumption – something consumers care about considerably. While they will come to expect the benefits of AI, such as improved search, they will not want them at the cost of their device’s battery life.
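Some back-of-envelope arithmetic shows why memory bandwidth is such a constraint, and why dedicated NN silicon commonly works at reduced numerical precision. The figures below – a 10-million-parameter model and 30 inferences per second – are assumed for illustration only, not taken from the article.

```python
# Back-of-envelope illustration of the memory-bandwidth constraint.
# All figures are assumed for illustration: a hypothetical 10M-parameter
# model running at 30 inferences/s on real-time video.
PARAMS = 10_000_000
BYTES_FP32 = 4   # 32-bit floats, typical for cloud training
BYTES_INT8 = 1   # 8-bit integers, common on dedicated NN accelerators
FPS = 30         # inferences per second for real-time video

def bandwidth_mb_per_s(params, bytes_per_param, fps):
    """Weight traffic if every parameter is fetched once per inference."""
    return params * bytes_per_param * fps / 1e6

fp32 = bandwidth_mb_per_s(PARAMS, BYTES_FP32, FPS)
int8 = bandwidth_mb_per_s(PARAMS, BYTES_INT8, FPS)
print(f"fp32: {fp32:.0f} MB/s, int8: {int8:.0f} MB/s ({fp32/int8:.0f}x less)")
```

Fetching full-precision weights for every frame would consume over a gigabyte per second of memory bandwidth, which is also one of the largest contributors to power draw; moving to lower-precision arithmetic in dedicated hardware cuts both by the same factor.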