AI accelerators for embedded (benchmark)


We took the two AI accelerators that had come up most often in our projects to date (July 2018) and benchmarked them. Both were attached to an example edge device (an edge server), which in our case was a Lenovo Tiny PC. The accelerators were an NVIDIA Quadro P1000 and a configuration with two Intel Movidius cards (PCIe). On the software side, we used both the Caffe and TensorFlow frameworks, and we tested the performance of both solutions with some of the most common pre-trained computer vision models.


The results of our study show that using a GPU for object detection makes it possible to analyze data in real time. By contrast, neither a single Intel Movidius chip nor two of them delivered the desired performance in this scenario. Movidius can still be used successfully, however, in applications where real-time processing is not necessary and near-real-time is enough.
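To make the real-time versus near-real-time distinction concrete, here is a minimal Python sketch of how throughput could be measured and classified. It is not the harness used in our study: the names measure_fps and classify_throughput, the 30 FPS target, the 0.5 near-real-time factor and the "below near-real-time" label are all assumptions chosen for illustration, and infer stands in for whatever call runs the model on one frame.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(
    infer: Callable[[Any], Any],
    frames: Iterable[Any],
    warmup: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return frames per second for `infer`, ignoring the first `warmup` frames.

    `infer` is a hypothetical stand-in for a single-frame inference call.
    Warm-up frames are run but not timed, so one-off start-up costs
    (model loading, memory allocation) do not skew the result.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    for frame in frames[:warmup]:
        infer(frame)

    timed = frames[warmup:]
    start = clock()
    for frame in timed:
        infer(frame)
    elapsed = clock() - start

    if elapsed <= 0:
        raise ValueError("elapsed time too small to measure")
    return len(timed) / elapsed


def classify_throughput(
    fps: float,
    target_fps: float = 30.0,   # assumed real-time target, not from the study
    near_factor: float = 0.5,   # assumed fraction of target for near-real-time
) -> str:
    """Label a measured throughput relative to an assumed target frame rate."""
    if fps >= target_fps:
        return "real-time"
    if fps >= target_fps * near_factor:
        return "near-real-time"
    return "below near-real-time"
```

In practice one would pass the framework's own inference call as infer and pick target_fps from the frame rate of the video source, since that is what "real-time" means for a given camera.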

Based on what we learned during this study, we concluded that the advantage of the NVIDIA GPU over the Intel Movidius VPU is not only in computational performance. The GPU supports both training of DNNs and inference, whereas Movidius is designed only to work with pre-trained models.

Another difference between the two accelerators is their support for AI libraries and frameworks. While Movidius supports two popular frameworks (Caffe and TensorFlow), the GPU supports more AI libraries, e.g. cuDNN or Theano.

The two accelerators also differ on the programming side. In many cases, implementing an application that uses a GPU does not require any special knowledge of the accelerator itself: most AI frameworks provide built-in GPU support (for both training and inference) out of the box. With Movidius, however, you also need to learn its SDK. This is not a painful process, but it is yet another tool in the chain.

The two accelerators also differ in their area of use. While the GPU is a powerful accelerator for AI computations, the power consumption and size of this kind of accelerator can be an obstacle in many areas. A GPU offers notably high computational performance (on the order of a few TFLOPS or more), but it is usually aimed at HPC solutions. Intel Movidius, by contrast, is a low-power AI solution dedicated to on-device computer vision. Its small size and low power consumption make it attractive for many uses, e.g. IoT solutions, drones or smart security.

Given the context above, here are some additional remarks one might consider when deciding which accelerator is a better fit for a given design.

It is important to emphasize, however, that comparing Movidius and NVIDIA as two competing accelerators for AI workloads leads to the conclusion that the two are meant for different tasks.

Looking at them only through the lens of benchmark results might therefore be misleading. To choose properly between Movidius and an NVIDIA GPU, one should first and foremost take into account the intended application rather than the benchmark results alone. Movidius is primarily designed to execute AI workloads based on trained models (inference). An NVIDIA GPU, on the other hand, can do this plus training. So it really depends on whether the planned device is to work in execute-only mode or should also be capable of updating or re-training its models (its "brains"). And of course all of this makes sense only as long as we are talking about executing such tasks within a reasonable time frame.

You can find more details in our blog post on the subject. Link to benchmark results here.

Explore our team’s expertise on NVIDIA technologies here.
