AI on EDGE
GPU VS. VPU
byteLAKE’s basic benchmark results between two different setups
of example edge devices: with NVIDIA GPU and with Intel’s
Movidius cards.
Artificial
Intelligence
HPC
Machine
Learning
Deep Learning
Computer Vision
Edge Intelligence
byteLAKE
pl. Solny 14/3
50-062 Wroclaw, Poland
+48 508 091 885
+48 505 322 282
+1 650 735 2063
www.byteLAKE.com
AI on EDGE: GPU vs. VPU  Jul-18 2
Devices Configuration
Tests were run on two Lenovo Tiny PCs.
Tiny#1: Lenovo ThinkCentre M910x Tiny
• CPU: Intel Core i7-7700T vPro
• AI accelerator: 2 x Intel Movidius Myriad 2 VPU
• Memory: 4 GB LPDDR3
• System: Ubuntu 16.04 LTS
Tiny#2: Lenovo ThinkCentre M920x Tiny
• CPU: Intel Core™ i5-8500T
• AI accelerator: NVIDIA Quadro P1000
• Memory: 4 GB GDDR5
• System: Ubuntu 18.04 LTS
Software Configuration:
• Frameworks: Caffe, Tensorflow, OpenCV 3.4
• Drivers:
o Tiny #1: Intel Movidius Neural Compute SDK v1
o Tiny #2: Nvidia GPU Drivers ver. 390.48; CUDA Toolkit 8
Test procedure description:
During the course of the study, we analyzed the performance of two Tiny PCs using the state-of-the-art
YOLO (You Only Look Once) real-time detection model [1]. In both cases we focused on a special
version of the YOLO model, called Tiny YOLO.
The model consists of a single input layer, 8 convolution layers, 8 batch normalization layers, 8 ReLU
layers and a single fully connected layer. Tiny YOLO is able to recognize objects from 20 classes:
aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike,
person, potted plant, sheep, sofa, train and tv-monitor. The size of the pre-trained Tiny YOLO detection
model is 50 MB.
The deep neural net (DNN) used for this study was implemented using the Python interface of the Caffe AI framework.
Benchmarks were based on a real-time analysis of the sequence of frames captured from the camera.
Also, these were performed using three configurations of the AI devices, including:
• single Movidius Myriad accelerator enabled (Tiny#1);
• two Movidius Myriad cards enabled (Tiny#1);
• single NVIDIA GPU (Tiny#2).
The procedure to assess the overall performance of the above Tiny PC configurations took into account
all steps required to generate the resulting movie, including:
• grabbing of the frames from the camera;
• frames preparation;
• forwarding of the images through the deep neural net;
• filtering the results of the analysis;
• drawing the results on the frame;
• presenting the results of the analysis.
To keep the measurements comparable across all configurations, the analysis was performed
for a fixed number of frames. We used two performance criteria: (i) the average
Frames Per Second (FPS) value, and (ii) the execution time of the AI computations for each of the
above-mentioned configurations.
Figure 1 below presents the measurement method in detail (sample code from the single
Movidius configuration; the method was implemented in a similar fashion for the others).
[1] YOLO: Real-Time Object Detection, URL: https://pjreddie.com/darknet/yolo/
Figure 1. Adopted method of performance measurements for a single Movidius Myriad accelerator
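As an illustration of the measurement idea, here is a minimal sketch of a timing loop that covers the whole pipeline and, separately, the inference step. It is not the code from Figure 1: every function name below is a hypothetical placeholder, and the camera, the accelerator and the display are replaced by stubs so that the sketch runs on its own.

```python
import time


def grab_frame(index):
    # Hypothetical stand-in for reading one frame from the camera.
    return {"index": index}


def prepare(frame):
    # Hypothetical stand-in for frame preparation (resizing, normalization).
    return frame


def infer(blob):
    # Hypothetical stand-in for the forward pass on the accelerator.
    return [("person", 0.9), ("dog", 0.1)]


def filter_detections(detections, threshold=0.5):
    # Keep only detections whose confidence reaches the threshold.
    return [d for d in detections if d[1] >= threshold]


def draw_and_show(frame, detections):
    # Hypothetical stand-in for drawing the results and presenting the frame.
    pass


def run_benchmark(num_frames):
    """Time the full pipeline and the inference step for num_frames frames."""
    inference_time = 0.0
    start = time.perf_counter()
    for i in range(num_frames):
        frame = grab_frame(i)
        blob = prepare(frame)
        t0 = time.perf_counter()
        detections = infer(blob)
        inference_time += time.perf_counter() - t0
        draw_and_show(frame, filter_detections(detections))
    total_time = time.perf_counter() - start
    return {
        "frames": num_frames,
        "total_time_s": total_time,
        "inference_time_s": inference_time,
        "fps_avg": num_frames / total_time if total_time > 0 else float("inf"),
    }
```

Calling `run_benchmark(500)` returns the total time, the accumulated inference time and the average FPS for a 500-frame run; with real stages plugged in, the two times correspond to the two criteria described above.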
Results
The tests described above were based on RGB frames grabbed by a Creative Live! Cam Sync USB camera.
The original size of a single frame was 1080 x 720 (HD) pixels but due to the required structure of the
input layer of the YOLO detector, we resized the frames to 448 by 448 RGB pixels.
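Because the frame is squeezed from a rectangular image into a square network input, any detection box has to be scaled back by different factors in x and y before it is drawn on the original frame. The sketch below shows that mapping. The box layout (x_min, y_min, x_max, y_max) in pixels of the 448 x 448 input and the function name are assumptions made for illustration, not the format used in the study.

```python
def scale_box_to_frame(box, net_size=(448, 448), frame_size=(1080, 720)):
    """Map a box from network-input pixels back to original-frame pixels.

    box is (x_min, y_min, x_max, y_max) in the resized image; this layout
    is an assumption made for illustration.
    """
    net_w, net_h = net_size
    frame_w, frame_h = frame_size
    sx = frame_w / net_w  # horizontal scale factor
    sy = frame_h / net_h  # vertical scale factor
    x_min, y_min, x_max, y_max = box
    return (round(x_min * sx), round(y_min * sy),
            round(x_max * sx), round(y_max * sy))
```

For example, `scale_box_to_frame((224, 224, 448, 448))` maps the lower-right quadrant of the network input to the lower-right quadrant of the original frame.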
The benchmarks were carried out for a sequence of 500 frames.
The performance results for the different configurations of AI accelerators are presented in Table 1 below.
The average FPS factor was calculated using the following formula:
FPSavg = 500 / Ta
where Ta refers to the time of the overall analysis of 500 frames (as described above).
Table 1. Performance results
Metric | 1 x Movidius Myriad 2 | 2 x Movidius Myriad 2 | 1 x NVIDIA Quadro P1000 GPU
Time [s] | 123.1 | 69.8 | 23.3
Average FPS factor | 4.05 | 7.16 | 21.3
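The FPS formula can be checked against the reported times with a few lines of Python. This is only a sketch: the dictionary keys and the function name are illustrative. Recomputing from the published times gives values close to the reported FPS figures but not always identical to them; one possible explanation, which is an assumption, is that the published times are rounded.

```python
FRAMES = 500

# Total analysis times in seconds, as reported in Table 1.
times_s = {
    "1 x Movidius Myriad 2": 123.1,
    "2 x Movidius Myriad 2": 69.8,
    "1 x NVIDIA GPU": 23.3,
}


def average_fps(frames, total_time_s):
    # FPSavg = number of frames / total analysis time
    return frames / total_time_s


fps = {name: average_fps(FRAMES, t) for name, t in times_s.items()}
for name, value in fps.items():
    print(f"{name}: {value:.2f} FPS")
```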
As expected, the best performance was achieved with the GPU accelerator.
Processing 500 frames took ca. 23 seconds, which corresponds to an average rate of ca. 21 frames
per second. Consequently, a single GPU turned out to be 5.28 times
faster than a single Myriad chip and 2.99 times faster than the configuration with two Movidius
accelerators (at least for the given benchmark procedure).
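The speedup figures are simply ratios of the measured execution times. A short sketch of that arithmetic, with an illustrative function name, using the reported times of 123.1 s, 69.8 s and 23.3 s:

```python
def speedup(baseline_time_s, candidate_time_s):
    # How many times faster the candidate is than the baseline.
    return baseline_time_s / candidate_time_s


gpu_vs_one_vpu = speedup(123.1, 23.3)  # GPU vs. a single Myriad chip
gpu_vs_two_vpu = speedup(69.8, 23.3)   # GPU vs. two Movidius accelerators

print(f"GPU vs. 1 x Movidius: {gpu_vs_one_vpu:.3f}x")
print(f"GPU vs. 2 x Movidius: {gpu_vs_two_vpu:.3f}x")
```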
In the scenario where we enabled both Movidius cards, we developed an approach which allowed for
parallel analysis of the frames being grabbed from the camera. As a consequence, this version was 1.76
times faster than the version with a single Myriad chip. In the given scenario, a single Intel Movidius
was able to perform only at the rate of ca. 4 FPS whereas the double-Movidius configuration reached
ca. 7 FPS.
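One common way to analyze frames in parallel on two accelerators is to give each device its own worker thread and let the workers pull frames from a shared queue. The sketch below illustrates that pattern only; it is not the approach developed in the study, and `infer_on_device` is a hypothetical stand-in for a blocking inference call on one device.

```python
import queue
import threading


def infer_on_device(device_id, frame):
    # Hypothetical stand-in for a blocking inference call on one accelerator.
    return (device_id, frame)


def worker(device_id, frames, results):
    # Each worker owns one device and pulls frames until it sees the sentinel.
    while True:
        item = frames.get()
        if item is None:
            break
        index, frame = item
        results[index] = infer_on_device(device_id, frame)


def process_in_parallel(frame_list, num_devices=2):
    """Distribute frames across devices; results keep the input order."""
    frames = queue.Queue()
    results = [None] * len(frame_list)
    threads = [threading.Thread(target=worker, args=(d, frames, results))
               for d in range(num_devices)]
    for t in threads:
        t.start()
    for item in enumerate(frame_list):
        frames.put(item)
    for _ in threads:
        frames.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Storing each result at the index of its frame keeps the output in capture order even when the devices finish out of order. With the reported times, 123.1 s / 69.8 s ≈ 1.76, which is about 88% of ideal two-device scaling.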
Conclusions
The results of this study show that using a GPU for object detection based on the YOLO model makes it
possible to analyze data in real time. At the same time, neither a single Intel Movidius nor two Intel
Movidius chips provide the desired efficiency in the given scenario. However, they can still be used
successfully in applications where real-time processing is not necessary and near-real-time is enough.
The comparison of both devices is presented in Table 2 below. Based on the knowledge gained
during this study, we conclude that the advantage of the NVIDIA GPU over the Intel Movidius VPU is not only
in computational performance. The GPU allows for both training of DNNs and inference,
whereas Movidius is designed only to work with pre-trained models.
Another difference between the two accelerators is their support for various AI
libraries/frameworks. While Movidius provides support for two popular frameworks (Caffe and
Tensorflow), the GPU supports more AI libraries, e.g. cuDNN or Theano.
The difference between these two accelerators can also be noticed in the programming
process. In many cases, implementing an application which uses a GPU does not require any
special knowledge about the accelerator itself. Most AI frameworks provide built-in support
for GPU computing (both training and inference) out of the box. In the Movidius case, however, it is
necessary to learn its SDK as well. It is not a painful process, but it is yet another tool in
the chain.
Another difference between the two accelerators is the area of usage. While the GPU is a
powerful accelerator for AI computations, the power consumption and size of this kind of accelerator
can be an obstacle in many areas. A GPU offers notably high computational performance (on the order of a few
TFlops or more); however, it is usually dedicated to HPC solutions. At the same time, Intel Movidius is
a low-power AI solution dedicated to on-device computer vision. Its size and power
consumption make it attractive for many uses, e.g. IoT solutions, drones or smart security.
Given the context above, here are some additional remarks one might consider when deciding which
accelerator is a better fit for a given design. It is important to emphasize that comparing
Movidius and NVIDIA as two competing accelerators for AI workloads leads to the conclusion that
the two are meant for different tasks. Therefore, looking at them only through the lens of
performance benchmark results might be misleading. To choose properly between Movidius and
an NVIDIA GPU, one should first and foremost take into account the intended application rather than
performance benchmark results alone. Movidius is primarily designed to execute AI workloads based
on trained models (inference). NVIDIA’s GPU, on the other hand, can do this plus training. It therefore
really depends on whether the planned device is to work in execute-only mode or be capable of
updating/re-training its models (brains) as well. And of course, these considerations make sense only as
long as such tasks can be executed within a reasonable time frame.
Table 2. The comparison of Nvidia GPU and Intel Movidius VPU
FEATURE | INTEL MOVIDIUS | NVIDIA GPU
FOR INFERENCING | YES | YES
FOR TRAINING | NO | YES
AI FRAMEWORKS | CAFFE / TENSORFLOW | CAFFE / TENSORFLOW / CUDNN and more...
MAX MODEL SIZE | 320 MB | No limit
EASY TO CODE? | Besides knowledge of the AI framework/library, programmers need to learn the Movidius programming SDK. | Programming AI applications requires knowledge of the library/framework used, e.g. Caffe or Tensorflow.
FORM FACTOR | Small (e.g. mobile, IoT) | medium+
POWER CONSUMPTION | Low, ~1W | medium+
HEATING | + | -
CAN WORK OFFLINE | Yes | Yes
MAIN PURPOSE | Classification and recognition of objects | General AI
OS | Ubuntu 16.04, Raspberry Pi 3 Raspbian Stretch | As long as the drivers are available (Windows, Linux)
COMPUTATIONAL POWER | 150 GFlops | Very high, TFlops and higher
OTHER | Imaging/vision accelerators included (12 specialized vector VLIW processors (SHAVEs) + 2 x RISC processors). |
ARITHMETIC | 8/16/32 integer, 16/32 floating point | all
PRICE TAG | <$80 | $100+
Thank you!
Contact us at: welcome@byteLAKE.com
Learn how we work:
1. Listen Actively
We start with a consultancy session to better understand our client’s requirements & assumptions.
2. Suggest
We thoroughly analyze the gathered information and prepare a draft offer.
3. Agree
We fine-tune the offer further and wrap up everything into a binding contract.
4. Deliver
Finally, the execution starts. We deliver projects in a fully transparent, Agile (SCRUM-based) fashion.
We build Artificial Intelligence
software and integrate that into
products.
We port and optimize algorithms
for parallel, CPU+GPU HPC
architectures.
We deploy AI on data centers, the
cloud and constrained, embedded
devices (AI on Edge).
byteLAKE
www.byteLAKE.com
We are specialists in:
Helping companies transform
for the era of Artificial Intelligence.
We are a team of scientists, programmers, designers
and technology enthusiasts helping industries incorporate
AI techniques into products.
Machine Learning
Deep Learning
Computer Vision
High Performance Computing
Heterogeneous Computing
Edge Intelligence

Benchmark of common AI accelerators: NVIDIA GPU vs. Intel Movidius

  • 1. AI on EDGE GPU VS. VPU byteLAKE’s basic benchmark results between two different setups of example edge devices: with NVIDIA GPU and with Intel’s Movidius cards. Artificial Intelligence HPC Machine Learning Deep Learning Computer Vision Edge Intelligence byteLAKE pl. Solny 14/3 50-062 Wroclaw, Poland +48 508 091 885 +48 505 322 282 +1 650 735 2063 www.byteLAKE.com
  • 2. AI on EDGE: GPU vs. VPU  Jul-18 2 Devices Configuration Tests were run on two Lenovo’s Tiny PCs. Tiny#1: Lenovo ThinkCentre M910x Tiny • CPU: Intel Core i7-7700T vPro • AI accelerator: 2 x Intel Movidius Myriad 2 VPU • Memory: 4 GB LPDDR3 • System: Ubuntu 16.04 LTS Tiny#2: Lenovo ThinkCentre M920x Tiny • CPU: Intel Core™ i5-8500T • AI accelerator: NVIDIA Quadro P1000 • Memory: 4 GB GDDR5 • System: Ubuntu 18.04 LTS Software Configuration: • Frameworks: Caffe, Tensorflow, OpenCV 3.4 • Drivers: o Tiny #1: Intel Movidius Neural Compute SDK v1 o Tiny #2: Nvidia GPU Drivers ver. 390.48; CUDA Toolkit 8
  • 3. AI on EDGE: GPU vs. VPU  Jul-18 3 Test procedure description: During the course of the studies, we analyzed the performance of two Tiny PCs using the state-of-the- art YOLO (You Only Look Once) real-time detection model [1]. In both cases we focused on a special version of the YOLO model, called Tiny YOLO model. The model consists of a single input layer, 8 convolution layers, 8 batch norm layers, 8 relu layers and a single full-connected layer. Tiny YOLO is able to recognize objects out of 20 classes, including: aero- plane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and tv-monitor. The size of the pre-trained Tiny YOLO detection model is 50 MB. The deep neural net (DNN) used for this study has been implemented using Python Caffe AI framework. Benchmarks were based on a real-time analysis of the sequence of frames captured from the camera. Also, these were performed using three configurations of the AI devices, including: • single Movidius Myriad accelerator enabled (Tiny#1); • two Movidius Myriad cards enabled (Tiny#1); • single NVIDIA GPU (Tiny#2).
  • 4. AI on EDGE: GPU vs. VPU  Jul-18 4 The procedure to assess the overall performance of the above Tiny PC configurations took into account all steps required to generate resulting movie, including: • grabbing of the frames from the camera; • frames preparation; • forwarding of the images through the deep neural net; • filtering the results of the analysis; • drawing the results on the frame; • presenting the results of the analysis. In order to ensure objectivity of measurements for all of the configurations, the analysis was performed for a defined number of frames. At the same time, we assumed two criteria of performance: (i) average value of Frame Per Second (FPS) factor, and (ii) execution time of AI computations using all above mentioned configurations. Figure 1 below presents the method of taking the measurements in details (sample code from a single Movidius configuration; for others: the method has been implemented in a similar fashion). [1]. YOLO: Real-Time Object Detection, URL: https://pjreddie.com/darknet/yolo/utm_source= next.36kr.com
  • 5. AI on EDGE: GPU vs. VPU  Jul-18 5 Figure 1. Adopted method of performance measurements for a single Movidius Myriad accelerator
  • 6. Results
    The tests described above were based on RGB frames grabbed by a Creative Live! Cam Sync USB camera. The original size of a single frame was 1080 x 720 (HD) pixels, but because of the input layer structure required by the YOLO detector, we resized the frames to 448 x 448 RGB pixels. The benchmarks were carried out for a sequence of 500 frames. The performance results for the different configurations of AI accelerators are presented in Table 1 below. The average FPS factor was calculated using the following formula:
    FPS_avg = 500 / T_a
    where T_a is the time, in seconds, of the overall analysis of 500 frames (as described above).
    Table 1. Performance results
                         | 1 x Movidius Myriad 2 | 2 x Movidius Myriad 2 | 1 x NVIDIA Quadro P1000 GPU
    Time [s]             | 123.1                 | 69.8                  | 23.3
    Average FPS factor   | 4.05                  | 7.16                  | 21.3
    As expected, the best performance was achieved with the GPU accelerator. This version processed 500 frames in ca. 23 seconds, which corresponds to an average rate of ca. 21 frames per second. Consequently, a single GPU turned out to be 5.28 times faster than a single Myriad chip and 2.99 times faster than the configuration with two Movidius accelerators (at least for the given benchmark procedure).
    In the scenario where we enabled both Movidius cards, we developed an approach which allowed for parallel analysis of the frames being grabbed from the camera. As a result, this version was 1.76 times faster than the version with a single Myriad chip. In the given scenario, a single Intel Movidius was able to perform only at a rate of ca. 4 FPS, whereas the double-Movidius configuration reached ca. 7 FPS.
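The figures above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the average FPS from the formula FPS_avg = 500 / T_a and the speedup ratios from the times in Table 1. The helper names `average_fps` and `speedup` are illustrative and not from the original study. Because the times in the table are rounded to one decimal place, the recomputed values may differ slightly from the reported ones.

```python
N_FRAMES = 500

# Total analysis times T_a in seconds, copied from Table 1.
times_s = {
    "1 x Movidius Myriad 2": 123.1,
    "2 x Movidius Myriad 2": 69.8,
    "1 x NVIDIA GPU": 23.3,
}


def average_fps(total_seconds, n_frames=N_FRAMES):
    """FPS_avg = n_frames / T_a."""
    return n_frames / total_seconds


def speedup(slower_seconds, faster_seconds):
    """How many times faster the second configuration is than the first."""
    return slower_seconds / faster_seconds


fps = {name: average_fps(t) for name, t in times_s.items()}
for name, value in fps.items():
    print(f"{name}: {value:.2f} FPS")

gpu = times_s["1 x NVIDIA GPU"]
one_vpu = times_s["1 x Movidius Myriad 2"]
two_vpu = times_s["2 x Movidius Myriad 2"]
print(f"GPU vs. 1 x Movidius: {speedup(one_vpu, gpu):.3f}x")
print(f"GPU vs. 2 x Movidius: {speedup(two_vpu, gpu):.3f}x")
print(f"2 x vs. 1 x Movidius: {speedup(one_vpu, two_vpu):.3f}x")
```

The last ratio also shows that doubling the number of Movidius accelerators gave less than a twofold gain in this benchmark.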
  • 7. Conclusions
    The results of this study show that using a GPU for object detection based on the YOLO model makes it possible to analyze data in real time. At the same time, neither a single Intel Movidius nor two Intel Movidius chips provide the desired efficiency in the given scenario. However, Movidius can still be used successfully in applications where real-time processing is not necessary and near-real-time is enough. The comparison of both devices is presented in Table 2 below.
    Based on the knowledge gained during this study, we conclude that the advantage of the NVIDIA GPU over the Intel Movidius VPU is not only in computational performance. The GPU allows for both training of DNNs and inference, whereas Movidius is designed only to work with pre-trained models. Another difference between the two accelerators is their support for various AI libraries/frameworks. While Movidius supports two popular frameworks (Caffe and TensorFlow), the GPU supports more AI libraries, e.g. cuDNN or Theano.
    The difference between these two accelerators can also be noticed in the programming process. In many cases, implementing an application which uses a GPU does not require any special knowledge about the accelerator itself. Most AI frameworks provide built-in support for GPU computing (both training and inference) out of the box. In the Movidius case, however, it is also necessary to learn its SDK. It is not a painful process, but it is yet another tool in the chain.
    The two accelerators also differ in their area of usage. While the GPU is a powerful accelerator for AI computations, the electricity consumption and size of this kind of accelerator can be an obstacle in many areas. A GPU offers notably high computational performance (on the order of a few TFlops or more); however, it is usually dedicated to HPC solutions.
    At the same time, Intel Movidius is a low-power AI solution dedicated to on-device computer vision. The size of the device and its power consumption make it attractive for many uses, e.g. IoT solutions, drones or smart security.
    Given the context above, here are some additional remarks one might consider when deciding which accelerator is a better fit for a given design. It is important to emphasize that comparing Movidius and NVIDIA as two competing accelerators for AI workloads leads to the conclusion that the two are meant for different tasks. Therefore, looking at them only through the perspective of the performance benchmarking results might be misleading.
    To properly choose between Movidius and an NVIDIA GPU, one should first and foremost take into account the intended application rather than the performance benchmark results alone. Movidius is primarily designed to execute AI workloads based on trained models (inference). NVIDIA’s GPU, on the other hand, can do this as well as training. Therefore, the choice really depends on whether the planned device is to work in execute-only mode or should also be capable of updating/re-training its models (brains). And of course, these options make sense only as long as such tasks can be executed within a reasonable time frame.
  • 8. Table 2. The comparison of Nvidia GPU and Intel Movidius VPU
                         | INTEL MOVIDIUS | NVIDIA GPU
    FOR INFERENCING      | YES | YES
    FOR TRAINING         | NO | YES
    AI FRAMEWORKS        | CAFFE / TENSORFLOW | CAFFE / TENSORFLOW / CUDNN and more...
    MAX MODEL SIZE       | 320 MB | No limit
    EASY TO CODE?        | Besides knowledge of the AI framework/library, programmers need to learn the Movidius programming SDK. | Programming AI applications requires knowledge of the library/framework used, e.g. Caffe or TensorFlow.
    FORM FACTOR          | Small (e.g. mobile, IoT) | medium+
    POWER CONSUMPTION    | Low, ~1 W | medium+
    HEATING              | + | -
    CAN WORK OFFLINE     | Yes | Yes
    MAIN PURPOSE         | Classification and recognition of objects | General AI
    OS                   | Ubuntu 16.04, Raspberry Pi 3 Raspbian Stretch | As long as the drivers are available (Windows, Linux)
    COMPUTATIONAL POWER  | 150 GFlops | Very high, TFlops and higher
    OTHER                | Imaging/vision accelerators included (12 specialized vector VLIW processors (SHAVEs) + 2 RISC processors) |
    ARITHMETIC           | 8/16/32 integer, 16/32 floating point | all
    PRICE TAG            | <$80 | $100+
  • 9. Thank you! Contact us at: welcome@byteLAKE.com
  • 10. Learn how we work:
    1. Listen Actively: We start with a consultancy session to better understand our client’s requirements and assumptions.
    2. Suggest: We thoroughly analyze the gathered information and prepare a draft offer.
    3. Agree: We fine-tune the offer further and wrap up everything into a binding contract.
    4. Deliver: Finally, the execution starts. We deliver projects in a fully transparent, Agile (SCRUM-based) fashion.
  • 11. byteLAKE: Helping companies transform for the era of Artificial Intelligence.
    We are a team of scientists, programmers, designers and technology enthusiasts helping industries incorporate AI techniques into products.
    We build Artificial Intelligence software and integrate that into products. We port and optimize algorithms for parallel, CPU+GPU HPC architectures. We deploy AI on data centers, the cloud and constrained, embedded devices (AI on Edge).
    We are specialists in: Machine Learning, Deep Learning, Computer Vision, High Performance Computing, Heterogeneous Computing, Edge Intelligence.
    www.byteLAKE.com