Bringing CFD Apps to FPGA


Computational Fluid Dynamics (CFD) is a critical tool for solving flow problems. It is used in both fundamental research and industrial R&D. You will find CFD Applications in weather forecasting, ocean current simulations, and the design and analysis of pumps, compressors, fans and even nuclear and hydro power plants. It is widely used across industries such as automotive, chemical, aerospace, biomedical, power and energy, construction and many more.

CFD Applications require a huge amount of computational power. Such apps combine complex numerical algorithms with data structures to help industries analyze and solve problems related to fluid flows (liquids and gases) and their interactions with surfaces defined by boundary conditions.

Traditionally, CFD simulations are executed on CPU-based configurations. During our past research we noticed that many such algorithms offer an impressive speed-up with more parallelism. However, as the algorithms are memory bound, global memory bandwidth is one of the main performance limitations. In addition, the execution time is close to the data transfer time, since the computations are fully overlapped with data transfers. As a result, we are able to work with relatively low clock frequencies and also become more energy efficient.

In its popular form, Moore’s law says that the performance available on a single chip doubles roughly every 18 months. Unfortunately, the ever-increasing transistor density does not necessarily deliver comparable improvements in performance. Thus the HPC world drifted towards exploratory research on systems in which conventional processors cooperate with multicore, specialized hardware accelerators, namely heterogeneous computing architectures.

More powerful hardware and highly optimized software translate into significantly reduced simulation times. They also make it possible to perform much more complex simulations. Unfortunately, when increasing the number of cores or moving towards heterogeneous architectures, we need to be very cautious about energy consumption, as it can grow rapidly.

In addition, with the ever-increasing demand for accuracy and simulation capabilities, the computational resources required by CFD Applications grow exponentially. Moore’s law is far too slow to keep pace.

Taking all of these into account, our team concluded that:

  • to address the rapidly growing demand for computational resources, CFD Applications need to be optimized for heterogeneous architectures
  • FPGA (Field Programmable Gate Array) as a specialized hardware accelerator offers excellent energy efficiency, performance benefits and low latency while remaining highly adaptable to the specific needs of individual algorithms (unlike fixed-function accelerator cards, where these can be limited by the number of registers, memory etc.)

Working with Xilinx

Following the announcements made by Xilinx during XDF’18 (Xilinx Developer Forum), we could not be more excited to start bringing CFD Applications to the Alveo Adaptable Accelerator Cards. These FPGA-based cards are designed especially for data center workloads.

Starting with CFD Kernels

  • Advection
    – movement of some material (dissolved or suspended) in the fluid.
    As explained by Wikipedia, “advection is the transport of a substance or quantity by bulk motion. […] An example of advection is the transport of pollutants or silt in a river by bulk water flow downstream. […] The fluid’s motion is described mathematically as a vector field, and the transported material is described by a scalar field showing its distribution over space. […] In meteorology and physical oceanography, advection often refers to the transport of some property of the atmosphere or ocean, such as heat, humidity or salinity. Advection is important for the formation of orographic clouds and the precipitation of water from clouds, as part of the hydrological cycle.”
    Description: first-order step of the non-linear iterative upwind advection MPDATA (Multidimensional Positive Definite Advection Transport Algorithm) schemes (non-oscillatory forward in time). Stencils are the basis for many algorithms that numerically solve partial differential equations (PDEs), e.g. in numerical weather prediction (NWP), where they are used to solve the advection transport problem in the computational fluid dynamics (CFD) area. Forward-in-time techniques are standard in computational meteorology, particularly in the context of semi-Lagrangian schemes.
  • Pseudovelocity
    – approximation of the relative velocity
    Description: computation of the pseudovelocity for the second pass of the upwind algorithm in MPDATA. This kernel is a key part of providing second-order accuracy in non-linear advection schemes that suppress, reduce or control the numerical oscillations characteristic of higher-order linear schemes.
  • Divergence
    – measures how much fluid is flowing into (negative divergence) or out of (positive divergence) a certain point in a vector field.
    Description: divergence part of the matrix-free linear operator formulation in the iterative Krylov scheme. This algorithm is part of solving linear systems via Krylov subspace methods. This method is widely used in variational data assimilation applications to NWP. Variational data assimilation is used at major weather prediction centers to produce the initial conditions for 7- to 10-day weather forecasts.
  • Thomas algorithm
    – simplified form of Gaussian elimination for tridiagonal systems of equations (the most common ones)
    Description: tridiagonal Thomas algorithm for vertical matrix inversion inside the preconditioner for the iterative solver. The preconditioner operates on the diagonal part of the full linear problem. Effective preconditioning lies at the heart of multiscale flow simulation, including a broad range of geoscientific applications that rely on semi-implicit integrations of the governing PDEs.
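
To make the first kernel more concrete, here is a minimal sketch of a first-order upwind (donor-cell) advection step. It is not byteLAKE's FPGA kernel: it assumes a 1D periodic grid, plain Python lists, and Courant numbers (u*dt/dx) given at the right face of each cell. The name upwind_step and its parameters are hypothetical and chosen only for illustration.

```python
def upwind_step(psi, courant):
    """One first-order upwind (donor-cell) step on a periodic 1D grid.

    psi     -- list of scalar values, one per cell
    courant -- list of Courant numbers (u*dt/dx) at the right face of each cell
               (assumed |courant| <= 1 for stability)
    """
    n = len(psi)
    flux = []
    for i in range(n):
        c = courant[i]
        right = psi[(i + 1) % n]  # periodic neighbour on the right
        # Take the transported value from the upwind side of the face.
        flux.append(max(c, 0.0) * psi[i] + min(c, 0.0) * right)
    # flux[i - 1] wraps to the last face for i == 0 (periodic boundary).
    return [psi[i] - (flux[i] - flux[i - 1]) for i in range(n)]


if __name__ == "__main__":
    field = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(upwind_step(field, [0.5] * len(field)))
```

With a uniform Courant number of 0.5, half of the material in the middle cell moves one cell to the right, the total amount is conserved and no value becomes negative. Each output cell depends only on its nearest neighbours, which is the stencil pattern mentioned in the description above.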
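
The Thomas algorithm from the last item can be sketched in a few lines. This is a textbook version, not the implementation used in the preconditioner: it assumes the system is well conditioned (for example diagonally dominant), so no pivoting is performed, and the function name thomas_solve and its argument layout are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs with the Thomas algorithm.

    lower -- sub-diagonal   (lower[0] is unused)
    diag  -- main diagonal
    upper -- super-diagonal (upper[-1] is unused)
    Assumes no zero pivots appear (e.g. A is diagonally dominant).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Matrix rows: [2 -1 0], [-1 2 -1], [0 -1 2]
    print(thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                       [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0]))
```

Both sweeps touch each row once, so the cost grows linearly with the number of unknowns, compared with cubic cost for general Gaussian elimination. In a setting like the one described above, one such solve would presumably be carried out per vertical column of the grid.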


  • Xilinx Alveo U250 and U280 (powered by the Xilinx® UltraScale+™ FPGA) (accelerator)
  • Intel Xeon Gold 6148 and Intel Xeon Platinum 8168 CPUs (SkyLake) (host)

byteLAKE’s implementation advantages

Our version is customized to a real-life scenario and can be directly adapted to geophysical models such as the Eulerian/semi-Lagrangian fluid solver (EULAG), designed to simulate all-scale geophysical flows. For this reason, our kernels are extended with additional quantities such as forces (implosion, explosion) and density vectors. As a result, the implementation uses 9 matrices (3 velocities, 2 x scalar quantity, 2 x density, 2 x forces). We have also included full configuration of the boundary conditions (periodic, open). Unlike general frameworks, our implementation is highly optimized for real-life scenarios, including special use cases.


We will announce the performance results during the upcoming ISC’19 conference in Frankfurt, Germany (June 16-20, 2019). Contact us to book a dedicated demo session at:

byteLAKE and Xilinx bring CFD Apps to FPGA

MPDATA EULAG and CFD Kernels from byteLAKE

Meet us at ISC’19 to learn more!

White Paper (soon)