# Using Neon Instructions


Arm Neon is an advanced Single Instruction Multiple Data (SIMD) architecture extension for Arm processors. A single Neon instruction operates on multiple data elements in parallel. Neon is widely used in fields such as multimedia encoding/decoding and 2D/3D graphics to improve execution performance.


The Neon extension has been available since ARMv7. It is included by default in Cortex-A7, Cortex-A12, and Cortex-A15 processors, but is optional in other ARMv7 Cortex-A series processors. For details, see [Introducing NEON Development Article](https://developer.arm.com/documentation/dht0002/a/Introducing-NEON/What-is-SIMD-/ARM-SIMD-instructions?lang=en).


ARMv8-A processors integrate the Neon extension by default, and it is supported in both AArch64 and AArch32. For details, see [Learn the architecture - Introducing Neon](https://developer.arm.com/documentation/102474/0100/Fundamentals-of-Armv8-Neon-technology).


## Architecture Support in OpenHarmony

In OpenHarmony, the Neon extension is enabled by default in the arm64-v8a ABI. It is disabled by default in the armeabi-v7a ABI so that as many ARMv7-A devices as possible are supported.

In the LLVM toolchain of the OpenHarmony SDK, the armeabi-v7a ABI comes with precompiled runtime libraries in several configurations. The directory structure is as follows. **native-root** is the root directory where the native package of the NDK is decompressed.

```
{native-root}/llvm/lib/clang/current/lib/arm-linux-ohos/
    |-- a7_hard_neon-vfpv4
    |       |-- clang_rt.crtbegin.o
    |       |-- clang_rt.crtend.o
    |       |-- ...
    |
    |-- a7_soft
    |       |-- clang_rt.crtbegin.o
    |       |-- clang_rt.crtend.o
    |       |-- ...
    |
    |-- a7_softfp_neon-vfpv4
            |-- clang_rt.crtbegin.o
            |-- clang_rt.crtend.o
            |-- ...
```

**hard**, **soft**, and **softfp** are the float ABI values set through **-mfloat-abi**. If none is specified, **softfp** is used by default. **neon-vfpv4** is the FPU type specified by **-mfpu**. Based on these compilation parameters, the LLVM toolchain selects the runtime libraries built for the matching architecture configuration.


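To check which of these configurations a given build actually targets, the compiler's predefined macros can be inspected. The following is a minimal sketch, not part of the OpenHarmony SDK: the helper names `DescribeFloatAbi` and `CompiledWithNeon` are hypothetical, and the mapping from `__ARM_PCS_VFP` and `__SOFTFP__` to **hard** and **soft** is an assumption based on the macros that Clang and GCC usually predefine for 32-bit Arm targets.

```cpp
#include <string>

// Hypothetical helper: names the float ABI this translation unit was built for.
// Assumption: __ARM_PCS_VFP marks the hard-float calling convention and
// __SOFTFP__ marks pure software floating point; any other 32-bit Arm build
// is reported as softfp.
std::string DescribeFloatAbi() {
#if !defined(__arm__)
    return "not a 32-bit Arm target";
#elif defined(__ARM_PCS_VFP)
    return "hard";
#elif defined(__SOFTFP__)
    return "soft";
#else
    return "softfp";
#endif
}

// Hypothetical helper: true when the compiler was allowed to emit Neon
// instructions, for example through -mfpu=neon or an AArch64 target.
bool CompiledWithNeon() {
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}
```

Logging both values once from a test binary is one way to confirm that the flags set in the build script actually reached the compiler.
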
## How to Use

The Neon extension can be used in the following ways:

- Use the Auto-Vectorization feature of LLVM, in which the compiler generates the Neon instructions for you. This feature is enabled by default and can be disabled with the **-fno-vectorize** compiler option. For details, see [Auto-Vectorization in LLVM](https://llvm.org/docs/Vectorizers.html).

- Use the Neon intrinsics library, which gives you direct, low-level access to Neon instructions.

- Write Neon assembly instructions.

For details, see [Arm Neon](https://developer.arm.com/Architectures/Neon).


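The first two approaches can be compared on a small kernel. The sketch below is illustrative only, and `AddScalar` and `AddNeon` are hypothetical names. `AddScalar` is a plain loop that the auto-vectorizer may turn into Neon instructions when optimization is enabled, whereas `AddNeon` calls intrinsics from **arm_neon.h** explicitly and falls back to the scalar loop when the code is compiled without Neon support.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Plain loop: a candidate for LLVM auto-vectorization.
void AddScalar(const int16_t* a, const int16_t* b, int16_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<int16_t>(a[i] + b[i]);
    }
}

// Explicit intrinsics: processes eight 16-bit lanes per iteration when Neon
// is available, then finishes the remaining elements one by one.
void AddNeon(const int16_t* a, const int16_t* b, int16_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);        // load 8 lanes from a
        int16x8_t vb = vld1q_s16(b + i);        // load 8 lanes from b
        vst1q_s16(out + i, vaddq_s16(va, vb));  // lane-wise add, then store
    }
#endif
    for (; i < n; ++i) {
        out[i] = static_cast<int16_t>(a[i] + b[i]);
    }
}
```

Both functions are expected to produce the same output for inputs whose sums fit in 16 bits, so the scalar version can double as a reference when testing the intrinsics version.
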
## Example

The following example describes how to use Neon intrinsics in an armeabi-v7a OpenHarmony C++ project.

1. Include the **arm_neon.h** header file in the source code. Because Neon intrinsics are closely tied to the CPU architecture, you are advised to guard this include with architecture macros, such as those defined in **cpu_features_macros.h**.

   ```c++
   #include "cpu_features_macros.h"
   #if defined(CPU_FEATURES_ARCH_ARM)
   #include <arm_neon.h> // Declares the Neon types and intrinsics used below.

   void call_neon_intrinsics(short *output, const short *input, const short *kernel, int width, int kernelSize)
   {
       int nn, offset = -kernelSize / 2;
       for (nn = 0; nn < width; nn++)
       {
           int mm, sum = 0;
           int32x4_t sum_vec = vdupq_n_s32(0); // Neon intrinsics: four 32-bit accumulators set to 0
           for (mm = 0; mm < kernelSize / 4; mm++)
           {
               // Load four 16-bit values from the kernel and from the input window.
               int16x4_t kernel_vec = vld1_s16(kernel + mm * 4);
               int16x4_t input_vec = vld1_s16(input + (nn + offset + mm * 4));
               // Widening multiply-accumulate: sum_vec += kernel_vec * input_vec.
               sum_vec = vmlal_s16(sum_vec, kernel_vec, input_vec);
           }
           ...
       }
       ...
   }
   #endif
   ```

2. Call the implementation that matches the CPU features detected at run time.
   ```c++
   void Compute(void) {
   #if defined (CPU_FEATURES_ARCH_ARM)
     static const ArmFeatures features = GetArmInfo().features;
     // Determine whether the CPU features are supported based on the features field.
     if (features.neon) {
       // Run optimized code.
     } else {
       // Call normal functions written in C.
     }
   #endif
   }
   ```

3. Add the corresponding options to the **CMakeLists.txt** file.
   ```cmake
   if (${OHOS_ARCH} STREQUAL "armeabi-v7a")
       set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -mfloat-abi=softfp")
   endif ()
   ```

Now you can use Neon intrinsics in your project.

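The example in step 1 elides what happens to `sum_vec` after the inner loop. One possible way to complete a single output sample is sketched below. It is an assumption, not the original code, and `DotScalar` and `DotNeon` are hypothetical names. The sketch accumulates with `vmlal_s16` as in step 1, reduces the four lanes to one value, handles the elements left over when the length is not a multiple of four, and compiles to the scalar path when Neon is unavailable.

```cpp
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Scalar reference: sum of kernel[i] * input[i] over n elements.
int32_t DotScalar(const int16_t* kernel, const int16_t* input, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<int32_t>(kernel[i]) * input[i];
    }
    return sum;
}

// Same result, using Neon intrinsics for blocks of four elements.
int32_t DotNeon(const int16_t* kernel, const int16_t* input, int n) {
    int32_t sum = 0;
    int i = 0;
#if defined(__ARM_NEON)
    int32x4_t sum_vec = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        int16x4_t kernel_vec = vld1_s16(kernel + i);
        int16x4_t input_vec = vld1_s16(input + i);
        sum_vec = vmlal_s16(sum_vec, kernel_vec, input_vec);
    }
    // Horizontal reduction: fold the four 32-bit lanes into one value.
    int32x2_t pair = vadd_s32(vget_low_s32(sum_vec), vget_high_s32(sum_vec));
    pair = vpadd_s32(pair, pair);
    sum = vget_lane_s32(pair, 0);
#endif
    // Scalar tail (or the whole range when Neon is not compiled in).
    for (; i < n; ++i) {
        sum += static_cast<int32_t>(kernel[i]) * input[i];
    }
    return sum;
}
```

Under these assumptions, the body of the outer loop in step 1 would correspond to a call such as `DotNeon(kernel, input + nn + offset, kernelSize)`, provided the caller guarantees that the whole window, including the part before `input + nn`, is readable.
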