# Using Neon Instructions

Arm Neon is an advanced Single Instruction Multiple Data (SIMD) architecture extension for Arm processors. It processes multiple pieces of data in parallel with a single instruction. It is widely used in fields such as multimedia encoding/decoding and 2D/3D graphics to improve execution performance.

The Neon extension has been available since ARMv7. It is included by default in Cortex-A7, Cortex-A12, and Cortex-A15 processors, but is optional in other ARMv7 Cortex-A series processors. For details, see [Introducing NEON Development Article](https://developer.arm.com/documentation/dht0002/a/Introducing-NEON/What-is-SIMD-/ARM-SIMD-instructions?lang=en).

ARMv8-A processors integrate the Neon extension by default, and it is supported in both AArch64 and AArch32. For details, see [Learn the architecture - Introducing Neon](https://developer.arm.com/documentation/102474/0100/Fundamentals-of-Armv8-Neon-technology).

## Architecture Support in OpenHarmony

In OpenHarmony, the Neon extension is enabled by default in the arm64-v8a ABI. It is disabled by default in the armeabi-v7a ABI, in order to support as many ARMv7-A devices as possible.

In the LLVM toolchain of the OpenHarmony SDK, the armeabi-v7a ABI provides precompiled runtime libraries in several configurations. The directory structure is as follows. **native-root** is the root directory where the native package of the NDK is decompressed.

```
{native-root}/llvm/lib/clang/current/lib/arm-linux-ohos/
    |-- a7_hard_neon-vfpv4
    |     |-- clang_rt.crtbegin.o
    |     |-- clang_rt.crtend.o
    |     |-- ...
    |
    |-- a7_soft
    |     |-- clang_rt.crtbegin.o
    |     |-- clang_rt.crtend.o
    |     |-- ...
    |
    |-- a7_softfp_neon-vfpv4
          |-- clang_rt.crtbegin.o
          |-- clang_rt.crtend.o
          |-- ...
```

**hard**, **soft**, and **softfp** are values of the **-mfloat-abi** option. If the option is not specified, **softfp** is used by default. **neon-vfpv4** is the FPU type specified by **-mfpu**. Based on these compilation parameters, the LLVM toolchain selects the binary libraries built for the matching architecture configuration.

## How to Use

The Neon extension can be used in the following ways:

- Use the Auto-Vectorization feature of LLVM, in which the compiler generates the Neon instructions. This feature is enabled by default and can be disabled by specifying the **-fno-vectorize** option. For details, see [Auto-Vectorization in LLVM](https://llvm.org/docs/Vectorizers.html).

- Use the Neon intrinsics library, which gives you direct, low-level access to Neon instructions.

- Write Neon assembly instructions.

For details, see [Arm Neon](https://developer.arm.com/Architectures/Neon).
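
The first two approaches can be compared with a minimal sketch that writes the same element-wise addition twice. The function names **AddScalar** and **AddNeon** are hypothetical and are used only for illustration. The intrinsics path is guarded by the compiler-defined **__ARM_NEON** macro and falls back to the scalar loop on targets without Neon. Whether the plain loop is actually vectorized depends on the optimization level and the target options.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Plain loop: a candidate for LLVM auto-vectorization when optimization is enabled.
void AddScalar(float* out, const float* a, const float* b, std::size_t n)
{
    assert(out != nullptr && a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

// Same operation written with Neon intrinsics (hypothetical helper).
void AddNeon(float* out, const float* a, const float* b, std::size_t n)
{
    assert(out != nullptr && a != nullptr && b != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // Process four floats per iteration in 128-bit registers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail; this is also the whole loop when Neon is unavailable.
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions produce the same results. The intrinsics version fixes the vector width explicitly, whereas the plain loop leaves that choice to the compiler.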

## Example

The following example describes how to use Neon intrinsics in an armeabi-v7a OpenHarmony C++ project.

1. Include the **arm_neon.h** header file in the source code. Neon intrinsics are tied to the CPU architecture, so you are advised to guard the include with architecture macros, such as those defined in **cpu_features_macros.h**.

   ```c++
   #include "cpu_features_macros.h"
   #if defined (CPU_FEATURES_ARCH_ARM)
   #include <arm_neon.h> // Declares the Neon vector types and intrinsics.
   #endif

   void call_neon_intrinsics(short *output, const short* input, const short* kernel, int width, int kernelSize)
   {
       int nn, offset = -kernelSize/2;
       for (nn = 0; nn < width; nn++)
       {
           int mm, sum = 0;
           int32x4_t sum_vec = vdupq_n_s32(0); // Neon intrinsics
           for (mm = 0; mm < kernelSize/4; mm++)
           {
               // Load four 16-bit values from the kernel and from the input.
               int16x4_t kernel_vec = vld1_s16(kernel + mm*4);
               int16x4_t input_vec = vld1_s16(input + (nn+offset+mm*4));
               // Widening multiply-accumulate into four 32-bit lanes.
               sum_vec = vmlal_s16(sum_vec, kernel_vec, input_vec);
           }
           ...
       }
       ...
   }
   ```

2. Call the corresponding implementation functions based on the CPU feature.
   ```c++
   void Compute(void) {
   #if defined (CPU_FEATURES_ARCH_ARM)
       static const ArmFeatures features = GetArmInfo().features;
       // Determine whether the CPU features are supported based on the features field.
       if (features.neon) {
           // Run optimized code.
       } else {
           // Call normal functions written in C.
       }
   #endif
   }
   ```

3. Add the corresponding options to the **CMakeLists.txt** file.
   ```cmake
   if (${OHOS_ARCH} STREQUAL "armeabi-v7a")
       set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mfpu=neon -mfloat-abi=softfp")
   endif ()
   ```

Now you can use Neon intrinsics in your project.
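
The parts elided from the step 1 example are not reproduced here. As a separate, self-contained sketch of the same multiply-accumulate pattern, the following hypothetical **DotProductS16** function shows one possible way to reduce the four 32-bit lanes to a single value and to handle leftover elements. It assumes the compiler-defined **__ARM_NEON** macro as the guard and uses a scalar loop when Neon is not available.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dot product of two int16 arrays with a 32-bit accumulator.
std::int32_t DotProductS16(const std::int16_t* a, const std::int16_t* b, int n)
{
    assert(a != nullptr && b != nullptr && n >= 0);
    std::int32_t sum = 0;
    int i = 0;
#if defined(__ARM_NEON)
    int32x4_t sum_vec = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        int16x4_t a_vec = vld1_s16(a + i);
        int16x4_t b_vec = vld1_s16(b + i);
        // Widening multiply-accumulate: sum_vec[k] += a_vec[k] * b_vec[k].
        sum_vec = vmlal_s16(sum_vec, a_vec, b_vec);
    }
    // Horizontal reduction that is also valid on ARMv7: 4 lanes -> 2 lanes -> 1 value.
    int32x2_t pair = vadd_s32(vget_low_s32(sum_vec), vget_high_s32(sum_vec));
    pair = vpadd_s32(pair, pair);
    sum = vget_lane_s32(pair, 0);
#endif
    // Scalar tail; this is also the whole loop when Neon is unavailable.
    for (; i < n; i++) {
        sum += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return sum;
}
```

The accumulator can overflow for long inputs with large values, so a production implementation may need a wider accumulator or saturation.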