gcc neon assembler

pas059 · 07-16-2018, 08:18 AM

Hi,

When I add inline NEON assembly code to a C++ program, I get the error: "Unknown mnemonic".

My version of gcc is: gcc (Ubuntu/Linaro 7.3.0-16ubuntu3) 7.3.0.

Can someone help me?

regards

ab1jx · 07-21-2018, 06:42 AM

The GCC manual is something like 900 pages, and there are tons of command-line switches; you probably need an -march or something: https://gcc.gnu.org/onlinedocs/gcc/index...C_Contents
I thought I saw something about NEON in there the last time I was looking for something else.

pas059 · (This post was last modified: 07-21-2018, 08:54 AM by pas059.)

Hi,
I tested with this option: '-march=armv8-a+simd neon2.c' (and also with armv8.1/2/3/4...), but it always gives the same errors, like:

/tmp/cc6QtiJV.s:22: Error: unknown mnemonic `vadd.i16' -- `vadd.i16 q0,q1,q2'

The strange thing is that using intrinsics works!

Any idea?
regards
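For reference, a minimal sketch of the intrinsics route that did compile, assuming a 16-bit element-wise add like the `vadd.i16 q0,q1,q2` in the error message. The function name `add_s16` is hypothetical. Because the compiler chooses the instruction encoding, the same `vaddq_s16` source builds for both AArch32 and AArch64, which is likely why the intrinsics worked while the hand-written A32 mnemonic did not. A scalar loop handles the tail and lets the sketch build on non-NEON hosts.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n int16_t elements.
 * With NEON available, 8 lanes are processed per iteration through the
 * vaddq_s16 intrinsic; the compiler emits vadd.i16 on A32 and
 * add vN.8h on A64 from the same source. */
void add_s16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(dst + i, vaddq_s16(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on hosts without NEON. */
    for (; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}
```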

z4v4l · 07-21-2018, 01:22 PM

(07-21-2018, 08:52 AM) pas059 wrote:
/tmp/cc6QtiJV.s:22: Error: unknown mnemonic `vadd.i16' -- `vadd.i16 q0,q1,q2'
...
The strange thing is that using intrinsics works!

Are you using an AArch32 compiler? What bitness does your code target?
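One way to answer the bitness question from inside the code is a sketch like the one below, which relies on the GCC predefined macros `__aarch64__` (A64) and `__arm__` (A32/T32); the function name `target_isa` is hypothetical. From the command line, `gcc -dumpmachine` (as used later in this thread) or `gcc -dM -E - < /dev/null` gives the same information.

```c
/* Hypothetical helper: reports which ARM instruction set the compiler
 * is generating code for, based on GCC/Clang predefined macros. */
const char *target_isa(void)
{
#if defined(__aarch64__)
    /* 64-bit: A64 SIMD syntax */
    return "A64 (AArch64): e.g. add v0.8h, v1.8h, v2.8h";
#elif defined(__arm__)
    /* 32-bit: ARMv7-style NEON syntax */
    return "A32/T32 (AArch32): e.g. vadd.i16 q0, q1, q2";
#else
    return "not an ARM target";
#endif
}
```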

pas059 · (This post was last modified: 07-23-2018, 01:49 AM by pas059.)

Hi,
Since I specify armv8... as the -march option, I assume that AArch64 is being used.
gcc comes with ayufan's Ubuntu 18.04 image, so I assume that AArch64 is the default ($ gcc -dumpmachine gives aarch64-linux-gnu).
regards

z4v4l · (This post was last modified: 07-23-2018, 07:32 AM by z4v4l.)

(07-23-2018, 01:47 AM) pas059 wrote:
...
gcc comes with ayufan's Ubuntu 18.04 image, so I assume that AArch64 is the default ($ gcc -dumpmachine gives aarch64-linux-gnu).

The instructions you showed are NOT A64 SIMD instructions; they are A32 SIMD instructions. :)

Take a look at the ARM ARM (the Arm Architecture Reference Manual), where everything is described. It also has an alphabetical list of the instructions.
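A minimal sketch of the same 16-bit add written with A64 mnemonics in GCC inline assembly, assuming an aarch64-linux-gnu compiler like the one in this thread. The A32 `vadd.i16 q0,q1,q2` becomes `add v0.8h, v1.8h, v2.8h`: registers are named `vN` with an arrangement suffix instead of `qN`, and the element size moves from the mnemonic into that suffix. The function name `add8_s16_asm` is hypothetical, and the `#else` branch is only a fallback so the sketch also builds on other hosts.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for exactly 8 int16_t values.
 * The "w" constraint asks GCC for an FP/SIMD register, so %0, %1 and %2
 * expand to vN register names and ".8h" selects eight 16-bit lanes. */
void add8_s16_asm(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
#if defined(__aarch64__)
    int16x8_t va = vld1q_s16(a);
    int16x8_t vb = vld1q_s16(b);
    int16x8_t vr;
    __asm__("add %0.8h, %1.8h, %2.8h"
            : "=w"(vr)
            : "w"(va), "w"(vb));
    vst1q_s16(dst, vr);
#else
    /* Portable fallback for non-AArch64 hosts. */
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}
```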

pas059 · 07-23-2018, 11:37 AM

Hi,
I didn't know until today that Arm had changed the mnemonics between ARMv7 and ARMv8. In all the documents I have (which are about intrinsics), the instruction equivalents are given for ARMv7. Indeed, using the ARMv8 mnemonics, this compiles better.
Thank you, z4v4l.

z4v4l · 07-23-2018, 12:02 PM

(07-23-2018, 11:37 AM) pas059 wrote:
I didn't know until today that Arm had changed the mnemonics between ARMv7 and ARMv8. ...

It's not just a change of mnemonics; it's a totally different ISA. :)

pas059 · (This post was last modified: 08-14-2018, 10:14 AM by pas059.)

Hi,
just some news.
I rewrote some functions from NEON intrinsics to NEON assembler, hoping for a performance improvement. Just after finishing the intrinsics-to-assembler translation, I was very enthusiastic :), because the assembler version was running 4 times faster than the intrinsics version. In fact, that result was obtained with the debug builds. With the release builds, the results were almost identical; if anything, the intrinsics versions ran a little faster than the assembler :D, but honestly there is no significant gap. The only noticeable difference is code size, which is 3 times larger with the intrinsics version.
So my conclusion, in my case, is that the compiler generates very fast code from intrinsics.
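The debug-versus-release surprise suggests timing both versions only in optimized builds (e.g. `-O2` or `-O3`) and with the same harness. A minimal sketch, assuming each version can be wrapped in a function taking a context pointer; `kernel_fn` and `time_kernel_ns` are hypothetical names, and the timer is the POSIX `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical kernel signature shared by the intrinsics and the
 * assembler versions, so both can be timed by the same harness. */
typedef void (*kernel_fn)(void *ctx);

/* Returns the average wall-clock time of one call in nanoseconds,
 * measured over `iters` calls (iters must be > 0). */
double time_kernel_ns(kernel_fn fn, void *ctx, int iters)
{
    struct timespec t0, t1;

    fn(ctx); /* warm-up call, so first-touch costs are not timed */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling it once for each version on the same input, from a release build, compares like with like.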

Another remark: this algorithm (image processing), initially written on a Windows PC, runs in less than 1 ms on one core of an i7-4790 processor at 3.6 GHz, and takes ~4 ms on one core of my Rock64 (the ARM/NEON version is a little more optimized than the Intel version, which also uses intrinsics). At the beginning I thought the difference would be larger, but thanks to NEON the final result is a little better than expected, and a Rock64 is much cheaper than an Intel solution and consumes much less power.
I think that something which could increase the performance of the Rock64 would be faster memory: the Rock64's memory works in 32 bits, although the processor supports 64-bit access. :)

regards
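The bus-width point can be made concrete with the usual peak-bandwidth arithmetic, where $w$ is the memory bus width in bits and $R$ the transfer rate. The rate of 1600 MT/s below is an assumed, illustrative figure, not a measured Rock64 value; only the factor of two between a 32-bit and a 64-bit bus follows from the formula itself.

```latex
B_{\text{peak}} = \frac{w}{8}\,R
\qquad
w = 32:\; 4\,\text{B} \times 1600 \times 10^{6}\,\text{s}^{-1} = 6.4\ \text{GB/s}
\qquad
w = 64:\; 8\,\text{B} \times 1600 \times 10^{6}\,\text{s}^{-1} = 12.8\ \text{GB/s}
```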
