SIMD
- Deactivate any optimisation flag, such as `-O3`, `-O2` and so on.
- Check the OS architecture. Are you both working on 64 or 32 bits?
- Protect your asm code with `volatile` like this:
__asm__ __volatile__(
/* Your code here */
);
- Make sure the debugger path is correct, it should point to `gdb.exe` or a similar file. This setting is located in Settings > Debugger... > Default > Executable path
- Make sure you don't have accented characters in your project path, such as éèêçàâ, etc.
- To have more info on why the debugging does not work, enable full logs in Settings > Debugger... > Common > Full (Debug) log
- Make sure you're compiling with the `-g` flag.
- If you have multiple installations of MinGW, uninstall all those that are not necessary.
Check that you read/write in binary mode by passing `rb` and `wb` to `fopen` before calling `fread` and `fwrite`.
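As an illustration of the binary-mode advice, here is a minimal sketch in C. The helper names `read_raw` and `write_raw` are hypothetical and not part of the lab material; the point is only that the mode string goes to `fopen`, which on Windows disables newline translation for the later `fread`/`fwrite` calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly `size` bytes from `path` in binary mode.
 * Returns a malloc'ed buffer (the caller frees it) or NULL on failure. */
unsigned char *read_raw(const char *path, size_t size)
{
    FILE *f = fopen(path, "rb");   /* "rb": binary read, no newline translation */
    if (f == NULL) {
        return NULL;
    }
    unsigned char *buf = malloc(size);
    if (buf != NULL && fread(buf, 1, size, f) != size) {
        free(buf);                 /* short read: treat it as an error */
        buf = NULL;
    }
    fclose(f);
    return buf;
}

/* Hypothetical helper: write `size` bytes to `path` in binary mode.
 * Returns 0 on success, -1 on failure. */
int write_raw(const char *path, const unsigned char *buf, size_t size)
{
    assert(buf != NULL);
    FILE *f = fopen(path, "wb");   /* "wb": binary write */
    if (f == NULL) {
        return -1;
    }
    size_t written = fwrite(buf, 1, size, f);
    int close_failed = fclose(f);
    return (written == size && close_failed == 0) ? 0 : -1;
}
```

Bytes such as `0x0D` and `0x0A` should come back unchanged when both files are opened this way.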
On Windows, "Exception code `c0000005` is the code for an access violation".
Most probable reasons:
- You are using an instruction that reads 128 bits of aligned memory (for example `movdqa`), but the operand you provide is unaligned. Use a memory operand that is aligned, or use an instruction that does not require aligned memory (for example `movdqu`).
- You are reading/writing outside of allocated memory (e.g. you have an array of 3 elements, and you read/write to `array[3]`).
- You are using an incorrect register size. Try using 32-bit (`e*`) instead of 64-bit (`r*`) registers.
- Don't use statically allocated arrays, prefer dynamically allocated ones using `malloc`.
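The alignment and allocation points above can be checked from C before any assembly runs. The following is a minimal sketch, assuming plain `malloc` is the only allocator available; the names `simd_buffer`, `simd_buffer_alloc` and `is_aligned16` are hypothetical. It over-allocates by 15 bytes and rounds the address up to the next 16-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if `p` lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical wrapper around a malloc'ed block. */
typedef struct {
    void *raw;            /* pointer returned by malloc: pass this one to free() */
    unsigned char *data;  /* first 16-byte aligned address inside the block */
} simd_buffer;

/* Allocate at least `size` usable bytes starting on a 16-byte boundary.
 * On failure both fields are NULL. */
simd_buffer simd_buffer_alloc(size_t size)
{
    simd_buffer b = { NULL, NULL };
    b.raw = malloc(size + 15);            /* 15 spare bytes for the rounding */
    if (b.raw != NULL) {
        uintptr_t addr = ((uintptr_t)b.raw + 15u) & ~(uintptr_t)15u;
        b.data = (unsigned char *)addr;   /* rounded up, still inside the block */
        assert(is_aligned16(b.data));
    }
    return b;
}
```

A buffer obtained this way can be passed to instructions that require aligned operands, and `is_aligned16` can be used in an `assert` to catch an unaligned pointer before it triggers an access violation.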
Check that the compilation flags are not the culprits here. Disable them all and check their influence one by one. Don't forget to clean your build between each compilation.
Also, in CodeBlocks, "Optimize even more (for speed) [-O2]" and "-O2" are not exactly the same. The first one applies the flag at an earlier stage than the second, sometimes leading to malfunctions in the compiled executable.
The list of available instructions and their description can be found at: https://www.felixcloutier.com/x86/
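Two different types of registers should be used in this lab: general-purpose registers (such as `rax`, `rsi` and `rcx`) and XMM registers (such as `xmm7`).
The XMM registers must be used in the case of SIMD instructions.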
- Register names should be preceded by %%
- Immediates should be preceded by $
The output line will hold any variable to which you will write. The input line will hold any variable that you will read.
"mov %[in], %%rsi\n"
"mov %[out], %%rax\n"
"mov %[l], %%rcx;\n"
"movdqu (%%rsi), %%xmm7\n"
"movdqu %%xmm7, (%%rax)\n"
"add $16, %%rsi\n"
"sub $16, %%rcx\n"
: //outputs
: [in]"m" (inbuffer), [out]"m" (outbuffer), [l]"m" (length) //inputs
: "rax", "rsi", "rcx", "xmm7", "memory" //clobbers
In the above code snippet, we read three variables and thus need to list them on the second line (the inputs).
However, even though we have an `outbuffer` variable, we do not write into it directly.
We use the address it contains to write into memory, so the pointer variable itself is never modified and the output line stays empty.
This slight difference is crucial.
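For reference, here is one way the fragment above could be completed into a function that compiles. This is a sketch, not the lab's reference solution: the function name `copy_blocks`, the loop label, the `jnz` instruction, the increment of `rax` and the `"memory"` and `"cc"` clobbers are additions. It assumes `length` is a multiple of 16, and it falls back to `memcpy` when the compiler is not targeting x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `length` bytes from inbuffer to outbuffer, 16 bytes per iteration.
 * Assumption: length is a multiple of 16. */
void copy_blocks(const uint8_t *inbuffer, uint8_t *outbuffer, size_t length)
{
    assert(length % 16 == 0);
    if (length == 0) {
        return;                          /* the loop below runs at least once */
    }
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__(
        "mov %[in], %%rsi\n"             /* rsi = source address */
        "mov %[out], %%rax\n"            /* rax = destination address */
        "mov %[l], %%rcx\n"              /* rcx = remaining byte count */
        "1:\n"
        "movdqu (%%rsi), %%xmm7\n"       /* unaligned 128-bit load */
        "movdqu %%xmm7, (%%rax)\n"       /* unaligned 128-bit store */
        "add $16, %%rsi\n"
        "add $16, %%rax\n"
        "sub $16, %%rcx\n"
        "jnz 1b\n"                       /* loop until rcx reaches zero */
        : /* outputs: none, the pointer variables are not modified */
        : [in] "m" (inbuffer), [out] "m" (outbuffer), [l] "m" (length)
        : "rax", "rsi", "rcx", "xmm7", "memory", "cc"
    );
#else
    memcpy(outbuffer, inbuffer, length); /* portable fallback, no SIMD */
#endif
}
```

The `"memory"` clobber tells the compiler that the assembly writes to memory it cannot see through the operand list, which is exactly the situation described above for `outbuffer`.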
Let's say you want to access a value that is 1024 bytes further than your base address stored in register `esi`.
The generic syntax is the following: `signed-offset(base,index,scale)` in which some elements can be omitted.
There are thus basically two ways to offset an address:
- Using a constant value: `1024(%esi)`
- Using another register containing the offset: `(%esi, %eax)`
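To make the arithmetic behind this syntax explicit, the address the processor computes is `base + index * scale + signed-offset`, with an omitted index counting as 0 and an omitted scale as 1. The C function below is only a model of that computation; the name `effective_address` is hypothetical and nothing here touches real registers.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the AT&T operand signed-offset(base, index, scale):
 *   address = base + index * scale + offset
 * 1024(%esi)   corresponds to effective_address(esi, 0,   1, 1024)
 * (%esi, %eax) corresponds to effective_address(esi, eax, 1, 0)    */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t offset)
{
    /* x86 only encodes these four scale factors */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)offset;
}
```

With a base of `0x1000`, the constant form with offset 1024 gives `0x1400`, and the register form with an index of `0x20` gives `0x1020`.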
You can find more details here
Here is a Python script aiming at converting from one syntax to the other.
convert -size 512x512 -depth 8 Angela_512x512.jpg gray:Angela_512x512.raw
convert -size 1024x1024 -depth 8 gray:escher.raw escher.png
For a "debug mode" compilation, use `-g`. For a "release mode" compilation, use `-O2 -s`.
You can go fancy and try other flags as well if you want to benchmark their impact on the performance.