SIMD

Troubleshooting

The code works on some other computer, but not on mine

Disable all optimisation flags, such as -O3 or -O2.
Check the OS architecture: are both machines running 64-bit, or both 32-bit?
Protect your asm code with volatile so that the compiler does not optimise it away, like this:

__asm__ __volatile__(
  /* Your code here */
  );

The debugger does not work, what can I do?

Make sure the debugger path is correct: it should point to gdb.exe or a similar file. This setting is located in Settings > Debugger... > Default > Executable path.
Make sure you don't have accented characters in your project path, such as éèêçàâ, etc.
To get more information on why debugging does not work, enable full logs in Settings > Debugger... > Common > Full (Debug) log.
Make sure you're compiling with the -g flag.
If you have multiple installations of MinGW, uninstall the ones you do not need.

My image output is shifted at some point

Check that you read and write in binary mode by passing "rb" and "wb" as the mode to fopen before calling fread and fwrite.
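
As an illustration, here is a minimal C sketch of binary-mode file I/O. The helper names save_raw and load_raw are hypothetical and not part of the lab code. On Windows, text mode translates newline bytes and can stop reading at a 0x1A byte, which is the usual cause of a shifted or truncated image. Binary mode avoids both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes n bytes to path in binary mode.
   Returns the number of bytes actually written. */
static size_t save_raw(const char *path, const unsigned char *pixels, size_t n)
{
    assert(path != NULL && pixels != NULL);
    FILE *f = fopen(path, "wb");   /* "wb": bytes are written untouched */
    if (f == NULL) {
        return 0;
    }
    size_t written = fwrite(pixels, 1, n, f);
    fclose(f);
    return written;
}

/* Hypothetical helper: reads up to n bytes from path in binary mode.
   Returns the number of bytes actually read. */
static size_t load_raw(const char *path, unsigned char *pixels, size_t n)
{
    assert(path != NULL && pixels != NULL);
    FILE *f = fopen(path, "rb");   /* "rb": bytes are read untouched */
    if (f == NULL) {
        return 0;
    }
    size_t got = fread(pixels, 1, n, f);
    fclose(f);
    return got;
}
```

For a 512x512 image with 8 bits per pixel, n would be 512 * 512. Comparing the return value with n is a cheap way to detect a short read or write.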

I get `Exception code c0000005`, what's up with that?

On Windows, "Exception code c0000005 is the code for an access violation".

Most probable reasons:

You are using an instruction that reads 128 bits of aligned memory, but the operand you provide is unaligned. Use a memory operand that is aligned, or use an instruction that does not require aligned memory.
You are reading/writing outside of allocated memory (e.g. you have an array of 3 elements, and you read/write to array[3]).
You are using the wrong register size. Try 32-bit (e*) registers instead of 64-bit (r*) ones.
Avoid statically allocated arrays; prefer dynamically allocated ones obtained with malloc.
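
To check the first cause, the following C sketch tests whether an address is a multiple of 16 and rounds a pointer up to the next 16-byte boundary. The helper names is_aligned16 and align_up16 are hypothetical. The sketch assumes the caller over-allocates by 15 bytes, since malloc alone is not guaranteed to return a 16-byte aligned block on every platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 when p is a multiple of 16, which aligned 128-bit
   loads and stores (for example movdqa) require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Hypothetical helper: rounds p up to the next 16-byte boundary.
   The caller must have allocated at least 15 extra bytes. */
static unsigned char *align_up16(unsigned char *p)
{
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + 15u) & ~(uintptr_t)15u;
    return p + (aligned - addr);
}
```

A typical use would be to call malloc(size + 15), keep the original pointer for free, and hand align_up16(original) to the SIMD code. If is_aligned16 returns 0 for an operand, an unaligned instruction such as movdqu is the safer choice.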

My code works in "Debug" mode, but not in "Release"

Check that the compilation flags are not the culprits here. Disable them all and check their influence one by one. Don't forget to clean your build between each compilation.

Also, in CodeBlocks, "Optimize even more (for speed) [-O2]" and "-O2" are not exactly the same. The first one applies the flag at an earlier stage than the second, sometimes leading to malfunctions in the compiled executable.

AT&T

Instruction list

The list of available instructions and their description can be found at: https://www.felixcloutier.com/x86/

Registers

Two different types of registers should be used in this lab:

1. 32-bit general-purpose registers.

2. 128-bit xmm registers (xmm0 to xmm7).

They must be used with SIMD instructions.

Syntax

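Register names should be preceded by %% inside GCC extended asm, because a single % introduces an operand such as %[in]. Plain AT&T assembly uses a single %.
Immediates should be preceded by $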

In extended asm, what should we put in the `output` and `input` lines?

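The output line holds every variable that the asm code writes to. The input line holds every variable that it only reads.

    __asm__ __volatile__(
        "mov %[in], %%rsi\n"        /* rsi = address stored in inbuffer */
        "mov %[out], %%rax\n"       /* rax = address stored in outbuffer */
        "mov %[l], %%rcx\n"         /* rcx = remaining length */
        "movdqu (%%rsi), %%xmm7\n"  /* load 16 bytes from the input */
        "movdqu %%xmm7, (%%rax)\n"  /* store them to the output */
        "add $16, %%rsi\n"
        "sub $16, %%rcx\n"
        : /* outputs: none */
        : [in] "m" (inbuffer), [out] "m" (outbuffer), [l] "m" (length) /* inputs */
        : "rax", "rsi", "rcx", "xmm7", "cc", "memory" /* clobbers */
    );

In the above code snippet, we read three variables and thus list them on the input line (the second line starting with a colon). Even though we have an outbuffer variable, we do not write to the variable itself: we only read the address it contains and use that address to write to memory. This slight difference is crucial, and it is why outbuffer is an input. Because the asm writes to memory that is not listed as an output, "memory" goes in the clobbers; "cc" records that add and sub change the flags.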
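
The contrast between reading a pointer and writing a variable can be sketched in C as follows. Both helper names, copy16 and add_ints, are hypothetical. Unlike the snippet above, this sketch passes the pointers with the "r" constraint so that the compiler picks the registers, which is an assumption about style rather than the lab's required form. A plain C fallback is used when the compiler does not target x86 with SSE2.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies 16 bytes from src to dst.
   dst is an INPUT: the asm only reads the pointer value,
   then writes to the memory it points to. */
static void copy16(const unsigned char *src, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
#if defined(__SSE2__)
    __asm__ __volatile__(
        "movdqu (%[in]), %%xmm7\n"   /* load 16 unaligned bytes */
        "movdqu %%xmm7, (%[out])\n"  /* store them through the pointer */
        : /* outputs: none */
        : [in] "r" (src), [out] "r" (dst)
        : "xmm7", "memory");         /* memory is modified behind the compiler's back */
#else
    memcpy(dst, src, 16);            /* fallback for other targets */
#endif
}

/* Hypothetical helper: sum is an OUTPUT (read and written, hence "+r"),
   because the asm changes the variable itself. */
static int add_ints(int a, int b)
{
    int sum = a;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__(
        "addl %[b], %[sum]\n"        /* sum = sum + b */
        : [sum] "+r" (sum)
        : [b] "r" (b)
        : "cc");
#else
    sum += b;                        /* fallback for other targets */
#endif
    return sum;
}
```

In copy16 nothing appears on the output line, yet 16 bytes of memory change, which is what the "memory" clobber declares. In add_ints the variable sum itself changes, so it must appear on the output line.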

How do I offset a pointer address?

Let's say you want to access a value that is 1024 bytes past the base address stored in register esi. The generic syntax is signed-offset(base,index,scale), which addresses base + index*scale + signed-offset; some elements can be omitted. There are thus basically two ways to offset an address:

Using a constant value: 1024(%esi)
Using another register containing the offset: (%esi, %eax)

You can find more details here
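
The address arithmetic behind this syntax can be mirrored in plain C. This is only a sketch: the function effective_address is hypothetical, and it models the computation rather than emitting any instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Models signed-offset(base,index,scale):
   address = base + index * scale + signed-offset.
   On x86 the scale may only be 1, 2, 4 or 8. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale, intptr_t offset)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)offset;
}
```

With esi holding 0x1000, 1024(%esi) corresponds to effective_address(0x1000, 0, 1, 1024), and (%esi, %eax) with eax holding 0x20 corresponds to effective_address(0x1000, 0x20, 1, 0).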

Can I convert from AT&T to Intel and vice versa?

Here is a Python script aiming at converting from one syntax to the other.

Miscellaneous

How can I convert any image into the RAW format used in the labs?

convert -size 512x512 -depth 8 Angela_512x512.jpg gray:Angela_512x512.raw

How can I convert the RAW image into a PNG?

convert -size 1024x1024 -depth 8 gray:escher.raw escher.png

Which GCC compilation flags should we use?

For a "debug mode" compilation, use -g. For a "release mode" compilation, use -O2 -s. You can also try other flags if you want to benchmark their impact on performance.
