Low-Level C++ Programming Tutorial

POD data structures, array operations, x86 inline assembly, vectorization, and SIMD with x86 intrinsics.

Introduction

Work through each example, then complete the matching exercise directly in Compiler Explorer. Keep optimizations enabled (`-O2`) and inspect the generated assembly as you move from scalar code to SIMD.

Section 1: POD Data Structures

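Plain Old Data (POD) types are both trivial and standard-layout. Their members are stored in declaration order with a predictable layout, so objects can be copied byte for byte and accessed efficiently in low-level code.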

Example
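
A minimal sketch of what the POD property allows, assuming a C++11 or later compiler. `Point2` and `copy_point` are hypothetical names used only for illustration; the traits come from the standard `<type_traits>` header.

```cpp
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivial, std::is_standard_layout

// Hypothetical POD type used only for this illustration.
struct Point2 {
    float x, y;
};

// A POD type is both trivial and standard-layout; check this at compile time.
static_assert(std::is_trivial<Point2>::value, "Point2 should be trivial");
static_assert(std::is_standard_layout<Point2>::value,
              "Point2 should be standard-layout");

// Because Point2 is trivially copyable, copying its raw bytes is well defined.
Point2 copy_point(const Point2& src) {
    Point2 dst;
    std::memcpy(&dst, &src, sizeof(Point2));
    return dst;
}
```

Adding a virtual function or a user-provided constructor to `Point2` would make the first `static_assert` fail, which is a quick way to see where the POD boundary lies.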

Exercise

Complete `subtract_vectors` and verify the return value is `12`.

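// Three packed floats; a POD type with no constructors or virtual functions.
struct Vector3 {
    float x, y, z;
};

// Component-wise difference a - b.
Vector3 subtract_vectors(Vector3 a, Vector3 b) {
    Vector3 result;
    result.x = a.x - b.x;
    result.y = a.y - b.y;
    result.z = a.z - b.z;
    return result;
}
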
Section 2: Array Operations

Manual array loops enable predictable memory access and can be optimized aggressively by the compiler.

Example
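
A minimal sketch of a manual array loop, assuming a fixed element count. `COUNT` and `sum_array` are hypothetical names chosen for this illustration, not part of the exercise.

```cpp
const int COUNT = 4;  // hypothetical fixed size for this sketch

// Sums the elements with a simple forward loop over contiguous memory.
float sum_array(const float arr[COUNT]) {
    float total = 0.0f;
    for (int i = 0; i < COUNT; i++) {
        total += arr[i];
    }
    return total;
}
```

With a compile-time trip count like this, the compiler is free to unroll the loop completely; comparing the `-O0` and `-O2` output in Compiler Explorer shows the difference.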

Exercise

Complete `max_array` so the program returns `4`.

const int SIZE = 4;

// Returns the largest element of a SIZE-element array.
float max_array(float arr[SIZE]) {
    float max_val = arr[0];  // start from the first element
    for (int i = 1; i < SIZE; i++) {
        if (arr[i] > max_val) {
            max_val = arr[i];
        }
    }
    return max_val;
}

Section 3: x86 Inline Assembly

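Inline assembly embeds x86 instructions directly in C++ code, giving precise control in critical hot paths. The code in this section uses GCC/Clang extended `asm` syntax with AT&T operand order (source first, destination last).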

Example
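
A minimal sketch of extended inline assembly, assuming GCC or Clang targeting x86 or x86-64. `add_ints` is a hypothetical name used only for this illustration.

```cpp
// Adds two ints with a single x86 addl instruction (AT&T syntax).
int add_ints(int a, int b) {
    int result = a;
    __asm__ (
        "addl %1, %0;"   // result += b
        : "+r"(result)   // %0: read-write operand kept in a register
        : "r"(b)         // %1: input operand kept in a register
        : "cc"           // addl modifies the condition flags
    );
    return result;
}
```

Letting the compiler pick the registers through `"r"` constraints avoids hard-coding `%eax`, so no register needs to appear in the clobber list.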

Exercise

Fill in the missing assembly line so `subtract(5, 3)` returns `2`.

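int subtract(int a, int b) {
    int result;
    __asm__ (
        "movl %1, %%eax;"   // eax = a
        "subl %2, %%eax;"   // eax -= b
        "movl %%eax, %0;"   // result = eax
        : "=r"(result)      // %0: output
        : "r"(a), "r"(b)    // %1, %2: inputs
        : "%eax"            // eax is overwritten by the template
    );
    return result;
}
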
Section 4: Vectorization with Arrays

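Auto-vectorization works best on simple loops whose iterations are independent. Compiler hints such as `#pragma omp simd` (enabled with `-fopenmp-simd` on GCC and Clang) request vectorization explicitly; a loop that accumulates into a scalar also needs a `reduction` clause.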

Example
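
A minimal sketch of a loop that is easy to vectorize, assuming the output range does not partially overlap the inputs. `add_arrays` is a hypothetical name used only for this illustration.

```cpp
// Element-wise addition. Each iteration is independent of the others,
// so the compiler can process several elements per instruction.
void add_arrays(const float* a, const float* b, float* out, int n) {
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Without `-fopenmp-simd` (or `-fopenmp`) the pragma is ignored and the loop still computes the same result, so the hint changes only the generated code, not the behavior.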

Exercise

Add the vectorization pragma in `dot_product` so the program returns `30`.

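const int SIZE = 4;

// Dot product of two SIZE-element arrays.
float dot_product(float a[SIZE], float b[SIZE]) {
    float sum = 0.0f;
    // sum is accumulated across iterations, so declare it as a reduction.
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < SIZE; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
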
Section 5: SIMD with x86 Intrinsics

SSE intrinsics from `<xmmintrin.h>`, such as `_mm_add_ps` and `_mm_mul_ps`, each operate on four packed floats and map closely to single vector instructions (`addps`, `mulps`).

Example
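
A minimal sketch of the load, compute, store pattern with SSE intrinsics, assuming an x86 target with SSE available. `add4` is a hypothetical name used only for this illustration.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Adds four floats from a and b with one vector add and writes them to out.
// Unaligned loads and stores are used, so the pointers need no special alignment.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);             // load a[0..3]
    __m128 vb = _mm_loadu_ps(b);             // load b[0..3]
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // out[i] = a[i] + b[i]
}
```

The aligned variants `_mm_load_ps` and `_mm_store_ps` require 16-byte-aligned addresses; the unaligned forms trade that requirement for safety with arbitrary pointers.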

Exercise

Complete the `_mm_mul_ps` call so the output sum is `30`.

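#include <xmmintrin.h>

// Multiplies a and b element-wise, four floats per iteration.
void multiply_arrays_simd(float* a, float* b, float* result, int n) {
    int i = 0;
    // Unaligned loads/stores: the pointers need not be 16-byte aligned.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vr = _mm_mul_ps(va, vb);
        _mm_storeu_ps(&result[i], vr);
    }
    // Scalar tail for elements left over when n is not a multiple of 4.
    for (; i < n; i++) {
        result[i] = a[i] * b[i];
    }
}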