1 Introduction
Medical imaging modalities such as CT produce volumetric data that encode tissue density as three-dimensional grids of voxels. While slice extraction (Task 01) reveals structure one plane at a time, many clinical questions — especially those involving the vascular system — require a projection that compresses the entire volume into a single 2D image. Maximum Intensity Projection (MIP) achieves this by casting parallel rays through the volume and recording the brightest voxel along each ray.
MIP is particularly valuable for CT angiography and for assessing pneumonia severity in thoracic CT scans: once the lungs are segmented, MIP images of each lung reveal the vascular tree and abnormal masses from different viewing angles, enabling further automated analysis.
This report covers the two main components of the visualization pipeline. Section 2 introduces 3D wireframe rendering of the volume bounding box, including the forward transform, back-face culling, and 2D line rasterization. Section 3 builds on this geometric setup to describe the MIP ray-casting algorithm: inverse transforms, ray–AABB intersection via the slab method, 3D DDA traversal, and the two-step rendering architecture (heavy raycasting + fast recolorization). Section 4 embeds an interactive CPU-based viewer. Finally, Section 5 presents the GPU implementation using WebGL2 fragment shader ray marching, with actual GLSL code.
2 3D Wireframe Rendering
Before projecting voxel intensities, it is useful to render a wireframe of the volume’s bounding box. This provides spatial context and confirms that the viewing transform is correct. The wireframe pipeline has four stages: geometry construction, coordinate transformation, visibility culling, and line rasterization.
2.1 Scene Geometry
The bounding box of a volume with dimensions $(n_x, n_y, n_z)$ is the axis-aligned box $[0,\, n_x{-}1] \times [0,\, n_y{-}1] \times [0,\, n_z{-}1]$, defined by 8 vertices at its corners. The box has 6 faces, each described by:
- A center point.
- An outward normal vector $\hat{n}_f$.
- Four edges, stored as pairs of vertex indices.
For instance, the $+x$ face has normal $\hat{n} = (1, 0, 0)$ and connects vertices $\{1, 3, 7, 5\}$; the $-x$ face has normal $(-1, 0, 0)$ and connects $\{0, 2, 6, 4\}$.
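As a concrete illustration, the 8 corners can be generated from a 3-bit index, where bit 0 selects the $x$ extreme, bit 1 the $y$ extreme, and bit 2 the $z$ extreme; this indexing reproduces the face vertex lists quoted above. A minimal sketch (the function name is illustrative, not from the implementation):

```python
def box_vertices(nx, ny, nz):
    """Corners of [0, nx-1] x [0, ny-1] x [0, nz-1], indexed by bit pattern.

    Bit 0 of the index selects the x extreme, bit 1 the y extreme, and
    bit 2 the z extreme, so vertices {1, 3, 7, 5} all lie on the +x face
    and {0, 2, 6, 4} on the -x face, matching the lists above.
    """
    return [((i >> 0 & 1) * (nx - 1),
             (i >> 1 & 1) * (ny - 1),
             (i >> 2 & 1) * (nz - 1)) for i in range(8)]
```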
2.2 Forward Transform
The forward transform $\varphi$ maps volume coordinates to the MIP image plane. The output image is a square of side $d = \lceil\sqrt{n_x^2 + n_y^2 + n_z^2}\rceil$ (the volume diagonal, large enough to contain any rotation), and $h_d = d/2$. With center coordinates $c_x = n_x/2$, $c_y = n_y/2$, $c_z = n_z/2$:

$$\varphi = T(h_d, h_d, h_d) \cdot R_y(\beta) \cdot R_x(\alpha) \cdot T(-c_x, -c_y, -c_z), \tag{2.1}$$

where $\alpha$ is the tilt angle (rotation about $x$) and $\beta$ is the spin angle (rotation about $y$). The rotation matrices are:

$$R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix}, \qquad R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}.$$
Reading eq. (2.1) from right to left: the volume is first centered at the origin, then rotated to the desired viewing angle, and finally translated so that the origin maps to the image center $(h_d, h_d, h_d)$.
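The composition in eq. (2.1) can be sketched in plain Python with $4{\times}4$ homogeneous matrices (illustrative helper names, not the C++ implementation); a useful sanity check is that the volume center always maps to the image center $(h_d, h_d, h_d)$, for any angles:

```python
import math

def mat_mul(A, B):
    """4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def forward_transform(dims, alpha, beta):
    """phi = T(h_d) * R_y(beta) * R_x(alpha) * T(-c), as in eq. (2.1)."""
    nx, ny, nz = dims
    d = math.ceil(math.sqrt(nx * nx + ny * ny + nz * nz))
    hd = d / 2.0
    phi = mat_mul(translate(hd, hd, hd),
                  mat_mul(rot_y(beta),
                          mat_mul(rot_x(alpha),
                                  translate(-nx / 2.0, -ny / 2.0, -nz / 2.0))))
    return phi, hd

def apply(phi, p):
    """Apply a homogeneous 4x4 transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(phi[i][k] * v[k] for k in range(4)) for i in range(3))
```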
2.3 Back-Face Culling
Not all six faces are visible from a given viewpoint. With the virtual camera on the $-z$ side of the scene looking along $+z$ (the ray direction), a face is visible if its transformed outward normal points toward the camera. Let $\varphi_r = R_y(\beta) \cdot R_x(\alpha)$ be the rotation-only part of eq. (2.1). The transformed normal is $\hat{n}_t = \varphi_r \cdot \hat{n}_f$, and the visibility condition is:

$$-n_{t,z} > 0.$$
This is the standard back-face culling test: if the $z$-component of the transformed normal points away from the camera (into the screen), the face is back-facing and should not be drawn. At most three faces are visible simultaneously.
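The test can be exercised by rotating each face normal and checking the sign of its $z$-component. A small sketch under the same rotation conventions (the function names are illustrative):

```python
import math

def rotated_normal(n, alpha, beta):
    """Apply phi_r = R_y(beta) * R_x(alpha) to a face normal n."""
    x, y, z = n
    # R_x(alpha)
    y, z = (y * math.cos(alpha) - z * math.sin(alpha),
            y * math.sin(alpha) + z * math.cos(alpha))
    # R_y(beta)
    x, z = (x * math.cos(beta) + z * math.sin(beta),
            -x * math.sin(beta) + z * math.cos(beta))
    return (x, y, z)

def is_visible(n, alpha, beta):
    """Back-face culling test: -n_{t,z} > 0."""
    return -rotated_normal(n, alpha, beta)[2] > 0
```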
2.4 DDA2D Line Rasterization
Visible edges are drawn on the image plane using the 2D Digital Differential Analyzer (DDA). Given endpoints $\mathbf{p}_1$ and $\mathbf{p}_n$, the algorithm steps one pixel at a time along the dominant axis:
Algorithm 1: \textsc{DDA2D}$(\mathbf{p}_1, \mathbf{p}_n)$
1. $\Delta x \gets p_{n,x} - p_{1,x},\quad \Delta y \gets p_{n,y} - p_{1,y}$
2. if $|\Delta x| \geq |\Delta y|$ then
3.     $n \gets \lfloor|\Delta x|\rfloor + 1$
4.     $d_x \gets \operatorname{sgn}(\Delta x),\quad d_y \gets d_x \cdot \Delta y / \Delta x$
5. else
6.     $n \gets \lfloor|\Delta y|\rfloor + 1$
7.     $d_y \gets \operatorname{sgn}(\Delta y),\quad d_x \gets d_y \cdot \Delta x / \Delta y$
8. end if
9. $\mathbf{p} \gets \mathbf{p}_1$
10. for $k = 0$ to $n - 1$ do
11.     plot $\lfloor\mathbf{p} + 0.5\rfloor$
12.     $\mathbf{p} \gets \mathbf{p} + (d_x, d_y)$
13. end for
The step in the dominant axis is always $\pm 1$, ensuring every pixel along the line is visited. The subordinate axis advances by a fractional amount per step.
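A direct transcription of alg. 1 in Python, returning the plotted pixels instead of drawing them (an illustrative, stdlib-only sketch):

```python
import math

def dda2d(p1, pn):
    """2D DDA: unit step along the dominant axis, fractional on the other."""
    dx, dy = pn[0] - p1[0], pn[1] - p1[1]
    if abs(dx) >= abs(dy):
        n = int(abs(dx)) + 1
        step_x = 1.0 if dx >= 0 else -1.0
        step_y = step_x * dy / dx if dx != 0 else 0.0
    else:
        n = int(abs(dy)) + 1
        step_y = 1.0 if dy >= 0 else -1.0
        step_x = step_y * dx / dy
    x, y = float(p1[0]), float(p1[1])
    pixels = []
    for _ in range(n):
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))  # plot
        x, y = x + step_x, y + step_y
    return pixels
```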
2.5 Wireframe Rendering Algorithm
Combining the above components, the complete wireframe algorithm is:
Algorithm 2: \textsc{DrawWireframe}$(\text{vol}, \alpha, \beta)$
1. Build scene geometry: 8 vertices, 6 faces with normals and edges
2. $\varphi \gets T(h_d) \cdot R_y(\beta) \cdot R_x(\alpha) \cdot T(-\mathbf{c})$
3. $\varphi_r \gets R_y(\beta) \cdot R_x(\alpha)$
4. for each face $f$ do
5.     $\hat{n}_t \gets \varphi_r \cdot \hat{n}_f$
6.     if $-n_{t,z} > 0$ then ▷ face is visible
7.         for each edge $(v_1, v_2)$ of face $f$ do
8.             $\mathbf{q}_1 \gets \varphi \cdot \mathbf{v}_1,\quad \mathbf{q}_2 \gets \varphi \cdot \mathbf{v}_2$
9.             \textsc{DDA2D}$(\mathbf{q}_{1,xy}, \mathbf{q}_{2,xy})$ ▷ project to 2D and rasterize
10.         end for
11.     end if
12. end for
3 Maximum Intensity Projection
With the geometric framework from Section 2 established, the MIP algorithm casts a ray through the volume for each pixel in the output image and records the maximum voxel intensity encountered. This section describes the ray setup, the inverse transform, the slab-based ray–AABB intersection, DDA3D traversal, and the complete MIP pipeline.
3.1 Ray Casting Setup
The MIP output is a $d \times d$ image (same size as the wireframe). For each pixel $(u, v)$, a ray is cast parallel to the viewing direction. In image space, the ray origin is $(u, v, -h_d)$ and the direction is $(0, 0, 1)$.
To find which voxels the ray traverses, we transform the ray back to volume space using the inverse of the forward transform.
3.2 Inverse Transform
The inverse of eq. (2.1) is:

$$\varphi^{-1} = T(c_x, c_y, c_z) \cdot R_x(-\alpha) \cdot R_y(-\beta) \cdot T(-h_d, -h_d, -h_d).$$

The viewing direction in volume space is obtained by rotating the image-space direction $(0, 0, 1)^T$:

$$\hat{n}' = R_x(-\alpha) \cdot R_y(-\beta) \cdot (0, 0, 1)^T.$$
This direction is constant for all pixels (parallel projection) and is precomputed once per frame.
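Applying the two inverse rotations to $(0, 0, 1)^T$ by hand gives a short closed form; a sketch under the rotation conventions above (`view_direction` is an illustrative name):

```python
import math

def view_direction(alpha, beta):
    """n' = R_x(-alpha) * R_y(-beta) * (0, 0, 1)^T."""
    x, y, z = 0.0, 0.0, 1.0
    # R_y(-beta)
    x, z = (x * math.cos(beta) - z * math.sin(beta),
            x * math.sin(beta) + z * math.cos(beta))
    # R_x(-alpha)
    y, z = (y * math.cos(alpha) + z * math.sin(alpha),
            -y * math.sin(alpha) + z * math.cos(alpha))
    return (x, y, z)
```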
3.3 Incremental Ray Origin
Rather than multiplying $\varphi^{-1}$ by each pixel position individually, the ray origin can be decomposed as:

$$\mathbf{o}(u, v) = \varphi^{-1} \cdot (u, v, -h_d, 1)^T = \mathbf{b} + u \cdot \Delta\mathbf{u} + v \cdot \Delta\mathbf{v},$$

where $\mathbf{b} = \varphi^{-1} \cdot (0, 0, -h_d, 1)^T$ is the base origin, $\Delta\mathbf{u}$ is the first column of $\varphi^{-1}$, and $\Delta\mathbf{v}$ is the second column. This avoids a full matrix–vector multiplication per pixel: the inner loop advances the origin with three additions (one per coordinate).
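The identity behind this decomposition is easy to verify numerically, since the matrix–vector product is exactly $u$ times the first column plus $v$ times the second column plus the base origin (an illustrative sketch; any affine matrix works):

```python
def matvec(M, v):
    """4x4 homogeneous matrix times 4-vector."""
    return tuple(sum(M[i][k] * v[k] for k in range(4)) for i in range(4))

def incremental_origin(phi_inv, u, v, hd):
    """Ray origin as b + u*du + v*dv instead of a full matvec per pixel."""
    b = matvec(phi_inv, (0.0, 0.0, -hd, 1.0))    # base origin
    du = tuple(phi_inv[i][0] for i in range(4))  # first column
    dv = tuple(phi_inv[i][1] for i in range(4))  # second column
    return tuple(b[i] + u * du[i] + v * dv[i] for i in range(4))
```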
3.4 Ray–AABB Intersection (Slab Method)
Each ray must be clipped to the volume’s axis-aligned bounding box (AABB) $[0, n_x{-}1] \times [0, n_y{-}1] \times [0, n_z{-}1]$. The slab method computes entry and exit parameters $t$ for each axis. For the $x$-axis:

$$t_{x,1} = \frac{0 - o_x}{n'_x}, \qquad t_{x,2} = \frac{(n_x - 1) - o_x}{n'_x},$$

and analogously for $y$ and $z$. After sorting each pair so that $t_{i,1} \leq t_{i,2}$, the ray’s valid interval inside the AABB is:

$$t_{\min} = \max(t_{x,1}, t_{y,1}, t_{z,1}), \qquad t_{\max} = \min(t_{x,2}, t_{y,2}, t_{z,2}).$$

If $t_{\min} \geq t_{\max}$ or $t_{\max} \leq 0$, the ray misses the volume entirely. The entry and exit points are $\mathbf{e} = \mathbf{o} + t_{\min} \hat{n}'$ and $\mathbf{x} = \mathbf{o} + t_{\max} \hat{n}'$.
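In code, the slab test is a per-axis loop (a stdlib-only sketch; the real implementation precomputes the reciprocal direction once per frame, and the `None` return stands in for the miss case):

```python
import math

def ray_aabb(o, d, dims):
    """Slab method: clip ray o + t*d to [0, n_i - 1] per axis; None on a miss."""
    t_min, t_max = -math.inf, math.inf
    for o_i, d_i, n_i in zip(o, d, dims):
        if d_i == 0.0:
            # Ray parallel to this slab: must already lie inside it.
            if not (0.0 <= o_i <= n_i - 1):
                return None
            continue
        t1, t2 = (0.0 - o_i) / d_i, ((n_i - 1) - o_i) / d_i
        if t1 > t2:
            t1, t2 = t2, t1               # sort so t1 <= t2
        t_min, t_max = max(t_min, t1), min(t_max, t2)
    if t_min >= t_max or t_max <= 0.0:
        return None                       # ray misses the volume
    return t_min, t_max
```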
3.5 DDA3D Ray Traversal
Once the entry and exit points are known, the ray is traversed voxel by voxel using the 3D DDA — a natural extension of alg. 1 to three dimensions:
Algorithm 3: \textsc{DDA3D}$(\mathbf{p}_1, \mathbf{p}_n)$
1. $\Delta x \gets p_{n,x} - p_{1,x},\quad \Delta y \gets p_{n,y} - p_{1,y},\quad \Delta z \gets p_{n,z} - p_{1,z}$
2. Let $\Delta^*$ be the component of largest magnitude among $\Delta x, \Delta y, \Delta z$ (the dominant axis)
3. $n \gets \lfloor|\Delta^*|\rfloor + 1$
4. $d^* \gets \operatorname{sgn}(\Delta^*)$; the other two increments are scaled proportionally, $d_i \gets \Delta_i / |\Delta^*|$
5. $\mathbf{p} \gets \mathbf{p}_1$
6. for $k = 0$ to $n - 1$ do
7.     process voxel at $\operatorname{clamp}(\lfloor\mathbf{p} + 0.5\rfloor, \mathbf{0}, \mathbf{n}{-}\mathbf{1})$
8.     $\mathbf{p} \gets \mathbf{p} + (d_x, d_y, d_z)$
9. end for
The clamp operation replaces explicit bounds checking with a branchless operation, which is more efficient on modern CPUs.
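A compact transcription of alg. 3 in Python, collecting the clamped voxel coordinates (illustrative sketch; `dims` supplies the clamp bounds):

```python
import math

def dda3d(p1, pn, dims):
    """3D DDA from p1 to pn: unit step on the dominant axis, clamped voxels out."""
    deltas = tuple(pn[i] - p1[i] for i in range(3))
    m = max(abs(d) for d in deltas)          # |Delta*| of the dominant axis
    n = int(m) + 1
    step = tuple(d / m for d in deltas) if m > 0 else (0.0, 0.0, 0.0)
    p = [float(c) for c in p1]
    voxels = []
    for _ in range(n):
        voxels.append(tuple(min(max(math.floor(p[i] + 0.5), 0), dims[i] - 1)
                            for i in range(3)))  # clamp instead of bounds check
        for i in range(3):
            p[i] += step[i]
    return voxels
```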
3.6 MIP Algorithm
Combining all the pieces, the full MIP computation is:
Algorithm 4: \textsc{ComputeMIP}$(\text{vol}, \alpha, \beta, \text{mask})$
1. $d \gets \lceil\sqrt{n_x^2 + n_y^2 + n_z^2}\rceil,\quad h_d \gets d/2$
2. Compute $\varphi^{-1}$ and the viewing direction $\hat{n}'$
3. Decompose: $\mathbf{b} \gets \varphi^{-1} \cdot (0, 0, -h_d, 1)^T,\quad \Delta\mathbf{u} \gets \varphi^{-1}_{:,0},\quad \Delta\mathbf{v} \gets \varphi^{-1}_{:,1}$
4. Precompute the inverse direction: $1/n'_x,\; 1/n'_y,\; 1/n'_z$
5. Allocate output image $I$ of size $d \times d$
6. for $v = 0$ to $d - 1$ do
7.     $\mathbf{o} \gets \mathbf{b} + v \cdot \Delta\mathbf{v}$
8.     for $u = 0$ to $d - 1$ do
9.         Slab intersection: compute $t_{\min}, t_{\max}$ for the ray at $\mathbf{o}$ with direction $\hat{n}'$
10.         if the ray misses the volume then
11.             $\mathbf{o} \gets \mathbf{o} + \Delta\mathbf{u}$; continue
12.         end if
13.         $\mathbf{e} \gets \mathbf{o} + t_{\min}\hat{n}',\quad \mathbf{x} \gets \mathbf{o} + t_{\max}\hat{n}'$
14.         $I_{\max} \gets 0$
15.         for each voxel $\mathbf{p}$ along \textsc{DDA3D}$(\mathbf{e}, \mathbf{x})$ do
16.             if mask $\neq$ null and mask$[\mathbf{p}] = 0$ then continue
17.             $I_{\max} \gets \max(I_{\max},\; \text{vol}[\mathbf{p}])$
18.         end for
19.         $I[u, v] \gets I_{\max}$
20.         $\mathbf{o} \gets \mathbf{o} + \Delta\mathbf{u}$
21.     end for
22. end for
23. return $I,\; \min(I),\; \max(I)$
3.7 Window/Level and Colorization
After computing the raw MIP image, the same window/level transformation described in Task 01 is applied:
$$l_1 = \max\!\left(L - \tfrac{W}{2},\; v_{\min}\right), \qquad l_2 = \min\!\left(L + \tfrac{W}{2},\; v_{\max}\right),$$
followed by linear stretching to $[0, 255]$ and colormap lookup. The implementation uses a two-step architecture: the expensive raycasting (alg. 4) is cached, and adjusting window/level or switching colormaps only triggers the fast recolorization pass. During interactive rotation, the MIP is computed at half resolution (scale = 2) for responsiveness, then recomputed at full resolution on pointer release.
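The recolorization pass reduces to a per-pixel mapping like the following sketch (illustrative names; the colormap lookup is omitted):

```python
def window_level(value, v_min, v_max, level, width):
    """Clamp the window [L - W/2, L + W/2] to the data range, stretch to [0, 255]."""
    l1 = max(level - width / 2.0, v_min)
    l2 = min(level + width / 2.0, v_max)
    if value <= l1:
        return 0
    if value >= l2:
        return 255
    return int(round(255.0 * (value - l1) / (l2 - l1)))
```

Because this mapping reads only the cached raw MIP values, adjusting the window/level sliders never re-runs alg. 4.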
4 Interactive Viewer
The viewer below is a WebAssembly-based application that computes MIP projections in real time. It supports drag-to-rotate (tilt and spin), window/level adjustment, wireframe overlay toggle, optional object masks, and four colormaps (Grayscale, Blue-to-Red, Viridis, Magma) — all the concepts described in the preceding sections.
CPU MIP Viewer
Loads .scn volumetric data. Drag to rotate, adjust window/level and colormap. Optionally load a mask to restrict the projection to a segmented region.

5 Appendix A: GPU Implementation
5.1 Overview
The CPU implementation in Section 3 performs raycasting in C++ compiled to WebAssembly. An alternative approach offloads the entire computation to the GPU using WebGL2 fragment shaders. Instead of iterating over pixels in a double loop, the GPU rasterizes a cube proxy geometry and runs a ray-marching shader per fragment in parallel.
The key architectural differences are:
- Volume data is uploaded as a 3D texture (TEXTURE_3D), sampled with hardware-accelerated trilinear interpolation.
- The colormap is stored as a 256-entry RGBA lookup texture (a 256×1 2D texture, since WebGL2 has no 1D textures), applied in the shader.
- Rendering is immediate — all parameters (rotation, window/level, colormap) are GPU uniforms, so every change triggers a single draw call with no CPU-side raycasting.
5.2 View Transform
The GPU works in normalized texture coordinates $[0, 1]^3$. The model matrix transforms the unit cube to world space:

$$M = R_y(\beta) \cdot R_x(\alpha) \cdot S(s_x, s_y, s_z) \cdot T(-\tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}),$$

where $s_i = n_i / \max(n_x, n_y, n_z)$ normalizes the voxel dimensions. An orthographic projection $P$ is sized to contain the bounding sphere: $h_d = 0.55\sqrt{s_x^2 + s_y^2 + s_z^2}$, with bounds $[\!-h_d, h_d]$ in all three axes. The MVP matrix is $P \cdot M$.
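The sizing computation is a direct transcription of the formulas above (illustrative helper name):

```python
import math

def ortho_bounds(dims):
    """Normalized cube scale s_i = n_i / max(n) and half-extent h_d = 0.55*|s|."""
    m = max(dims)
    s = tuple(n / m for n in dims)
    hd = 0.55 * math.sqrt(sum(si * si for si in s))
    return s, hd
```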
The ray direction in texture space is the view direction $(0, 0, 1)^T$ pulled back through the rotation and scale parts of $M$:

$$\mathbf{r} = S^{-1}(s_x, s_y, s_z) \cdot R_x(-\alpha) \cdot R_y(-\beta) \cdot (0, 0, 1)^T,$$

normalized in the shader.
5.3 Fragment Shader
The MIP fragment shader receives the interpolated cube position (the exit point, since front faces are culled) and marches backward along the ray:
#version 300 es
precision highp float;
precision highp sampler3D;
uniform sampler3D u_volume;
uniform sampler3D u_mask;
uniform sampler2D u_colormap;
uniform ivec3 u_dims;
uniform vec3 u_ray_dir;
uniform float u_window_lo;
uniform float u_window_hi;
uniform bool u_use_mask;
in vec3 v_pos;
out vec4 frag_color;
vec2 intersect_box(vec3 orig, vec3 dir) {
vec3 inv = 1.0 / dir;
vec3 t0 = (vec3(0.0) - orig) * inv;
vec3 t1 = (vec3(1.0) - orig) * inv;
vec3 tmin = min(t0, t1);
vec3 tmax = max(t0, t1);
return vec2(max(tmin.x, max(tmin.y, tmin.z)),
min(tmax.x, min(tmax.y, tmax.z)));
}
float wang_hash(int seed) {
seed = (seed ^ 61) ^ (seed >> 16);
seed *= 9;
seed = seed ^ (seed >> 4);
seed *= 0x27d4eb2d;
seed = seed ^ (seed >> 15);
return float(seed % 2147483647) / float(2147483647);
}
void main() {
vec3 rd = normalize(u_ray_dir);
vec3 origin = v_pos - 2.0 * rd;
vec2 t_hit = intersect_box(origin, rd);
if (t_hit.x >= t_hit.y) discard;
t_hit.x = max(t_hit.x, 0.0);
// Adaptive step size: at least one sample per voxel
vec3 dt_vec = 1.0 / (vec3(u_dims) * abs(rd));
float dt = min(dt_vec.x, min(dt_vec.y, dt_vec.z));
// Stochastic jitter to reduce banding artifacts
float offset = wang_hash(int(gl_FragCoord.x + 640.0 * gl_FragCoord.y));
float max_val = 0.0;
vec3 p = origin + (t_hit.x + offset * dt) * rd;
for (float t = t_hit.x; t < t_hit.y; t += dt) {
if (u_use_mask && texture(u_mask, p).r < 0.5) {
p += rd * dt;
continue;
}
max_val = max(max_val, texture(u_volume, p).r);
p += rd * dt;
}
// Window/level remap + colormap lookup
float mapped = clamp(
(max_val - u_window_lo) / (u_window_hi - u_window_lo + 1e-6),
0.0, 1.0
);
frag_color = texture(u_colormap, vec2(mapped, 0.5));
}
Key design choices:
- intersect_box implements the slab method (compare eq. (3.4) and eq. (3.5)) in three lines of GLSL vector math.
- The adaptive step size $\Delta t = \min_i 1/(n_i |r_i|)$ ensures at least one texture sample per voxel along any axis.
- Stochastic jitter via the Wang hash offsets the ray starting position by a random fraction of $\Delta t$, reducing the banding artifacts that arise from regular sampling.
- Mask support skips samples where the mask texture reads below 0.5.
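The step-size rule is easy to check on the CPU; a stdlib sketch of the $\Delta t$ formula, with a small epsilon guarding axis-aligned directions (the GLSL version relies on vector division instead):

```python
def adaptive_step(dims, rd):
    """dt = min_i 1 / (n_i * |r_i|): at least one sample per traversed voxel."""
    eps = 1e-12                  # guard against zero ray components
    return min(1.0 / (n * max(abs(r), eps)) for n, r in zip(dims, rd))
```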
5.4 Wireframe Pass
The wireframe is drawn in a second pass using GL_LINES with the 12 edges of the unit cube (24 vertices). Back-face culling is disabled so all edges are visible:
// Vertex shader
#version 300 es
layout(location=0) in vec3 a_pos;
uniform mat4 u_mvp;
void main() { gl_Position = u_mvp * vec4(a_pos, 1.0); }
// Fragment shader
#version 300 es
precision highp float;
uniform vec3 u_color;
out vec4 frag_color;
void main() { frag_color = vec4(u_color, 1.0); }
The same MVP matrix from the MIP pass is reused, and the wireframe color is set to $(0.8, 0.8, 0.8)$.
5.5 GPU Interactive Viewer
The viewer below uses the WebGL2 implementation described above. Unlike the CPU viewer, every interaction (rotation, window/level, colormap change) triggers a single GPU draw call with no raycasting delay.
For a hosted example, see brain.scn.gz. The original course page, MO815 - Visualization of Volumetric Images, taught by Alexandre Xavier Falcao, also provides other volumetric images.
GPU MIP Viewer
Loads a .scn file for real-time MIP rendering with WebGL2 fragment shader ray marching.