Universal Approximation Theorem: Interactive Demos

Theorem (Cybenko, 1989): A single-hidden-layer network with sufficiently many neurons can approximate any continuous function to arbitrary accuracy:

\[F(x) = \sum_{i=1}^{N} w_i \sigma(v_i x + b_i) + w_0\]

Mathematical statement: For any continuous \(f: [0,1] \to \mathbb{R}\) and \(\epsilon > 0\), there exists \(N\) and parameters such that \(|F(x) - f(x)| < \epsilon\) for all \(x \in [0,1]\).
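To make the statement concrete, here is a minimal sketch (not part of the demo itself) that fixes random inner weights and fits only the outer weights \(w_i\) and \(w_0\) by least squares. The target \(\sin(2\pi x)\), the neuron count, and the weight ranges are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N = 50                                     # number of hidden neurons
x = np.linspace(0.0, 1.0, 200)
f = np.sin(2 * np.pi * x)                  # any continuous target on [0, 1]

# Random inner weights v_i and biases b_i; only the outer layer is fitted.
v = rng.uniform(-20, 20, N)
b = rng.uniform(-20, 20, N)
H = sigmoid(np.outer(x, v) + b)            # hidden activations, shape (200, N)
H = np.column_stack([H, np.ones_like(x)])  # constant column for w_0
w, *_ = np.linalg.lstsq(H, f, rcond=None)

F = H @ w
err = np.max(np.abs(F - f))                # sup-norm error |F(x) - f(x)|
print(f"L_inf error with N={N} sigmoid neurons: {err:.4f}")
```

With only 50 random sigmoid features the sup-norm error is already small; increasing \(N\) drives it toward zero, matching the \(\epsilon\) in the theorem.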

Step Function Approximation
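The key ingredient behind step-function approximation is that a sigmoid \(\sigma(k(x - c))\) approaches a unit step at \(c\) as the steepness \(k\) grows. A small sketch (the step location and steepness values are assumptions, not taken from the demo):

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow warnings in np.exp for very steep inputs.
    z = np.clip(z, -60, 60)
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(0, 1, 1001)
c = 0.5                            # assumed step location
step = (x > c).astype(float)       # ideal unit step at c

for k in (10, 100, 1000):          # increasing steepness
    approx = sigmoid(k * (x - c))
    # Measure error away from the jump itself (the step is discontinuous there).
    mask = np.abs(x - c) > 0.05
    err = np.max(np.abs(approx - step)[mask])
    print(f"k={k:5d}  max error away from the jump: {err:.2e}")
```

Sums of such near-steps build staircase functions, which is exactly how Cybenko-style constructions approximate arbitrary continuous targets.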

ReLU Network Approximation Visualization

This interactive demo shows how a neural network decomposes functions into ReLU components. The example network uses 5 ReLU neurons to approximate a cubic function.

For a ReLU neuron ReLU(wx + b), the bias term b sets the activation threshold where the neuron "turns on." The output switches from 0 to the linear part where \(wx + b = 0\), giving a breakpoint (a kink, not an inflection point) at \(x = -b/w\). You can see this in the visualization: ReLU(x - 2) activates at x = 2 (since -b/w = -(-2)/1 = 2), and ReLU(x + 1) activates at x = -1 (since -b/w = -(1)/1 = -1).
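A rough sketch of the same decomposition: five ReLU neurons with hand-picked breakpoints, with outer weights fitted by least squares against a cubic. The knot positions, domain, and target are assumptions; the demo's actual parameters may differ:

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

x = np.linspace(-3, 3, 300)
target = x**3                             # illustrative cubic target

# Five ReLU neurons with w = 1, so each "turns on" at x = -b.
knots = np.array([-3.0, -1.5, 0.0, 1.5, 2.0])   # assumed breakpoints
H = relu(x[:, None] - knots[None, :])     # ReLU(x - knot), shape (300, 5)
H = np.column_stack([H, np.ones_like(x)])  # constant column
w, *_ = np.linalg.lstsq(H, target, rcond=None)

approx = H @ w                            # piecewise-linear approximation
print(f"L_inf error with 5 ReLUs: {np.max(np.abs(approx - target)):.3f}")
```

The result is piecewise linear with kinks exactly at the chosen knots; adding more neurons (more breakpoints) shrinks the remaining error.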

[Interactive demo: ReLU decomposition, with a live L∞ approximation error readout and a neural network architecture diagram]

External Visualization

Try Desmos Graph

Activation Function Comparison

This demo compares how different activation functions approximate a target function. Notice how the parabolic activation (y = x²) may appear to work for specific targets but fails in general: any weighted sum of parabolic units collapses to a single quadratic in x, and polynomial activations fall outside the family covered by the universal approximation theorem.

Key Observations

Watch how different activation functions approximate various target functions:

- ReLU & Sigmoid: universal approximators (work for all continuous functions)
- Parabolic: not universal (may work for specific cases but fails in general)
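The parabolic failure can be checked directly: every sum \(\sum_i w_i (v_i x + b_i)^2\) is itself a single quadratic in \(x\), so adding parabolic neurons never helps, while each extra ReLU neuron adds a usable breakpoint. A sketch (the target \(|x|\), neuron count, and weight ranges are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 400)
target = np.abs(x)                      # continuous but not a quadratic

def linf_fit(features):
    """Fit outer weights by least squares; return the sup-norm error."""
    H = np.column_stack([features, np.ones_like(x)])
    w, *_ = np.linalg.lstsq(H, target, rcond=None)
    return np.max(np.abs(H @ w - target))

v = rng.uniform(-5, 5, 50)
b = rng.uniform(-5, 5, 50)
Z = np.outer(x, v) + b                  # pre-activations v_i x + b_i

# Parabolic units: 50 neurons collapse to one quadratic, so the error
# is bounded below no matter how many neurons are added.
err_parab = linf_fit(Z ** 2)
# ReLU units: each neuron contributes a breakpoint, so the fit improves.
err_relu = linf_fit(np.maximum(0.0, Z))

print(f"parabolic L_inf error: {err_parab:.4f}")
print(f"ReLU      L_inf error: {err_relu:.4f}")
```

The parabolic error stays pinned near the best-quadratic floor for \(|x|\) on \([-1, 1]\), while the ReLU fit with the same neuron count lands well below it.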