Universal Approximation Theorem: Interactive Demos🔗
Theorem (Cybenko, 1989): A single hidden layer network with sufficiently many neurons can approximate any continuous function to arbitrary accuracy: \(F(x) = \sum_{i=1}^{N} w_i \sigma(v_i x + b_i) + w_0\)
Mathematical statement: For any continuous \(f: [0,1] \to \mathbb{R}\) and \(\epsilon > 0\), there exists \(N\) and parameters such that \(|F(x) - f(x)| < \epsilon\) for all \(x \in [0,1]\).
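The theorem is non-constructive, but the form of \(F\) can be illustrated numerically. Below is a minimal sketch (assuming NumPy) that draws random hidden parameters \(v_i, b_i\) and fits the output weights \(w_i, w_0\) by least squares — the random-feature fit is an illustrative choice, not part of the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
target = np.sin(2 * np.pi * x)          # a continuous f on [0, 1]

N = 50                                   # number of hidden neurons
v = rng.normal(0, 10, N)                 # random hidden weights v_i
b = rng.uniform(-10, 10, N)              # random hidden biases b_i
sigma = lambda z: 1 / (1 + np.exp(-z))   # sigmoidal activation

# Design matrix: one column per sigma(v_i x + b_i), plus a constant for w_0
Phi = np.column_stack([sigma(np.outer(x, v) + b), np.ones_like(x)])
w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # least-squares w_i, w_0

F = Phi @ w
print(f"sup-norm error with N={N}: {np.abs(F - target).max():.4f}")
```

Increasing \(N\) drives the sup-norm error down, matching the \(\epsilon\)-\(N\) statement above.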
Step Function Approximation🔗
ReLU Network Approximation Visualization🔗
This interactive demo shows how a neural network decomposes functions into ReLU components. The example network uses 5 ReLU neurons to approximate a cubic function.
For a ReLU unit ReLU(wx + b), the bias term b determines the activation threshold where the function "turns on." The ReLU switches from outputting 0 to outputting the linear part when \(wx + b = 0\), which places the breakpoint (the kink in the piecewise-linear graph) at \(x = -b/w\). You can see this in the visualization: ReLU(x - 2) activates at x = 2 (since -b/w = -(-2)/1 = 2), and ReLU(x + 1) activates at x = -1 (since -b/w = -(1)/1 = -1).
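The breakpoint formula \(x = -b/w\) is easy to check numerically. A small sketch (assuming NumPy; the sampled values are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Each ReLU(w*x + b) switches on at its breakpoint x = -b/w
for w, b in [(1, -2), (1, 1)]:
    print(f"ReLU({w}x + {b:+d}) turns on at x = {-b / w}")

x = np.linspace(-4, 4, 9)
print(relu(x - 2))   # zero for x <= 2, then grows linearly
print(relu(x + 1))   # zero for x <= -1, then grows linearly
```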
Interactive ReLU Decomposition
Approximation Error (L∞): -
Neural Network Architecture
External Visualization🔗
Activation Function Comparison🔗
This demo compares how different activation functions approximate a target function. Notice how parabolic activation (y = x²) may seem to work for specific targets but fails in general: it does not satisfy the theorem's hypotheses, since any sum \(\sum_i w_i (v_i x + b_i)^2\) is itself just a quadratic in x.
Key Observations🔗
Watch how different activation functions approximate various target functions:

- ReLU & Sigmoid: universal approximators (work for all continuous functions)
- Parabolic: not universal (may work for specific cases but fails in general)
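The contrast can be reproduced outside the demo. The sketch below (assuming NumPy; the random-feature least-squares fit stands in for the demo's training and is an assumption) fits each activation to a full sine period. The parabolic model collapses to a single quadratic in x, so its error stays large no matter how many neurons are used:

```python
import numpy as np

x = np.linspace(0, 1, 200)
target = np.sin(2 * np.pi * x)           # one full period: not a quadratic

def fit_error(act, N=30):
    """Sup-norm error of a least-squares fit sum_i w_i*act(v_i x + b_i) + w_0."""
    rng = np.random.default_rng(0)       # fixed seed for reproducibility
    v = rng.normal(0, 10, N)
    b = rng.uniform(-10, 10, N)
    Phi = np.column_stack([act(np.outer(x, v) + b), np.ones_like(x)])
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return np.abs(Phi @ w - target).max()

relu = lambda z: np.maximum(0, z)
sigmoid = lambda z: 1 / (1 + np.exp(-z))
parabola = lambda z: z ** 2              # sums of these are quadratics in x

for name, act in [("ReLU", relu), ("sigmoid", sigmoid), ("parabolic", parabola)]:
    print(f"{name:9s} sup-norm error: {fit_error(act):.3f}")
```

ReLU and sigmoid features drive the error toward zero as N grows; the parabolic error is bounded below by the distance from the sine to the nearest quadratic.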