introduction
The set is one of the most beautiful mathematical objects ever discovered, a fractal so complex that no matter how far you zoom in, you can keep finding infinite details. But what happens when you ask a neural network to learn?
At first glance, this sounds like a strange question. The Mandelbrot set is completely deterministic, with no data, no noise, and no hidden rules. But this simplicity makes it a perfect sandbox for studying how neural networks represent complex functionality.
In this article, we will learn how a simple neural network can approximate the Mandelbrot set, and Characteristics of Gaussian Fourier Completely transform your performance, turning blurry approximations into sharp fractal boundaries.
In the process, a vanilla multilayer perceptron (MLP) generates a high-frequency pattern ( spectral bias problem) and how the Fourier function solves it.
mandelbrot set
The Mandelbrot set is defined on the complex plane. For each complex number \(c\in \mathbb{C}\), consider an iterative sequence.
$$z_{n+1} = z_n^2 + c, z_0 = 0$$
If this sequence remains bounded, \(c\) belongs to the Mandelbrot set.
In practice, we approximate this condition using: escape time algorithm. The escape-time algorithm iterates the sequence up to a fixed maximum number of steps and monitors the magnitude \(|z_n|\). If \(|z_n|\) exceeds the chosen escape radius (usually 2), the sequence is guaranteed to diverge and \(c\) is classified as outside the Mandelbrot set. If the sequence does not escape within the maximum number of iterations, \(c\) is considered to belong to the Mandelbrot set, and the number of iterations is often used for visualization purposes.
Turning the Mandelbrot set into a learning problem
Training a neural network requires two things. First, we need to define . learning problemsIn other words, what should the model predict and from which inputs? Second, what you need is labeled data: A large collection of input-output pairs extracted from the problem.
Defining the learning problem
The core of the Mandelbrot set defines a function on the complex plane. Each point \(c = x +iy \in \mathbb{C}\) is mapped to the result. That is, the sequences produced by the iterations either remain bounded or diverge. This immediately suggests a binary classification problem, where the input is a complex number and the output indicates whether the number is in the set.
However, this formulation poses learning difficulties. The boundaries of the Mandelbrot set are infinitely complex, and any small perturbation of \(c\) can change the classification result. From a learning perspective, this makes the target function highly discontinuous, making the optimization unstable and the data inefficient.
To obtain a smoother and more informative learning objective, reformulate the problem using the following formula instead: escape time Information introduced in the previous section. Rather than predicting binary labels, the model is trained to predict continuous variables derived from the number of iterations a sequence escapes.
We do not use raw escape iteration counts directly to generate continuous target functions. The raw escape iteration count is a discrete quantity, and its use introduces discontinuities, especially near the boundaries of the Mandelbrot set, where small changes in \(c\) can cause large jumps in the iteration count. To address this, we employ a smooth escape time value that incorporates the number of escape iterations to produce continuous targets. Additionally, we also apply logarithmic scaling to spread out early escape values and compress larger values to produce a more balanced target distribution.
def smooth_escape(x: float, y: float, max_iter: int = 1000) -> float:
c = complex(x, y)
z = 0j
for n in range(max_iter):
z = z*z + c
r2 = z.real*z.real + z.imag*z.imag
if r2 > 4.0:
r = math.sqrt(r2)
mu = n + 1 - math.log(math.log(r)) / math.log(2.0) # smooth
# log-scale to spread small mu
v = math.log1p(mu) / math.log1p(max_iter)
return float(np.clip(v, 0.0, 1.0))
return 1.0
With this definition, the Mandelbrot set becomes: regression problem. The neural network is trained to approximate the function $$f : \mathbb{R}^2 \rightarrow. [0,1]$$
Maps the spatial coordinate \((x, y)\) in the complex plane to a smooth escape time value.
Data sampling strategy
Uniform sampling of the complex plane is highly inefficient, with most points located far from the boundaries and conveying little information. To address this, the dataset is biased toward border regions by oversampling and filtering points that contain escape values within a predefined band.
def build_boundary_biased_dataset(
n_total=800_000,
frac_boundary=0.7,
xlim=(-2.4, 1.0),
res_for_ylim=(3840, 2160),
ycenter=0.0,
max_iter=1000,
band=(0.35, 0.95),
seed=0,
):
"""
- Mix of uniform samples + boundary-band samples.
- 'band' selects points with target in (low, high), which tends to concentrate near boundary.
"""
rng = np.random.default_rng(seed)
ylim = compute_ylim_from_x(xlim, res_for_ylim, ycenter=ycenter)
n_boundary = int(n_total * frac_boundary)
n_uniform = n_total - n_boundary
# Uniform set
Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
# Boundary pool: oversample, then filter by band
pool_factor = 20
pool = sample_uniform(n_boundary * pool_factor, xlim, ylim, seed=seed + 1)
yp = np.empty((pool.shape[0],), dtype=np.float32)
for i, (x, y) in enumerate(pool):
yp[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
mask = (yp > band[0]) & (yp < band[1])
Xb = pool[mask]
yb = yp[mask]
if len(Xb) < n_boundary:
# If band too strict, relax it automatically
keep = min(len(Xb), n_boundary)
print(f"[warn] Boundary band too strict; got {len(Xb)} boundary points, using {keep}.")
Xb = Xb[:keep]
yb = yb[:keep]
n_boundary = keep
n_uniform = n_total - n_boundary
Xu = sample_uniform(n_uniform, xlim, ylim, seed=seed)
else:
Xb = Xb[:n_boundary]
yb = yb[:n_boundary]
yu = np.empty((Xu.shape[0],), dtype=np.float32)
for i, (x, y) in enumerate(Xu):
yu[i] = smooth_escape(float(x), float(y), max_iter=max_iter)
X = np.concatenate([Xu, Xb], axis=0).astype(np.float32)
y = np.concatenate([yu, yb], axis=0).astype(np.float32)
# Shuffle once
perm = rng.permutation(X.shape[0])
return X[perm], y[perm], ylim
Baseline model: deep residual MLP
The first attempt uses a deep residual MLP that takes the raw Cartesian coordinates \((x, y)\) as input and predicts smooth escape values.
# Baseline model
class MLPRes(nn.Module):
def __init__(
self,
hidden_dim=256,
num_blocks=8,
act="silu",
dropout=0.0,
out_dim=1,
):
super().__init__()
activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
self.in_proj = nn.Linear(2 , hidden_dim)
self.in_act = activation()
self.blocks = nn.Sequential(*[
ResidualBlock(hidden_dim, act=act, dropout=dropout)
for _ in range(num_blocks)
])
self.out_ln = nn.LayerNorm(hidden_dim)
self.out_act = activation()
self.out_proj = nn.Linear(hidden_dim, out_dim)
def forward(self, x):
x = self.in_proj(x)
x = self.in_act(x)
x = self.blocks(x)
x = self.out_act(self.out_ln(x))
return self.out_proj(x)
# Residual block
class ResidualBlock(nn.Module):
def __init__(self, dim: int, act: str = "silu", dropout: float = 0.0):
super().__init__()
activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
# pre-norm-ish (LayerNorm helps a lot for stability with deep residual MLPs)
self.ln1 = nn.LayerNorm(dim)
self.fc1 = nn.Linear(dim, dim)
self.ln2 = nn.LayerNorm(dim)
self.fc2 = nn.Linear(dim, dim)
self.act = activation()
self.drop = nn.Dropout(dropout) if dropout and dropout > 0 else nn.Identity()
# optional: small init for the last layer to start near-identity
nn.init.zeros_(self.fc2.weight)
nn.init.zeros_(self.fc2.bias)
def forward(self, x):
h = self.ln1(x)
h = self.act(self.fc1(h))
h = self.drop(h)
h = self.ln2(h)
h = self.fc2(h)
return x + h
This network has sufficient capacity, is deep, has residuals, and is trained on a large dataset using stable optimization.
result

The overall shape of the Mandelbrot set is clearly visible. However, the fine details near the border are visibly blurred. Areas that should show complex fractal structures appear overly smooth, with thin filaments blurred or missing entirely.
This is not a resolution, data, or depth issue. So what's wrong?
spectral bias problem
Neural networks have a well-known problem called . spectral bias:
they tend to learn Low frequency function First, it has a hard time representing functions with rapid oscillations or fine details.
Mandelbrot boundaries are often filled with highly irregular and small-scale structures, especially near the boundary. To capture that, the network needs to express a great deal. high frequency fluctuation \(x\) and \(y\) change in the output.
Fourier function: encoding coordinates in frequency space
One of the most elegant solutions to the spectral bias problem was introduced in 2020 by Tancik et al. In their paper Fourier features allow the network to learn high-frequency functions in low-dimensional domains.
The idea is to transform the input coordinates before entering them into the neural network. Instead of giving raw \((x, y)\), we pass a sinusoidal projection in a random direction in high-dimensional space.
Formally:
$$\gamma(x)=[sin(2 \pi Bx),cos(2 \ pi Bx)]$$
Here \(B \in \mathbb{R}^{d_{in} ×d_{feat}}\) is a random Gaussian matrix.
This mapping works like this: Random Fourier basis expansionallows the network to more easily represent high frequency details.
class GaussianFourierFeatures(nn.Module):
def __init__(self, in_dim=2, num_feats=256, sigma=5.0):
super().__init__()
B = torch.randn(in_dim, num_feats) * sigma
self.register_buffer("B", B)
def forward(self, x):
proj = (2 * torch.pi) * (x @ self.B)
return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
Multiscale Gaussian Fourier features
A single frequency scale may not be enough. The Mandelbrot set exhibits structure at all resolutions (a characteristic of fractal geometry).
To capture this, use: Multiscale Gaussian Fourier featurescombines several frequency bands.
class MultiScaleGaussianFourierFeatures(nn.Module):
def __init__(self, in_dim=2, num_feats=512, sigmas=(2.0, 6.0, 10.0), seed=0):
super().__init__()
# split features across scales
k = len(sigmas)
per = [num_feats // k] * k
per[0] += num_feats - sum(per)
Bs = []
g = torch.Generator()
g.manual_seed(seed)
for s, m in zip(sigmas, per):
B = torch.randn(in_dim, m, generator=g) * s
Bs.append(B)
self.register_buffer("B", torch.cat(Bs, dim=1))
def forward(self, x):
proj = (2 * torch.pi) * (x @ self.B)
return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
This effectively provides the network with a multiresolution frequency reference that perfectly matches the self-similar nature of fractals.
final model
The architecture of the final model is the same as the baseline model, the only difference is that MultiScaleGaussianFourierFeatures.
class MLPFourierRes(nn.Module):
def __init__(
self,
num_feats=256,
sigma=5.0,
hidden_dim=256,
num_blocks=8,
act="silu",
dropout=0.0,
out_dim=1,
):
super().__init__()
self.ff = MultiScaleGaussianFourierFeatures(
2,
num_feats=num_feats,
sigmas=(2.0, 6.0, sigma),
seed=0
)
self.in_proj = nn.Linear(2 * num_feats, hidden_dim)
self.blocks = nn.Sequential(*[
ResidualBlock(hidden_dim, act=act, dropout=dropout)
for _ in range(num_blocks)
])
self.out_ln = nn.LayerNorm(hidden_dim)
activation = nn.ReLU if act.lower() == "relu" else nn.SiLU
self.out_act = activation()
self.out_proj = nn.Linear(hidden_dim, out_dim)
def forward(self, x):
x = self.ff(x)
x = self.in_proj(x)
x = self.blocks(x)
x = self.out_act(self.out_ln(x))
return self.out_proj(x)
training dynamics
Training without Fourier features
The model slowly learns the overall shape of the Mandelbrot set, but then hits a plateau. Additional training cannot add details.
Training with Fourier features
Here, the coarse structure is first visible, followed by progressively finer details. Rather than plateauing, the model continues to improve its predictions.
final result
Both models used the same architecture, dataset, and training procedure. This network is a deep residual multilayer perceptron (MLP) trained as a regression model with a smooth escape time formulation.
- Dataset: 1,000,000 samples from the complex plane. Due to biased data sampling, 70% of the points are concentrated near the fractal boundary.
- Architecture: Residual MLP with 20 residual blocks and 512 units of hidden dimension.
- Activation function: SiLU
- Training: 100 epochs, batch size 4096, Adam-based optimizer, cosine annealing scheduler.
The only difference between the two models is the representation of the input coordinates. The baseline model uses raw Cartesian coordinates, while the second model uses multiscale Fourier features for representation.
global view


zoom 1 view


zoom 2 view


conclusion
Fractals, such as the Mandelbrot set, are an extreme example of a function dominated by high-frequency structures. Approximating them directly from the raw coordinates forces the neural network to synthesize increasingly detailed oscillations, and MLP is poorly suited for this task.
What this article shows is that the limit isn't architecture capacity, data volume, or optimization. it is expression.
By encoding the input coordinates using multiscale Gaussian Fourier features, we move much of the complexity of the problem into the input space. High-frequency structures become clearer, allowing regular neural networks to approximate functions that are too complex in their original form.
This idea extends beyond fractals. Coordinate-based neural networks are used in computer graphics, physically informed learning, and signal processing. In all these settings, the choice of input encoding can be the difference between a smooth approximation and a rich, detailed structure.
Notes on visual assets
All images, animations, and videos shown in this article were generated by the author from the output of the neural network model mentioned above. No external fractal renderer or third-party visual assets were used. The complete code used to train, render images, and generate animations is available in the accompanying repository.
complete code
Github repository
