# Evolutionary Computation: Genetic Algorithm, Neuroevolution and Novelty Search

## Genetic Algorithm

The genetic algorithm is a nature-inspired algorithm based on natural selection, that the fittest individuals of a population are selected to reproduce the next generation.

The genetic algorithm consists of 5 processes:

- Initial population
- Fitness function
- Selection
- Crossing-over
- Mutation

Terminology:

- Population refers to the set of individuals (solution).
- Individual is defined by its chromosome (set of parameters/variables).
- Fitness function refers to the performance measure of an individual.
- Selection refers to the selection of the fittest.
- Crossing-over refers to a swapping of segments of 2 parents’ genes, producing a child individual with a new gene combination.
- Mutation is a random perturbation of genes based on a probability.

### Optimization Problem: Linear Regression

Evolutionary algorithms can serve as “black box” optimisation algorithms without needing to solving the objective function analytically. To illustrate that evolutionary algorithms can optimise, the simple linear regression problem is used. Define a linear function:

\[y = mx + c + \epsilon\]to be modelled by a linear regression model, where $m=1$, $c=0$, $\epsilon\sim N(0,1)$ represents gradient, y-intercept and Gaussian noise respectively.

```
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0) # for reproducibility
```

```
X = np.linspace(0, 10, 100)
y = X + np.random.normal(0, 1, 100)
plt.title("Linear function with Gaussian noise")
plt.plot(X, y, '.')
plt.show()
```

### Process 1: Generate the initial population of individuals.

Each individual (solution/model) is defined by a set of parameters. Hyperparameters to be specified, which are variables that are not updated at every iteration of optimisation, are the population size (number of individuals in the population at any point in time) and the number of parameters that defines an individual. The initial population’s parameters can be zero-initialised or random-initialised. For your interest, there also exists many other initialisation methods to be used depending on context, such as the He initialisation and Xavier initialisation. The set of parameters that defines each individual is biologically analogous to the individual’s genome (or gene or chromosome, depending on the computational process).

```
population_size = 10
num_parameters = 2
# initial_population = np.zeros(shape=(population_size, num_parameters)) # zero initialisation
initial_population = np.random.normal(0, 1, size=(population_size, num_parameters)) # random normal initialisation
initial_population
array([[ 1.8831507 , -1.34775906],
[-1.270485 , 0.96939671],
[-1.17312341, 1.94362119],
[-0.41361898, -0.74745481],
[ 1.92294203, 1.48051479],
[ 1.86755896, 0.90604466],
[-0.86122569, 1.91006495],
[-0.26800337, 0.8024564 ],
[ 0.94725197, -0.15501009],
[ 0.61407937, 0.92220667]])
```

### Process 2: Compute the fitness of all individuals.

Another 2 hyperparameters are in the form of functions - the solution and the fitness function. The solution is a model that uses the individual’s parameters to compute the output $y$ given input $X$. For simplicity, we use the polynomial regression model (with 2 parameters, it is a simple linear regression model):

\[f(X) = \theta_1 X + \theta_2\]where $\theta_1$ and $\theta_2$ should converge to $m$ and $c$ respectively eventually. The fitness function measures the performance of an individual solution. The evolutionary analogy of the fitness function of an organism would be, for example, its survivability and/or reproductive success. Because we want to model the linear function with Gaussian noise dataset, the negative mean squared error (MSE) is used as the fitness function to determine how well the solution models the dataset:

\[MSE = \frac{1}{n} \sum_{i=1}^n (y_{i} - f(X_i))^2\]Because the fitness function is to be maximised, MSE is negated to reflect a higher value of MSE as more desirable.

```
def solution(params): # Polynomial regression model
return np.sum([params[i] * X**i for i in range(len(params))], axis=0)
def fitness_function(params): # Mean squared error
return -np.sum(abs(y - solution(params))**2) / len(X)
def plot_data():
plt.plot(X, y, '.')
def plot_individual(individual):
plt.plot(X, solution(individual), '.', c='grey')
def plot_population(population):
for individual in population:
plot_individual(individual)
individual = initial_population[0]
fitness_score = fitness_function(individual)
plot_data()
plot_individual(individual)
plt.show()
plot_data()
plot_population(initial_population)
plt.show()
```

```
def compute_fitness(population):
return np.array([fitness_function(individual) for individual in population])
fitness_scores = compute_fitness(initial_population)
fitness_scores
array([-145.00615079, -3.20852429, -21.20939814, -110.93013108,
-21.41800978, -2.83355152, -21.68891802, -2.97834017,
-35.66225857, -1.05527391])
```

### Process 3: Select the fittest individuals.

Like natural selection, select the top $k$ percentage of individuals with the highest fitness scores, where $k$ is a hyperparameter, to form the parent subpopulation that will reproduce to form the next generation of the population later.

```
def get_fittest(population, fitness_scores):
return population[fitness_scores.argmax(), :]
def select_fittest(population, fitness_scores, k=0.5):
return population[np.argsort(fitness_scores)[-int(len(population) * k):], :]
parent_subpopulation = select_fittest(initial_population, fitness_scores, k=0.2)
parent_subpopulation, compute_fitness(parent_subpopulation)
```

```
(array([[1.86755896, 0.90604466],
[0.61407937, 0.92220667]]), array([-2.83355152, -1.05527391]))
```

### Process 4: Perform crossing-over between parents to produce children.

Crossing-over is a biological process that exchanges genetic material to result in new combinations of genetic material. For the benefit of non-biology students, much detail has been abstracted out and interested readers can refer to chromosomal crossovers. In the genetic algorithm, crossing-over is performed during reproduction by swapping a segment of parameters of one parent with another parent. For example, take 2 parents defined by 4 parameters:

\[P1 = [A1, A2, A3, A4], P2 = [B1, B2, B3, B4]\]A crossing-over at the index 3 will result in a child:

\[C = [A1, A2, B3, B4]\]There exists other methods of genetic exchange to introduce variance in the population gene pool, such as swapping elements instead of segments.

```
def perform_crossingover(subpopulation):
children = []
for i in range(population_size - len(subpopulation)):
parents = subpopulation[np.random.randint(0, len(subpopulation), 2)]
gene_indices = np.zeros(num_parameters).astype(int)
gene_indices[np.random.randint(len(gene_indices)+1):] = 1 # segment swap
child = parents[gene_indices, np.arange(num_parameters)]
children.append(child)
return np.append(subpopulation, np.array(children), axis=0)
next_population = perform_crossingover(parent_subpopulation)
next_population, compute_fitness(next_population)
(array([[1.86755896, 0.90604466],
[0.61407937, 0.92220667],
[0.61407937, 0.92220667],
[1.86755896, 0.90604466],
[0.61407937, 0.92220667],
[0.61407937, 0.92220667],
[1.86755896, 0.92220667],
[0.61407937, 0.92220667],
[1.86755896, 0.92220667],
[0.61407937, 0.92220667]]),
array([-2.83355152, -1.05527391, -1.05527391, -2.83355152, -1.05527391,
-1.05527391, -3.04089715, -1.05527391, -3.04089715, -1.05527391]))
```

### Process 5: Perform mutation on the population.

A mutation is defined as a change in the DNA sequence. While the exact differences between DNA, gene and chromosome in the genetic algorithm are not maintained, inspiration is drawn from mutation in biology that usually worsens fitness but can occasionally improve fitness of the individual. To perform mutation on the population parameters, add Gaussian noise $\epsilon\sim N(0, \sigma)$ to the individuals’ parameters, where $\sigma$ is the standard deviation hyperparameter.

```
def perform_mutation(population, sigma=0.1):
return population + np.random.normal(0, sigma, population.shape) # Gaussian noise
mutated_population = perform_mutation(next_population, sigma=0.01)
mutated_population, compute_fitness(mutated_population)
(array([[1.86732417, 0.90693888],
[0.6081361 , 0.9211497 ],
[0.62150733, 0.91465686],
[1.86617613, 0.88479546],
[0.60377918, 0.91703215],
[0.61266137, 0.90944187],
[1.88237409, 0.93879944],
[0.61408077, 0.93726506],
[1.86860417, 0.91984624],
[0.62227372, 0.92623524]]),
array([-2.84393595, -1.0525611 , -1.05282308, -2.58416917, -1.04907953,
-1.04977737, -3.3166936 , -1.07545786, -3.01246558, -1.0622913 ]))
```

### The Genetic Algorithm: All 5 Processes Together

By combining the 5 processes together, we construct the genetic algorithm and run it to find a solution that models the linear function well.

Genetic Algorithm:

- Generate the initial population of individuals.
- Repeat until convergence:
- Compute fitness of the population.
- Select the fittest individuals (parent subpopulation).
- Perform crossing-over between parents to produce children.
- Perform mutation on the population.

- Select the fittest individual of the population as the solution.

```
# Define hyperparameters of the genetic algorithm.
population_size = 20
num_parameters = 2
num_generations = 50
top_k = 0.5
mutation_sigma = 0.01
# Process 1: Generate the initial population of individuals.
population = np.random.normal(0, 1, size=(population_size, num_parameters))
# Misc: Experimental tracking
scores = []
solutions = []
# Iterate the process over multiple generations of populations.
for i in range(num_generations):
# Process 2: Compute the fitness of all individuals.
fitness_scores = compute_fitness(population)
# Process 3: Select the fittest individuals.
fittest_subpopulation = select_fittest(population, fitness_scores, k=top_k)
# Misc: Experimental tracking
fittest = get_fittest(population, fitness_scores)
solutions.append(solution(fittest))
scores.append(fitness_function(fittest))
# Process 4: Perform crossing-over between parents to produce children.
children = perform_crossingover(fittest_subpopulation)
# Process 5: Perform mutation on the population.
population = perform_mutation(children, sigma=mutation_sigma)
# Misc: Experimental tracking
plt.plot(np.arange(num_generations), scores)
plt.show()
```

### Experiment Result

The fittest individual in the final population is a reasonably well-fit linear regression model. The rest of the population have a lower fitness score but are quite well-fit as well.

```
fitness_score = fitness_function(fittest)
y_pred = solution(fittest)
plot_data()
plot_individual(fittest)
plt.show()
plot_data()
plot_population(fittest_subpopulation)
plt.show()
```

By visualising the fittest model at each generation (iteration) of the genetic algorithm, notice that virtually instantly, the linear regression model fits to the dataset. In fact, linear regression is too simple a problem to realise the effectiveness of the genetic algorithm. Nonetheless, the reason for using linear regression is to bring focus to the genetic algorithm without the overhead of needing to understand the model. For a more complex application of the genetic algorithm using neural networks, refer to the second section Neuroevolution. For an evolutionary strategy based on novelty applied on reinforcement learning, refer to the third section.

```
%%capture
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots()
plot_data()
ga_line = ax.plot([], [])[0]
ax.set_xlim(min(X), max(X))
ax.set_ylim(min(y), max(y))
def animate(i):
ga_line.set_data(X, solutions[i])
ax.set_xlabel(f'Gen {i+1}')
return ga_line, ax
ani = FuncAnimation(fig, animate, frames=np.arange(0, num_generations), interval=80, repeat=False)
ani.save('/images/genetic-algorithm/genetic_algorithm.gif')
```

## Neuroevolution

Neuroevolution is a method of applying evolutionary algorithms to optimise neural networks instead of using backpropagation. Neuroevolution therefore is a non-gradient (or derivation-free) optimisation, which can speed up training as backward passes are not computed. The neural network optimised by neuroevolution can be adapted in terms of parameters, hyperparameters or network architecture. Prominent examples of neuroevolution are NeuroEvolution of Augmenting Topologies (NEAT) and Covariance-Matrix Adaptation Evolution Strategy (CMA-ES). The evolutionary algorithm employed in this notebook is the vanilla genetic algorithm without crossing-over, applying only mutation over neural network parameters (weights).

```
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
```

### The Neural Network Model (“Neuro”-evolution)

The neural network, or a multi-layer perceptron, is a universal function approximator. The neural network in PyTorch with 2 hidden layers and non-linear activation functions hyperbolic tangent (tanh) and sigmoid is defined.

```
net = nn.Sequential(
nn.Linear(in_features=2, out_features=16, bias=True),
nn.Tanh(),
nn.Linear(in_features=16, out_features=1),
nn.Sigmoid()
)
class Net(nn.Module):
def __init__(self, input_size, output_size, n_hidden=16):
super(Net, self).__init__()
self.linear1 = nn.Linear(input_size, n_hidden, bias=True)
self.tanh1 = nn.Tanh()
self.linear2 = nn.Linear(n_hidden, output_size)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
x = self.linear1(x)
x = self.tanh1(x)
x = self.linear2(x)
x = self.sigmoid(x)
return x
net = Net(2, 1)
```

### The Mutation Function (Neuro-“evolution”)

As with the genetic algorithm, neuroevolution can be implemented by adding an additive Gaussian noise $\epsilon\sim N(0,\sigma)$ to all neural network weights to introduce variance in the “gene pool” of the population.

```
from torch.nn.utils import parameters_to_vector, vector_to_parameters
def get_params(net):
return parameters_to_vector(net.parameters())
def mutate_params(net, sigma=0.1):
mutated_params = get_params(net) + torch.normal(0, sigma, size=get_params(net).data.shape)
vector_to_parameters(mutated_params, net.parameters())
print(f"Before mutation:\n {get_params(net)}\n")
mutate_params(net, sigma=0.1)
print(f"After mutation:\n {get_params(net)}")
Before mutation:
tensor([-0.5949, -0.1591, -0.6217, 0.0710, -0.6687, -0.3964, 0.3319, -0.2988,
0.6695, 0.4645, 0.2398, -0.2250, -0.5464, 0.2512, 0.0582, 0.0818,
0.1810, -0.5316, 0.3275, -0.1162, 0.2542, -0.6751, 0.4344, 0.1846,
0.4996, -0.1422, 0.3201, -0.0814, -0.1195, 0.1880, -0.2272, -0.4236,
-0.0218, -0.6078, -0.0099, 0.1856, -0.4883, -0.2465, 0.0166, -0.1269,
0.4119, 0.0229, 0.2381, -0.0007, -0.2959, -0.4865, 0.0240, 0.0228,
0.2293, -0.0649, -0.1661, 0.0788, 0.2253, -0.1549, -0.2465, -0.0267,
-0.1861, -0.2189, 0.0964, 0.0684, -0.0555, 0.0063, 0.1374, -0.1588,
0.2334], grad_fn=<CatBackward>)
After mutation:
tensor([-0.6005, -0.2099, -0.5364, 0.1007, -0.5897, -0.5340, 0.3688, -0.3615,
0.7335, 0.3900, 0.2519, -0.1144, -0.5839, 0.2251, 0.0043, 0.1630,
0.1419, -0.6593, 0.3079, -0.1238, 0.3217, -0.7810, 0.4419, 0.3621,
0.5246, 0.0064, 0.4284, -0.1177, -0.0700, -0.0537, -0.1281, -0.3613,
-0.0873, -0.6996, -0.1507, 0.1944, -0.6326, -0.1384, -0.0384, 0.0323,
0.3344, 0.0667, 0.1177, -0.1347, -0.3413, -0.5302, -0.1326, 0.3330,
0.2282, -0.1485, -0.1944, 0.2058, 0.2997, -0.0631, -0.1202, -0.0973,
-0.1269, -0.4766, -0.0509, 0.1725, -0.0470, -0.0562, 0.1357, -0.2274,
0.3410], grad_fn=<CatBackward>)
```

### Optimization Problem: Circles Dataset

The optimization problem is the Circles dataset from Scikit-Learn, where the neural network model must learn to predict and discriminate between the inner circles (labelled 1) and outer circles (labelled 0). The Circles dataset is the reason that non-linear activation functions in the neural network architecture are needed. $X$ is 2-dimensional while $y$ is 1-dimensional.

```
from sklearn.datasets import make_circles
def plot_data(X, y):
X = X.detach().numpy()
y = y.detach().numpy().flatten()
plt.plot(X[y==0,0], X[y==0,1], '.', c='b', label='0')
plt.plot(X[y==1,0], X[y==1,1], '.', c='r', label='1')
X, y = make_circles(n_samples=100)
X = torch.from_numpy(X).float()
y = torch.from_numpy(y).float().view(-1, 1)
plot_data(X, y)
net(X[:5, :])
tensor([[0.5009],
[0.4840],
[0.6110],
[0.6143],
[0.5136]], grad_fn=<SigmoidBackward>)
```

### Process 1: Generate the initial population of neural networks.

For illustration purposes, a small population size of 5 and 4 hidden units per neural network layer is used. Inspecting the first 2 neural networks in the population, neural network weights are randomly initialised. The specific initialisation method used for the weights is documented in the PyTorch documentation for interested readers.

```
population_size = 5
initial_population = np.array([Net(2,1,n_hidden=4) for _ in range(population_size)])
for p in initial_population[:2]:
print(get_params(p))
tensor([ 0.5282, 0.4404, -0.6330, -0.1387, -0.2104, 0.5194, 0.3596, 0.2664,
0.1044, 0.5380, 0.1583, 0.1221, 0.0324, 0.1239, -0.3698, -0.3658,
-0.0879], grad_fn=<CatBackward>)
tensor([ 0.3366, -0.6526, -0.1909, -0.4681, -0.3542, 0.2475, 0.1892, -0.1961,
-0.2867, -0.4570, -0.3183, -0.2220, 0.1677, 0.2118, 0.1053, 0.3238,
0.0737], grad_fn=<CatBackward>)
```

### Process 2: Compute the fitness of the population.

The fitness function measures the performance of an individual neural network. Because $y$ is a binary variable of values ${0,1}$, the negative binary cross entropy error (BCE) is employed, negated to reflect a higher value as more desirable.

```
def fitness_function(net):
return -nn.BCELoss()(net(X), y).detach().numpy().item()
def compute_fitness(population):
return np.array([fitness_function(individual) for individual in population])
fitness_score = fitness_function(net)
fitness_scores = compute_fitness(initial_population)
fitness_score, fitness_scores
(-0.7065134644508362,
array([-0.69943392, -0.70006615, -0.70542192, -0.69766504, -0.76979303]))
```

### Process 3: Select the fittest neural networks.

Select the top $k$ percentage of neural networks with the highest fitness score to form the parent subpopulation.

```
def solution(individual):
return individual(X).view(-1).detach().numpy().round()
def get_fittest(population, fitness_scores):
return population[fitness_scores.argmax()]
def select_fittest(population, fitness_scores, k=0.5):
return population[np.argsort(fitness_scores)[-int(len(population) * k):]]
parent_subpopulation = select_fittest(initial_population, fitness_scores, k=0.4)
compute_fitness(parent_subpopulation)
array([-0.69943392, -0.69766504])
```

### Process 4: Perform reproduction of the parents to replenish the population.

In contrast to common implementations of genetic algorithms, no crossing-over is performed. Parent neural networks are simply uniformly sampled with replacement to create an identical copy as the child.

```
import copy
def perform_reproduction(subpopulation):
num_children = population_size - len(subpopulation)
parents = np.random.choice(subpopulation, num_children)
return np.append(subpopulation, [copy.deepcopy(p) for p in parents], axis=0)
next_population = perform_reproduction(parent_subpopulation)
compute_fitness(next_population)
array([-0.69943392, -0.69766504, -0.69943392, -0.69943392, -0.69766504])
```

### Process 5: Perform mutation on the population.

As explained previously, add a Gaussian noise perturbation to all parameters of the neural network.

```
def get_population_parameter(population):
return [get_params(net) for net in population]
def perform_mutation(population, sigma=0.1):
for individual in population:
mutate_params(individual, sigma=0.1)
return population
print("Before mutation:")
print(get_population_parameter(next_population))
perform_mutation(next_population)
print("\nAfter mutation:")
print(get_population_parameter(next_population))
Before mutation:
[tensor([ 0.5282, 0.4404, -0.6330, -0.1387, -0.2104, 0.5194, 0.3596, 0.2664,
0.1044, 0.5380, 0.1583, 0.1221, 0.0324, 0.1239, -0.3698, -0.3658,
-0.0879], grad_fn=<CatBackward>), tensor([ 0.3565, -0.1857, 0.2187, 0.1788, 0.1412, 0.1778, -0.6750, 0.6518,
-0.5023, 0.2402, 0.4160, -0.0343, 0.0818, 0.2978, 0.3582, -0.0858,
-0.3332], grad_fn=<CatBackward>), tensor([ 0.5282, 0.4404, -0.6330, -0.1387, -0.2104, 0.5194, 0.3596, 0.2664,
0.1044, 0.5380, 0.1583, 0.1221, 0.0324, 0.1239, -0.3698, -0.3658,
-0.0879], grad_fn=<CatBackward>), tensor([ 0.5282, 0.4404, -0.6330, -0.1387, -0.2104, 0.5194, 0.3596, 0.2664,
0.1044, 0.5380, 0.1583, 0.1221, 0.0324, 0.1239, -0.3698, -0.3658,
-0.0879], grad_fn=<CatBackward>), tensor([ 0.3565, -0.1857, 0.2187, 0.1788, 0.1412, 0.1778, -0.6750, 0.6518,
-0.5023, 0.2402, 0.4160, -0.0343, 0.0818, 0.2978, 0.3582, -0.0858,
-0.3332], grad_fn=<CatBackward>)]
After mutation:
[tensor([ 0.6775, 0.5253, -0.6425, 0.0200, -0.2527, 0.5128, 0.4704, 0.1674,
0.0913, 0.3902, 0.1255, 0.2492, -0.1159, 0.0734, -0.2054, -0.3601,
-0.1615], grad_fn=<CatBackward>), tensor([ 0.4123, -0.2091, 0.3269, 0.2261, 0.2042, 0.2701, -0.7392, 0.5924,
-0.5903, 0.0912, 0.2945, -0.2038, 0.0711, 0.1820, 0.3112, -0.0067,
-0.3152], grad_fn=<CatBackward>), tensor([ 0.6215, 0.5592, -0.5993, -0.0947, -0.3291, 0.4888, 0.3596, 0.2436,
0.2440, 0.4675, -0.0707, 0.0269, 0.0515, 0.0444, -0.3241, -0.3425,
-0.1050], grad_fn=<CatBackward>), tensor([ 0.4444, 0.4725, -0.6944, -0.2502, -0.1153, 0.5679, 0.2696, 0.4509,
-0.0502, 0.6541, 0.0673, 0.1718, 0.0901, 0.0092, -0.2689, -0.4209,
-0.0223], grad_fn=<CatBackward>), tensor([ 0.2158, -0.1765, 0.0658, 0.1894, 0.0741, -0.1204, -0.7014, 0.5762,
-0.3188, 0.3198, 0.5875, -0.0955, 0.0913, 0.2711, 0.3587, -0.0755,
-0.4120], grad_fn=<CatBackward>)]
```

### The Neuroevolution Algorithm: All 5 Processes Together

By combining the 5 processes together, we construct the neuroevolution algorithm and run it to find a neural network solution that models the Circles dataset well.

Neuroevolution:

- Generate the initial population of individuals.
- Repeat until convergence:
- Compute fitness of the population.
- Select the fittest individuals (parent subpopulation).
- Perform reproduction between parents to produce children.
- Perform mutation on the population.

- Select the fittest individual of the population as the solution.

```
# Neuroevolution hyperparameters
population_size = 100
num_generations = 300
top_k = 0.1
mutation_sigma = 0.1
n_hidden = 16
# Process 1: Generate the initial population.
population = np.array([Net(2, 1, n_hidden) for _ in range(population_size)])
# Misc: Experimental tracking
scores = []
solutions = []
fittests = []
for i in range(num_generations):
# Process 2: Compute fitness of the population.
fitness_scores = compute_fitness(population)
# Process 3: Select the fittest individuals.
fittest_subpopulation = select_fittest(population, fitness_scores, k=top_k)
# Misc: Experimental tracking
fittest = get_fittest(population, fitness_scores)
fittests.append(fittest)
solutions.append(solution(fittest))
scores.append(fitness_function(fittest))
# Process 4: Perform reproduction between parents.
children = perform_reproduction(fittest_subpopulation)
# Process 5: Perform mutation on the population.
population = perform_mutation(children, sigma=mutation_sigma)
# Misc: Experimental tracking
plt.plot(np.arange(num_generations), scores)
plt.show()
```

### Experiment Result

The background colours illustrate the neural network’s decision boundary, while the individual data points are the original dataset. Looking at the fittest individual neural network of the final population, the non-linear decision boundary has been correctly and well-learnt by the fittest neural network in the final population.

```
def plot_individual(net):
x1 = np.arange(X[:,0].min()*1.2, X[:,0].max()*1.2, 0.01)
x2 = np.arange(X[:,1].min()*1.2, X[:,1].max()*1.2, 0.01)
X1, X2 = np.meshgrid(x1, x2)
Y = np.zeros(X1.shape).flatten()
for i, [x1, x2] in enumerate(zip(X1.flatten(), X2.flatten())):
Y[i] = np.asarray(net(Variable(torch.Tensor([x1,x2])).float()).data)
Y = Y.reshape(X1.shape)
plt.xlim(min(X[:,0])*1.2, max(X[:,0])*1.2)
plt.ylim(min(X[:,1])*1.2, max(X[:,1])*1.2)
plt.contourf(X1, X2, Y, cmap='bwr', alpha=0.8)
plt.colorbar()
fitness_score = fitness_function(fittest)
print(f"Fittest score: {fitness_score}")
plot_data(X, y)
plot_individual(fittest)
Fittest score: -0.028863554820418358
```

By visualising the fittest model at each generation of neuroevolution, notice that the circular decision boundary is eventually found. For an evolutionary strategy based on novelty applied on reinforcement learning, refer to Part 3 of the Evolutionary Computation series on Novelty Search. For an introductory treatment of the genetic algorithm, refer to Part 1.

```
%%capture
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots()
plot_data(X, y)
ax.set_xlim(min(X[:,0]*1.2), max(X[:,0])*1.2)
ax.set_ylim(min(X[:,1]*1.2), max(X[:,1])*1.2)
x1 = np.arange(X[:,0].min()*1.2, X[:,0].max()*1.2, 0.01)
x2 = np.arange(X[:,1].min()*1.2, X[:,1].max()*1.2, 0.01)
X1, X2 = np.meshgrid(x1, x2)
def animate(i):
net = fittests[i]
Y = net(torch.Tensor(np.stack([X1.flatten(), X2.flatten()], axis=1))).detach().numpy().reshape(X1.shape)
ax.contourf(X1, X2, Y, cmap='bwr', alpha=0.8)
ax.set_xlabel(f'Gen {i+1}')
ani = FuncAnimation(fig, animate, frames=np.arange(0, num_generations), interval=80, repeat=False)
ani.save('/images/neuroevolution/neuroevolution.gif')
```

## Novelty Search

Novelty Search is an Evolutionary Strategy (ES) algorithm that optimises using a novelty function instead of a fitness function (like in a vanilla genetic algorithm), which has shown to produce competitive performance for exploration in reinforcement learning. The novelty of a solution is defined by how similar the solution’s behaviour is as compared to the rest of the population. The novelty score is therefore computed by its average distance from the $k$-nearest neighbours in the population.

```
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
import gym
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
```

### Neuroevolution for Reinforcement Learning

Neuroevolution is the application of evolutionary strategies to neural networks. We use a simple neural network in PyTorch, with 2 linear layers and 2 non-linear activation functions tangent and sigmoid. In deep reinforcement learning, the neural network policy $\pi: s \rightarrow a$ serves as function mapping from the observation of the environment to an action chosen by the agent. Over one episode, the agent performs an action and the state of the environment is observed by the agent, along with a reward at that particular timestep. The fitness of an individual neural network is therefore defined by the cumulative reward obtained by the agent over one episode of interacting with the environment. The environment used is CartPole-v1, where the agent’s goal is to balance the pole by pushing the cart. The state observed by the agent is defined as:

\[Observation = [Cart Position, Cart Velocity, Pole Angle, Pole Angular Velocity]\]where the range of values are:

\[Cart Position = [-4.8,4.8]\] \[Cart Velocity = [-Inf, Inf]\] \[Pole Angle = [-24 degrees, 24 degrees]\] \[Pole Angular Velocity = [-Inf, Inf]\]and the action is a single scalar discrete value:

\[Action = [0, 1]\]where $0$ and $1$ represents the action of pushing the cart to the left and right respectively.

```
class Net(nn.Module):
def __init__(self, input_size, output_size, n_hidden=16):
super(Net, self).__init__()
self.linear1 = nn.Linear(input_size, n_hidden, bias=True)
self.tanh1 = nn.Tanh()
self.linear2 = nn.Linear(n_hidden, output_size)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
x = self.linear1(x)
x = self.tanh1(x)
x = self.linear2(x)
x = self.sigmoid(x)
return x
def get_action(net, obs):
return net(torch.from_numpy(obs.copy()).float()).detach().numpy().argmax()
def evaluate(net):
obs = env.reset()
done = False
total_reward = 0
while not done:
action = get_action(net, obs)
obs, reward, done, _ = env.step(action)
total_reward += reward
return total_reward
def fitness_function(net, episodes=1):
return np.mean([evaluate(net) for _ in range(episodes)])
def compute_fitness(population):
return np.array([fitness_function(individual) for individual in population])
def get_fittest(population, fitness_scores):
return population[fitness_scores.argmax()]
```

### Novelty Selection

The main difference between neuroevolution and novelty search is the selection criterion, changed from the fitness score to the novelty score. Instead of selecting for the fittest individuals in a population, novelty search selects the most novel individuals by the novelty score with respect to the rest of the population. The novelty score indicates the novelty of an individual, defined as the average difference of an individual neural network policy $\pi$ to $k$-nearest neighbours in the population, notably in terms of their behaviour. Therefore, the behaviour of an individual $b(\pi_i)$ must be defined. We employ a simple characterisation of a neural network’s behaviour as the terminal (final) state $s_n$ in the sequence of states observed by the agent $S_{\pi_i} = [s_1, s_2, …, s_n]$ in 1 evaluation:

\[Behaviour(\pi_i) = Terminal(S_{\pi_i}) = s_{n}\]The similarity between 2 individuals’ behaviours is simply the sum of squared difference between final observations:

\[Similarity(\pi_i, \pi_j) = \Vert Behaviour(\pi_i) - Behaviour(\pi_j)\Vert_2\]The novelty of an individual with respect to its $k$-nearest neighbours of the population $P$ is defined by:

\[Novelty(\pi_i, N_{\pi_i}) = \frac{1}{|N_{\pi_i}|} \sum_{\pi_k\in N_{\pi_i}}Similarity(\pi_i, \pi_k)\]where $N_{\pi_i}$ refers to the $k$-nearest neighbours of $\pi_i$. The $k$-nearest neighbours $N_{\pi_i}$ are selected by the $k$ largest similarity scores between $\pi_i$ and $\pi_{k}\in P$.

As shown above, individuals in a dense region of the behaviour space have less novel behaviour, while individuals in a sparse region have most novel behaviour and are selected for.

```
def behaviour(net):
obs = env.reset()
done = False
while not done:
action = get_action(net, obs)
obs, reward, done, _ = env.step(action)
return obs
def similarity(net1, net2):
b1, b2 = behaviour(net1), behaviour(net2)
return np.sum((b1 - b2)**2)
def compute_novelty(population, k=3):
distances = []
n = len(population)
for i in range(n):
distance_i = sorted([similarity(population[i], population[j]) for j in range(n) if i != j])[:k]
distances.append(np.mean(distance_i))
return distances
def get_novel_subpopulation(population, novelty_scores):
return population[novelty_scores.argmax()]
def select_most_novel(population, novelty_scores, k=0.5):
return population[np.argsort(novelty_scores)[-int(len(population) * k):]]
```

### Perform Reproduction

As with neuroevolution, reproduction among the novel parent individuals is performed by simply sampling then making a copy of the parent individual to form child individuals, replenishing the population. There is no change from the neuroevolution implementation the above section on neuroevolution.

```
import copy
def perform_reproduction(subpopulation):
num_children = population_size - len(subpopulation)
parents = np.random.choice(subpopulation, num_children)
return np.append(subpopulation, [copy.deepcopy(p) for p in parents], axis=0)
```

### Perform Mutation

As with neuroevolution, mutation is performed by applying an additive Gaussian noise to the parameters of the neural networks. There is no change from the neuroevolution implementation in Part 2.

```
from torch.nn.utils import parameters_to_vector, vector_to_parameters
def get_params(net):
return parameters_to_vector(net.parameters())
def mutate_params(net, sigma=0.1):
mutated_params = get_params(net) + torch.normal(0, sigma, size=get_params(net).data.shape)
vector_to_parameters(mutated_params, net.parameters())
def perform_mutation(population, sigma=0.1):
for individual in population:
mutate_params(individual, sigma=0.1)
return population
```

### The Novelty Search Algorithm

By selecting the most novel individuals over generations, the individuals in the population will find its behavioural niche, improving exploration in the behaviour space.

Novelty Search:

- Generate the initial population of individuals.
- Repeat until convergence:
- Compute novelty of the population.
- Select the most novel individuals to form the parent subpopulation.
- Perform reproduction between parents to produce children.
- Perform mutation on the population.

- Select the fittest individual of the population as the solution.

```
# Novelty Search hyperparameters
population_size = 20
num_generations = 30
top_k = 0.2
mutation_sigma = 0.1
k_nearest = 3
# CartPole environment initialisation
env = gym.make('CartPole-v1')
# Neural network hyperparameters
input_size = env.observation_space.shape[0]
output_size = env.action_space.n
n_hidden = 16
# Process 1: Generate the initial population.
population = np.array([Net(input_size, output_size, n_hidden) for _ in range(population_size)])
# Misc: Experimental tracking
scores = []
fittests = []
for i in range(num_generations):
# Process 2: Compute the novelty of individuals with respect to closest neighbours in the population.
novelty_scores = compute_novelty(population, k=k_nearest)
# Process 3: Select the most novel individuals.
novel_subpopulation = select_most_novel(population, novelty_scores, k=top_k)
# Misc: Experimental tracking
fitness_scores = compute_fitness(population)
fittest = get_fittest(population, fitness_scores)
fittests.append(fittest)
scores.append(max(fitness_scores))
# Process 4: Perform reproduction between parents.
children = perform_reproduction(novel_subpopulation)
population = perform_mutation(children, sigma=mutation_sigma)
# Misc: Experimental tracking
plt.plot(np.arange(num_generations), scores)
plt.show()
```

### Experiment Results

Plotting the novelty score against fitness score for the final population, the novelty score defined by the $k$-nearest neighbour similarity of terminal states is not linearly correlated with a high fitness score. It is important that the novelty score is not linearly correlated with the fitness score because a linear combination of the fitness score would have been used for selection, defeating the purpose of novelty-based search.

```
plt.xlabel("Novelty Score")
plt.ylabel("Fitness Score")
plt.scatter(novelty_scores, fitness_scores)
plt.show()
```

Visualising an episode of the fittest individual shows that the novelty search algorithm has successfully achieved the goal of CartPole-v1 of balancing the pole.

```
%%capture
from matplotlib.animation import FuncAnimation
def get_frames(net, episodes=3):
frames = []
for i in range(episodes):
obs = env.reset()
done = False
total_reward = 0
while not done:
action = get_action(net, obs)
obs, reward, done, _ = env.step(action)
frames.append(env.render(mode='rgb_array'))
env.close()
return frames
frames = get_frames(fittest, episodes=3)
fig, ax = plt.subplots()
screen = plt.imshow(frames[i])
def animate(i):
screen.set_data(frames[i])
ani = FuncAnimation(fig, animate, frames=np.arange(0, len(frames)), interval=80, repeat=False)
ani.save('/images/novelty-search/novelty_search.gif')
```