How would an evolving website work?

As a web developer who is also interested in A/B testing and evolution, it occurred to me that it would be fascinating to try to build a website that optimizes itself. I’ve been kicking around this idea for a while and wanted to share a few thoughts on it here to get feedback from anyone else that finds the idea promising.

Here’s the idea:

In traditional A/B testing, you specify multiple variations of some aspect of your site, measure which variation performs the best, then make the winner the default, and repeat. The types of things you test can range from simple headline and button color tests to complex tests that affect the user’s entire experience with your site.

In all of these cases though you have to figure out what variations you want to test. If you’re testing the headline, you need to specify which headlines you want to test. If you’re testing button color, you need to specify which colors to test, etc.

In the natural world, we see stunningly complex and optimized life forms that evolved little by little over billions of years. Evolution is similar to A/B testing in a sense, except that in evolution the variations happen by accident through genetic copying errors. Most of those mutations decrease the organism’s odds of reproducing, but occasionally they confer a benefit that causes that organism to have a better chance at surviving and reproducing. When that happens, the mutation gets passed on and may eventually spread to the rest of species over time. That “simple” process is what has led to all of the variety of life on earth.

Optimizing a web page through evolution poses many issues though.

How do you algorithmically mutate something on the page? Could you write a function that generates variations of a headline? Maybe. Would those variations be any good? Would you trust it enough to deploy into production?

I bet by analyzing tens of thousands of webpages, you could algorithmically identify common headline wording structures. Then maybe you could write code that intelligently customizes those headlines to your service.

You might be able to do the same with design. If you tried to categorize hundreds of homepage layouts, I expect you’d probably wind up with 20-30 common layouts that cover 90%+ of websites. Could you then write an algorithm that automatically tweaks your site’s layout to test these different layouts on your visitors? Could you do the same with color schemes? Maybe.

There’s also the problem of statistical significance. Running a simple two variation A/B test can take a long time if you don’t get a lot of traffic. Trying to get significant results for lots of algorithmically generated headlines might take forever. But maybe there are ways around that like the multi-armed bandit algorithm.

To sum it up, I think you’d need the following: a way to intelligently generate mutations on headlines, buttons, layout, etc + a ton of traffic or some novel method for picking the best variations + an organization that is willing to try such an experiment. It would not be easy, but I think it’s possible.

Imagine if it did work. Maybe your homepage converts users into sign ups at 10% now. You flip a switch, and in 6 months it increases your conversion rate 30% without any intervention on your part.

It would be revolutionary… if it worked.

A Simple Genetic Algorithm Written in Ruby

I recently started making my way through An Introduction to Genetic Algorithms by Melanie Mitchell. I’m not too far into it yet but I’ve been pleasantly surprised by how clearly the author explains each concept.

In the first chapter she outlines a simple algorithm that is the basis for most applications of genetic algorithms. There isn’t any code in the book so I decided to implement it on my own in Ruby to understand it better:

POPULATION_SIZE = 24
NUM_BITS = 64
NUM_GENERATIONS = 1000
CROSSOVER_RATE = 0.7
MUTATION_RATE = 0.001
class Chromosome
attr_accessor :genes
def initialize(genes = "")
if genes == ""
self.genes = (1..NUM_BITS).map{ rand(2) }.join
else
self.genes = genes
end
end
def to_s
genes.to_s
end
def count
genes.count
end
def fitness
genes.count("1")
end
def mutate!
mutated = ""
0.upto(genes.length1).each do |i|
allele = genes[i, 1]
if rand <= MUTATION_RATE
mutated += (allele == "0") ? "1" : "0"
else
mutated += allele
end
end
self.genes = mutated
end
def &(other)
locus = rand(genes.length) + 1
child1 = genes[0, locus] + other.genes[locus, other.genes.length]
child2 = other.genes[0, locus] + genes[locus, other.genes.length]
return [
Chromosome.new(child1),
Chromosome.new(child2),
]
end
end
class Population
attr_accessor :chromosomes
def initialize
self.chromosomes = Array.new
end
def inspect
chromosomes.join(" ")
end
def seed!
chromosomes = Array.new
1.upto(POPULATION_SIZE).each do
chromosomes << Chromosome.new
end
self.chromosomes = chromosomes
end
def count
chromosomes.count
end
def fitness_values
chromosomes.collect(&:fitness)
end
def total_fitness
fitness_values.inject{|total, value| total + value }
end
def max_fitness
fitness_values.max
end
def average_fitness
total_fitness.to_f / chromosomes.length.to_f
end
def select
rand_selection = rand(total_fitness)
total = 0
chromosomes.each_with_index do |chromosome, index|
total += chromosome.fitness
return chromosome if total > rand_selection || index == chromosomes.count1
end
end
end
population = Population.new
population.seed!
1.upto(NUM_GENERATIONS).each do |generation|
offspring = Population.new
while offspring.count < population.count
parent1 = population.select
parent2 = population.select
if rand <= CROSSOVER_RATE
child1, child2 = parent1 & parent2
else
child1 = parent1
child2 = parent2
end
child1.mutate!
child2.mutate!
if POPULATION_SIZE.even?
offspring.chromosomes << child1 << child2
else
offspring.chromosomes << [child1, child2].sample
end
end
puts "Generation #{generation} – Average: #{population.average_fitness.round(2)} – Max: #{population.max_fitness}"
population = offspring
end
puts "Final population: " + population.inspect

view raw
gistfile1.rb
hosted with ❤ by GitHub

The algorithm as described in the book is quoted below with my comments following each section.

1. Start with a randomly generated population of n l-bit chromosomes (candidate solutions to a problem).

In this code, n is represented by the POPULATION_SIZE constant and l by NUM_BITS. The initial randomly generated population is created using the Population’s seed! method.

2. Calculate the fitness f(x) of each chromosome x in the population.

The fitness of a Chromosome can be calculated by its fitness method. In this example, the fitness is simply the number of 1’s that the chromosome contains.

3. Repeat the following steps until n offspring have been created:

a. Select a pair of parent chromosomes from the current population, the probability of selection being an increasing function of fitness. Selection is done “with replacement,” meaning that the same chromosome can be selected more than once to become a parent.

A chromosome is chosen from the population using the Population’s select method. This method implements fitness-proportionate selection using roulette-wheel sampling which is conceptually equivalent to giving each individual a slice of a circular roulette wheel equal in area to the individual’s fitness. The roulette wheel is spun, the ball comes to resent on one wedge-shaped slice, and the corresponding individual is selected.

b. With probability Pc (the “crossover probability” or “crossover rate”), cross over the pair at a randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place, form two offspring that are exact copies of their respective parents.

The crossover probability is defined by CROSSOVER_RATE. The crossover is performed by the Chromosome’s bitwise AND (&) operator which I overloaded for this class.

c. Mutate the two offspring at each locus with probability Pm (the mutation probability or mutation rate), and place the resulting chromosomes in the new population. If n is odd, one new population member can be discarded at random.

The mutation rate is defined by the MUTATION_RATE constant. The mutation is performed by the Chromosome’s mutate! method.

4. Replace the current population with the new population.

5. Go to step 2.

Here are the results for 1,000 generations where the population is 24, the number of bits per chromosome is 64, the mutation rate is 0.001, and the crossover rate is 0.7. Pay attention to the average value over time:

Generation 1 - Average: 32 - Max: 40
Generation 2 - Average: 33 - Max: 38
Generation 3 - Average: 34 - Max: 41
Generation 4 - Average: 34 - Max: 37
Generation 5 - Average: 34 - Max: 41
Generation 6 - Average: 35 - Max: 39
Generation 7 - Average: 34 - Max: 43
Generation 8 - Average: 35 - Max: 45
Generation 9 - Average: 36 - Max: 44
Generation 10 - Average: 37 - Max: 44
...
Generation 990 - Average: 49 - Max: 50
Generation 991 - Average: 49 - Max: 50
Generation 992 - Average: 49 - Max: 51
Generation 993 - Average: 49 - Max: 51
Generation 994 - Average: 49 - Max: 51
Generation 995 - Average: 49 - Max: 50
Generation 996 - Average: 49 - Max: 50
Generation 997 - Average: 48 - Max: 50
Generation 998 - Average: 48 - Max: 50
Generation 999 - Average: 48 - Max: 50
Generation 1000 - Average: 48 - Max: 50

The average fitness of the population increases from 32 initially to 48 by the 1,000th generation, just as you’d expect for a population slowly becoming more fit over time.

If this interests you, I encourage you to try it yourself and experiment with different values for MUTATION_RATE, POPULATION_SIZE, and NUM_BITS.

For example, increasing the mutation rate from 0.001 (1 in 1,000) to 0.01 (1 in 100) has the following effect on the average fitness over time:

Given the impact that the mutation rate has on long term fitness of the population, how can we determine what the optimal mutation rate is? Can the chromosome’s mutation rate change over time? Can different sections of the chromosome evolve to have different mutation rates? What about the number of bits or the population size? These are a few of the things I hope to find out :)