pdfcrun.ch

Neuro-evolutionary Tetris

Thu, 17 Oct 2024 11:22:42 +0000

I want to teach a neural network how to play Tetris.

source code: https://github.com/opyate/neuro-evolved-tetris

video: https://youtu.be/rji6zOQgJZs

What are the possible approaches?

A supervised learning approach could be that I record a Tetris world champion playing lots and lots of games, then create a large labelled dataset, then train a model on said dataset. However, that’s a lot of work, and the model might simply learn to imitate that world champion, potentially missing out on novel Tetris techniques and play styles.

A reinforcement learning (RL) approach could be that the AI agent learns by playing the game, receiving rewards for scoring and staying alive longer, and penalties for filling up the Tetris grid without clearing lines. It adjusts its actions (for example, moving a piece left or right) to maximise its score. That involves defining a reward function to provide feedback, and learning a policy (a strategy for choosing actions) that maximises cumulative reward over time. In fact, RL works quite well for game playing and robotics!

But, I want to try something a bit different, like genetic algorithms ᵃ. Specifically, neuroevolution.

Like reinforcement learning, neuroevolution allows an AI to learn through interaction with an environment, but instead of directly adjusting actions, neuroevolution focuses on evolving the neural network. Different neural networks with varying weights ᵇ are tested in the game, and the ones that achieve higher scores are selected and “bred” to create new generations of better-performing Tetris players.

Crucially, this approach differs from the other approaches in that it doesn’t use backpropagation to update the network weights and biases. (Backpropagation calculates the gradient of the error with respect to each weight and bias, and gradient descent then uses that gradient, which indicates the direction and magnitude of change needed, to reduce the error.) If you look at the code, you’ll only see the forward pass implemented.

So, neuroevolution it is!

Overview of experiment

Here are the high-level steps that describe the simulation.

spawn thousands of bots from the simulation runner
each bot contains
- a brain, aka a PyTorch network, which has
  - 10x10 inputs which represent the Tetris grid
  - a hidden layer
  - 7 outputs which represent the possible player moves (left, right, etc)
  - random initial weights
- a Tetris engine, which
  - maintains the internal game state
  - allows 7 possible moves to modify said game state
the simulation loop now starts, and every bot predicts its next move
after each bot has made its move, it gets fitness points
- for staying alive
- for scoring (i.e. clearing lines on the Tetris grid)
after each bot has made its move, it might also be in a game over state
after all bots reach game over, they evolve
- evolution step 1: parents are chosen based on a weighted selection
- evolution step 2: the parents reproduce via crossover, to produce a child
- evolution step 3: the child is slightly mutated
the children become the brains for the next cohort of bots, with a fresh game state
the simulation loops again

The PyTorch network

I use PyTorch to create a network with 10x10 inputs to represent the Tetris grid ᶜ, and 7 outputs to represent the possible moves, which are:

up
down
left
right
rotate clockwise
rotate counter clockwise
no operation (do nothing)

Given any grid state (e.g. an empty grid with a starting piece at the top, or a mid-game grid with some stacks at the bottom), the model performs a classification task: it picks the move it scores as best for the current state.
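To make the shape of that classification concrete, here is a minimal, dependency-free sketch of a forward pass. It is not the code from the repository (which uses PyTorch): the hidden size of 16, the tanh activation, the move labels, and the names make_brain, forward and predict_move are all assumptions for illustration.

```python
import math
import random

# Labels for the 7 outputs; the order is an assumption for this sketch.
MOVES = ["up", "down", "left", "right", "rotate_cw", "rotate_ccw", "noop"]


def make_brain(n_inputs=100, n_hidden=16, n_outputs=7, rng=random):
    """Create random weights and zero biases for a one-hidden-layer network."""
    def layer(n_in, n_out):
        return {
            "w": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out,
        }
    return [layer(n_inputs, n_hidden), layer(n_hidden, n_outputs)]


def forward(brain, inputs):
    """Forward pass only: tanh on the hidden layer, raw scores on the output."""
    hidden_layer, output_layer = brain
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_layer["w"], hidden_layer["b"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(output_layer["w"], output_layer["b"])
    ]


def predict_move(brain, grid):
    """Flatten a 10x10 grid of 0/1 cells and return the highest-scoring move."""
    inputs = [float(cell) for row in grid for cell in row]
    scores = forward(brain, inputs)
    return MOVES[scores.index(max(scores))]


empty_grid = [[0] * 10 for _ in range(10)]
brain = make_brain(rng=random.Random(0))
print(predict_move(brain, empty_grid))
```

There is no loss function and no gradient anywhere in this sketch; the weights only ever change through crossover and mutation between generations.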

The Tetris Engine

I built a Tetris engine from scratch that is completely decoupled from any game ticks and rendering logic. This allows me to run a simulation as fast as my CPU ᵈ cores will allow, without having to wait for one game tick every 16ms (if we assume 60FPS).

Scoring is standard Tetris scoring and I even implemented rotation using the recommended SRS (Super Rotation System) strategy, which allows for wall kicks.

The engine also detects if the bot plays many repeat moves (that aren’t up or down), at which point it forces a game over.
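One possible reading of that rule, sketched below, is to count consecutive identical moves and end the game once a threshold is reached. The class name RepeatMoveGuard, the threshold of 20, and the choice to reset the counter on up or down are assumptions; the engine in the repository may do this differently.

```python
class RepeatMoveGuard:
    """Flags a bot that keeps repeating the same move (hypothetical helper)."""

    def __init__(self, max_repeats=20, exempt=("up", "down")):
        self.max_repeats = max_repeats
        self.exempt = set(exempt)
        self.last_move = None
        self.count = 0

    def record(self, move):
        """Record a move; return True if the game should be forced over."""
        if move in self.exempt:
            # Up and down never count as repeats, and they break a streak.
            self.last_move, self.count = None, 0
            return False
        if move == self.last_move:
            self.count += 1
        else:
            self.last_move, self.count = move, 1
        return self.count >= self.max_repeats


guard = RepeatMoveGuard(max_repeats=3)
for move in ["left", "left", "left"]:
    game_over = guard.record(move)
print(game_over)  # True: three identical moves in a row
```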

Weighted selection

Our weighted selection function uses a relay-race technique for giving a fair shot to all members of a population, while still increasing the chances of selection for those with higher fitness scores.

It works like this:

imagine a relay race, where each bot runs a distance tied to its fitness
the higher the fitness, the farther they run
pick a random starting line (so, a random distance to the finish line)
the race begins with the first bot, which runs a distance equal to its normalised fitness score
each following bot picks up where the previous one stopped and runs its own distance
the race ends when a bot crosses the finish line
that bot is selected as a parent

In essence, every bot has a shot at crossing the finish line, but those with higher fitness can run longer distances, thus have a better chance of being selected to be a parent.
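The relay race can be sketched in a few lines. This is an illustration under assumptions rather than the repository’s code: the function name weighted_selection, the rng parameter, and the fallback for a population with zero total fitness are mine.

```python
import random


def weighted_selection(bots, fitnesses, rng=random):
    """Pick one parent, with probability proportional to fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Nobody scored anything: fall back to a uniform pick (assumption).
        return rng.choice(bots)
    remaining = rng.random()  # random distance to the finish line, in [0, 1)
    for bot, fitness in zip(bots, fitnesses):
        remaining -= fitness / total  # this bot runs its normalised fitness
        if remaining < 0:
            return bot  # this bot crossed the finish line
    return bots[-1]  # guard against floating-point round-off


rng = random.Random(42)
picks = [weighted_selection(["a", "b", "c"], [1, 1, 2], rng) for _ in range(1000)]
print({name: picks.count(name) for name in "abc"})
```

In this usage example, bot c has half of the total fitness, so it should be picked roughly half of the time, while a and b still get selected now and then.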

Alternatively, we could just select the bots with the highest fitness each time, but this decreases the amount of variety in the system, as lower-fitness bots might still have interesting play strategies up their sleeves.

Evolution

Crossover

This is the reproduction part of evolution. We start with a blank child network, then for every weight in the network, select either parent A or parent B’s weight as determined by a virtual coin flip.
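As a minimal sketch of that coin flip, assume each network’s weights have been flattened into a plain list (the real code works on PyTorch tensors, and the name crossover is an assumption):

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """Build a child by taking each weight from parent A or B with equal odds."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of weights")
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


child = crossover([0.1, 0.2, 0.3, 0.4], [-0.1, -0.2, -0.3, -0.4], random.Random(1))
print(child)
```

Every position in the child holds a value from one of the two parents at that same position, so the child has the same topology as its parents and no new values appear until mutation.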

Mutation

This mimics real-world genetic mutations, which typically introduce minor changes rather than entirely new traits.

We define a very small mutation rate (0.01), which means that only about 1% of the weights in the child’s network will be nudged by a small amount of Gaussian noise.
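A sketch of that step, again on a flat list of weights: each weight independently has a 1% chance of being nudged. The standard deviation of 0.1 and the name mutate are assumptions; the post only says the noise is small.

```python
import random


def mutate(weights, rate=0.01, sigma=0.1, rng=random):
    """Add Gaussian noise to each weight with probability `rate`."""
    return [
        w + rng.gauss(0.0, sigma) if rng.random() < rate else w
        for w in weights
    ]


rng = random.Random(7)
original = [0.5] * 10_000
mutated = mutate(original, rng=rng)
changed = sum(1 for before, after in zip(original, mutated) if before != after)
print(changed)  # close to 100, i.e. about 1% of 10,000 weights
```

Because each weight is decided independently, the share of changed weights is 1% on average rather than exactly 1% for every child.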

Results

After 2,000 rounds of Tetris, the bots are clearly on an upward trajectory and increasing their fitness.

The variance in the mean fitness plot can be explained by the weighted selection algorithm: we still sometimes pick less fit parents to reproduce, so the mean fitness occasionally takes a hit. However, the moving average shows a clear upward trend in fitness.

Other things to try

Try alternate fitness allocation

At the moment, fitness is

increased by 1 for every tick of staying alive
BUT increased by 100/300/500/800 (depending on number of lines cleared) for scoring in the game.

A bot might score by chance without having learnt a strategy for staying alive longer, yet the large line-clear bonus gives it an outsized advantage in the simulation.
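The imbalance is easy to see once the allocation is written down. The sketch below uses the values listed above; the function name fitness_delta and the line_weight knob for experimenting with alternatives are assumptions.

```python
# Points per number of lines cleared in one move (standard Tetris scoring).
LINE_CLEAR_POINTS = {0: 0, 1: 100, 2: 300, 3: 500, 4: 800}


def fitness_delta(lines_cleared, survival_bonus=1, line_weight=1.0):
    """Fitness gained in one tick: a survival bonus plus weighted line points."""
    return survival_bonus + line_weight * LINE_CLEAR_POINTS[lines_cleared]


print(fitness_delta(0))                   # 1: just stayed alive
print(fitness_delta(1))                   # 101.0: one lucky line is worth ~100 ticks
print(fitness_delta(1, line_weight=0.5))  # 51.0: a gentler alternative to try
```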

Try other weighted selection algorithms

Roulette Wheel Selection (Lipowski, 2011): Each individual is assigned a slice of a roulette wheel proportional to its fitness. A random “spin” of the wheel determines which individual is selected.
Rank-Based Selection (Whitley, D. 1989): Individuals are ranked based on their fitness, and selection probability is assigned based on rank rather than absolute fitness values.
Tournament Selection (Goldberg, D. E., & Deb, K., 1991): A subset of individuals is randomly chosen, and the individual with the highest fitness within that subset is selected.
Elitism (p101 of De Jong, K. A., 1975): A certain number of the best individuals from the previous generation are directly copied into the next generation.
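As an example of how small these alternatives can be, here is a sketch of tournament selection. The tournament size of 3 and the function name are assumptions, and this is not something the repository currently implements.

```python
import random


def tournament_selection(bots, fitnesses, k=3, rng=random):
    """Sample k distinct bots at random and return the fittest of them."""
    k = min(k, len(bots))
    contenders = rng.sample(range(len(bots)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return bots[winner]


bots = ["a", "b", "c", "d", "e"]
fitnesses = [10, 250, 40, 900, 5]
print(tournament_selection(bots, fitnesses, rng=random.Random(0)))
```

A larger k increases selection pressure: with k equal to the population size the fittest bot always wins, while k=1 degenerates into a uniform random pick.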

Conclusion

At the time of writing this, I’m not yet seeing a bot which is clearly good at Tetris, so I’ll keep the simulation running, and revisit this post with an update.

Meanwhile, please try the experiment yourself, make modifications, and let me know what you find!

Footnotes

a. Genetic Algorithms (GAs) are a class of algorithms inspired by biological evolution. They operate on a population of candidate models (individuals), using selection (usually the fittest), crossover (recombination, or reproduction by two parent models), and mutation to iteratively improve their fitness for a given task. A battered copy of one of my uni textbooks from 1999 discusses genetic algorithms, and was one of the inspirations for this post! (That crack in the spine aligns with chapter 11, Neural Networks ;-)

b. More elaborate neuro-evolutionary techniques also modify the network’s topology (see NEAT, HyperNEAT, ES-HyperNEAT) and even hyperparameters (CoDeepNEAT) but we’re keeping things simple for now.

c. Standard Tetris has a 20x10 grid, but most of it is open space at the start of a game, so I halve it to get to learning quicker and make the underlying neural network smaller.

d. I used CPU instead of GPU, because the networks were quite small and any gains in GPU acceleration would be impacted by copying data to and from the GPU. This opens the experiment up to others to tinker with. Note that evolutionary processes can sometimes be slow to converge to optimal solutions compared to gradient-based methods, so grab a cup of tea!

Local LLM to replace GitHub Copilot

Mon, 04 Mar 2024 09:15:02 +0000

We investigate using a local large language model (LLM) to replace GitHub Copilot. We’ll use the StarCoder2 model from HuggingFace, as it is quite new and shows great promise, and we’ll host it with ollama, which just a few hours ago announced support for StarCoder2. Then we point VSCode to ollama + StarCoder2 using the Continue VSCode extension.

Ollama

We start by downloading ollama.

wget https://github.com/ollama/ollama/releases/download/v0.1.28/ollama-linux-amd64
chmod +x ollama-linux-amd64
./ollama-linux-amd64

I tried LMStudio before, but it doesn’t support StarCoder2 yet, and it has a known issue with older versions of Ubuntu. Ollama does support StarCoder2 as of a few hours ago.

StarCoder2

StarCoder2 was released just a few days ago, so I’m pretty curious to try it out:

Introducing StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration for StarCoder and comes in 3 sizes, trained 600+ programming languages on over 4 Trillion tokens on Stack v2. It outperforms StarCoder 1 by margin and has the best overall performance… pic.twitter.com/LVclRcq5ZM
— Philipp Schmid (@_philschmid) February 28, 2024

From the paper, it seems like StarCoder2 performs really well:

We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages.

More specifically:

StarCoder2-3B is the best of the current 3B base models
StarCoder2-7B comes in second place only behind DeepSeekCoder-6.7B
StarCoder2-15B is the best of the current 15B base models by a significant margin, and is even competitive with models that are more than twice its size, like CodeLlama-34B

However, some models perform better on some programming languages, so it might be worth consulting tables 10 to 12 in the paper to see which model is best for your use case. But for the team at PDFCrunch, Python is crucial, so we consult section 7.1.3 that discusses the models’ performance on Python data science tasks (using the DS-1000 benchmark):

StarCoder2-3B overall is the best-performing small model on DS-1000. Except for PyTorch and TensorFlow (where it is slightly worse than StableCode-3B), StarCoder2-3B achieves the best performance on all the other popular libraries.
StarCoder2-7B comes in second place out of the medium models, with a performance similar to DeepSeekCoder-6.7B.
StarCoder2-15B is the best-performing large model on DS-1000. It substantially outperforms both StarCoderBase-15B and CodeLlama-13B by large margins, and approaches the overall performance of CodeLlama-34B.

Which size model to use?

Which size model you select depends on your GPU’s VRAM. The token length of the code you want to prompt with and generate would normally also be a consideration. However, all model sizes have a 16K context window, trained with a sliding window of 4K and FlashAttention-2. As the models don’t use the default self-attention algorithm, large input contexts shouldn’t exhaust your VRAM.

From the paper, confirming the 16K context window:

We start base model training with a 4k context window and subsequently fine-tune the model with a 16k context window

And

We further pre-trained each model for long-context on 200B tokens from the same pre-training corpus, using a 16,384 context length with a sliding window of 4,096, with FlashAttention-2

Does the context include both the prompt tokens and prediction tokens? During the evaluation of RepoBench, for instance, they restricted the prompt context so that the prediction had a window of 128 tokens:

We constrained the models to generate a maximum of 128 new tokens per prompt, and the first non-empty and non-comment line of the output was selected as the prediction.

And

The maximum token count for prompts was set to 15,800 by truncating excess cross-file context
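Those two numbers suggest that prompt and prediction share one window, which is how decoder-only models generally work. A quick back-of-the-envelope check, using only the figures quoted above (the helper name max_prompt_tokens is an assumption):

```python
CONTEXT_WINDOW = 16_384        # context length quoted from the paper
MAX_NEW_TOKENS = 128           # generation cap used in the RepoBench evaluation
REPOBENCH_PROMPT_CAP = 15_800  # prompt truncation limit quoted above


def max_prompt_tokens(context_window, max_new_tokens):
    """Tokens left for the prompt once the generation budget is reserved."""
    return context_window - max_new_tokens


print(max_prompt_tokens(CONTEXT_WINDOW, MAX_NEW_TOKENS))        # 16256
print(REPOBENCH_PROMPT_CAP + MAX_NEW_TOKENS <= CONTEXT_WINDOW)  # True
```

So the RepoBench setup leaves a few hundred tokens of headroom: 15,800 prompt tokens plus 128 generated tokens fit comfortably inside 16,384.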

I typically use OpenAI’s GPT-3.5 or GPT-4 for generating entire pages of code for languages I don’t know, but for in-line use in VSCode, a couple hundred tokens is plenty.

Using StarCoder2 with ollama from VSCode

Run StarCoder2

In a new terminal tab, run

./ollama-linux-amd64 run starcoder2:15b

(Depending on the amount of VRAM you have, you might need to run 7b or 3b, or pick a quantised version of the model.)
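Before wiring up the editor, it can be useful to check that the model answers over Ollama’s local HTTP API. The sketch below assumes Ollama is listening on its default address (localhost:11434) and uses its /api/generate endpoint; the helper names build_request and complete, and the 200-token cap, are assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address


def build_request(prompt, model="starcoder2:15b", max_tokens=200):
    """Build a non-streaming generate request for a locally running Ollama."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"temperature": 0.2, "num_predict": max_tokens},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]
```

With the model loaded, calling complete("def fibonacci(n):") should return a completion. If it raises a connection error, Ollama is probably not running yet.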

Install the Continue VSCode extension

While the model is downloading, install the Continue VSCode extension. Once installed, click Continue’s gear icon, and in the config.json, add the following snippet to the models section:

{
  "title": "StarCoder2 + Ollama",
  "provider": "ollama",
  "model": "starcoder2:15b",
  "completionOptions": {
    "temperature": 0.2,
    "topP": 0.95,
    "topK": 40,
    "presencePenalty": null,
    "frequencyPenalty": null,
    "stop": null,
    "maxTokens": 600
  }
}

Quick test

As a quick test, I gave the chat interface the following prompt:

Write a quick little snippet in Python that opens a file in binary mode, and prints the hashed contents.

Continue by default points to GPT-4 (Free Trial), which responded with this bit of code:

import hashlib

def hash_file(filename):
    h = hashlib.sha256()

    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(4096)
            if not chunk:
                break
            h.update(chunk)

    print(f'The SHA256 hash of file {filename} is: {h.hexdigest()}')

hash_file('your_file_path_here')

Then I switched to the StarCoder2 + Ollama model, and got this response:

import hashlib
with open('file', 'rb') as f:
    print(hashlib.sha256(f.read()).hexdigest())

But, in addition to the Python code, it also exhausted its output token budget with what looked like a Python tutorial (which is fine - I can ignore that).

Both snippets output the same hash.
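That is expected: feeding SHA-256 a file in chunks or all at once gives the same digest. A small self-contained check, with both approaches rewritten as functions (the names hash_chunked and hash_whole are mine) and a throwaway temporary file:

```python
import hashlib
import os
import tempfile


def hash_chunked(path, chunk_size=4096):
    """Hash a file in fixed-size chunks, keeping memory use constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_whole(path):
    """Hash a file by reading it into memory in one go."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


# Write 10,000 random bytes to a temporary file and compare both digests.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(10_000))
    path = tmp.name
try:
    print(hash_chunked(path) == hash_whole(path))  # True
finally:
    os.remove(path)
```

The practical difference is memory: the one-liner loads the whole file, which is fine for small files but not for multi-gigabyte ones.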

It’s as if StarCoder2 got straight to the point, but GPT-4 went a step further and wrapped it in a function and read the file in chunks, which probably makes for more maintainable and correct code.

Still, that’s not bad considering GPT-4 is SOTA, costs a lot more to run, and is (probably) a much larger (ensemble of) model(s). I’m excited to see what else StarCoder2 can do.

What I’m also very interested in is using my own development data to fine-tune StarCoder2 (or any model I choose to use), and it seems as if Continue has support for this (albeit possibly a paid feature?), but over here at PDFCrunch we’re quite comfortable fine-tuning our own models anyway.

Another great thing about Continue is that it indexes your entire codebase, so you can ask the model high-level questions about your codebase, like “Do I use X anywhere?” or “Is there any code written already that does X?”. Powerful stuff, and I wonder if there are limitations - you might want to index your million-line monolith or mono-repo.

Conclusion

I dropped $100 on a new year of GitHub Copilot earlier this week, but I’ll continue using StarCoder2-15b + ollama + Continue for the foreseeable future, and see how it stacks up.

This post was inspired by a video which uses Dolphin Mixtral + LMStudio + Continue instead.

OCR for PDF using Google Vision.

Thu, 13 Sep 2018 05:22:42 +0000

Here we show how to get Google Vision to run OCR on a PDF in Google Storage using NodeJS.

Firstly, install the SDK.

npm install --save @google-cloud/vision

Here’s the code. It will grab the PDF (or TIFF - the other supported format) from the location gcsSourceUri and once completed, put the OCR JSON in the location gcsDestinationUri.

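const vision = require('@google-cloud/vision')

// Create the client once; it picks up credentials from the environment
const client = new vision.ImageAnnotatorClient()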

function ocr() {
  const gcsSourceUri = `gs://your-bucket-name/path/to/the.pdf`
  const gcsDestinationUri = `gs://your-bucket-name/path/to/ocr.json`

  const inputConfig = {
    // Supported mime_types are: 'application/pdf' and 'image/tiff'
    mimeType: 'application/pdf',
    gcsSource: {
      uri: gcsSourceUri
    }
  }
  const outputConfig = {
    gcsDestination: {
      uri: gcsDestinationUri
    }
  }
  const features = [{ type: 'DOCUMENT_TEXT_DETECTION' }]
  const request = {
    requests: [
      {
        inputConfig: inputConfig,
        features: features,
        outputConfig: outputConfig
      }
    ]
  }

  return client.asyncBatchAnnotateFiles(request)
    .then(([operation]) => operation.promise())
    .then(([filesResponse]) => {
      return filesResponse
    })
}

The response filesResponse contains some more details about the completed operation, but you can now find your OCR result in Google Storage at the defined location.
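Once the OCR JSON has been downloaded from Google Storage, the recognised text can be pulled out with a few lines. The sketch below is in Python rather than NodeJS, and it assumes the output file has a top-level responses list with one fullTextAnnotation.text entry per page; the function name extract_text and the sample data are made up for illustration.

```python
import json


def extract_text(ocr_json):
    """Join the full text of every page found in a Vision OCR output file."""
    doc = json.loads(ocr_json)
    pages = []
    for response in doc.get("responses", []):
        # Pages without any detected text may lack the annotation entirely.
        annotation = response.get("fullTextAnnotation", {})
        pages.append(annotation.get("text", ""))
    return "\n".join(pages)


# Hand-written sample in the assumed layout, not real API output.
sample = json.dumps({
    "responses": [
        {"fullTextAnnotation": {"text": "Page one text"}},
        {"fullTextAnnotation": {"text": "Page two text"}},
    ]
})
print(extract_text(sample))
```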