ML-Agents Platformer: Simple Coin Collector

single_coin_collector.png

Intro

In this tutorial, you’ll learn how to create a very simple 3D Platformer ML-Agent in Unity. This is inspired by games like Super Mario 64 and Super Mario 3D World where you explore the world, jumping on platforms to get coins. To keep things really simple, this tutorial will only have one platform with a coin on it. In future tutorials, we’ll expand to multiple platforms/coins.

Prerequisites

In order to keep this tutorial from being extremely long, I’m going to use a couple of other free tutorials I’ve created as a foundation: “Penguins” and “Simple Character Controller”. If you’ve worked with ML-Agents before and understand how to train a basic agent, you can skip the Penguin tutorial. The Simple Character Controller tutorial is core to this project and will teach you how to make the 3D Platformer character that we use here.

Check out the ML-Agents Penguin tutorial here: Reinforcement Learning Penguins (Part 1/4) | Unity ML-Agents — Immersive Limit

Check out the Simple Character Controller tutorial here: Simple Character Controller for Unity — Immersive Limit

Create a Basic Challenge Area

  • Create a new scene in Unity

  • Make a new Empty GameObject that will act as a parent for other objects in your challenge area. Call it “CoinChallengeArea”.

  • Add a flat surface that has a collider on it. You could use a Plane, or you could model your own in a 3D program. I made a desert terrain in Blender.

  • Create a box that the agent can jump on top of and call it “Platform”. Mine is 0.5 m tall, 1 m wide, 1 m deep.

  • Create a coin by either using a Cylinder object in Unity or importing a 3D model, and place it on top of the Platform. Call it “Coin”.

  • Make the Coin a child of the Platform so that it moves along with the Platform.

  • Remove any existing colliders on the Coin and put a Sphere Collider on it. Set IsTrigger to True. Mine has a radius of 0.2 m.

platform_with_coin.png

I also created a small “Collectible” script attached to my coin that rotates it. This is optional, but it looks cool.

using UnityEngine;

public class Collectible : MonoBehaviour
{
    void Update()
    {
        // Spin the coin around the world Y axis at 60 degrees per second
        transform.Rotate(Vector3.up, 60f * Time.deltaTime, Space.World);
    }
}

Create The Agent

  • Follow the tutorial here to make a simple character and add it as a child of the CoinChallengeArea: Simple Character Controller for Unity — Immersive Limit

  • You should not attach the InputController script (which that tutorial teaches you to make) because our agent class will handle input instead.

  • Make sure Allow Jump is enabled.

simple_character_with_jump.png

The Character is going to represent our Agent. It’s similar to creating a normal playable character in a game, but this one will be controlled by machine learning (hence the name ML-Agents).

SimpleCollectorAgent.cs

Now that you have a character, we can write a new script that will control it.

  • Create a new C# script called “SimpleCollectorAgent”

  • Remove the existing code and replace it with the following

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class SimpleCollectorAgent : Agent
{
}

Note that we’re inheriting from the “Agent” class, which is part of the Unity.MLAgents package. This contains a ton of functionality related to AI training and decision making. We’re going to build on top of that and override some functionality in this class.

The remaining code will go between the curly brackets { }.
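Before filling everything in, here’s the overall shape the finished class will take. Every method shown here comes straight from the steps below; the bodies are left empty for now, with a comment noting what each one will do.

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class SimpleCollectorAgent : Agent
{
    // Fields: the platform reference, start position, character controller, and rigidbody

    public override void Initialize() { }                              // cache component references
    public override void OnEpisodeBegin() { }                          // reset the agent and move the platform
    public override void Heuristic(in ActionBuffers actionsOut) { }    // keyboard control for testing
    public override void OnActionReceived(ActionBuffers actions) { }   // apply actions and punish straying
    private void OnTriggerEnter(Collider other) { }                    // reward the agent for touching the coin
}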

Start by adding some variables at the top of the class.

  • platform - will give us access to the platform the coin will sit on and allow us to randomize the position

  • startPosition - we’ll keep track of the position of the agent when play starts, so that we can reset it there

  • characterController - a reference to the SimpleCharacterController script attached to the character

  • rigidbody - a reference to the rigidbody for physics interactions

    [Tooltip("The platform to be moved around")]
    public GameObject platform;

    private Vector3 startPosition;
    private SimpleCharacterController characterController;
    new private Rigidbody rigidbody;

Add an override function called Initialize(). This is called automatically by the Agent class and is intended for us to set up our agent.

    /// <summary>
    /// Called once when the agent is first initialized
    /// </summary>
    public override void Initialize()
    {
        startPosition = transform.position;
        characterController = GetComponent<SimpleCharacterController>();
        rigidbody = GetComponent<Rigidbody>();
    }

Now add an override function called OnEpisodeBegin(). Training happens in episodes. In this project, the episode ends when time runs out or when the coin is touched, so any reset logic needs to happen here. We reset the agent’s position, turn it in a random direction, and reset its velocity. Resetting the velocity is good practice with ML-Agents, just in case the agent finds a way to fall through the floor.

    /// <summary>
    /// Called every time an episode begins. This is where we reset the challenge.
    /// </summary>
    public override void OnEpisodeBegin()
    {
        // Reset agent position, rotation
        transform.position = startPosition;
        transform.rotation = Quaternion.Euler(Vector3.up * Random.Range(0f, 360f));
        rigidbody.velocity = Vector3.zero;

        // Reset platform position (5 meters away from the agent in a random direction)
        platform.transform.position = startPosition + Quaternion.Euler(Vector3.up * Random.Range(0f, 360f)) * Vector3.forward * 5f;
    }

Next we need a Heuristic() function, which allows us to feed actions into the agent ourselves. By default, the neural network controls the actions, but in our case we’d like to be able to control the agent with the keyboard for testing purposes. It’s always a good idea to test your challenge manually to make sure it works the way you expect.

    /// <summary>
    /// Controls the agent with human input
    /// </summary>
    /// <param name="actionsOut">The actions parsed from keyboard input</param>
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Read input values and round them. GetAxisRaw works better in this case
        // because of the DecisionRequester, which only gets new decisions periodically.
        int vertical = Mathf.RoundToInt(Input.GetAxisRaw("Vertical"));
        int horizontal = Mathf.RoundToInt(Input.GetAxisRaw("Horizontal"));
        bool jump = Input.GetKey(KeyCode.Space);

        // Convert the actions to Discrete choices (0, 1, 2)
        ActionSegment<int> actions = actionsOut.DiscreteActions;
        actions[0] = vertical >= 0 ? vertical : 2;
        actions[1] = horizontal >= 0 ? horizontal : 2;
        actions[2] = jump ? 1 : 0;
    }

Now we’ll add an OnActionReceived() function that will process actions (whether from a human player or the neural network).

    /// <summary>
    /// React to actions coming from either the neural net or human input
    /// </summary>
    /// <param name="actions">The actions received</param>
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Punish and end episode if the agent strays too far
        if (Vector3.Distance(startPosition, transform.position) > 10f)
        {
            AddReward(-1f);
            EndEpisode();
        }

        // Convert actions from Discrete (0, 1, 2) to expected input values (-1, 0, +1)
        // of the character controller
        float vertical = actions.DiscreteActions[0] <= 1 ? actions.DiscreteActions[0] : -1;
        float horizontal = actions.DiscreteActions[1] <= 1 ? actions.DiscreteActions[1] : -1;
        bool jump = actions.DiscreteActions[2] > 0;

        characterController.ForwardInput = vertical;
        characterController.TurnInput = horizontal;
        characterController.JumpInput = jump;
    }

Finally, we can add an OnTriggerEnter() function, which will be called by the Unity physics system any time our agent enters a trigger collider. Our collectible coin will have a trigger collider on it and will be tagged “collectible” (tags are case-sensitive, so keep that in mind when you create the tag later), so we can check for that and reward the agent for good behavior before ending the episode.

    /// <summary>
    /// Respond to entering a trigger collider
    /// </summary>
    /// <param name="other">The object (with trigger collider) that was touched</param>
    private void OnTriggerEnter(Collider other)
    {
        // If the other object is a collectible, reward and end episode
        if (other.tag == "collectible")
        {
            AddReward(1f);
            EndEpisode();
        }
    }

Character/Agent Setup

Now we’ll do some more setup of the agent. This part seems very complicated, but I promise with some practice it gets a lot easier to understand.

  • Attach the SimpleCollectorAgent script to the Character object (which should already have a Capsule Collider, Rigidbody, and SimpleCharacterController on it).

  • This will automatically add a Behavior Parameters script as well

For Behavior Parameters, set the following values:

  • Behavior Name: “SimpleCollector”

  • Vector Observation Space Size: 0

  • Stacked Vectors: 1

  • Continuous Actions: 0

  • Discrete Branches: 3 (one branch each for moving, turning, and jumping)

  • Branch 0 Size: 3 (no move, forward, backward)

  • Branch 1 Size: 3 (no turn, turn right, turn left)

  • Branch 2 Size: 2 (no jump, jump)

  • Model: Leave empty; we won’t have a trained model until after training completes

  • Everything else left as default

behavior_parameters.png

For the SimpleCollectorAgent component, set the following values:

  • Max Step: 4000

  • Platform: Drag the Platform object from the Hierarchy tab into this box to create a reference.

simple_collector_agent.png

Now we want to add a RayPerceptionSensor3D component. Think of this like LIDAR. It shoots out raycasts from the agent and informs the agent whether they hit something, what tag they hit, and how far away the hit was. We want our agent to “see” the Platform and the Coin, so we need to add tags to those objects and then we can configure the RayPerceptionSensor to look for them.

  • Create new tags called “collectible” and “platform”

add-tags.png
  • Assign the “collectible” tag to the Coin object

  • Assign the “platform” tag to the Platform object

Set up the following values on the RayPerceptionSensor3D component:

  • Set Detectable Tags Size to 2

  • Element 0: collectible

  • Element 1: platform

  • Rays Per Direction: 6

  • Sphere Cast Radius: 0.1

  • Ray Length: 20

  • Start Vertical Offset: 0.4

  • End Vertical Offset: 0.4

ray_perception_sensor.png

Finally, and this is really important, we need to add a Decision Requester component. The Decision Requester automatically tells our agent to request decisions and take actions at a regular interval. If this component isn’t attached, your agent won’t do anything.

  • Decision Period: 5 (the agent requests a new decision every 5 steps)

  • Take Actions Between Decisions: Enabled (the most recent actions are repeated on the steps in between)

decision_requester.png

Test the Agent

Make sure the agent is placed on the ground in your CoinChallengeArea. A good spot for it is 0, 0, 0. The Platform will automatically be placed 5 meters away from the agent in a random direction, so position your scene’s Main Camera somewhere that both the agent and the platform will be visible.

At this point, you should test out your agent to make sure it works. Because no Model is assigned yet, ML-Agents falls back to the Heuristic() function, so you can control agent movement with the W, A, S, and D keys and jump with the Spacebar. If you jump on the platform and touch the coin OR stray too far from the start point, the episode should end and reset. If your scene isn’t working correctly, you may need to make some adjustments.

Training

I’ve covered training in detail before, so I’m going to skip it here. The Penguin tutorial covers it in Reinforcement Learning Penguins (Part 4/4) | Unity ML-Agents — Immersive Limit.

Just like in the Penguin tutorial, you should duplicate your CoinChallengeArea multiple times (8 to 16 is probably good) to speed up training.
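Duplicating the area by hand in the Hierarchy works fine. If you’d rather script it, here’s a minimal, optional sketch that spawns copies in a row at runtime. It isn’t part of the original setup: the ChallengeAreaSpawner name, the areaPrefab field, and the assumption that you’ve turned your CoinChallengeArea into a prefab are all additions of mine.

using UnityEngine;

// Optional helper (not required by the tutorial): spawns duplicates of the
// challenge area in a row so each agent trains in its own copy.
// Attach this to an empty GameObject in the scene and assign the prefab.
public class ChallengeAreaSpawner : MonoBehaviour
{
    [Tooltip("Prefab made from the CoinChallengeArea object")]
    public GameObject areaPrefab;

    [Tooltip("Total number of areas to spawn")]
    public int count = 8;

    [Tooltip("Spacing between areas in meters (keep it well above the 10 m stray limit)")]
    public float spacing = 30f;

    private void Awake()
    {
        // Lay the copies out along the X axis so the areas can't interfere with each other
        for (int i = 0; i < count; i++)
        {
            Vector3 position = transform.position + Vector3.right * (spacing * i);
            Instantiate(areaPrefab, position, Quaternion.identity);
        }
    }
}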

Config YAML

You will need to create a SimpleCollector.yaml file for training configuration (external to Unity). There are lots of these in the ML-Agents GitHub repository if you need other examples.

Here’s what I used. Note that I didn’t really optimize this, so I’m sure it could be improved upon.

behaviors:
  SimpleCollector:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 20000000
    time_horizon: 128
    summary_freq: 20000
    threaded: true

Python Training Command

The Python training command should be something like this:

mlagents-learn "<path to config>\SimpleCollector.yaml" --run-id sc_01

Run this command from your Python/Anaconda environment, press Play in the Unity Editor when prompted, and let it train. When training finishes, the trained model (a .onnx file named after the behavior) will be in the results/sc_01 folder; drag it into the Model field of the Behavior Parameters component to watch the trained agent run on its own.

It took less than 10 minutes to train on my machine with 8 duplicates of the area.
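If you want to watch the reward curves while training runs, TensorBoard can read the statistics that mlagents-learn writes to its results folder. Assuming the default output location, the command looks something like this:

tensorboard --logdir results

Then open http://localhost:6006 in your browser.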

tensorboard.png
training.png

License

All code on this page is licensed under the MIT license.


MIT License

Copyright (c) 2021 Immersive Limit LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.