Pathfinding in Video Games
Pathfinding naturally arises in a variety of scenarios, whether it be navigating a labyrinth or determining the best route to get across campus. This is a relatively trivial problem for the human brain to tackle. For a computer, however, it is not so simple to find the best path between two points. In video games, for example, an enemy must find a path that avoids obstacles while pursuing the player. Algorithms for this problem have existed for decades, but they can be slow on large maps; the emergence of neural networks offers a way to make the search far more efficient.
The path planning problem
A* search is a popular search-based planning algorithm that is guaranteed to identify a path between two points, as long as one exists. Neural A* search improves upon the classic A* search algorithm through the use of a convolutional neural network, a common tool for visual pattern recognition. A fully-convolutional encoder is used to transform the problem into an initial guidance map with guidance costs for each move. The standard A* search is adapted to produce a differentiable A* module which allows back-propagation to train the model. By the end of training, Neural A* is able to detect visual cues like dead ends and shortcuts to better find the shortest path.
In the path planning problem, the environment that must be navigated can be represented as a 2D binary map \(X\), where each entry is \(1\) if the location is passable or \(0\) if it is obstructed. The start and goal locations \(V_s\) and \(V_g\) are matrices of the same size as \(X\), but every value is \(0\) except at the entry representing the respective start or goal position, where the value is \(1\). A matrix with \(0\)s everywhere except at a single entry is called a one-hot matrix. Lastly, the open list \(O\) tracks the nodes that are currently available to be explored, and the closed list \(C\) tracks the nodes that have already been explored.
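As a concrete illustration, a small problem instance could be set up as follows. This is a minimal sketch in Python with NumPy; the map contents and the start and goal positions are invented for the example.

```python
import numpy as np

# 1 = passable cell, 0 = obstacle (values invented for illustration)
X = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1],
], dtype=float)

def one_hot(shape, pos):
    """One-hot matrix: zeros everywhere except a 1 at `pos`."""
    m = np.zeros(shape)
    m[pos] = 1.0
    return m

V_s = one_hot(X.shape, (0, 0))   # start location
V_g = one_hot(X.shape, (3, 4))   # goal location

O = V_s.copy()        # open list: nodes currently available to explore
C = np.zeros_like(X)  # closed list: nodes already explored
```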
Choosing the next move
The algorithm works by choosing the most promising candidate that is reachable from the current location, using the following equation:
$$V^* = \mathcal{I}_\max \left(\frac{\exp\left(-\frac{G + H}{\tau}\right) \odot O}{\left\langle \exp\left(-\frac{G + H}{\tau}\right), O\right\rangle}\right)$$
In this equation:
- The guidance cost \(G\) accumulates the cost of reaching each node from the start, using the per-step costs in the guidance map
- The heuristic function \(H\) gives the Chebyshev distance (the maximum of the distances in the \(x\)- and \(y\)-directions) from each node to the goal, plus a scaled-down Euclidean distance (straight-line distance) for tie-breaking
- The temperature \(\tau\) is empirically found to be the square root of the map width
These values are combined in an exponential and applied to the open list \(O\) in an element-wise multiplication, then normalized by the inner product (the sum of all the element-wise multiplications). \(\mathcal{I}_\max\) finds the maximum value within the resulting matrix and represents it with a one-hot matrix that contains the location of the most promising node to explore.
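A minimal NumPy sketch of this selection step, assuming the heuristic described above (Chebyshev distance to the goal plus a scaled-down Euclidean tie-breaker); the function names and the `eps` scaling factor are illustrative rather than taken from the original implementation:

```python
import numpy as np

def heuristic(goal_pos, shape, eps=0.001):
    """H: Chebyshev distance from every cell to the goal, plus a small
    scaled-down Euclidean distance to break ties."""
    ys, xs = np.indices(shape)
    dy, dx = np.abs(ys - goal_pos[0]), np.abs(xs - goal_pos[1])
    return np.maximum(dy, dx) + eps * np.sqrt(dy**2 + dx**2)

def select_node(G, H, O, tau):
    """Pick the most promising open node and return it as a one-hot matrix V*."""
    scores = np.exp(-(G + H) / tau) * O        # exp(-(G + H)/tau), masked by the open list
    scores = scores / scores.sum()             # normalize by the inner product <exp(...), O>
    V_star = np.zeros_like(scores)
    V_star[np.unravel_index(np.argmax(scores), scores.shape)] = 1.0  # I_max as a one-hot matrix
    return V_star
```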
Updating values
The selected node is removed from the open list and added to the closed list, effectively marking it as “explored.” The set of passable, unexplored neighbouring nodes is then found with the following matrix operations:
$$V_{nbr} = (V^* * K) \odot X \odot (\mathbb{1} - O) \odot (\mathbb{1} - C)$$
where the convolution
$$V^* * K := V^* * \begin{bmatrix}1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1\end{bmatrix}$$
places a \(1\) at each of the eight cells surrounding the selected node, marking its candidate neighbours. The element-wise multiplication
$$X \odot (\mathbb{1} - O) \odot (\mathbb{1} - C)$$
masks the map and removes all nodes that are in either the open or closed list. Multiplying the convolution and mask element-wise produces a map of all the newly-found neighbours that can be explored. Each node in \(V_{nbr}\) is added to the open list \(O\). There is also
$$\bar V_{nbr} = (V^* * K) \odot X \odot O \odot (\mathbb{1} - C)$$
which indicates the neighbours that are already in the open list.
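This expansion step can be sketched with a 2D convolution; here `scipy.signal.convolve2d` stands in for \(V^* * K\), and the function name is an illustrative choice rather than anything from the original implementation:

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 neighbourhood kernel K with a zero centre
K = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)

def expand_neighbours(V_star, X, O, C):
    """Return the newly opened neighbours V_nbr and the already-open neighbours V_nbr_bar."""
    nbr = convolve2d(V_star, K, mode="same")   # V* * K: marks the eight cells around the selected node
    V_nbr = nbr * X * (1 - O) * (1 - C)        # passable, not yet open, not yet closed
    V_nbr_bar = nbr * X * O * (1 - C)          # passable neighbours already on the open list
    return V_nbr, V_nbr_bar
```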
From here, the guidance cost \(G\) must be updated based on the selected node. The tentative cost \(G'\) for neighbouring nodes is defined as
$$G' := \langle G, V^*\rangle \cdot \mathbb{1} + \Phi$$
which is the accumulated guidance cost at the selected node, broadcast across the map, plus the guidance map \(\Phi\) of per-step costs produced by the encoder, i.e. the cost of taking one more step towards each neighbour. The guidance cost \(G\) is then updated to
$$G' \odot V_{nbr} + \min(G', G) \odot \bar V_{nbr} + G \odot (\mathbb{1} - V_{nbr} - \bar V_{nbr})$$
The newly opened neighbours \(V_{nbr}\) are assigned their corresponding values from \(G'\), while the neighbours \(\bar V_{nbr}\) that are already in the open list keep the lower of the two guidance costs at each location. These two terms replace the values at their locations in the original guidance cost; every other location is left unchanged.
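Putting the update together, a sketch might look like this, where `Phi` is the guidance map of per-step costs produced by the encoder and the function name is illustrative:

```python
import numpy as np

def update_guidance_cost(G, Phi, V_star, V_nbr, V_nbr_bar):
    """Update the accumulated guidance cost after selecting node V*."""
    g_selected = np.sum(G * V_star)                  # <G, V*>: accumulated cost at the selected node
    G_prime = g_selected * np.ones_like(G) + Phi     # cost of one more step towards each neighbour
    return (G_prime * V_nbr                          # new neighbours take the tentative cost
            + np.minimum(G_prime, G) * V_nbr_bar     # already-open neighbours keep the cheaper cost
            + G * (1 - V_nbr - V_nbr_bar))           # all other locations are left unchanged
```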
Reaching the goal
The algorithm repeatedly selects the most promising node to explore, following the previous steps, until the goal has been reached, i.e. once the goal is in the closed list \(C\). The closed list contains every node explored during the search and thus records the search history. To train the model, this history \(C\) must be compared with the ground-truth path \(\bar P\), a path verified to be the shortest by an optimal planner such as Dijkstra's algorithm. The accuracy of \(C\) is quantified by the loss function
$$\mathcal{L} = \frac{\|C - \bar P\|}{|\mathcal{V}|}$$
which simply takes the difference between the path found by the algorithm and the ground-truth path, normalized by the number of locations \(|\mathcal{V}|\) in the map. This function penalizes both false positives and false negatives in the path.
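Written out directly, the loss is a mean absolute difference between the two maps; a minimal sketch, assuming \(C\) and \(\bar P\) are arrays of the same shape:

```python
import numpy as np

def path_loss(C, P_bar):
    """Mean absolute difference between the search result C and the ground-truth path P_bar."""
    return np.abs(C - P_bar).sum() / C.size   # normalized by the number of map locations |V|
```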
Training the model
The encoder is responsible for transforming the actual environment of an instance of the path planning problem into the guidance map, with guidance costs \(G\), upon which the differentiable A* module works. However, the encoder must be trained to properly identify visual cues and generate an effective guidance map.
Training uses gradient descent, which relies on back-propagation to work out how each weight should change in order to reduce the loss. The loss is back-propagated through every step of the search algorithm and into the encoder. To stabilize training, certain matrices are detached from the back-propagation chain so that their gradients are effectively disabled. This process allows the encoder to learn visual cues, like the shape of dead ends, that enable accurate and efficient path planning.
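A rough training-loop sketch in PyTorch; the encoder architecture, the `differentiable_astar` callable, and all names here are assumptions standing in for the components described above, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Stand-in fully-convolutional encoder: map + start + goal channels in, guidance map out
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def train_step(maps, starts, goals, gt_paths, differentiable_astar):
    """One gradient-descent step: encode a guidance map, run the search, back-propagate the loss."""
    inputs = torch.cat([maps, starts, goals], dim=1)               # stack X, V_s, V_g as channels
    guidance = encoder(inputs)                                     # guidance map Phi
    history = differentiable_astar(guidance, maps, starts, goals)  # closed list C from the search
    loss = torch.mean(torch.abs(history - gt_paths))               # |C - P_bar| / |V|
    optimizer.zero_grad()
    loss.backward()                                                # back-propagate through every search step
    optimizer.step()
    return loss.item()
```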
Extensions
The introduction of convolutional neural networks to A* search greatly improved its efficiency and effectiveness. Neural A* can recognize visual cues to identify the shortest path between two points while avoiding much of the exhaustive node expansion of the original A* search. Beyond the binary maps typical of video games, Neural A* can be adapted to work with raw images from the real world, extending its use to autonomous vehicle navigation and even robotic arms. Neural A* is a significant departure from previous path planning algorithms, with a wide range of applications both in video games and in real life.