The Manual Tracking Reflex—a Case Study of a Highly Skilled Quake Player
So far, we’ve been talking about a particular type of reaction: something suddenly happens in the environment, and we respond to it with a movement. An enemy pops out from behind a corner, and we press the fire button. A loud sound is played, and we explode off the starting block. But in many situations, we instead respond continuously to real-time changes in the environment.
Imagine there is an object hovering in your room, moving around in a complex pattern, and your task is to point at it and track it with your finger. It may change speed or direction, and you have to keep your finger aimed at it as best you can. This task is very similar to what happens in many gaming situations, and people who excel at it are real forces to be reckoned with. The lightning gun (known as the “LG”) in Quake Live is a classic example.
This weapon fires a “beam of electricity” at a rate of 20 “cells” per second, with each cell dealing a fixed amount of damage. It’s a hitscan weapon, which means that, unlike projectile weapons, each shot reaches its target instantaneously. So as long as you have the crosshair aimed at a target, that target will receive damage the moment you press fire (ignoring the vagaries of internet latency and netcode; see this video for an excellent introduction to this topic, presented by two Overwatch developers). In Quake, people who are good with the LG can be particularly devastating. Here’s a nice demonstration of how powerful a good LG can be.
One of the interesting things about using the LG (or other similar weapons) is that when you are “in the zone”, it feels almost as if your hand is moving by itself to control the mouse. It’s a fascinating and somewhat strange experience, and one shared by a number of skilled players I’ve talked to about it. There’s a feeling of effortlessness, and a distinct sensation of observing the mouse moving rather than of consciously controlling it, very similar to descriptions of the ideomotor phenomenon.
Recently, I started wondering whether this experience points to a highly efficient closed-loop control circuit in the brain, one that bypasses the cortical processing normally associated with conscious perception. This would certainly explain both the sensation of this experience and the incredibly fast reactions required for this level of performance. So I decided to run an experiment with a friend who goes by the handle ‘kukkii’, and who has excellent tracking aim.
I had him stand still and fire the LG at me while I dodged left and right. As he was standing still, the only way he could track me was by using the mouse, rather than by using the keyboard to move left and right to help track me. I did my best to make my dodge patterns as unpredictable as possible. Here is footage of a frag at full speed from his point of view (I’m the green guy dodging left and right, and the beep sounds indicate that a cell has hit me).
Here is that same frag slowed down:
The red vertical line indicates the enemy position, and the white vertical line indicates a landmark point in the environment (I extracted each frame of the video and then did some image processing to obtain these lines for each frame). These two lines allow me to find, for each frame, the position of the crosshair relative to this landmark (the crosshair is always in the middle of the frame), and the position of the dodging enemy relative to the same landmark. This in turn lets me compare the position of the dodging enemy and the position of the crosshair across time.
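In code, that bookkeeping is simple. Here’s a minimal sketch in Python, assuming the image processing has already produced per-frame x-coordinates for the two lines (the file names and frame width here are placeholders):

```python
import numpy as np

# Hypothetical per-frame x-coordinates (in pixels) of the detected lines
enemy_x = np.load("enemy_line_x.npy")        # red line: the dodging enemy
landmark_x = np.load("landmark_line_x.npy")  # white line: fixed point in the map

frame_width = 1920             # assumed capture resolution
crosshair_x = frame_width / 2  # the crosshair sits at the center of every frame

# Expressing both positions relative to the landmark removes the effect
# of the camera rotating as the player tracks the target
enemy_pos = enemy_x - landmark_x
crosshair_pos = crosshair_x - landmark_x
```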
This is what that position information looks like:
These plots show the position (in pixels of the video frames) of the enemy and crosshair over time. Looking at the red curve, you can see that the enemy changes direction seven times. And if you look carefully at the peaks of these curves, you can see that the blue curve lags the red curve by a small offset. The amount of lag here is directly related to the tracking reaction time. If the player were able to lock on to the enemy instantaneously, so that the crosshair changed direction at exactly the same moment the enemy did, there would be no offset. By measuring this offset, we can actually measure the tracking reaction time.
But before we do that, there are a couple of things we need to do to the data. First, we need to filter it to remove high-frequency noise. The image processing used to extract the position information isn’t perfect to begin with (if you look carefully at the slow-motion video, the red and white vertical lines are a bit shaky). Human movement is also not perfectly smooth, and we want to discount information that isn’t relevant to the question we’re asking. Filtering out this “noise” gives us a cleaner signal to work with.
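Here’s roughly what that filtering step might look like, as a sketch (the 10 Hz cutoff is my assumption; the important detail is the zero-phase filter, since an ordinary causal filter would add its own delay to the very offset we’re trying to measure):

```python
from scipy.signal import butter, filtfilt

def lowpass(signal, cutoff_hz, fps, order=4):
    """Zero-phase low-pass filter: filtfilt runs the filter forwards and
    backwards, so the smoothed signal isn't shifted in time."""
    b, a = butter(order, cutoff_hz, btype="low", fs=fps)
    return filtfilt(b, a, signal)

fps = 120  # capture frame rate
enemy_smooth = lowpass(enemy_pos, cutoff_hz=10, fps=fps)
crosshair_smooth = lowpass(crosshair_pos, cutoff_hz=10, fps=fps)
```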
Second, there’s a problem with using position information as a basis for extracting reaction time. In Quake Live, enemies do not change direction instantaneously. Instead, the game engine incorporates inertia, so that when you press a key to reverse direction, you first decelerate over a period of time. If someone is tracking a target, they might be able to notice this deceleration before the target actually changes direction. And this would provide an advance warning, or cue, that could be exploited.
For example, if it takes 200 ms from the moment a target starts to decelerate to the moment the target switches direction, and it takes a player 200 ms to react to a cue, then that player would be able to change crosshair direction at exactly the same time the enemy switched direction. And in such a case, the peaks of the position waveforms would line up perfectly. A solution to this problem is to instead look at the acceleration waveforms.
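Numerically, getting acceleration is just a matter of differentiating the position twice. Continuing the sketch from above:

```python
import numpy as np

dt = 1.0 / fps  # seconds per captured frame

# position -> velocity -> acceleration (pixels/s²); np.gradient uses
# central differences, which amplify noise less than simple differencing
enemy_acc = np.gradient(np.gradient(enemy_smooth, dt), dt)
crosshair_acc = np.gradient(np.gradient(crosshair_smooth, dt), dt)
```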
Here is the same data, but now filtered and transformed into acceleration (in pixels/s²):
Now we just need to figure out a way to measure the offset between these two curves. One way to do it would be to draw a vertical line at each peak and measure the time between corresponding peaks. It’s unlikely that the offset would be identical for every pair of peaks, but we could take an average. A more elegant solution that achieves something very similar is to measure what’s called the cross correlation.
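For what it’s worth, a rough sketch of that peak-to-peak approach might look like this (it assumes every peak in one trace pairs up, in order, with a peak in the other, which is exactly why it’s fragile):

```python
import numpy as np
from scipy.signal import find_peaks

# Direction changes show up as spikes in |acceleration|; the height
# threshold (30% of the maximum) is an arbitrary choice to skip noise
t_peaks, _ = find_peaks(np.abs(enemy_acc), height=0.3 * np.abs(enemy_acc).max())
c_peaks, _ = find_peaks(np.abs(crosshair_acc), height=0.3 * np.abs(crosshair_acc).max())

n = min(len(t_peaks), len(c_peaks))
lag_ms = np.mean(c_peaks[:n] - t_peaks[:n]) * 1000.0 / fps
```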
Cross correlation is a very cool technique that measures the correlation between two signals as a function of the offset between them. So in the above example, if you leave the curves as they are (offset = 0) and measure the correlation between them, you’ll get a particular value. If you then “slide” the red curve to the right by one unit (offset = 1) while keeping the blue curve in place, and measure the correlation again, you’ll get another value. You keep doing this, sliding one curve over the other and measuring the correlation at each offset.
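In code, the whole sliding-and-correlating procedure can be expressed in a few lines (a sketch, using the acceleration traces and frame rate from the earlier snippets; a positive lag means the crosshair trails the enemy):

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

# Correlation of the two traces at every possible offset
xcorr = correlate(crosshair_acc, enemy_acc, mode="full")
lags = correlation_lags(len(crosshair_acc), len(enemy_acc), mode="full")

# The offset with the highest correlation is our estimate of the lag
best_lag = lags[np.argmax(xcorr)]
reaction_time_ms = best_lag * 1000.0 / fps
print(f"estimated tracking reaction time: {reaction_time_ms:.1f} ms")
```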
Here’s what the result of such a procedure looks like when applied to the acceleration curves:
This shows the correlation at each of those offsets we talked about. The dashed vertical line shows the offset that produced the peak correlation. This tells us how much we had to slide one curve over the other to achieve the best match, and for our purposes it’s a great way of determining the reaction time of the player. For this frag, the offset was 112.5 ms, which is a very fast reaction time. (Note: cross correlation is also the technique I used to detect the enemy and landmark positions in the image-processed video, except there it was a two-dimensional cross correlation, where the “signal” was slid over the image to find the best match.)
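To give a flavor of that two-dimensional version, here’s how the same idea looks using OpenCV’s template matching (the file names are placeholders, and this is just an illustration of the technique, not my exact pipeline):

```python
import cv2

frame = cv2.imread("frame_0001.png")         # one extracted video frame
template = cv2.imread("enemy_template.png")  # small cropped image of the enemy

# Slide the template over the frame, scoring the (normalized) match at
# every position; the best-scoring location is the detection
scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

enemy_x = best_loc[0] + template.shape[1] // 2  # x-coordinate of the match's center
```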
Here’s another example. First the full speed footage:
And now the image processed footage:
Here is the data from this frag:
The reaction time for this frag is 87.5 ms!
Using this technique, I analyzed four separate frags from kukkii, and got the following reaction times:
- 112.5 ms
- 116.6 ms
- 87.5 ms
- 87.5 ms
Keep in mind that each of these values represents something of an average to begin with, since there were multiple individual reactions during each frag (multiple peaks in the plotted curves).
Now, there are a few limitations to this approach. First, it’s possible that kukkii was using pattern recognition rather than pure reflex to guide his aim. I was doing my best to generate unpredictable, random strafe patterns while dodging, and this is something I’m generally quite good at, but a proper scientific test would require that the target movement be controlled by a random number generator rather than a human. To this end, I have some interesting ideas about how to achieve this in the future.
Second, it would be much cleaner to extract position data directly from the game demo file, rather than using image processing (this can potentially be achieved using UberDemoTools). For one, a lot of that high-frequency noise would be eliminated (the jitter in the white vertical line, for example). Also, using the position of a single vertical line in image-frame coordinates to estimate actual position in game-world coordinates has limitations.
For example, if a player is moving towards the left in a straight line at a constant velocity, the projection of this player onto the image is not linear. As the player approaches the edge of the frame, the projection will appear to move slower and slower to the left, even though the player is moving at a constant velocity. This is because objects that are further away from the “camera” appear smaller (we take it for granted that distant objects look smaller, but this is really just a consequence of perspective: the further away an object is, the smaller the visual angle it subtends).
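To make that concrete, here’s a toy calculation (the distances are made up): a player strafing in equal steps subtends smaller and smaller changes in viewing angle the further off-axis they get.

```python
import numpy as np

z = 500.0                    # perpendicular distance to the strafe line
x = np.linspace(0, 1000, 6)  # six equally spaced positions to one side

theta = np.degrees(np.arctan2(x, z))  # viewing angle to each position
print(np.round(np.diff(theta), 1))
# [21.8 16.9 11.5  7.8  5.4] -- equal movement, shrinking apparent steps
```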
Because of this, there is a compression of distance as things move further away from us, whether they’re moving to our left and right or directly away from us. This problem was reduced by using a relatively narrow field of view (FOV) when doing the image processing (the in-game footage is zoomed in), but it is something worth mentioning. Finally, the measurements are limited by the tickrate of the game server (40 Hz), the frame rate and refresh rate that kukkii was running Quake Live at (250 fps @ 144 Hz), and the frame rate used to capture the action (120 fps).
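For a sense of scale, here’s what those rates translate to in time per sample:

```python
tick_ms    = 1000 / 40   # 25.0 ms between server updates
render_ms  = 1000 / 250  #  4.0 ms between rendered frames
refresh_ms = 1000 / 144  # ~6.9 ms between displayed frames
capture_ms = 1000 / 120  # ~8.3 ms between captured frames

# The capture rate alone quantizes the measured lag to steps of ~8.3 ms
# (unless the correlation peak is interpolated), and the 25 ms server
# tick bounds how faithfully any of this reflects the actual game state.
```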
If it turns out that kukkii, and others like him, are genuinely able to achieve reaction times of ~90 ms when tracking targets, then this could change our understanding of the limits of human performance (and if this finding does indeed survive more rigorous scrutiny, then I propose the name “manual tracking reflex”).
Sound may also play a role here. When the beam is hitting the target, a series of beeps is heard. Perhaps the sudden absence of these beeps when the enemy dodges away from the beam is an auditory cue that can be used to enhance reaction time. Testing with and without sound would be important to more fully understand this phenomenon.

There is one more piece of evidence in favor of the manual tracking reflex being something distinct from a simple reaction task: when I tested kukkii on a simple detection task, his average reaction time was about 220 ms, which is a bit better than average, but certainly not spectacular. This marked difference in performance between the two tasks is suggestive of two distinct underlying neural mechanisms.
Ok, so what does any of this have to do with input lag? Even if we assume that the fastest human reaction times are around 85 ms, that’s still at least an order of magnitude greater than the input lag of many displays, so does display latency even matter? Well, here’s a great video from Microsoft that shows how noticeable 10 ms of input lag is. Also, consider this observation from the human benchmark website:
It’s interesting to see that the recorded reaction times have actually gotten slightly slower over the years, which is almost certainly due to changes in input / display technology.
(source)
What they’re referring to here is the change from CRTs (cathode ray tubes) to LCDs (liquid crystal displays). CRTs have incredibly low latency, essentially limited by the speed of the electron beam. Prad.de measured the time between the moment a CRT receives information from the VGA cable and the moment a photodiode detected light from the phosphors as just 670 nanoseconds!
There are 1000 nanoseconds in a microsecond, and 1000 microseconds in a millisecond, so 670 nanoseconds is about two thirds of a thousandth of a millisecond! CRTs are rarely used these days, although a few people, myself included, enjoy them tremendously. Modern LCDs are much faster than their predecessors, and a good gaming LCD will have less than 5 ms of display latency. But to return to the question we posed earlier, does it really matter if one display is 5 ms faster than another? To help answer this question, I’ve run two separate simulations to see how display latency affects in-game performance…