Tools: Why?

What’s faster? To build a tool and then perform a task or to attempt to perform that task without the tool?

What about debugging? Is it better to build a tool to help debug an issue or is it better to attempt to debug it with what you have?

What about performance? Is it better to build a tool to measure performance or to just go for what seems like it’s the slowest?

I have found that in nearly any non-trivial case, it is better to create tools. Let’s look at some examples.

Making the Tedious Easier: The Rock Wall

There was one day where I was working on putting an art asset into a level. It was a rock wall split into three art assets: the top, middle and bottom sections. In order to put this into the game, I was creating a separate entity for each section. Then, I’d place one large entity over the whole group that had a collider on it.

It was a complete mess. It was error prone because I had to perfectly line up all the pieces. It was time consuming because the artists always noticed the slightest misalignment and there were a ton of these walls in multiple levels.

Quick aside: The artists will always notice. Don’t be lazy. If you scale an image up, they’ll notice. If you don’t align something perfectly, they’ll notice. If you use an asset for anything but what it was made for, they’ll notice. Don’t mess with them. It’s literally their only job and if you put it in wrong, they look bad. They. Will. Notice.

Anyways, the point was it was taking forever and it was mentally taxing. Could I have done all of it that way? Sure, but is that a good use of my time? I’m supposed to be putting the level together. Should 30% of my time be aligning these stupid rocks? What if I want to test something, but I have to move the rocks in order to test it? There’s a pretty decent chance I’ll just not because it’s too much work.

Be very keenly aware of that feeling. If you ever think “Eh, it’s too much work to try that”, look into your team’s processes. That’s generally a problem. You should be experimenting as much as you can.

In the end, my teammate next to me saw my misery and offered to make a tool to help put them in. He’d use a graphics trick to tile the rocks at exactly the right size and even have the collider on the same entity. I didn’t want to bother him, so I said it was fine and continued on with my work.

That lasted about 3 more minutes. About 20 minutes later he finished the tool. The level was completed the same night.

Identifying Heisenbugs: The Physics Bug

For the rest of your life, be prepared to hate physics in video game development. They cause all the problems. They have a bajillion tiny edge cases. Bugs will happen once and disappear off the face of the Earth until submission. The custom APIs out there are written and thrown into a code obfuscator. The debugging tools for them are an ASCII middle finger. They’re often one of the slowest parts of the game. They involve all kinds of algorithms that you look at and just nope the hell out of. It’s a nightmare.

Well, guess what. Someone, someday, is going to come up to you and say these words:

“There’s a bug in physics.” ~Someone whose problem it isn’t

Story time.

Our junior year game was a voxel-based engine. Think Minecraft but with guns/parkour and it doesn’t look like blocks. Well, we had to create a bunch of colliders for the world as it was being destroyed. This involved threads doing all kinds of stuff, colliders being swapped out, Bullet physics (the API) being notified, etc. The bug was stated as such:

Eventually, the game just crashes. It’s always in physics. No one knows why.

Sweet. On it boss.

You know a bug sucks when the description has the introductory clause “eventually”.

After a few weeks of being annoyed by this bug, I decided to take a crack at it. The stack trace was absolutely useless. There was no hint as to what entity it was representing.

What was worse was that it happened at random times. Sometimes it would happen after a few minutes, sometimes after a few seconds. Bigger changes tended to cause it to happen faster, but not always! Sometimes the game would still last multiple minutes. Sometimes it would happen if you wall climbed. Sometimes if you just fired the gun. Nothing was consistent.

Do yourself a favor. Try to recognize that scenario in your own project. When the problem seems random, create ways to make it not. Break the problem down to the simplest possible case. Cause the crash and repeat it multiple times to be sure. Remember, computers aren’t random. You are handing them different inputs! Whether it’s because the OS is scheduling your thread off the CPU at a different time, a thread is beating another thread or you’re slightly changing the timing of your inputs, it’s different! If you can find ways to make your game deterministic, it will help solve any bug.

I took about three hours creating a system to mock inputs deterministically. I literally created my own rand() function so particle effects using rand() wouldn’t mess it up. It was partially built for stress testing and partially for debugging. I set the system to apply the same explosions in the world over and over again. I quickly found a deterministic set of explosions that would cause the game to crash.
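To make that concrete, here is a rough sketch of the idea, not the system we actually shipped. It assumes a tiny seeded generator (xorshift32 here, purely as an example) standing in for rand(), and a made-up `Explosion` record as the mocked input. `DeterministicRng`, `Explosion` and `makeScript` are all hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for rand(): same seed in, same sequence out.
// xorshift32 is just one simple choice for illustration.
struct DeterministicRng {
    std::uint32_t state;

    // A zero state would get stuck at zero, so fall back to 1.
    explicit DeterministicRng(std::uint32_t seed) : state(seed ? seed : 1u) {}

    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
};

// Hypothetical mocked input: one explosion applied to the world on a frame.
struct Explosion {
    int frame;
    std::uint32_t x, y, z;
    std::uint32_t radius;
};

// Builds the same list of explosions every time it is given the same seed.
std::vector<Explosion> makeScript(std::uint32_t seed, int count,
                                  std::uint32_t worldSize) {
    DeterministicRng rng(seed);
    std::vector<Explosion> script;
    script.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        Explosion e;
        e.frame = i * 10;               // one explosion every 10 frames
        e.x = rng.next() % worldSize;   // separate statements keep the
        e.y = rng.next() % worldSize;   // order of rng calls fixed
        e.z = rng.next() % worldSize;
        e.radius = 1u + rng.next() % 8u;
        script.push_back(e);
    }
    return script;
}
```

Once a seed crashes the game, that seed is the bug report. Anyone can replay it, and you can shrink `count` until you have the smallest script that still crashes.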

I poked the physics programmer, turned on collider drawing and let him get to work. Within 20 minutes, a bug plaguing the team for nearly a month was solved.

Imagine if we had this system from the start. This bug would never have lasted a day.

The more tools you have at your disposal, the faster problems can be identified and solved.

The Performance Boss Fight: 1000 Frames of Pain

The faster your game can run, the more wizardry you can invoke.

Want your game to look beautiful? Going to need more particle effects, shaders and lights. Want to add that new crazy meteor shower spell? Going to need to handle lots of entities/colliders in the space.

This is the fun part of performance testing. Setting a ridiculous bar that you don’t think you can pass, watching the game chug away at 1 second per frame, and having it hard lock your computer. The best is coming back after a major performance upgrade to the game and clowning on that test with triple digit FPS on a potato.

The input mocker from before came back multiple times. Our favorite was the Thousand Frames of Pain test. In our game, the worst possible case for performance is if the world is a checkerboard of voxels. It would result in the maximum number of collider edges and faces to draw. There was one tool that was incredible at doing this. The plane tool. Set that thing to a thickness of 1 and watch the performance hit the floor.

The Thousand Frames of Pain was 1000 frames of doing the most expensive possible operation repeatedly every single frame at the biggest possible sizes. It would take about 2-3 minutes to actually finish (1000 frames / 60 FPS ≈ 16.67s for reference).

This doesn’t identify a problem. It basically is a big fancy tool that says your code sucks. That’s fine. It’s a bar to cross. It gets your team thinking about how to solve a defined problem.

“The player isn’t near the planes, do we really need to build colliders for that?”

“We spend a lot of time building colliders up front. Could we put them in a queue instead?”

“A lot of the voxels are already there from previous applications. Do we need to resend them to graphics/physics?”

The key to this tool is the mindset it puts people in. When a challenge is put before programmers, people start thinking of how to solve it.

We had another test where it went nuts with the particle effects. It would crash the game. So, we made it queue the particles if there were too many, and deleted/reused them more efficiently.
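A minimal sketch of that kind of fix, assuming a fixed-size pool with a waiting queue. This is an illustration of the idea rather than our actual particle system, and `Particle`, `ParticlePool` and the method names are all made up for the example.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical particle: only the fields needed for the sketch.
struct Particle {
    float life = 0.0f;
    bool alive = false;
};

// Fixed budget of particle slots. Requests beyond the budget wait in a queue
// and reuse a slot once an older particle dies, instead of allocating more.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t maxActive) : slots_(maxActive) {}

    // Ask for a particle; it is spawned on a later update when a slot is free.
    void request(float lifetimeSeconds) { pending_.push_back(lifetimeSeconds); }

    void update(float dt) {
        // Age living particles and retire the ones that ran out of life.
        for (Particle& p : slots_) {
            if (p.alive) {
                p.life -= dt;
                if (p.life <= 0.0f) p.alive = false;
            }
        }
        // Fill free slots from the front of the queue.
        for (Particle& p : slots_) {
            if (pending_.empty()) break;
            if (!p.alive) {
                p.alive = true;
                p.life = pending_.front();
                pending_.pop_front();
            }
        }
    }

    std::size_t activeCount() const {
        std::size_t n = 0;
        for (const Particle& p : slots_) {
            if (p.alive) ++n;
        }
        return n;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<Particle> slots_;
    std::deque<float> pending_;
};
```

The stress test can throw as many requests at this as it wants. The number of live particles never goes past the budget, so the worst case is effects showing up late rather than the game falling over.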

We did the same with our models. And our lighting. And level loading. And shadows. We did stress tests for everything in the game.

It resulted in our game being incredibly fast and able to pull off amazing tech. One level literally builds itself in front of you. Without these tests, that would never have happened.

Identifying Performance Issues: Silent Problems

If you don’t have the ability to see the average run times of all systems in your game displayed on screen during gameplay, you are going to make a much worse game than you would have.

Do you know how to do the moving average? No? Look it up. Then just store the last 10 frames and slap that moving average on screen. Ours looked something like this:

Total:  10.87001ms
Graphics: 8.33ms
Physics: 2.34ms
Gameplay: 0.2ms
Voxels: 0.00001ms
Metrics: 0.0ms
Resource: 0.0ms
Volumetric: 0.0ms

It’s important to have this displaying at run time and not just in some file afterwards. While that can be helpful, knowing roughly where and why performance dips occur is absolutely vital to solving problems.
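If the moving average part sounds scary, it isn’t. Here is a minimal sketch, assuming you keep the last 10 frame times per system in a small ring buffer. `MovingAverage`, `addSample` and `average` are names I made up for the example, not anything from our engine.

```cpp
#include <array>
#include <cstddef>

// Keeps the most recent kWindow samples and reports their mean.
class MovingAverage {
public:
    static constexpr std::size_t kWindow = 10;  // last 10 frames

    // Record one frame's time in milliseconds, overwriting the oldest sample
    // once the window is full.
    void addSample(double ms) {
        samples_[next_] = ms;
        next_ = (next_ + 1) % kWindow;
        if (count_ < kWindow) ++count_;
    }

    // Mean of the samples seen so far (at most the last kWindow of them).
    double average() const {
        if (count_ == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += samples_[i];
        return sum / static_cast<double>(count_);
    }

private:
    std::array<double, kWindow> samples_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

Make one of these per system (graphics, physics, gameplay and so on), time each system’s update however your engine measures time, feed the result into `addSample`, and print `average()` on screen every frame.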

There’s no big story to this. This helped so often that the whole game is basically one big story of this one tool helping debug everything.

Why are we lagging?

Well, when we look at metallic materials, the performance completely tanks. If we look at paper, the performance goes back up.

Welp, it’s SSR then. (Screen space reflections)

Why are we lagging?

Well, when the level loads in, physics hits 18ms per frame and slowly goes down to 0.

Welp, it’s how we’re building our colliders.

If you don’t have this in your game, stop. Right now. Go put it in your game.

I mean it.

Leave.

In fact, even better, let’s talk about how to make some easy tools in the next one.