First off, let’s go over the wide variety of tests that can be performed. I swear the YWSSS will be worth it.
Manual Test
This is probably what you do if you don’t write tests. For the lazy programmer, this means opening up the game, executing the actions they’re working on, and checking whether it did what they expected. It’s error prone, slow, and not repeatable from person to person.
In the real world, these actually exist. Some companies hire people to take a spreadsheet, run every command it lists and report back what happened. It’s terrible. It slows release cycles a ton. It’s, again, error prone. And most importantly, it’s hard to repeat. Why would you want a test that costs 15 bucks an hour and only works 9-5 on weekdays?
Unit Test
This is the most basic test possible. Unit tests verify that small bits of functionality work. For example, a unit test for a vector might ensure that after you push_back() a value, that value is in the vector. Another test might ensure that you can randomly access any position in the vector and get the right value. Another might ensure that the vector starts empty. You can also test that an error does occur when you feed it certain inputs (NaN, nullptr, popping from an empty stack, etc).
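Here is a minimal sketch of those vector checks in plain C++, with no test framework. The function names are made up for illustration, and each one returns true on success so you can wrap it in assert() or plug it into whatever runner you end up using.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// A default-constructed vector starts empty.
bool vector_starts_empty() {
    std::vector<int> v;
    return v.empty() && v.size() == 0;
}

// push_back() stores the value at the end of the vector.
bool push_back_stores_value() {
    std::vector<int> v;
    v.push_back(42);
    return v.size() == 1 && v.back() == 42;
}

// Random access returns the right element at the front, middle and back.
bool random_access_returns_right_value() {
    std::vector<int> v = {10, 20, 30};
    return v[0] == 10 && v[1] == 20 && v[2] == 30;
}

// Bad input produces the documented error: at() throws std::out_of_range.
bool at_throws_when_out_of_range() {
    std::vector<int> v;
    try {
        (void)v.at(0);  // index 0 does not exist in an empty vector
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Each function checks exactly one behavior, so when one fails its name already tells you what broke.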
Another important factor is the idea of equivalence classes. In short, test the boundaries and special cases and assume everything in between works because it’s basically the same test.
For example, let’s say a function takes an integer and prints out a letter grade. We could test it by sending it the numbers 0-100 and ensuring every single one was correct. The problem is that this does a lot of extra testing that we don’t need. All we care about are the numbers that sit on the boundaries where the result changes. For example, 69 and 70 are two different grades, so we’d want a test for each. We don’t need a test for 71 because there’s no change between 70 and 71.
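As a rough sketch of the grade example: the 90/80/70/60 cutoffs below are my assumption (the only thing given above is that 69 and 70 differ), and the function returns the letter instead of printing it so the test can compare values.

```cpp
#include <cassert>

// Hypothetical grading scale (an assumption for this example):
// 90+ is A, 80-89 is B, 70-79 is C, 60-69 is D, everything below is F.
char letter_grade(int score) {
    if (score >= 90) return 'A';
    if (score >= 80) return 'B';
    if (score >= 70) return 'C';
    if (score >= 60) return 'D';
    return 'F';
}

// One check on each side of every boundary, plus both ends of the range.
bool grade_boundaries_are_correct() {
    return letter_grade(0) == 'F'  && letter_grade(59) == 'F'
        && letter_grade(60) == 'D' && letter_grade(69) == 'D'
        && letter_grade(70) == 'C' && letter_grade(79) == 'C'
        && letter_grade(80) == 'B' && letter_grade(89) == 'B'
        && letter_grade(90) == 'A' && letter_grade(100) == 'A';
}
```

That is ten checks instead of 101, and every one of them sits where an off-by-one mistake would actually show up.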
Another useful heuristic I learned from my host at Google is thinking of the important cases for a particular data type. If you see an integer, think INT_MIN, -1, 0, 1, INT_MAX. If you see a string, think empty, one letter, two letters, a lot of letters. If the contents of the string matter, think symbols, letters, numbers, punctuation, spaces and unprintable characters. If you see a float, think the lowest float (that’s -FLT_MAX, not FLT_MIN, which is the smallest positive normal value), -1, -DENORM, +DENORM, NAN, 0, 1, a decimal, a negative decimal and FLT_MAX.
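One way to keep those cases handy is to build them once and loop over them in your tests. This is only a sketch: the helper names are hypothetical and the exact lists are my reading of the cases above. Note that it uses std::numeric_limits<float>::lowest() for the most negative float, because min() is the smallest positive normal value.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Integer cases: both extremes and the values around zero.
std::vector<int> interesting_ints() {
    using L = std::numeric_limits<int>;
    return {L::min(), -1, 0, 1, L::max()};
}

// Float cases: extremes, denormals on both sides of zero, decimals, and NaN last.
std::vector<float> interesting_floats() {
    using L = std::numeric_limits<float>;
    return {L::lowest(), -1.0f, -0.5f, -L::denorm_min(), 0.0f,
            L::denorm_min(), 0.5f, 1.0f, L::max(), L::quiet_NaN()};
}

// String cases: empty, one letter, two letters, a lot of letters,
// mixed symbols/spaces/digits, and unprintable characters.
std::vector<std::string> interesting_strings() {
    return {"", "a", "ab", std::string(1000, 'x'),
            "a b\t!?123", std::string("\x01\x7f", 2)};
}
```

Feeding every function that takes an int through interesting_ints() is a cheap habit that catches a surprising number of overflow and sign bugs.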
This is incredibly helpful during interviews! Interviewers want to see if you can spot edge cases and test for them!
Unit tests are particularly helpful for engine, physics and tools programmers. Don’t try making an entity parenting system without a unit test framework. Don’t make a history system either. They will help you not only create those features, but maintain and expand upon them! After six months of not using a system, does it still work? Run the tests!
Integration Test
It’s a functional air dryer. It’s a functional trash can. Put them together, paper everywhere.
An integration test simply takes two separate systems and ensures they work together. When you parent objects together, does their transform update? If you combine your history system with the job system, does it still work?
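To make the parenting question concrete, here is a small sketch with made-up Vec2 and Entity types (your engine’s versions will look different). The test moves the parent after attaching the child and checks that the child’s world position follows.

```cpp
#include <cassert>

// Hypothetical 2D position type, just enough to show the idea.
struct Vec2 { float x; float y; };

// Hypothetical entity: a local offset plus an optional parent.
struct Entity {
    Vec2 local{0.0f, 0.0f};          // position relative to the parent
    const Entity* parent = nullptr;  // parenting system: who we are attached to

    // Transform system: world position is the parent's world position plus our offset.
    Vec2 world() const {
        if (!parent) return local;
        Vec2 p = parent->world();
        return Vec2{p.x + local.x, p.y + local.y};
    }
};

// Integration test: parenting and transforms have to agree with each other.
bool child_follows_parent() {
    Entity parent;
    parent.local = Vec2{10.0f, 0.0f};

    Entity child;
    child.local = Vec2{1.0f, 2.0f};
    child.parent = &parent;

    parent.local = Vec2{20.0f, 5.0f};  // move the parent after attaching
    Vec2 w = child.world();
    return w.x == 21.0f && w.y == 7.0f;
}
```

Each piece could pass its own unit tests and this would still fail if, say, the child cached its world position when it was attached.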
End to End Test
This is the complete run through. For example, can you actually render that cube with lighting that so many systems work together to create?
It’s pretty tempting to make this a manual test, but please, for the love of Mead, please don’t. There are many ways to automate a lot of this!
In graphics, you can set up a scene with predefined objects and ensure when you run the test that it creates exactly what you expect. You don’t have to take screenshots and pixel match them (though that does exist in the industry and it’s powerful!). Just ensuring it looks right is more than enough. At least now you have a repeatable way to get the same result, rather than opening up an editor and manually placing objects into some level like a Ted.
In gameplay, you may need an input mocking system. For some games, this just isn’t feasible, but for many it can be. What’s nice is that once you implement it, you never have to do it again. And now you can do more deterministic testing and ensure your game actually works the way you expect it to!
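A minimal sketch of input mocking, assuming your gameplay code can be pointed at an interface instead of polling the hardware directly. Every name here (InputSource, ScriptedInput, Player, simulate) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum class Button { None, Left, Right, Jump };

// Gameplay code asks this interface for input instead of reading the device.
struct InputSource {
    virtual ~InputSource() = default;
    virtual Button next() = 0;
};

// Mock: replays a scripted list of buttons, one per frame, then reports None.
class ScriptedInput : public InputSource {
public:
    explicit ScriptedInput(std::vector<Button> script) : script_(std::move(script)) {}
    Button next() override {
        return index_ < script_.size() ? script_[index_++] : Button::None;
    }
private:
    std::vector<Button> script_;
    std::size_t index_ = 0;
};

struct Player { int x = 0; int jumps = 0; };

// One frame of (very simplified) gameplay logic.
void update(Player& p, InputSource& input) {
    switch (input.next()) {
        case Button::Left:  --p.x; break;
        case Button::Right: ++p.x; break;
        case Button::Jump:  ++p.jumps; break;
        case Button::None:  break;
    }
}

// Runs the game logic for a fixed number of frames against a script.
Player simulate(std::vector<Button> script, int frames) {
    ScriptedInput input(std::move(script));
    Player p;
    for (int i = 0; i < frames; ++i) update(p, input);
    return p;
}
```

The real game would use an InputSource backed by the keyboard or controller. The test version replays the same script every run, which is what makes the result deterministic.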
For games that are a bit more complex, it’s good to set up levels specifically designed to test certain mechanics. We had multiple dev levels designed for just that purpose, whether it was pickups, weapons or parkour. This is still a manual test, but it’s better than nothing.
It may be worth your team creating frameworks that allow you to manipulate the game more finely from code. For example, being able to place the player in specific locations, forcing them to look at certain points and forcing them to use skills. If you want to ensure weapons work, those are the only functions you actually need! Having those utilities makes it much easier to test your code. It also makes it easier to write cheat codes near the end of the project!
Physics should absolutely have a suite of tests that can be run. I highly suggest creating a level where you can swap between different simulations/tests at the press of a button. Keep a list of the entities you create, delete those ones on swap, and spawn the next set and ensure it looks right.
Static Analysis
There are a lot of anti-patterns, common bugs and style issues that can be caught by a program. Many of these tools exist online as plugins for GitHub. CodeFactor.io is a tool that looks at code complexity and common errors and reports them to you when you push code. While there are some false positives, it can catch mistakes you don’t often think of, like not using the explicit keyword on single parameter constructors.
There are also tools called linters and formatters that will automatically go through your code and flag or fix problems. Some can completely reformat it to a specific style of your choice!
You can even write custom ones yourself!
Let’s say you wanted to ensure that no one added three or more newlines in a row. First, create another executable project in your solution. In the main file, write the logic to load each file and find any run of three or more newlines with only spaces/tabs in between them. Collapse each run down to the limit you allow, then write the result back to the same file.
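The core of that tool could look something like the sketch below. It assumes the rule is "at most one blank line in a row" and it works on a string, so loading and saving the file (with std::ifstream and std::ofstream, for example) is left out. The function names are mine.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if the line holds nothing but spaces and tabs
// (a trailing '\r' from Windows line endings is ignored too).
bool is_blank(const std::string& line) {
    return line.find_first_not_of(" \t\r") == std::string::npos;
}

// Keeps the first blank line of every run and drops the rest, so the
// output never contains three or more newlines in a row.
// Note: the result always ends with a newline.
std::string collapse_blank_lines(const std::string& text) {
    std::istringstream in(text);
    std::string out;
    std::string line;
    int blank_run = 0;
    while (std::getline(in, line)) {
        blank_run = is_blank(line) ? blank_run + 1 : 0;
        if (blank_run <= 1) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

Because the function is pure (string in, string out), it is easy to unit test before you ever let it touch real source files.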
Next, set the build order so this is built first before your engine is. You can then add a Pre-Build event that runs your executable on your code.
Want to enforce “const std::string &” to be “const std::string&”? Build it into that step!
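A rough sketch of that rule using std::regex, as one possible approach. The function name is hypothetical, and this only handles the exact "const std::string" spelling. It runs two passes so that "const std::string &name" becomes "const std::string& name" rather than gluing the name onto the ampersand.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Moves the '&' next to the type in "const std::string &" declarations.
std::string fix_string_ref_spacing(const std::string& code) {
    // Pass 1: '&' directly followed by a name, so keep one space after it.
    static const std::regex before_name(R"re(const std::string\s+&(?=\w))re");
    // Pass 2: any remaining "const std::string &", e.g. an unnamed parameter.
    static const std::regex anywhere(R"re(const std::string\s+&)re");

    std::string result = std::regex_replace(code, before_name, "const std::string& ");
    return std::regex_replace(result, anywhere, "const std::string&");
}
```

A regex works for a narrow rule like this one. Once you want to enforce a whole style guide, an existing formatter is probably a better investment than growing your own.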
If you think this will take a while and hurt build times, I promise you it will not. Your project probably has a few hundred files at most. If you restrict the tool to just .h/.cpp files that your team has created, it will be over in much less than a second.
This is absolutely vital for code reviews. If code reviews are primarily talking about style, you’re doing code reviews wrong. I’ll go into detail about this in a later post.
Stress Test
If you’ve been through Mead’s class, you already know what this is. In short, throw everything at it and see if it chokes.
A small note: try to ensure it is still deterministic. If you’re just throwing rand() at a function, it could become a flaky test (multiple runs may pass or fail). Let’s say that negatives are the failure case. If your random numbers are between INT_MIN and INT_MAX, and you pick 10 random numbers, there is roughly a (1/2)^10 chance (about 1 in 1024) that none of them is negative and the test passes without ever hitting the failure case. That’s not what you want.
Use seeds, use large sets, use vectors, use for loops. Don’t just throw a time based rand() at it. I recommend taking the values from a seeded rand and restricting them to your different equivalence classes. For example, generate 5 numbers that are negative and 5 that are positive.
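Here is one way that could look. I’m swapping rand() for std::mt19937 with a fixed seed, and the function under test (clamp_to_zero) is a stand-in I made up. The same seed gives the same inputs on every run with the same standard library, and each equivalence class is guaranteed to be covered.

```cpp
#include <cassert>
#include <limits>
#include <random>
#include <vector>

// Builds a repeatable input set: `per_class` negative values followed by
// `per_class` positive values, all drawn from a generator with a fixed seed.
std::vector<int> make_stress_inputs(unsigned seed, int per_class) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> negative(std::numeric_limits<int>::min(), -1);
    std::uniform_int_distribution<int> positive(1, std::numeric_limits<int>::max());

    std::vector<int> inputs;
    for (int i = 0; i < per_class; ++i) inputs.push_back(negative(rng));
    for (int i = 0; i < per_class; ++i) inputs.push_back(positive(rng));
    return inputs;
}

// Hypothetical function under test: must never return a negative number.
int clamp_to_zero(int value) { return value < 0 ? 0 : value; }

// Stress test: every generated input must satisfy the property.
bool stress_clamp(unsigned seed) {
    for (int v : make_stress_inputs(seed, 5)) {
        if (clamp_to_zero(v) < 0) return false;
    }
    return true;
}
```

If a run ever fails, the seed is all you need to write down to reproduce it.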
If you skipped ahead, I recommend reading this for more insight on stress testing.
Benchmark Test
A benchmark test is a test where a piece of code is repeatedly tested to see how it performs on certain metrics. This could be time, throughput, QPS and a lot of other metrics. Really, I just have one warning about this. Don’t pass/fail on a slow benchmark.
Performance analysis is a complicated beast. There’s a lot to learn about different metrics, noise analysis, and finding regressions over time. I spent three months learning all about this and I still feel like I don’t understand all of it.
A simple example I had was a piece of code that we timed to be on average about 1s total after a number of runs. We even gave it a little bit of a buffer of 0.2s so it wouldn’t fail.
It still failed. Often, actually. Even with a lot of tests, sometimes, the scheduler is being mean to you. Sometimes you have YouTube on. Sometimes it’s hot in your room. Sometimes it’s being run on a toaster. There are so many variables when it comes to these benchmarks that to pass and fail based on it is a recipe for disaster.
Record it. Graph it. Study it. Analyze it. Don’t fail based on it.
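A bare-bones sketch of "record it" using std::chrono. The helper names are made up, and a real benchmark would also want warm-up runs and some care that the compiler doesn’t optimize the work away. The point is that the output is data to graph, not a pass/fail result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Times `runs` calls of `work` and returns each duration in milliseconds.
template <typename Work>
std::vector<double> record_timings(Work work, std::size_t runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples;
}

// Formats the samples as CSV so they can be saved and graphed over time.
std::string to_csv(const std::vector<double>& samples) {
    std::ostringstream out;
    out << "run,milliseconds\n";
    for (std::size_t i = 0; i < samples.size(); ++i) {
        out << i << ',' << samples[i] << '\n';
    }
    return out.str();
}
```

Save the CSV from every build and plot it. A regression shows up as a trend across many runs, which is a much better signal than one noisy sample crossing a threshold.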
If you would like to learn about this more deeply, I highly recommend this CppCon talk.
Continuous Integration
Remember when Ted pushed broken code to master? That’s a little bit your fault. Before something is pushed to master, it should have to go through a continuous integration process (CI). All it does is pull master, pull your changes on top of it and try to build. If it fails, it gives you the logs and prevents a push to master.
There are many CI frameworks out there, and nearly all that I’ve heard of will work directly with GitHub. Microsoft has Azure Pipelines, which gives you one free build server. Travis CI is free for open source projects and costs money for private projects (it’s a bit expensive). CircleCI is free for a single server. With Azure, you’re going to be setting up a build agent on your own computer.
You might be thinking “Well of course I tested that it runs. Why would we need this if we’re competent programmers?”
There are a lot of ways to have something work on your machine and not someone else’s. Did you forget to push a file? Well, it’ll work on your machine and not someone else’s. Are you missing a resource file? Broken. Is your computer faster, and does that hide a bug? Broken.
Alright. I think that’s enough on testing for now. I genuinely hope you will integrate testing more deeply into your workflow because it really will make the difference come submission time.
Enough of the abstract. Let’s actually set up a project from scratch that is ready to use Visual Studio’s unit testing framework!