Some Code Organization Patterns

Lately I am settling into a new job over at Neversoft. There are some awesome folks over there, and I am really enjoying it so far. Along with starting a new job comes learning a completely different codebase. This can be especially arduous for tools folks since tools code typically sits atop a mountain of engine, pipeline, and foundation code.

In trying to wrap my head around an entirely new chunk of tech, I keep re-discovering patterns that make it easier to get your bearings on a lot of new code quickly. There are lots of these patterns that studios follow when organizing their code, and following these can make it easier to dive and and start getting work done (or just make getting work done in general). Some or all of these may be obvious to experienced engineers, but I figure it never hurts to reinforce best practices, and you never know when someone will have the total opposite opinion for really interesting reasons.

Maintain just a handful of high level solutions so its easy to gain grand perspective.

The lower the solution count in your project the better. Ideally they should all be in the top level folder of your code. The key here is to create awareness of the major chunks of technology in your project. I think most people agree that the bar should be low for any engineer to get in and look at tools, engine, or game code. The more you hide solutions within your code tree the more arcane knowledge is required to even know who the major players are in your codebase.

Direct all compiler output to a single folder.

Nothing hurts broad searches more than having large binary files mixed in with the source you are trying to search. It’s probably the reason why Visual Studio has preconfigured laundry lists of source code file filters in their Find in Files tool. If you redirect all your compiler output folders to its own root folder then broad searches gets orders of magnitude faster since it doesn’t have to wade through compiler data.

If your compiler output is directed to a separate dedicated folder then doing a clean build is just a simple matter of destroying the output folder and re-running your build. Explicit cleans are just slower, and its just easier to delete a folder when scripting things like build server operations.

Code generated via custom build steps counts as compiler output too! Add your output location an include path and #include generated code, even c/cpp files. Doing this keeps a very clear distinction between generated code and code which belongs in revision control (and hopefully you aren’t storing generated code in revision control!).

Keep 3rd party library code and solutions separate.

A big part of effectively searching through your codebase is being able to differentiate your code from external library code. Littering 3rd party libraries in with your own code can muddle search results.

Frequently its not necessary to clean build both 3rd party code and your project code, so having separate solutions can save time. It also makes performing search and replaces within solutions that only have your project code in them safer (you don’t want to search and replace within a 3rd party lib do you!?).

Install large 3rd party SDKs directly onto workstations.

Revision control isn’t the only software delivery mechanism on the planet. Nobody should be making changes within the CellSDK, DirectX SDK, or FBX SDK so they shouldn’t be checked into revision control. These packages tend to be very easy to script for unattended installation (msiexec). This makes it easy to write a simple SDK checkup script to make sure that any given client (even build servers) have the latest kit installed.

Most large SDKs have environment variables that make them easy to find on the system, and even if they don’t you can typically assume where it should be installed. If they are missing it’s a simple thing to track down and install t (even for junior or associate engineers). Also, it never hurts to add compile asserts to validate that the code is being built against the correct version of those libraries.

If you happen to develop on a system with a package manager, they are awesome for making it easy to pull down 3rd party libraries directly off the internet. Microsoft’s CoApp project aims to do just that on Windows.

Only check in binaries of what you cannot easily compile.

The less compiled binaries you check in the better your revision control will perform, and everyone you work with is served better when revision control works well. Source code is much quicker to transfer and store on servers and peers. Not checking in compiled binaries means less waiting for transfers, less locking for centralized servers, and less long term size creep for distributed repositories.

Checking in built versions of libraries will create a headache for yourself in the future when you want to deploy a new compiler or support a new architecture (which will require you to recompile using a bunch of crusty project files that haven’t been used in months or years). It’s always worth a little extra time when adding a new external library to take command over your build configuration management. Sometimes this can involve making your own project files instead of using ones that may be included with the library source code. High level build scripting tools like Premake, CMake, and boost::build are worth spending time to learn, and can make hand-creating IDE-specific projects seem archaic. If updating external libraries in your engine is easy you will do it more often, and hence reap the benefit of more frequent fixes and improvements you don’t have to do yourself.

This article was also posted to AltDevBlogADay.

Common Problems: Preserving Atomic Changes When Checking In Builds

One of the things I’d like the Toolsmiths to be is a place where we can discuss our common problems, and hopefully come up with common solutions. Toward that end, I’m starting a new series on the blog called “Common Problems”, and I’m kicking it off with something that I’ve seen as a common problem recently.

We all know the benefits of having continuous integration and / or nightly builds. What I’ve found to be problematic, though, is when distributing that build to other members of the team means checking the build into source control, specifically when it is checked in to the same directory that other team members use to do their work. This setup is beneficial in many ways. This directory, we’ll call it the “data” directory, is basically a snapshot of the project. Team members pull from that directory and it has the most recently compiled executable plus all configuration, data, and art files needed to run the game. They can then easily change anything in the directory, test, and commit. It’s quick easy and painless, for the most part.

Generally artists and designers only check out the “data” directory, make their changes, and check back in so that everyone can partake. If they’re good artists and designers, they make sure that their changes work before checking in, and everything they’ve worked on becomes an atomic commit in any modern source control system. Since their not editing the executable, these changes almost always remain atomic.

Coders, however, check out both the “data” and the “code” directories. They will frequently edit the code and the data to get something working, and, after testing, they will then check in both directories atomically. However here’s the problem: there is a period of time between when the coder checks in new code and when the build machine will check in changes to that code into the data directory. During this time there is a disconnect between the executable and what’s in the data directory. In the best case scenario this doesn’t affect the team in any significant manner. Worst case, the game will crash because expected data has changed or been removed. Again, best case here is that someone realizes this is just a disconnect in the data and waits for the next build. Worst case, an erroneous bug gets created that someone actually spends time trying to solve.

I’ve tried to come up with possible solutions for this, but only have half answers:

  1. Do not build continuously, and instead have programmers check in builds whenever they change the executable. This can be accomplished by setting the target directory to your data directory. The down side of this is that, on large teams, it would be a race to check in your executable before others. In addition, a careless coder could stomp out another’s executable changes. This would be hard, but not impossible.
  2. Hold checkins to the data directory that modify code until the build is complete, and then check them in. This can be problematic because if the same data changes while the build is working, the source control server will reject the change. Furthermore, coders that pull during this time will get the code, but not the data. This is also extremely hard to implement.

What are your solutions for this problem? Do you have this problem? Why or why not?