Some Code Organization Patterns

Lately I've been settling into a new job over at Neversoft. There are some awesome folks over there, and I am really enjoying it so far. Along with starting a new job comes learning a completely different codebase. This can be especially arduous for tools folks since tools code typically sits atop a mountain of engine, pipeline, and foundation code.

In trying to wrap my head around an entirely new chunk of tech, I keep re-discovering patterns that make it easier to get your bearings on a lot of new code quickly. Studios follow lots of these patterns when organizing their code, and following them can make it easier to dive in and start getting work done (or just make getting work done easier in general). Some or all of these may be obvious to experienced engineers, but I figure it never hurts to reinforce best practices, and you never know when someone will have the total opposite opinion for really interesting reasons.

Maintain just a handful of high-level solutions so it's easy to gain grand perspective.

The lower the solution count in your project the better. Ideally they should all be in the top level folder of your code. The key here is to create awareness of the major chunks of technology in your project. I think most people agree that the bar should be low for any engineer to get in and look at tools, engine, or game code. The more you hide solutions within your code tree the more arcane knowledge is required to even know who the major players are in your codebase.

Direct all compiler output to a single folder.

Nothing hurts broad searches more than having large binary files mixed in with the source you are trying to search. It's probably the reason why Visual Studio has preconfigured laundry lists of source code file filters in its Find in Files tool. If you redirect all your compiler output to its own root folder, broad searches get orders of magnitude faster since they don't have to wade through compiler data.

If your compiler output is directed to a separate, dedicated folder, then doing a clean build is a simple matter of destroying the output folder and re-running your build. Explicit cleans are slower, and it's easier to delete a folder when scripting things like build server operations.
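
Here's a rough sketch of what that can look like as a script, assuming a hypothetical layout with a single build/ output root, a Game.sln at the top level, and msbuild on the path:

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical layout: all compiler output is redirected under build/,
# and the top-level solution lives next to this script.
BUILD_ROOT = Path("build")
SOLUTION = Path("Game.sln")

def clean_build(configuration: str = "Release") -> None:
    # A "clean" is just deleting the output root; no per-project clean targets.
    if BUILD_ROOT.exists():
        shutil.rmtree(BUILD_ROOT)
    # Rebuild from scratch; the build recreates the output folders as it goes.
    subprocess.run(
        ["msbuild", str(SOLUTION), f"/p:Configuration={configuration}"],
        check=True,
    )

if __name__ == "__main__":
    clean_build()
```

The same couple of lines drop straight into a build server job, which is exactly why deleting-the-folder beats invoking per-project clean targets.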

Code generated via custom build steps counts as compiler output too! Add your output location to your include paths and #include generated code, even C/C++ files. Doing this keeps a very clear distinction between generated code and code which belongs in revision control (and hopefully you aren't storing generated code in revision control!).
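
As a rough illustration (with a made-up generator and output layout), a custom build step might drop its generated header into the output tree like this:

```python
import sys
from pathlib import Path

# Hypothetical custom build step: emit a C++ header into the build output
# tree rather than the source tree. The project then adds <output>/generated
# to its include paths and does #include "game_version.gen.h" like any other
# header, and nothing generated ever lands in revision control.
def generate(output_root: str) -> None:
    out_dir = Path(output_root) / "generated"
    out_dir.mkdir(parents=True, exist_ok=True)
    header = out_dir / "game_version.gen.h"
    header.write_text(
        "// Generated file - do not edit, do not check in.\n"
        "#pragma once\n"
        "#define GAME_BUILD_NUMBER 1234\n"
    )

if __name__ == "__main__":
    generate(sys.argv[1] if len(sys.argv) > 1 else "build")
```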

Keep 3rd party library code and solutions separate.

A big part of effectively searching through your codebase is being able to differentiate your code from external library code. Littering 3rd party libraries in with your own code can muddle search results.

Frequently it's not necessary to clean build both 3rd party code and your project code, so having separate solutions can save time. It also makes search-and-replace operations within solutions that only contain your project code safer (you don't want to search and replace within a 3rd party lib, do you?!).

Install large 3rd party SDKs directly onto workstations.

Revision control isn’t the only software delivery mechanism on the planet. Nobody should be making changes within the CellSDK, DirectX SDK, or FBX SDK, so they shouldn’t be checked into revision control. These packages tend to be very easy to script for unattended installation (msiexec). This makes it easy to write a simple SDK checkup script to make sure that any given client (even build servers) has the latest kit installed.

Most large SDKs have environment variables that make them easy to find on the system, and even if they don’t you can typically assume where they should be installed. If they are missing, it’s a simple thing to track down and install them (even for junior or associate engineers). Also, it never hurts to add compile asserts to validate that the code is being built against the correct version of those libraries.
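
A checkup script along these lines might look like the following sketch; the environment variable names, install locations, and installer paths are all placeholders for whatever your SDKs actually use:

```python
import os
import subprocess
from pathlib import Path

# Placeholder list of SDKs we expect on every workstation and build server.
# Each entry: the environment variable the SDK sets, a fallback install
# location, and an unattended installer sitting on a network share.
REQUIRED_SDKS = [
    ("DXSDK_DIR", Path(r"C:\Program Files\Microsoft DirectX SDK"), r"\\tools\sdks\dxsdk.msi"),
    ("FBX_SDK_DIR", Path(r"C:\Program Files\Autodesk\FBX SDK"), r"\\tools\sdks\fbxsdk.msi"),
]

def check_and_install() -> None:
    for env_var, default_dir, installer in REQUIRED_SDKS:
        location = os.environ.get(env_var)
        if location and Path(location).exists():
            continue
        if default_dir.exists():
            continue
        print(f"{env_var} not found, installing {installer}")
        # /i = install, /qn = quiet with no UI, so this works unattended.
        subprocess.run(["msiexec", "/i", installer, "/qn"], check=True)

if __name__ == "__main__":
    check_and_install()
```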

If you happen to develop on a system with a package manager, they are awesome for pulling 3rd party libraries down directly off the internet. Microsoft’s CoApp project aims to do just that on Windows.

Only check in binaries of what you cannot easily compile.

The fewer compiled binaries you check in, the better your revision control will perform, and everyone you work with is better served when revision control works well. Source code is much quicker to transfer and store on servers and peers. Not checking in compiled binaries means less waiting for transfers, less locking for centralized servers, and less long-term size creep for distributed repositories.

Checking in built versions of libraries will create a headache for yourself in the future when you want to deploy a new compiler or support a new architecture (which will require you to recompile using a bunch of crusty project files that haven’t been used in months or years). It’s always worth a little extra time when adding a new external library to take command over your build configuration management. Sometimes this can involve making your own project files instead of using ones that may be included with the library source code. High level build scripting tools like Premake, CMake, and boost::build are worth spending time to learn, and can make hand-creating IDE-specific projects seem archaic. If updating external libraries in your engine is easy you will do it more often, and hence reap the benefit of more frequent fixes and improvements you don’t have to do yourself.

This article was also posted to AltDevBlogADay.

Common Problems: Preserving Atomic Changes When Checking In Builds

One of the things I’d like the Toolsmiths to be is a place where we can discuss our common problems, and hopefully come up with common solutions. Toward that end, I’m starting a new series on the blog called “Common Problems”, and I’m kicking it off with something that I’ve seen as a common problem recently.

We all know the benefits of having continuous integration and / or nightly builds. What I’ve found to be problematic, though, is when distributing that build to other members of the team means checking the build into source control, specifically when it is checked in to the same directory that other team members use to do their work. This setup is beneficial in many ways. This directory, we’ll call it the “data” directory, is basically a snapshot of the project. Team members pull from that directory and it has the most recently compiled executable plus all configuration, data, and art files needed to run the game. They can then easily change anything in the directory, test, and commit. It’s quick, easy, and painless, for the most part.

Generally artists and designers only check out the “data” directory, make their changes, and check back in so that everyone can partake. If they’re good artists and designers, they make sure that their changes work before checking in, and everything they’ve worked on becomes an atomic commit in any modern source control system. Since they’re not editing the executable, these changes almost always remain atomic.

Coders, however, check out both the “data” and the “code” directories. They will frequently edit the code and the data to get something working, and, after testing, they will then check in both directories atomically. However, here’s the problem: there is a period of time between when the coder checks in new code and when the build machine checks the build of that code into the data directory. During this time there is a disconnect between the executable and what’s in the data directory. In the best-case scenario this doesn’t affect the team in any significant manner. Worst case, the game will crash because expected data has changed or been removed. Again, the best case here is that someone realizes this is just a disconnect in the data and waits for the next build. Worst case, an erroneous bug gets created that someone actually spends time trying to solve.

I’ve tried to come up with possible solutions for this, but only have half answers:

  1. Do not build continuously, and instead have programmers check in builds whenever they change the executable. This can be accomplished by setting the build’s target directory to your data directory (a minimal sketch of this follows the list). The downside of this is that, on large teams, it would be a race to check in your executable before others. In addition, a careless coder could stomp out another’s executable changes. This would be hard, but not impossible.
  2. Hold check-ins to the data directory that accompany code changes until the build is complete, and then check them in. This can be problematic because if the same data changes while the build is running, the source control server will reject the change. Furthermore, coders that pull during this time will get the code, but not the data. This is also extremely hard to implement.
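
To make option 1 concrete, here’s a minimal sketch of a post-build step that drops the freshly built executable into the data directory so it can go out in the same changelist as the data; all of the paths here are made up:

```python
import shutil
from pathlib import Path

# Hypothetical post-build step for option 1: copy the freshly built
# executable into the shared "data" directory so a coder can submit
# code, data, and executable as one atomic changelist.
BUILD_OUTPUT = Path("build/Release/game.exe")
DATA_DIR = Path("data")

def deploy_executable() -> None:
    target = DATA_DIR / BUILD_OUTPUT.name
    # copy2 preserves timestamps, which makes "is my exe stale?" questions
    # easier to answer when someone suspects a stomped check-in.
    shutil.copy2(BUILD_OUTPUT, target)
    print(f"Copied {BUILD_OUTPUT} -> {target}; include it in your changelist.")

if __name__ == "__main__":
    deploy_executable()
```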

What are your solutions for this problem? Do you have this problem? Why or why not?

Premake 4.3

Industrious One has announced the availability of the next major release of its excellent build configuration tool, Premake. The announcement and download link are here. Premake is a BSD-licensed, open source, Lua-based, cross-platform tool for generating IDE projects and Makefiles.

Premake lets you define common settings at the solution level and add configuration-specific settings based on wildcards. For example, I can define WIN32 as a common preprocessor variable, but set UNICODE to be defined only for configurations whose name matches “*Unicode”. Premake can be a huge help in managing the combinatorial explosion of build configuration settings (ASCII/Unicode, Debug/Release, Win32/x64).

Premake has support for generating PS3 and Xbox 360 Visual Studio solutions, but version 4.3 is still missing a couple of things that game developers need to handle every scenario. These include generation of projects that need to call out to make, and projects with custom build steps (for shaders, assembly, and code-generating scripts). Support for these is planned for subsequent releases, and there are already some patches to evaluate. Premake itself is simple to download and build (it’s hosted on BitBucket). If you do decide to take the plunge and switch to Premake, you will find starkos (the project’s maintainer) to be very courteous and responsive.

If you deal with build configuration at your studio, you owe it to yourself to evaluate Premake. It has vastly simplified managing our builds at WMD.

CoApp

One open source project I have been keeping an eye on is CoApp. Microsoft is currently paying Garrett Serack to develop an open binary and source package management platform for Windows. The goal is to provide the ease of use and flexibility of Linux-style package management on the Windows platform. This is exactly what Microsoft needs to do to keep its operating system competitive in the current climate. Anyone that has developed or broadly deployed an open source application on Windows knows the pain that can be avoided if this project succeeds.

CoApp Presentation from Garrett Serack on Vimeo.

Improving Builds (GameX Talk)

As many of you know, I’m a stickler for a good build process.  In my mind, any game team can lose a lot of time and money just waiting for their builds to complete, or waiting for a build that won’t crash every 10 minutes.  This is mitigated somewhat by programming processes like unit testing, but even with these, it is important that you have a clear and defined process for getting the build from check-in to the team without any significant snags.

A few months ago, I gave a talk at GameX about improving builds and build process, and I’ve finally gotten around to posting the slides on my website, here.

There are a few things I wish I’d hit in the talk that just didn’t make it in, including ways to distribute asset optimization and best practices for version control, but much of that is in flux for me right now, especially with my newfound fascination with Mercurial and distributed version control (and its very real lack of binary / large file support).  Even without those concerns, I’ve yet to see anyone really tackle best practices in distributed asset optimization, including best practices in file composing (taking the multiple files that make up a level and composing them so that they load faster), so it wasn’t something I was prepared to address.

What about the readers?  What would you have liked to see in this talk that never got mentioned?  What would you have rather I’d spent more time on?

Building on the Cloud

Over the past few years, cloud computing has become the next big thing for enterprise software.  The ability to easily and cheaply scale resources to meet the needs of end users is very attractive.  Amazon, Sun, Google, and now Microsoft (among others) are all offering cloud computing solutions.  I’ve recently been playing around with AWS (Amazon Web Services) to see what you can do with this technology, and I can already see a few ways it could be applied to games.

Running games on the cloud is an obvious use of these resources.  Need a game server accessible from anywhere in the world?  Start one up on a virtual server.  The ability to build machine images (AMIs on Amazon), complete with your own software running on operating systems like Linux, OpenSolaris, or even Microsoft Windows Server, gives you that possibility for pennies a day.

But where cloud computing could really come in handy is game development.  Imagine starting a build distributed across the cloud, in which thousands of virtual machines simultaneously start processing individual bits of data.  You might see builds go from minutes or hours to just a few seconds.

And the cloud isn’t just for processing either.  Some companies offer services for managing data that would traditionally reside in a relational database, as well as file storage services.  You could even use your own machine image running some flavor of SQL.  With that capability, why not store assets in the cloud?  An asset control vendor could use the software-as-a-service (SaaS) model for asset control, supplying developers with web and client based views into an asset database on the cloud itself.

The big problem here is that we’re trading bandwidth for processing power and flexibility.  The build process may take a few seconds, but retrieving the results to local machines may eat up every bit of build-time savings and then some.  We may see overnight builds turn into overnight downloads, and that’s no savings at all. 

BitTorrent file serving (available on AWS) may be useful as a build distribution model, but with most users on a single network, it doesn’t seem likely to make a difference.  Limiting the download process to necessary files only is simply the flip side of building necessary files only, so it may also offer little in the way of savings.  Doing a bit-by-bit comparison of files built on the cloud, and downloading just the file differences, may be a way to reduce the download time, assuming there are chunks of data in a binary file that remain constant between builds.  Other optimizations almost certainly exist.
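
As a sketch of what that comparison could look like, one practical form is chunk-level hashing: hash both copies in fixed-size blocks and fetch only the blocks that differ. The block size here is an arbitrary guess, and the “remote manifest” stands in for whatever list of chunk hashes the cloud side would publish:

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB blocks; purely a guess at a reasonable granularity.

def chunk_hashes(path: Path) -> list[str]:
    # Hash the file in fixed-size chunks so two builds can be compared
    # block by block instead of byte by byte.
    hashes = []
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha1(chunk).hexdigest())
    return hashes

def differing_chunks(local: Path, remote_manifest: list[str]) -> list[int]:
    # The cloud side would publish its chunk hashes as a manifest; only the
    # chunk indices returned here would need to be downloaded.
    local_hashes = chunk_hashes(local)
    length = max(len(local_hashes), len(remote_manifest))
    return [
        i for i in range(length)
        if i >= len(local_hashes)
        or i >= len(remote_manifest)
        or local_hashes[i] != remote_manifest[i]
    ]
```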

All in all, it could be a big win, but until someone proves it, we can’t know for sure.

Debugging in the Field

Developing in-house game tools presents a myriad of debugging issues. You can’t always nail down bugs to reproducible steps (if you even have QA resources to concentrate on that). Frequently content creators will complain about rare issues that force them to reboot the tools or use bizarre workarounds when things go wrong. Remote debugging works in some of these scenarios, but is mainly useful for debugging crash bugs. Errant “drag and drop” or “click and drag” problems require sitting at the machine to deal with properly.

In these cases it’s handy to be able to deploy a debugger onto the user’s machine so you can dive in and see where your code is going wrong. To be successful at this you need a few things: the debug symbols from the compile, the source code, and a debugger.

On Windows the debugging symbols are separate files from the executables. PDB files contain the information debuggers need to map addresses of code and data in a running tool to their source code counterparts. In Visual Studio, PDBs are only generated in the Debug configuration by default, so assuming you distribute something like a Release build to your users, you will need to turn on PDB generation in that configuration. It’s under Linker… Debugging… Generate Debugging Info; set it to Yes (/DEBUG). When you prepare and publish your tool set, make sure to include these PDBs with the executables (EXE and DLL files).

PDBs can get quite large, so it may be a good idea to not always pull down PDB files when users get the latest tools. Insomniac’s tools retrieval script has some command line flags to pull down PDBs and code only when we know we want to debug something on a user’s machine. Using -pdb will get the executables and PDBs, and -code will get the executables, PDBs, and source (all from a label populated when the tools executables were checked in).
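
As a rough sketch of how flags like that might be wired up (the share path and file layout here are invented, not Insomniac’s actual setup):

```python
import argparse
import shutil
from pathlib import Path

# Hypothetical layout on the tools share: executables, PDBs, and source
# captured from the label used when the tools were published.
TOOLS_SHARE = Path(r"\\tools\published\latest")
LOCAL_ROOT = Path(r"C:\GameTools")

def retrieve(want_pdbs: bool, want_code: bool) -> None:
    patterns = ["*.exe", "*.dll"]
    if want_pdbs or want_code:
        patterns.append("*.pdb")
    LOCAL_ROOT.mkdir(parents=True, exist_ok=True)
    for pattern in patterns:
        for source in TOOLS_SHARE.glob(pattern):
            shutil.copy2(source, LOCAL_ROOT / source.name)
    if want_code:
        # Pull the matching source snapshot next to the binaries so the
        # debugger can find files without extra path mapping.
        shutil.copytree(TOOLS_SHARE / "src", LOCAL_ROOT / "src", dirs_exist_ok=True)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Fetch published tools binaries.")
    parser.add_argument("-pdb", action="store_true", help="also fetch debug symbols")
    parser.add_argument("-code", action="store_true", help="also fetch symbols and source")
    args = parser.parse_args()
    retrieve(args.pdb, args.code)
```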

Once you have the PDB and code on the machine you just need a debugger to dig in with. On Windows you have a choice: the Visual C++ Express Editions or WinDBG (from Debugging Tools for Windows). Both are free to install, so you aren’t bending any license agreements here. Visual C++ should work pretty much like you’d expect from your development box, but can take a while to install and patch to the latest service pack. WinDBG, on the other hand, is very quick to install, but takes a little getting used to. Typically you must show the UI panels you want to use (Callstack, Memory, etc.), and potentially set the PDB search path manually (from File… Symbol File Path). It’s a very different experience, but it’s so quick to deploy it may be worth checking out.