Some Code Organization Patterns

Lately I am settling into a new job over at Neversoft. There are some awesome folks over there, and I am really enjoying it so far. Along with starting a new job comes learning a completely different codebase. This can be especially arduous for tools folks since tools code typically sits atop a mountain of engine, pipeline, and foundation code.

In trying to wrap my head around an entirely new chunk of tech, I keep re-discovering patterns that make it easier to get your bearings on a lot of new code quickly. There are lots of these patterns that studios follow when organizing their code, and following them can make it easier to dive in and start getting work done (or just make getting work done easier in general). Some or all of these may be obvious to experienced engineers, but I figure it never hurts to reinforce best practices, and you never know when someone will have the total opposite opinion for really interesting reasons.

Maintain just a handful of high level solutions so it’s easy to gain a grand perspective.

The lower the solution count in your project the better. Ideally they should all be in the top level folder of your code. The key here is to create awareness of the major chunks of technology in your project. I think most people agree that the bar should be low for any engineer to get in and look at tools, engine, or game code. The more you hide solutions within your code tree the more arcane knowledge is required to even know who the major players are in your codebase.

Direct all compiler output to a single folder.

Nothing hurts broad searches more than having large binary files mixed in with the source you are trying to search. It’s probably the reason why Visual Studio has preconfigured laundry lists of source code file filters in its Find in Files tool. If you redirect all your compiler output to its own root folder then broad searches get orders of magnitude faster since they don’t have to wade through compiler data.

If your compiler output is directed to a separate dedicated folder then doing a clean build is just a simple matter of destroying the output folder and re-running your build. Explicit cleans are just slower, and it’s just easier to delete a folder when scripting things like build server operations.
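As a rough sketch of what that looks like in a build script (the function name, folder layout, and build command here are placeholders I made up, not anything from a particular studio's pipeline):

```python
import shutil
import subprocess
from pathlib import Path


def clean_build(output_root: Path, build_command: list[str]) -> int:
    """Delete the dedicated output folder, then run the build from scratch.

    Assumes every compiler and custom-build-step output lands under
    output_root, so removing that one folder is a complete clean.
    """
    if output_root.exists():
        shutil.rmtree(output_root)
    output_root.mkdir(parents=True)
    # build_command is whatever starts your build (hypothetical here).
    return subprocess.run(build_command, check=False).returncode
```

A build server job could call something like clean_build(Path("build_output"), ["your-build-tool", "YourGame.sln"]) and fail the job on a non-zero return code.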

Code generated via custom build steps counts as compiler output too! Add your output location as an include path and #include generated code, even c/cpp files. Doing this keeps a very clear distinction between generated code and code which belongs in revision control (and hopefully you aren’t storing generated code in revision control!).

Keep 3rd party library code and solutions separate.

A big part of effectively searching through your codebase is being able to differentiate your code from external library code. Littering 3rd party libraries in with your own code can muddle search results.

Frequently it’s not necessary to clean build both 3rd party code and your project code, so having separate solutions can save time. It also makes performing search and replace safer when the solution only has your project code in it (you don’t want to search and replace within a 3rd party lib, do you!?).

Install large 3rd party SDKs directly onto workstations.

Revision control isn’t the only software delivery mechanism on the planet. Nobody should be making changes within the CellSDK, DirectX SDK, or FBX SDK so they shouldn’t be checked into revision control. These packages tend to be very easy to script for unattended installation (msiexec). This makes it easy to write a simple SDK checkup script to make sure that any given client (even a build server) has the latest kit installed.

Most large SDKs have environment variables that make them easy to find on the system, and even if they don’t you can typically assume where they should be installed. If they are missing it’s a simple thing to track down and install them (even for junior or associate engineers). Also, it never hurts to add compile asserts to validate that the code is being built against the correct version of those libraries.
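A minimal sketch of such a checkup script might look like the following. The function name and the SDK-to-variable table are my own assumptions for illustration; DXSDK_DIR is the variable the DirectX SDK installer sets, but check what each of your SDKs actually exports. This only verifies that the SDK is present, not that it is the latest version.

```python
import os
from pathlib import Path


def find_missing_sdks(required: dict[str, str], env=None) -> list[str]:
    """Return the names of SDKs whose environment variable is unset
    or points at a folder that does not exist.

    required maps a display name to the environment variable that the
    SDK installer is assumed to set.
    """
    env = os.environ if env is None else env
    missing = []
    for sdk_name, variable in required.items():
        install_dir = env.get(variable)
        if not install_dir or not Path(install_dir).is_dir():
            missing.append(sdk_name)
    return missing


if __name__ == "__main__":
    # Example table; extend it with the SDKs your project needs.
    for name in find_missing_sdks({"DirectX SDK": "DXSDK_DIR"}):
        print(f"Missing SDK: {name}")
```

Run on workstations and build servers alike, the list it prints is the to-do list for whoever (or whatever unattended installer) brings that machine up to date.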

If you happen to develop on a system with a package manager, they are awesome for making it easy to pull down 3rd party libraries directly off the internet. Microsoft’s CoApp project aims to do just that on Windows.

Only check in binaries of what you cannot easily compile.

The fewer compiled binaries you check in the better your revision control will perform, and everyone you work with is served better when revision control works well. Source code is much quicker to transfer and store on servers and peers. Not checking in compiled binaries means less waiting for transfers, less locking for centralized servers, and less long term size creep for distributed repositories.

Checking in built versions of libraries will create a headache for yourself in the future when you want to deploy a new compiler or support a new architecture (which will require you to recompile using a bunch of crusty project files that haven’t been used in months or years). It’s always worth a little extra time when adding a new external library to take command over your build configuration management. Sometimes this can involve making your own project files instead of using ones that may be included with the library source code. High level build scripting tools like Premake, CMake, and boost::build are worth spending time to learn, and can make hand-creating IDE-specific projects seem archaic. If updating external libraries in your engine is easy you will do it more often, and hence reap the benefit of more frequent fixes and improvements you don’t have to do yourself.

This article was also posted to AltDevBlogADay.

Common Problems: Preserving Atomic Changes When Checking In Builds

One of the things I’d like the Toolsmiths to be is a place where we can discuss our common problems, and hopefully come up with common solutions. Toward that end, I’m starting a new series on the blog called “Common Problems”, and I’m kicking it off with something that I’ve seen as a common problem recently.

We all know the benefits of having continuous integration and / or nightly builds. What I’ve found to be problematic, though, is when distributing that build to other members of the team means checking the build into source control, specifically when it is checked in to the same directory that other team members use to do their work. This setup is beneficial in many ways. This directory, we’ll call it the “data” directory, is basically a snapshot of the project. Team members pull from that directory and it has the most recently compiled executable plus all configuration, data, and art files needed to run the game. They can then easily change anything in the directory, test, and commit. It’s quick, easy, and painless, for the most part.

Generally artists and designers only check out the “data” directory, make their changes, and check back in so that everyone can partake. If they’re good artists and designers, they make sure that their changes work before checking in, and everything they’ve worked on becomes an atomic commit in any modern source control system. Since they’re not editing the executable, these changes almost always remain atomic.

Coders, however, check out both the “data” and the “code” directories. They will frequently edit the code and the data to get something working, and, after testing, they will then check in both directories atomically. However, here’s the problem: there is a period of time between when the coder checks in new code and when the build machine checks the executable built from that code into the data directory. During this time there is a disconnect between the executable and what’s in the data directory. In the best case scenario this doesn’t affect the team in any significant manner. Worst case, the game will crash because expected data has changed or been removed. Again, best case here is that someone realizes this is just a disconnect in the data and waits for the next build. Worst case, an erroneous bug gets created that someone actually spends time trying to solve.

I’ve tried to come up with possible solutions for this, but only have half answers:

  1. Do not build continuously, and instead have programmers check in builds whenever they change the executable. This can be accomplished by setting the target directory to your data directory. The down side of this is that, on large teams, it would be a race to check in your executable before others. In addition, a careless coder could stomp out another’s executable changes. This would be hard, but not impossible.
  2. Hold checkins to the data directory that modify code until the build is complete, and then check them in. This can be problematic because if the same data changes while the build is working, the source control server will reject the change. Furthermore, coders that pull during this time will get the code, but not the data. This is also extremely hard to implement.

What are your solutions for this problem? Do you have this problem? Why or why not?

Building on the Cloud

Over the past few years, cloud computing has become the next big thing for enterprise software.  The ability to easily scale resources to meet the needs of the end users cheaply is very attractive.  Amazon, Sun, Google and now Microsoft (among others) are all offering cloud computing solutions.  I’ve recently been playing around with AWS (Amazon Web Services) to see what you can do with this technology, and I can already see a few ways it could be applied to games.

Running games on the cloud is an obvious use of these resources.  Need a game server accessible from anywhere in the world?  Start one up on a virtual server.  The ability to build machine images (AMIs on Amazon), complete with your own software running on operating systems like Linux, OpenSolaris, or even Microsoft Windows Server gives you that possibility for pennies a day.

But where cloud computing could really come in handy is in game development.  Imagine starting a build distributed across the cloud, in which thousands of virtual machines simultaneously start processing individual bits of data.  You might see builds going from minutes or hours to just a few seconds.

And the cloud isn’t just for processing either.  Some companies offer services for managing data that would traditionally reside in a relational database, as well as file storage services.  You could even use your own machine image running some flavor of SQL.  With that capability, why not store assets in the cloud?  An asset control vendor could use the software as a service (SaaS) model for asset control, supplying developers with web and client based views into an asset database on the cloud itself.

The big problem here is that we’re trading bandwidth for processing power and flexibility.  The build process may take a few seconds, but retrieving the results to local machines may eat up every bit of build-time savings and then some.  We may see overnight builds turn into overnight downloads, and that’s no savings at all. 

BitTorrent file serving (available on AWS) may be useful as a build distribution model, but with most users on a single network, it doesn’t seem likely to make a difference.  Limiting the download process to necessary files only is simply the flipside of building necessary files only, so may also offer little in the way of savings.  Doing a bit by bit comparison of files built on the cloud, and downloading just the file differences, may be a way to reduce the download time, assuming there are chunks of data in a binary file that remain constant between builds.  Other optimizations almost certainly exist.
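To make the "download just the differences" idea a little more concrete, here is a small sketch that compares fixed-size chunks by hash.  Everything in it (the function names, the 64 KB chunk size, the choice of SHA-256) is my own assumption for illustration; no cloud provider's API is involved.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your data


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Hash each fixed-size chunk of a built file."""
    return [
        hashlib.sha256(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def changed_chunks(local: bytes, remote_hashes: list[str],
                   chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indices of remote chunks that differ from the local copy
    (or do not exist locally) and therefore need downloading."""
    local_hashes = chunk_hashes(local, chunk_size)
    return [
        index
        for index, remote_hash in enumerate(remote_hashes)
        if index >= len(local_hashes) or local_hashes[index] != remote_hash
    ]
```

The cloud side would publish the hash list for each new build, and each workstation would request only the chunk indices returned by changed_chunks.  Note that fixed-size chunks only help when data stays at the same offsets; a few bytes inserted near the start of a file shifts every later chunk, which is the problem rolling checksums (as used by rsync) are designed to get around.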

All in all, it could be a big win, but until someone proves it, we can’t know for sure.

Rethinking Asset Control

Many of the available source control solutions out there are great if you are a programmer.  Both Subversion and Perforce adequately handle the storing of assets, but neither is very friendly to creative types.  How often do “bad checkins” happen because some new and obscure file created on the user’s machine didn’t get added?  Or maybe the user didn’t get latest, merge the data, build the game and test it one last time before checking everything in. 

Team sizes are increasing.  So are the assets, themselves.  The more users stressing the system, the more fragile it becomes.

NxN had the right idea with Alienbrain but never really got anywhere due to serious technical issues with their back-end.   It’s been a few years since I used it last, so they may have fixed a lot of those problems.   Anyway, it also had some very nice features you don’t get in other source control solutions.  You could easily redesign the whole interface (it was mostly HTML and JavaScript as I recall), and they included a feature that was very art-centric: previews.

You could generate previews of assets and view them right in the Alienbrain interface.  It was a very slick feature and a selling point of the software.  Finally, a user could see a preview of a model or texture (and many other asset types) without doing a get and opening the files in Maya or Photoshop, etc.  That’s a real time-saver if you don’t remember the filename that was used for a specific asset.  You have the chance to browse all the assets of that type and find the one you want pretty easily.

Like I said, though, NxN had its share of troubles.  Still, I believe we can do better than the source control status-quo.  I imagine an asset database solution that integrates with every asset generating tool, as well as the build process, generates a preview for each asset (even if it’s a bitmap that says “Preview Not Available”), and is searchable by its meta-data, including tags, creator, last modified, and so on. 

The classic view of assets as a collection of files inside of folders, with users having to know exactly what files need to be checked in and out of source control when changes are made seems a little antiquated.  Instead of searching through folders ten layers deep, how about using a tag cloud to find assets instead?
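As a toy illustration of tag-based lookup (an in-memory sketch with made-up class, method, and asset names, not a description of any existing asset control product):

```python
from collections import defaultdict


class AssetIndex:
    """Minimal in-memory index mapping tags to asset names."""

    def __init__(self):
        self._assets_by_tag = defaultdict(set)

    def add(self, asset_name, tags):
        """Register an asset under each of its tags (case-insensitive)."""
        for tag in tags:
            self._assets_by_tag[tag.lower()].add(asset_name)

    def find(self, *tags):
        """Return the assets that carry every requested tag, sorted by name."""
        if not tags:
            return []
        matches = [self._assets_by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*matches))

    def tag_cloud(self):
        """Return how many assets use each tag, the raw data for a tag cloud."""
        return {tag: len(assets) for tag, assets in self._assets_by_tag.items()}
```

With assets registered as index.add("hero_walk.anim", ["character", "animation", "walk"]), a call like index.find("character", "walk") narrows things down without anyone needing to know which folder the file lives in.  A real system would keep this in a database alongside creator, modification date, and preview data.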

I imagine being able to open a web-based interface, searching for an animated character from an old project and clicking a button to copy it to a new project, including all of its vertex, texture and animation data and using it as the starting point for a brand new character, or maybe just as a placeholder until a new character is created.  How many walk cycles does one studio need to recreate every time a new project is started, anyway?  Why not take something you have and modify it to fit a new character in a completely different game?

I really believe that asset databases are the wave of the future for game development.  When the Xbox360 and PS3 came along, team sizes doubled, and assets got bigger and more complex.  What’ll happen next time there’s a hardware revolution?  We need to streamline the way we manage assets, or else it’s going to bite us in the ass… even more.

Sharing code with p4share

Recently Insomniac Games has expanded to include a second studio in Durham, NC. Durham has its own Perforce server instance to support engineering and asset production for its titles. While the Core Team (engine and tools engineering) is still located in Burbank, Durham has a small group that adds features and improvements to help get their games done. Until recently we got by okay with Durham taking code drops from Burbank, but we needed something better. We needed a way to share code bidirectionally.

Unfortunately Perforce was not designed as a distributed revision control system, so we needed to come up with our own solution. We needed to allow sharing code across Perforce server instances. p4share is a Perl script I wrote to help solve this problem without involving a huge list of complicated manual steps.

To get the job done it does a lot of deleting, syncing, and copying of files on the local client… nothing too exciting. I was, however, able to streamline the process in an interesting way given Perforce’s ability to open a file for edit at the client’s have revision (as opposed to the head revision). When you open a file for edit at the have revision, all of the changes that have been made in subsequent revisions must be resolved into your edits before you submit your changes. This resolve step is only necessary when your have revision does not equal the head revision when checking out a file, or when the file in question allows multiple checkouts and someone else edits and submits changes before you can submit.

Given the ability of Perforce to open a historical revision for edit, I was able to make p4share less likely to lose edits on files that have changed on both servers. p4share uses a label to store the revision at which each file was last shared. When sharing happens again in the future, the client is synced back to the revision that was submitted the last time files were shared. The files are then opened for edit at that historical revision and overwritten with files from the other server. In this state any file that has changed on both sides will require resolution to submit, but the resolving mechanics of Perforce have all the information they need to do automatic resolution (two versions of a file and a base revision… which in this case is the revision labeled during the last share session).
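To see why the base revision matters, here is a deliberately simplified, whole-file sketch of a three-way decision. This is not p4share's code and not Perforce's actual resolve algorithm (which merges text at a finer granularity); the function name and return strings are mine.

```python
def three_way_decision(base: bytes, ours: bytes, theirs: bytes) -> str:
    """Decide what to do with one file given the last-shared (base)
    revision and the current contents on each server."""
    if ours == theirs:
        return "nothing to do"   # both sides already agree
    if ours == base:
        return "accept theirs"   # only the other server changed the file
    if theirs == base:
        return "accept ours"     # only this server changed the file
    return "merge"               # changed on both sides since the last share
```

Without the base there are only two differing versions and no way to tell which side made the change, so one of them gets stomped. With the labeled revision as the base, only files that truly changed on both servers need a merge.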

Using this technique is a win because it removes the possibility of stomping files and losing changes on either side when manually merging changes from both servers.

You can find p4share on Nocturnal Initiative’s Perforce server: nocturnal.insomniacgames.com:1666 at //Source/Trunk/p4share/p4share.pl, and via p4web here.

Perforce Search Tool

Today, we have an interesting tool written by Toolsmiths reader Eddie Schooltz.  The tool, which he wrote about on his blog, searches Perforce changelists, and I can see how it would be incredibly useful (provided you have good changelist descriptions) for finding out when things changed.

Eddie says he’ll be posting the source to the tool, and we’ll keep you updated about when that happens.