Vessel Post Mortem: Part 1
I started looking at Vessel in January 2013 - initially just in my evenings. In my ‘spare’ time. I was looking at it because John and Martin (the founders of Strange Loop Games and the creators of Vessel) had asked me to consider porting it to PS3. We knew each other from our time together at Pandemic Studios in Brisbane where I had worked closely with Martin in the Engine team while John worked as an AI programmer. I was keen to do the port but I wanted to be as sure as possible that I knew how much work was involved in it before committing.
I found the idea of porting such a game daunting - it had a custom rendering engine, a bespoke fluid physics system with major parts of the game code written in Lua - but I figured it was time to step up and take on such a challenge. I have an immense amount of respect for John and Martin - they are both incredibly smart coders and I relished the chance of working with them again. I wanted to do it but I didn't want to let them down. They were going to be my clients, but they were also friends.
I started by looking at their code - in order to give a quote on the porting process I needed to have the game running so that I could profile it. Vessel had already been partially ported to the PS3 by a third party, but wasn't compiling when I first got my hands on it. So I spent a few weeks of long evenings fixing the FileIO, the template compilation errors (GCC and MSVC disagree on some of the finer points of templates), and just commenting out anything I didn't think was relevant in order to get the game up and running. Once it was working, I ran my trusty profiler over the game and tried to guess how much time it would take to get this baby to a shippable state. What I found was a little scary.
Visualisation of the code changes made during the porting of Vessel to PS3.
Vessel runs predominantly on three threads. The game thread which runs all the Lua scripts which control the majority of the game logic and in game objects, the render thread which takes the output of the game thread and converts it to a format that can be fed to the SPUs which then build the graphics command buffer and feed it to the GPU in parallel, plus the physics thread which manages all the physics processing - fluid dynamics, rigid body collisions, everything. John had built the physics system himself and it was pretty impressive (and complicated). It was too much to grok initially, so all I could do was look at the game’s performance on some of the worst levels, profile it there and see how much I needed to speed it up.
I investigated a few of the levels on the PS3 build and picked two that I thought were the worst. What they told me was that the physics thread was the primary cause of slowdown and so I concentrated my analysis on that thread. The game was running at about 60ms per frame in the worst levels I was testing - about 15 to 20 frames per second. From discussions with Strange Loop, I knew that the physics thread had to run at 60fps so this meant that I needed a factor of 4 speed up. From experience, I knew that this was something that was achievable. The game thread was running at around 30ms per frame and the Render thread was clocking in at around 5ms per frame. SPU rendering was pretty fast, and I figured that wouldn't be much of a problem.
Months of fannying about took place and a contract was finally signed in June. So I did the only appropriate thing one could do in that situation - I did two weeks of work on Vessel and then went on a 6 week family holiday to the UK.
In anticipation of the Vessel contract I’d hired an experienced programmer, Ben Driehuis. Ben came highly recommended and had worked on the Bioshock series of games, plus, most importantly, he was living in the same city as me. Having spent the last 6 years working remotely I knew that my best bet of delivering this game on time was to be in the same physical space as my team. I threw Ben in the deep end. He was hired and then given work on platforms and engines that he’d never had to deal with before. Then when I buggered off to the UK he had to manage by himself, dealing with a performance contract on a big name title and then kicking on with Vessel in my absence. Poor bugger.
I also had a friend who was interested in the QA side of things, so James (an old Uni mate) was responsible for playing Vessel over and over and over again, reporting bugs to us and telling us that the performance improvements we’d just made hadn't made any difference. James also helped us out on the testing for the TRC (Technical Requirements Checklist - a massive list of tests that need to be checked off in order to get your game through submission to Sony) and the actual submission of the game.
Once I got back from my holiday we moved into the new office and started working in earnest on the Vessel port. Our goal was to finish Vessel by the end of August - we had about four man-months to do the work, and I had carefully calculated how much time each section of the port would take. Surely that was enough time?
Ben and I divided up the work between us: I took on the optimisation of the physics and Ben was to deal with pretty much everything else - audio was broken, some of the in game objects didn't animate, loading needed to be optimised, memory usage had to be reduced, we needed save games and trophies and a whole swathe of PS3 specific features implemented. Ben has written about some of the fun he had during the port here, here and here.
From my initial inspection of the code, I had noticed a lot of memory allocations taking place, so I was optimistic that I could reduce (or remove) these and scrape back a lot of frame time. I also figured that rearranging the memory usage to be more cache friendly would help the PlayStation 3’s main processor so I could leave the core of the physics code alone. The last thing I wanted to do was to port all of the physics to SPU.
Three weeks later I started porting the physics to SPU. The changes I was making were incrementally improving the performance, but I could see that we’d never hit the speed we needed without dramatic changes. Porting code to SPU isn’t like porting code from one platform to another. You can program SPUs in C++, but they have very limited memory available to them - only 256kb each. So, in order to port any significantly complex task, you need to understand how that task uses memory - where it is, what order it reads it, where it writes it - the things that most programmers of high level languages take for granted. Not only that, but you have to consider how the task will parallelise. Running on one SPU is good, but running on 6 is awesome. (I’ll go into more detail on the SPU porting process in later articles.)
The first SPU task I built was one that looked for drops that were near a set of objects. My first shot at this shaved almost 2ms off the frame time - the function in question dropped from a peak of 2.4ms to 0.6ms! That meant that in order to hit 16ms per frame I just needed to do this 21 more times!
Still, Ben and I kept hammering away at the code. By the end of the first week in August I’d shaved 15ms off the Physics thread and yet that thread remained the bottleneck. Ben had freed up over 200MB of memory and we were close to running on a retail PS3. But we’d started to notice a few things - some of the levers and triggers in the game weren’t working, fluid shaders weren’t working properly, we were getting spikes of over 200ms in some cases and the game still wasn’t playable due to not just the levers not working but the AI of the fluros seemed to be broken (some would teleport away).
It was evident at this point that we weren’t going to hit the end of August deadline. Other projects had taken some time from both Ben and myself, so we still had billable time available going into September, so I let Strange Loop know that things were going to take a little longer than I had expected. They were very understanding and comfortable with the release date pushing out a little more.
Things were progressing well though - by the end of August the physics thread was running at better than 33ms per frame - we were over halfway there! Save games were in place, loading was optimised, and trophies were functioning. Unfortunately, the game thread was slowing down - as Ben was fixing problems with the code and data, more things were being processed. For example, audio wasn’t working before and now there was a lot of audio being triggered so this obviously took more processing time.
It was the second week in September when I realised that I’d made a horribly flawed assumption about the game thread in Vessel. I discovered that it had to function in lock step with the physics thread and so it too had to run at 16ms per frame. Reducing the framerate of the physics wasn’t an option as it was fixed to 16ms per frame - experimentation with varying that resulted in fluid collisions breaking horribly.
So, at that point we had the physics running at sub-30ms per frame, and the game thread running at 30ms and above. We were close, but we still had a long way to go - we now had to figure out how to optimise the Lua heavy game thread down to 16ms per frame. Actually, it had to be faster than that - as the PS3 has only two hardware threads, the 3 software threads have to share those two hardware threads which meant that the game thread plus the render thread had to be faster than 16ms per frame.
Then Ben discovered the reason for the teleporting fluros - a compiler bug triggered only in optimised builds was causing fluros to incorrectly calculate their trajectory during jumps, making them disappear. Turning off optimisation for that function fixed the problem. That was a tricky bug to find, so we were pretty stoked - until we realised that this meant that the number of fluros in view was now higher that it was before. Much higher. More fluros means more processing - more physics, more drops, more rendering, more audio. This one fix resulted in the Physics thread jumping back up to 50ms from around 24ms, the render thread jumping up to 12ms and the Game thread climbing to 40ms.
I was horrified. We were effectively back to where we had started. We now needed to save a total of 60ms over three threads. I wasn't sure we could do it. It was like that bit in horror movies where the protagonist has killed the evil monster, only to have it rise up, stronger than it was before.
The initial optimisations on a project are the easiest. As time goes on and all the low hanging fruit has been picked, the remaining optimisations are harder and harder to perform. We really had our work cut out for us - and we'd come too far to go back now.