A Profiling Primer
The most interesting thing about this port is the optimisation feat ahead of us. Vessel was already (mostly) running on the PS3 at the start, but needed to run much, much faster. So, where does one begin? The first thing you need to do, is profile it - if you don't know where the program is slow (or why) then you can't optimise it. But in order for me to be able to talk about profiling and optimisation, I'll need to delve (briefly) into the profiling tools on the PS3 (If you're an experienced PS3 dev, then your probably familiar with this information - skip ahead, I won't mind). Here's a screen capture of part of the tool I use for PlayStation3 profiling (SN System's excellent Tuner - comfortably the best profiler I've used)
Like all the best profilers, it has lots of colourful bars (the PS2 Performance Analyser was awesome like that too). Colourful bars are crucial for dazzling and confusing artists who may wander by and happen to glance at your monitor. The top of the screen has a time line in seconds - you can see it ends at 0.073, or 73ms. Yup, that's right, 73ms. (Note that in this blog I will almost always use milliseconds rather than frame per second. Frames per second is a useless metric only used by teenage boys who like to put lights inside their PCs).
The display is broken up into roughly 3 parts that we can see here. Each group of horizontal bars corresponds to a thread (click on the image to see it in higher resolution in another window). The top thread is the Physics thread, the next the Game thread and then the Render thread.
The top, segmented bar is the overall thread usage. When the thread isn't running (like at the start during the stall for sync) you'll see a gap or a bar indicating an OS function (like a mutex or even fileIO). The colourful collection of bars beneath that are the result of marking up important functions in the code base with calls that tell the profiler when the function starts and stops. The longer the bar, the more time it takes. Bars that appear under other bars indicate function calls from that parent. For example, WaterTime calls MoveWaterFromInternalPressure which calls ApplyViscosity and the functions that call DoubleDensityRelaxation and then ProcessSkeletons. It is a very powerful way to view the flow of your code and I often find myself using Tuner just to see how a foreign codebase flows. For those of you out there who don't have access to the PS3 devkits or Tuner, you can use Chrome Tracing to read the output of your profiling in JSON format and display it in your browser (assuming of course, that your browser is Chrome).
So, basically, the longer the bar the longer the function it represents takes to run. Now, how do we choose which bar to optimise?
Generally, when looking for what to optimise first, I try and find something easy, something obvious. Occasionally you'll get lucky and make a simple fix that will save milliseconds - check your compilation options (you wouldn't believe how often people miss one library/project), check that your code is inlining properly (check the generated assembly), look for lots of string based functions as well as intra frame allocations - basically make sure that everything you're seeing makes sense and that there are no piles of stupid lying around. There are two ideal candidates for optimisation: One function that does a lot of processing or one or more small functions that are called a lot.
There is another feature of Tuner (and most other good profilers) and that is Program Counter sampling. Program Counter (PC) sampling is done by the profiler reading the program counter register to get an indication of which line of code is being executed at that time. The more times a particular line is sampled, the more often it is being executed - on PC the free Very Sleepy does a good job of this. You'll either get a hierarchical view of the functions being called, or a flat view of what is being called when - like in the following image:
The red and blue marks correspond to samples that are performed during the frame. On the left you can see the names of the functions that the samples have occurred in. In Tuner you can also drill down into the actual functions and see which lines and even which assembly instructions have samples attributed to them. This allows you to find hotspots - sections of the code that are called a lot and contribute heavily to the cost of each frame.
Tuner has a lot of additional functionality that I won't go into here - this should be enough to help you to understand captures that myself and Ben will post over the upcoming weeks.
In my next post I'll be looking in detail at the optimisation of one of the sections of the Physics thread. I'll just say this; my initial hope was that I could gain a lot of performance back by simply rearranging memory and minimising memory allocation in the physics code. That hope was very quickly and resoundingly dashed - this wasn't going to be an easy task.