- Reference Benchmarks :.
Cross Platform DAW Performance - Part I :
Hard to believe it's been over 3 years since I last did a comparable shootout between O.S's for DAW use. The last series of reports was focused purely on Windows, with the release of Vista and the minefield that needed to be navigated with Microsoft's then shiny new toy.
This time I am donning my helmet and goggles and tackling an area that for many is a no-go zone. Make no mistake - the dreaded platform wars are still alive and kicking, and nothing brings out the best and worst in the tech community like a good ole Mac v PC, Windows v OSX debate.
With Microsoft and Apple rolling out their much hyped new toys, Windows 7 and OSX 10.6 (Snow Leopard), both promising a faster, leaner and performance optimized experience for end users, it was time to ride the whirlwind and do some cross platform head to head DAW performance testing.
First off, a quick look at what each new O.S is touted to have brought to the table that could be of benefit to our specific area of DAW performance.
|Windows 7 - Leaner, Meaner, Faster - Take I :
Vista was definitely not a shining moment for Microsoft; for many in the PC DAW community it was simply too bloated, lethargic and resource hungry to justify the move from their tried and true XP/XPx64.
Windows 7 Rev 1 is definitely a different beast to the first rev of Vista: it's less resource hungry, faster out of the box and delivers comparable performance to its leaner siblings XP/XPx64 in regards to DAW performance, something that Vista failed to deliver even after 2 major service packs.
Windows 7 is not a huge departure from Vista. Yes, it is definitely leaner and faster out of the box, which gives an immediate impression of it being better than its predecessor, but it isn't that different a beast.
The performance improvements in Windows 7 are, IMO, a direct result of the work that Mark Russinovich has done with streamlining and optimizing the kernel, and of improvements in multiprocessor/threading scheduling.
Windows 7 has a substantially smaller memory and resource footprint than Vista and, although nowhere near as lean as XP/XPx64, is far more refined in regards to memory management - which works well with the larger memory resources available in the current 64 bit environments that we are moving to.
In short, it's what Vista should have been out of the gate, and is more evolutionary than revolutionary, despite all of the hype.
|OSX 10.6 - Leaner, Meaner, Faster - Take II :
Apple have been refining, tweaking and bloating OSX for over a decade, with each upgrade traditionally being packed with more and more features, gadgets, baubles, etc. 10.6 however was a departure from that familiar M.O in that it is focused purely on streamlining and performance.
10.6 is claimed to be leaner, meaner and faster than its predecessor, finally with full 64 bit support - gasp - and better multithreading and multiprocessor support.
Apple and their fans certainly capitalized on Vista's shortcomings, from the endless and often mind numbing stream of "I'm a Mac, I'm a PC" ads in the media to the endless debates across forums etc. They were having a field day, and rightfully so.
With Microsoft backed into a corner and promising huge performance improvements with Windows 7, Apple no doubt realized that they needed to get their ship in order as well, and focused their attention on doing pretty much exactly what M.S was doing to Vista: trimming the fat and optimizing the internals for better overall performance.
OSX 10.6 definitely delivered, but again it is more evolutionary than revolutionary, no matter how much spin Apple applied.
|Preparing for Battle :
This latest round being cross platform set some new challenges to navigate in regards to reference computer hardware. I could have easily used a Mac Pro and bootcamped into Windows, but there were some serious questions being raised in regards to scaling issues that were attributed to clocking and thermal arbitration on the official Mac hardware, and these were impossible to resolve given the closed nature of the EFI used, as opposed to a BIOS.
On a 3rd party motherboard with a BIOS, thermal and clocking states like C Halt, EIST and Turbo Boost can be easily disabled. This was not an option on the Mac hardware, so I decided to tackle the challenge from the other direction.
My tried and tested X58 / i7 920 is close to identical to a single processor Mac Pro running the Xeon variant of the chip, the W3520: identical clock speed, same motherboard chipset, same memory speed, etc.
The main difference is that I have full access to the required thermal and clocking parameters via the BIOS, as well as to the Hyperthreading options, which were another issue being hotly debated in regards to scaling performance on OSX, more so than on Windows, where HT was working well.
The challenge of course was to install OSX onto my hardware, and ensure it was performing on par with the official hardware. There are some legal grey areas in doing this, so I am obviously not going to detail how I did it, but the information is easily obtained for those that are interested in investigating the option further.
The initial testing was done with OSX 10.5.8, and the results were cross referenced on a 2009 Mac Pro Dual Nehalem owned by a trusted and respected Cubendo community member, Pal Svennevig in Norway, who helped me enormously during the entire testing phase across both OSX 10.5.8 and OSX 10.6, ensuring that my OSX installs were performing as expected.
There have certainly been times in the past where Pal and I have not seen eye to eye. We have disagreed at some fundamental levels and approaches, but we never disagreed when it came to hard facts about performance and hardware.
With both of us being long term and seasoned Steinberg users, and both having access to the same audio hardware, it was a given that the first cab off the rank would be Steinberg's 2 flagships - Nuendo 4.3 and Cubase 5.1, from now on referred to simply as Cubendo.
I can already hear the howls from the back that the Steinberg application is not as well optimized for OSX as for Windows, and that may well be true, so in the following series of tests I will be using Digidesign's Pro Tools and numerous other apps to try and balance the testing. Of course we will then have arguments over RTAS v VST v AU, ASIO v DAE v Core Audio, and on it goes.
But we need to start somewhere.
DAW Application Details :
|Round One : Windows XPSP3 v OSX 10.5.8 :.
To set the stage it was decided to first do the testing on the outgoing O.S's, to get a baseline result that we could use to quantify the performance improvements - if any - of the newer O.S's. To say the results raised a few eyebrows is an understatement.
My initial reaction on running the tests on my development system on 10.5.8 was that the installation was screwed. I knew there was a cross platform performance variable with Cubendo between Windows and OSX, but I wasn't expecting anything near the results I achieved.
I needed to ensure that OSX 10.5.8 was running to spec on my system, so I commissioned Pal to run the identical test on his 2009 Mac Pro Dual Nehalem system. It was slightly down on clock speed, but obviously had twice the number of cores / threads, so I was expecting a huge performance increase over my initial and possibly flawed results.
The results from the Mac Pro highlighted some interesting and controversial issues. Firstly, my system was performing admirably in comparison, so the OSX install was obviously fine. It was performing so well that it actually outperformed the official hardware at any setting below 256 samples - with 1/2 the number of cores.
If that wasn't enough, the virtual threads being initiated via Hyperthreading were causing further performance issues on OSX at the lower latencies, more so on the Mac Pro than on my Single Quad system. So the more cores/threads, the more OSX's MP scheduling seemed to be falling over itself.
To add insult to injury, XPSP3 completely wiped the floor with both OSX systems. The low latency results clearly highlighted the huge variable in cross platform scaling performance, with and without Hyperthreading, and also showed that HT was scaling extremely well on the Windows platform.
The lackluster performance of the Mac Pro against my development system was indicating that there was more to this puzzle than just the O.S, and this is where it gets controversial, as we get into the area of thermal / clock arbitration - at both the hardware and software level - that I had mentioned earlier.
The further we investigated, the more we discovered that there are some concerning arbitration issues within the Apple implementation of EFI that go a lot deeper than the commonly known less than stellar EIST - Speedstepping implementation within OSX.
The issues are at a very low level and are not being indicated by OSX's inbuilt metering - for what it's worth - and they show that Apple's EFI implementation is not handling the Intel thermal / clocking routines well at all. The end result is that the clock speeds are constantly being ramped up and down even under heavy load, which is severely affecting the scaling potential.
For the most part, the majority of Mac OSX end users have been none the wiser to these arbitration and scaling issues. Many in the Logic community have been scratching their heads over the recent issues they have also been experiencing with the scaling of the latest Nehalem systems, but never really dug deep enough, simply because it was very difficult to do any type of cross platform / hardware comparison.
Interestingly, the initial catalyst for diving further into the cross platform comparative testing was Steinberg's insistence that Hyperthreading on the current architecture was detrimental across the board, which I disputed as HT was working very well on Windows. HT was certainly causing a few more curves on OSX, but that was due to OSX's task scheduler being, hmmm, less than impressive at arbitrating the virtual cores, more so than an inherent issue across the board.
This was something that I was determined to get through to the Steinberg brain trust, so that they would hopefully amend their official line in regards to HT to reflect a more accurate state of affairs.
Time to move on to the 2 latest and greatest, Windows 7 x64 and OSX 10.6.x - Snow Leopard.
|Round Two : Windows 7 64 v OSX 10.6.2 :.
Moving on to the current O.S's, two things were most evident. The first was that the performance of XP v Win7 remained very similar across the board, indicating that despite XP's kernel being quite a few years old, its MP scheduling/scaling was actually very efficient. The second was that the low latency performance of Cubase had improved quite substantially between 10.5.8 and 10.6.x, without a single line of code being changed in the application itself.
This indicated to me that Apple had definitely done a fair amount of work in regards to optimizing and improving the MP scheduling on 10.6, as they had promised. This was most evident on the Dual Processor MP system, which showed huge performance improvements over the results achieved under 10.5.8. My single processor system also showed some substantial improvement, but nothing compared to the results achieved on the dual socket system.
What was also interesting with the OSX 10.6 results on the Dual MP was that for the first time the results with Hyperthreading were showing noticeable improvement, whereas on OSX 10.5.8 that was clearly not the case.
Now all of the above is certainly positive: it shows that Apple have focused on optimizing and improving performance in the areas of MP/MT scheduling, and that further improvement is plausible at the application level as well, so it's all good. However, while comparing OSX 10.5.8 and OSX 10.6.2 in isolation is all rosy, a closer look at the comparison against Windows 7 shows that, despite the measurable improvements on OSX, the performance variable is still huge: the single socket system running Windows 7 is still outperforming the Dual Processor system running OSX.
The question now arises whether that variable is solely specific to the application, or whether the operating system and its associated protocols are also a contributing factor.
Of course this is difficult to conclude until we have tested more cross platform applications, using a variety of driver / plugin protocols, and also investigated the inherent differences between those driver protocols.
In the case of driver protocols we are comparing ASIO on Windows v Core Audio on OSX. Each has its advantages in its respective environment - Core Audio is multi client for example, whereas ASIO is not, but ASIO apparently has far less overhead and is more efficient at low latencies.
|ASIO v Core Audio :
Dealing with the 2 opposing preferred low latency protocols also introduces some other variables into the mix. With OSX we are dealing with the inbuilt generic multi client low latency driver, the Windows 7 equivalent of which would be the WaveRT driver, while in Windows we are dealing with the ASIO driver spec that was developed by Steinberg and has been widely accepted by pretty much all DAW applications running on Windows.
WaveRT would be a fairer head to head, but as I had mentioned earlier, it still isn't being widely accepted by the vast majority of hardware and software developers. The main reason is that it simply cannot match ASIO in regards to low latency performance.
I suspect that there are other fine technical details behind why the protocol is not being embraced, but the simple fact that it still does not outperform the preferred and well established ASIO protocol is more than enough reason for the developers on both sides of the fence not to embrace it.
Core Audio, purely in regards to low latency DAW performance, is an area that is at times a little hard to approach, simply because the competing DAW applications on OSX that use the protocol have wide and varied buffering mechanisms in place.
Logic is a prime example with its Hybrid engine, where the playback buffer is not in any way associated with the actual buffer setting that is set at the Core Audio level for the hardware; instead it has a set playback buffer (1024 - default) that is independent of input / monitoring latency.
Very clever and well executed for the most part, as most Logic users are none the wiser.
Despite the reported I/O latencies in Cubase being very close for the RME reference card across the set buffer settings, the performance variable was something that I wanted to dig into a little further, to see whether the actual protocols were in fact a contributing factor.
Of course my personal testing and observation would only go so far in regards to what could be going on at the lower level, so I approached a trusted contact at an audio hardware developer and asked if there were any significant or obvious reasons he could share as to why ASIO would be outperforming Core Audio in this specific round of tests.
While we were waiting for a fully detailed report from higher up the chain, from those directly responsible for developing and programming the respective drivers, we did manage to discuss one variable that would explain some aspects of the performance discrepancies I was experiencing.
In short, ASIO reportedly makes 3 calls to the O.S per sample buffer whereas Core Audio makes 5, so at any given sample buffer setting Core Audio has a lot more low level arbitration to navigate before it can deliver the audio stream.
This definitely helped me understand that comparative low latency scaling for Cubendo cross platform has more than one challenge to overcome, and why Steinberg are less than open about the fact.
Unfortunately the full detailed report was not forthcoming, and no reasons were given. Because the information re the ASIO v Core Audio arbitration calls was given in a passing conversation, I will not cite my source, and will simply say that the information is plausible but I have no detailed confirmation or evidence to substantiate the claim beyond personal experience that correlates with the information given.
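To put the 3 v 5 calls per buffer figure in perspective, here is a minimal Python sketch of the arithmetic. It assumes the unconfirmed call counts quoted above are correct and that the number of calls scales linearly with the number of buffers delivered per second; the function names and the 48K sample rate are my own choices for illustration, not anything taken from the ASIO or Core Audio documentation.

```python
# Assumption: the unconfirmed figures quoted in the text (3 v 5 calls per buffer).
ASIO_CALLS_PER_BUFFER = 3
CORE_AUDIO_CALLS_PER_BUFFER = 5


def os_calls_per_second(sample_rate_hz, buffer_samples, calls_per_buffer):
    """Calls to the O.S per second for one audio stream."""
    buffers_per_second = sample_rate_hz / buffer_samples
    return buffers_per_second * calls_per_buffer


def call_rate_table(sample_rate_hz=48000, buffers=(32, 64, 128, 256)):
    """Return (buffer, asio_calls, core_audio_calls, extra_calls) per buffer size."""
    rows = []
    for buf in buffers:
        asio = os_calls_per_second(sample_rate_hz, buf, ASIO_CALLS_PER_BUFFER)
        core = os_calls_per_second(sample_rate_hz, buf, CORE_AUDIO_CALLS_PER_BUFFER)
        rows.append((buf, asio, core, core - asio))
    return rows


if __name__ == "__main__":
    for buf, asio, core, extra in call_rate_table():
        print(f"{buf:>4} samples: ASIO {asio:7.1f}/s  "
              f"Core Audio {core:7.1f}/s  extra {extra:7.1f}/s")
```

The ratio stays at 5:3 for every buffer size, but the absolute number of extra calls per second doubles each time the buffer is halved, which would be consistent with the gap showing up most clearly at the lowest latencies. That is an inference from the arithmetic only, not a confirmed explanation.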
|Incremental Benchmarks v Real World Sessions :
There has been criticism leveled numerous times at the benching work I have done over the years, the main qualm being that the incremental benching methodology does not reflect Real World working environments, and in that respect is not relevant for the most part. Of course the question arises as to what exactly a Real World session is, as what is relevant for one user may not be relevant to the next; it is a wide and varied net to try and cast.
Also, trying to obtain accurate, quantifiable scaling results when using Real World sessions is a lot more difficult, as we really don't have an incremental reference we can use, and the ASIO / CPU meters have really not proven to be an accurate medium for quantifying scaling.
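For anyone unfamiliar with what "incremental" means here, the idea can be sketched in a few lines of Python: keep adding a fixed unit of load at a given buffer setting until playback breaks, and report the last load that played cleanly. This is only an illustration of the general approach. The `plays_cleanly` callable, the `fake_system` stand-in and the capacity value of 42 are all hypothetical and made up purely to exercise the function; they are not measurements or part of any actual test suite.

```python
def max_sustainable_load(plays_cleanly, step=1, limit=1000):
    """Return the highest load that still plays back cleanly.

    plays_cleanly is a hypothetical callable (load -> bool) standing in for
    "run the session with this many plugin instances and listen for breakup".
    """
    load = 0
    # Add one increment at a time and stop at the first failure or at the cap.
    while load + step <= limit and plays_cleanly(load + step):
        load += step
    return load


def fake_system(capacity):
    """Stand-in for a real DAW run: plays cleanly up to `capacity` instances."""
    return lambda load: load <= capacity


if __name__ == "__main__":
    # 42 is an invented capacity, used only to demonstrate the search.
    print(max_sustainable_load(fake_system(42), step=5))
    print(max_sustainable_load(fake_system(42), step=1))
```

The point of the incremental reference is that the result is a single count per buffer setting, which can be compared directly across systems, whereas a Real World session only tells us whether it played or not.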
All that aside, the concerns raised in regards to just how closely the incremental results correlate to working environments were enough for us to look into creating a test project based on a Real World recording/mixing session, which we did using a recent track Pal had recorded and engineered for a Blues Rock band.
Both 48K and 96K versions of the session were created and configured so that all plugins used were Cubendo native VST3, and the sessions were padded out with some extra tracks and plugins to the point that both, at their respective latencies (128/96K, 64/48K), had tapped out the Mac Pro Dual Nehalem.
Those sessions were then run on my single CPU system on both OSX and Windows. The results were again quite revealing in regards to cross platform comparative performance.
The initial testing was done on OSX 10.5.8 / XP. With the sessions on the Dual Mac Pro running 10.5.8 at the absolute limit, I was interested to see whether the sessions would play back on my single CPU system running OSX 10.5.8, and if they did, just how well. The sessions loaded and played back at their respective latency settings with plenty of headroom to spare.
This was again highlighting the performance variable between the open hardware and Apple's official proprietary hardware. The sessions on XP both successfully ran at 032 samples, which for the 96K session was quite remarkable.
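To show why 032 samples at 96K is such a tight window, here is a short Python sketch that converts the buffer settings mentioned above into the time available to process each buffer. It covers the buffer period only; the reported I/O latency of a card also includes converter and driver overheads that are not modeled here. The function and list names are my own.

```python
def buffer_ms(buffer_samples, sample_rate_hz):
    """Duration of one audio buffer in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


# Buffer size / sample rate pairs taken from the session settings in the text.
SETTINGS = [
    ("96K session, 128 samples", 128, 96000),
    ("48K session, 64 samples", 64, 48000),
    ("96K session, 32 samples", 32, 96000),
    ("48K session, 32 samples", 32, 48000),
]

if __name__ == "__main__":
    for label, buf, rate in SETTINGS:
        print(f"{label}: {buffer_ms(buf, rate):.3f} ms per buffer")
```

The two reference settings (128/96K and 64/48K) work out to the same buffer period of roughly 1.33 ms, while 32 samples at 96K leaves about a third of a millisecond to process each buffer.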
Moving to 10.6, the 2 session configurations were updated and expanded to the point of collapse on the Dual Mac Pro and then played on my single CPU system. Again the sessions played back without any issue at the designated latency settings, the 48K session even managing to play at 032 samples, so there definitely was a measurable improvement over 10.5.8.
We then moved to Windows 7, which again effortlessly played back both sessions at 032 samples with substantial headroom to spare. Just how much headroom was still available on the sessions is the real question, but one that is not easy to answer.
As I noted earlier, the comparative performance data is a lot harder to accurately quantify when our only reference apart from whether the sessions actually played successfully are the ASIO/CPU meter readings, and although they do give some indication, I have never really trusted them past a quick visual guide, as they do not take into account the fact that playback could easily collapse at moderate ASIO/CPU readings.
What is interesting, and has remained consistent across both of the testing methodologies, is that not only has the open single CPU system performed remarkably well against the Dual Mac Pro, but also that the comparative performance of Cubendo across OSX and Windows remains in favor of Windows by quite a substantial amount.
By using both methodologies in this round of testing, we can at least answer the qualms of those who questioned the validity of the incremental methodology in regards to how it correlates to real working environments. Although the results are not as clear and easily presented using the RW sessions, they are still consistent with the performance variables being experienced and reported cross platform by the incremental methodology.
The results are in no way conclusive in regards to overall performance for each respective operating system, but with the Steinberg product the performance on Windows is obviously far superior on the current revisions.
That is not really going to be a great surprise to anyone who has had experience with the application over the years, but the size of the variable was definitely an eye opener, and is clearly something that Steinberg should be focusing on.
Steinberg are set to release Nuendo 5.0 and Cubase 5.5 shortly, which promise significant improvement on OSX in regards to MP scaling. There were some large % figures being quoted by the reps early in the piece in respect to low latency performance, but of course without any way of knowing how those quoted numbers were quantified there really is no point in even repeating them here.
I will do a follow up report once the new versions have gone Gold, and we can then get an accurate appraisal of the performance improvements achieved.