Monday, May 07, 2012

Tales of the IUnexpected<Int32>

Being curious of the comparison of LINQ and classic actual-code (henceforth referred to as CAC) techniques, I wanted to know how big the gap was. You can tell how exciting my bank holiday Monday has been so far.

Wonder what would happen with strings? Reference types and immutable - good string operations can make average hardware work far better.

Generally I thought the major factor would be an ordinal comparison - or lack of in the LINQ mechanism, for example, the difference between:

There's about a 20% performance [speed] improvement on the optimised route so tried the LINQ version with and without this optimisation but it only seemed to have a very small benefit - probably around 3% tops.

The lhs / rhs values make a difference too - if the comparison finds a difference in earlier characters of the string it won't bother comparing the rest of the string [Assumption: you're doing a case-sensitive comparison]  and if the string length isn't equal it won't bother running the comparison at all, so tried a few combinations of differently cased characters in different positions in the two strings each with the same letters.

Hit the test five times each on debug versions of the exe outside of VS2010 to reduce complications and get an average to make it fairer.

So generation of the source for comparison done on a quasi-random character basis, but so that both method calls would use the same array:


Then use the result for the following comparison:



On the string comparison routines you can only really see the improvement with a string[] of around 1M elements but all the way from 100 elements up to the million the CAC version is faster (by about 100ms at the lM element range) for the same source string. That source may or may not contain the character it's checking for.

I would have expected a bigger difference with the predicate too (commented out in the above example) and both value and reference types to experience the same performance footprint. But would you expect numerical operations to have the same behaviour?

So if we play with performance for a very specific scenario where an array of integers is iterated through, and any element which is even is selected into a result set. So the CAC method is pretty much just a straight for loop (not a foreach):
I purposely created a variable scoped within the loop to hold the value from the particular slot rather than access the array element twice.

And the LINQ operation is a simple from object query:
The IL for the different mechanisms was pretty varied too. Will come back to that.

When I changed the size of the source array which both methods search on I found that below 450000 elements the CAC method was quicker than the LINQ version. Around 500000 elements they averaged about the same but when expanding the test source to over 600000 array elements the CAC version was actually marginally slower.

Mind you, we're talking of the result set examples as 1ms and 3ms for the smaller source arrays, 13ms and 14ms for that middle-size, around 20ms and 15ms for the large array measured across CAC and LINQ versions respectively.

Some LINQ mechanisms outperform CAC equivalents with much larger arrays? I suppose it's like the difference between Hashtable and Dictionary in that Dictionary performs better for smaller collections, but Hashtable far better with large collections. HybridDictionary is ok but there's a bit of a hit if the collection size moves over the threshold too often.

The IL seemed to show that the extensions and LINQ mechanisms use a specialised call set whereas the CAC was mostly flat stack instructions. More of it, just faster execution in the case of the Strings and smaller numerical arrays.

So, in summary, nothing really conclusive but something I didn't expect with the integer operations. The gap between the two is there but only if the software product you're working on has performance constraints.

Mind you, all software has performance constraints to one extent or another :)