Vainolo's Blog

Filtering Collections using Guava – Performance Analysis

leave a comment

As I showed in my previous post, Guava Predicates let you write code that is clearer (at leas IMHO). As explained in the Guava site, the most basic use of Predicates is for collection filtering, which is also where I am using them. But since there is no such thing as a free beer, I wondered whether this came with a price tag.

To solve this question, I created a mini-benchmark to check the performance of Guava filters. The typical use of a collection filter is when you are doing something to part of the elements in the collection – so this is what I did:

  1. Create a collection of random integers in a given range (the collection has 10*range items).
  2. Create a predicate that filter the elements in the collection by randomly selecting and integer and filtering using less than comparison.
  3. Get the filtered collection (using both guava and collection copying) and do something with it (otherwise a very smart JIT may simply ignore the code if it has no visible results…). Time how much this takes.

Steps 2-3 were repeated 1000 times so that outliers are averaged. I executed these tests changing the number of element in the source list and changing the range of the elements in the list. The results were divided by counting both the time it took to filter the elements and the time it took to access them.

These are the results:

#ElementsValue RangeCopy FilterCopy AccessTotal CopyGuava FilterGuava AccessTotal Guava

From the results:

  1. The rage of the filter does not matter much when doing a one-pass access.
  2. Total time is very similar for small collections, but Guava is slightly better for large collections.
  3. Since the actual filter in Guava is done when the collections is accessed, the time payment is incurred later.

I also checked what happened if the filter class was accessed 10 times instead of only one time. Here the results are very different:

#ElementsValue RangeMultiple Copy AccessMultiple Guava Access

Here we can see the hidden cost of the Guava filter. Since the collection is filtered as you access it, every time we do an iteration on the filtered collection we are actually iterating over the full collection and filtering on-the-fly. For a large number of elements the cost is even tenfold.

So using Guava uses in places when you are filtering a collection for a one time use is definitely OK. If you are going to use the filtered collection more times, then refrain from directly using the collection you received and copy the filtered collection (using either Java’s constructor or Guava’s Lists.newArrayList(Iterable)).

The code for the performance test can be found here. Your comments are welcome. Happy coding!.

Enhanced by Zemanta

Written by vainolo

July 29th, 2012 at 1:50 pm

Posted in Programming

Tagged with , , ,

Leave a Reply

%d bloggers like this: