Filtering Collections using Guava – Performance Analysis

As I showed in my previous post, Guava Predicates let you write code that is clearer (at leas IMHO). As explained in the Guava site, the most basic use of Predicates is for collection filtering, which is also where I am using them. But since there is no such thing as a free beer, I wondered whether this came with a price tag.

To solve this question, I created a mini-benchmark to check the performance of Guava filters. The typical use of a collection filter is when you are doing something to part of the elements in the collection – so this is what I did:

  1. Create a collection of random integers in a given range (the collection has 10*range items).
  2. Create a predicate that filter the elements in the collection by randomly selecting and integer and filtering using less than comparison.
  3. Get the filtered collection (using both guava and collection copying) and do something with it (otherwise a very smart JIT may simply ignore the code if it has no visible results…). Time how much this takes.

Steps 2-3 were repeated 1000 times so that outliers are averaged. I executed these tests changing the number of element in the source list and changing the range of the elements in the list. The results were divided by counting both the time it took to filter the elements and the time it took to access them.

These are the results:

#ElementsValue RangeCopy FilterCopy AccessTotal CopyGuava FilterGuava AccessTotal Guava
10010121022114051
100100101112
1001000415156
10010000415101
1000103303311314
10001002902962329
100010002212332225
1000100003103102121
1000010223322614160174
10000100211221311182193
100001000235023514201215
1000010000242124311169180

From the results:

  1. The rage of the filter does not matter much when doing a one-pass access.
  2. Total time is very similar for small collections, but Guava is slightly better for large collections.
  3. Since the actual filter in Guava is done when the collections is accessed, the time payment is incurred later.

I also checked what happened if the filter class was accessed 10 times instead of only one time. Here the results are very different:

#ElementsValue RangeMultiple Copy AccessMultiple Guava Access
10010125196
100100320
1001000324
10010000223
10001013248
100010020254
1000100025283
10001000018204
10000101161759
100001001281822
1000010001671877
10000100001981899

Here we can see the hidden cost of the Guava filter. Since the collection is filtered as you access it, every time we do an iteration on the filtered collection we are actually iterating over the full collection and filtering on-the-fly. For a large number of elements the cost is even tenfold.

So using Guava uses in places when you are filtering a collection for a one time use is definitely OK. If you are going to use the filtered collection more times, then refrain from directly using the collection you received and copy the filtered collection (using either Java’s ArrayList.new(Collection) constructor or Guava’s Lists.newArrayList(Iterable)).

The code for the performance test can be found here. Your comments are welcome. Happy coding!.

Enhanced by Zemanta

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.