Filtering Collections using Guava – Performance Analysis

As I showed in my previous post, Guava Predicates let you write code that is clearer (at leas IMHO). As explained in the Guava site, the most basic use of Predicates is for collection filtering, which is also where I am using them. But since there is no such thing as a free beer, I wondered whether this came with a price tag.

To solve this question, I created a mini-benchmark to check the performance of Guava filters. The typical use of a collection filter is when you are doing something to part of the elements in the collection – so this is what I did:

  1. Create a collection of random integers in a given range (the collection has 10*range items).
  2. Create a predicate that filter the elements in the collection by randomly selecting and integer and filtering using less than comparison.
  3. Get the filtered collection (using both guava and collection copying) and do something with it (otherwise a very smart JIT may simply ignore the code if it has no visible results…). Time how much this takes.

Steps 2-3 were repeated 1000 times so that outliers are averaged. I executed these tests changing the number of element in the source list and changing the range of the elements in the list. The results were divided by counting both the time it took to filter the elements and the time it took to access them.

These are the results:

#Elements Value Range Copy Filter Copy Access Total Copy Guava Filter Guava Access Total Guava
100 10 12 10 22 11 40 51
100 100 1 0 1 1 1 2
100 1000 4 1 5 1 5 6
100 10000 4 1 5 1 0 1
1000 10 33 0 33 1 13 14
1000 100 29 0 29 6 23 29
1000 1000 22 1 23 3 22 25
1000 10000 31 0 31 0 21 21
10000 10 223 3 226 14 160 174
10000 100 211 2 213 11 182 193
10000 1000 235 0 235 14 201 215
10000 10000 242 1 243 11 169 180

From the results:

  1. The rage of the filter does not matter much when doing a one-pass access.
  2. Total time is very similar for small collections, but Guava is slightly better for large collections.
  3. Since the actual filter in Guava is done when the collections is accessed, the time payment is incurred later.

I also checked what happened if the filter class was accessed 10 times instead of only one time. Here the results are very different:

#Elements Value Range Multiple Copy Access Multiple Guava Access
100 10 125 196
100 100 3 20
100 1000 3 24
100 10000 2 23
1000 10 13 248
1000 100 20 254
1000 1000 25 283
1000 10000 18 204
10000 10 116 1759
10000 100 128 1822
10000 1000 167 1877
10000 10000 198 1899

Here we can see the hidden cost of the Guava filter. Since the collection is filtered as you access it, every time we do an iteration on the filtered collection we are actually iterating over the full collection and filtering on-the-fly. For a large number of elements the cost is even tenfold.

So using Guava uses in places when you are filtering a collection for a one time use is definitely OK. If you are going to use the filtered collection more times, then refrain from directly using the collection you received and copy the filtered collection (using either Java’s ArrayList.new(Collection) constructor or Guava’s Lists.newArrayList(Iterable)).

The code for the performance test can be found here. Your comments are welcome. Happy coding!.

Enhanced by Zemanta

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.