Eagle-eyed regular readers of this blog may have noticed that I dropped away
during the build up to Christmas this year, and there was a good reason for
this. I was taking part in
Advent of Code,
having been introduced to it by our
Credera brethren. A
challenge a day for 25 days - two a day in fact, as if/when you solve the
first, it gets tweaked to make it harder.
I opted to use Apex to take on the
challenges, which ensured that I couldn't brute force any solutions, and
that I typically had to switch to a batch/asynchronous approach when I
finally ran out of CPU or heap. This didn't always help, by the way - a few
times I had to give up after encountering variants of the "batch class is too
large" error. A few other times I had to accept solving only one of a day's
two challenges, as I couldn't figure out how to even start on the second! I ended
up with 39/50 successes though, which didn't seem like a bad return for a
constrained language. It was extremely enjoyable, but be aware that this can
easily soak up all of your spare time and more, and may not make you popular
at home!
A common theme of the challenges was walking a route through a grid and
keeping track of the tiles I'd encountered before, usually with some
additional information such as which path I was following, the direction I'd
moved in, and how many steps I'd taken in that direction. If a tile was net
new I needed to add it to a collection, but I also needed that collection to
be ordered - I was looking for the shortest or longest possible route - so
the entries needed to end up in a List.
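The pattern above can be sketched in a few lines of Apex - a minimal illustration of mine, where the state key format ('x:y:direction:steps') is an assumption for the example rather than anything from the actual solutions:

    // Sketch of tracking visited grid states - the key format is illustrative
    List<String> candidates = new List<String>{
        '1:1:N:1', '1:2:N:2', '1:1:N:1', '2:2:E:1'
    };

    Set<String> visited = new Set<String>();
    List<String> ordered = new List<String>();

    for (String state : candidates) {
        if (!visited.contains(state)) {   // fast lookup in the Set
            visited.add(state);
            ordered.add(state);           // the List preserves discovery order
        }
    }
    // ordered now holds the three distinct states, in the order first seen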
A Tale of Two Contains Methods
It was the best of methods, it was the worst of methods
One difference between Apex and some other languages I've worked with in the past is the contains method on the List class - this handy helper returns true if the list contains the element passed into it. It saves me from either iterating the list each time I consider adding an element, or maintaining a separate "lookup" collection - typically a Set that mirrors the list, which I'd check first to see if the element was already present.
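Side by side, the two approaches look like this - a sketch of my own rather than code from the challenges:

    List<Integer> seen = new List<Integer>();

    // Option 1: List.contains - scans the list on every call
    if (!seen.contains(42)) {
        seen.add(42);
    }

    // Option 2: a separate lookup Set maintained alongside the List
    Set<Integer> lookup = new Set<Integer>();
    if (!lookup.contains(42)) {
        lookup.add(42);
        seen.add(42);
    }

Option 2 costs an extra statement and an extra collection, but as the numbers below show, that trade is well worth making once the collection grows.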
I used the List contains method in my first attempt at one of the challenges,
and quickly found that I had to switch to batch Apex. In order to walk the path I
was carrying out a
breadth-first search, adding every possible option for each step to a queue as a complex object,
but always processing the shortest option first. Once the queue got to around
3,000 elements (complex objects), I found that I could only process a few of
them before breaching the 60,000 millisecond CPU limit, and I was looking at
an extremely large set of batches that would likely take multiple hours to
complete. After a bit of digging it looked like the check/add to the
queue of steps wasn't scaling overly well, so I switched back to maintaining a
separate lookup Set and using the
Set contains method
to determine if I'd seen it before. Once I did this, the CPU usage dropped to
the point where I could complete the whole thing in 2-3 batches.
I was somewhat taken aback by this, as I'd assumed that the List contains
method would be using an efficient mechanism under the hood and would
perform well regardless of the size of the list/complexity of the object.
This turns out not to be the case, but that's really my fault for assuming -
there's nothing in the docs to suggest that it will be doing anything of the
sort.
Now that Advent of Code has completed, I've had the time to run some
numbers on the CPU consumption of each of these contains methods (hence
the witty title, with apologies to Charles Dickens of course), and present the results.
The Methodology
I defined a (not particularly) complex object, so that there's a bit
of work involved in determining whether another object matches it:

    public class ContainsObject {
        public String stringRep;
        public Integer intRep;
        public Long square;
        public DateTime timestamp;

        public ContainsObject(Integer idx) {
            intRep = idx;
            stringRep = '' + idx;
            square = idx * idx;
            timestamp = System.now();
        }
    }

I then add two thousand of these to a List, checking each one to see if
I've seen it before. The CPU consumed is captured for every 100 elements
and gives the following results:

    Count          CPU (ms)
    0 - 100        76
    400 - 500      97
    900 - 1000     169
    1400 - 1500    460
    1900 - 2000    582

From this I can deduce that there isn't anything particularly efficient going on under the hood - as the size of the List increases, so does the time taken to check and add 100 elements. In the 1900 - 2000 range the average is over 5ms per check/insert, which is quite a chunk for a couple of statements.
Switching to the List plus lookup Set approach, I create a Set of the complex objects to mirror the contents of the List, but without any ordering, and use it for the check. If the element isn't present in the Set, I add it to both the Set and the List.
Executing this for the same number of complex objects gives:

    Count          CPU (ms)
    0 - 100        4
    400 - 500      6
    900 - 1000     5
    1400 - 1500    4
    1900 - 2000    7

This is much more the kind of result I want to see - the performance doesn't really change regardless of the size of the Set, and while the final hundred takes slightly longer than the first hundred, the average is 0.07ms per check/insert, which leaves me plenty of CPU to play with in the rest of the transaction.
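For completeness, here's a rough sketch of how such measurements can be captured in anonymous Apex - the loop structure and debug format are my own, but Limits.getCpuTime is the standard mechanism for reading CPU consumption:

    // Sketch: measuring CPU per 100 check/inserts with the lookup-Set approach
    // (ContainsObject is the class defined above)
    List<ContainsObject> objs = new List<ContainsObject>();
    Set<ContainsObject> lookup = new Set<ContainsObject>();
    Integer lastCpu = Limits.getCpuTime();

    for (Integer idx = 0; idx < 2000; idx++) {
        ContainsObject candidate = new ContainsObject(idx);
        if (!lookup.contains(candidate)) {
            lookup.add(candidate);
            objs.add(candidate);
        }
        if (Math.mod(idx + 1, 100) == 0) {
            Integer cpu = Limits.getCpuTime();
            System.debug((idx - 99) + ' - ' + (idx + 1) + ': ' + (cpu - lastCpu) + 'ms');
            lastCpu = cpu;
        }
    }

Swapping the lookup.contains check for objs.contains (and dropping the Set entirely) gives the List-based variant measured in the first table.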
No Downside?
As always, there's no such thing as a free lunch - the fact that I have to maintain another collection for speedy lookups incurs a heap penalty. It's a pretty cheap lunch though, as I'm only holding references to the objects stored in the List rather than copies, so the 2,000 entries in the Set consume another 8kb of heap. This feels like a decent trade-off to me, but if your transaction has plenty of spare CPU and is butting up against the heap limit, you'll likely feel differently.
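If you want to see that cost for yourself, the heap consumption can be inspected the same way as the CPU - again a sketch of mine, using the standard Limits.getHeapSize method and the ContainsObject class from earlier:

    // Rough illustration of measuring the lookup Set's heap overhead
    List<ContainsObject> objs = new List<ContainsObject>();
    for (Integer idx = 0; idx < 2000; idx++) {
        objs.add(new ContainsObject(idx));
    }

    Integer before = Limits.getHeapSize();
    Set<ContainsObject> lookup = new Set<ContainsObject>(objs); // references, not copies
    Integer after = Limits.getHeapSize();
    System.debug('Set heap overhead: ' + (after - before) + ' bytes');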