Sunday, 14 January 2024

Scratch Org Snapshots in Spring '24

Note: This feature is in beta in Spring '24. Like all other betas, this functionality may never go GA and may disappear at any time. Caveat emptor.

Image generated by DALL-E 3 based on a prompt by Bob Buzzard

Introduction

The Spring '24 release of Salesforce moves the scratch org snapshot functionality into beta. I've been waiting to get my hands on this and so far it hasn't disappointed.

Whenever we get new features of this nature, I like to reflect on how far we've come in Salesforce land. In this case I was testing with the codebase of our BrightMEDIA accelerator, and when we started building this in mid-2014 (aka nearly a decade ago), we typically allowed a week to get a new developer set up. We had to spin up a Developer Edition, raise a bunch of tickets to get various features enabled and increase the Apex character limit, install a number of packages, carry out a number of manual setup steps, deploy the code and assign permission sets. For whatever reason, no two Developer Editions appeared to have the same setup, so the deployment was typically an iterative process of discovering what was missing, or off instead of on by default. Then they'd go through and set up some standing data to be able to work in the org.

Fast forward to the end of 2023 and I have a node script that creates a scratch org, installs the packages, deploys the code, loads the standing data and produces a ready-to-go development environment in around 30 minutes. I'm always interested in speeding things up though!

Creating a Snapshot

Thanks to a pre-release environment that I've also had for a decade, I have a pre-release dev hub which meant I could enable the beta before the Spring '24 release goes live. Then I assigned myself the appropriate object permissions for Org Snapshots and I was ready to create. 

I set up my scratch org using my existing script, which creates an org with the following applied:

  • Four managed packages
  • Approximately 9,000 metadata components
  • Approximately 2,000 records

Creating a snapshot of this org took 11 minutes, which I must admit was quite a bit faster than I was expecting.

Using the Snapshot


This started off with a bit of a challenge, in that attempting to use the snapshot kept giving the error that the snapshot wasn't Active, but listing the Dev Hub snapshots showed that it was indeed Active. I spent a while searching through the CLI GitHub issues list and the snapshot pilot Trailblazer group, but it seemed like I was the lucky one who got to experience this first. This was quite soon after the pre-release had gone live, so I figured it might be a simple bug and played the waiting game.

About 7 hours later my masterful inactivity was rewarded, as my snapshot sprang to life and I was able to run the command to create an org from it. In fairness, it might have started working 10 minutes later, but it was around 7 hours later that I had the time to try it out again.

The even better news was that creating a scratch org from the snapshot took 6 minutes - an 80% saving on the 30 minute creation time for my script. The org was flawless too - all the metadata and data was there.

The End of Sandboxes?


So does this mean that we can all create scratch org snapshots rather than sandboxes going forward? They even contain data, so maybe we can do away with full or partial copy sandboxes too. I don't think so, for a few reasons.

Lifespan


Scratch orgs and org snapshots have a 30 day lifespan. From a developer perspective this is fine - we treat these orgs as disposable and typically create a fresh one when we start a new piece of work. That isn't necessarily the case for orgs used for training, QA, integration testing or testing against a new release. It's particularly unsuitable for pre-production environments that mirror production - imagine having to recreate all your test integrations at the start of every month!

Storage


Scratch orgs and org snapshots are limited to 200MB of data. Again, probably fine for many development tasks, but again likely to be too small for training, pre-production and test environments that are indicative of production.

Licenses


Sandboxes replicate your production org licenses, so all of your users can have access. Scratch orgs are much more restrictively licensed, usually somewhere between 1 and 10 seats per feature. When we were adding community (now Experience Cloud) features to BrightMEDIA, we had the princely sum of 1 partner community license available in our scratch orgs - you'd have to be quite brave to promote to production with that kind of limitation on your testing!

Completeness of Version Controlled Metadata


This is where developer/developer pro sandboxes will retain their usefulness once scratch org snapshots are live. Some organisations with large, mature Salesforce orgs won't have all their metadata in version control - why would they invest the time and money to do that when they don't need to? They'll likely have Apex, flows, Lightning components, and maybe some second generation packages in version control, but things like sharing rules, report and dashboard folders, and duplicate rules that are managed by administrators probably won't be. Yes, this is a sweeping generalisation, but you get the idea. In my view, being able to create a guaranteed replica of production to work in will be an important capability for years to come. That said, sandboxes will probably be used less as time goes on, and maybe scratch org snapshots will get longer lifespans.

So not a sandbox killer, but that was never the intention. For those of us with a very source-centric development approach however, this is another great addition to the developer toolbelt.




Sunday, 7 January 2024

Breaking Batch

Image generated by DALL-E 3 from a prompt by Bob Buzzard

Introduction


In my last blog post (A Tale of Two Contains Methods) I mentioned that I'd spent quite a bit of December taking part in Advent of Code. Each day there were two challenges - a (relatively) straightforward one, that could potentially be brute forced, and an extended version where brute forcing would take days, so a more thoughtful approach was required. As I was tackling these challenges using Apex, brute forcing wasn't really an option, so my solution typically involved building structures of complex objects in memory in order to be able to process them quickly. Pretty much every extended version required batch Apex to handle the volumes, and in a few cases the (relatively) straightforward one did too.

The combination of the complex object structure and batch Apex threw up some interesting errors, so I decided to blog about one of these. A couple of things to note:
  • This isn't a moan about batch Apex - I was using it in a way that I'm pretty sure it wasn't intended for, and there was a simple workaround
  • By complex object I just mean one that is made up of primitives, simple(r) objects and collections - it doesn't mean it was a particularly difficult structure to comprehend or change.

The Challenge


(Some of the challenge detail has been removed for clarity - you can see it in its full glory here)
Part 1 of the challenge in question was around bricks of varying length in a 3-dimensional structure (essentially a large cube) that had landed on top of each other like a weird Jenga puzzle. Based on the starting coordinate and dimensions of each brick, I needed to figure out how the bricks were supported in the structure. 

The approach I took was to represent a brick as an object and hold two associated collections for each Brick instance:
  • Supporters - these are the Bricks that are directly beneath this Brick and in contact with it.
  • Supporting - these are the Bricks that this brick is directly beneath and supporting. 

The answer I had to calculate to complete the challenge was the number of bricks that I could remove without causing any other bricks to fall. This could be accomplished by iterating the bricks and counting those where every brick they are Supporting is also supported by at least one other brick.
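
As a minimal sketch of that Part 1 rule, assuming the Brick class shown below with its supporters and supporting sets already populated (the BrickCounter class and its countRemovable method are hypothetical names, not the code I actually ran):

```apex
// Hypothetical helper illustrating the Part 1 rule: a brick can be removed
// safely if every brick it supports has at least one other supporter.
public class BrickCounter
{
    public Integer countRemovable(List<Brick> bricks)
    {
        Integer removable=0;
        for (Brick candidate : bricks)
        {
            Boolean safe=true;
            for (Brick above : candidate.supporting)
            {
                // 'above' would fall if 'candidate' is its only supporter
                if (above.supporters.size()<2)
                {
                    safe=false;
                    break;
                }
            }
            // Bricks that support nothing are always safe to remove
            if (safe)
            {
                removable++;
            }
        }
        return removable;
    }
}
```
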

Part 2 was to find the sum of the bricks that would fall if each of the bricks were removed. With the structure that I had in place, this was actually quite simple. I iterated the bricks, found all of the Supporting entries where that brick was the only Supporter, and then found all of their Supporting entries where they were the only Supporter and so on until I reached the end. This would definitely need batch Apex though, as there were 1,500 bricks in the actual challenge input.
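
A minimal sketch of the chain reaction for a single removed brick follows. It's a slight variant of the description above: rather than only looking for bricks with a single Supporter, it records which bricks have already fallen (by brickNo) and drops a brick once all of its Supporters have gone, which also catches a brick resting on two bricks that both fall. The ChainReaction class is hypothetical and assumes the Brick class shown below.

```apex
// Hypothetical sketch: counts the bricks that would fall if 'removed' were taken out.
// A brick falls once every one of its supporters has fallen.
public class ChainReaction
{
    public Integer countFalling(Brick removed)
    {
        // Track fallen bricks by brickNo rather than by object
        Set<String> fallen=new Set<String>{removed.brickNo};
        List<Brick> toProcess=new List<Brick>{removed};
        Integer idx=0;
        while (idx<toProcess.size())
        {
            Brick current=toProcess[idx];
            idx++;
            for (Brick above : current.supporting)
            {
                if (fallen.contains(above.brickNo))
                {
                    continue;
                }
                Boolean allSupportersGone=true;
                for (Brick support : above.supporters)
                {
                    if (!fallen.contains(support.brickNo))
                    {
                        allSupportersGone=false;
                        break;
                    }
                }
                if (allSupportersGone)
                {
                    fallen.add(above.brickNo);
                    toProcess.add(above);
                }
            }
        }
        // Exclude the brick that was removed deliberately
        return fallen.size()-1;
    }
}
```

Because every Supporter of a brick has that brick in its own supporting set, the last Supporter to fall always re-checks it, so the processing order doesn't affect the result.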

Each challenge includes a small example with the workings and answers - 6 bricks in this case - so I was able to test my batch Apex before executing with the larger volume of data.

My Brick class was as follows:
public class Brick
{
    public String brickNo;
    public Point3d startPoint;
    public Point3d endPoint;
    public Integer width;
    public Integer depth;
    public Integer height;
    public Set<Brick> supporters=new Set<Brick>();
    public Set<Brick> supporting=new Set<Brick>();
    public Integer totalSupporters=0;
}

The start method of the Batch class converted the input into a collection of Bricks and then returned a collection of Integers, one per Brick. I implemented Database.Stateful so that the collection of Bricks was available across each execute method, and then processed the Bricks whose brickNo appeared in the scope. Essentially I'd broken up my iteration of the Bricks across a number of transactions, while ensuring I only had to build the Bricks structure once at the start.
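
The batch shape described above might look something like this minimal sketch. BrickParser.parse and BrickAnalyser.countFalling are hypothetical helpers standing in for the input parsing and the per-brick logic, and it assumes each brickNo holds the brick's number as a string.

```apex
// Sketch only: BrickParser and BrickAnalyser are hypothetical helpers.
public class BrickBatch implements Database.Batchable<Integer>, Database.Stateful
{
    private String input;
    // Database.Stateful keeps these instance variables between execute calls,
    // which means they are part of the batch state carried from one batch to the next
    private Map<String, Brick> bricksByNo;
    private Integer totalFalling=0;

    public BrickBatch(String input)
    {
        this.input=input;
    }

    // Build the Brick structure once, then hand back one Integer per Brick
    public Iterable<Integer> start(Database.BatchableContext bc)
    {
        bricksByNo=BrickParser.parse(input);
        List<Integer> brickNos=new List<Integer>();
        for (String brickNo : bricksByNo.keySet())
        {
            brickNos.add(Integer.valueOf(brickNo));
        }
        return brickNos;
    }

    // Each execute only processes the Bricks whose number is in scope
    public void execute(Database.BatchableContext bc, List<Integer> scope)
    {
        for (Integer brickNo : scope)
        {
            Brick brick=bricksByNo.get(String.valueOf(brickNo));
            totalFalling+=BrickAnalyser.countFalling(brick);
        }
    }

    public void finish(Database.BatchableContext bc)
    {
        System.debug(LoggingLevel.ERROR, 'Total falling = ' + totalFalling);
    }
}
```

This would be started with something like Database.executeBatch(new BrickBatch(input), 50), where the scope size of 50 is an arbitrary choice for illustration.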

When I ran this with the example, it worked fine and gave me the correct answer. 

The Problem


I then fired it off with the (much larger) challenge input, and was initially pleased to see that I was able to build the in-memory structure without running into any issues around heap or CPU. Sadly this pleasant sensation was short-lived, as the first batch that executed failed with an internal error.


Based on the debug that I had in the class, it was clear that the batch job was failing before it got to any of my code. After some binary chop style debugging, where I retried the batch with various parts of the code commented out, it turned out that the issue was my collections:
    public Set<Brick> supporters;
    public Set<Brick> supporting;

As I already had the full collection of Bricks stored in a Map keyed by brickNo, turning these into Sets of Strings and storing the brickNo rather than a reference to the Brick itself didn't need much in terms of changes to the code, and allowed the batch to complete without issue.
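
A sketch of what the revised Brick might look like, holding neighbours as brickNo Strings and resolving them through the Map keyed by brickNo. The getSupporting helper is a hypothetical addition for illustration, and Point3d is the type from the original class.

```apex
// Sketch of the revised Brick: neighbours are stored as brickNo Strings
// rather than as references to other Brick instances.
public class Brick
{
    public String brickNo;
    public Point3d startPoint;
    public Point3d endPoint;
    public Integer width;
    public Integer depth;
    public Integer height;
    public Set<String> supporters=new Set<String>();
    public Set<String> supporting=new Set<String>();
    public Integer totalSupporters=0;

    // Hypothetical helper: resolves the bricks this one supports via the lookup map
    public List<Brick> getSupporting(Map<String, Brick> bricksByNo)
    {
        List<Brick> result=new List<Brick>();
        for (String aboveNo : supporting)
        {
            result.add(bricksByNo.get(aboveNo));
        }
        return result;
    }
}
```
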

So why were Sets of Strings okay but Sets of Bricks not? Once I was into a large cube with 1,500 bricks in it, it looked like the sets got pretty big. As the Bricks were stored in an instance variable, they were part of the state of the batch and thus de/serialised for each batch processed. Obviously I'm not privy to exactly how the batch processing in Apex works, but I'd imagine that serialising ended up with a pretty huge structure with a lot of repetition, as the same Brick instances were expanded many times as part of the Supporters and Supporting collections. Deserialising this structure clearly proved too much, hence the internal error.

In Conclusion


As mentioned earlier, this isn't intended to throw shade on batch Apex. Storing large collections of complex objects that contain collections of other complex objects so they can be accessed across transactions really isn't a valid use case. This kind of information belongs in the database rather than in the batch class, while Database.Stateful is more appropriate for managing things like running totals.

This is one of the reasons that I really enjoyed taking on Advent of Code with Apex - I'm trying to solve problems that (a) I'd never encounter in a customer implementation and (b) the Salesforce platform is really not suited to handling.

This was also a lesson in the need to test with indicative data - everything worked fine with the small amount of test data I had available, but once I hit the real data the flaws were revealed!



Thursday, 28 December 2023

A Tale of Two Contains Methods


Image generated by DALL-E 3 from a prompt by Bob Buzzard 


Introduction


Eagle-eyed regular readers of this blog may have noticed that I dropped away during the build up to Christmas this year, and there was a good reason for this. I was taking part in Advent of Code, having been introduced to it by our Credera brethren. A challenge a day for 25 days - two a day in fact, as if/when you solve the first, it gets tweaked to make it harder. 

I opted to use Apex to take on the challenges, which ensured that I couldn't brute force any solutions, and that I typically had to switch to a batch/asynchronous approach when I finally ran out of CPU or heap. This didn't always help by the way - a few times I had to give up after encountering variants of "the batch class is too large" error. A few other times I had to accept solving one challenge of the two when I couldn't figure out how to even start on the second!  I ended up with 39/50 successes though, which didn't seem like a bad return for a constrained language. It was extremely enjoyable, but be aware that this can easily soak up all of your spare time and more, and may not make you popular at home!

A common theme of the challenge was walking a route through a grid and keeping track of the tiles that I'd encountered before, usually with some additional information around which path I was following, the direction I'd moved in and how many steps I'd taken in a direction. If it was net new then I needed to add it to a collection, but I also needed to order the collection, as I was looking for the shortest or longest route possible, so they needed to end up in a List.

A Tale of Two Contains Methods


It was the best of methods, it was the worst of methods

One difference between Apex and some other languages I've worked with in the past is the contains method on the List class - this handy helper returns true if the list contains the element passed in to it. This saves me from either iterating the list each time I consider adding an element, or maintaining a separate "lookup" collection - typically a Set that mirrors the List, which I check before adding the element.

I used the List contains method in my first attempt on one of the challenges, and found that I had to quickly go to batch Apex. In order to walk the path I was carrying out a breadth-first search, adding every possible option for each step to a queue as a complex object, but always processing the shortest option first. Once the queue got to around 3,000 elements (complex objects), I found that I could only process a few of them before breaching the 60,000 millisecond CPU limit, and I was looking at an extremely large set of batches that would likely take multiple hours to complete. After a bit of digging it looked like the check/add to the queue of steps wasn't scaling overly well, so I switched back to maintaining a separate lookup Set and using the Set contains method to determine if I'd seen it before. Once I did this, the CPU use dropped to the point where I could complete the whole thing in 2-3 batches, which I did.

I was somewhat taken aback by this, as I'd assumed that the List contains method would be using an efficient mechanism under the hood and would perform well regardless of the size of the list/complexity of the object. This turns out not to be the case, but that's really my fault for assuming - there's nothing in the docs to suggest that it will be doing anything of the sort.

Now that Advent of Code has completed, I've had the time to run some numbers on the CPU consumption of each of these contains methods (hence the witty title, with apologies to Charles Dickens of course), and present the results.

The Methodology


I have defined a (not particularly) complex object so that there's a bit of work involved to determine if another object matches it:

public class ContainsObject 
{
    public String stringRep;
    public Integer intRep;
    public Long square;
    public DateTime timestamp;
    
    public ContainsObject(Integer idx)
    {
        intRep=idx;
        stringRep=''+idx;
        // Widen to Long before multiplying so large indexes don't overflow Integer
        square=(Long)idx*idx;
        timestamp=System.now();
    }
}

I then add two thousand of these to a List, checking each one to see if I've seen it before. The CPU consumed is captured for every 100 elements and gives the following results:
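
A rough sketch of that List-only measurement, checking with List.contains before every add and capturing Limits.getCpuTime() at the end of each block of 100 elements. The ListContainsTest class and its run method are hypothetical, and the sketch doesn't reproduce the ordering work from the original challenge.

```apex
// Hypothetical harness: List.contains before every add, CPU captured per 100 elements
public class ListContainsTest
{
    public List<Integer> run(Integer total)
    {
        List<ContainsObject> seen=new List<ContainsObject>();
        List<Integer> cpuPerHundred=new List<Integer>();
        Integer blockStart=Limits.getCpuTime();
        for (Integer idx=0; idx<total; idx++)
        {
            ContainsObject candidate=new ContainsObject(idx);
            // Only add the element if the List doesn't already contain it
            if (!seen.contains(candidate))
            {
                seen.add(candidate);
            }
            // Record the CPU consumed by this block of 100 check/adds
            if (Math.mod(idx+1, 100)==0)
            {
                Integer cpuNow=Limits.getCpuTime();
                cpuPerHundred.add(cpuNow-blockStart);
                blockStart=cpuNow;
            }
        }
        return cpuPerHundred;
    }
}
```
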

   Elements       CPU (ms)

   0 - 100            76
 400 - 500            97
 900 - 1000          169
1400 - 1500          460
1900 - 2000          582

From this I can deduce that there isn't anything particularly efficient going on under the hood - as the size of the List increases, so does the time taken to check and add 100 elements. In the 1900-2000 range the average is over 5ms per check/insert, which is quite a chunk for a couple of statements.

Switching to the List and lookup Set approach, I create a Set of the complex objects to mirror the contents of the List, but without any ordering, that I can use for the check part. If the element isn't present in the Set, I add it to both the Set and the List.
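
The check/add step of that approach can be wrapped up in a small helper, sketched below. OrderedUniqueList and addIfNew are hypothetical names; the List keeps the order, while the Set is only used for membership checks.

```apex
// Hypothetical helper pairing an ordered List with a lookup Set
public class OrderedUniqueList
{
    // Ordered contents, available for sorting or processing
    public List<ContainsObject> items=new List<ContainsObject>();
    // Unordered mirror of 'items', used only to answer "have I seen this?"
    private Set<ContainsObject> lookup=new Set<ContainsObject>();

    // Returns true if the element was new and has been added to both collections
    public Boolean addIfNew(ContainsObject candidate)
    {
        if (lookup.contains(candidate))
        {
            return false;
        }
        lookup.add(candidate);
        items.add(candidate);
        return true;
    }
}
```
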

Executing this for the same number of complex objects gives:

   Elements       CPU (ms)

   0 - 100             4
 400 - 500             6
 900 - 1000            5
1400 - 1500            4
1900 - 2000            7

This is much more the kind of result I want to see - the performance isn't really changing regardless of the size of the Set, and while the final hundred takes slightly longer than the first hundred, the average is 0.07ms, which leaves me plenty of CPU to play with in the rest of the transaction.

No Downside?


As always, there's no such thing as a free lunch - the fact that I have to maintain another collection for speedy lookup does incur a heap penalty. It is a pretty cheap lunch though, as I'm only holding references to the objects stored in the List rather than copies, so the 2,000 entries in the Set consume another 8KB of heap. This feels like a pretty decent trade off to me, but if your transaction has plenty of spare CPU and is butting up against the heap limit, you'll likely feel different.
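
To check the heap cost in your own org, a sketch along these lines (run as anonymous Apex) captures Limits.getHeapSize() before and after building the lookup Set. The numbers you see will depend on your objects and org, so treat this as a way to measure rather than a source of figures.

```apex
// Sketch: measuring the extra heap used by a lookup Set that mirrors a List
List<ContainsObject> items=new List<ContainsObject>();
for (Integer idx=0; idx<2000; idx++)
{
    items.add(new ContainsObject(idx));
}

Integer heapBefore=Limits.getHeapSize();
// The Set holds references to the same objects already stored in the List
Set<ContainsObject> lookup=new Set<ContainsObject>(items);
Integer heapAfter=Limits.getHeapSize();

System.debug(LoggingLevel.ERROR,
    'Heap for lookup Set = ' + (heapAfter-heapBefore) + ' bytes');
```
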





Saturday, 11 November 2023

OpenAI GPTs - Meet Bob Buzzard 2.0



Introduction


During OpenAI DevDay, the concept of custom GPTs was launched - ChatGPT with a bunch of preset instructions to target a specific problem domain, additional capabilities such as browsing the web, and extra knowledge in terms of information that may not be available on the web.

In order to create and use GPTs, you need to be a ChatGPT Plus subscriber at $20/month, although in the UK there's VAT to be added so it works out around £20/month. This also gives priority access to new features, the latest models and tools, faster response times and access even at peak times. I signed up just to try out GPTs though, as they looked like a world of fun.

The Replicant


My first custom GPT is my replicant - Bob Buzzard 2.0. A GPT that has been pointed at most of my public and some of my private information. Instructed to respond as I would, you can expect irreverent or sarcastic responses as the mood takes (AI) me. Obviously very focused on Salesforce, and keen on Apex code. 

Right now you'll need to be a ChatGPT Plus user to access custom GPTs, but if you are, you can find Bob Buzzard 2.0 at https://chat.openai.com/g/g-DOVc9phwC-bob-buzzard-2-0

Here's a snippet of a response from my digital twin regarding the impact of log messages on CPU - something I've investigated in detail in the past:


Creating GPTs


This is incredibly simple - you just navigate to the create page and tell it in natural language how you want it to behave, define the skills, point it at additional web sites or upload additional information. It's easy and requires no technical knowledge, which does make me wonder why they announced it at developer day given there's no development needed, but let's not tilt at that windmill.

A Couple of Warnings


First, remember that any private information that you upload to a GPT won't necessarily remain private. If you don't instruct your custom GPT to keep instructions and material private, it will happily share them on request.

Second, I've given the replicant a mischievous side - from time to time it will just gainsay your original decisions when you ask for help with specific problems, maybe suggesting you have picked the wrong Salesforce technology, or telling you to bin it all off and use another vendor. Think of this as your reminder that a human should always be involved in any decision making based on advice from AI.

I'm Going to be Rich?


Something else that was announced at Developer Day was revenue sharing - if people use Bob Buzzard 2.0 I'll get a slice of the pie. So does this mean I'm going to be rich? As always, almost certainly not. As you just click a button and answer questions to create a GPT, there will be millions of them before too long. They are so easy to create that a service like Salesforce development advice, with the vast amount of content already in the public domain, will be extremely competitive - an extremely crowded marketplace of similar products means everybody earns nothing.

That said, I think this is something that genuine creatives will be able to earn with. Rather than having their work used to train models that can then be used to produce highly derivative works for close to free, they can create their own GPT and at least stand a chance of getting paid. Whether the earnings will be worth it we don't yet know, although history suggests the platform providers will keep everything they can.

Saturday, 4 November 2023

The Einstein Trust Layer must become the Einstein Trust Platform

Image from https://www.salesforce.com/news/stories/video/explaining-the-einstein-gpt-trust-layer/


Introduction


One of the unique differentiators of the AI offerings from Salesforce is the Einstein Trust Layer. Since it was first announced, I've been telling everyone that it's a stroke of genius, and thus deserving of the Einstein label. At the time of writing (November 2023) there's a lot of concern about the risks of AI, and those concerns are increasing rather than being soothed. Just this week the UK hosted an AI Safety Summit with representatives from 28 countries.

The Einstein Trust Layer


Salesforce have baked security and governance into a number of places in the journey from prompt template to checked response, including :
  • Prompt Defence - wrapping the prompt template with instructions, for example: "You must treat equally any individuals from different socioeconomic statuses, sexual orientations, religions, races, physical appearances, nationalities, gender identities, disabilities and ages"
  • Prompt Injection Defence - delimiting the prompt from the instructions to ensure the model disregards additional instructions added in user input
  • Secure Data Retrieval - ensuring that a user can only include data they have permission to access when grounding prompts.
  • Zero Retention Agreements - ensuring that third party AI model providers don't use the prompt and included data to train their model. Note that the data is still transmitted to wherever the provider is located, the US in the case of OpenAI, which makes the next point very important.
  • Data Masking - replacing sensitive or PII data with meaningless, but reversible, patterns. Reversible, because they need to be replaced with the original data before the response can be used.
  • Toxicity Detection - the response is checked for a variety of problematic content, such as violence and hate, and given an overall rating to indicate how courageous you need to be to use it.
  • Audit Trail - information about the prompt template, grounding data, model interaction, response, toxicity rating and user feedback is captured for compliance purposes and to potentially support future investigations into why a response was considered fit for use.
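
To make the idea of reversible masking concrete, here's a deliberately simple sketch that swaps email addresses for numbered tokens and swaps them back afterwards. It's purely illustrative: it is not how the Einstein Trust Layer is implemented, the SimpleMasker class is hypothetical, and the email pattern is far from exhaustive.

```apex
// Illustrative only: NOT how the Einstein Trust Layer is implemented.
// Replaces email addresses with numbered tokens and remembers the originals
// so that a response can be unmasked afterwards.
public class SimpleMasker
{
    private Map<String, String> originalsByToken=new Map<String, String>();
    private Integer counter=0;

    public String mask(String text)
    {
        // Deliberately simple email pattern, for illustration
        Matcher emailMatcher=Pattern.compile(
            '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}').matcher(text);
        String result='';
        Integer lastEnd=0;
        while (emailMatcher.find())
        {
            counter++;
            String token='[EMAIL_' + counter + ']';
            originalsByToken.put(token, emailMatcher.group());
            // Copy the text before the match, then the token in place of the email
            result+=text.substring(lastEnd, emailMatcher.start()) + token;
            lastEnd=emailMatcher.end();
        }
        return result + text.substring(lastEnd);
    }

    // Puts the original values back, e.g. into a model response
    public String unmask(String text)
    {
        String result=text;
        for (String token : originalsByToken.keySet())
        {
            result=result.replace(token, originalsByToken.get(token));
        }
        return result;
    }
}
```

The closing bracket in each token stops [EMAIL_1] from matching inside [EMAIL_10] when unmasking.
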
Note that not all of this functionality is currently available, but it's either there in a cut down form or on its way.  Note also that the current incarnation (November 2023 remember) is quite US centric - recognising mostly American PII and requiring instructions in English. Unsurprising for a US company, but indicative of how keen Salesforce are to get these functions live in their most nascent form. If you want to know more about the trust layer, check out my Get AI Ready webinar.

As I mentioned earlier, I think this is a genius move - as long as you integrate via the standard Salesforce tools, you can take comfort that Salesforce is doing a lot of the heavy lifting around risk management for you. But can you rest easy?

Safety is Everyone's Responsibility


Of course you can't rest easy. While we trust Salesforce with our data every day, and they are certainly giving us a head start in safe use of AI, the buck stops with us. Something else I've been saying to anyone who will listen is that we should trust Salesforce, but it can't be blind trust. We know quite a lot about how the Einstein Trust Layer works, but we have to be certain that it is applying the rules that we want in place, rather than a set of generic rules that doesn't quite cover what we need. One-size-doesn't-quite-fit-all if you will. 

The Layer must become the Platform


And this brings me to the point of this post - the Trust Layer needs to become the Trust Platform that we can configure and extend to satisfy the unique requirements of our businesses. In no particular order, we need to be able to:
  • Define our own rules and patterns for data masking
  • Create our own toxicity topics, and adjust the overall ratings based on our own rules
  • Add our own defensive instructions to the prompts. 
    Yes, I know we'll be able to do this on a prompt by prompt basis, but I'm talking about company standard instructions that need to be added to all prompts. It will get tedious to manually add these to every prompt, and even more tedious to update them all manually when minor changes are required.
  • Include additional information in the audit logs
and much more - plugins that carry out additional risk mitigation that isn't currently part of the Salesforce "stack". Feels like there's an AppExchange opportunity here too!

Once we have this, we'll be able to say we are using AI responsibly, to the best of our ability and current knowledge at any rate.

Saturday, 14 October 2023

Einstein Sales Emails


Image created by StableDiffusion 2.1 based on a prompt by Bob Buzzard

Introduction

Sales GPT went GA in July 2023, and then went through a couple of "blink and you'll miss it" renames, before being rolled into Einstein for Sales (as of October 2023, so it might have changed by the time you read this!). From the Generative AI perspective, this consists of a couple of features - call summaries, and the subject of this post - Sales Emails.

Turning it On

This feature is pretty simple to enable - first turn on Einstein for Sales in Setup:


Then assign yourself the permission set:


Creating Emails


Once the setup is complete, opening the Email Composer shows a shiny new button to draft with Einstein:



In this case I'm sending the email from the Al Miller contact record, and I've selected Al's account - Advanced Communications - from the dropdown/search widget at the bottom. This will be used to ground the prompt that is sent to OpenAI, to include any relevant details from the records in the email.

Clicking the Draft with Einstein button offers me a choice of 5 pre-configured prompts - note that Salesforce doesn't yet offer the capability to create your own prompts, although that is definitely coming soon.



Since GA this feature has been improved with the ability to include product information, so once I choose the type of prompt - Send a Meeting Invite in this case - I have the option to choose a product to refer to. 


Once I choose a product, the Name and Description are pulled from the record, but I can add more information that might be relevant for this Contact - the words with the red border below. Note that there's a limited number of characters allowed here - I was within 5-10 of the limit.



Clicking the Continue button starts the process of pulling relevant information from the related account to ground the prompt, adding the guardrails to ensure the response is non-toxic, and validating the response before offering it to me:


You can see where the grounding information has been used in the response - Al's name, role and company appear in the second paragraph, and the product information (including my added info) is in the third to try to entice Al to bite at a meeting.

If I don't care for this response I can edit and tweak it, or click the button again to get a new response:


Gotchas


Note that this just adds the next response under the previous one, as you can see from the 'Best regards' at the top of the screenshot. If I don't want to use a response, it's up to me to delete it. Make sure to check the entire content before sending, as it would be pretty embarrassing to let one go out with 4-5 different emails in it! Note also that I'm expected to add the date and time that I want to meet, replacing the placeholder:

   [Customize: DATE AND TIME]

I can't help thinking that we'll all start receiving emails with these placeholders still there, much like at the moment when merge fields go bad!
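
If you're building any automation around drafts like this, a simple pre-send check for leftover placeholders is cheap insurance. A minimal sketch, assuming placeholders always follow the [Customize: ...] format shown above (the PlaceholderCheck class is hypothetical, not part of the feature):

```apex
// Hypothetical pre-send check: finds any "[Customize: ...]" placeholders
// left in a generated draft
public class PlaceholderCheck
{
    public List<String> findPlaceholders(String body)
    {
        List<String> found=new List<String>();
        Matcher placeholderMatcher=Pattern.compile('\\[Customize:[^\\]]*\\]').matcher(body);
        while (placeholderMatcher.find())
        {
            found.add(placeholderMatcher.group());
        }
        return found;
    }
}
```
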


Saturday, 30 September 2023

Apex Comparator in Winter 24

Image generated by Stable Diffusion from a prompt by Bob Buzzard

Introduction

The Winter 24 release of Salesforce introduces a few new Apex features, including one that I'm very pleased to see - the Comparator interface. Simply put, this new interface allows the List.sort() method to take a parameter that determines the sort order of the elements in the List. 

The Problem

Now this might not sound like a big change, but it simplifies the support of sorting Lists quite a bit. The way we used to have to do it was for the items in the List to implement the Comparable interface. I've written loads of custom Apex classes over the years that implement this, and it's very straightforward - here's an example from a class that retrieves the code coverage values for all Apex classes in an org and displays them in order of coverage with the lowest covered (problem!) classes first :

public Integer compareTo(Object compareTo) 
{
    CoverageRecord that=(CoverageRecord) compareTo;
    	
    return this.getPercentage()-that.getPercentage();
}	
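
The snippet above is just the method. A minimal, self-contained CoverageRecord that it could belong to might look like the sketch below; the field names and constructor are assumptions, and only compareTo and getPercentage come from the snippet.

```apex
// Hypothetical minimal CoverageRecord implementing Comparable
public class CoverageRecord implements Comparable
{
    public String className;
    public Integer totalLines;
    public Integer coveredLines;

    public CoverageRecord(String className, Integer totalLines, Integer coveredLines)
    {
        this.className=className;
        this.totalLines=totalLines;
        this.coveredLines=coveredLines;
    }

    // Whole-number percentage of lines covered, 0 when there are no lines
    public Integer getPercentage()
    {
        if (totalLines==null || totalLines==0)
        {
            return 0;
        }
        return (coveredLines*100)/totalLines;
    }

    // Negative when this record has lower coverage, so sort() puts the worst covered first
    public Integer compareTo(Object compareTo)
    {
        CoverageRecord that=(CoverageRecord) compareTo;
        return this.getPercentage()-that.getPercentage();
    }
}
```
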
In this case, implementing compareTo isn't any real overhead - I've created a custom class that contains a whole bunch of information about the coverage for an Apex class - total lines, lines covered etc, so an extra method with a couple of lines of code isn't a big deal. It's a little less convenient if I need to sort a class from an AppExchange package - in that case I'll need to create a new class from scratch to wrap the packaged class and implement the method. If we assume that my coverage class is now in a package, it would look something like:

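public class CoverageWrapper implements Comparable
{
    // Packaged Apex classes are referenced as namespace.ClassName
    public BBCOVERAGE.CoverageRecord coverage {get; set;}
    
    public Integer compareTo(Object compareTo) 
    {
        // The List being sorted holds CoverageWrappers, so that's what is passed in
        CoverageWrapper that=(CoverageWrapper) compareTo;
        
        return this.coverage.getPercentage()-that.coverage.getPercentage();
    }
}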

Slightly less convenient - I now have a whole new class to maintain to be able to sort, and I have to store all the elements of the list in a CoverageWrapper rather than their original CoverageRecord. Again, not a huge amount of overhead but it gets a bit samey if I'm doing a lot of this kind of thing.

Much the same thing applies if I want to sort sObjects - I need to create a wrapper class and turn my list of sObjects into a list of the wrapper class before I can sort it. All those CPU cycles gone forever!

The Solution

This all changes in Winter 24 with the Comparator interface. I still need to create a class that implements an interface - the Comparator in this case :

public with sharing class CoverageComparator implements Comparator<CoverageRecord> 
{
    public Integer compare(CoverageRecord one, CoverageRecord tother) 
    {
    	return one.getPercentage()-tother.getPercentage();
    }	
}
 

but I don't need to wrap the class/sObject that I am processing in this class and create a new list. Instead I call the new sort() method that takes a Comparator parameter:

List<CoverageRecord> coverageRecords;
    ...
CoverageComparator covComp=new CoverageComparator();
coverageRecords.sort(covComp);
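
The same approach removes the need for wrapper classes when sorting sObjects. As a sketch, assuming we want Accounts ordered by AnnualRevenue, highest first, with blank values at the end (the comparator name and the ordering rules are illustrative choices):

```apex
// Sketch: sorting Accounts by AnnualRevenue, highest first, without a wrapper class
public class AccountRevenueComparator implements Comparator<Account>
{
    public Integer compare(Account one, Account tother)
    {
        Decimal revenueOne=one.AnnualRevenue;
        Decimal revenueTwo=tother.AnnualRevenue;
        if (revenueOne==null && revenueTwo==null)
        {
            return 0;
        }
        // Blank revenue sorts after any populated value
        if (revenueOne==null)
        {
            return 1;
        }
        if (revenueTwo==null)
        {
            return -1;
        }
        // Reversed comparison so that larger revenue comes first
        if (revenueOne>revenueTwo)
        {
            return -1;
        }
        if (revenueOne<revenueTwo)
        {
            return 1;
        }
        return 0;
    }
}
```

The comparison avoids subtracting the Decimal values, as compare has to return an Integer.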

CPU Impact


Regular readers of this blog will know that I'm always interested in the impact of changes on CPU time. In enterprise implementations CPU limits are something that I run up against again and again, so if a new feature improves this I want to know!

I used my usual methodology to test this - execute anonymous with most logging turned off and Apex at the error level, executing the same code three times and taking the average.

For each test I created the records before capturing the CPU time, then sorted the list. First using a comparator on the list of CoverageRecord objects:
List<CoverageRecord> covRecs=TestData.CreateRecords(100);

Integer startCpu=Limits.getCpuTime();
CoverageComparator covComp=new CoverageComparator();
covRecs.sort(covComp);
Integer stopCpu=Limits.getCpuTime();
System.debug(LoggingLevel.ERROR, 
   'CPU for comparator = ' + (stopCpu-startCpu));

And secondly wrapping them with classes that implement Comparable:

List<CoverageRecord> covRecs=TestData.CreateRecords(100);

Integer startCpu=Limits.getCpuTime();
List<CoverageWrapper> wrappers=new List<CoverageWrapper>();
for (CoverageRecord covRec : covRecs)
{
    CoverageWrapper wrapper=new CoverageWrapper();
    wrapper.coverage=covRec;
    wrappers.add(wrapper);
}

wrappers.sort();
Integer stopCpu=Limits.getCpuTime();
System.debug(LoggingLevel.ERROR, 
      'CPU for comparable = ' + (stopCpu-startCpu));

The results were broadly what I was expecting, as sorting a list in place is always going to be quicker than iterating it, wrapping the members, and then sorting, but it's always good to see the numbers:

  • For 100 records, Comparator took 9 milliseconds versus 11 milliseconds for wrapping
  • For 1,000 records, Comparator took 126 milliseconds versus 150 for wrapping
  • For 10,000 records, Comparator took 1844 milliseconds versus 2058 for wrapping

So once you get up to decent-sized lists, there's a 10% difference, well worth saving.

Columbo Close


Just one more thing ... 1844 milliseconds is still quite a chunk of the CPU limit. This is because my original implementation of the CoverageRecord calculates the percentage on demand, based on stored total and covered lines. Clearly in a large list this is being called a lot. I then re-ran the 10,000 test with a new implementation - CoverageRecordCachePercent - which stores the percentage after calculating, which knocked 300 milliseconds, or 15%, off the time. I'm sure that converting again to something that calculated the percentage once the total and covered lines were known and set it as a public property would reduce it further. Your regular reminder that even the smallest method can have an impact if it's being called a large number of times!
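
A minimal sketch of that last suggestion, calculating the percentage once when the line counts are known and exposing it as a property for the comparator to read. CoverageRecordEager and its inner ByPercentage comparator are hypothetical, and this isn't the CoverageRecordCachePercent class I tested.

```apex
// Hypothetical: percentage calculated once in the constructor and exposed as a property,
// so sorting reads a stored value instead of recalculating on every comparison
public class CoverageRecordEager
{
    public Integer totalLines {get; private set;}
    public Integer coveredLines {get; private set;}
    public Integer percentage {get; private set;}

    public CoverageRecordEager(Integer totalLines, Integer coveredLines)
    {
        this.totalLines=totalLines;
        this.coveredLines=coveredLines;
        this.percentage=(totalLines==0) ? 0 : (coveredLines*100)/totalLines;
    }

    // Lowest coverage first, matching the earlier comparator
    public class ByPercentage implements Comparator<CoverageRecordEager>
    {
        public Integer compare(CoverageRecordEager one, CoverageRecordEager tother)
        {
            return one.percentage-tother.percentage;
        }
    }
}
```
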
