Sunday 30 July 2023

Salesforce CLI Open AI Plug-in - Generating Records


Image generated by Stable Diffusion 2.1, based on a prompt from Bob Buzzard

Introduction

After the go-live of Service and Sales GPT, I felt that I had to revisit my Salesforce CLI Open AI Plug-in and connect it up to the GPT 4 Large Language Model. I didn't succeed in this quest, as while I am a paying customer of OpenAI, I haven't satisfied the requirement of making a successful payment of at least $1. The API function I'm hitting, Create Chat Completion, supports gpt-3.5-turbo and the gpt-4 variants, so once I've racked up enough cost using the earlier models I can switch over by changing one parameter. My current spending looks like it will take me a few months to get there, but such is life with competitively priced APIs.

The Use Case

The first incarnation of the plug-in asks the model to describe Apex, CLI or Salesforce concepts, but I wanted something that was more of a tool than a content generator, so I decided on creating test records. The new command takes parameters listing the field names, the desired output format, and the number of records required, and folds these into the messages passed to the API function. Like the Completion API, the interface is very simple:

const response = await openai.createChatCompletion({
     model: 'gpt-3.5-turbo',
     messages,
     temperature: 1,
     max_tokens: maxTokens,
     top_p: 1,
     frequency_penalty: 0,
     presence_penalty: 0,
   });

result = (response.data.choices[0].message?.content as string);

There's a few more parameters than the Completion API:

  • model - the Large Language Model that I send the request to. Right now I've hardcoded this to the latest I can access
  • messages - the collection of messages to send. The messages build on each other, and each message has a content (the instruction/request) and a role (where the instruction is being sent). This allows me to separate the instructions to the model (when the role is assistant, I'm giving it constraints about how to behave) from the request (when the role is user, this is the task/request I'm asking it to carry out).
  • max_tokens is the maximum number of tokens (approximately 4 characters of text) that my request combined with the response can be. I've set this to 3,500, which is approaching the limit of the gpt-3.5 model. If you have a lot of fields you'll have to generate a smaller number of records to avoid breaching this. I was able to create 50 records with 4-5 fields inside this limit, but your mileage may vary.
  • temperature and top_p guide the model as to whether I want precise or creative responses.
  • frequency_penalty and presence_penalty indicate whether I want the model to continually focus on tokens if they are repeated, or focus on new information.

As this is an asynchronous API, I await the response, then pick the first element in the choices array. 

Here's a few executions to show it in action - linebreaks have been added to the commands to aid legibility - remove these if you copy/paste the commands.

> sf bbai data testdata -f 'Index (count starting at 1), Name (Text, product name), 
            Amount (Number), CloseDate (Date yyyy-mm-dd), 
            StageName (One of these values : Negotiating, Closed Lost, Closed Won)' 
            -r csv

Here are the records you requested
Index,Name,Amount,CloseDate,StageName
1,Product A,1000,2022-01-15,Closed Lost
2,Product B,2500,2022-02-28,Closed Won
3,Product C,500,2022-03-10,Closed Lost
4,Product D,800,2022-04-05,Closed Won
5,Product E,1500,2022-05-20,Negotiating

> sf bbai data testdata -f 'FirstName (Text), LastName (Text), Company (Text), 
                            Email (Email), Rating__c (1-10)' 
                            -n 4 -r json

Here are the records you requested
[
  {
    "FirstName": "John",
    "LastName": "Doe",
    "Company": "ABC Inc.",
    "Email": "john.doe@example.com",
    "Rating__c": 8
  },
  {
    "FirstName": "Jane",
    "LastName": "Smith",
    "Company": "XYZ Corp.",
    "Email": "jane.smith@example.com",
    "Rating__c": 5
  },
  {
    "FirstName": "Michael",
    "LastName": "Johnson",
    "Company": "123 Co.",
    "Email": "michael.johnson@example.com",
    "Rating__c": 9
  },
  {
    "FirstName": "Sarah",
    "LastName": "Williams",
    "Company": "Acme Ltd.",
    "Email": "sarah.williams@example.com",
    "Rating__c": 7
  }
]

There's a few interesting points to note here:

  • Formatting field data is conversational - e.g. when I use Date yyyy-mm-dd the model knows that I want the date in ISO8601 format. For picklist values, I just tell it 'One of these values' and it does the rest.
  • In the messages I asked it to generate realistic data, and while it's very good at this for First Name, Last Name, Email, Company, it's not when told a Name field should be a product name, just giving me Product A, Product B etc.
  • It sometimes takes it a couple of requests to generate the output in a format suitable for dropping into a file - I'm guessing this is because I instruct the model and make the request in a single API call.
  • I've generated probably close to 500 records while testing this, and that has cost me the princely sum of $0.04. If you want to play around with the GPT models, it really is dirt cheap.
The final point I'll make, as I did in the last post, is how simple the code is. All the effort went into the messages to ask the model to generate the data in the correct format, not to include additional information that it was responding to the request, to generate realistic data. Truly the key programming language for Generative AI is the supported language that you speak - English in my case!

As before, you can install the plug-in via :
> sf plugins install bbai
or if you have already installed it, upgrade via :
> sf plugins update

More Information





No comments:

Post a Comment