AI is a remarkable leap forward, but it’s critical that we acknowledge its limitations.
I’m an enthusiastic supporter of all things AI, including Generative. Below, I’ve listed a few areas where AI creates a seismic shift, and, critically, where it’s still early in its journey.
Where tools like ChatGPT really make a difference is in accelerating content optimization. In contrast to the limitations discussed below, a Generative model's propensity to make things up is precisely what we want here. Once we have identified our customer segments, we can use it to customize preexisting copy and, assuming it passes human and legal review, rapidly deploy and test different versions of that copy. For personalization, then, we can accelerate our marketing efforts with Generative AI while leveraging existing analytics - the best of both worlds.
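A minimal sketch of that workflow: generate one copy variant per segment and queue each for review before deployment. The segment names, the base copy, and the `generate_variant` helper are all hypothetical - in practice that helper would call a Generative model, but a template stands in here so the flow is runnable.

```python
# Sketch: per-segment copy variants queued for human/legal review.
# generate_variant is a stand-in for an LLM call; names are illustrative.

BASE_COPY = "Upgrade to our premium plan and save time every week."

SEGMENTS = {
    "price_sensitive": "Mention the 20% annual discount.",
    "power_user": "Emphasize advanced automation features.",
}

def generate_variant(base_copy: str, segment: str, instruction: str) -> str:
    """Stand-in for a generative model: customize base copy for a segment."""
    return f"[{segment}] {base_copy} ({instruction})"

def build_test_cells(base_copy: str, segments: dict) -> dict:
    """One candidate variant per segment; nothing ships until approved."""
    return {
        seg: {"copy": generate_variant(base_copy, seg, note), "approved": False}
        for seg, note in segments.items()
    }

cells = build_test_cells(BASE_COPY, SEGMENTS)
for seg, cell in cells.items():
    print(seg, "->", cell["copy"])
```

The `approved` flag is the point: the model proposes, humans dispose, and only reviewed variants enter the A/B test.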
While traditional Natural Language Processing methods are extremely powerful and often less expensive than Large Language Models, Generative AI's ability to extract themes and sentiments simultaneously provides a crucial check on the limits of manual analysis. A typical analyst would compute the sentiment and topics of each piece of text independently, then synthesize that data into hand-curated themes. ChatGPT can do both in a single pass, in a few lines of code.
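The "single pass" pattern looks roughly like this. A keyword heuristic stands in for the model call so the sketch runs offline; with an actual LLM, the same structure would come back as structured output from one prompt. All word lists and theme names are illustrative.

```python
# Sketch: sentiment and theme extracted together in one call, mimicking
# what a single LLM prompt returns. The keyword heuristic is a stub.

POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "broken", "hate"}
THEMES = {
    "shipping": {"delivery", "shipping", "late"},
    "support": {"agent", "support", "help"},
}

def analyze(text: str) -> dict:
    """Return sentiment and theme in a single pass over the text."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    theme = next((t for t, kw in THEMES.items() if words & kw), "other")
    return {"sentiment": sentiment, "theme": theme}

print(analyze("The delivery was late and the support agent was slow"))
# -> {'sentiment': 'negative', 'theme': 'shipping'}
```

The analyst's two separate passes (sentiment model, then topic model, then manual synthesis) collapse into one function call per document.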
This remains the most well-publicized use case for Generative AI. Multiple companies, including Wendy's, have begun integrating LLMs into their customer flows. Because LLMs grew out of Neural Network-style models, they require a large amount of data to adequately learn a domain or non-public subject area; given that data, standard fine-tuning techniques (backpropagation) can teach them the domain of your business. Where that much real data doesn't exist, synthetic data may be required, which means a significant engineering investment and a longer time to ROI than a typical machine learning lifecycle.
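One common way to bootstrap that synthetic data is to expand structured business records through question/answer templates, producing prompt/completion pairs in the format fine-tuning APIs expect. The records, field names, and templates below are purely illustrative.

```python
# Sketch: turning structured records into synthetic Q&A pairs for
# fine-tuning. Menu items and templates are made-up examples.

import json

MENU = [
    {"item": "Classic Burger", "price": 5.99, "calories": 580},
    {"item": "Garden Salad", "price": 4.49, "calories": 220},
]

TEMPLATES = [
    ("How much does the {item} cost?", "The {item} costs ${price:.2f}."),
    ("How many calories are in the {item}?", "The {item} has {calories} calories."),
]

def synthesize(records, templates):
    """Expand every record through every Q/A template pair."""
    pairs = []
    for rec in records:
        for q, a in templates:
            pairs.append({"prompt": q.format(**rec), "completion": a.format(**rec)})
    return pairs

dataset = synthesize(MENU, TEMPLATES)
print(json.dumps(dataset[0], indent=2))
print(len(dataset), "training pairs")
```

Two records and two templates already yield four training pairs; real deployments multiply thousands of records by dozens of templates, which is where the engineering investment goes.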
For projects that involve Natural Language Processing, OpenAI solutions can be competitive depending on the desired output as well as the sophistication of the model selected. As a frame of reference, we also compared the (approximate) pricing of Generative Solutions to preexisting NLP solutions (such as AWS Comprehend) and custom-built solutions using cost-minimizing cluster-based solutions (e.g. AWS Fargate). All prices are based on evaluating 5,000 pieces of text at varying lengths:
As you can see above, GPT-3.5 Turbo takes the gold for most accessible product, although not by much. It is worth noting that while Fargate solutions are similarly priced, additional optimizations could make them more efficient.
Also, notably, for shorter types of content there is no practical difference between the five products compared. In these cases, operational considerations (e.g. orchestration, prompt engineering, preexisting infrastructure) should guide your decision-making.
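A back-of-the-envelope model makes these comparisons easy to rerun as vendor prices change. The rates below are placeholders, not quoted prices, and the words-to-tokens ratio is a rough rule of thumb; only the 5,000-text volume comes from the comparison above.

```python
# Sketch: per-request cost model for 5,000 texts. Rates are illustrative
# placeholders - check current vendor pricing before relying on this.

N_TEXTS = 5_000

def llm_cost(words_per_text: int, price_per_1k_tokens: float) -> float:
    """Token-priced model (LLM-style billing)."""
    tokens = words_per_text * 1.3            # rough words-to-tokens ratio
    return N_TEXTS * tokens / 1_000 * price_per_1k_tokens

def per_unit_cost(chars_per_text: int, price_per_100_chars: float) -> float:
    """Unit-priced model (per-100-character billing)."""
    units = max(1, -(-chars_per_text // 100))  # ceiling division
    return N_TEXTS * units * price_per_100_chars

# Example: 200-word (~1,100-character) reviews, with assumed rates
print(f"Token-priced (assumed $0.002/1K tok):  ${llm_cost(200, 0.002):,.2f}")
print(f"Unit-priced (assumed $0.0001/100 ch):  ${per_unit_cost(1100, 0.0001):,.2f}")
```

Swapping in real rates and your own document-length distribution turns this into a quick sensitivity analysis for the table above.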
Generative AI’s capabilities are brand new, and still expanding, but it has some fundamental limitations.
While there will certainly be opportunities to add Generative content in your personalization pipeline, these models have a couple of problems on the personalization front.
First, they can "hallucinate" - in layman's terms, make things up. Traditional modeling workflows mostly introduce errors at the data collection, preparation, and training stages. Generative models, trained on internet data, add a new failure mode: given the wrong prompt, they can generate content that is false or defamatory, discriminatory, in violation of trademarks or licensing agreements, a misrepresentation of your product(s), or a source of liability in some other way.
Second, while it might be interesting to see what a Large Language Model (LLM) would make of your data, a proper segmentation still needs human oversight. For instance, you might want to use Factor Analysis to get at latent variables (e.g. "purchase interest" vs. "purchase need"). A Generative model can approximate these techniques, but it cannot implement them directly. The Wolfram Connector may eventually close that gap; for now, a better use for these models is interpreting the outcomes of factor analysis or other unsupervised learning techniques.
Generative AIs predict sequential data in a unique way, leading to their huge leap forward in recent months. But they’re not optimizers. In fact, they’re the opposite: they’re designed to generate new content, not to evaluate existing content.
If you're scoring leads, you're interested in one thing: the probability that a lead converts. This is a well-defined outcome - in the end, it either converts or it doesn't - so a score should range from 0% to 100%. Models like Logistic Regression, Random Forests, or even the Transformer's neural-network predecessors can be constrained to return a probability between 0 and 1. Not so for Generative AI: even with the best prompt, you can't guarantee that it won't occasionally return an entirely unrelated answer.
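The "constrained to return a probability" point can be shown concretely: a logistic model is bounded by construction, because the sigmoid maps any score into (0, 1). The tiny one-feature fit below uses synthetic lead data (visits vs. conversion) invented for illustration.

```python
# Sketch: logistic regression outputs are valid probabilities by
# construction - the sigmoid maps any real score into (0, 1).

import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic leads: feature = site visits; label = converted (1) or not (0)
random.seed(0)
data = [(x, 1 if x + random.gauss(0, 2) > 5 else 0) for x in range(11)]

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):                      # plain stochastic gradient descent
    for x, y in data:
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x
        b -= lr * (p - y)

probs = [sigmoid(w * x + b) for x, _ in data]
assert all(0.0 < p < 1.0 for p in probs)   # guaranteed by the sigmoid
print(f"P(convert | 2 visits) = {sigmoid(w * 2 + b):.3f}")
print(f"P(convert | 9 visits) = {sigmoid(w * 9 + b):.3f}")
```

No prompt engineering can give a Generative model this property; here the assertion holds for any input, because it is a mathematical consequence of the model's form.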
An additional wrinkle arises from the fact that these AI models - especially the vended ones - are a black box. If AI regulation emerges that requires provable explanations for a model's recommendations, Generative AI may become difficult to use: while it can give you a text "explanation" of its recommendations, we can't verify what actually happened under the hood the way we can with many traditional Machine Learning models.
While Generative AI can do a great deal of hypothesis generation, it will struggle in the realm of experimental design and hypothesis testing. Furthermore, as of 2023, peer-reviewed study of hypothesis generation reports a high error rate - this is still an area where a human with business insight outperforms the machine.
It remains to be seen whether these models can learn to conduct hypothesis tests, especially as they are trained for domain-specific implementations. For now, we will need to rely on a human to build the experimental designs and verify that they actually test what the user intends. A Generative model cannot guarantee its ground truth - more simply, it does not know what it does not know - so very real, practical constraints on your design may simply be invisible to the model.
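For contrast, here is the kind of test a human analyst still has to design: a two-sample permutation test on A/B conversion data. The conversion figures are made up for illustration; the design choices (two-sided test, what counts as "converted", sample size) are exactly the judgments the text argues cannot yet be delegated to the model.

```python
# Sketch: a human-designed hypothesis test - permutation test on the
# difference in conversion rates between two made-up groups.

import random

control   = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # 3/10 converted
treatment = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]   # 7/10 converted

def perm_test(a, b, n_iter=10_000, seed=42):
    """P(a mean difference at least this large arises by chance)."""
    rng = random.Random(seed)
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = sum(pooled[len(a):]) / len(b) - sum(pooled[:len(a)]) / len(a)
        if abs(diff) >= abs(observed):
            hits += 1
    return hits / n_iter

p = perm_test(control, treatment)
print(f"observed lift: 0.40, p-value ~ {p:.3f}")
```

With only ten leads per arm, the observed 40-point lift is not conclusive - a judgment about sample size and evidence thresholds that the human, not the model, has to make.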
Need help deciding what sort of AI model you need for your business? Contact Concord!
Not sure on your next step? We'd love to hear about your business challenges. No pitch. No strings attached.