"Why" AI is poised to drive business outcomes.

4/2/2024

With the explosion of Generative Artificial Intelligence (GenAI) and the widespread adoption of Large Language Models (LLMs), organizations and individuals have many opportunities to augment their existing functions with AI. As with the introduction of any new technology, I have seen opinions ranging from fascination, to resistance to change, to attempts to disprove that the technology is beneficial at all by trying every possible way to make it fail. Worse, we constantly highlight one-sided dangers of the technology, with prejudice and bias, without taking the time to understand why or how it can be beneficial, what its strengths are, and what its limitations may be. Public perception is often the biggest threat to innovation.

Like many people, I have spent a lot of time experimenting with AI but haven't found a compelling explanation of "why" AI can help drive business outcomes. One of the key aspects that I've been focused on is the use of my own curated and contextual data, which has positively disrupted the outputs from these fantastic AI models. As I continue my learning journey through my career and through life, I’ve decided to create a short series of blogs about my findings and my experiments in the space of GenAI and LLMs. While it will primarily serve as a record of my own career pathway, I intend to make this information relevant and helpful to anyone taking the time to read it. Throughout my blog posts, I will include relevant resources where readers can find further information, should they wish to dive deeper into a certain topic. I hope you enjoy reading this series as much as I enjoyed writing it.

In my first blog post about this exciting area of research and application, I will aim to explain the inner workings of Large Language Models, and some of their characteristics, in a way that allows us to take a data-driven approach to decision making. The hope is that this information will encourage the adoption of these AI models in our workplaces (and our personal lives) to augment our efforts and help us be more productive and efficient in the long run.

What are Large Language Models?

Almost everyone who has heard of Artificial Intelligence has probably also heard the term “Neural Network”. A neural network is an architecture that allows computers to learn the relationship between input samples and output samples by loosely mimicking how neurons in the brain signal each other. One of the earliest neural networks, the Perceptron, developed in 1957, had a single layer of neurons with weights and thresholds that could be adjusted between the inputs and the outputs. A fantastic introduction to this topic can be found here. Large Language Models (LLMs) are a type of Neural Network. In contrast to the Perceptron from 1957, some of these LLMs are vastly more complex, with nearly a hundred layers and 175 billion or more parameters (the adjustable weights that connect the neurons).
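To make "weights and thresholds that could be adjusted" a little more concrete, here is a minimal sketch of a single-neuron perceptron learning the logical AND of two inputs. The function names, the learning rate, and the choice of AND as the task are my own assumptions for illustration; this is not a reconstruction of the original 1957 hardware.

```python
# A minimal single-neuron perceptron sketch (illustrative names and settings).

def predict(weights, threshold, inputs):
    """Fire (1) if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def train(samples, epochs=20, learning_rate=1.0):
    """Nudge the weights and threshold after every wrong prediction."""
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, threshold, inputs)
            # Raise weights on active inputs when the neuron should have fired,
            # lower them when it fired by mistake.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            # The threshold moves in the opposite direction to the weights.
            threshold -= learning_rate * error
    return weights, threshold


# Truth table for logical AND: the output is 1 only when both inputs are 1.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, threshold = train(AND_SAMPLES)
predictions = [predict(weights, threshold, inputs) for inputs, _ in AND_SAMPLES]
print(weights, threshold, predictions)
```

After training, the neuron fires only for the input (1, 1). An LLM is built from the same basic idea, adjustable numbers tuned against examples, repeated at an enormously larger scale.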

What do Large Language Models do?

One of the first practical applications of neural networks was to recognize binary patterns. Given a series of streaming bits (1s and 0s) as input, the network was designed to predict the next bit in the sequence (output). Similarly, in very simple terms, Large Language Models predict the next word (more precisely, the next token, which may be a word or a fragment of one) given a sequence of words as input. If that is all they do, why are they so powerful, and why do they appear so intelligent? A full answer goes well beyond the scope of a blog post, but here is an incredible resource that will help you understand the inner workings of large language models.
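The idea of "predict the next word from the words so far" can be sketched with a deliberately tiny model that only counts which word tends to follow which. This is a simple bigram counter, not how an LLM works internally, and the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# A toy corpus, invented for this example.
corpus = "the cat sat on the mat the cat ate the fish"
counts = build_bigram_counts(corpus)

print(predict_next(counts, "the"))  # "cat" follows "the" most often in this corpus
print(predict_next(counts, "dog"))  # None: "dog" never appears in the corpus
```

An LLM replaces the lookup table with a neural network that considers the whole preceding sequence rather than just the last word, which is a large part of why its predictions can look so much more intelligent.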

Characteristics of Large Language Models

In this sub-section of the blog, I will attempt to highlight some characteristics of Large Language Models. I will refer to these characteristics in various future blog posts, so that their utility and application areas can be better understood. It is these characteristics that contribute to the ability of LLMs to appear to understand and generate human-like language.

Word Vectors: Computers represent all information as bits. LLMs, in particular, represent words as numerical vectors. If you read through the resource linked in the previous sub-section on what LLMs do, you will already be aware that by storing words as vectors, LLMs can capture semantic relationships (or meaning) between words. More importantly, LLMs operate in such a high-dimensional space that they can represent the meaning of entire paragraphs and sections of text comprising thousands of words. This is what allows them to interpret human requests and generate responses.
Transformers: One of the key components of the LLMs that can actively exchange information with humans is the transformer architecture. The transformer processes all the words in its input in parallel, while also passing information from one layer to the next, which allows the model to learn complex relationships and patterns in the language (i.e. the relationships between words).
Training Data: The top LLMs in use today have benefited from vast amounts of training data. It must be noted that the base models are trained on unlabeled text using self-supervised or semi-supervised learning methods. In the first stage, the model is trained on a very large body of text available on the internet (sourced using crawlers or similar). With this data, the neural network learns to predict the next word in the sequence. In the next stage, the same model is given labeled data of much higher quality, but on a much smaller scale. This process is referred to as fine-tuning and is used to get the LLM to generate contextually relevant content in a format that matters to the end user. Andrej Karpathy has an excellent podcast on how LLMs are trained, and offers valuable insights into how prediction and compression (e.g. zip files) can be shown to have a close mathematical relationship. It follows that the larger the amount of data and the larger the number of model parameters, the better the model generally becomes at predicting the next word in the sequence. Fine-tuning a stronger base model therefore tends to give better results (subject, of course, to the dataset and labels used in the fine-tuning phase).
Fine-Tuning and Prompt Engineering: As I previously mentioned, fine-tuning is essential for LLMs to perform specific tasks in a consistent and effective manner. Fine-tuning helps optimize the model's performance for a particular task, such as analyzing sentiment or generating summaries in a specific format. When used in combination with other techniques such as prompt engineering, the ability of a fine-tuned model to adapt to a new task and context can be quite magical and have huge practical value. A simple online search will yield several great resources on prompt engineering, which is an emerging field and a highly sought-after skill. I strongly recommend that anyone interested in using GenAI for personal or business use experiment with these techniques to solve the relevant challenges they encounter on a daily basis.
The Black Box Problem and Emergent Behavior: So far, we’ve focused on the things that we truly understand about LLMs. But it is also important to recognize that the sheer size and complexity of LLMs pose a significant challenge to our understanding of their true inner workings. The “black box” problem draws attention to the fact that the logic behind the decision-making process of these large neural networks is not easily traceable and, hence, not well understood. This article in Nature highlights the black box problem and offers a good balance of perspectives on why this problem matters, and how we can continue to work around it while receiving the benefits that AI offers. On a related note, LLMs exhibit something known as “Emergent Behavior”: as models get larger, they exhibit capabilities that they were not explicitly trained to perform. It is this characteristic that makes experimenting with LLMs to solve business challenges a particularly exciting area of interest.
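Two of the characteristics above, word vectors and the way a transformer lets words exchange information, can be sketched in a few lines. The three-dimensional vectors below are invented for illustration (real models learn vectors with hundreds or thousands of dimensions), and the attention function is a simplified version that leaves out the learned projections and multiple heads a real transformer uses.

```python
import math

# Toy 3-dimensional word vectors, invented for this example only.
WORD_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Similarity of direction: close to 1.0 means the vectors point the same way."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified attention: each output is a weighted average of all inputs,
    weighted by how strongly each input matches the current word."""
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        weights = softmax([dot(query, key) / scale for key in vectors])
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors))
                 for i in range(len(query))]
        outputs.append(mixed)
    return outputs


related = cosine_similarity(WORD_VECTORS["king"], WORD_VECTORS["queen"])
unrelated = cosine_similarity(WORD_VECTORS["king"], WORD_VECTORS["apple"])
contextual = self_attention(list(WORD_VECTORS.values()))
print(related, unrelated)
```

With these made-up vectors, "king" scores as far more similar to "queen" than to "apple", and each row of `contextual` is a blend of all three word vectors. Every word's blend is computed independently of the others, which is what makes the parallel processing possible.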

What can Large Language Models help with?

At Relativ, we have been experimenting with Deep Learning, Psychology, Linguistics, and Large Language Models, by qualitatively assessing the generated text across thousands of interactions, and continuously instruction-tuning these models to quantize their output. We have learned that when LLMs are combined with proprietary algorithms and curated data, their output can be transformative and insightful. When these models are thoughtfully designed around the end-user experience and the intended business or academic outcome, we are starting to see some very promising results. We are already beta-testing these models in recruiting, sales, learning and development, retrospectives, and career readiness, where early adopters are reaping the rewards of experimenting early and gaining a competitive advantage from the learning that occurs.
In the next series of blogs, I will attempt to describe how Relativ's models are being used in each of the above fields, and why they can be a game changer in the long run. I will refer to the characteristics of LLMs highlighted in this blog entry, for continuity and chain-of-thought (pun intended), throughout this series. In the meantime, head over to relativ.ai or reach out to us to learn how we can help you deploy your own AI models, infused with psychology and linguistics, to help you drive business outcomes.
