A Practical Introduction to LLMs - icoversai

Explore the fundamentals of large language models (LLMs), from understanding their core mechanics to practical applications like prompt engineering, fine-tuning, and building custom models. Dive into the future of AI with our beginner-friendly guide.

Table of Contents:

  • Introduction to Large Language Models (LLMs)
    • What Are Large Language Models?
    • Why Large Language Models Are Revolutionary
  • Understanding the Basics of Large Language Models
    • What Makes a Language Model "Large"?
    • Quantitative and Qualitative Properties of LLMs
    • The Emergent Properties of Large Language Models
  • Supervised Learning vs. Self-Supervised Learning in LLMs
    • A Historical Perspective: From Supervised to Self-Supervised Learning
    • How Self-Supervised Learning Powers LLMs
  • How Large Language Models Work: The Next Word Prediction Paradigm
    • The Core Task: Predicting the Next Word
    • The Role of Context in Language Modeling
  • Practical Applications of Large Language Models
    • Three Levels of Working with LLMs: From Beginner to Advanced
  • Level 1: Getting Started with Prompt Engineering
    • Using ChatGPT and Other User-Friendly Interfaces
    • Leveraging OpenAI’s API and Hugging Face Transformers Library
  • Level 2: Fine-Tuning Large Language Models
    • The Basics of Model Fine-Tuning: Adjusting Internal Parameters
    • Popular Fine-Tuning Techniques: LoRA and RLHF
  • Level 3: Building a Custom Large Language Model
    • High-Level Overview: Data Collection, Preprocessing, and Training
    • Understanding the Self-Supervised Learning Process in LLM Training
  • Future Articles in the Large Language Models Series
    • Exploring Prompt Engineering in Detail
    • Advanced Fine-Tuning Techniques and Use Cases
  • Conclusion: The Importance of Understanding AI
    • Why Large Language Models Are the Future of Artificial Intelligence


Introduction to Large Language Models (LLMs)

What Are Large Language Models?

Everyone, I'm Shah, and I'm back with a new data science series. In this series, I will discuss large language models and how to use them in practice. This article gives a beginner-friendly introduction to large language models and describes three levels of working with them in practice, which future articles will explore in more depth. The series will cover various practical aspects of large language models, such as using OpenAI's Python API, using open-source solutions like the Hugging Face Transformers library, how to fine-tune large language models, and of course how to build a large language model from scratch.

If you enjoyed this content, please be sure to like, subscribe, and share it with others. If you have any suggestions for topics to include in this series, please share them in the comments section below.

Why Large Language Models Are Revolutionary

So with that, let's get into the article. To kick off the series, I'm going to give a practical introduction to large language models. This is meant to be very beginner-friendly and high level; I'll leave the more technical details and example code for future articles and blogs in this series. A natural place to start is with the term itself: a large language model, or LLM for short. I'm sure most people are familiar with ChatGPT.

However, if you are enlightened enough not to keep up with news cycles and tech hype, ChatGPT is essentially a very impressive and advanced chatbot. If you go to the ChatGPT website, you can ask it questions like "What's a large language model?" and it will generate a response very quickly, and that is really impressive.

If you were ever on AOL Instant Messenger (AIM) back in the early 2000s, in the early days of the internet, you know there were chatbots then; there have been chatbots for a long time. But this one feels different: the text is very impressive, and it almost feels human-like.

Understanding the Basics of Large Language Models

What Makes a Language Model "Large"?

A question you might have when you hear the term "large language model" is: what makes it large? What's the difference between a large language model and a not-large language model? This was exactly the question I had when I first heard the term. One way to put it is that large language models are a special type of language model. But what makes them so special? There's a lot that can be said about large language models, but to keep things simple,

Quantitative and Qualitative Properties of LLMs

I'm going to talk about two distinguishing properties: the first quantitative and the second qualitative. First, quantitatively, large language models are large: they have many more model parameters than past language models, these days anywhere from tens to hundreds of billions of parameters. The model parameters are the numbers that define how the model takes an input and generates an output. In other words, they are the numbers that define the model itself.


The Emergent Properties of Large Language Models

So that's the quantitative perspective on what distinguishes large language models from not-large language models. But there's also a qualitative perspective: the so-called emergent properties that start to show up when language models become large. "Emergent properties" is the language used in the paper cited below, "A Survey of Large Language Models," available on arXiv; it's a really great beginner's guide and I recommend it. Essentially, the term means there are properties that appear in large language models but do not appear in smaller language models.

One example of this is zero-shot learning. One definition of zero-shot learning is the capability of a machine learning model to complete a task it was not explicitly trained to do. While this may not sound super impressive to us very smart and sophisticated humans, it is actually a major innovation in how state-of-the-art machine learning models are developed. To see this, we can compare the old state-of-the-art paradigm to the new one.

Supervised Learning vs. Self-Supervised Learning in LLMs

A Historical Perspective: From Supervised to Self-Supervised Learning

About five to ten years ago, the way high-performing machine learning models were developed was strictly through supervised learning. What this would typically look like is training a model on thousands, if not millions, of labeled examples. For instance, you might have some input text like "Hello," "Hola," "How's it going?" and so on. You take all these examples and manually assign a label to each one.

Here we're labeling the language: English, Spanish, and so on. You can imagine this would take a tremendous amount of human effort to get thousands, if not millions, of high-quality examples.
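To make the supervised setup concrete, here is a minimal sketch of what a manually labeled dataset for language classification looks like. The examples and labels below are made up for illustration; a real dataset would contain thousands or millions of such pairs.

```python
# Toy illustration of the supervised learning setup: every training
# example must be manually paired with a label by a human.
labeled_examples = [
    ("Hello, how are you?", "English"),
    ("Hola, como estas?", "Spanish"),
    ("Where is the train station?", "English"),
    ("Donde esta la estacion de tren?", "Spanish"),
]

def count_labels(dataset):
    """Tally how many examples carry each label."""
    counts = {}
    for _, label in dataset:
        counts[label] = counts.get(label, 0) + 1
    return counts

print(count_labels(labeled_examples))  # {'English': 2, 'Spanish': 2}
```

The expensive part is not the code; it's producing the second element of every tuple by hand, at scale.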

How Self-Supervised Learning Powers LLMs

Let's compare this to the more recent innovation with large language models, which use a different paradigm: so-called self-supervised learning. In the context of large language models, this means you train a very large model on a very large corpus of data. For example, if you're trying to build a model that can do language classification, instead of painstakingly generating a labeled dataset, you can just take a corpus of English text and a corpus of Spanish text and train a model in a self-supervised way.

In contrast to supervised learning, self-supervised learning does not require manually labeling each example in your dataset. Instead, the so-called labels or targets are derived from the inherent structure of the data, in this case the text itself.
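Here is a minimal sketch of how training pairs can be derived from raw text alone, with no human labeling: the "label" for each position is simply the word that actually comes next. The function name and window size are just illustrative choices.

```python
# Self-supervised label creation: the target for each context window
# is the next word in the raw text itself, so no manual labeling is needed.
def make_training_pairs(text, context_size=3):
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = words[i - context_size:i]
        target = words[i]  # the "label" comes straight from the data
        pairs.append((context, target))
    return pairs

pairs = make_training_pairs("don't listen to your heart listen to your gut")
print(pairs[0])  # (["don't", 'listen', 'to'], 'your')
```

Every sentence on the internet yields training pairs for free this way, which is what makes training on massive corpora feasible.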

How Large Language Models Work: The Next Word Prediction Paradigm

The Core Task: Predicting the Next Word

You might be wondering how this self-supervised learning actually works. One of the most popular approaches is the next-word prediction paradigm. Suppose we have the text "listen to your" and we want to predict what the next word will be. Clearly, there's not just one word that can follow this string; there are many words that could come next and still make sense. In the next-word prediction paradigm, what the language model is trying to do is predict the probability distribution of the next word given the previous words.

What this might look like is that "heart" is the most probable next word, but other likely words could be "gut," "body," "parents," or "grandma." This is essentially the core task these large language models are trained to do, and the way a large language model learns these probabilities is by seeing many, many examples in the massive corpus of text it is trained on.
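A toy way to see where such a distribution comes from is to count what follows a given phrase in a corpus. The tiny corpus below is made up, but the counting idea is the same one a real model refines with its parameters.

```python
from collections import Counter

# Estimate the next-word distribution by counting what follows a phrase
# in a (tiny, made-up) corpus. Real LLMs learn far richer versions of this.
corpus = (
    "listen to your heart . listen to your gut . "
    "listen to your body . listen to your heart ."
).split()

def next_word_distribution(corpus, context):
    n = len(context)
    followers = Counter(
        corpus[i + n]
        for i in range(len(corpus) - n)
        if corpus[i:i + n] == context
    )
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}

dist = next_word_distribution(corpus, ["listen", "to", "your"])
print(dist)  # {'heart': 0.5, 'gut': 0.25, 'body': 0.25}
```

Here "heart" is the most probable next word simply because it appears most often after the context in the corpus.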

The Role of Context in Language Modeling

It also has a massive number of internal parameters, so it can efficiently represent all the different statistical associations between words. An important point here is that context matters. If we simply add the word "don't" to the front of this string, changing it to "don't listen to your," then the probability distribution could look entirely different, because adding just one word before the sentence completely changes its meaning. To put this a bit more mathematically (I promise this is the most technical thing in this article):

This is an example of an autoregression task. "Auto" means self, and regression means you're trying to predict something. The notation asks: what is the probability of the nth word (or, more technically, the nth token) given the preceding m tokens, that is, P(w_n | w_(n-1), w_(n-2), ..., w_(n-m))?

So if you really want to boil everything down, this is the core task most large language models are doing. Somehow, through this very simple task of predicting the next word, we get the incredible performance of tools like ChatGPT and other large language models. With that foundation laid, hopefully you now have a decent understanding of what large language models are, how they work, and the broader context around them.

Practical Applications of Large Language Models

Three Levels of Working with LLMs: From Beginner to Advanced

Now let's talk about how we can use these models in practice. Here I will talk about three levels at which we can work with large language models, ordered by the technical expertise and computational resources required. The most accessible way to use large language models is prompt engineering.

Level 1: Getting Started with Prompt Engineering

Using ChatGPT and Other User-Friendly Interfaces

Next, we have model fine-tuning, and finally, we have building our own large language model. Starting with level one, prompt engineering: I have a pretty broad definition here. I define prompt engineering as simply using an LLM out of the box, meaning not touching any of the tens or hundreds of billions of parameters that define the model.

I'll talk about two ways we can do this. The first is the easy way, which I'm sure is how most people in the world have interacted with large language models: using things like ChatGPT. These are intuitive user interfaces that don't require any code, and they're completely free; anyone can go to the ChatGPT website and type in a prompt.

It'll spit out a response. While this is definitely the easiest way to do it, it is a bit restrictive in that you have to go to their website, and it doesn't scale well if you're trying to build a product or service around it. But for a lot of use cases, this is actually super helpful.

Leveraging OpenAI’s API and Hugging Face Transformers Library

For applications where the easy way doesn't cut it, there is the less easy way: using things like the OpenAI API or the Hugging Face Transformers library. These tools let you interact with large language models programmatically, essentially using Python. In the case of the OpenAI API, instead of typing your request into the ChatGPT user interface, you send it to OpenAI using Python and their API, and you get a response back.
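Here is a minimal sketch of that programmatic flow, assuming the `openai` Python package is installed and an API key is available in the `OPENAI_API_KEY` environment variable. The model name and prompt are placeholders, not recommendations.

```python
import os

# Sketch of calling OpenAI programmatically instead of using the ChatGPT UI.
def build_chat_request(prompt, model="gpt-3.5-turbo"):
    """Assemble the payload the chat completions endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The actual network call only runs if an API key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads the key from the environment
    request = build_chat_request("What is a large language model?")
    response = client.chat.completions.create(**request)
    print(response.choices[0].message.content)
```

Every call like this is billed per token, which is the "pay per API call" cost mentioned below.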

Of course, their API is not free; you have to pay per API call. Another way to do this is via open-source solutions, one of which is the Hugging Face Transformers library, which gives you easy access to open-source large language models. It's free, and you can run these models locally, so there's no need to send your potentially proprietary or confidential information to a third party like OpenAI. Future articles in this series will dive into all these different aspects.
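As a sketch of the local, open-source route, the function below uses the Hugging Face `transformers` library with GPT-2 (a small, freely available model chosen here only for illustration). The model weights are downloaded once and everything then runs on your own machine; the actual call is left commented out since it requires the library to be installed.

```python
# Running an open-source model locally with Hugging Face transformers.
def strip_prompt(generated, prompt):
    """Keep only the newly generated continuation, not the echoed prompt."""
    return generated[len(prompt):] if generated.startswith(prompt) else generated

def run_locally(prompt):
    from transformers import pipeline  # pip install transformers
    generator = pipeline("text-generation", model="gpt2")
    output = generator(prompt, max_new_tokens=20)[0]["generated_text"]
    return strip_prompt(output, prompt)

# run_locally("A large language model is")  # uncomment once installed
print(strip_prompt("hello world", "hello"))  # ' world'
```

Because nothing leaves your machine, this pattern suits confidential or proprietary data.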

I'll talk about the OpenAI API: what it is, how it works, and share example code. I'll also dive into the Hugging Face Transformers library, same situation: what the heck it is, how it works, and some Python example code.

I'll also write an article on prompt engineering more generally: how can we craft prompts to get good responses from large language models? While prompt engineering is the most accessible way to work with large language models, using a model out of the box may give you sub-optimal performance on a specific task or use case. Or the model may have really good performance but be massive, with something like a hundred billion parameters.

Level 2: Fine-Tuning Large Language Models

The Basics of Model Fine-Tuning: Adjusting Internal Parameters

So the question might be: is there a way to use a smaller model but tweak it to perform well on our very narrow and specific use case? This brings us to level two, model fine-tuning, which I define here as adjusting at least one internal model parameter for a particular task. There are generally two steps. First, you get a pre-trained large language model, maybe from OpenAI or maybe an open-source model from the Hugging Face Transformers library.

Then you update the model parameters given task-specific examples. Going back to supervised learning versus self-supervised learning: the pre-trained model will have been trained in a self-supervised way, on the simple next-word prediction task.
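The two steps above can be sketched in miniature. This is not an LLM; it's a one-parameter model used as a stand-in to show the mechanic: start from existing "pre-trained" parameters, then nudge them with gradient descent on task-specific labeled examples.

```python
# Step 1: start from an existing ("pre-trained") parameter value.
pretrained_w = 0.5
# Task-specific labeled examples, here implying y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def fine_tune(w, examples, lr=0.01, epochs=200):
    """Step 2: update the parameter via gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in examples:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

tuned_w = fine_tune(pretrained_w, examples)
print(round(tuned_w, 3))  # converges to 2.0, the value the examples imply
```

Fine-tuning an LLM applies this same update idea, just across billions of parameters and with far more sophisticated training loops.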

Popular Fine-Tuning Techniques: LoRA and RLHF

In step two, this is where we do supervised learning, or even reinforcement learning, to tweak the model parameters for a specific use case, and this turns out to work very well. For models like ChatGPT, you're not working with the raw pre-trained model; the model you interact with in ChatGPT is actually a fine-tuned model developed using reinforcement learning. A reason this works is that, in doing the self-supervised next-word prediction task, the pre-trained base model learns representations that are useful for a wide variety of tasks.

In a future article, I will dive more deeply into fine-tuning techniques. One popular technique is low-rank adaptation, or LoRA for short, and another is reinforcement learning from human feedback, or RLHF.
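To preview the core intuition behind LoRA: rather than updating a full d x d weight matrix, you train two thin matrices A (d x r) and B (r x d) and add their product to the frozen weights, so the number of trainable parameters shrinks dramatically. The dimensions below are illustrative, not from any particular model.

```python
# Parameter-count intuition behind LoRA (low-rank adaptation).
d = 4096   # hidden dimension of one layer (illustrative)
r = 8      # LoRA rank, typically much smaller than d

full_update_params = d * d          # trainable numbers without LoRA
lora_update_params = d * r + r * d  # trainable numbers with LoRA (A and B)

print(full_update_params)   # 16777216
print(lora_update_params)   # 65536
print(full_update_params // lora_update_params)  # 256x fewer to train
```

Because the original weights stay frozen and only A and B are trained, LoRA fine-tuning fits on far more modest hardware.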

Level 3: Building a Custom Large Language Model

High-Level Overview: Data Collection, Preprocessing, and Training

There is also a third step: deploying your fine-tuned large language model to provide some kind of service, or simply using it in your day-to-day life. You'll profit somehow. My sense is that between prompt engineering and model fine-tuning, you can probably handle 99% of large language model use cases and applications.

However, suppose you're a large organization or enterprise and security is a big concern, so you don't want to use open-source models or send data to a third party via an API. Maybe you also want your large language model to be very good at a relatively specific set of tasks.

Perhaps you want to customize the training data in a very specific way, and you want to own all the rights and have it available for commercial use. In that case, it can make sense to go one step further, beyond model fine-tuning, and build your own large language model. Here I define that as coming up with all the model parameters yourself.

Understanding the Self-Supervised Learning Process in LLM Training

I'll only talk about how to do this at a very high level here and leave the technical details for a future article in the series. First, we need to get our data. This might look like gathering a book corpus, a Wikipedia corpus, and a Python code corpus.

You take that data, pre-process it, and refine it into your training dataset. Then you can take the training dataset and do the model training through self-supervised learning, and out of that comes the pre-trained large language model. You can then take this as your starting point for level two and go from there.
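The pipeline above can be sketched end to end in miniature: combine raw corpora, do minimal preprocessing, and turn the result into (context, next-word) training examples. The corpora here are stand-in strings; real ones are terabytes of text, and real tokenization is far more sophisticated than splitting on whitespace.

```python
# Stand-in corpora (real ones would be enormous).
book_corpus = "It was the best of times, it was the worst of times."
wiki_corpus = "A language model assigns probabilities to sequences of words."
python_corpus = "def add(a, b): return a + b"

def preprocess(*corpora):
    """Lowercase and whitespace-tokenize every corpus into one token stream."""
    tokens = []
    for corpus in corpora:
        tokens.extend(corpus.lower().split())
    return tokens

def to_examples(tokens, context_size=4):
    """Each example: context_size tokens of input, the next token as target."""
    return [
        (tokens[i:i + context_size], tokens[i + context_size])
        for i in range(len(tokens) - context_size)
    ]

tokens = preprocess(book_corpus, wiki_corpus, python_corpus)
examples = to_examples(tokens)
print(len(tokens), len(examples))
```

Training then means adjusting model parameters so the model assigns high probability to each example's target, which is the self-supervised learning step that produces the pre-trained model.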

Future Articles in the Large Language Models Series

Exploring Prompt Engineering in Detail

If you enjoyed this article and want to read more, be sure to check out the blog on Towards Data Science, where I share details that I may have missed here. This series is both an article and a blog series, so each article will have an associated blog post, and there will also be tons of example code in the GitHub repository.

Advanced Fine-Tuning Techniques and Use Cases

The goal of the series is to make information about large language models much more accessible. I really do think this is the technological innovation of our time. There are so many opportunities for potential use cases, applications, products, and services that can come out of large language models, and that's something I want to support.

Conclusion: The Importance of Understanding AI

Why Large Language Models Are the Future of Artificial Intelligence

I think we'll all be better off if more people understand this technology and apply it to solving problems. So with that, be sure to hit the subscribe button to keep up with future articles in the series.

If you have any questions or suggestions, please drop them in the comments section below. As always, thank you so much for your time, and thanks for reading.
