How to Deploy ML Solutions with FastAPI, Docker, & AWS - icoversai

With our comprehensive guide, you can learn how to deploy your machine learning APIs using Docker and AWS. From building Docker images to deploying on AWS ECS and automating data pipelines, this guide offers step-by-step instructions and best practices for scalable, secure, and efficient API deployment.




Table of Contents

  • Introduction to Deploying Machine Learning APIs with Docker and AWS
    • Why Containerization is Key for Scalable Machine Learning Applications
  • Setting Up Your Development Environment
    • Installing Docker: A Step-by-Step Guide
    • Configuring Your Python Project for Docker Compatibility
  • Building a Docker Image for Your Machine Learning API
    • How to Create and Optimize a Dockerfile for Your Python App
    • Common Pitfalls and Solutions When Building Docker Images
  • Running Your Docker Container Locally
    • Testing Your Machine Learning API on a Local Docker Container
    • Troubleshooting: Solving the Python Path Issue in Docker
  • Pushing Your Docker Image to Docker Hub
    • Creating and Tagging Docker Images for Public Repositories
    • How to Push Docker Images to Docker Hub for Easy Deployment
  • Deploying Your Dockerized API on AWS Elastic Container Service (ECS)
    • A Beginner’s Guide to AWS ECS: Fargate vs. EC2 Instances
    • Step-by-Step: Deploying Your Docker Container on AWS ECS
  • Configuring Security Groups for Your AWS ECS Deployment
    • How to Set Up Inbound Traffic Rules for Your API
    • Best Practices for Securing Your AWS ECS Instances
  • Testing and Interacting with Your Deployed API
    • Using Gradio to Test Your Machine Learning API on AWS
    • Comparing Performance: Local vs. Cloud-Deployed APIs
  • Automating the Data Pipeline for Continuous Integration
    • Why Automating Your Data Pipeline is Crucial for API Accuracy
    • Future Steps: Setting Up Automated Updates for Your API
  • Conclusion: Enhancing Your Machine Learning Projects with Docker and AWS
    • Next Steps: Exploring More Advanced Containerization and Deployment Techniques


    Introduction to Deploying Machine Learning APIs with Docker and AWS

    This is the fifth article in a larger series on full-stack data science. In the previous article, I walked through the development of a semantic search tool for my YouTube articles. Here I will discuss how we can take this tool and deploy it into a production environment. I'll start with an overview of key concepts and then dive into the example code.

    If you're new here, welcome! I'm Icoversai. I make articles about data science and entrepreneurship. If you enjoy this article, please consider subscribing; that's a great no-cost way to support me in all the articles that I make.

    Why Containerization is Key for Scalable Machine Learning Applications 

    When we think of machine learning, we probably think of neural networks or other models that allow us to make predictions. Although these are a core part of the field, a machine learning model on its own isn't something that provides a whole lot of value. In virtually all situations, for a machine learning model to provide value, it needs to be deployed into the real world. I define this deployment process as taking a machine learning model and turning it into a machine learning solution.

    Setting Up Your Development Environment

    We start by developing the model, which consists of taking data, passing it into a machine learning algorithm, and obtaining a model from that training process. Deployment can look a lot of different ways. It could simply be making predictions available to programmers and other developers. It could be using the model to power a website or a mobile application. And finally, it could be embedding the model into a larger business process or piece of software.

    But the key point here is that the model that comes out of the training algorithm and is sitting on your laptop doesn't provide a whole lot of value. The model integrated into a website, embedded in a piece of software, or made available to end users through an API is something that provides value.

    Installing Docker: A Step-by-Step Guide

    A natural question is: how can we deploy these solutions? There are countless ways to do this. In this article, I'm going to talk about a simple three-step strategy for deployment that is popular among data scientists and machine learning engineers.

    The first step is to create an API. In other words, we create an interface for programs to communicate and interact with our model. What this looks like is we take our model and wrap it in an API, and then people can send requests to the API and receive responses from it. In the case of a model, the request will be the input of the model and the response will be the output. Two popular libraries for doing this in Python are Flask and FastAPI.

    The next step is to take the API and put it in a container. Here, container is a technical word referring to a Docker container, which is a lightweight wrapper around a piece of software that captures all its dependencies and makes it super portable, so you can easily run that piece of software across multiple machines. And then finally, we deploy the solution. Since we put everything into a container, it's now super easy to run that container on someone else's computer, on servers you manage, or, most commonly, in the cloud.

    Configuring Your Python Project for Docker Compatibility

    The big three cloud providers, of course, are AWS, Azure, and GCP. So with this high-level overview, let's see what this looks like in code. Here, I'm going to walk through how we can use FastAPI, Docker, and AWS to deploy the semantic search tool that I developed in the previous article. This article is going to be pretty hands-on, so I'm not going to talk too much about FastAPI, Docker, or AWS from a conceptual point of view. But if those are things you're interested in, let me know in the comments and I'll make some follow-up articles specifically about those tools.

    So here we're going to create a search API with FastAPI, then we're going to create a Docker image for that API, then we'll push that image to Docker Hub, and finally we'll use that Docker image to deploy a container on AWS's Elastic Container Service.

    So let's start with the first step: creating the search API with FastAPI. FastAPI was super easy for me to learn, and they have a great tutorial for those just getting started. I worked through it to make this example, and it probably took me about an hour, so it's very approachable, especially if you've been coding in Python for a while.

    Anyway, coming back to the code, the first thing we want to do is make a file called main.py and import some libraries. We'll import FastAPI to create the API, and the rest of these libraries are so we can implement our search function like we did in the previous article.

    Building a Docker Image for Your Machine Learning API

    How to Create and Optimize a Dockerfile for Your Python App

    These libraries are used so we can implement our search function like we did in the previous article. We use Polars to import data about all my YouTube articles. We use the sentence-transformers library to compute text embeddings. We use scikit-learn to compute the distance between a user's query and all the articles on my channel. Then I have this other file called functions.py, which has the return-search-results function I define here. I'm not going to go into the details because it's not critical for the deployment process, but essentially it takes in a user's query and spits out the top search results for that query. Coming back to the main script:

    The first thing we do is define the embedding model that we're going to use from the sentence-transformers library. By default, the library will download the model when it's run for the first time.

    But here, to avoid that, I just save the model locally. What that looks like is: we have the main Python file that we were just looking at, and then I have this folder called data, which contains a folder with all the files for this embedding model, plus a parquet file with all the data about my YouTube articles. We can then load the model from a file, load the article index the same way we did in the previous article, and import the Manhattan distance from scikit-learn.
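
    To make that concrete, here's a minimal sketch of what the top of main.py might look like; the file paths, model name, and the helper function's name and signature are assumptions based on the description above, not the exact original code.

```python
import polars as pl
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer
from sklearn.metrics import DistanceMetric

# hypothetical name for the helper defined in functions.py
from app.functions import return_search_result_indexes

# load the embedding model from the local data folder instead of downloading it
model = SentenceTransformer("app/data/all-MiniLM-L6-v2")  # model folder name assumed

# lazily scan the parquet file of article metadata (doesn't load it into memory)
df = pl.scan_parquet("app/data/video-index.parquet")  # filename assumed

# Manhattan (L1) distance metric from scikit-learn
dist = DistanceMetric.get_metric("manhattan")
```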

    Common Pitfalls and Solutions When Building Docker Images

    Since I talked about this at length in the previous article, I'm not going to get into the details of how the search tool works here; you can check out that article if you're interested. Everything so far has nothing to do with the API; it was just the implementation of the search function. To create the API, we create this object called an app, which is a FastAPI object, and then we can simply create these API operations.

    Here I'm only defining GET requests, which allow users to send requests to the API and receive back responses. The other most common one is a PUT request, which is often used to send data to an API and load it in the back end. For example, if we wanted to update the parquet file in some way, we could use a PUT request to do that. Anyway, here I define three operations, and that's done using this decorator syntax.

    What's happening here is that we're saying we want to define a GET request at this endpoint of the API, and it's going to operate based on this Python function. It's a common practice to have the root endpoint serve as a health check, so it doesn't take in any input parameters, but anytime someone calls this endpoint, they'll receive back a health-check string response.
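
    Continuing the main.py sketch, the health-check operation might look like this (the exact response string is an assumption):

```python
app = FastAPI()

@app.get("/")
def health_check():
    # root endpoint: no inputs, just a simple health-check response
    return {"health_check": "OK"}
```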

    Running Your Docker Container Locally

    Testing Your Machine Learning API on a Local Docker Container

    Similar thing here: I created another endpoint called info, which just gives some information about the API. It doesn't take any inputs, but it returns the name, which I called yt-search, and a description: a search API for my YouTube articles. This one's not strictly necessary, but you can imagine that if you have multiple users using this API, or maybe multiple APIs, having an info endpoint can be helpful. But the one we care most about is the search endpoint.

    What's happening here is that we're defining this search function, which takes in a query from the GET request and passes it into the return-search-result-indexes function defined in the functions.py file, along with the article index, the embedding model, and the distance metric. We then use the output of this function to return the search results. There's a lot of fanciness here.

    Maybe it's not super easy to read, but what's happening is: we use the select method to pick out the title and article ID columns of the data frame. For the article index, we use the collect method, because we didn't actually load the data frame into memory; we used scan_parquet instead of read_parquet. Once we load this in, we pick out the indexes from the search result, and finally we convert that data frame to a dictionary. That dictionary will have two fields, one corresponding to the titles and the other corresponding to the article IDs.
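
    Putting the other two endpoints together, a sketch might look like the following; the info strings, column names, and the helper's signature are assumptions carried over from the sketch above.

```python
@app.get("/info")
def info():
    # basic metadata about the API
    return {"name": "yt-search", "description": "Search API for my YouTube articles"}

@app.get("/search")
def search(query: str):
    # get the row indexes of the top matches for the query
    idx_result = return_search_result_indexes(query, df, model, dist)
    # select the title and article ID columns, collect the lazy frame into
    # memory, pick out the matching rows, and convert to a dictionary
    return df.select(["title", "video_id"]).collect()[idx_result].to_dict(as_series=False)
```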

    Troubleshooting: Solving the Python Path Issue in Docker

    It'll have up to five search results for each field. That's all the code needed to make the API. Super easy, and great for me as someone who's very comfortable with Python and knows very little about other programming languages, especially ones that have to do with web development. Now we can run this API on my local machine and interact with it. The way that looks is: first, make sure we're in the right directory.

    We see we have this app folder; we can go into it, and we see that we have our main.py file. To run it, we can do fastapi dev main.py. All right, so now it's running on port 8000. I had to make a couple of changes to the main file: I had to add app in front of functions, and I had to remove app from these paths, because it was running from a different directory than a previous version of the code. But that should be working now.

    Now we'll see that we have this little URL here, which we can copy. I have a notebook that allows us to test the API: the URL is already there, it's at port 8000 by default, and we want to talk to the search endpoint of the API that we created. So we can actually run this. We have a query, "text embeddings simply explained", and we can pass that into our API.

    Pushing Your Docker Image to Docker Hub

    Creating and Tagging Docker Images for Public Repositories

    We can see if we get a response. It took about 1 second, which is actually pretty long, but maybe if I run it again it'll be faster. Yeah, that first one was just slow; run it a second time and it's 76 milliseconds, and then we can see the search results. Taking a step back, the response in its raw form looks like this: it's just text in JSON format, which is basically a dictionary. We can use the json library to convert that text into a proper Python dictionary and then access its different fields.
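
    A minimal version of that notebook test might look like this sketch, assuming the API is at localhost:8000 and the response fields are named as in the sketch above:

```python
import json
import requests

url = "http://localhost:8000"
query = "text embeddings simply explained"

# send a GET request to the search endpoint with the query as a parameter
response = requests.get(f"{url}/search", params={"query": query})
print(response.text)  # raw response: JSON-formatted text

# convert the JSON text into a proper Python dictionary
result = json.loads(response.text)  # equivalent to response.json()
print(result["title"])     # titles of the top search results
print(result["video_id"])  # the corresponding article IDs
```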

    These are all the titles from the top five search results, and we can also look at the article IDs. Now we've confirmed the API is working locally. So, coming back to the slides, the next thing we want to do is create a Docker image for the API.

    The steps to make a Docker image from a FastAPI app are available in their documentation, and it's a few simple steps. We'll create an app directory (a folder called app), we'll create an empty __init__.py file, and then we'll create our main.py file. We've actually already done this: if we go back, we see that the app directory already exists, we have the main.py file, and we already have the __init__.py file. Now let's take one step out of that directory.

    We see that this app folder is in another folder with a few other files. We have this requirements.txt file, which is shown here. This is just your typical requirements file that you might have for any kind of Python code; you can see we have all the different libraries we used in the main.py file: fastapi, polars, sentence-transformers, scikit-learn, and numpy. We also have the Dockerfile, which is essentially the instructions for creating the Docker image.
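
    For reference, the overall project layout looks roughly like this (exact filenames assumed):

```
.
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── functions.py
│   └── data/
│       ├── video-index.parquet    (article metadata)
│       └── all-MiniLM-L6-v2/      (saved embedding model files)
├── Dockerfile
└── requirements.txt
```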

    How to Push Docker Images to Docker Hub for Easy Deployment 

    This consists of a few key steps. We start by importing a base image. There are hundreds of thousands of Docker images available on Docker Hub. The one we're importing here is the official Python image, version 3.10. We can see it on Docker Hub: it's an official image maintained by Docker, called python, and there are all these tags, which are different versions of the image. We used 3.10, which is going to be this one.

    The next thing we're going to do is change the working directory. Imagine you just installed Linux on a machine: the working directory starts at the root, and we change it to this folder called code.


    Next, we copy the requirements file into the Docker image: we take the requirements file from our local directory and put it into the image's code directory. Once we've moved the requirements file onto the image, we install all the requirements with this line. Then I have a line that adds code/app to the Python path. This might not be necessary because we actually changed the main.py file, so I'm going to try commenting it out and see if it still works.

    Next, we add the app directory to the image, moving it from our local machine to the code subdirectory on the Docker image. Finally, we define a command that will be run automatically whenever the container is spun up. To build the Docker image, we run docker build and specify the tag to give it a name; I'll call it yt-search-image-test.
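
    Assembled, the Dockerfile might look like the following sketch, based on FastAPI's Docker guide; the exact CMD (uvicorn here) and the commented-out Python path line are assumptions from the description above.

```dockerfile
# base image: official Python, version 3.10
FROM python:3.10

# set the working directory inside the image
WORKDIR /code

# copy the requirements file and install the dependencies
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

# optionally add the app folder to the Python path (commented out, as discussed)
# ENV PYTHONPATH "${PYTHONPATH}:/code/app"

# copy the application code into the image
COPY ./app /code/app

# command run automatically whenever the container is spun up
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
```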

    Deploying Your Dockerized API on AWS Elastic Container Service (ECS) 

    A Beginner’s Guide to AWS ECS: Fargate vs. EC2 Instances

    Oh! I forgot one thing: we have to specify where the Dockerfile is. It's in the current directory, so we add that, and now it's building the Docker image. Okay, now the image is done building. You can see it took about a minute to run; of the run times shown, the longest was installing all the Python libraries. Now, if we go over to the Docker Desktop app, we see this image under the images tab. I have a previous version of the image, but the one we just created is called yt-search-image-test, and we can actually run it.

    The way to do that: let me clear the terminal, then we do a docker run and specify the container name; I'll use yt-search-container-test. Then we specify the port. We want it to run at 8080; yep, that's what I put in the Dockerfile. Finally, we specify the image. Here's the command: docker run -d, then --name with the name of the Docker container, then the port mapping, and then the image. So we can run that. Oops, that's not the right image name.

    It's yt-search-image-test. Now the container is running locally. We can see that if we go to our image, where it says "in use"; this is the container using that image. Alternatively, we can go to the containers tab and see all the containers saved locally. Here we can see that the container stopped running, which means something went wrong, and from the logs we can see that the folder with the model in it couldn't be found. It didn't run because the model folder wasn't on the Python path.

    We could add the data subdirectory to the Python path, but alternatively, we can just go back to the main.py file and add app to these directory names. The reason we need this is that we defined the working directory as code.

    Step-by-Step: Deploying Your Docker Container on AWS ECS

    All the code is going to run relative to this directory. That means if the Python script is looking for this model path, you have to put app here, because it's running from code. Alternatively, you could add the Python path line in the Dockerfile, but I don't want to do that, to keep the Dockerfile simpler. Now let's try to run it again: we build the image (that was nice and quick), and then we run the container.
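
    Those two commands might look like this, with the image and container names as described (hyphenation assumed):

```bash
# build the image from the Dockerfile in the current directory
docker build -t yt-search-image-test .

# run a container in detached mode, mapping local port 8080 to the container's port 80
docker run -d --name yt-search-container-test -p 8080:80 yt-search-image-test
```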

    So now the container is running. Click this, and indeed it is running successfully. We can click on it and see that it's running at this URL, and we can test it if we go over to the Jupyter notebook. We tested the API locally; now let's test the Docker container running locally. This should be the correct path, yep, and it's the same thing.

    We have the URL and the endpoint name, we're going to use the search operation, we define a query, and then we make an API call. It ran slower because it's essentially talking to a different machine.

    However, we can see that the API response is the same. We can similarly call the info endpoint, or the base endpoint, and see that those generate different responses. Now that we've created the Docker image, the next step is to push the image to Docker Hub. The reason we want to do this is that once the image is on Docker Hub, it's easy to deploy to any cloud service that you like.

    Configuring Security Groups for Your AWS ECS Deployment

    How to Set Up Inbound Traffic Rules for Your API

    We'll be using AWS's Elastic Container Service, but Docker Hub integrates with other cloud providers too: not just AWS but also GCP, and I'm sure it also connects with Azure, even though that's not something I've checked.

    To push the image to Docker Hub, we need a repository. I already have one called yt-search, but let's go ahead and create a new one from scratch. We'll call this one yt-search-demo, with the description "demo of deploying semantic search for YouTube articles". We'll leave it as public and hit create. The repository doesn't have a category yet, so let's set one; I'll call it machine learning & AI. That's it; the repository is made.

    Now we can go back to our terminal and list the local Docker images. We can see all the images I have saved locally; what we want to do is push this one to Docker Hub. The first thing we need to do is tag the image so it matches the repository name on Docker Hub. Essentially, we're going to create a new image with the same name as the repository we just created. If we go back, we see that the repo is called shahin/yt-search-demo: shahin is my Docker Hub username, followed by the name of the repo we just created, yt-search-demo.

    Best Practices for Securing Your AWS ECS Instances

    The next thing we need to put is the name of the local image. Here, that's yt-search-image-test. I actually had it backward: we need to put the local image name first, so yt-search-image-test, and then the Docker Hub repo name. We've now created a new image, which we can see here, called shahin/yt-search-demo, and now we can just push it to Docker Hub.

    That's really easy: docker push shahin/yt-search-demo. We could add a tag to it as well, but that's not necessary, so it uses the default tag of latest. Now we can see that it's pushing up to Docker Hub. Once it's done running, if we go back to Docker Hub and hit refresh, we see the image is there. Now we've pushed the image to Docker Hub.
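
    The tag-and-push sequence is roughly this (the username and repository spellings follow the text above):

```bash
# re-tag the local image so it matches the Docker Hub repository name
docker tag yt-search-image-test shahin/yt-search-demo

# push it to Docker Hub (uses the default "latest" tag)
docker push shahin/yt-search-demo
```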

    The last step is to deploy a container on AWS using their Elastic Container Service. The way to do that is to go to our AWS account (if you don't have an AWS account, you'll need to make one for this tutorial) and then go to Elastic Container Service. We can just type in ECS and it should pop up. Once we do that, we'll come to a screen like this. I already have a cluster running, but let's start one from scratch.


    Testing and Interacting with Your Deployed API

    Using Gradio to Test Your Machine Learning API on AWS

    The first thing we can do is go over to task definitions and click this to create a new task definition. I'll call this one yt-search-demo. We'll scroll down to infrastructure requirements and select AWS Fargate as opposed to Amazon EC2 instances. The upside of Fargate is that you don't have to worry about managing the infrastructure yourself; that's all handled behind the scenes, and you can just worry about getting your container running and using it as a service.

    The next important thing is selecting the operating system and architecture. This will depend on the system you're running: for Mac, the architecture is ARM64, and Linux is the operating system of our image.

    Next, we can go to the task size. I'll leave it at one CPU, but I'll bump the memory down to 2 GB. Then there's the task role. You can leave this as none if this is your first time running it, and it'll automatically create a task execution role called ecsTaskExecutionRole; but since that already exists for me, I'll go ahead and select it. Now we specify the container details. I'll call this yt-search-container-demo, and here we put the URL of the image, which we grab from Docker Hub.

    I'll grab that and add the latest tag, and we'll leave it marked as an essential container. The container port number we'll leave at 80, and we'll leave all this other stuff as default. We won't add any environment variables, we won't add any environment files, and we'll leave the logging as the default.

    Comparing Performance: Local vs. Cloud-Deployed APIs

    There are a bunch of optional things we can set, like a health check, startup dependency ordering, container timeouts, and so on. We can also configure the storage: there's ephemeral storage, which is just short-term, and I'll leave it at the default of 21 GB. We can also add external storage using the add-volumes option, which is good if you want to talk to some external data source.

    There's also a monitoring tab and a tags tab, but I'm not going to touch any of that; I'm keeping it super simple here, and then I'm going to hit create. All right, the task definition has been successfully created; if we go back, we'll see the new task definition. Now we can go over to clusters and hit create cluster. I'll call this one yt-search-cluster-demo. Again, we'll use AWS Fargate for the infrastructure, and we won't touch the monitoring or the tags. Hit create, and now it's spinning up the cluster, which might take a bit. And now the cluster has been created.
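
    If you prefer the command line, here's a rough sketch of the AWS CLI equivalents of those console steps; the task-definition JSON filename is a hypothetical placeholder for a file describing the container settings above.

```bash
# create the Fargate cluster (console equivalent of the steps above)
aws ecs create-cluster --cluster-name yt-search-cluster-demo

# register the task definition from a JSON file describing the container
aws ecs register-task-definition --cli-input-json file://yt-search-demo-task.json
```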

    We can click on it and see that the cluster is running, but there's nothing running on it yet. We can create services or we can create tasks. Services are good for web services, kind of like this API.

    Creating a task is better for something that's more of a batch process that runs once or at a predictable time increment. Here, we'll create a service. To do that, we click services and then the create button. We're going to use the existing cluster, yt-search-cluster-demo, click on launch type, and leave it as Fargate with the latest version.

    Automating the Data Pipeline for Continuous Integration

    Why Automating Your Data Pipeline is Crucial for API Accuracy

    We'll make the application type a service, specify the family of the task definition, and give it a service name; I'll call it the YouTube search API demo. We'll leave the service type as replica and the desired tasks as one. Deployment options we'll leave as default, and deployment failure detection as default. We won't do Service Connect (I'm not sure what that is) or service discovery networking.

    We'll actually leave all the networking the same and use an existing security group. We can enable load balancing if we like, but I won't do that here. With service auto scaling, we can automatically increase or decrease the number of containers that are running, and we can configure that as well, but we're not going to touch any of it. We'll just hit create, and now it's deploying the search API.
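
    For reference, a hedged AWS CLI equivalent of the service setup might look like this; the subnet and security group IDs are placeholders, and the resource names are assumed from the walkthrough.

```bash
# create a service that keeps one copy of the task running on Fargate
aws ecs create-service \
  --cluster yt-search-cluster-demo \
  --service-name youtube-search-api-demo \
  --task-definition yt-search-demo \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-XXXX],securityGroups=[sg-XXXX],assignPublicIp=ENABLED}"
```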

    The API has been successfully deployed; it took about 5 minutes. Now, if we scroll down and click this YouTube search API demo service, something like this will pop up, and we can go over to tasks and click the task here. We can see that it has a public IP address, so we can copy this public IP, paste it into another piece of code, and make an API call.

    It took just 100 milliseconds to make the API call. It's actually faster making the API call to AWS than locally, which is pretty interesting. So this ran just fine. But one thing I had to do to get this working was go to the YouTube search API demo service, click on configuration and networking, and go down to security groups. That opens the VPC dashboard, where I had to add a rule allowing all inbound traffic from my IP address specifically.

    If you do that, you'll see some default security group. You hit edit inbound rules, and then you can add an additional rule that allows all inbound traffic from your IP. You can also specify custom IPs one by one, or allow any IPv4 or any IPv6 address. It wasn't working for me at first, but once I added this inbound rule, it worked just fine.
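
    The equivalent inbound rule can also be added with the AWS CLI; this is a sketch, with the security group ID and IP address as placeholders.

```bash
# allow inbound traffic on the container port from a single IP address
aws ec2 authorize-security-group-ingress \
  --group-id sg-XXXX \
  --protocol tcp \
  --port 80 \
  --cidr YOUR.IP.ADDRESS.HERE/32
```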

    Now that the API is deployed on AWS, it's a lot easier to integrate this search tool into a wide range of applications. To demonstrate that, I'm going to spin up a Gradio user interface that can talk to the API. I'll just run this whole thing; it's essentially the same as what I walked through in the previous article of the series, so if you're curious about the details, be sure to check that out. Now we can see that this user interface got spun up. We can search something like "full stack data science", and we see search results coming up. This is the great thing about running the core functionality on AWS: now we just have this lightweight front end that can interact with the API and return search results through a web interface. You can see the other articles in this series popping up in the search results. We can search other things, like fine-tuning language models (I had a typo, but it doesn't matter), and all the content on fine-tuning and large language models pops up.
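
    A minimal version of that front end might look like this sketch; the public IP placeholder and the JSON output format are assumptions, not the exact code from the previous article.

```python
import requests
import gradio as gr

API_URL = "http://<public-ip>"  # public IP of the ECS task

def search(query: str):
    # forward the query to the deployed search API and return its JSON response
    response = requests.get(f"{API_URL}/search", params={"query": query})
    return response.json()

# lightweight front end: a text box in, raw JSON search results out
demo = gr.Interface(fn=search, inputs="text", outputs="json")
demo.launch()
```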

    I'll just call out that all the code I walked through is freely available on GitHub. If you go to my YouTube blog repository and the full-stack data science subfolder, you'll see that all this code is available in the ML engineering folder, and you can check out the other articles in this series and all the Medium articles associated with it.

    This was supposed to be the last article of this series, but then I got a comment from Cool Worship 6704 on my article on building the data pipeline for this project, asking how you would automate this entire process. That's an excellent question, and it wasn't something I originally was going to cover, but since one reader had this question, I assume other people have it too. Just to recap what we did here: we took the search tool, wrapped it in an API, put that into a Docker container, and deployed it onto AWS.

    So now users and applications can interact with the search tool, but one limitation of how I coded things here is that the index of articles available through the search API is static; it's a snapshot from a couple of weeks ago, when I made the article on building data pipelines.

    Future Steps: Setting Up Automated Updates for Your API

    The obvious next step is to create another container service that automates the whole data pipeline on some sort of time cadence, whether that's every night, every week, or whatever it might be, and then feed the results of that process into the search API so that new articles get populated in the search tool. That's going to be the focus of the next article in this series.

    Conclusion: Enhancing Your Machine Learning Projects with Docker and AWS

    Next Steps: Exploring More Advanced Containerization and Deployment Techniques

    So that brings us to the end. This article was a lot more hands-on than a lot of my other content; I'm experimenting with a new format, so let me know what you think in the comment section below. If you want me to dig deeper into any of the tools or technologies discussed in this article, let me know and I can make follow-up articles on those topics. As always, thank you so much for your time, and thanks for reading.
