Integrations: FastAPI for Fast SaaS Application Deployments

This blog explains why PerceptiLabs added the FastAPI deployment target and suggests ways you might build on its output.

Many of you from our developer community have asked how to deploy and use your model once it has been exported from PerceptiLabs. So, we added a FastAPI deployment target to PerceptiLabs' Deploy View back in the 0.13 release. This feature not only deploys your TensorFlow model, but also generates a fully-working app that you can use to verify and run your model for real-world inference within minutes.

We chose FastAPI as the foundation because it's a high-performance web framework for building RESTful APIs, not to mention the third most-loved web framework of 2021 [1]. Let's take a closer look at what this gives you.

Why FastAPI?

The FastAPI deployment target exports a TensorFlow model and generates a Python server app built on FastAPI, along with an example Python client app that communicates with that server. We chose this architecture because it demonstrates how you might host a model via a server for inference, receive data from a remote client app, and then return predictions back to that client app – a typical pattern found in today's SaaS DL solutions. While any user of PerceptiLabs can easily deploy to this target with the click of a button, the code generated by PerceptiLabs will require some programming experience to understand and customize.

How the FastAPI Deployment Target Works in PerceptiLabs

The FastAPI server hosts the model and exposes a /predict endpoint that client apps can invoke to run inference. When you run the server, it also serves interactive documentation for the API that you can view in a browser, as shown in Figure 1:

Figure 1: FastAPI GUI Generated by PerceptiLabs.
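
To make the server side concrete, here's a minimal sketch of what a FastAPI app hosting a TensorFlow model behind a /predict endpoint can look like. The model path, request schema, and yes/no label decoding are illustrative assumptions; the code PerceptiLabs generates will differ in its details.

# Minimal sketch of a FastAPI server hosting a TensorFlow model.
# The model path, request schema, and label decoding are assumptions
# for illustration -- the generated code will differ in its details.
from typing import List

import numpy as np
import tensorflow as tf
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = tf.keras.models.load_model("exported_model")  # hypothetical path

class PredictRequest(BaseModel):
    data: List[List[float]]  # one inner list of raw feature values per sample

@app.post("/predict")
def predict(request: PredictRequest):
    # Run the whole batch through the model and decode each output to a label.
    batch = np.array(request.data, dtype=np.float32)
    probabilities = model.predict(batch)
    labels = ["yes" if p[0] > 0.5 else "no" for p in probabilities]
    return {"labels": labels}

Running it with uvicorn (e.g., uvicorn main:app, assuming the file is named main.py) serves the /predict endpoint along with FastAPI's automatically generated interactive documentation at /docs.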

The client app loads a CSV file (also generated by the deployment process) that lists a series of data files. The client then loads the raw data from those files and sends it to the /predict endpoint in the request payload. The server feeds that data through the TensorFlow model it's hosting and returns a JSON response containing a prediction for each data item:

{"labels": ["yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "no"]}
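
As a rough illustration of that client-side flow, the sketch below reads a CSV of file paths, loads each listed image, and posts the batch to /predict. The CSV filename, the image_path column name, and the server URL are assumptions for illustration, not the exact names used in the generated code.

# Sketch of a client that loads data files listed in a CSV and posts them
# to the /predict endpoint. File names, the CSV column name, and the URL
# are assumptions for illustration.
import csv

import numpy as np
import requests
from PIL import Image

SERVER_URL = "http://127.0.0.1:8000/predict"  # local FastAPI server

samples = []
with open("data.csv", newline="") as csv_file:  # CSV produced by the deployment
    for row in csv.DictReader(csv_file):
        # Load each listed image and flatten it into a list of pixel values.
        image = np.asarray(Image.open(row["image_path"]), dtype=np.float32)
        samples.append(image.flatten().tolist())

response = requests.post(SERVER_URL, json={"data": samples})
print(response.json())  # e.g. {"labels": ["yes", "no", ...]}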

Note that after you've deployed this target once, you may want to use the TensorFlow or TensorFlow Lite targets for subsequent exports of your model in PerceptiLabs, since you probably don't want to keep regenerating the server and app code. You can then overwrite the previous model in your FastAPI folder with your newly-generated model files.

FastAPI Use Cases

The server and client apps generated by the FastAPI deployment target provide a good foundation for use cases involving real-world inference, such as:

  • Your own cloud-based SaaS solution: The sample server's ability to host a model and encapsulate requests for predictions in a simple REST API can be the basis for your own cloud-based SaaS solution. For example, you could transform the generated code into a cloud service that performs inference on data uploaded by devices. In addition to returning predictions to those edge devices, your server code could also perform analytics, such as detecting the quality of uploaded images, or record simple metrics, like how many times your endpoints are called in a given time frame. Upon receiving the REST response with predictions from the cloud, a device could then perform additional actions, such as activating another device (e.g., an actuator) or invoking other endpoints that you've added to your service (e.g., to trigger an alert).
  • Call out to a cloud service for inference: Similarly, the client app is useful for showing how to call out to a cloud service for inference. For example, you could modify the app to run on an edge device that collects camera or video data. That app could gather multiple images or frames, package them into a payload, and include that payload when invoking the endpoint for cloud-based inference. The predictions can then be analyzed on the server, which, in turn, sends out alerts and/or provides dashboards for insights.
  • Capture timing metrics: Another interesting enhancement is to add code to the server and client app to capture timing metrics (see the sketch after this list). For example, you could wrap the inference call on the server, as well as the POST request in the client, with system calls that record the time immediately before and after each invocation. You could then calculate the timing deltas to measure your inference speed and round-trip request/response time.
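
As a rough sketch of that last idea, the snippet below wraps the client's POST request with time.perf_counter() calls; the same before/after pattern can wrap the model.predict() call on the server. The URL and the placeholder payload are assumptions for illustration.

# Sketch of capturing round-trip timing on the client side.
import time

import requests

payload = {"data": [[0.0, 0.0, 0.0, 0.0]]}  # placeholder sample

start = time.perf_counter()
response = requests.post("http://127.0.0.1:8000/predict", json=payload)
elapsed = time.perf_counter() - start

print(f"Round-trip time: {elapsed * 1000:.1f} ms")
print(response.json())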

FastAPI Learning Resources

For more information, be sure to check out the FastAPI Deployment Guide in PerceptiLabs' documentation. It provides additional details on what's exported and how to run the server and app.

Give it a try, and let us know how it worked for you.

And if you don't already have our free version of PerceptiLabs, be sure to check out our Quickstart Guide.

[1] https://en.wikipedia.org/wiki/FastAPI