To deploy models with TensorFlow Serving, follow these steps:
TensorFlow Serving is typically installed via Docker for ease of use. Install Docker first, then pull the TensorFlow Serving image:
docker pull tensorflow/serving
Your model must be exported in the SavedModel format (a directory containing saved_model.pb and a variables/ folder). TensorFlow Serving additionally expects each version of the model to live in a numeric subdirectory (for example my_model/1/), so export directly into a versioned path. Example:
import numpy as np
import tensorflow as tf

# Example: train and export a minimal Keras model (illustrative architecture and data)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(3,))
])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(8, 3), np.random.rand(8, 1), epochs=1, verbose=0)

# Save into a numeric version subdirectory, as TensorFlow Serving expects.
# On Keras 3 / TF >= 2.16, use model.export("my_model/1") to write a SavedModel instead.
model.save("my_model/1")  # Saves in SavedModel format
The exported model will have a structure like:
my_model/
└── 1/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
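Optionally, you can inspect the exported signature with the saved_model_cli tool that ships with TensorFlow; this is also the easiest way to find the exact input tensor names your model expects (useful for the request examples below):

saved_model_cli show --dir my_model/1 --tag_set serve --signature_def serving_default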
Start the server using Docker, pointing it to your model directory:
docker run -p 8501:8501 \
--mount type=bind,source=$(pwd)/my_model,target=/models/my_model \
-e MODEL_NAME=my_model -t tensorflow/serving
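Note that this command publishes only the REST port; to also call the server over gRPC, publish the gRPC port as well, e.g. by adding -p 8500:8500.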
- 8501: REST API port (for HTTP requests).
- 8500: gRPC port (for high-performance RPC).
- MODEL_NAME: The name under which the model is served.

Send a JSON request to http://localhost:8501/v1/models/my_model:predict:
curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
-H "Content-Type: application/json" \
-X POST http://localhost:8501/v1/models/my_model:predict
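A successful call returns the model's outputs as JSON. For a single-output model like the sketch above, the response has roughly this shape (the actual numbers depend on the trained weights):

{"predictions": [[0.42]]}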
For lower latency, use a gRPC client (requires protobuf definitions).
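A minimal Python client might look like the sketch below; it assumes the tensorflow-serving-api package is installed, that port 8500 is published, and that the model's serving signature has a single input named dense_input (check the real name with saved_model_cli as shown earlier):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the gRPC port (8500 must be published by the container)
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"
request.model_spec.signature_name = "serving_default"
# The input key ("dense_input" here) must match your model's signature
request.inputs["dense_input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
)

response = stub.Predict(request, timeout=10.0)
print(response.outputs)  # output keys also depend on the model's signature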
TensorFlow Serving automatically handles model versioning. Place different versions in subdirectories:
models/
└── my_model/
    ├── 1/    (version 1)
    └── 2/    (version 2)
The server serves the latest version by default but can be configured to serve specific versions.
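For example, specific versions can be pinned with a model config file in TensorFlow Serving's text-proto config format. The sketch below assumes the directory layout shown above; it would be mounted into the container and passed to the server with --model_config_file=/models/models.config:

model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
      }
    }
  }
}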
For high availability and scalability, deploy TensorFlow Serving on Kubernetes or a cloud-based container service.
If a GPU is available, use the GPU image (tensorflow/serving:latest-gpu) for faster inference.

TensorFlow Serving provides low-latency, high-throughput, and version-controlled model deployment. For cloud-native deployments, Tencent Cloud's container and serverless solutions can help with scalability.