To deploy models with TensorFlow Serving, follow these steps:
TensorFlow Serving is typically installed via Docker for ease of use. Install Docker first, then pull the TensorFlow Serving image:
docker pull tensorflow/serving
Your model must be exported in the SavedModel format (a directory containing saved_model.pb and a variables/ folder). TensorFlow Serving additionally expects each version of the model to live in a numeric subdirectory (for example my_model/1/), so export directly into a versioned path. Example:
import numpy as np
import tensorflow as tf

# Example: train and export a minimal Keras model (illustrative architecture and data)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(3,))
])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(8, 3), np.random.rand(8, 1), epochs=1, verbose=0)

# Save into a numeric version subdirectory, as TensorFlow Serving expects.
# On Keras 3 / TF >= 2.16, use model.export("my_model/1") to write a SavedModel instead.
model.save("my_model/1")  # Saves in SavedModel format
The exported model will have a structure like:
my_model/
└── 1/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
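Optionally, you can inspect the exported signature with the saved_model_cli tool that ships with TensorFlow; this is also the easiest way to find the exact input tensor names your model expects (useful for the request examples below):

saved_model_cli show --dir my_model/1 --tag_set serve --signature_def serving_default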
Start the server using Docker, pointing it to your model directory:
docker run -p 8501:8501 \
--mount type=bind,source=$(pwd)/my_model,target=/models/my_model \
-e MODEL_NAME=my_model -t tensorflow/serving
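Note that this command publishes only the REST port; to also call the server over gRPC, publish the gRPC port as well, e.g. by adding -p 8500:8500.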
- 8501: REST API port (for HTTP requests).
- 8500: gRPC port (for high-performance RPC).
- MODEL_NAME: The name under which the model is served.

Send a JSON request to http://localhost:8501/v1/models/my_model:predict:
curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
-H "Content-Type: application/json" \
-X POST http://localhost:8501/v1/models/my_model:predict
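A successful call returns the model's outputs as JSON. For a single-output model like the sketch above, the response has roughly this shape (the actual numbers depend on the trained weights):

{"predictions": [[0.42]]}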
For lower latency, use a gRPC client (requires protobuf definitions).
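A minimal Python client might look like the sketch below; it assumes the tensorflow-serving-api package is installed, that port 8500 is published, and that the model's serving signature has a single input named dense_input (check the real name with saved_model_cli as shown earlier):

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to the gRPC port (8500 must be published by the container)
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"
request.model_spec.signature_name = "serving_default"
# The input key ("dense_input" here) must match your model's signature
request.inputs["dense_input"].CopyFrom(
    tf.make_tensor_proto([[1.0, 2.0, 3.0]], dtype=tf.float32)
)

response = stub.Predict(request, timeout=10.0)
print(response.outputs)  # output keys also depend on the model's signature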
TensorFlow Serving automatically handles model versioning. Place different versions in subdirectories:
models/
└── my_model/
    ├── 1/    (version 1)
    └── 2/    (version 2)
The server serves the latest version by default but can be configured to serve specific versions.
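For example, specific versions can be pinned with a model config file in TensorFlow Serving's text-proto config format. The sketch below assumes the directory layout shown above; it would be mounted into the container and passed to the server with --model_config_file=/models/models.config:

model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
      }
    }
  }
}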
For high availability and scalability, deploy TensorFlow Serving on Kubernetes or a cloud-based container service.
If a GPU is available, use the GPU image (tensorflow/serving:latest-gpu) for faster inference.

TensorFlow Serving provides low-latency, high-throughput, and version-controlled model deployment. For cloud-native deployments, Tencent Cloud's container and serverless solutions can help with scalability.