Serving

Q. What is Serving in the mlangles MLOps Platform?

A. Serving is the process of sending data to a deployed model to generate predictions.

Q. When can I start serving data to my model?

A. You can start once the model has been successfully deployed via the Model Hub.

Q. What are the ways to serve data to a deployed model?

A. There are two options:

  1. API Requests – Directly send data to the model’s unique API endpoint (a minimal request sketch follows this list).
  2. Serving Module – Use the platform’s interface to perform:
    1. Online Serving – For single data points.
    2. Batch Serving – For large-scale datasets on a scheduled basis.
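
For the API Requests option, the call is an ordinary HTTP POST to the deployed model’s endpoint. The sketch below is a minimal illustration only; the host, path, authentication header, and feature names are assumptions rather than the platform’s documented API, so substitute the values shown for your model in the Model Hub.

```python
import requests

# Hypothetical endpoint, token, and payload; replace these with the values
# shown for your deployed model in the Model Hub.
ENDPOINT = "https://<your-mlangles-host>/models/<model-id>/predict"
HEADERS = {
    "Authorization": "Bearer <api-token>",
    "Content-Type": "application/json",
}

record = {"feature_1": 3.2, "feature_2": "red", "feature_3": 17}

response = requests.post(ENDPOINT, json=record, headers=HEADERS, timeout=30)
response.raise_for_status()
print(response.json())  # prediction returned by the deployed model
```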

Q. When should I use Online Serving?

A. Use Online Serving when you need instant predictions for individual data points, such as interactive applications or real-time decision-making systems.

Q. When should I use Batch Serving?

A. Use Batch Serving when working with large datasets that need to be processed in bulk, such as periodic data uploads, scheduled scoring jobs, or offline analytics.

Q. How do I perform Online Serving?

A. The steps are:

  1. Select the Project that contains the deployed model.
  2. Choose the specific Deployed Model.
  3. Enter the required input data in the prompted fields.
  4. Click Submit.
  5. View the prediction directly on the platform.

Q. What are common use cases for Online Serving?

A. Use it for:

  • Quick validation and sanity checks.
  • Testing the model’s behavior with new, unseen data before scaling.

Q. How do I create a new Batch Serving job?

A. The steps for creating a new Batch Serving job are as follows (an illustrative configuration sketch follows the list):

  1. Click New Job.
  2. Provide the following details:
    1. Job Name – Unique identifier.
    2. Project – Project with the deployed model.
    3. Data Source – Storage location of raw input data.
    4. Pipeline – The pipeline originally used for training the model.
    5. Pipeline Steps (Optional) – Transformations to apply before prediction.
    6. Model – Deployed model to use.
    7. Instance – Compute instance for workload execution.
    8. Schedule Time – Frequency (once, daily, weekly).
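
The fields above map naturally onto a job specification. The sketch below is a hedged illustration of what such a configuration could look like if written out programmatically; the key names and example values are assumptions, not the platform’s actual schema.

```python
# Hypothetical batch-serving job specification; key names and values are
# illustrative assumptions, not the platform's actual schema.
batch_job = {
    "job_name": "churn-scoring-weekly",      # unique identifier
    "project": "customer-churn",             # project containing the deployed model
    "data_source": "s3://raw-data/churn/",   # storage location of raw input data
    "pipeline": "churn-training-pipeline",   # pipeline originally used for training
    "pipeline_steps": ["impute", "scale"],   # optional pre-prediction transformations
    "model": "churn-xgb-v3",                 # deployed model to use
    "instance": "cpu-large",                 # compute instance for workload execution
    "schedule": "weekly",                    # frequency: once, daily, or weekly
}
```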

Q. What happens after creating a Batch Serving job?

A. The platform automatically executes the job at the scheduled time, processes the dataset, and stores prediction outputs for review or downstream applications.
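
Because the outputs are stored, they can be picked up by downstream tooling once the job finishes. A minimal sketch, assuming the job writes its predictions to a CSV file in the configured storage location (the path and column name here are hypothetical):

```python
import pandas as pd

# Hypothetical output location and schema; point this at wherever your
# batch job actually writes its predictions.
predictions = pd.read_csv("s3://predictions/churn-scoring-weekly/latest.csv")

print(predictions.head())                        # inspect a few scored rows
print(predictions["prediction"].value_counts())  # simple summary of predicted labels
```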