Building ResponsesAgent-Based LLM Agents on MLflow
In this guide, we’ll walk through how ResponsesAgent offers a clean and typed development interface and why packaging agents as MLflow models unlocks significant governance and deployment benefits.
As we mentioned in a previous article in this series, the next wave of GenAI applications will move analysis out of ChatGPT and Claude Code and embed it directly into applications, reports, and dashboards.
By the end of this article, you'll understand how to quickly prototype a new agent that is ready for deployment on MLflow 3.6.
What is ResponsesAgent?
ResponsesAgent is MLflow’s native interface for building, packaging, and operating enterprise-grade LLM agents. Organizations adopt the ResponsesAgent interface primarily for three reasons:
- Standardization through MLflow’s “models from code” packaging, which captures instructions, tools, and environments for fully reproducible deployments.
- Operational reliability via model versioning, registry workflows, access controls, observability, and trace-level telemetry.
- Application-grade integration enabling agents to be invoked through REST, batch jobs, workflow engines, dashboards, or decision-intelligence applications.
By placing generative agents inside the governed MLflow ecosystem—including Unity Catalog—enterprises gain predictable lineage, environment management, permissions, and deployment guarantees that standalone LLM frameworks do not provide.
Running agents through MLflow also unlocks structural advantages for quality, governance, and cost control, with automatic capture of traces, tool calls, intermediate reasoning, token usage, and deployment metadata. The ResponsesAgent integrates seamlessly with MLflow’s operational capabilities:
- Experiment tracking and artifact logging
- Reproducible experimentation workflows
- Model Registry for versioning, governance, and lifecycle management
- Deployment to real-time serving environments
Together, these capabilities allow enterprises to register, version, compare, and roll back agent implementations with the same rigor used for classical ML models, accelerating the transformation of raw data and unstructured context into reliable, high-quality decisions at scale.
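The trace capture mentioned above requires almost no ceremony. Here is a minimal sketch, assuming the agent calls OpenAI internally; the experiment name and the retrieve_context helper are illustrative, not part of any required setup:

import mlflow

# Send traces to a named experiment (name is illustrative).
mlflow.set_experiment("responses-agent-demo")

# Automatically capture each OpenAI call the agent makes, including
# inputs, outputs, latency, and token usage, as MLflow traces.
mlflow.openai.autolog()

# Custom steps can be traced explicitly with the decorator.
@mlflow.trace
def retrieve_context(query: str) -> str:
    # Placeholder retrieval step; a real agent would query a store here.
    return f"context for: {query}"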
ResponsesAgent as a Wrapper for Agent Frameworks
ResponsesAgent acts as a unifying wrapper that cleanly embeds agents built with LangChain, DSPy, custom Python classes, or other tool-orchestration frameworks into the MLflow runtime. Rather than rewriting agent logic, developers encapsulate their code within the ResponsesAgent class, allowing MLflow to standardize the agent’s inputs, outputs, tool-call semantics, and conversational behavior.
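As a sketch of what that encapsulation looks like, the class below delegates to a pre-built framework agent. WrappedFrameworkAgent, the chain attribute, and chain.invoke() are illustrative stand-ins for whatever your framework actually exposes:

from typing import Any

from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse

class WrappedFrameworkAgent(ResponsesAgent):
    """Sketch of a wrapper; `chain` stands in for any LangChain,
    DSPy, or custom-Python agent you have already built."""

    def __init__(self, chain: Any):
        self.chain = chain  # framework-specific agent, constructed elsewhere

    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        # Convert the typed request items into plain dicts that the
        # wrapped framework can consume.
        messages = [item.model_dump() for item in request.input]
        # `invoke` is a placeholder for your framework's entry point;
        # the real call and its signature vary by framework.
        answer = self.chain.invoke(messages)
        return ResponsesAgentResponse(
            output=[self.create_text_output_item(text=str(answer), id="msg_1")]
        )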
ResponsesAgent, MLflow, and Databricks
Adopting this wrapper also lets you deploy to the Databricks cloud platform, which we'll cover in a future article.
The ResponsesAgent wrapper ensures agents remain compatible with Databricks AI Playground, Mosaic AI Model Serving, and MLflow’s evaluation and tracing infrastructure without altering the underlying framework-specific implementation.
Additionally, the ResponsesAgent wrapper allows a custom-code agent deployed on Databricks to participate in the platform's multi-agent supervisor system.
This abstraction also enables teams to build multi-agent systems with heterogeneous frameworks while maintaining a uniform operational surface area. Every agent—regardless of how it was built—exposes a consistent schema, supports OpenAI-compatible request formatting, and inherits MLflow’s packaging and governance capabilities.
ResponsesAgent compatibility with the OpenAI Responses API enables drop-in integration with existing applications that already use OpenAI endpoints. Combined with MLflow’s experiment management, auditability, and deployment integrations, these features provide a reliable way to evolve prototypes into production-grade conversational agents. Key capabilities include:
- Structured message and chat history handling
- Tool and function calling with intermediate outputs
- Multi-agent orchestration support
- OpenAI API-compatible input/output schema
- Token usage and performance tracking
- Full MLflow tracking, versioning, and serving capabilities
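To make that input/output schema concrete, here is what a Responses-style request and response look like as plain Python dicts. The values are illustrative, and the output-item fields shown are a rough rendering of the OpenAI Responses format:

# Request: the same shape works for a local pyfunc `predict` call
# and for a served MLflow endpoint.
request_payload = {
    "input": [
        {"role": "system", "content": "You are a concise analyst."},
        {"role": "user", "content": "Summarize last quarter's churn."},
    ],
}

# Response: a list of output items, mirroring the OpenAI Responses API.
# A text reply produced by create_text_output_item looks roughly like:
# {
#     "output": [
#         {
#             "type": "message",
#             "id": "msg_1",
#             "role": "assistant",
#             "status": "completed",
#             "content": [{"type": "output_text", "text": "..."}],
#         }
#     ]
# }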
As a result, developers focus on business logic, tool integration, and domain-specific reasoning, while MLflow provides the reliability, reproducibility, and controls needed to safely operate LLM agents inside enterprise environments.
Building a Simple ResponsesAgent
The simplest possible MLflow 3.6 ResponsesAgent implements just two methods: load_context() and predict(). (Strictly, only predict() is required; load_context() is inherited as a no-op from the base class, but implementing both makes the contract explicit.)
Together, these two methods form the minimal contract that MLflow expects when packaging and deploying a ResponsesAgent. We start with the first method, load_context():
load_context(self, context)
This method is executed once when the agent is loaded by the MLflow model server. It receives a ModelContext containing artifacts, configuration files, and metadata. In the simplest implementation, it performs no operation.
def load_context(self, context):
pass # no-op for the minimal agent
Then we have the main "workhorse" method of the class, predict():
predict(self, model_input)
This method is invoked each time the deployed agent endpoint receives a request. It must return a response matching MLflow’s Responses API format (OpenAI-style semantics). Even a static literal response is valid for the simplest implementation.
def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
return ResponsesAgentResponse(
output=[
# "id" can be any stable string you choose for this output item
self.create_text_output_item(text="hello world", id="msg_1"),
]
)
Example: The Minimal ResponsesAgent
A valid MLflow ResponsesAgent requires only these two methods:
import mlflow
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
ResponsesAgentRequest,
ResponsesAgentResponse
)
class MyMinimalAgent(ResponsesAgent):
def load_context(self, context):
pass
def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
return ResponsesAgentResponse(
output=[
# "id" can be any stable string you choose for this output item
self.create_text_output_item(text="hello world", id="msg_1"),
]
)
With only these methods implemented, the agent can be logged using MLflow's models-from-code workflow (for example, via mlflow.pyfunc.log_model(...)), registered in the Model Registry, deployed to the MLflow model server, and invoked via REST, batch processes, workflow engines, or any application that integrates with MLflow models.
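As a sketch of that flow, assume the class above lives in a file named agent.py that ends with mlflow.models.set_model(MyMinimalAgent()), per the models-from-code convention; the artifact and registry names below are illustrative:

import mlflow

# Log the agent from code and (optionally) register it in one step.
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        name="minimal_agent",                    # artifact name; illustrative
        python_model="agent.py",                 # file that calls set_model(...)
        registered_model_name="minimal_agent",   # optional registry entry
    )

# Reload the packaged agent and invoke it with a Responses-style payload.
loaded = mlflow.pyfunc.load_model(model_info.model_uri)
print(loaded.predict({"input": [{"role": "user", "content": "hello"}]}))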
ResponsesAgent and Compatibility
The ResponsesAgent introduces a powerful and standardized approach to orchestrating LLM-driven agents with structured inputs and outputs. It does this by processing well-defined interaction data that aligns with modern chat completion standards:
- Messages: Handles full conversation context with system, user, and assistant roles
- Tool Calls: Supports structured function execution with validated parameters
- Usage Tracking: Monitors token consumption and runtime performance
- Metadata: Captures configuration details and contextual state for observability
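Inside a predict() implementation, the ResponsesAgent helper methods express these structures directly. The sketch below shows a single tool-calling turn; the tool name, ids, and JSON payloads are illustrative, and the exact helper signatures are an assumption worth verifying against the MLflow docs:

# Inside a ResponsesAgent subclass: one tool-calling turn as output items.
output = [
    # The model decides to call a tool...
    self.create_function_call_item(
        id="fc_1",
        call_id="call_1",
        name="lookup_churn",          # hypothetical tool name
        arguments='{"quarter": "Q3"}',
    ),
    # ...the tool's result is recorded as an output item...
    self.create_function_call_output_item(
        call_id="call_1",
        output='{"churn_rate": 0.042}',
    ),
    # ...and the final assistant message closes the turn.
    self.create_text_output_item(text="Q3 churn was 4.2%.", id="msg_1"),
]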
ResponsesAgent is also fully compatible with OpenAI's Responses API, ensuring seamless integration with existing applications and tools built against OpenAI endpoints. This includes aligned request formats, response schemas, and behavior expectations.
OpenAI's Responses API
OpenAI's Responses API is the successor to the Chat Completions API: a hosted interface that lets applications send structured input items (system, user, and assistant messages, plus tool calls and their results) to a large language model and receive structured output items in return. It is the API behind most modern conversational and reasoning-style interactions with OpenAI models, and developers use it for instruction following, multi-turn dialogue, tool calling, and structured output generation. In short, it is the primary interface for invoking OpenAI models in applications, and the format that ResponsesAgent mirrors.
Up Next: Deploy to MLflow
LLMs alone do not deliver enterprise impact.
The combination of prompt logic, LLMs, enterprise data models, and integrated workflow logic is what converts enterprise information into decisions.
By building agents with ResponsesAgent and registering them in MLflow, organizations gain the reliability, structure, safety, and scalability required for real production workloads. Agents are quickly becoming core business infrastructure, and MLflow is the platform that makes them dependable.
In the next article we'll take a look at how to package and register a ResponsesAgent for deployment on MLflow.
Next in Series
Deploying Agents to MLflow
This article explains how to turn a Python-based MLflow ResponsesAgent into a production-ready, versioned intelligence service by packaging it as a standardized model artifact, registering it in the MLflow Model Registry, and serving it reliably across environments.