## Request Configuration
The `responses.Request` struct allows you to fine-tune your LLM calls with several parameters (see the sketch after the table):
| Parameter | Type | Description |
|---|---|---|
| `Instructions` | `*string` | System-level instructions (system prompt). |
| `Input` | `InputUnion` | The user prompt (a string or a list of messages). |
| `Tools` | `[]ToolUnion` | Definitions for function calling. |
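Here is a minimal sketch of building a request. `NewInputFromString` is a hypothetical constructor for `InputUnion`; check the package for the actual helper.

```go
// A minimal request sketch. NewInputFromString is hypothetical; the
// package's real InputUnion constructor may be named differently.
instructions := "You are a concise technical assistant."
req := responses.Request{
    Instructions: &instructions,
    Input:        responses.NewInputFromString("Summarize this changelog entry."),
}
```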
### Parameters
The `Parameters` struct contains additional configuration options (a sketch follows the table):
| Parameter | Type | Description |
|---|---|---|
| `Temperature` | `*float64` | Sampling temperature (0.0 to 2.0). Higher values make output more random. |
| `MaxOutputTokens` | `*int` | Maximum number of tokens to generate in the response. |
| `TopP` | `*float64` | Nucleus sampling parameter (0.0 to 1.0); the model samples from the smallest set of tokens whose cumulative probability exceeds this value. |
| `TopLogprobs` | `*int64` | Number of most likely tokens to return with their log probabilities. |
| `Text` | `*TextFormat` | Structured output format configuration (JSON schema). See Text Format (Structured Output). |
| `Stream` | `*bool` | Enable streaming responses. When true, responses are returned incrementally. |
| `Reasoning` | `*ReasoningParam` | Reasoning configuration for models that support chain-of-thought reasoning. See Reasoning Parameters. |
| `Include` | `[]string` | Additional data to include in the response (e.g., `"reasoning.encrypted_content"`). |
| `Metadata` | `map[string]string` | Custom metadata to attach to the request. |
| `MaxToolCalls` | `*int` | Maximum number of tool calls allowed in a single response. |
| `ParallelToolCalls` | `*bool` | Allow parallel execution of multiple tool calls. |
| `Store` | `*bool` | Whether to store the request and response. |
| `Background` | `*bool` | If true, the request is processed in the background rather than returned immediately. |
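A minimal sketch of filling in `Parameters`. The generic `ptr` helper is not part of the package; it is defined here only to take the address of a literal.

```go
// ptr is a local helper (not part of the package) for taking the
// address of a literal value, since most Parameters fields are pointers.
func ptr[T any](v T) *T { return &v }

func buildParams() responses.Parameters {
    return responses.Parameters{
        Temperature:     ptr(0.2),   // favor deterministic output
        MaxOutputTokens: ptr(1024),  // cap the response length
        Stream:          ptr(false), // ask for a complete, non-streaming response
    }
}
```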
### Reasoning Parameters
The `ReasoningParam` struct configures reasoning behavior (a short example follows the table):
| Parameter | Type | Description |
|---|---|---|
| `Summary` | `*string` | Reasoning summary level: `"auto"`, `"concise"`, or `"detailed"`. |
| `Effort` | `*string` | Reasoning effort level: `"none"`, `"minimal"`, `"low"`, `"medium"`, `"high"`, or `"xhigh"`. |
| `BudgetTokens` | `*int` | Maximum tokens to allocate for reasoning steps. Not used for OpenAI. |
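Continuing the sketch above (reusing the local `ptr` helper):

```go
// Request medium reasoning effort with an automatic summary.
// BudgetTokens is shown for providers that honor it (not OpenAI).
reasoning := responses.ReasoningParam{
    Effort:       ptr("medium"),
    Summary:      ptr("auto"),
    BudgetTokens: ptr(2048),
}
```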
### Text Format (Structured Output)
The `TextFormat` struct enables structured output using a JSON schema; a hedged example follows.
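The exact fields of `TextFormat` are not documented here, so how the schema is attached to it is an assumption; the schema itself is ordinary JSON Schema expressed as a Go map.

```go
// A JSON schema the model's output must conform to. How this map is
// wired into TextFormat is an assumption; consult the struct's fields.
schema := map[string]any{
    "type": "object",
    "properties": map[string]any{
        "city":       map[string]any{"type": "string"},
        "population": map[string]any{"type": "integer"},
    },
    "required": []string{"city", "population"},
}
```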
## Response Format
The Responses API returns data in two formats depending on whether streaming is enabled.

### Non-Streaming Response
When `Stream` is false or unset, the API returns a complete `Response` object:
| Field | Type | Description |
|---|---|---|
| `ID` | `string` | Unique identifier for the response. |
| `Model` | `string` | The model that generated the response. |
| `Output` | `[]OutputMessageUnion` | Array of output messages. Can contain text messages, function calls, reasoning, image generation calls, or web search calls. |
| `Usage` | `*Usage` | Token usage statistics for the request. |
| `Error` | `*Error` | Error information if the request failed. |
| `ServiceTier` | `string` | The service tier used for this request. |
| `Metadata` | `map[string]interface{}` | Custom metadata attached to the response. |
#### Output Message Types
The `Output` field contains an array of `OutputMessageUnion` values, each of which is one of the following types (see the dispatch sketch after the list):
- `OutputMessage`: Standard text message with content
  - `ID`: Unique message identifier
  - `Type`: Always `"message"`
  - `Role`: Message role (`"user"`, `"system"`, or `"developer"`)
  - `Content`: Array of content parts (typically text)
- `FunctionCallMessage`: Function/tool call from the model
  - `Type`: Always `"function_call"`
  - `ID`: Unique function call identifier
  - `CallID`: Call identifier for tracking
  - `Name`: Name of the function to call
  - `Arguments`: JSON string containing function arguments
- `ReasoningMessage`: Reasoning content from models that support chain-of-thought
  - `Type`: Always `"reasoning"`
  - `ID`: Unique reasoning identifier
  - `Summary`: Array of summary text content
  - `EncryptedContent`: Optional encrypted reasoning content (when requested via `Include`)
- `ImageGenerationCallMessage`: Image generation request
  - `Type`: Always `"image_generation_call"`
  - `ID`: Unique image generation identifier
  - `Status`: Generation status
  - `Result`: Base64-encoded image data
- `WebSearchCallMessage`: Web search request
  - `Type`: Always `"web_search_call"`
  - `ID`: Unique web search identifier
  - `Action`: Search action details
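A hedged sketch of dispatching on these types. It assumes `OutputMessageUnion` is an interface satisfied by the concrete message types; if the union is instead a struct with a `Type` discriminator, switch on that field.

```go
// Walk the output items and handle each concrete message type.
// The type switch assumes OutputMessageUnion is an interface.
for _, item := range resp.Output {
    switch msg := item.(type) {
    case responses.OutputMessage:
        fmt.Println("text message:", msg.Content)
    case responses.FunctionCallMessage:
        fmt.Printf("call %s with args %s\n", msg.Name, msg.Arguments)
    case responses.ReasoningMessage:
        fmt.Println("reasoning summary:", msg.Summary)
    }
}
```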
#### Usage Information
The `Usage` object provides token consumption details (a logging example follows the table):
| Field | Description |
|---|---|
| `InputTokens` | Total number of input tokens processed. |
| `InputTokensDetails.CachedTokens` | Number of cached tokens (if caching is enabled). |
| `OutputTokens` | Total number of tokens generated in the response. |
| `OutputTokensDetails.ReasoningTokens` | Number of tokens used for reasoning (if applicable). |
| `TotalTokens` | Sum of input and output tokens. |
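For example, a simple accounting log (field names as in the table above; `Usage` is a pointer, so check it first):

```go
// Log token consumption after a successful call.
if resp.Usage != nil {
    log.Printf("tokens: input=%d output=%d total=%d",
        resp.Usage.InputTokens,
        resp.Usage.OutputTokens,
        resp.Usage.TotalTokens)
}
```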
#### Error Handling
If the request fails, the `Error` field is populated with details about the failure.
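A minimal guard, assuming only what the table above documents (`Error` is a pointer that is nil on success):

```go
// Check for a failed request before reading the output.
if resp.Error != nil {
    // Inspect resp.Error for the failure details; its exact fields
    // are not enumerated here.
    return fmt.Errorf("responses API call failed: %v", resp.Error)
}
```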
### Streaming Response
When `Stream` is true, the API returns a stream of `ResponseChunk` objects via Server-Sent Events (SSE). Each chunk represents a part of the response as it is generated.
#### Chunk Types
The `ResponseChunk` union type can contain various chunk types that indicate different stages of the response:
Response Lifecycle Chunks:
- `response.created`: Initial response object created
- `response.in_progress`: Response generation in progress
- `response.completed`: Response generation completed

Output Item Chunks:

- `output_item.added`: A new output item (message, function call, etc.) was added
- `output_item.done`: An output item is complete

Content and Text Chunks:

- `content_part.added`: A new content part was added to a message
- `content_part.done`: A content part is complete
- `output_text.delta`: Incremental text delta (a new text fragment)
- `output_text.annotation.added`: A text annotation was added
- `output_text.done`: Text generation is complete (includes the full accumulated text)

Function Call Chunks:

- `function_call.arguments.delta`: Incremental function call arguments
- `function_call.arguments.done`: Function call arguments are complete

Reasoning Chunks:

- `reasoning_summary_part.added`: A new reasoning summary part was added
- `reasoning_summary_part.done`: A reasoning summary part is complete
- `reasoning_summary_text.delta`: Incremental reasoning summary text
- `reasoning_summary_text.done`: Reasoning summary text is complete

Image Generation Chunks:

- `image_generation_call.in_progress`: Image generation started
- `image_generation_call.generating`: Image is being generated
- `image_generation_call.partial_image`: Partial image data available

Web Search Chunks:

- `web_search_call.in_progress`: Web search started
- `web_search_call.searching`: Search in progress
- `web_search_call.completed`: Search completed
#### Streaming Example
When streaming, chunks are delivered in this general order:

1. `response.created`: Response object initialized
2. `output_item.added`: First output item (e.g., a message) added
3. `content_part.added`: Content part added to the message
4. `output_text.delta`: Text deltas streamed incrementally (multiple chunks)
5. `output_text.done`: Text generation complete (contains the full text)
6. `content_part.done`: Content part complete
7. `output_item.done`: Output item complete
8. `response.completed`: Response generation finished (includes final usage stats)
Every chunk carries:

- `type`: The chunk type identifier
- `sequence_number`: Ordering number for the chunk
- Relevant data fields for that chunk type
#### Processing Streaming Responses
To process streaming responses, you'll receive chunks via a channel (Go) or an SSE stream (HTTP). Each chunk should be handled based on its type (see the sketch below):

- Text deltas: Accumulate `output_text.delta` chunks to build the complete text
- Function calls: Accumulate `function_call.arguments.delta` chunks to build the complete arguments
- Usage stats: Available in the `response.completed` chunk
- Final text: Available in the `output_text.done` chunk's `text` field
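A hedged sketch of a consumer loop. The `chunks` channel and the `Type`, `Delta`, and `Text` accessors are assumptions about the union's shape; the chunk type strings come from the list above.

```go
// Accumulate text deltas from a stream of chunks. The chunks channel
// and the chunk fields used here (Type, Delta, Text) are assumptions.
var sb strings.Builder
for chunk := range chunks {
    switch chunk.Type {
    case "output_text.delta":
        sb.WriteString(chunk.Delta) // append the incremental fragment
    case "output_text.done":
        fmt.Println("final text:", chunk.Text) // full accumulated text
    case "response.completed":
        // Final usage stats ride on this chunk.
    }
}
```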
## Supported Providers
The Responses API supports the following LLM providers:

| Provider | Text | Image Gen | Image Proc | Tool Calls | Reasoning | Streaming | Structured Output |
|---|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |