POST /v1/chat/completions is the most commonly used endpoint for chat-based interactions with LLMs.

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| model | string | Yes | Model name (e.g., "gpt-4", "claude-3-sonnet") |
| messages | array | Yes | Array of message objects, each with a role and content |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum number of tokens to generate |
| stream | boolean | No | Enable streaming response |

Example

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function to check if a number is prime"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
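
For a quick test, here is a minimal sketch of sending this request from Python with the requests library; the base URL and bearer-token header are taken from the curl example in the streaming section below.

import requests

API_KEY = "YOUR_API_KEY"  # replace with a real key

response = requests.post(
    "https://api.pipellm.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function to check if a number is prime",
            }
        ],
        "temperature": 0.7,
        "max_tokens": 500,
    },
)
response.raise_for_status()  # fail loudly on 4xx/5xx errors
data = response.json()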

Response

{
  "id": "chat-123",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "def is_prime(n):\n    if n <= 1:\n        return False\n    for i in range(2, int(n**0.5) + 1):\n        if n % i == 0:\n            return False\n    return True"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 25,
    "total_tokens": 40
  }
}
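
The generated text lives at choices[0].message.content, and usage reports the token counts. Continuing the Python sketch above:

# Pull the assistant's reply and the token usage out of the parsed body.
reply = data["choices"][0]["message"]["content"]
print(reply)

usage = data["usage"]
print(f"{usage['total_tokens']} tokens "
      f"({usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion)")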

Streaming Example

Set "stream": true to receive real-time responses:
curl https://api.pipellm.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
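
OpenAI-compatible APIs typically deliver streamed responses as server-sent events: a series of data: {...} lines, each carrying a delta fragment of the reply, terminated by data: [DONE]. Assuming pipellm follows that convention (verify the exact chunk format against the API's streaming reference), a minimal Python consumer might look like:

import json
import requests

API_KEY = "YOUR_API_KEY"  # replace with a real key

with requests.post(
    "https://api.pipellm.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,  # let requests yield the body incrementally
) as response:
    response.raise_for_status()
    for raw in response.iter_lines():
        # Assumed SSE framing: skip keep-alive blanks and non-data lines.
        if not raw or not raw.startswith(b"data: "):
            continue
        payload = raw[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Assumed OpenAI-style delta: each chunk carries a text fragment.
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()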