Kimi Chat Completion API Application and Usage

Kimi is a very powerful AI dialogue system that can generate smooth and natural replies in just a few seconds by inputting prompt words. Kimi provides amazing intelligent assistance, greatly enhancing human work efficiency and creativity. This document mainly introduces the usage process of the Kimi Chat Completion API, allowing us to easily utilize the official Kimi dialogue features.

Application Process

To use the Gemini Chat Completion API, you can first visit the Kimi Chat Completion API page and click the “Acquire” button to obtain the credentials needed for the request:

If you are not logged in or registered, you will be automatically redirected to the login page inviting you to register and log in. After logging in or registering, you will automatically return to the current page. During the first application, there will be a free quota provided, allowing you to use the API for free.

Basic Usage

Next, you can fill in the corresponding content on the interface, as shown in the figure:

When using this interface for the first time, we need to fill in at least three pieces of content: one is authorization, which can be selected directly from the dropdown list. The other parameter is model, which is the category of the Kimi official model we choose to use. Here we mainly have 7 types of models; details can be found in the models we provide. The last parameter is messages, which is an array of our input questions. It is an array that allows multiple questions to be uploaded simultaneously, with each question containing role and content. The role indicates the role of the questioner, and we provide three identities: user, assistant, and system. The other content is the specific content of our question. You can also notice that there is corresponding code generation on the right side; you can copy the code to run directly or click the “Try” button for testing.

After the call, we find that the returned result is as follows:

{
  "id": "chatcmpl-b5d9e1b799c137e3",
  "object": "chat.completion",
  "created": 1770991864,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Hello! How can I help you today?",
        "refusal": null,
        "reasoning_content": " The user has simply said \"Hello\". This is a straightforward greeting. I should respond in a friendly, helpful manner while being ready to assist with whatever they need next. Since there's no specific question or task yet, I'll acknowledge their greeting and ask how I can help.\n\nI should keep it:\n- Friendly and welcoming\n- Professional but warm\n- Open-ended to invite them to share what they need help with\n- Concise but not too brief\n\nPossible responses:\n1. \"Hello! How can I help you today?\"\n2. \"Hi there! What can I do for you?\"\n3. \"Hello! I'm ready to assist. What would you like to know or work on?\"\n4. \"Hey! Great to meet you. How can I be of service?\"\n\nI'll go with something warm and professional that invites them to share what they need. ",
        "reasoning": " The user has simply said \"Hello\". This is a straightforward greeting. I should respond in a friendly, helpful manner while being ready to assist with whatever they need next. Since there's no specific question or task yet, I'll acknowledge their greeting and ask how I can help.\n\nI should keep it:\n- Friendly and welcoming\n- Professional but warm\n- Open-ended to invite them to share what they need help with\n- Concise but not too brief\n\nPossible responses:\n1. \"Hello! How can I help you today?\"\n2. \"Hi there! What can I do for you?\"\n3. \"Hello! I'm ready to assist. What would you like to know or work on?\"\n4. \"Hey! Great to meet you. How can I be of service?\"\n\nI'll go with something warm and professional that invites them to share what they need. ",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 184,
    "total_tokens": 193,
    "prompt_tokens_details": {
      "cached_tokens_details": {}
    },
    "completion_tokens_details": {}
  }
}

The returned result contains multiple fields, described as follows:

id, the ID generated for this dialogue task, used to uniquely identify this dialogue task.
model, the selected Kimi official model.
choices, the response information provided by Kimi for the question.
usage: statistics on the tokens used for this Q&A.

Among them, choices contains Kimi’s response information, and the choices inside it shows the specific information of Kimi’s response, as can be seen in the figure.

As can be seen, the content field in choices contains the specific content of the Gemini reply.

Streaming Response

This interface also supports streaming responses, which is very useful for web integration, allowing the webpage to achieve a word-by-word display effect. If you want to return responses in a streaming manner, you can change the stream parameter in the request header to true. Modify as shown in the figure, but the calling code needs to have corresponding changes to support streaming responses.

After changing stream to true, the API will return the corresponding JSON data line by line, and we need to make corresponding modifications at the code level to obtain the line-by-line results. Python sample calling code:

import requests

url = "https://api.acedata.cloud/kimi/chat/completions"

headers = {
    "accept": "application/json",
    "authorization": "Bearer {token}",
    "content-type": "application/json"
}

payload = {
    "model": "kimi-k2.5",
    "messages": [{"role":"user","content":"Hello"}],
    "stream": True
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)

The output effect is as follows:

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"content": "", "role": "assistant"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " The", "reasoning": " The"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " user said \"Hello\". This is a simple greeting", "reasoning": " user said \"Hello\". This is a simple greeting"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": ". I should respond in a friendly, welcoming manner", "reasoning": ". I should respond in a friendly, welcoming manner"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": ". Since this is the start of a conversation,", "reasoning": ". Since this is the start of a conversation,"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " I should ask how I can help them today.\n\n", "reasoning": " I should ask how I can help them today.\n\n"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": "Let me craft a response:\n- Acknowledge the", "reasoning": "Let me craft a response:\n- Acknowledge the"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " greeting\n- Offer assistance\n- Keep it warm", "reasoning": " greeting\n- Offer assistance\n- Keep it warm"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " and professional\n\nSomething like: \"Hello! How", "reasoning": " and professional\n\nSomething like: \"Hello! How"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " can I help you today?\" or \"Hi there", "reasoning": " can I help you today?\" or \"Hi there"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": "! What can I do for you?\"\n\nActually", "reasoning": "! What can I do for you?\"\n\nActually"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": ", looking at the context, this seems to be", "reasoning": ", looking at the context, this seems to be"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " a general conversation starter. I'll keep it simple and", "reasoning": " a general conversation starter. I'll keep it simple and"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"reasoning_content": " open-ended to encourage them to share what they need", "reasoning": " open-ended to encourage them to share what they need"}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"content": " Hello! How can I help", "reasoning_content": " help with. ", "reasoning": " help with. "}, "logprobs": null, "finish_reason": null}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [{"index": 0, "delta": {"content": " you today?"}, "logprobs": null, "finish_reason": "stop"}], "usage": null}

data: {"id": "chatcmpl-952dd5e75583c4d1", "object": "chat.completion.chunk", "created": 1770992031, "model": "kimi-k2.5", "system_fingerprint": null, "choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 135, "total_tokens": 144, "prompt_tokens_details": {"cached_tokens_details": {}}, "completion_tokens_details": {}}}

data: [DONE]

You can see that there are many data in the response, and the choices within data are the latest response content, consistent with the content introduced above. choices are the newly added response content, and you can integrate it into your system based on the results. The end of the streaming response is determined by the content of data; if the content is [DONE], it indicates that the streaming response has completely ended. The returned data result has multiple fields, described as follows:

id, the ID generated for this dialogue task, used to uniquely identify this dialogue task.
model, the selected Kimi official website model.
choices, the response information provided by Kimi for the query.

JavaScript is also supported, for example, the streaming call code for Node.js is as follows:

const options = {
  method: "post",
  headers: {
    "accept": "application/json",
    "authorization": "Bearer {token}",
    "content-type": "application/json"
  },
  body: JSON.stringify({
    "model": "kimi-k2.5",
    "messages": [{"role":"user","content":"Hello"}],
    "stream": true
  })
};

fetch("https://api.acedata.cloud/kimi/chat/completions", options)
  .then(response => response.json())
  .then(response => console.log(response))
  .catch(err => console.error(err));

Java sample code:

JSONObject jsonObject = new JSONObject();
jsonObject.put("model", "kimi-k2.5");
jsonObject.put("messages", [{"role":"user","content":"Hello"}]);
jsonObject.put("stream", true);
MediaType mediaType = "application/json; charset=utf-8".toMediaType();
RequestBody body = jsonObject.toString().toRequestBody(mediaType);
Request request = new Request.Builder()
  .url("https://api.acedata.cloud/kimi/chat/completions")
  .post(body)
  .addHeader("accept", "application/json")
  .addHeader("authorization", "Bearer {token}")
  .addHeader("content-type", "application/json")
  .build();

OkHttpClient client = new OkHttpClient();
Response response = client.newCall(request).execute();
System.out.print(response.body!!.string())

Other languages can be rewritten accordingly; the principle is the same.

Multi-turn Dialogue

If you want to integrate multi-turn dialogue functionality, you need to upload multiple query words in the messages field. The specific examples of multiple query words are shown in the image below:

Python sample call code:

import requests

url = "https://api.acedata.cloud/kimi/chat/completions"

headers = {
    "accept": "application/json",
    "authorization": "Bearer {token}",
    "content-type": "application/json"
}

payload = {
    "model": "kimi-k2.5",
    "messages": [{"role":"assistant","content":"Hello! How can I help you today?"},{"role":"user","content":"What model are you?"}]
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)

By uploading multiple query words, you can easily achieve multi-turn dialogue and receive responses like the following:

{
  "id": "chatcmpl-81e5f161ea077f5e",
  "object": "chat.completion",
  "created": 1770992310,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " I'm Kimi, an AI assistant made by Moonshot AI. I'm from the **K2.5** series.",
        "refusal": null,
        "reasoning_content": " The user is asking \"What model are you?\" They want to know which AI model I am.\n\n I should identify myself as Kimi, an AI assistant made by Moonshot AI. I should mention that I'm Kimi from the K2.5 series specifically, as that's the model currently deployed.\n\n Key points:\n - I am Kimi\n - Made by Moonshot AI\n - Currently Kimi K2.5 (or just say I'm part of the K2.5 series)\n - I should be helpful and direct\n\n I should not:\n - Claim to be a different model (like GPT-4, Gemini, etc.)\n - Be evasive about my identity\n - Make up version numbers that aren't correct\n\n The current model identity is Kimi K2.5 (though sometimes the exact series designation might vary by deployment, but K2.5 is the current flagship). I'll identify myself as Kimi, an AI assistant by Moonshot AI, and mention I'm from the K2.5 series.\n\n Simple, direct, accurate. ",
        "reasoning": " The user is asking \"What model are you?\" They want to know which AI model I am.\n\n I should identify myself as Kimi, an AI assistant made by Moonshot AI. I should mention that I'm Kimi from the K2.5 series specifically, as that's the model currently deployed.\n\n Key points:\n - I am Kimi\n - Made by Moonshot AI\n - Currently Kimi K2.5 (or just say I'm part of the K2.5 series)\n - I should be helpful and direct\n\n I should not:\n - Claim to be a different model (like GPT-4, Gemini, etc.)\n - Be evasive about my identity\n - Make up version numbers that aren't correct\n\n The current model identity is Kimi K2.5 (though sometimes the exact series designation might vary by deployment, but K2.5 is the current flagship). I'll identify myself as Kimi, an AI assistant by Moonshot AI, and mention I'm from the K2.5 series.\n\n Simple, direct, accurate. ",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 235,
    "total_tokens": 263,
    "prompt_tokens_details": {
      "cached_tokens_details": {}
    },
    "completion_tokens_details": {}
  }
}

As you can see, the information contained in choices is consistent with the basic usage content, which includes the specific content of Kimi’s responses to multiple dialogues, allowing you to answer corresponding questions based on multiple dialogue contents.

Error Handling

When calling the API, if an error occurs, the API will return the corresponding error code and message. For example:

400 token_mismatched: Bad request, possibly due to missing or invalid parameters.
400 api_not_implemented: Bad request, possibly due to missing or invalid parameters.
401 invalid_token: Unauthorized, invalid or missing authorization token.
429 too_many_requests: Too many requests, you have exceeded the rate limit.
500 api_error: Internal server error, something went wrong on the server.

Error Response Example

{
  "success": false,
  "error": {
    "code": "api_error",
    "message": "fetch failed"
  },
  "trace_id": "2cf86e86-22a4-46e1-ac2f-032c0f2a4e89"
}

Conclusion

Through this document, you have learned how to easily implement the official Gemini dialogue function using the Gemini Chat Completion API. We hope this document helps you better integrate and use the API. If you have any questions, please feel free to contact our technical support team.

​Application Process

​Basic Usage

​Streaming Response

​Multi-turn Dialogue

​Error Handling

​Error Response Example

​Conclusion