Hippocampus's Garden

Under the sea, in the hippocampus's garden...

Simulating Real-Time Chats using Flask's Server-Sent Events

November 06, 2023  |  2 min read  |  1,166 views

With the increasing adoption of OpenAI’s GPT API and similar LLM APIs, chatbots have become a hot topic in the technology world. In chatbot-powered apps, the response from the GPT is often streamed to allow real-time conversation, which is enabled by server-sent events (SSE).

This is cool, but here’s the thing: when you develop your app with these APIs, it can get expensive, tricky with rate limits, and complex. So you may want to mock them in the development environment. In this short blog post, I present a minimal implementation of a Flask server that simulates OpenAI’s streaming API.

Here is a demo using my mocked API.

Prerequisites

I tested my code in the settings below:

  • python = “3.11”
  • Flask = “3.0.0”
  • flask-cors = “4.0.0”

Server Side Code

The minimal server side code can be written in just 20 lines.

import flask
import flask_cors
import time

app = flask.Flask(__name__)
flask_cors.CORS(app)

def generate():
    tokens = "Hello, this is a mocked chatbot. How can I help you?".split(" ")
    for token in tokens:
        time.sleep(0.5)
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

@app.route('/stream')
def stream():
    return flask.Response(generate(), mimetype='text/event-stream')

if __name__ == "__main__":
    app.run(port=8001)

It first defines a generator that simulates a chatbot sending two tokens per second. Then the SSE route /stream streams these messages to clients in the text/event-stream format.

That’s pretty much it, but I’d like to add three points here:

  • In the generate function, the prefix “data: ” and the suffix “\n\n” are important. If they are misformatted, the client side may not receive the data correctly.
  • The stream terminates with a “data: [DONE]\n\n” message, imitating the OpenAI API’s behavior.
  • flask_cors.CORS(app) is required when you call the endpoint from a browser page served from a different origin (for example, an HTML file opened locally).
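
The framing rules in the first two points can be sketched as a small helper. This is only an illustration, not part of the server above: format_sse is a hypothetical name, and the optional event argument is an assumption added to show where a named event would go.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame a payload as one server-sent event (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data: " prefix; a raw newline
    # inside a single data line would otherwise end the field early.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The trailing blank line tells the client that the event is complete.
    return "\n".join(lines) + "\n\n"


print(repr(format_sse("Hello,")))  # 'data: Hello,\n\n'
print(repr(format_sse("[DONE]")))  # 'data: [DONE]\n\n'
```

With a helper like this, the generator could yield format_sse(token) instead of the hand-written f-string. Note that events carrying an event field are delivered to listeners registered for that name with addEventListener, not to onmessage.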

You can run this server on port 8001 with something like python server.py, assuming the code is saved as server.py.
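
Once the server is running, the stream can also be checked from Python before writing any browser code. The following is a minimal sketch using only the standard library; iter_sse_data and read_mock_stream are hypothetical names, and the parser is deliberately simplified: it treats every data line as its own message and ignores other SSE fields.

```python
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes], done_marker: str = "[DONE]") -> Iterator[str]:
    """Yield the payload of each "data:" line until the done marker appears."""
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # drop the single space after the colon
        if payload == done_marker:
            return  # the mock server signals the end of the stream this way
        yield payload


def read_mock_stream(url: str = "http://localhost:8001/stream") -> None:
    """Print tokens as they arrive (assumes the mock server is running)."""
    with urllib.request.urlopen(url) as response:
        # The response object can be iterated line by line as bytes.
        for token in iter_sse_data(response):
            print(token, end=" ", flush=True)
    print()


# With the server running, call: read_mock_stream()
```

Keeping the parsing in a function that accepts any iterable of byte lines makes it easy to try with canned input, without a network connection.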

Client side

You can call the above endpoint simply with curl http://localhost:8001/stream, but let’s go a step further with JavaScript. Using the EventSource interface, you can update the DOM each time a server-sent event arrives.

<!DOCTYPE html>
<html>
  <body>
    <div id="msg-box"></div>
    <script>
      // Open a persistent connection to the mocked SSE endpoint
      var source = new EventSource("http://localhost:8001/stream")
      source.onopen = (e) => console.log("Connection opened")
      source.onerror = (e) => console.log("Error:", e)
      source.onmessage = (e) => {
        if (e.data !== "[DONE]") {
          // Append each streamed token to the message box
          document.getElementById("msg-box").innerHTML += " " + e.data
        } else {
          // Close explicitly; otherwise EventSource reconnects automatically
          console.log("Connection closed")
          source.close()
        }
      }
    </script>
  </body>
</html>

When you open this file with a browser, you should see the streamed text. Happy coding!

References

[1] API Reference - OpenAI API
[2] Using server-sent events - Web APIs | MDN
[3] javascript - EventSource.onmessage is not firing even though Flask server push seems to work - Stack Overflow
[4] Server-sent events in Flask without extra dependencies • Max Halford
[5] How to stream completions | OpenAI Cookbook



Written by Shion Honda. If you like this, please share!

Hippocampus's Garden © 2024, Shion Honda. Built with Gatsby