Simulating Real-Time Chats using Flask's Server-Sent Events

November 06, 2023

With the increasing adoption of OpenAI’s GPT API and similar LLM APIs, chatbots have become a hot topic in the technology world. In chatbot-powered apps, the model’s response is often streamed token by token so the conversation feels real-time. This streaming is typically delivered over server-sent events (SSE).

This is cool, but here’s the thing: calling these APIs while you develop your app can get expensive, run into rate limits, and add complexity. So you may want to mock them in your development environment. In this short blog post, I present a minimal Flask server that simulates OpenAI’s streaming API.

Here is a demo using my mocked API.

Prerequisites

I tested the code with the following versions:

python = "3.11"
Flask = "3.0.0"
flask-cors = "4.0.0"

Server-Side Code

The minimal server-side code fits in just 20 lines.

import flask
import flask_cors
import time

app = flask.Flask(__name__)
flask_cors.CORS(app)

def generate():
    tokens = "Hello, this is a mocked chatbot. How can I help you?".split(" ")
    for token in tokens:
        time.sleep(0.5)
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"

@app.route('/stream')
def stream():
    return flask.Response(generate(), mimetype='text/event-stream')

if __name__ == "__main__":
    app.run(port=8001)

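The code first defines a generator that simulates a chatbot emitting two tokens per second (one every 0.5 seconds). The /stream route then wraps this generator in a flask.Response with the text/event-stream MIME type. As a result, Flask sends each message to the client as soon as it is yielded, rather than waiting for the whole response.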
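Because generate() is a plain Python generator, you can look at the exact frames Flask will send without starting the server. Below is a minimal sketch of a variant with two hypothetical parameters, text and delay, which the original hard-codes. Their defaults match the original, so flask.Response(generate(), mimetype='text/event-stream') behaves the same. Passing delay=0 makes it run instantly.

```python
import time
from typing import Iterator


def generate(text: str = "Hello, this is a mocked chatbot. How can I help you?",
             delay: float = 0.5) -> Iterator[str]:
    """Yield one SSE frame per space-separated token, then a [DONE] sentinel.

    `text` and `delay` are hypothetical parameters added for this sketch;
    the original generator hard-codes both values.
    """
    for token in text.split(" "):
        time.sleep(delay)  # delay=0.5 -> two tokens per second
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"


if __name__ == "__main__":
    # Iterate the generator directly (no server needed) to inspect the raw frames.
    for frame in generate(delay=0):
        print(repr(frame))
```

The first frame printed is 'data: Hello,\n\n' and the last is 'data: [DONE]\n\n'. Note that splitting on spaces drops them, which is why the client re-inserts a space before each token.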

That’s pretty much it, but I’d like to add three points here:

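In the generate function, the prefix “data: ” and the suffix “\n\n” are important. In the SSE format, a message is a set of field lines such as data: … terminated by a blank line. Without the prefix, the line is not treated as data. Without the blank line, the browser does not dispatch the message.
The stream ends with a “data: [DONE]\n\n” message, imitating the OpenAI API’s behavior.
flask_cors.CORS(app) is required when a browser page from a different origin calls the endpoint, such as the HTML page below when it is opened from disk or served on another port.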
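The first point matters most once a payload can contain a newline, because a raw newline would split one message into lines the client does not read as data. Below is a minimal sketch of a hypothetical helper, format_sse, that frames a payload correctly. It gives every line of the payload its own “data: ” prefix and ends the message with a blank line. You could then write yield format_sse(token) in generate.

```python
import json
import re
from typing import Optional

# SSE treats CRLF, CR, and LF as line terminators.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Build one SSE message from `data`, optionally tagged with an event name."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each payload line gets its own "data: " prefix; the browser rejoins
    # consecutive data lines with "\n" before dispatching the message.
    for line in _LINE_BREAK.split(data):
        lines.append(f"data: {line}")
    # The trailing blank line is what tells the client the message is complete.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(repr(format_sse("Hello,")))              # 'data: Hello,\n\n'
    print(repr(format_sse("line one\nline two")))  # 'data: line one\ndata: line two\n\n'
    # json.dumps without indent produces a single line, so a JSON payload
    # becomes a single data line.
    print(format_sse(json.dumps({"content": "Hello,"})), end="")
```

If you pass event, the browser dispatches the message to listeners registered with addEventListener under that name instead of to onmessage. For that reason, the client below would need adjusting.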

Save the code as server.py and start it with python server.py. The server listens on port 8001.

Client-Side Code

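You can call the endpoint with curl -N http://localhost:8001/stream, where the -N flag disables curl’s output buffering so tokens appear as they arrive. But let’s go a step further with JavaScript. The browser’s built-in EventSource API opens the SSE connection and fires a message event for each data frame, which you can use to update the DOM.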

<!DOCTYPE html>
<html>
  <body>
    <div id="msg-box"></div>
    <script>
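      const source = new EventSource("http://localhost:8001/stream")
      const msgBox = document.getElementById("msg-box")
      source.onopen = (e) => console.log("Connection opened")
      // Fires on connection problems; EventSource then tries to reconnect
      source.onerror = (e) => console.log("Error:", e)
      source.onmessage = (e) => {
        if (e.data !== "[DONE]") {
          // textContent keeps streamed text from being parsed as HTML
          msgBox.textContent += " " + e.data
        } else {
          // Close explicitly, otherwise EventSource reconnects after the stream ends
          console.log("Connection closed")
          source.close()
        }
      }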
    </script>
  </body>
</html>
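The onmessage handler above reads each data frame and stops at [DONE]. The same logic can drive a quick command-line client, which is handy for testing the mock without a browser. Below is a minimal, stdlib-only sketch. The function names iter_sse_data and stream_tokens are hypothetical. The parser is deliberately simplified: it reads only data: fields and ignores event:, id:, retry:, and comment lines.

```python
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield each SSE message's data, stopping at the [DONE] sentinel.

    Simplified parser: only `data:` fields are handled; other fields and
    comments are ignored.
    """
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            # A blank line ends the message: dispatch the accumulated data lines.
            if buffer:
                data = "\n".join(buffer)
                buffer = []
                if data == "[DONE]":
                    return
                yield data
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE drops a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)


def stream_tokens(url: str = "http://localhost:8001/stream") -> Iterator[str]:
    # The HTTP response object can be iterated line by line, like a binary file.
    with urllib.request.urlopen(url) as response:
        yield from iter_sse_data(response)
```

With server.py running, for token in stream_tokens(): print(token, end=" ", flush=True) prints the tokens as they arrive.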