Million-user webchat with Full Stack Flux, React, redis and PostgreSQL

[Figure: websockets-flux-multi]

Sexy, right? Flux over the Wire lets us leverage the goodness of Flux to share a single source of truth between multiple clients and update them in real time. This makes building things like chats, comment systems, or even auto-updating articles fairly easy. Declare your data dependencies at the component level: not only will they auto-resolve, they will also auto-update whenever the backend changes. For free, as in free beer!

Well, not entirely free. This makes a nice 1,000-user chat demo, but each user now requires an active network connection and a server-side representation of their store subscriptions for as long as they stay on your website. Multiply this by your average number of concurrent users, and you might have a problem. The exact number of users a single-process Node app can handle properly depends heavily on how you batch your updates and how chatty your app is, but in my experience it rarely exceeds a few tens of thousands of concurrent users for a typical chatroom app. That is already quite nice compared to what you can achieve without Flux over the Wire or batched mutations, but likely not enough if you run a moderately large website. In addition, unless you add a layer of persistence to your app, if your server crashes, your data is screwed.

So we need to:

  • Embed the Source of Truth in persistent storage that can be safely shared between multiple action producers and update consumers,
  • Make every possible bottleneck scalable (more servers = more clients = more $$$)

SQL to the rescue

We already have a tool for persisting a transactional Source of Truth, and we've been using it for decades: yup, good ole databases. We want the database to be the Source of Truth, so that everything else is volatile.

This means that when something crashes, we can just restart it and it will keep working consistently.

[Figure: full-stack-flux-pgsql]

My choice for implementing the Source of Truth is PostgreSQL. Other choices are probably valid, too, but PostgreSQL plays nice with Full Stack Flux:

  • You can implement action dispatchers as stored procedures. This hides the implementation details from the clients and allows the Stores (represented by tables/rows) to be updated in an opaque way. In addition, it is usually slightly more performant than inline queries.
  • You can use NOTIFY to dispatch updates to an intermediate consumer.
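
To make this concrete, here is a minimal sketch of what such an action handler could look like for the chat example, installed from Node with the pg client. The messages table, the post_message function, and the updates channel are hypothetical names chosen for illustration, not anything mandated by Flux or nexus-flux:

```typescript
import { Client } from "pg";

// Hypothetical setup script: installs a "post_message" action handler as a
// stored procedure that mutates the "messages" Store and NOTIFYes listeners.
async function installActionHandler(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  await client.query(`
    CREATE TABLE IF NOT EXISTS messages (
      id     SERIAL PRIMARY KEY,
      room   TEXT NOT NULL,
      author TEXT NOT NULL,
      body   TEXT NOT NULL,
      at     TIMESTAMPTZ NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION post_message(_room TEXT, _author TEXT, _body TEXT)
    RETURNS void AS $$
    BEGIN
      -- Mutate the "messages" Store...
      INSERT INTO messages(room, author, body) VALUES (_room, _author, _body);
      -- ...then dispatch the update to whoever LISTENs on the "updates" channel.
      PERFORM pg_notify('updates', json_build_object(
        'room', _room, 'author', _author, 'body', _body
      )::text);
    END;
    $$ LANGUAGE plpgsql;
  `);
  await client.end();
}

installActionHandler().catch(console.error);
```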

You need a broker to translate action websocket frames into stored procedure calls, and NOTIFY events into update websocket frames; we call it the Flux broker, since all it does is pipe actions and updates both ways. We'll just use a Node process running two PostgreSQL clients: one to forward actions by calling stored procedures, one to receive NOTIFY events and forward them as updates.
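
Here is a rough sketch of such a broker, reusing the hypothetical post_message procedure and updates channel from above, with plain socket.io standing in for the wire layer (nexus-flux-socket.io packages this kind of plumbing for you):

```typescript
import { Client } from "pg";
import { Server } from "socket.io";

// Minimal Flux broker: pipes actions down to PostgreSQL and updates back up.
async function startBroker(): Promise<void> {
  // One client forwards actions by calling stored procedures...
  const actions = new Client({ connectionString: process.env.DATABASE_URL });
  // ...and one receives NOTIFY events and forwards them as updates.
  const updates = new Client({ connectionString: process.env.DATABASE_URL });
  await Promise.all([actions.connect(), updates.connect()]);
  await updates.query("LISTEN updates");

  const io = new Server(8080);

  io.on("connection", (socket) => {
    // Per-client state: which rooms (Stores) this client subscribes to.
    socket.on("subscribe", (room: string) => socket.join(room));
    // Action frames become stored procedure calls.
    socket.on("action", async ({ room, author, body }) => {
      await actions.query("SELECT post_message($1, $2, $3)", [room, author, body]);
    });
  });

  // NOTIFY payloads become update frames pushed to the subscribed clients.
  updates.on("notification", (msg) => {
    const update = JSON.parse(msg.payload ?? "{}");
    io.to(update.room).emit("update", update);
  });
}

startBroker().catch(console.error);
```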

The Flux broker is stateless (apart from maintaining connections with each client), which means if it crashes, no SSoT data is lost. It also means that multiple broker processes can run concurrently to serve more clients.

Scaling to infinity and beyond

In our stack, the server has two very distinct tasks:

  • Client state/message routing & I/O: receiving/decoding actions, encoding/sending updates, and maintaining per-client state (store subscriptions, TCP connection, etc.).
  • Business logic, e.g. handling actions and mutating Stores.

The second part, business logic, is your actual application code: in the typical chatroom example, reacting to “post message” actions and mutating the “messages” Store accordingly.

It is easy to distribute when you have clearly separated domains of stores that don’t need to be synchronized: just split it into several distinct processes.

But some domains are intrinsically hard to distribute, typically those involving locks or some other kind of synchronization primitive. Scalability isn’t magic: there are some things that simply can’t scale. Our goal here is to create the conditions so that using Flux over the Wire isn’t less performant or scalable than using the traditional query/response model (in fact, it often scales more easily, since it streamlines cache invalidation; more on that later).

To achieve this, we need to isolate business logic from client state & I/O so that the underlying process(es) don’t waste precious cycles dealing with the latter. In practice, the client state & I/O part (aka the “front”) is very resource-consuming in terms of OS resources (sockets, file descriptors), CPU, and memory. Once that separation is in place, the front doesn’t have to deal with the shared, global state: it only manages per-client state and forwards actions and updates in both directions.

[Figure: full-stack-flux-redis]

In our case, this means that we should isolate the PostgreSQL database and use a cluster of Node front servers to actually handle the Websocket connections. Each front server still needs to be able to pass actions to the database and forward updates to the subscribing clients, however. Since we want to avoid each front server tapping into the resources of the PostgreSQL server, we can’t just let each of them hold an open connection. So we’ll use another broker, the multiplexer, which maintains a single connection to the database and communicates with the front servers via a message queue.

There are only two very simple kinds of messages, thanks to the simplicity of Flux: actions and updates, each with a payload. We don’t need fancy routing, so I chose redis (instead of, say, RabbitMQ or ZeroMQ). Very much like PostgreSQL should in principle be swappable with another database implementation, another message queue could be used in place of redis; I just have excellent experience with redis handling millions of events per second. Note that we only use redis as a message queue; we don’t use its datastore features at all (although they could actually be used to cache store values).
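
For illustration, here is a minimal sketch of the multiplexer under the same hypothetical names as before (post_message, updates), using ioredis and two plain pub/sub channels. A front process does the mirror image: it publishes client actions on the actions bus and forwards messages from the updates bus to its subscribed sockets, with no database connection of its own.

```typescript
import { Client } from "pg";
import Redis from "ioredis";

// Sketch of the multiplexer: the only process holding a PostgreSQL connection.
// It drains the "actions" bus into stored procedure calls and fans NOTIFY
// payloads back out on the "updates" bus, where every front picks them up.
async function startMultiplexer(): Promise<void> {
  const pg = new Client({ connectionString: process.env.DATABASE_URL });
  await pg.connect();
  await pg.query("LISTEN updates");

  const sub = new Redis(); // dedicated subscriber connection (actions bus)
  const pub = new Redis(); // publisher connection (updates bus)
  await sub.subscribe("actions");

  sub.on("message", async (_channel, message) => {
    const { room, author, body } = JSON.parse(message);
    await pg.query("SELECT post_message($1, $2, $3)", [room, author, body]);
  });

  pg.on("notification", (msg) => {
    if (msg.payload) pub.publish("updates", msg.payload);
  });
}

startMultiplexer().catch(console.error);
```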

Now that our front is clustered, we should also use a reverse proxy to load-balance the Websocket connections. HAProxy is very good at this, but any other load-balancing solution should be fine. You want it to be sticky, though, since the websocket server processes keep track of client subscriptions.

In principle, until your business logic code becomes your actual limiting factor (and unless you’re Facebook, or doing something very wrong like 10 million actions per second resulting in complex, locking mutations, it probably won’t be), you should be able to scale the number of users your stack can handle almost linearly at each level of the stack. Again, locking mutations are the limiting factor here; CPU-intensive calculations (computing derived data, e.g. crunching terabytes of data with maths) can always be deferred to an external process to keep your PostgreSQL resources dedicated to what they need to do: mutative transactions.

One cool benefit you get for free is that, semantically, the Source of Truth is a black box that can be ‘rerendered’ from an initial state and a series of actions. So if you log each action, in principle you can recover the Source of Truth state by replaying every action that ever happened, in the same order. This might be a bit redundant with the database’s own log (the binlog/WAL), but it abstracts away the implementation details. Actions are the semantic transactional units in Flux, so it’s also closer to the Flux abstraction than the database log.
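
As a sketch of that idea, assuming you append every dispatched action to a hypothetical actions_log table, recovery is just a matter of re-dispatching the log in order through the same stored procedures as live traffic:

```typescript
import { Client } from "pg";

// Sketch of action replay, assuming a hypothetical append-only actions_log
// table filled by the broker whenever it dispatches an action. Replaying it
// in order against a fresh database rebuilds the Source of Truth.
async function replay(databaseUrl: string): Promise<void> {
  const pg = new Client({ connectionString: databaseUrl });
  await pg.connect();
  const { rows } = await pg.query(
    "SELECT room, author, body FROM actions_log ORDER BY id ASC"
  );
  for (const { room, author, body } of rows) {
    // Re-dispatch each action through the same action handler as live traffic.
    await pg.query("SELECT post_message($1, $2, $3)", [room, author, body]);
  }
  await pg.end();
}

replay(process.env.DATABASE_URL ?? "").catch(console.error);
```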

“An architecture more than a framework”

Unlike a typical npm module, Full Stack Flux is more an architecture than a framework. Besides the multiplexer and the broker (which are relatively simple to implement), nexus-flux-socket.io should have you covered for the Flux over the Wire implementation. So start your PostgreSQL, provide Actions & Stores in the form of stored procedures, start a bunch of multiplexers and brokers (each broker runs a nexus-flux-socket.io server), write your React app on top of that, and enjoy your own Full Reactive Stack.

You may also want to dig into the underlying npm modules, nexus-flux-socket.io in particular.

Scaling to 1 million users

DISCLAIMER: You can take this section as a thought experiment. Numbers may be inaccurate. I have not tested this exact architecture at scale (although I’ve used a quite similar backend design for a real-world production project with tens of thousands of concurrent users). Please feel free to point out any inaccuracy or design flaw in the comments 🙂

As I mentioned above, scaling properly requires fine-tuning and testing. The exact number of processes of each kind that you will need largely depends on the number of connected clients, the number of clients subscribing to the same resources, the complexity of the mutations in your action handlers, the number/frequency of updates, etc.

However, I have experimented with this kind of architecture for a while. Here are my very empirical rules of thumb:

  • Number of socket.io connections a single Node broker can handle: ~20k
  • Number of messages a single Node broker can handle: ~10k per second without JSON stringification memoization, ~100k per second with JSON stringification memoization [2] (Nexus Flux socket.io does that for you; see the sketch after this list). That’s assuming each message is only a few bytes, i.e. you don’t send collections of millions of values over the wire every now and then (Nexus Flux socket.io also handles that for you by sending diffs over the wire).
  • Number of messages a single redis instance can handle: ~1 million per second (using only 2 event buses, actions and updates, to avoid routing overhead). Again, that’s assuming the message payload is relatively small and you don’t pass huge collections as action payloads or updates too often. [1]
  • PostgreSQL performance is much harder to tune, since you need to optimize at both the semantic and the structural level. However, done properly, a single shard should be able to handle tens of thousands of actions per second. Note that in most cases latency should be low, and therefore locality should have limited impact; all that matters here is the average action throughput.
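
The memoization mentioned in the second bullet is conceptually simple. Here is a sketch of the idea (not the actual nexus-flux-socket.io implementation): when thousands of clients subscribe to the same update, encode it once and reuse the frame, instead of calling JSON.stringify once per recipient.

```typescript
// Memoized JSON encoding for fan-out: one stringification per update object,
// however many subscribers receive it.
type Update = { room: string; author: string; body: string };

const cache = new WeakMap<Update, string>();

function encodeUpdate(update: Update): string {
  let frame = cache.get(update);
  if (frame === undefined) {
    frame = JSON.stringify(update);
    cache.set(update, frame);
  }
  return frame;
}

// Every socket in the room receives the same pre-encoded frame.
function broadcast(sockets: Iterable<{ send(data: string): void }>, update: Update): void {
  const frame = encodeUpdate(update);
  for (const socket of sockets) {
    socket.send(frame);
  }
}
```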

Say you have a chat server with 100 rooms, and each room has 10k connected clients (think a social platform chat system). That’s assuming you’re a very wealthy website, so adding a few servers shouldn’t hurt your finances too much 🙂 (more clients served = more money!)

  • You have a total of 1M connected clients; you probably need something like 50-100 front-end processes (‘Flux brokers’), each one handling 10k to 20k connections. At one process per core and using 4- to 8-core servers, that’s 5-25 actual servers. Clearly not that much, especially considering it’s all plug’n’play.
  • Say there are 100 messages per second in each chatroom because your users are very chatty (and let your product manager figure out whether it’s a good idea to let this happen). This means about 10,000 actions per second. It’s rather easy to scale the action pipe: 1-2 PostgreSQL instances (which can be sharded per room if necessary), 1 redis instance, and 1 multiplexer instance should be able to handle it.
  • In the other direction, things are more complicated. Since a user can be in multiple channels, you can’t just shard connections on a per-room basis, so each front-end basically needs to receive the updates of every room. This means 10,000 (number of updates) x 50-100 (number of fronts) redis update messages per second. Again, this should be handled by a single redis instance and a single multiplexer instance [1].

Bottom line: to run a full-fledged, million-user chat server, you need 50-100 front-end processes (on the order of 10-20 servers), 1 PostgreSQL server, 1 multiplexer process, and 1 redis process.

Again, note that these figures are rules of thumb. If you do a poor job optimizing simple things (like SSL termination or message encoding/decoding), the scaling factor can drop by orders of magnitude. Conversely, if you do a great job batching mutations (e.g. combining update events per timeframe, as sketched below), you might get even better results.
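
To illustrate what batching per timeframe means, here is a tiny, generic sketch (nothing specific to nexus-flux, which does its own batching). Mutations arriving within the same window are coalesced into a single emission; the 16 ms window is an arbitrary choice.

```typescript
// Coalesce updates over a fixed timeframe before emitting, so N mutations
// within the window cost a single wire frame instead of N.
type Update = { room: string; payload: unknown };

function createBatcher(emit: (updates: Update[]) => void, windowMs = 16) {
  let pending: Update[] = [];
  let timer: ReturnType<typeof setTimeout> | null = null;
  return (update: Update): void => {
    pending.push(update);
    if (timer === null) {
      timer = setTimeout(() => {
        emit(pending);
        pending = [];
        timer = null;
      }, windowMs);
    }
  };
}

// Usage: queue updates as they happen; at most one batch per window goes out.
const queueUpdate = createBatcher((batch) => {
  console.log(`flushing ${batch.length} updates in one frame`);
});
queueUpdate({ room: "general", payload: { body: "hello" } });
queueUpdate({ room: "general", payload: { body: "world" } });
```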

Does this post make you want to build an actual 1-million-user webchat? Do you have a more original idea that you now want to implement? Can you see room for improvement? Feel free to share your love or hate in the comments below.

[1] ~1 million seems to be an overly optimistic estimate. Benchmarks suggest more like 100k messages per second. Note that this is still enough to handle the chat example, and that the MQ could in principle be sharded at the broker/multiplexer level.

[2] Remember that we are talking about massively multi-user chatrooms; many clients therefore subscribe to the same updates, and memoizing JSON stringification yields huge performance gains over naive re-emitting.