Wayne Robinson's Blog

I'm working on a new application (StaffLocation.com) that requires the ability to stream live data from a server to the client web application via JavaScript. There are two methods that I could approach this.

The first (and simplest) involves polling a script on the server every x-number of seconds to see if the data has changed. This is the traditional way most web applications retrieve status updates from the server but it has two nasty side affects:

Every active client sends a request to the server every x-seconds whether there is changed data or not. Therefore, if there are 1,000 users on the website and you want the data to be updated every second, then those users will generate 1,000 hits to your application back-end (and that's 1,000 times whatever database reads/writes you do per hit).
Every hit uses bandwidth whether it contains data or not. A blank request/response will still contain many hundreds of bytes. Therefore, using the example of 1,000 active users polling every second at a 200 byte payload we would be using 200KB/s (1.2mbps - almost 500GB per month) of bandwidth.

Of course this situation could be improved by decreasing the frequency of polls, and some empirical testing with end-users of our prototypes show that anything below 4 seconds for an update feels instantaneous. However, bandwidth for this type of solution is still using 300kbps (125GB per month) for 1,000 users. Also, advanced caching techniques will be required as there is no way MySQL can handle the authorization & status checks required for much more than 1,000 hits per second. Maybe it's time to investigate other options.

The other type of client/server communication is via server-push. So, instead of the client requesting the server for a new status with 90% of the responses being, "NOT YET!!" The client connects once to the server and the server sends the clients updates on what the client is listening for. At this point I hear you all telling me that HTTP and the web just doesn't work this way. It is a PULL-only based technology. Well, some of you might be surprised to know that server-PUSH functionality has existed since Netscape 1.1 in the form of the content-type multipart/x-mixed-replace. The trouble is, Internet Explorer doesn't seem to support this content-type anymore (it did in 3.0) and it is very hard to use with JavaScript. However, there are ways in which JavaScript may be pushed to the client using modern browsers.

Now, the trouble with most web servers is (especially when utilizing dynamic content creation) that they don't really like pushing data to the browser without caching it first (large static files are the exception). Luckily for me, it is relatively easy today to roll your own web server. For this proof of concept I shall be using Ruby and WEBrick.

Of course, WEBrick by default follows the same pattern of, "lets cache all the data before sending it to the client." Luckily, due to the object-oriented nature of Ruby, that assumption is very easy to override. As a proof of concept, I've created a web server that dynamically adds lines to the page every 3 seconds with the use of JavaScript. One of the tricky things to remember is that the browser is expecting regular data. If it doesn't receive it within it's timeout window (usually 60 seconds), then it will think the connection has been closed. This example sends a single space character every second as a KeepAlive packet. The code is a little long, so you can download it here (push_server.rb).

Now, of course, this doesn't solve the problem of working out when there are new statuses to update the client with, but it does provide an example of a highly efficient server that can easily handle tens of thousands of connections (with the correct file-descriptor permissions on your server) with very little load when the data-set isn't changing. Also, it reduces the bandwidth usage down to the KeepAlive packets (1 byte per 10 seconds for 1,000 users is 12.5kbps or 250MB per month).

To start the server just run ruby push_server.rb. The server will start on port 2000 and you can access the example page via http://localhost:2000/hold.

The next article will be about putting a scalable observer/listener layer on top of this HTTP push server connected to our back-end status database.