DRb as a server for long-running web processes: creating a website in Ruby that is responsive and progressive.

posted 2012-Feb-2
— updated 2012-Feb-3

Let’s say that you have a Sinatra web application allowing users to kick off a computationally-intensive command. Perhaps it’s a build system. Perhaps it’s a simulated-annealing solver for sports scheduling. If you do something like this:

get '/start' do
  @result = do_long_running_thing()
  haml :results
end

…then your users will visit the URL and wait minutes before anything happens. Very likely their web browser will time out and close the connection. Even worse, other visitors to your website will be stalled, waiting for the single-threaded Ruby process that is your website to finish what it was working on. This is clearly not acceptable.

You could partly fix the problem of locking out other users by running many website processes behind a reverse proxy, but that’s not a scalable solution: are you really going to have one Ruby process for each possible concurrent visitor? Further, it still doesn’t provide a good experience for the user of the page.

You could kick off a Thread in the web server process, but this feels dirty and fragile to me. Instead, here is how I solve this problem:

  1. I want the visitor kicking off the long-running command to immediately see a page letting them know that it’s running. This provides good feedback, and frees up the server to handle other requests.

  2. I run the long-running command in a completely different process, an entirely separate Ruby program. I run each new command in its own thread (on the DRb server) so that the DRb server itself remains responsive.

  3. I provide a way for the Sinatra web application to poll the other process and get status updates on the command. The web page makes periodic AJAX requests to the server, the server asks the other process for an update and responds to the AJAX request with some JSON, and the web page updates the progress.

Without further ado, the code:

webserver.rb

require 'sinatra'
require 'haml'
require 'drb'
require 'json'

DRBSERVER = 'druby://localhost:9001'
MCP = DRbObject.new_with_uri(DRBSERVER)

class MyServer < Sinatra::Application
  set :haml, :format => :html5

  get '/' do
    @title = "Welcome to the MCP"
    @finished,@running = MCP.processes.partition{ |o| o[:results] }
    haml :home
  end

  get '/start' do
    @process_id = MCP.start_new_long_running_thingy
    @title = "Process ##{@process_id} Running"
    haml :start
  end

  get '/status' do
    content_type :json
    MCP.status_for(params[:process_id].to_i).to_json
  end

  get '/results/:process_id' do
    @title = "Results for Process ##{params[:process_id]}"
    @results = MCP.results_for( params[:process_id].to_i )
    haml :results
  end
end

mcp.rb

require 'drb'
require 'thread'

DRBSERVER = 'druby://localhost:9001'

module MasterControlProgram
  @scheduler = Mutex.new
  @process_by_id = {}
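  def self.start_new_long_running_thingy
    # DRb handles each request on its own thread; hold the lock so that
    # two simultaneous requests cannot be given the same process id.
    @scheduler.synchronize do
      @process_by_id.length.tap do |process_id|
        process = Process.new
        @process_by_id[process_id] = process
        process.go
      end
    end
  end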

  def self.status_for( process_id )
    if process = @process_by_id[process_id]
      process.status 
    end
  end

  def self.results_for( process_id )
    if process = @process_by_id[process_id]
      process.results 
    end
  end

  def self.processes
    @process_by_id.map do |id,process|
      if r = process.results
        { id: id, results: r }
      else
        { id: id, status: process.status }
      end
    end
  end
end

class MasterControlProgram::Process
  attr_reader :results
  def initialize
    @percent_done = 0.0
    @status = :starting
    @results = nil
    @data_accessor = Mutex.new
    @start = Time.now
  end

  def status
    # Ensure that nobody is changing the status while we read it
    @data_accessor.synchronize do
      { percent_done: @percent_done, status: @status }
    end
  end

  def go
    # silly simulation of process
    # will take on average 10 seconds to complete
    states = %w[ globbing_dirs aggregating_data undermixing_signals
                 damping_transients detecting_resonances
                 emptying_buffers computing_final_result ].map(&:to_sym)
    Thread.new do
      until @percent_done >= 1.0
        sleep rand * 1
        # Ensure that nobody is reading the status while we change it
        @data_accessor.synchronize do
          @status = states[ (states.length * @percent_done).floor ]
          @percent_done += rand * 0.03
        end
      end
      @data_accessor.synchronize do
        @percent_done = 1.0
        @status = :complete
      end
      @results = {
        signal_strength: [:excellent,:moderate,:poor].sample,
        score: rand * 100
      }
    end
  end
end

DRb.start_service( DRBSERVER, MasterControlProgram )
DRb.thread.join

views/home.haml

- one = @running.length == 1
%p There #{one ? :is : :are} #{@running.length} process#{:es unless one} running right now.

%p Want to <a href="/start">start a new process</a>?

- unless @finished.empty?
  %table
    %caption Finished Processes
    %thead
      %tr
        %th ID
        %th Signal Strength
        %th Final Score
    %tbody
      - @finished.each do |data|
        %tr
          %th <a href="/results/#{data[:id]}">#{data[:id]}</a>
          %td= data[:results][:signal_strength]
          %td= "%0.3f" % data[:results][:score]

%p#notice This page auto-refreshes every few seconds.

%p TODO: We should list all the running processes, and their statuses. We should have JavaScript polling for all the running processes and giving live status updates here.

:javascript
  setTimeout(function(){location.reload()},2500);

views/start.haml

%p
  %span#pct 0.0%
  done
%p
  Status:
  %span#status
%p <a href="/">Return Home</a>

:javascript
  var $pct    = $('#pct'),
      $status = $('#status');

  // poll the server every second
  setInterval(function(){
    $.getJSON('/status',{process_id:#{@process_id.to_json}},function(data){
      $pct.html((data.percent_done * 100).toFixed(1)+"%");
      $status.html(data.status.replace(/_/g,' '));
      if (data.percent_done >= 1){
        location.href = '/results/#{@process_id}';
      }
    });
  },1000);

views/results.haml

%p Signal Strength: <b>#{@results[:signal_strength]}</b>
%p Final Scoring: <b>#{"%.3f" % @results[:score]}</b>
%p <a href="/">Return Home</a>

views/layout.haml

!!! 5
%html
  %head
    %meta(charset='utf-8')
    %title= @title
    %script(type='text/javascript' src='http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js')
    :css
      table { border-collapse: collapse }
      caption { background:#eee; border-bottom:1px solid #ccc; font-weight:bold }
      th, td { padding:0.1em 0.5em; border-bottom:1px solid #ccc }
  %body
    %h1= @title
    #content= yield   

config.ru

# This Rackup file helps start the web server from `thin start`
# if you are using Thin (or another Rack-based web server)
require ::File.join( ::File.dirname(__FILE__), 'webserver' )
run MyServer.new

You can download the above files here.

You can start the DRb server and web server with:

ruby mcp.rb &
ruby webserver.rb

You don’t have to start the MCP before you start the webserver. The line of code:

MCP = DRbObject.new_with_uri(DRBSERVER)

tells the web server how to connect to the DRb server when it needs to; it does not attempt to connect immediately. You can even restart the DRb server if necessary, and your web server will reconnect at the right time.
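
That lazy connection can be seen in isolation. The following is a minimal sketch, not part of the application: the `Echo` class, the `free_port` helper, and the `lazy_connect_demo` method are made up for the demo. Both ends run in a single process here, so DRb may service the final call locally instead of over a socket; with two separate processes the same calls would travel over the wire.

```ruby
require 'drb'
require 'socket'

# Hypothetical stand-in for the MCP: any object can be a DRb front object.
class Echo
  def ping
    :pong
  end
end

# Ask the OS for an unused port so the demo does not collide with anything.
def free_port
  server = TCPServer.new('127.0.0.1', 0)
  server.addr[1]
ensure
  server.close if server
end

def lazy_connect_demo
  uri    = "druby://127.0.0.1:#{free_port}"
  remote = DRb::DRbObject.new_with_uri(uri)  # no connection is attempted here

  # Calling a method is what triggers a connection; nothing is listening yet.
  before = begin
    remote.ping
  rescue DRb::DRbConnError
    :server_not_running
  end

  DRb.start_service(uri, Echo.new)  # "start the DRb server" after the fact
  after = remote.ping               # the same proxy object now works
  [before, after]
ensure
  DRb.stop_service
end
```

Calling `lazy_connect_demo` should return `[:server_not_running, :pong]`: creating the proxy never fails, the first method call fails because nothing is listening, and the very same proxy succeeds once a server exists at that URI.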

I used the short name MCP in the web server not only as a geeky reference to Tron, but also to illustrate the fact that although this object acts just like the MasterControlProgram module running on the other end of the DRb connection, it is not the same object. I did not include the mcp.rb file in webserver.rb; the DRb connection sends messages over the wire and the actual MasterControlProgram on the other end handles them and passes the response back over the wire.
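
To make the distinction concrete, here is a small sketch assuming a made-up `Counter` front object and an OS-assigned port (neither appears in the application). The object on the calling side is a `DRb::DRbObject` proxy that forwards messages; the state lives only in the object held by the server.

```ruby
require 'drb'

# Hypothetical front object; in the article this role is played by
# the MasterControlProgram module.
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

def proxy_demo
  real = Counter.new
  # Port 0 asks the OS for any free port; the server reports its actual URI.
  server = DRb::DRbServer.new('druby://127.0.0.1:0', real)
  proxy  = DRb::DRbObject.new_with_uri(server.uri)

  # Messages sent to the proxy are carried out by the real object.
  proxy.increment
  proxy.increment

  { proxy_is_drb_object: DRb::DRbObject === proxy,
    proxy_is_counter:    Counter === proxy,
    real_count:          real.count }
ensure
  server.stop_service if server
end
```

In this sketch the proxy is never a `Counter`, yet the two `increment` messages end up changing the count on the real object. That is the same relationship `MCP` has to the module defined in mcp.rb.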

I used a Thread to do the bulk of the work in the Process so that the DRb server remains responsive. I used a Mutex to ensure that when the MasterControlProgram is reading the status information it is not being changed at the same time by the process. I did not wrap reads of the results in a Mutex because I assumed that the thread will have completed its work, and written the results, by the time anyone gets around to asking for them.
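
If that assumption ever proved too optimistic, one option would be to publish the results under the same lock that guards the status. The sketch below is a hypothetical variation, not the code used above: the `Job` class and its `snapshot` method are invented for illustration, and the "work" is a single random number.

```ruby
# Hypothetical variation on the Process class: results are written and read
# under the same Mutex as the status, so a reader can never observe
# :complete without the results also being present.
class Job
  def initialize
    @lock    = Mutex.new
    @status  = :starting
    @results = nil
  end

  # Returns a consistent copy of both values, taken under the lock.
  def snapshot
    @lock.synchronize { { status: @status, results: @results } }
  end

  # Starts the work on a background thread and returns that thread.
  def go
    Thread.new do
      score = rand * 100  # stand-in for the real computation
      @lock.synchronize do
        @results = { score: score }
        @status  = :complete
      end
    end
  end
end

job = Job.new
job.go.join  # waiting on the worker is for the demo only
final = job.snapshot
```

The design choice is simply that both fields change inside one `synchronize` block, so any `snapshot` sees either the state before completion or the state after it, never a mix.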

Harold
06:32PM ET
2012-Feb-03

Wow, the syntax highlighting on this page looks awesome.

Gavin Kistner
02:58PM ET
2012-Feb-05

@Harold Why thank you! It’s inspired by (colors copied directly via screenshot) the Cobalt theme for TextMate by Jacob Rus.
