@njakobsen
Last active November 5, 2021 02:28
Live stream a database dump (or any other STDOUT) using Rails 4. Why would you want this? If you have a large database dump and want to avoid storing it in memory as Rails streams it. This lets you pipe the dump directly into the HTTP response instead of storing it as a file, sending it, and then deleting it. Let me know what you think! I've teste…
class DatabaseController < ApplicationController
  def database_dump
    database = Rails.configuration.database_configuration[Rails.env]["database"]
    send_file_headers!(:type => 'application/octet-stream', :filename => "#{database}_#{Time.now.to_s(:human)}.backup")
    pipe = IO.popen("pg_dump '#{database}' -F c")
    stream = response.stream
    while (line = pipe.gets)
      stream.write line
      sleep 0.0001 # HACK: Prevent server instance from sleeping forever if client disconnects during download
    end
  rescue IOError
    # Client disconnected
  ensure
    pipe.close if pipe # pipe is still nil if an error was raised before popen ran
    response.stream.close
  end

  # Code that allows us to only mix in the live methods if we're accessing the desired action
  def dispatch(name, *args)
    extend ActionController::Live if name.to_s == 'database_dump'
    super
  end
end
@njakobsen
Author

Not sure why the sleep hack is necessary at this point, but it was the only way I could consistently prevent the server instance from sleeping forever if the user hung up prematurely. My initial guess is that live streaming a database is a lot of work and so the server probably needs to sleep a little after each write, otherwise it gets so tired that it never wakes up.

@njakobsen
Author

I've overridden the dispatch method so we can specify which actions are live, mixing the ActionController::Live module into the controller instance only when one of those actions is called. You would want to do this if you have other actions that don't work well with live streaming, e.g. a typical Ajax response action or any page that throws an exception (which currently redirects to a 500 instead of showing debug info).

This could definitely be cleaned up, but works well in this demo.
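
The per-action mix-in described above can be illustrated outside Rails. This is only a minimal sketch in plain Ruby: `FakeLive`, `FakeBase`, `FakeController`, and `LIVE_ACTIONS` are hypothetical stand-ins that mimic the shape of `ActionController::Live` and a controller's `dispatch`; none of them are Rails APIs.

```ruby
# Stand-in for ActionController::Live (hypothetical; not the Rails module).
module FakeLive
  def live?
    true
  end
end

# Stand-in for a controller base class with a dispatch entry point.
class FakeBase
  def dispatch(name, *args)
    public_send(name, *args)
  end

  def live?
    false
  end
end

class FakeController < FakeBase
  # Actions that should get the live behaviour (assumed list).
  LIVE_ACTIONS = %w[database_dump].freeze

  def database_dump
    "dump (live: #{live?})"
  end

  def index
    "index (live: #{live?})"
  end

  # Extend only this instance, and only when a live action is requested.
  def dispatch(name, *args)
    extend FakeLive if LIVE_ACTIONS.include?(name.to_s)
    super
  end
end
```

Because `extend` affects only the receiving instance, other instances of the same class keep their normal behaviour. Keeping the action names in one constant is one possible way to tidy up the string comparison.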

@tenderlove

@njakobsen do you need a particular size database dump to get this? We use a heartbeat and write to the client every N seconds, which will cause an IOError if the client disconnects. It seems your code would do the same. I'm not sure why sleep is required.
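
The heartbeat idea mentioned above can be sketched roughly as follows. This is not the code @tenderlove refers to, just an illustration under assumptions: the `heartbeat` method, its keyword arguments, and the newline payload are all hypothetical, and a `StringIO` stands in for `response.stream`.

```ruby
require 'stringio'

# Writes a small payload every `interval` seconds, up to `beats` times.
# Returns :finished, or :disconnected if a write raises IOError
# (which is how a closed stream reports itself).
def heartbeat(stream, interval: 1, beats: 3, payload: "\n")
  beats.times do
    stream.write(payload)
    sleep interval
  end
  :finished
rescue IOError
  :disconnected
end

# Usage with stand-in streams (hypothetical; a real action would pass response.stream):
open_stream = StringIO.new
heartbeat(open_stream, interval: 0.01)

closed_stream = StringIO.new
closed_stream.close
heartbeat(closed_stream, interval: 0.01)
```

The point is that the disconnect is only noticed when a write is attempted, so periodic writes bound how long a dead connection can go undetected.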

@njakobsen
Author

@tenderlove Sorry, I didn't see your post until now. During my testing I used a database that dumped about 300 MB of data across the wire. I don't pretend to know why it works; it just seemed to be the secret ingredient that made it all blend.

What happens when there's an IOError, does the server return to a ready state? It's been a while since I touched this code, but I recall it would stop responding to requests if the client cancelled the download. @ericboehs just had some success using this Gist, maybe he can weigh in with his experiences too. See gottfrois/dashing-rails#12 (comment)

I'd be more than willing to try some more experiments if you had something in mind.

@tenderlove

> What happens when there's an IOError, does the server return to a ready state?

It's supposed to, yes.

> I'd be more than willing to try some more experiments if you had something in mind.

Are you able to reliably reproduce the error if you remove the sleep? If so, I'd love to try debugging it. I suspect it has to do with thread switching, but I'm not 100% sure. Calling sleep would definitely give a chance to schedule a different thread. If you can reproduce the error without a sleep, could you try replacing sleep with Thread.pass and see if that fixes it too?
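
For reference, swapping `sleep` for `Thread.pass` in a copy loop might look something like this. It is a sketch under assumptions: `copy_with_pass` is a hypothetical helper, and `StringIO` objects stand in for the `pg_dump` pipe and `response.stream`.

```ruby
require 'stringio'

# Copies `source` to `sink` line by line. After each write it calls
# Thread.pass, which hints to the scheduler that another thread may run,
# instead of sleeping for a fixed interval.
def copy_with_pass(source, sink)
  while (line = source.gets)
    sink.write(line)
    Thread.pass
  end
ensure
  source.close unless source.closed?
end

# Usage with stand-in IO objects (hypothetical):
out = StringIO.new
copy_with_pass(StringIO.new("first\nsecond\n"), out)
```

`Thread.pass` is only a hint, so whether it has the same effect as the `sleep` hack is exactly what the experiment would need to show.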

One thing to note is that the output buffer is a sized queue, so it will block writes if it's full (this is to prevent the server from consuming all memory if the client is slow). I don't think it would have any impact, but I do think it's worth mentioning.
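
The blocking behaviour of a sized queue can be seen with Ruby's standard `SizedQueue`. This sketch does not reproduce the Rails buffer itself; `stream_through_bounded_queue`, the `:done` sentinel, and the simulated slow consumer are assumptions made for illustration.

```ruby
require 'thread'

# Pushes `items` through a bounded queue to a slow consumer thread.
# Returns [received_items, max_depth_observed_by_the_producer].
def stream_through_bounded_queue(items, capacity)
  queue = SizedQueue.new(capacity)
  max_depth = 0

  consumer = Thread.new do
    received = []
    while (chunk = queue.pop) != :done
      received << chunk
      sleep 0.001 # simulate a slow client
    end
    received
  end

  items.each do |item|
    queue.push(item) # blocks while the queue already holds `capacity` items
    max_depth = [max_depth, queue.size].max
  end
  queue.push(:done)

  [consumer.value, max_depth]
end

received, max_depth = stream_through_bounded_queue((1..20).to_a, 3)
```

However slow the consumer is, the producer never gets more than `capacity` items ahead, which is what keeps memory use bounded.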

@njakobsen
Author

I booted it up again on Ruby 2.1 and Rails 4.0.2, using Puma 2.8.2 and Passenger 4.0.42.
I trigger the streaming download, and then after a few seconds, cancel it in the browser download window.

Puma

No sleep 0.0001

Downloads are streaming but cancelling one causes the server to stop responding.

Using sleep 0.0001

Downloads are streaming but queued, one will not start before the previous one is cancelled.
Random number of downloads (usually between 2 and 8) can be cancelled before the server stops responding.

Pressing Ctrl+C on the server consistently causes the next response in line to return the following error:

Puma caught this error: Attempt to unlock a mutex which is not locked (ThreadError)
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/rack-1.5.2/lib/rack/lock.rb:22:in `unlock'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/rack-1.5.2/lib/rack/lock.rb:22:in `ensure in call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/rack-1.5.2/lib/rack/lock.rb:23:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/actionpack-4.0.2/lib/action_dispatch/middleware/static.rb:64:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/rack-1.5.2/lib/rack/sendfile.rb:112:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/airbrake-3.1.15/lib/airbrake/user_informer.rb:16:in `_call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/airbrake-3.1.15/lib/airbrake/user_informer.rb:12:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/railties-4.0.2/lib/rails/engine.rb:511:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/railties-4.0.2/lib/rails/application.rb:97:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/railties-4.0.2/lib/rails/railtie/configurable.rb:30:in `method_missing'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/rack_patch.rb:13:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/configuration.rb:71:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/server.rb:490:in `handle_request'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/server.rb:361:in `process_client'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/server.rb:254:in `block in run'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/thread_pool.rb:92:in `call'
/Users/nicholas/.rvm/gems/ruby-2.1.2/gems/puma-2.8.2/lib/puma/thread_pool.rb:92:in `block in spawn_thread'

Thread.pass

Downloads are streaming but queued, one will not start before the previous one is cancelled.
Doesn't seem to lock up the server at all.
All requests reach response.stream.close.

Puma with workers (puma -w)

No effect on above behaviour

Passenger

No sleep 0.0001

Downloads are streaming and download in parallel.
Random number of downloads (usually between 2 and 8) can be cancelled before the server stops responding.
Cancelled requests don't seem to reach response.stream.close.

Using sleep 0.0001

Downloads are streaming and download in parallel.
Doesn't seem to lock up the server at all.

Thread.pass

Downloads are streaming and download in parallel.
Doesn't seem to lock up the server at all.
All requests reach response.stream.close.

@matthewd

matthewd commented Jun 1, 2014

@njakobsen btw, you probably don't want to use gets here: I imagine read(n) will be much more efficient. Most of the "lines" you see are going to be pretty tiny.
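
A chunked copy along the lines suggested above could look like this. It is a sketch only: `copy_in_chunks` and the 16 KB `CHUNK_SIZE` are assumptions, and `StringIO` objects stand in for the `pg_dump` pipe and `response.stream`.

```ruby
require 'stringio'

CHUNK_SIZE = 16 * 1024 # assumed value; tune for your workload

# Copies `source` to `sink` in fixed-size chunks rather than line by line.
# IO#read(n) returns nil at end of input, which ends the loop.
# Returns the number of writes performed.
def copy_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  writes = 0
  while (chunk = source.read(chunk_size))
    sink.write(chunk)
    writes += 1
  end
  writes
end

# Usage with stand-in IO objects (hypothetical):
data = "row\n" * 1000
out = StringIO.new
copy_in_chunks(StringIO.new(data), out, 512)
```

With input made of many short lines, this performs far fewer writes than a `gets` loop would, since the number of writes depends on the chunk size rather than on where newlines happen to fall.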
