@monolar
Last active August 29, 2015 14:07
Celluloid-IO TCPSocket memleak/filehandle leak issue
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler/setup'
require 'celluloid/io'
require 'chromatic'
require 'objspace'

class Client
  include Celluloid::IO

  def initialize
    @socket = nil
    @connecting_timer = nil
    reconnect
    run
  end

  def run
    loop do
      sleep 1
    end
  end

  def disconnect
    return unless @socket
    @socket.close
    @socket = nil
  end

  def reconnect
    puts "connecting ...".green
    if @connecting_timer
      @connecting_timer.cancel
      @connecting_timer = nil
    end
    disconnect
    @socket = TCPSocket.new('localhost', 55555)
  rescue Exception => e
    puts "error while connecting: #{e.inspect}:\n #{e.backtrace.join("\n ")}".red
    disconnect
    dump_objects
    @connecting_timer = after(1) {
      reconnect
    }
  end

  def dump_objects
    GC.start(full_mark: true, immediate_sweep: true)
    puts ObjectSpace.count_objects[:T_OBJECT]
    c = ObjectSpace.each_object(Socket) do |s|
      s.close unless s.closed?
    end
    puts "#{c} sockets"
  end
end

Client.new
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler/setup'
require 'celluloid'
require 'celluloid/io'

class Manager
  include Celluloid
  trap_exit :actor_died

  def initialize
    connect
    run
  end

  def actor_died(actor, reason)
    c = ObjectSpace.each_object(Socket) { |s| }
    puts "#{c} sockets"
    after(1) {
      connect
    }
  end

  def connect
    Connection.new_link.async.connect
  end

  def run
    loop do
      sleep 1
    end
  end
end

class Connection
  include Celluloid::IO

  def connect
    @socket = Celluloid::IO::TCPSocket.new('localhost', 5555)
    # would actually do sensible stuff with the socket now...
  end
end

Manager.new
@mikeatlas

From https://github.com/celluloid/celluloid/wiki/Actor-lifecycle

"Everytime you do MyActor.new within Celluloid, in the background you are spawning a new native thread. However, when a regular sequential object goes out of scope, the garbage collector will automatically clean it up for you. This is not the case with actors: if they go out of scope, they will continue running and never be garbage collected.
This means when you're done with an actor, you need to terminate it explicitly:

actor = MyActor.new
...
actor.terminate

The terminate method sends a system message to an actor requesting that it gracefully terminate."

Personally, I have a lot of "ugly" cleanup code to ensure my actors are releasing active sockets in begin/rescues:

socket.close if !socket.nil? && !socket.closed? && !socket.eof?
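
A minimal sketch of the same cleanup as a reusable helper. It is written against plain Ruby stdlib sockets rather than Celluloid::IO (an assumption), and `safe_close` is a hypothetical name. It leaves out the `eof?` check because, on a plain Ruby socket, `IO#eof?` blocks until the peer sends data or closes, which is usually not wanted in cleanup code:

```ruby
require 'socket'

# Hypothetical helper: close a socket if it is present and still open.
# Returns true if this call closed it, false otherwise.
def safe_close(socket)
  return false if socket.nil? || socket.closed?
  socket.close
  true
rescue IOError, SystemCallError
  # The descriptor was already invalid or closed concurrently;
  # treat the socket as cleaned up.
  false
end
```

Calling it twice on the same socket is harmless: the second call sees `closed?` and returns false.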

@Asmod4n

Asmod4n commented Oct 6, 2014

You shouldn't be rescuing "Exception" here but IOError, I guess: https://gist.github.com/monolar/8835598d59ef9d1a2d41#file-demo-rb-L39

@monolar
Author

monolar commented Oct 6, 2014

@Asmod4n I guess I should technically rescue the exceptions listed here: http://www.ruby-doc.org/stdlib-2.0/libdoc/socket/rdoc/Socket.html#method-i-connect - you are right that simply rescuing "Exception" is too broad, but it served its purpose for this demo.
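
As a rough illustration of a narrower rescue: the `Errno::ECONNREFUSED` seen in the demo output is a `SystemCallError`, not an `IOError`, and name-resolution failures are raised as `SocketError`. The sketch below uses the stdlib `TCPSocket` rather than the Celluloid::IO one (an assumption), and `try_connect` is a hypothetical helper name:

```ruby
require 'socket'

# Hypothetical helper: try to connect and return the socket, or nil when the
# connection attempt fails for a network-level reason.
def try_connect(host, port)
  TCPSocket.new(host, port)
rescue SystemCallError, SocketError => e
  # SystemCallError covers the Errno::* family (ECONNREFUSED, ETIMEDOUT, ...);
  # SocketError covers name-resolution failures.
  warn "error while connecting: #{e.class}: #{e.message}"
  nil
end
```

Anything outside these two families (for example `NoMemoryError` or `Interrupt`) still propagates, which is the point of not rescuing `Exception`.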

As for the actor lifecycle (@mikeatlas): this probably does explain why the Socket object is not GC'd (tests in Pry seem to confirm this). My main issue is then that the Celluloid-IO implementation of TCPSocket does not clean up the file handle by calling @socket.close, and since the actual socket is not GC'd (the actor keeps running indefinitely), the file handle lingers.

I guess there are a few ways to approach this:

  • Have the TCPSocket implementation in Celluloid-IO at least close the socket if something goes wrong in the constructor. This would still leave some Socket objects lying around that only get GC'd when the actor dies, which may take a long time for a daemon.
  • Handle reconnection differently. E.g. have one actor deal only with the connection and keep the business logic (including reconnects) in another actor (linking probably helps here as well). This way the actual Socket objects are tied to the lifecycle of one actor (thread), whereas the business entity lives in its own actor (thread). I am not sure I explained this approach properly; I will see if I can whip together an example.
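
The first option could look roughly like the sketch below. This is not the Celluloid-IO code. It is a plain-stdlib illustration of closing the descriptor as soon as connect fails, and `connect_or_close` is a hypothetical name:

```ruby
require 'socket'

# Hypothetical helper: open a TCP connection, closing the file descriptor
# immediately if connect fails instead of waiting for the GC.
def connect_or_close(host, port)
  addr = Addrinfo.tcp(host, port)
  socket = Socket.new(addr.afamily, Socket::SOCK_STREAM, 0)
  begin
    socket.connect(addr)
  rescue SystemCallError
    socket.close # release the descriptor before re-raising
    raise
  end
  socket
end
```

The caller still sees the original exception, so reconnect logic like that in the demo keeps working; the only difference is that no open file handle is left behind by the failed attempt.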

@monolar
Author

monolar commented Oct 6, 2014

I added another demo (demo2.rb) which seems to solve this issue. Basically the connection is handled in one actor which is linked to another actor, which takes care of reconnecting.

What is not shown here is the listening and communication between those two actors that would be needed to do something useful with the socket, e.g. a protocol.

This new example produces an output like

E, [2014-10-06T18:57:04.536654 #26431] ERROR -- : Actor crashed!
Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:5555
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `connect_nonblock'
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `initialize'
    bin/demo2.rb:42:in `new'
    bin/demo2.rb:42:in `connect'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:122:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
1 sockets
[...]
E, [2014-10-06T18:57:12.570421 #26431] ERROR -- : Actor crashed!
Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:5555
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `connect_nonblock'
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `initialize'
    bin/demo2.rb:42:in `new'
    bin/demo2.rb:42:in `connect'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:122:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
9 sockets
E, [2014-10-06T18:57:13.576238 #26431] ERROR -- : Actor crashed!
Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:5555
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `connect_nonblock'
    /Users/andreas/git/celluloid-io/lib/celluloid/io/tcp_socket.rb:84:in `initialize'
    bin/demo2.rb:42:in `new'
    bin/demo2.rb:42:in `connect'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:122:in `dispatch'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
    /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
3 sockets

This shows that the GC now actually kicks in: the socket count drops from 9 to 3 between two consecutive crashes. Watching the process's file handles shows that they are closed properly on their own, although it can take a while.

@monolar
Author

monolar commented Oct 6, 2014

Dammit - I accidentally deleted the first comment in a coffee-induced coma.

Here it is again:


Output is something like

connecting ...
error while connecting: #<Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:55555>:
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-io-0.16.0/lib/celluloid/io/tcp_socket.rb:85:in `connect_nonblock'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-io-0.16.0/lib/celluloid/io/tcp_socket.rb:85:in `initialize'
  bin/demo.rb:44:in `new'
  bin/demo.rb:44:in `reconnect'
  bin/demo.rb:21:in `initialize'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/calls.rb:63:in `dispatch'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
411
1 sockets
connecting ...
error while connecting: #<Errno::ECONNREFUSED: Connection refused - connect(2) for 127.0.0.1:55555>:
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-io-0.16.0/lib/celluloid/io/tcp_socket.rb:85:in `connect_nonblock'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-io-0.16.0/lib/celluloid/io/tcp_socket.rb:85:in `initialize'
  bin/demo.rb:44:in `new'
  bin/demo.rb:44:in `reconnect'
  bin/demo.rb:50:in `block in reconnect'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
  /Users/andreas/.rvm/gems/ruby-2.1.3@tcp_socket_demo/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
421
2 sockets

and so on. The left-over file handles are closed by an ugly hack (s.close unless s.closed?) that walks all Socket objects via ObjectSpace.
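
For reference, a standalone sketch of that hack. In the demo, `c` is the return value of `ObjectSpace.each_object`, i.e. the number of Socket objects found, whether open or closed. The hypothetical `close_leaked_sockets` below instead counts only the sockets it actually closes, and it assumes the leaked objects are plain `Socket` instances, as the demo's `each_object(Socket)` suggests:

```ruby
require 'socket'

# Hypothetical helper: close every Socket instance that is still open and
# return how many were closed. A diagnostic hack, not a fix: it also closes
# sockets that are legitimately in use.
def close_leaked_sockets
  closed = 0
  ObjectSpace.each_object(Socket) do |socket|
    next if socket.closed?
    socket.close
    closed += 1
  end
  closed
end
```

A steadily growing return value across reconnect attempts would point to descriptors being left open by failed connects.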

@mikeatlas

I have something like:

finalizer :terminate_myactor

def terminate_myactor
  begin
    @socket.close if !@socket.nil? && !@socket.closed? && !@socket.eof?
  rescue Celluloid::Task::TerminatedError
    begin
      @socket.close if !@socket.closed? && !@socket.eof?
    rescue Exception
      # at this point we can assume things are cleaned up enough
    end
  end
end

The operating system will eventually release the socket file handles in the process, after they've been in CLOSE_WAIT for a while.
