Created November 30, 2010 15:58
Some thoughts about Celery and logging
======================================
LoggerAdapter
-------------
It appears that LoggerAdapter is used to get the processName into the
LogRecord (at least, that's what _CompatLoggerAdapter does), but if
LoggerAdapter is defined in logging, _CompatLoggerAdapter won't be used.
processName
-----------
There are three places in the code where the processName is written to a
LogRecord, some of which can lead to unexpected behaviour in some scenarios:

1. ColorFormatter. What happens if you don't use a ColorFormatter?
2. _CompatLoggerAdapter. This works, but only because of the coincidence that
   LoggerAdapter and multiprocessing both appeared in Python 2.6. So in
   pre-2.6 versions, _CompatLoggerAdapter does the job; in 2.6 and later,
   logging itself will update the LogRecord with processName.
3. ProcessAwareLogger. This can fail if a user does a setLoggerClass with
   their own Logger subclass.
A possible solution is to replace the above mechanisms with a filter, e.g.

    class ProcessNameFilter(logging.Filter):
        def __init__(self):
            import multiprocessing
            # use the public .name property, not the private _name attribute
            self.processName = multiprocessing.current_process().name

        def filter(self, record):
            record.processName = self.processName
            return True
You only ever need one instance of these per process:

    pnf = ProcessNameFilter()

and, where a logger is returned to a user, you could do for example:

    if pnf not in returned_logger.filters:
        returned_logger.addFilter(pnf)
    return returned_logger

This should avoid the need to use a LoggerAdapter altogether. The processName
will only be added if the event is actually logged, but that's fine.
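Put together, a runnable sketch of the filter approach (the logger name and
the capturing handler are illustrative, not Celery's):

```python
import logging
import multiprocessing

class ProcessNameFilter(logging.Filter):
    """Stamp each record with the owning process's name."""
    def __init__(self):
        # .name is the public accessor for the process name
        self.processName = multiprocessing.current_process().name

    def filter(self, record):
        record.processName = self.processName
        return True

# Only one instance is ever needed per process.
pnf = ProcessNameFilter()

logger = logging.getLogger("celery.demo")  # illustrative name
if pnf not in logger.filters:
    logger.addFilter(pnf)

# A capturing handler, purely to show the filter at work.
captured = []
class CapturingHandler(logging.Handler):
    def emit(self, record):
        captured.append(record)

logger.addHandler(CapturingHandler())
logger.warning("task started")
```

Because the filter sits on the logger, the attribute is set only when an
event actually passes the level check and is about to be handled.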
Problems with stderr writes and infinite loops
----------------------------------------------
Logging only ever prints exceptions to sys.stderr where the value of
logging.raiseExceptions is true (which it is by default, in order to catch
errors during development), or if no handlers are found for a logging call.
The latter can be avoided by adding a NullHandler to the top-level Celery
logger and also to task loggers where these are not below the top-level
Celery logger. (Note that in general, the recommendation is to use __name__
to create module-level loggers and, where necessary, to create loggers below
the module-level logger for more granularity, or occasionally a non-module-
named logger for some cross-cutting concern across modules, such as
"sql_executed" in a database application.)
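As a sketch of the NullHandler part (the "celery" logger name is an
assumption on my part; NullHandler is in the stdlib only from Python
2.7/3.1, so older Pythons need the usual stub):

```python
import logging

try:
    from logging import NullHandler
except ImportError:
    # Stand-in for Pythons that predate logging.NullHandler
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

# With this in place, events logged to "celery" or any logger below it
# always find at least one handler, so the "no handlers could be found"
# message can never be triggered for them.
logging.getLogger("celery").addHandler(NullHandler())
```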
You could recommend that users set logging.raiseExceptions to False, but I
accept that (a) in development you could still get into trouble, and (b)
users might not do this in production either, and of course you don't want
infinite loops!
One way round it is to have a LoggingProxy as you do now, but change the way
it works. I think the best way of logging with multiprocessing is to use a
multiprocessing queue, as described here:

http://plumberjack.blogspot.com/2010/09/using-logging-with-multiprocessing.html
http://plumberjack.blogspot.com/2010/09/improved-queuehandler-queuelistener.html
So, the main Celery process creates a queue and initialises a QueueListener
with it, and the queue is also passed to all worker processes. These just have
a QueueHandler configured (no other handlers at all) which writes any events
logged by tasks to that queue. The main process can take any logging
configuration (e.g. via a config file) and log stuff to the appropriately
configured destinations, via a QueueListener which is initialised with a
handler function something like this:
    def handle_log_event(record):
        logger = logging.getLogger(record.name)
        logger.handle(record)

which will automatically make use of the active logging configuration in the
main process.
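A sketch of the whole arrangement, using the QueueHandler/QueueListener
classes from those posts (they later landed in logging.handlers in Python
3.2, which is what this sketch uses); the worker setup and logger names are
illustrative, and the stdlib QueueListener is subclassed here because it
dispatches to a fixed set of handlers rather than to a handler function:

```python
import logging
import logging.handlers
import multiprocessing

def handle_log_event(record):
    # Re-handle the record under the main process's configuration.
    logging.getLogger(record.name).handle(record)

class DispatchingListener(logging.handlers.QueueListener):
    """Dispatch each dequeued record by record.name instead of
    feeding it to a fixed set of handlers."""
    def handle(self, record):
        handle_log_event(record)

def worker(q):
    # In each worker, the root logger gets a QueueHandler and nothing
    # else; all events flow back to the main process via the queue.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(q)]
    root.setLevel(logging.INFO)
    logging.getLogger("tasks.demo").info("work done")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    listener = DispatchingListener(q)
    listener.start()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    p.join()
    listener.stop()
```

The main process is then free to configure "tasks.*" loggers however it
likes; records logged in workers come out through that configuration.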
Summary:

Worker processes and code running in tasks all log to whatever logger
they want, but only one handler is provided on the root logger: a QueueHandler
which writes all events to the queue. You can attach a ProcessNameFilter to
this handler for Pythons earlier than 2.6, which automatically adds the
processName before enqueuing the record. Workers can take account of the
logging levels which are configured, to avoid generating logging events
needlessly. I think there is no need to provide things like filenames,
propagation, colorizing settings etc. to task loggers.
In the main process, a QueueListener dequeues these records and sends them
to the handlers of the appropriately named loggers. This allows the main
Celery process to be set up with a completely custom logging configuration,
so that user logs end up wherever they want them to go, but contain correct
info about the process in which they ran. You can attach a colorising
formatter to a console handler configured here, of course.
The above was generated following a relatively brief look through the code,
so I may have missed something; please let me know if so.