@vsajip
Created November 30, 2010 15:58
Some thoughts about Celery and logging
======================================
LoggerAdapter
-------------
It appears that LoggerAdapter is used to get the processName into the
LogRecord (at least, that's what _CompatLoggerAdapter does), but if
LoggerAdapter is defined in the stdlib logging module (as it is from Python
2.6 onwards), _CompatLoggerAdapter won't be used.
processName
-----------
There are three places in the code where the processName is written to a
LogRecord, some of which can lead to unexpected behaviour in some scenarios:
1. ColorFormatter. What happens if you don't use a ColorFormatter?
2. _CompatLoggerAdapter. This works, but only because of the coincidence
   that LoggerAdapter and multiprocessing both appeared in Python 2.6. So in
   pre-2.6 versions, _CompatLoggerAdapter does the job; in 2.6 and later,
   logging itself will update the LogRecord with processName.
3. ProcessAwareLogger. This can fail if a user calls setLoggerClass() with
   their own Logger subclass.
A possible solution is to replace the above mechanisms with a filter, e.g.
    import logging
    import multiprocessing

    class ProcessNameFilter(logging.Filter):
        def __init__(self):
            logging.Filter.__init__(self)
            # .name is the public attribute; _name is an implementation detail
            self.processName = multiprocessing.current_process().name

        def filter(self, record):
            record.processName = self.processName
            return True
You only ever need one instance of this filter per process:

    pnf = ProcessNameFilter()

and, where a logger is returned to a user, you could do, for example:

    if pnf not in returned_logger.filters:
        returned_logger.addFilter(pnf)
    return returned_logger
This should avoid the need to use a LoggerAdapter altogether. The processName
will only be added if the event is actually logged, but that's fine.
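
To illustrate end to end (the logger name and format here are made up for
the example):

    import logging

    logging.basicConfig(format='%(processName)s %(name)s: %(message)s')
    logger = logging.getLogger('celery.task.demo')  # hypothetical task logger
    if pnf not in logger.filters:
        logger.addFilter(pnf)
    logger.warning('hello')
    # prints e.g. "MainProcess celery.task.demo: hello"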
Problems with stderr writes and infinite loops
----------------------------------------------
Logging only ever prints exceptions to sys.stderr where the value of
logging.raiseExceptions is true (which it is by default, in order to catch
errors during development), or if no handlers are found for a logging call.
The latter can be avoided by adding a NullHandler to the top-level Celery
logger and also to task loggers where these are not below the top-level
Celery logger. (Note that, in general, the recommendation is to use __name__
to create module-level loggers and, where necessary, to create loggers below
the module-level logger for more granularity, or occasionally a non-module-
named logger for some cross-cutting concern that spans modules, such as
"sql_executed" in a database application.)
You could recommend that users set logging.raiseExceptions to False, but I
accept that (a) in development you could still get into trouble, and (b)
users might not do this in production either, and of course you don't want
infinite loops!
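
(For reference, that setting is just a module-level flag:

    import logging
    logging.raiseExceptions = False  # don't print handler errors to stderr

but, as above, I wouldn't rely on users setting it.)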
One way round it is to have a LoggingProxy as you do now, but change the way
it works. I think the best way of logging with multiprocessing is to use a
multiprocessing queue, as described here:
http://plumberjack.blogspot.com/2010/09/using-logging-with-multiprocessing.html
http://plumberjack.blogspot.com/2010/09/improved-queuehandler-queuelistener.html
So, the main Celery process creates a queue and initialises a QueueListener
with it, and the queue is also passed to all worker processes. These just have
a QueueHandler configured (no other handlers at all) which writes any events
logged by tasks to that queue. The main process can take any logging
configuration (e.g. via a config file) and log stuff to the appropriately
configured destinations, via a QueueListener which is initialised with a
handler function something like this:
    def handle_log_event(record):
        logger = logging.getLogger(record.name)
        logger.handle(record)
which will automatically make use of the active logging configuration in the
main process.
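
To make that concrete, here is a minimal sketch of the whole arrangement.
It uses the QueueHandler/QueueListener that eventually landed in
logging.handlers (Python 3.2+); with the implementations from the posts
above, the shape is the same. The stdlib QueueListener dispatches to
Handler instances rather than taking a bare function, so handle_log_event
is wrapped in a trivial Handler subclass. Logger names are illustrative:

    import logging
    import logging.handlers  # QueueHandler/QueueListener (Python 3.2+)
    import multiprocessing

    def handle_log_event(record):
        # As above: route the record via the main process's configuration.
        logger = logging.getLogger(record.name)
        logger.handle(record)

    class DispatchHandler(logging.Handler):
        # Adapts handle_log_event to the stdlib QueueListener, which
        # expects Handler instances rather than a callable.
        def emit(self, record):
            handle_log_event(record)

    def worker(queue):
        # Worker process: a QueueHandler is the only handler; every event
        # logged by task code just goes onto the queue.
        root = logging.getLogger()
        root.handlers[:] = [logging.handlers.QueueHandler(queue)]
        root.setLevel(logging.INFO)
        logging.getLogger('celery.task.demo').info('hello from a worker')

    if __name__ == '__main__':
        # Main process: arbitrary user configuration; basicConfig here
        # stands in for whatever the user sets up.
        logging.basicConfig(
            format='%(processName)s %(name)s %(levelname)s: %(message)s')
        queue = multiprocessing.Queue()
        listener = logging.handlers.QueueListener(queue, DispatchHandler())
        listener.start()
        p = multiprocessing.Process(target=worker, args=(queue,))
        p.start()
        p.join()
        listener.stop()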
Summary:
Worker processes and code running in tasks all log to whatever logger
they want, but only one handler is provided, on the root logger: a
QueueHandler which writes all events to the queue. For pre-2.6 Pythons you
can attach a ProcessNameFilter to this handler, which adds the processName
before the record is enqueued. Workers can take account of logging levels
which are configured, to avoid generating logging events needlessly. I think
there is no need to provide things like filenames, propagation, colorizing
settings etc. to task loggers.
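
Attaching the filter, extending the sketches above, is a one-liner in each
worker (on 2.6 and later it's redundant, since logging sets processName
itself, but it's harmless):

    queue_handler = logging.handlers.QueueHandler(queue)
    queue_handler.addFilter(ProcessNameFilter())  # redundant on 2.6+
    logging.getLogger().handlers[:] = [queue_handler]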
In the main process, a QueueListener dequeues these records and sends them
to the handlers of the appropriately named loggers. This allows the main Celery
process to be set up with a completely custom logging configuration, so that
user logs end up wherever they want them to go, but contain correct info about
the process in which they ran. You can attach a colorising formatter to a
console logger configured here, of course.
The above was written after a relatively brief look through the code, so I
may have missed something; please let me know if so.