bmaupin/graylog-vs-splunk-free.md

## graylog-vs-splunk-free.md

      
### Overall impression

Splunk, the de facto standard, is a much better product overall and requires less tweaking to get things working. If you
can work with the limitations of Splunk Free (500 MB daily indexing limit, no authentication/RBAC, no alerts, no
clustering, etc.), use it.

Otherwise, Graylog is a decent product and works well, but it will require some tweaking.

### Role-based access control (RBAC)

Splunk Free has no authentication or RBAC. You can, however, put a reverse proxy server (Nginx, Apache, etc.) in front
of it to provide authentication if you know what you're doing. You will still not have any RBAC, and all users who log in
will effectively be logged in as the same administrative user.

Graylog has RBAC, but it has its limits. For example, there is no separation of dashboards by user
(I imagine this applies to many other features, but I'm not sure which). So if you create a dashboard, any user who can
see dashboards can see your dashboard, and any user who can modify dashboards can modify it.

Creating user roles in Graylog, surprisingly, cannot be done in the UI and must be done via the API. For example, see:
Create Graylog power user role.
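
Because the role has to be created through the REST API, here is a minimal Python sketch of what that request could look like, using only the standard library. It assumes a `POST /api/roles` endpoint that accepts a JSON body with `name`, `description`, `permissions` and `read_only`. It also assumes that Graylog rejects write requests that lack an `X-Requested-By` header. The server URL, the credentials and the permission strings are placeholders. Check all of them against the API browser of your own Graylog version before sending anything.

```python
import base64
import json
import urllib.request

# Placeholder values (assumptions): replace with your own Graylog URL and admin credentials.
GRAYLOG_API = "http://graylog.example.com:9000/api"
ADMIN_USER = "admin"
ADMIN_PASSWORD = "changeme"


def build_create_role_request(name, description, permissions):
    """Build (but do not send) a request that creates a Graylog role via the REST API."""
    body = json.dumps({
        "name": name,
        "description": description,
        "permissions": permissions,
        "read_only": False,
    }).encode("utf-8")
    credentials = base64.b64encode(f"{ADMIN_USER}:{ADMIN_PASSWORD}".encode()).decode()
    return urllib.request.Request(
        url=f"{GRAYLOG_API}/roles",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
            # Assumption: Graylog refuses non-GET API calls without this header (CSRF protection).
            "X-Requested-By": "cli",
        },
    )


# Usage (sends a real request, so it is left commented out here):
# request = build_create_role_request(
#     "Power user",
#     "Can read streams and manage dashboards",
#     # Example permission strings; verify which ones your Graylog version supports.
#     ["streams:read", "dashboards:create", "dashboards:read", "dashboards:edit"],
# )
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything reaches the server.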

### Documentation

The Splunk documentation is excellent. Period.

The Graylog documentation looks organized, but it only takes a few minutes of actually reading it to realize it's a bit of
a mess. It's an organized mess, but a mess nonetheless, and it's much harder to work out how to do a task using the
Graylog documentation.

This becomes evident as soon as you try one of the first tasks after setting up the Graylog server: sending data to it
(see: Sending in log data).

### Parsing received data

In Splunk, received data is quite often automatically parsed and there's nothing more to do. Any additional
parsing/formatting you wish to do with the data after it's been received can typically be done on demand.

#### Timestamps

Graylog seems to require much more customization to handle received data. It often doesn't parse timestamps correctly,
so the timestamp format has to be changed at the source (for example, the default log4j v1 timestamp format wasn't parsed
correctly).
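
For reference, log4j 1.x's `%d` conversion defaults to the pattern `yyyy-MM-dd HH:mm:ss,SSS`, with a comma before the milliseconds (e.g. `2017-03-14 09:26:53,123`). That comma is one plausible reason a parser expecting ISO 8601 fails on it, but this is an assumption and not a confirmed diagnosis. The sketch below uses a hypothetical helper, `log4j_v1_to_iso8601`, which is not part of any library. It shows how the format maps to strict ISO 8601. Following the advice above, the real fix is to change the timestamp pattern at the source.

```python
from datetime import datetime

# log4j 1.x default for %d: "yyyy-MM-dd HH:mm:ss,SSS" (comma before the milliseconds).
LOG4J_V1_DEFAULT = "%Y-%m-%d %H:%M:%S,%f"


def log4j_v1_to_iso8601(timestamp: str) -> str:
    """Convert a default log4j 1.x timestamp to ISO 8601 with millisecond precision."""
    parsed = datetime.strptime(timestamp, LOG4J_V1_DEFAULT)
    # The default log4j 1.x format carries no time zone, so the result is a naive timestamp.
    return parsed.isoformat(timespec="milliseconds")


print(log4j_v1_to_iso8601("2017-03-14 09:26:53,123"))  # 2017-03-14T09:26:53.123
```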

#### Syslog

Another example: by default, Splunk will join multiline syslog messages that get split by the client. Graylog not only
doesn't do this, but it doesn't even seem possible to configure it to do so, so this must be corrected at the client.
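
Correcting this at the client means grouping the lines of one event before they are shipped. Below is a minimal Python sketch of that grouping. It relies on a hypothetical rule: every new event starts with a date such as `2017-03-14 `, and any other line (a stack trace frame, a `Caused by:` line, wrapped text) belongs to the previous event.

```python
import re
from typing import Iterable, Iterator

# Hypothetical rule: a line starting with a date begins a new event; anything else continues it.
NEW_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def join_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per event, with continuation lines joined by newlines."""
    event = []
    for line in lines:
        if event and NEW_EVENT.match(line):
            yield "\n".join(event)
            event = []
        event.append(line)
    if event:
        yield "\n".join(event)
```

Each joined event could then be sent as a single syslog message. How you do that depends on your log shipper, which this sketch doesn't cover.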

Syslog messages over 1024 bytes will be split by Splunk just as they will by Graylog. With Splunk this can be fixed, for
example by adding this to props.conf:

```
# Example stanza: use the sourcetype (or source/host stanza) your syslog data is indexed under
[syslog]
SEDCMD-join_log4j_syslog_lines=s/\.\.\.[\r\n]+\.\.\.//g
```

There doesn't seem to be any way to fix this in Graylog.
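
The `SEDCMD` above is a sed-style substitution. It deletes the `...` at the end of one fragment, the line break(s), and the `...` at the start of the next fragment. The Python sketch below applies the same regular expression to made-up text. It assumes that both fragments have already ended up in the same Splunk event and that the splitter marked the break with `...` on each side, which is what the pattern implies. `join_split_lines` is a hypothetical name used only for illustration.

```python
import re

# Same pattern as the SEDCMD: "..." + one or more CR/LF characters + "...".
SPLIT_MARKER = re.compile(r"\.\.\.[\r\n]+\.\.\.")


def join_split_lines(event: str) -> str:
    """Remove the markers left where a long message was split, rejoining the fragments."""
    return SPLIT_MARKER.sub("", event)


print(join_split_lines("a very long mess...\n...age body"))  # a very long message body
```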

#### Custom parsing

While Splunk can easily do custom parsing of data after it's been received, with Graylog this typically needs to be set up before the data is received. For example, see: Extract Java exceptions in Graylog.
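
To illustrate the kind of regular expression such an extractor relies on, here is a Python sketch that pulls the fully qualified exception class name out of a message. The pattern is a hypothetical example and is not taken from the linked guide. It sticks to syntax that Java's regex engine also understands, but test it against your own messages before relying on it. Graylog applies extractors as messages arrive, so an extractor like this only affects data received after it's set up, which is the limitation described above.

```python
import re
from typing import Optional

# Hypothetical pattern: a dotted Java class name ending in "Exception" or "Error",
# followed by ":" or the end of a line, e.g. "java.lang.IllegalStateException: bad state".
EXCEPTION_CLASS = re.compile(
    r"((?:[a-zA-Z_$][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error))(?::|$)",
    re.MULTILINE,
)


def extract_exception_class(message: str) -> Optional[str]:
    """Return the first exception class name found in the message, or None."""
    match = EXCEPTION_CLASS.search(message)
    return match.group(1) if match else None
```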