@alexkli
Last active July 29, 2021 09:16
Retrieving replication actions from AEM error.log

Extracting actions from author log (6.2)

With DEBUG level on

parse-replication.sh below parses a standard error.log or replication.log from an AEM author instance, written with DEBUG level replication logging (see note below), and extracts the replication actions:

parse-replication.sh error.log > replication.actions.txt

The resulting replication.actions.txt will look like this:

ACTIVATE,/content/dam/assets/myproject/file.png
DEACTIVATE,/content/dam/assets/myproject/doc.pdf

Format of each line is <action>,<path>, order is chronological.
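
Since republish.sh consumes this file line by line, it can be worth checking the format before replaying it. A minimal sketch, assuming the only action names are the four that the publish parser below knows about (ACTIVATE, DEACTIVATE, TEST, DELETE) and that every path is absolute; the helper name check_actions is hypothetical:

```shell
#!/bin/sh
# check_actions (hypothetical helper): print every line that does not look
# like <action>,<path>; the exit status is non-zero if any such line exists.
check_actions() {
    awk -F, '
        # valid: known action name, a comma right after it, then an absolute path
        $1 ~ /^(ACTIVATE|DEACTIVATE|TEST|DELETE)$/ && index($0, ",/") == length($1) + 1 { next }
        # everything else is reported with its line number
        { printf "line %d: %s\n", NR, $0; bad = 1 }
        END { exit bad }
    ' "$1"
}

# usage: check_actions replication.actions.txt && echo "format ok"
```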

A more detailed output that includes the originating IP and the thread (request URL or job) can be obtained with:

grep "action: ReplicationAction{type=.*" error.log | awk '{ if ($4 == "[JobHandler:") { print "- job " substr($5, 1, length($5)-1) " " $11 " " $12 } else { print $4 " " $6 " " $7 " " $14 " " $15 } }' | sed -E 's/^\[//' | sed -E 's/ReplicationAction\{type=//' | sed -E "s/, path\[0\]='/ /" | sed -E "s/',$//"

Example output:

10.10.10.70 GET /bin/myservlet ACTIVATE /content/dam/asset.png
127.0.0.1 POST /bin/replicate.json DEACTIVATE /content/site/page
- job /etc/workflow/instances/server0/2017-01-30/MyWorkflow_123:/content/dam/asset.png ACTIVATE /content/dam/some/other/file.jpg

Format of each line is <ip> [<http-method> <url> | "job" <job-path>] <action> <path>, order is chronological.
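
The awk field numbers depend on the exact layout of the log line, so it helps to try them on known input. The sketch below wraps the awk/sed stages, including a branch for JobHandler threads, in a function and runs it on two hypothetical log lines. The timestamps, the class name com.example.Placeholder, the words "placeholder text here" and the trailing "rest}" are made up so that the field positions line up; they are not real AEM log text. The function name extract_detailed is hypothetical too.

```shell
#!/bin/sh
# extract_detailed (hypothetical name): reads log lines from stdin and prints
# "<ip> <method> <url> <action> <path>" or "- job <job-path> <action> <path>".
extract_detailed() {
    grep "action: ReplicationAction{type=.*" \
    | awk '{
        if ($4 == "[JobHandler:") {
            # job thread: $5 is the job path followed by a closing "]"
            print "- job " substr($5, 1, length($5)-1) " " $11 " " $12
        } else {
            # request thread: $4 = "[<ip>", $6 = method, $7 = url
            print $4 " " $6 " " $7 " " $14 " " $15
        }
      }' \
    | sed -E 's/^\[//' \
    | sed -E 's/ReplicationAction\{type=//' \
    | sed -E "s/, path\[0\]='/ /" \
    | sed -E "s/',$//"
}

# two synthetic lines (assumed layout, see above)
extract_detailed <<'EOF'
30.01.2017 10:00:00.000 *DEBUG* [10.10.10.70 [1485770400000] GET /bin/myservlet HTTP/1.1] com.example.Placeholder placeholder text here action: ReplicationAction{type=ACTIVATE, path[0]='/content/dam/asset.png', rest}
30.01.2017 10:00:01.000 *DEBUG* [JobHandler: /etc/workflow/instances/server0/2017-01-30/MyWorkflow_123:/content/dam/asset.png] com.example.Placeholder placeholder text here action: ReplicationAction{type=ACTIVATE, path[0]='/content/dam/some/other/file.jpg', rest}
EOF
```

Paths that contain spaces are split across awk fields and come out truncated, so treat the detailed output as a diagnostic aid rather than as input for a replay.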

Notes

This requires DEBUG log level to be turned on for com.day.cq.replication (see also "Create a replication.log" in the troubleshooting documentation). Without it, deactivations are not logged: the INFO level message "Queued job ReplicationAction{type=.... for agent xxx for batch processing..." does not cover them. The message at the end of an activation, "Replication (ACTIVATE) of successful.", is not a good candidate either: for batch replications it frequently contains "batch:" instead of the path, it only appears after a successful delivery, and it does not show intended activations. The message "Replication request queued for at " is on INFO level, but it does not include the action, which is important.

With INFO level

Use the command below; its output is only complete if batch replication is not enabled:

grep "action: ReplicationAction{type=.*" error.log | awk '{ if ($4 == "[JobHandler:") { print "- job " substr($5, 1, length($5)-1) " " $11 " " $12 } else { print $4 " " $6 " " $7 " " $14 " " $15 } }' | sed -E 's/^\[//' | sed -E 's/ReplicationAction\{type=//' | sed -E "s/, path\[0\]='/ /" | sed -E "s/',$//" > replication.action.txt

Statistics

  1. How often replications happened (by type)

     awk -F, '{ print $1 }' replication.actions.txt | sort | uniq -c | sort -nr
    
     408 ACTIVATE
     52 DEACTIVATE
    
  2. How often certain paths were activated

     grep '^ACTIVATE,' replication.actions.txt | awk -F, '{ print $2 }' | sort | uniq -c | sort -nr
    
     3 /content/dam/assets/foo
     2 /content/dam/assets/bar
    

Republish

republish.sh below takes the actions from replication.actions.txt and replays them against an author instance, making HTTP API calls that trigger the replication for each action and path again.

Configure server and user/password using env variables (defaults to http://localhost:4502 and admin:admin):

export aem_server=http://myserver.com:4502
export aem_credentials=admin:secure

If you have a context path, include it in aem_server, e.g. http://myserver.com:4502/contextpath.

Then run:

republish.sh replication.actions.txt
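
Before replaying against a real author instance, it may help to preview what would be sent. A minimal dry-run sketch, assuming the same aem_server variable and the same <action>,<path> input as republish.sh; the helper name preview_republish and its output format are hypothetical, and nothing is sent to the server:

```shell
#!/bin/sh
# preview_republish (hypothetical helper): print the request that would be
# made for each <action>,<path> line, without contacting the server.
preview_republish() {
    server=${aem_server:-http://localhost:4502}
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS='' read -r line || [ -n "$line" ]; do
        action=${line%%,*}   # text before the first comma
        path=${line#*,}      # text after the first comma
        printf 'POST %s/bin/replicate.json cmd=%s path=%s\n' "$server" "$action" "$path"
    done < "$1"
}

# usage: preview_republish replication.actions.txt | less
```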

Note: if you have multiple files, say one error.log per day (error.log.2017-01-01, error.log.2017-01-02, ...) and action files accordingly (replication.action.2017-01-01, replication.action.2017-01-02, ...), make sure to run them in chronological order, as order is important for replication!
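
With date suffixes in YYYY-MM-DD form, the shell's sorted glob expansion already yields chronological order. A small sketch, assuming every file that matches the given prefix followed by a dot is a daily action file; the helper name replay_in_order is hypothetical:

```shell
#!/bin/sh
# replay_in_order (hypothetical helper): run a command once per action file,
# oldest first. Relies on the YYYY-MM-DD suffix, for which the sorted order
# of the glob expansion is also the chronological order.
replay_in_order() {
    cmd=$1
    prefix=$2
    for f in "$prefix".*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "$cmd" "$f" || return 1   # stop if the command reports a failure
    done
}

# usage: replay_in_order ./republish.sh replication.action
```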

Parsing actions from publish log (6.2)

Use parse-replication-publish.sh below to get the activations and deactivations that arrived successfully on the publish instance.

parse-replication-publish.sh error.log

Output is in the exact same format as above:

ACTIVATE,/content/dam/assets/myproject/file.png
DEACTIVATE,/content/dam/assets/myproject/doc.pdf

Errors are logged less consistently and not in the same message as their path. Including the 3 preceding lines may help:

grep -B 3 "Error during replication" error.log

Example output:

... *INFO*  ... POST /bin/receive ... Processing replication: ACTIVATE:/content/dam/assets/en_us/asset.jpg, size: 91910220
... *INFO*  ... POST /bin/receive ... Content size triggered creation of temp file: 91910220...
... *INFO*  ... POST /bin/receive ... Temporary file created in 665ms (91910220 bytes)
... *ERROR* ... POST /bin/receive ... Error during replication: Segment 6ebcd834-0773-4efc-a75f-a5f9f3763035 not found
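
To turn such error blocks into <action>,<path> lines, one option is to remember the most recent "Processing replication:" message and emit it when an error follows. This is only a sketch based on the message texts in the example above; the helper name failed_replications is hypothetical, and with concurrent /bin/receive requests the lines can interleave, so an error may be attributed to the wrong path.

```shell
#!/bin/sh
# failed_replications (hypothetical helper): print <action>,<path> for every
# "Error during replication" line, using the closest preceding
# "Processing replication: <action>:<path>, size: <n>" line.
failed_replications() {
    awk '
        /Processing replication: / {
            last = $0
            sub(/.*Processing replication: /, "", last)  # keep "<action>:<path>, size: <n>"
            sub(/, size: [0-9]+.*$/, "", last)            # drop the size suffix
            sub(/:/, ",", last)                           # first colon separates action and path
        }
        /Error during replication/ && last != "" {
            print last
            last = ""                                     # report each request only once
        }
    ' "$@"
}

# usage: failed_replications error.log
```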
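parse-replication-publish.sh

#!/bin/bash
# parses replication receive events from an AEM publish error.log or
# replication.log and outputs them in the format <action>,<path>
#
# example output:
# ACTIVATE,/content/dam/assets/foo
# DEACTIVATE,/content/some/page
#
# requires bash: the $'\n' below is bash quoting for a literal newline
# 1. keep only "Processed replication action" lines from /bin/receive requests
# 2. strip everything up to and including "in <n>ms: "
# 3. batch replication: put each "<path>(ACTIVATE)" on its own line
# 4. "<ACTION> of <path>" becomes "<path>(<ACTION>)" for non-activations
# 5. drop the leading "ACTIVATE of "
# 6. turn "<path>(<ACTION>)" into "<ACTION>,<path>"
grep -oh "/bin/receive.*Processed replication action.*" "$1" \
| sed -E 's/.*Processed replication action in [[:digit:]]+ms: //' \
| sed -E 's/([^ ]*)\(ACTIVATE\), /\1(ACTIVATE)\'$'\n/g' \
| sed -E 's/^(DEACTIVATE|TEST|DELETE) of (.*)/\2(\1)/' \
| sed -E 's/^ACTIVATE of //' \
| sed -E 's/(.*)\((ACTIVATE|DEACTIVATE|TEST|DELETE)\)/\2,\1/'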
# full line:
# (date/thread/etc) ... com.day.cq.replication.impl.servlets.ReplicationServlet Processed replication action in 4090ms: ACTIVATE of /content/dam/digital_assets/assethub/2016/4/27/916303d2-1024-4e55-9c77-67799f34c06d/renditions/c05131915.jpg(ACTIVATE)
# different cases for the last part (after ms:)
# a) activation
# ACTIVATE of /content/dam/digital_assets/assethub/2016/4/27/916303d2-1024-4e55-9c77-67799f34c06d/renditions/c05131915.jpg(ACTIVATE)
# b) deactivation
# DEACTIVATE of /content/dam/binary-delivery/mtx/mtx-19/mtx-1948_unity/mtx-ab75904fddf0428189f009003e
# c) delete
# DELETE of /content/dam/assets/en_us/Marketing/Collateral-Program/2014/11/4AA5-5854/enn/4AA5-5854.hires.pdf
# d) batch replication
# ACTIVATE of /content/dam/binary-delivery/mtx/mtx-19/mtx-1935_unity/mtx-unity-i9804/binaries/sp16301.exe/jcr:content/metadata(ACTIVATE), /content/dam/binary-delivery/mtx/mtx-19/mtx-1935_unity/mtx-unity-i9804/mtx-unity-i9804/jcr:content/metadata(ACTIVATE), /content/dam/binary-delivery/mtx/mtx-19/mtx-1935_unity(ACTIVATE)
parse-replication.sh

#!/bin/sh
# parses activation events from an AEM error.log or replication.log with
# DEBUG level for "com.day.cq.replication" enabled and outputs them in
# the format <action>,<path>
#
# example output:
# ACTIVATE,/content/dam/assets/foo
# DEACTIVATE,/content/some/page
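# the sed expression captures the action type and the first path, so paths
# containing spaces or commas are kept intact
grep -oh "action: ReplicationAction{type=.*" "$1" \
| sed -nE "s/^action: ReplicationAction\{type=([A-Z_]+), path\[0\]='([^']*)'.*/\1,\2/p"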
republish.sh

#!/bin/bash
# this script will replicate items via curl from a file with lines
# in the format <action>,<path>
#
# example file:
# ACTIVATE,/content/dam/assets/foo
# DEACTIVATE,/content/some/page
aem_server=${aem_server:-http://localhost:4502}
aem_credentials=${aem_credentials:-admin:admin}
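# number of lines in the input file, used for the progress output
total=$(wc -l "$1" | awk '{ print $1 }')
count=0
# the "|| [[ -n ... ]]" part also processes a last line without trailing newline
while IFS='' read -r line || [[ -n "$line" ]]; do
    # lines are in the format <action>,<path>; split at the first comma
    # so that paths containing commas stay intact
    action=${line%%,*}
    path=${line#*,}
    count=$((count + 1))
    echo
    echo "$action => $path"
    echo "$count of $total"
    # -f makes curl report HTTP errors; the response body is discarded
    curl -sfS -u "$aem_credentials" -X POST -F path="$path" -F cmd="$action" "$aem_server/bin/replicate.json" > /dev/null
done < "$1"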