Create a gist now

Instantly share code, notes, and snippets.

What would you like to do?
Shell script that reads LDIF data from STDIN and outputs as CSV.
#!/bin/bash
#
# Converts LDIF data to CSV.
# Doesn't handle comments very well. Use -LLL with ldapsearch to remove them.
#
# 2010-03-07
# dsimmons@squiz.co.uk
#
# Show usage if we don't have the right params
if [ "$1" == "" ]; then
echo ""
echo "Usage: cat ldif.txt | $0 <attributes> [...]"
echo "Where <attributes> contains a list of space-separated attributes to include in the CSV. LDIF data is read from stdin."
echo ""
exit 99
fi
ATTRS="$*"
c=0
while read line; do
# Skip LDIF comments
[ "${line:0:1}" == "#" ] && continue;
# If this line is blank then it's the end of this record, and the beginning
# of a new one.
#
if [ "$line" == "" ]; then
output=""
# Output the CSV record
for i in $ATTRS; do
eval data=\$RECORD_${c}_${i}
output=${output}\"${data}\",
unset RECORD_${c}_${i}
done
# Remove trailing ',' and echo the output
output=${output%,}
echo $output
# Increase the counter
c=$(($c+1))
fi
# Separate attribute name/value at the semicolon (LDIF format)
attr=${line%%:*}
value=${line#*: }
# Save all the attributes in variables for now (ie. buffer), because the data
# isn't necessarily in a set order.
#
for i in $ATTRS; do
if [ "$attr" == "$i" ]; then
eval RECORD_${c}_${attr}=\"$value\"
fi
done
done

awesome, just what I needed right now. Thanks!

saric commented Feb 6, 2014

I've changed your script a little:

since CSV output is echoed line by line, there is no need to use counter "c" at all (it is useless), and therefore pseudo-array RECORD can be one-dimensional:

$ sed -i 's/RECORD_${c}/RECORD/g' ldif-to-csv.sh

I also needed possibility to parse multi-valued attributes, so I'm using "additional" delimiter ":". So, the latest part of modified script looks like this:

for i in $ATTRS; do
    if [ "$attr" == "$i" ]; then
        eval tmp=$(echo $(echo \$RECORD_${attr}))
        if [[ ! -z "$tmp" ]]; then
            eval RECORD_${attr}=$tmp':'\"$value\"
        else
            eval RECORD_${attr}=\"$value\"
        fi
    fi
done

The script still does not properly handle double quotes, if present in attribute value. Embedded double quote characters have to be represented by a pair of consecutive double quotes, or by prefixing an escape character such as a backslash.

Another possible issue is representation of non-ASCII characters. LDIF format uses base64 encoding for representation of values with UTF-8 characters.

freilich commented Mar 5, 2014

Added a check for base64 encoded fields and decode them:

# Separate attribute name/value at the semicolon (LDIF format)
# Check for double colon and decode the base 64 results
if [[ $line == *::* ]]
then
    attr=${line%%:*}
    value=`echo ${line#*: } | base64 --decode`
else
    attr=${line%%:*}
    value=${line#*: }
fi  

Thanks this helped me move away from a simple perl script using Net::LDAP (which wasn't working anyway). Able to use this + ldapsearch instead, much better.

joaogit commented May 15, 2014

Thanks!
With the freilich base64 "add-on" it was perfect for the job.

First thing - great script - it is what I needed, so don't shoot me for the following issues! I'm thankful for the work done on it.

I think Saric's addition is good - I needed this for multiple mailalternateaddress fields which I imagine many LDIFs have), although if you have apostrophes (') in an email address (which is allowed and I came across..!) then you're going to get thrown an "eval: Unexpected end of file" error

If you change the line: eval RECORD_${attr}=$tmp':'"$value"

to eval RECORD_${attr}="$tmp':'$value"

Seems to handle it.

However this extra logic (using the tmp variable test) adds a long time to processing - fine on small LDIFs, but not something I could afford. (approx double it seems)

I made a slightly horrible hack and instead of making an IF statement, just did:

eval RECORD_${c}_${attr}+=" $value"

or, if you're removing the "c" counter (which saves a little code doesn't seem to save any time in processing):

eval RECORD_${attr}+=" $value"

The += and " " (space) means that you end up listing multiple valued fields space separated, BUT you always have a leading space even on single items - the expense of losing the "if" test...

In my use this is actually acceptable, I'm comparing 2 LDIFs, as long as there is consistently a leading space, then the diff is always correct. When I come to print the results a quick sed to replace the few instances of "open quote followed by space" is then a trivial and non time consuming task as there is often only a handful of changes - as I said, kinda horrible, but putting it out there in case anyone is interested.

I thought this was a great script - it still is, for less than a few k records, but it doesn't scale too well.

On an LDIF with 8500 entries (and the changes for multiple values I describe above) it takes 40 seconds for 7 attributes. Which I can accept.

But I have a much longer LDIF, 400,000 entries - takes about 20 minutes!

I assumed perl would be much quicker for such a task - but sadly I can't find a perl script that is "standalone" - This looks good http://www.stallies.net/index.php/convert-an-ldif-to-csv/ but it requires addition LDAP perl modules. Sadly I am working in a controlled environment, and cannot add anything to the filesystem.

Anyone got any ideas..! I am still looking, but thought I'd ask here too.

Thanks for the script!

First of all my apologies, i am new for shell script. Second, This is something i was looking for but i seems to have some issue running this script.
sh ldif-to-csv.sh file.txt
when i run this in out put i get just filename ".txt"
may be i am missing some param or ?

Thanks for the script.

Thanks for the bash script. It worked really well!

However, like @jackcaesar, I needed to serialize ~500k records corresponded with ~100 keys which took quite a while. Guess we've hit the ceiling in bash.

And so I reimplemented the serialization in C (https://github.com/szechuen/ldif_csv) and that cuts the running time to ~21s for me. Hope it helps someone ;)

I did a bit of modding this to get what I needed out of it. As soon as I can figure out how to fork I will post my mods
First I needed a field to tell if an account was disabled or not and found out it was in the variable userAccountControl so I broke it up into all its Identifiers described here https://msdn.microsoft.com/en-us/library/windows/desktop/ms680832(v=vs.85).aspx
this might be useful for a lot of people. Second I broke out where it echos into a function and instead of writing on an empty line I write on the next occurrence of "dn" as this is what constituted a break according to ldif man page then after the loop I write out the last record.
I ran into a few base64 encoded values on operatingSystem value thanks to freilich his idea fixed this.

A slight mod on freilich's code as I found some ipv6 addresses where causing errors

    if [[ $line == *::* ]] && [[ $line != *:*::* ]];then
        attr=${line%%:*}
        value=`echo ${line#*: } | base64 --decode || echo [$line] >&2`
    else
        attr=${line%%:*}
        value=${line#*: }
    fi 

Here is what I did for UAC

after the line

ATTRS="$*"

I added

for i in $ATTRS; do
if [ "${i:0:7}" == "ADS_UF_" ] || [ "$i" == "Enabled" ];then
UAC_NEEDED="YES"
fi
done

then after the writing section I added

    if [ "$UAC_NEEDED" == "YES" ];then
            if [ "$attr" == "userAccountControl" ];then
                    #follow layout here https://msdn.microsoft.com/en-us/library/windows/desktop/ms680832(v=vs.85).aspx
                    binary=`echo "obase=2;${value}"|bc|awk '{printf("%032d",$1)}'`
                    for i in $ATTRS; do
                            bit=0
                            reverse=0
                            case "$i" in
                                    "ADS_UF_SCRIPT")                                        bit=31;;
                                    "ADS_UF_ACCOUNTDISABLE")                                bit=30;;
                                    # Inverted value of ADS_UF_ACCOUNT_DISABLE
                                    "Enabled")                                              bit=30;reverse=1;;
                                    "ADS_UF_HOMEDIR_REQUIRED")                              bit=28;;
                                    "ADS_UF_LOCKOUT" )                                      bit=27;;
                                    "ADS_UF_PASSWD_NOTREQD" )                               bit=26;;
                                    "ADS_UF_PASSWD_CANT_CHANGE" )                           bit=25;;
                                    "ADS_UF_ENCRYPTED_TEXT_PASSWORD_ALLOWED")               bit=24;;
                                    "ADS_UF_TEMP_DUPLICATE_ACCOUNT")                        bit=23;;
                                    "ADS_UF_NORMAL_ACCOUNT")                                bit=22;;
                                    "ADS_UF_INTERDOMAIN_TRUST_ACCOUNT")                     bit=20;;
                                    "ADS_UF_WORKSTATION_TRUST_ACCOUNT")                     bit=19;;
                                    "ADS_UF_SERVER_TRUST_ACCOUNT")                          bit=18;;
                                    "ADS_UF_DONT_EXPIRE_PASSWD")                            bit=15;;
                                    "ADS_UF_MNS_LOGON_ACCOUNT")                             bit=14;;
                                    "ADS_UF_SMARTCARD_REQUIRED")                            bit=13;;
                                    "ADS_UF_TRUSTED_FOR_DELEGATION")                        bit=12;;
                                    "ADS_UF_NOT_DELEGATED")                                 bit=11;;
                                    "ADS_UF_USE_DES_KEY_ONLY")                              bit=10;;
                                    "ADS_UF_DONT_REQUIRE_PREAUTH")                          bit=9;;
                                    "ADS_UF_PASSWORD_EXPIRED")                              bit=8;;
                                    "ADS_UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION")        bit=7;;
                            esac
                            if [ "$bit" != "0" ];then
                                    if [ "$reverse" == "1" ];then
                                            if [ "${binary:$bit:1}" == "1" ];then
                                                    eval RECORD_${c}_${i}=\"False\"
                                            else
                                                    eval RECORD_${c}_${i}=\"True\"
                                            fi
                                    else
                                            if [ "${binary:$bit:1}" == "1" ];then
                                                    eval RECORD_${c}_${i}=\"True\"
                                            else    
                                                    eval RECORD_${c}_${i}=\"False\"
                                            fi
                                    fi
                            fi
                    done
            fi
    fi

i created a fork to create a dokuwiki-table: https://gist.github.com/Finkregh/51feb486cfc257558532

This is great, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment