-
-
Save dansimau/674203 to your computer and use it in GitHub Desktop.
#!/bin/bash | |
# | |
# Converts LDIF data to CSV. | |
# Doesn't handle comments very well. Use -LLL with ldapsearch to remove them. | |
# | |
# 2010-03-07 | |
# dsimmons@squiz.co.uk | |
# | |
# Show usage if we don't have the right params | |
if [ "$1" == "" ]; then | |
echo "" | |
echo "Usage: cat ldif.txt | $0 <attributes> [...]" | |
echo "Where <attributes> contains a list of space-separated attributes to include in the CSV. LDIF data is read from stdin." | |
echo "" | |
exit 99 | |
fi | |
ATTRS="$*" | |
c=0 | |
while read line; do | |
# Skip LDIF comments | |
[ "${line:0:1}" == "#" ] && continue; | |
# If this line is blank then it's the end of this record, and the beginning | |
# of a new one. | |
# | |
if [ "$line" == "" ]; then | |
output="" | |
# Output the CSV record | |
for i in $ATTRS; do | |
eval data=\$RECORD_${c}_${i} | |
output=${output}\"${data}\", | |
unset RECORD_${c}_${i} | |
done | |
# Remove trailing ',' and echo the output | |
output=${output%,} | |
echo $output | |
# Increase the counter | |
c=$(($c+1)) | |
fi | |
# Separate attribute name/value at the semicolon (LDIF format) | |
attr=${line%%:*} | |
value=${line#*: } | |
# Save all the attributes in variables for now (ie. buffer), because the data | |
# isn't necessarily in a set order. | |
# | |
for i in $ATTRS; do | |
if [ "$attr" == "$i" ]; then | |
eval RECORD_${c}_${attr}=\"$value\" | |
fi | |
done | |
done |
I've changed your script a little:
since CSV output is echoed line by line, there is no need to use counter "c" at all (it is useless), and therefore pseudo-array RECORD can be one-dimensional:
$ sed -i 's/RECORD_${c}/RECORD/g' ldif-to-csv.sh
I also needed possibility to parse multi-valued attributes, so I'm using "additional" delimiter ":". So, the latest part of modified script looks like this:
for i in $ATTRS; do
if [ "$attr" == "$i" ]; then
eval tmp=$(echo $(echo \$RECORD_${attr}))
if [[ ! -z "$tmp" ]]; then
eval RECORD_${attr}=$tmp':'\"$value\"
else
eval RECORD_${attr}=\"$value\"
fi
fi
done
The script still does not properly handle double quotes, if present in attribute value. Embedded double quote characters have to be represented by a pair of consecutive double quotes, or by prefixing an escape character such as a backslash.
Another possible issue is representation of non-ASCII characters. LDIF format uses base64 encoding for representation of values with UTF-8 characters.
Added a check for base64 encoded fields and decode them:
# Separate attribute name/value at the semicolon (LDIF format)
# Check for double colon and decode the base 64 results
if [[ $line == *::* ]]
then
attr=${line%%:*}
value=`echo ${line#*: } | base64 --decode`
else
attr=${line%%:*}
value=${line#*: }
fi
Thanks this helped me move away from a simple perl script using Net::LDAP (which wasn't working anyway). Able to use this + ldapsearch instead, much better.
Thanks!
With the freilich base64 "add-on" it was perfect for the job.
First thing - great script - it is what I needed, so don't shoot me for the following issues! I'm thankful for the work done on it.
I think Saric's addition is good - I needed this for multiple mailalternateaddress fields which I imagine many LDIFs have), although if you have apostrophes (') in an email address (which is allowed and I came across..!) then you're going to get thrown an "eval: Unexpected end of file" error
If you change the line: eval RECORD_${attr}=$tmp':'"$value"
to eval RECORD_${attr}="$tmp':'$value"
Seems to handle it.
However this extra logic (using the tmp variable test) adds a long time to processing - fine on small LDIFs, but not something I could afford. (approx double it seems)
I made a slightly horrible hack and instead of making an IF statement, just did:
eval RECORD_${c}_${attr}+=" $value"
or, if you're removing the "c" counter (which saves a little code doesn't seem to save any time in processing):
eval RECORD_${attr}+=" $value"
The += and " " (space) means that you end up listing multiple valued fields space separated, BUT you always have a leading space even on single items - the expense of losing the "if" test...
In my use this is actually acceptable, I'm comparing 2 LDIFs, as long as there is consistently a leading space, then the diff is always correct. When I come to print the results a quick sed to replace the few instances of "open quote followed by space" is then a trivial and non time consuming task as there is often only a handful of changes - as I said, kinda horrible, but putting it out there in case anyone is interested.
I thought this was a great script - it still is, for less than a few k records, but it doesn't scale too well.
On an LDIF with 8500 entries (and the changes for multiple values I describe above) it takes 40 seconds for 7 attributes. Which I can accept.
But I have a much longer LDIF, 400,000 entries - takes about 20 minutes!
I assumed perl would be much quicker for such a task - but sadly I can't find a perl script that is "standalone" - This looks good http://www.stallies.net/index.php/convert-an-ldif-to-csv/ but it requires addition LDAP perl modules. Sadly I am working in a controlled environment, and cannot add anything to the filesystem.
Anyone got any ideas..! I am still looking, but thought I'd ask here too.
Thanks for the script!
First of all my apologies, i am new for shell script. Second, This is something i was looking for but i seems to have some issue running this script.
sh ldif-to-csv.sh file.txt
when i run this in out put i get just filename ".txt"
may be i am missing some param or ?
Thanks for the script.
Thanks for the bash script. It worked really well!
However, like @JackCaesar, I needed to serialize ~500k records corresponded with ~100 keys which took quite a while. Guess we've hit the ceiling in bash.
And so I reimplemented the serialization in C (https://github.com/szechuen/ldif_csv) and that cuts the running time to ~21s for me. Hope it helps someone ;)
I did a bit of modding this to get what I needed out of it. As soon as I can figure out how to fork I will post my mods
First I needed a field to tell if an account was disabled or not and found out it was in the variable userAccountControl so I broke it up into all its Identifiers described here https://msdn.microsoft.com/en-us/library/windows/desktop/ms680832(v=vs.85).aspx
this might be useful for a lot of people. Second I broke out where it echos into a function and instead of writing on an empty line I write on the next occurrence of "dn" as this is what constituted a break according to ldif man page then after the loop I write out the last record.
I ran into a few base64 encoded values on operatingSystem value thanks to freilich his idea fixed this.
A slight mod on freilich's code as I found some ipv6 addresses where causing errors
if [[ $line == *::* ]] && [[ $line != *:*::* ]];then
attr=${line%%:*}
value=`echo ${line#*: } | base64 --decode || echo [$line] >&2`
else
attr=${line%%:*}
value=${line#*: }
fi
Here is what I did for UAC
after the line
ATTRS="$*"
I added
for i in $ATTRS; do
if [ "${i:0:7}" == "ADS_UF_" ] || [ "$i" == "Enabled" ];then
UAC_NEEDED="YES"
fi
done
then after the writing section I added
if [ "$UAC_NEEDED" == "YES" ];then
if [ "$attr" == "userAccountControl" ];then
#follow layout here https://msdn.microsoft.com/en-us/library/windows/desktop/ms680832(v=vs.85).aspx
binary=`echo "obase=2;${value}"|bc|awk '{printf("%032d",$1)}'`
for i in $ATTRS; do
bit=0
reverse=0
case "$i" in
"ADS_UF_SCRIPT") bit=31;;
"ADS_UF_ACCOUNTDISABLE") bit=30;;
# Inverted value of ADS_UF_ACCOUNT_DISABLE
"Enabled") bit=30;reverse=1;;
"ADS_UF_HOMEDIR_REQUIRED") bit=28;;
"ADS_UF_LOCKOUT" ) bit=27;;
"ADS_UF_PASSWD_NOTREQD" ) bit=26;;
"ADS_UF_PASSWD_CANT_CHANGE" ) bit=25;;
"ADS_UF_ENCRYPTED_TEXT_PASSWORD_ALLOWED") bit=24;;
"ADS_UF_TEMP_DUPLICATE_ACCOUNT") bit=23;;
"ADS_UF_NORMAL_ACCOUNT") bit=22;;
"ADS_UF_INTERDOMAIN_TRUST_ACCOUNT") bit=20;;
"ADS_UF_WORKSTATION_TRUST_ACCOUNT") bit=19;;
"ADS_UF_SERVER_TRUST_ACCOUNT") bit=18;;
"ADS_UF_DONT_EXPIRE_PASSWD") bit=15;;
"ADS_UF_MNS_LOGON_ACCOUNT") bit=14;;
"ADS_UF_SMARTCARD_REQUIRED") bit=13;;
"ADS_UF_TRUSTED_FOR_DELEGATION") bit=12;;
"ADS_UF_NOT_DELEGATED") bit=11;;
"ADS_UF_USE_DES_KEY_ONLY") bit=10;;
"ADS_UF_DONT_REQUIRE_PREAUTH") bit=9;;
"ADS_UF_PASSWORD_EXPIRED") bit=8;;
"ADS_UF_TRUSTED_TO_AUTHENTICATE_FOR_DELEGATION") bit=7;;
esac
if [ "$bit" != "0" ];then
if [ "$reverse" == "1" ];then
if [ "${binary:$bit:1}" == "1" ];then
eval RECORD_${c}_${i}=\"False\"
else
eval RECORD_${c}_${i}=\"True\"
fi
else
if [ "${binary:$bit:1}" == "1" ];then
eval RECORD_${c}_${i}=\"True\"
else
eval RECORD_${c}_${i}=\"False\"
fi
fi
fi
done
fi
fi
i created a fork to create a dokuwiki-table: https://gist.github.com/Finkregh/51feb486cfc257558532
This is great, thank you.
Great script! Worked perfectly and was exactly what I needed to convert my ldif export to csv.
I need the opposite script, please help me!
My requirement is to convert csv data to ldif format/file.
awesome, just what I needed right now. Thanks!