Pack and unpack COBOL's COMP-3 numbers.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Pack and unpack COBOL's COMP-3 numbers.

COBOL stores most numbers as strings. There are times that they are stored in a
packed format ("computational numbers"). COMP-3 numbers are not stored like
traditional numbers (16-bit, 32-bit, 64-bit, etc.), but in a bit length that is
four times the number of digits in the stored value plus four bits, rounded up
to a whole number of bytes. For a four-digit number (e.g. PIC 9(4) or
PIC 9(2)V9(2)) that is 4 * 4 + 4 = 20 bits, which is stored in 3 bytes.

Each digit is stored in four bits. The number 4 becomes 0x4 (0100), 5 becomes
0x5 (0101), 6 becomes 0x6 (0110), etc. Once the full number is encoded (456),
an additional half byte is placed on the end to denote positive or negative:
0xD is negative, 0xC is positive, and 0xF is unsigned.

Also note that decimals are not stored in numbers, computational or otherwise.
Decimals, and significant digits, are determined by the PICture clauses. The
number is loaded into the variable in COBOL and the PICture clause dictates
where the decimal should be. For example, given PIC 9(2)V9(2), the number is
stored on disk as a four-digit integer, but when loaded it is represented in
memory as a two-digit number with two additional digits of precision after a
decimal point.

For further information see the following pages:
http://3480-3590-data-conversion.com/article-cobol-comp.html
http://3480-3590-data-conversion.com/article-packed-fields.html

unpack_number() originally from:
https://mail.python.org/pipermail/python-list/2000-April/050953.html
"""
from array import array
from struct import pack


def unpack_number(p):
    """ Unpack a COMP-3 number. """
    a = array('B', p)
    # Integer arithmetic throughout; a float would lose precision on long fields.
    v = 0

    # For all but the last byte: two digits (half bytes) per byte.
    for i in a[:-1]:
        v = (v * 100) + (((i & 0xf0) >> 4) * 10) + (i & 0xf)

    # Last byte: one digit, then the sign.
    i = a[-1]
    v = (v * 10) + ((i & 0xf0) >> 4)

    # Negative/positive check.
    if (i & 0xf) == 0xd:
        v = -v

    # Decimal points are determined by a COBOL program's PICture clauses, not
    # the data on disk.
    return v


def pack_number(n):
    """ Pack a COMP-3 number. Format: PIC 9(9). """
    # COBOL numbers are stored without decimal info. Remove the decimal before
    # calling pack_number().
    n = int(n)

    # Is the number negative? Remember for later.
    negative = False
    if n < 0:
        negative = True
        n *= -1

    # Treat the number as a string. Makes it easier to loop over.
    n_str = str(n)
    b = int(n_str[0])

    # For each digit, shift it onto the result.
    for c in n_str[1:]:
        b = (b << 4) | int(c)

    # Append the sign: 0xD if negative, 0xF (unsigned) otherwise.
    if negative:
        b = (b << 4) | 0xd
    else:
        b = (b << 4) | 0xf

    # Pack the number as a long long and chop off the unused bytes at the
    # beginning. This will need to be changed for varying PICture clauses:
    # five bytes hold at most nine digits, and longer values are truncated.
    b_packed = pack('>q', b)
    if len(b_packed) > 5:
        b_packed = b_packed[-5:]
    return b_packed


if __name__ == '__main__':
    value = 123456
    packed = pack_number(value)
    unpacked = unpack_number(packed)
    # Iterating over bytes yields integers in Python 3.
    hex_packed = ''.join('%02X' % x for x in packed)
    print('Value: {}\nPacked: 0x{}\nUnpacked: {}'.format(value, hex_packed, unpacked))

Rocckk commented Apr 16, 2021

Hello!
I need to write packed decimals to a file that will be read by an old IBM mainframe (obviously using COBOL).
Your code snippet looks like it could help me solve this problem.
Could you please explain some of the things that happen in the pack_number function?

For example, why did you choose '>q' in the struct call? How do you know it's supposed to be a big-endian long long?
And what exactly should be changed for different PICture clauses?

Thanks!

Author

zorchenhimer commented Apr 16, 2021

Hey there!

It's been a while since I wrote this, but from what I can remember everything was figured out with trial and error. For context, I was using Micro Focus COBOL, so IBM COBOL might format stuff a little differently (i.e., endianness). I honestly don't remember why I chose those options in the struct call, but that probably matched my expected output (I'd output something from COBOL and check the bytes, iirc).

As for the different PICture clauses, this code was written for COMP-3 numbers (packed decimal, or "BCD"). The binary format is different for COMP-1 and COMP-2 (see here: http://3480-3590-data-conversion.com/article-cobol-comp.html )
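
For what it's worth, the effect of the '>q' format can be checked directly. The sketch below assumes Python 3. '>q' packs a signed 64-bit integer big-endian, so the most significant byte comes first and the half bytes land in the same left-to-right order as the digits. Slicing with [-5:] then keeps the low-order five bytes, which is the size of a PIC 9(9) COMP-3 field. The int.to_bytes call is shown only as an equivalent that takes the width directly; it is not what the gist uses.

```python
from struct import pack

# Half bytes for the digits 1..6 followed by the 0xF sign,
# built the same way pack_number() builds them.
nibbles = 0x123456F

as_long_long = pack('>q', nibbles)   # 8 bytes, most significant byte first
five_bytes = as_long_long[-5:]       # keep the low-order 5 bytes

# Equivalent result with an explicit width (Python 3 only).
same_thing = nibbles.to_bytes(5, 'big')

print(as_long_long.hex())            # 000000000123456f
print(five_bytes.hex())              # 000123456f
print(same_thing == five_bytes)      # True
```

With a little-endian format ('<q') the bytes would come out reversed, which would scramble the digit order, so big-endian is the natural fit for this layout whatever the platform.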


Rocckk commented Apr 19, 2021

Hi again!
Thanks for the quick reply, I did not expect it!

In my case I have COMP-3, but the PICture clauses are different: some fields are PIC 9(4) and some are PIC 9(9). So it looks pretty much the same as what is going on in your case. The thing is that as input I have integers which may have up to 9 digits in them, and it seems that according to these PICture clauses they will have to be packed in 4 bytes, so I am still trying to figure out how to do that for an integer like 123456789. If 2 digits are packed in 1 byte, I wonder how to fit 9 digits there... Maybe you have some ideas? I would be thankful :)

I will also ask for an example file, so that I can write my code and compare its output with the expected result.


VishweshS commented Jun 14, 2021

Hello,

I tried your function unpack_number() on the input 001 (in its COMP-3 equivalent), and the output comes out as 1.

Basically, the function ignores leading zeros. Any suggestions on how to account for them?

Thanks!
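
One possible approach, sketched under the assumption that the leading zeros are wanted for display or for re-exporting as text: an integer cannot carry leading zeros, so the half bytes can be decoded into a digit string instead. unpack_digits is a hypothetical helper, not part of the gist, and it does no validation of the digit values.

```python
def unpack_digits(packed):
    """Return (sign, digits) for a COMP-3 field, keeping leading zeros.

    Hypothetical helper: `packed` is a bytes object. The sign is '-' when
    the final half byte is 0xD and '+' otherwise, the same check that
    unpack_number() uses.
    """
    nibbles = []
    for byte in packed:
        nibbles.append(byte >> 4)      # high half byte
        nibbles.append(byte & 0x0F)    # low half byte
    sign_nibble = nibbles.pop()        # the last half byte is the sign
    digits = ''.join(str(n) for n in nibbles)
    return ('-' if sign_nibble == 0x0D else '+'), digits


print(unpack_digits(b'\x00\x1c'))      # ('+', '001')
print(unpack_digits(b'\x01\x23\x4d'))  # ('-', '01234')
```

Note that a packed field always holds an odd number of digits, so a four-digit field comes back with five characters; trimming to the width given by the PICture clause is left to the caller.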


m-schmitt commented Jul 6, 2022

> For my case, I have COMP-3, but those PICture clauses are different. There are cases when it's PIC 9(4) and PIC 9(9). So it looks pretty much the same as what is going on in your case. The thing is that as input I have integers which may have up to 9 digits in them, and it seems that according to these PICture clauses they will have to be packed in 4 bytes, so I still try to figure out how to do that in case we have an integer like 123456789 as input. If 2 digits are packed in 1 byte, I wonder how to fit 9 digits there... Maybe you have some ideas? I would be thankful :)

Packed decimal on IBM mainframes (z/Architecture and earlier) is always stored as an integral number of bytes, i.e. the final value will always be a multiple of 8 bits. The rule for determining field size from a PIC clause is:

  1. Add up the number of digits in the PIC clause. PIC 9(4) is 4. PIC 9(9) is 9. PIC 9(7)V99 is also 9.
  2. If the total is even, then add one. So, PIC 9(4) becomes 5, but PIC 9(9) stays as 9.
  3. Add one for the sign. Now we have 6 and 10.
  4. Divide by 2. This gives the final answer: PIC 9(4) is stored in 3 bytes, PIC 9(9) is stored in 5.

A number that was even in step 2 would be stored with a leading zero. +1234 is stored as hex 01 23 4C, in 3 bytes.
A number that was odd in step 2 would be stored without a leading zero (unless the value itself is less than the max digits, of course). +123456789 is stored as 12 34 56 78 9C. A field defined as PIC S9(9) containing a value of -9876 would be stored as 00 00 09 87 6D.
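
The sizing rule and the three examples above can be sketched in Python 3 as follows. packed_length and pack_comp3 are hypothetical names, not part of the gist, and the sketch uses 0xC for positive values as in the examples above (the gist's pack_number writes 0xF instead).

```python
def packed_length(digits):
    """Bytes needed for a COMP-3 field with `digits` digits in its PIC clause."""
    if digits % 2 == 0:       # step 2: an even digit count gets a padding digit
        digits += 1
    return (digits + 1) // 2  # steps 3 and 4: add the sign, then halve


def pack_comp3(value, digits):
    """Pack an integer into a signed COMP-3 field of `digits` digits."""
    value = int(value)
    sign = 0x0D if value < 0 else 0x0C
    text = str(abs(value))
    if len(text) > digits:
        raise ValueError('%d does not fit in %d digits' % (value, digits))
    size = packed_length(digits)
    # Zero-fill to the full field, leaving one half byte for the sign.
    text = text.zfill(size * 2 - 1)
    nibbles = [int(c) for c in text] + [sign]
    # Combine each pair of half bytes into one byte.
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


print(pack_comp3(1234, 4).hex())        # 01234c
print(pack_comp3(123456789, 9).hex())   # 123456789c
print(pack_comp3(-9876, 9).hex())       # 000009876d
```

Raising on overflow rather than truncating is a design choice made for this sketch; a real COBOL program may behave differently when a value exceeds its field.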

To get from a packed field in N bytes to the number of digits it can hold, the rule is:

  1. Multiply the number of bytes by 2.
  2. Subtract one for the sign.

Thus 5 bytes times 2 is 10, subtract one for sign, gives a max of 9 digits.
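
The reverse rule is a one-liner; a minimal sketch, with max_digits as a hypothetical name:

```python
def max_digits(nbytes):
    """Most decimal digits a packed-decimal field of `nbytes` bytes can hold."""
    # Two half bytes per byte, minus the one used for the sign.
    return nbytes * 2 - 1


for nbytes in (3, 5):
    print(nbytes, max_digits(nbytes))
# 3 5
# 5 9
```

Because the result is always odd, a field declared with an even digit count (such as PIC 9(4) in 3 bytes) has room for one more digit than its PICture clause allows.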


UdoWeike commented Jul 7, 2022

It works like a charm and saved me a lot of time.
Thanks!
