
@nlitsme
Last active December 20, 2022 23:16
perl script for decoding raw hex sms PDU data
#!perl -w
# by Willem Hengeveld <itsme@xs4all.nl>
# license: http://en.wikipedia.org/wiki/Beerware
use strict;
$|=1;
# this script decodes raw SMS PDUs, as used with the
# +CMT unsolicited result code and the AT+CMGR and AT+CMGS commands.
# It parses a hex string from the commandline, or hex strings from stdin.
# Prefix a string with '<' for an incoming or '>' for an outgoing message.
# 27.005 - Use of Data Terminal Equipment - Data Circuit terminating Equipment (DTE-DCE) interface for Short Message Service (SMS) and Cell Broadcast Service (CBS)
# describes the AT commands involved in sending/receiving SMS messages
# 23.040 - Technical realization of Short Message Service (SMS)
# describes the encoding of SMS messages
# 23.038 - Alphabets and language-specific information
# describes the data coding schemes
#
# todo: add support for decoding cell broadcast message PDUs -> 23.041
#
# ms = mobilestation
# SC = servicecenter
my @pdutypes= (
# ms->SC | SC->ms
[ 'deliver-report', 'deliver' ], # 0
[ 'submit', 'submit-report' ], # 1
[ 'command', 'status-report' ], # 2
[ 'unknown-out', 'unknown-in' ], # 3
);
# RP = Reply-Path
# H = UDHI - userdata header indicator
# SRI = status report indicator
# SRR = status report requested
# MMS = more messages to send
# RD = reject duplicates
# VPF = validity period format
# MR = message reference
# OA = origination address
# DA = destination address
# RA = recipient address
# VP = validity period
# PID = protocol id
# DCS = data coding scheme
# SCTS = sc timestamp
# DT = discharge time
# ST = status
# SRQ = status report qualifier
# UDL = userdata length
# UD = user data
# FCS = failure cause -- encoded as 'i'
# PI = parameter indicator : bitmask
# CT = command type
# MN = message number
# CDL = command data length
# CD = command data
my %typeinfo= ( # 7 6 5 4 3 2 1 0 |
'deliver-report'=> ",FCS,PI,PID,DCS,UDL,UD,", # - H - - - MTI | FCS, PI, PID, DCS, UDL, UD
'deliver'=> ",OA,PID,DCS,SCTS,UDL,UD,", # RP H SRI - MMS MTI | OA, PID, DCS, SCTS, UDL, UD
'submit'=> ",MR,DA,PID,DCS,VP,UDL,UD,", # RP H SRR VPF RD MTI | MR, DA, PID, DCS, VP, UDL, UD
'submit-report'=> ",FCS,PI,SCTS,PID,DCS,UDL,UD,", # - H - - - MTI | FCS, PI, SCTS, PID, DCS, UDL, UD
'status-report'=> ",MR,RA,SCTS,DT,ST,PI,PID,DCS,UDL,UD,",# - H SRQ - MMS MTI | MR, RA, SCTS, DT, ST, PI, PID, DCS, UDL, UD
'command'=> ",MR,PID,CT,MN,DA,CDL,CD,", # - H SRR - - MTI | MR, PID, CT, MN, DA, CDL, CD
);
sub hasfield {
my ($pdutype, $field)= @_;
return $typeinfo{$pdutype} =~ /,$field,/i;
}
my @numtypes= ( 'unknown', 'international', 'national', 'network', 'subscriber', 'alpha', 'abbrev', 'reserved' );
my @plantypes= ( 'Unknown', 'ISDN_e164', 'undef2', 'Data_x121', 'Telex', 'SCspec5', 'SCspec6', 'undef7', 'National', 'Private', 'ERMES', 'undefb', 'undefc', 'undefd', 'undefe', 'Reserved');
# TP-PID protocol identifier
# bit7,6 == 00, bit5=0 : sme-to-sme protocol
# bit7,6 == 00, bit5=1 : telematic interworking
# 00000 implicit - device type is specific to this SC, or can be concluded on the basis of the address
# 00001 telex (or teletex reduced to telex format)
# 00010 group 3 telefax
# 00011 group 4 telefax
# 00100 voice telephone (i.e. conversion to speech)
# 00101 ERMES (European Radio Messaging System)
# 00110 National Paging system (known to the SC)
# 00111 Videotex (T.100 [20] /T.101 [21])
# 01000 teletex, carrier unspecified
# 01001 teletex, in PSPDN
# 01010 teletex, in CSPDN
# 01011 teletex, in analog PSTN
# 01100 teletex, in digital ISDN
# 01101 UCI (Universal Computer Interface, ETSI DE/PS 3 01-3)
# 01110..01111 (reserved, 2 combinations)
# 10000 a message handling facility (known to the SC)
# 10001 any public X.400-based message handling system
# 10010 Internet Electronic Mail
# 10011..10111 (reserved, 5 combinations)
# 11000..11110 values specific to each SC, usage based on mutual agreement between the SME and the SC (7 combinations available for each SC)
# 11111 A GSM/UMTS mobile station. The SC converts the SM from the received TP-DCS to any data coding scheme supported by the MS ( default )
# bit7,6=01
# 000000 Short Message Type 0
# 000001 Replace Short Message Type 1
# 000010 Replace Short Message Type 2
# 000011 Replace Short Message Type 3
# 000100 Replace Short Message Type 4
# 000101 Replace Short Message Type 5
# 000110 Replace Short Message Type 6
# 000111 Replace Short Message Type 7
# 001000..011101 Reserved
# 011110 Enhanced Message Service (Obsolete)
# 011111 Return Call Message
# 100000..111011 Reserved
# 111100 ANSI-136 R-DATA
# 111101 ME Data download
# 111110 ME De-personalization Short Message
# 111111 (U)SIM Data download
# these sections are in gsm standard 27.005 (PDU mode)
# section 4.3: AT+CMGS=<length><CR>PDUDATA<CTRLZ>
# <length> counts the TPDU octets, excluding the smsc address
# section 4.2: AT+CMGR=<index><CR>
# -> +CMGR: <stat>,[alpha],<length><CRLF>pdu
# +CMT: [<alpha>],<length><CRLF>pdu
#
# http://www.computer.org/portal/site/computer/menuitem.5d61c1d591162e4b0ef1bd108bcd45f3/index.jsp?&pName=computer_level1_article&TheCat=1055&path=computer/homepage/Dec07&file=howthings.xml&xsl=article.xsl&
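# examples of the type-of-address byte:
# 91 type 1 - international, plan 1 (ISDN)
# a1 type 2 - national, plan 1 (ISDN)
# d0 type 5 - alphanumeric (GSM 7-bit default alphabet), plan 0
# 01 type 0 - unknown, plan 1, bit 7 clear
# 81 type 0 - unknown, plan 1 (ISDN)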
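# struct address {
# char nrofdigits; // TP-OA/TP-DA: number of digits; smsc address: number of octets incl. the type byte
# struct {
# int onebit:1; // bit 7, the extension bit, normally 1
# int numbertype:3; // bits 6..4
# int numberingplan:4; // bits 3..0
# } type;
# char value[ceil(nrofdigits/2)]; // BCD digits, nibbles swapped, odd count padded with f
# }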
# numbertype
# 0 Unknown
# 1 International number
# 2 National number
# 3 Network specific number
# 4 Subscriber number
# 5 Alphanumeric, (coded according to 3GPP TS 23.038 [9] GSM 7-bit default alphabet)
# 6 Abbreviated number
# 7 Reserved for extension
#
# numberingplan
# 0 Unknown
# 1 ISDN/telephone numbering plan (E.164 [17]/E.163[18])
# 2
# 3 Data numbering plan (X.121)
# 4 Telex numbering plan
# 5 Service Centre Specific plan
# 6 Service Centre Specific plan
# 7
# 8 National numbering plan
# 9 Private numbering plan
# a ERMES numbering plan (ETSI DE/PS 3 01-3)
# b
# c
# d
# e
# f Reserved for extension
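# worked example (illustrative number, not from a real capture):
#   0b 91 51 55 21 43 65 f7
#   0b       11 digits
#   91       bit 7 set, numbertype 1 (international), numberingplan 1 (ISDN/E.164)
#   51..f7   swap the nibbles of each octet -> 15 55 12 34 56 7f -> "15551234567",
#            the trailing f pads the odd digit count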
# SMS-SUBMIT layout (byte offsets):
# 00 TP-MTI, TP-RD, TP-VPF, TP-SRR, TP-UDHI, TP-RP
# 01 TP-MR
# 02 TP-DA
# .. TP-PID
#    TP-DCS
#    TP-VP
#    TP-UDL
#    TP-UD
# SMS-DELIVER layout:
# 00 TP-MTI, TP-MMS, TP-SRI, TP-UDHI, TP-RP
# 01 TP-OA
# .. TP-PID, TP-DCS, TP-SCTS, TP-UDL, TP-UD
# These section numbers refer to gsm standard 23.040
# 9.2.3.1 - TP-Message-Type-Indicator (TP-MTI)
# bits 1,0 of byte0 of all PDUs
# xmit | recv
# 0 DELIVER-REPORT | DELIVER
# 1 SUBMIT | SUBMIT-REPORT
# 2 COMMAND | STATUS-REPORT
# 3 - | -
# 9.2.3.2 - TP-More-Messages-to-Send (TP-MMS)
# bit 2 of byte0 of SMS-DELIVER and SMS-STATUS-REPORT
# 0 More messages are waiting for the MS in this SC
# 1 No more messages are waiting for the MS in this SC
# 9.2.3.25 TP-Reject-Duplicates (TP-RD)
# bit 2 of byte0 of SMS-SUBMIT
# 0 Instruct the SC to accept an SMS-SUBMIT for an SM still held in the SC which has the same TP-MR and the same TP-DA as a previously submitted SM from the same OA.
# 1 Instruct the SC to reject an SMS-SUBMIT for an SM still held in the SC which has the same TP-MR and the same TP-DA as the previously submitted SM from the same OA.
# 9.2.3.3 - TP-Validity-Period-Format (TP-VPF)
# bit 4,3 of byte0 of SMS-SUBMIT
# 0 TP-VP field not present (bits 4,3 = 00)
# 1 TP-VP field present - enhanced format (bits 4,3 = 01)
# 2 TP-VP field present - relative format (bits 4,3 = 10)
# 3 TP-VP field present - absolute format (bits 4,3 = 11)
# 9.2.3.4 - TP-Status-Report-Indication (TP-SRI)
# bit 5 of byte0 of SMS-DELIVER
# 0 A status report shall not be returned to the SME
# 1 A status report shall be returned to the SME
# 9.2.3.26 - TP-Status-Report-Qualifier (TP-SRQ)
# bit 5 of byte0 of SMS-STATUS-REPORT
# 0 The SMS-STATUS-REPORT is the result of an SMS-SUBMIT.
# 1 The SMS-STATUS-REPORT is the result of an SMS-COMMAND
# 9.2.3.5 - TP-Status-Report-Request (TP-SRR)
# bit 5 of byte0 of SMS-SUBMIT, SMS-COMMAND
# 0 A status report is not requested
# 1 A status report is requested
# 9.2.3.23 - TP-User-Data-Header-Indicator (TP-UDHI)
# bit 6 of byte0 of all PDUs
# 0 The TP-UD field contains only the short message
# 1 The beginning of the TP-UD field contains a Header in addition to the short message.
# 9.2.3.17 - TP-Reply-Path (TP-RP)
# bit 7 of byte0 of SMS-DELIVER and SMS-SUBMIT
# 0 TP-Reply-Path parameter is not set in this SMS-SUBMIT/DELIVER
# 1 TP-Reply-Path parameter is set in this SMS-SUBMIT/DELIVER
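# worked example: byte0 = 0x44 (binary 0100 0100) in an incoming PDU
#   bits 1,0 = 00 -> SMS-DELIVER
#   bit 2    = 1  -> TP-MMS: no more messages are waiting in the SC
#   bit 6    = 1  -> TP-UDHI: TP-UD starts with a user data header
#   bits 5,7 = 0  -> no status report indication, no reply path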
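if (@ARGV) {
# decode the first commandline argument
my $x= shift;
if ($x =~ /^\s*([<>])?\s*([0-9a-fA-F]+)\s*$/) {
decodesms($1 ? ($1 eq "<" ? 1 : 0) : undef, $2);
}
}
else {
# read all input first, so the fallback pass below can scan it again
my @lines= <>;
my $n=0;
for (@lines) {
# pass 1: lines holding just a hex pdu, optionally prefixed with '<' or '>'
if (/^\s*([<>])?\s*([0-9a-fA-F]+)\s*$/) {
decodesms($1 ? ($1 eq "<" ? 1 : 0) : undef, $2);
$n++;
}
}
if ($n==0) {
# pass 2: treat the input as a log; decode '<'/'>' prefixed pdus, echo everything else
for (@lines) {
if (/^([<>])(\w{8,})/ || /"([<>])(\w+)"/) {
print "decoding $_";
decodesms($1 eq "<" ? 1 : 0, $2);
}
else {
print $_;
}
}
}
}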
my %msgtypename=(
0=> {0=>'DELIVER-REPORT', 1=>'DELIVER'},
1=> {0=>'SUBMIT', 1=>'SUBMIT-REPORT'},
2=> {0=>'COMMAND', 1=>'STATUS-REPORT'},
3=> {0=>'-', 1=>'-'},
);
sub decodesms {
my ($dir, $smshex)=@_;
my $data= pack("H*", $smshex);
my $ofs= 0;
my $smsclen= unpack("C", substr($data, $ofs++, 1));
my $smscdata= substr($data, $ofs, $smsclen);
$ofs+=$smsclen;
printf("smsc: %s\n", decode_address($smscdata));
# if direction not known, assume it is outgoing when the smsc length == 0
my $incoming= defined $dir ? $dir : $smsclen!=0;
my $pduhdr= unpack("C", substr($data, $ofs++, 1));
my $mti= ($pduhdr&3);
my $pdutype= $pdutypes[$mti][$incoming];
printf("%s %s\n", $pdutype, $mti?"done":"");
if (hasfield($pdutype, "MR")) {
# message reference
my $mr= unpack("C", substr($data, $ofs++, 1));
printf("MR: %02x\n", $mr);
}
if (hasfield($pdutype, "DA") || hasfield($pdutype, "OA")) {
# destination/originating address
my $srclen= unpack("C", substr($data, $ofs++, 1));
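# the length counts digits: the field is 1 type octet + ceil(digits/2) octets
my $addrlen= 1 + int(($srclen+1)/2);
my $srcnum= substr($data, $ofs, $addrlen);
$ofs += $addrlen;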
printf("%s: %s\n", hasfield($pdutype, "DA")?"DA":hasfield($pdutype, "OA")?"OA":"??",
decode_address($srcnum));
}
if (hasfield($pdutype, "PID")) {
# protocol id
# todo
my $protocol= unpack("C", substr($data, $ofs++, 1));
printf("prot: %02x\n", $protocol);
# 0x00
# 0x0b
# 0x0d
# 0x10
}
my $dcs;
if (hasfield($pdutype, "DCS")) {
# data coding scheme
# todo , see 23.038
$dcs= unpack("C", substr($data, $ofs++, 1));
printf("dcs: %02x\n", $dcs);
}
if (hasfield($pdutype, "VP") && ($pduhdr&0x18)) {
# validity period
my $vpf= ($pduhdr&0x18)>>3;
# TP-VPF, bits 4,3: 00 absent, 10 relative, 01 enhanced, 11 absolute
my @fmt=qw(- enh rel abs);
# rel: 1 byte
# abs: 7 bytes
# enh: 7 bytes
if ($vpf==2) {
my $vp= unpack("C", substr($data, $ofs++, 1));
printf("VP: %s %d\n", $fmt[$vpf], $vp);
}
else {
my $vp= substr($data, $ofs, 7);
$ofs+=7;
printf("VP: %s %s\n", $fmt[$vpf], unpack 'H*',$vp);
}
}
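# relative TP-VP values map to a duration (23.040 9.2.3.12.1):
#   0..143     (VP+1) * 5 minutes
#   144..167   12 hours + (VP-143) * 30 minutes
#   168..196   (VP-166) days
#   197..255   (VP-192) weeks
# e.g. VP=0xaa (170) -> 4 days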
if (hasfield($pdutype, "SCTS")) {
# sc timestamp
my $scts= unpack 'H*', unpack("a7", substr($data, $ofs, 7));
$ofs+=7;
$scts =~ s/(\w)(\w)/$2$1/g;
printf("scts: %s\n", $scts);
}
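# scts is printed as yyMMddhhmmsszz; an illustrative raw value
#   22 80 12 31 54 03 80 -> "22082113453008" = 2022-08-21 13:45:30, zz = 08 quarter hours (+2:00)
# caveat: bit 3 of the first timezone digit is the sign, so a negative offset
#   shows up here as a first zz digit of 8 or above (e.g. "88" = -8 quarters)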
my $udl;
if (hasfield($pdutype, "UDL")) {
# userdata length: in septets for 7-bit data, in octets for 8-bit and UCS2 data
$udl= unpack("C", substr($data, $ofs++, 1));
}
if (hasfield($pdutype, "UD")) {
# user data
# $dcs&0x0c == 00 : 7bit
# $dcs&0x0c == 04 : 8bit
# $dcs&0x0c == 08 : ucs2
if ($dcs&0x20) {
printf("compressed[%02x]: %s\n", $udl, unpack("H*", substr($data, $ofs)));
}
elsif (($dcs&0x0c)==0) {
# 7 bit data
my $nroctets= int(($udl*7+7)/8); # ceil(udl*7/8) octets hold udl septets
my $ud= substr($data, $ofs, $nroctets);
$ofs += $nroctets;
my $uofs= 0;
if ($pduhdr&0x40) {
my $udhilen= unpack("C", substr($ud, 0, 1));
printf("udhi: %s\n", unpack("H*", substr($ud, 0, $udhilen+1)));
$uofs+= $udhilen+1;
}
my @msg= sms7to8bit($ud);
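# the header plus its fill bits occupy ceil(uofs*8/7) septets
# (an escape sequence inside the header or message shifts these indices)
my $skip7= int(($uofs*8+6)/7);
printf("msg: '%s'\n", join "", grep { defined } @msg[$skip7..$udl-1]);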
}
elsif (($dcs&0x0c)==0x04) {
# 8 bit data
my $ud= substr($data, $ofs, $udl);
$ofs += $udl; # for 8-bit data, udl counts octets
printf("msg: '%s'\n", $ud);
}
elsif (($dcs&0x0c)==0x08) {
# ucs2 data
my $ud= substr($data, $ofs, $udl);
$ofs += $udl;
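# n* reads UCS2 code units; surrogate pairs are not combined
my $str= pack("U*", unpack("n*", $ud));
utf8::encode($str); # print UTF-8 bytes instead of wide characters
printf("msg: U'%s'\n", $str);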
}
# 0x91
# 0xd0
else {
printf("msg: unknown encoding: %s\n", unpack("H*", substr($data, $ofs)));
}
}
printf("leftover: %d\n", length($data)-$ofs);
}
sub decode_address {
return "-" if ($_[0] eq "");
my ($typebyte, $addr)= unpack("Ca*", $_[0]);
my ($one, $type, $plan)=(($typebyte>>7)&1, ($typebyte>>4)&7, $typebyte&0xf);
return sprintf("%d.%s.%s:%s", $one, $numtypes[$type], $plantypes[$plan], decodenumber($addr, $type))
}
sub sms7to8bit {
my @xlat= (
"@", "£", "\$", "¥", "è", "é", "ù", "ì", "ò", "Ç", "\n", "Ø", "ø", "\r", "Å", "å",
"∆", "_", "Φ", "Γ", "Λ", "Ω", "Π", "Ψ", "Σ", "Θ", "Ξ", "\x1b", "Æ", "æ", "ß", "É",
"\x20", "!", "\x22", "#", "¤", "%", "&", "\x27", "(", ")", "*", "+", ",", "-", ".", "/",
"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?",
"¡", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
"P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "Ä", "Ö", "Ñ", "Ü", "§",
"¿", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "ä", "ö", "ñ", "ü", "à",
);
# \x03 : pagebreak
# \x1b : reserved for more extensions
my @xlat2= (
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", "\x03", " ", " ", " ", " ", " ",
" ", " ", " ", " ", "^", " ", " ", " ", " ", " ", " ", "\x1b", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", "{", "}", " ", " ", " ", " ", " ", "\\",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", "[", "~", "]", " ",
"|", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", "€", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
);
# printf("xx-%s\n", unpack("H*", $_[0]));
my $bits= unpack("b*", $_[0]);
my $esc;
my @str;
for (my $bit=0 ; $bit+7<=length($bits) ; $bit+=7) { # full septets only; a trailing partial septet is padding
my $c= unpack "C", pack("b*", substr($bits, $bit, 7));
if ($esc) {
$esc--;
push @str, $xlat2[$c];
}
elsif ($c==0x1b) {
$esc++;
}
else {
push @str, $xlat[$c];
}
}
return @str;
}
sub decodenumber {
my ($numdata, $type)= @_;
if ($type==5) {
my $str= join "", sms7to8bit($numdata);
$str =~ s/\x00$//;
return $str;
}
else {
(my $nr= unpack 'H*', $numdata) =~ s/(\w)(\w)/$2$1/g;
$nr =~ s/f$//i; # an odd digit count is padded with a trailing f
return $nr;
}
}
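Note on the 7-bit path above: sms7to8bit() maps every septet of the user data to a character, and decodesms() then slices that character list by septet index, so an escape sequence (0x1b plus one more septet) shifts everything after it. One way around this is to unpack the raw septet values first and do the character mapping as a separate step. A minimal standalone sketch under that assumption; the sub name `unpack_septets` is hypothetical and not part of the script:

```perl
use strict;
use warnings;

# Unpack GSM 7-bit packed octets into raw septet values (0..127).
# $skip septets are dropped first (e.g. a user data header plus its fill bits),
# then at most $count septets are returned, so TP-UDL can be honoured exactly.
sub unpack_septets {
    my ($octets, $skip, $count) = @_;
    my $bits = unpack("b*", $octets);    # LSB-first bit string, as in sms7to8bit()
    my @septets;
    for my $i ($skip .. $skip + $count - 1) {
        last if ($i + 1) * 7 > length($bits);    # stop at the first partial septet
        push @septets, unpack("C", pack("b*", substr($bits, $i * 7, 7)));
    }
    return @septets;
}

# "hello" packed into 5 octets
my @septets = unpack_septets(pack("H*", "e8329bfd06"), 0, 5);
print join(" ", map { sprintf("%02x", $_) } @septets), "\n";   # 68 65 6c 6c 6f
print join("", map { chr } @septets), "\n";                     # hello
```

Mapping the returned values through tables like @xlat and @xlat2 (with 0x1b selecting the extension table) would then give the text. The example can use chr() only because lowercase letters happen to sit at their ASCII positions in the GSM default alphabet.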
@daniel-santos

daniel-santos commented Aug 20, 2022

Holy crap, this is intense! So I'll give major kudos to the designers for a sweetly compact and versatile protocol, but crap grades for the way they write their spec docs -- they keep referring to previous versions of the spec for definitions. And while this keeps each new revision small and compact, it's hell to try to decipher if it's your first time in this world.

@mwarning Your helpful comments reference section numbers, but don't say which GSM spec version they are in relation to. Could you add that info? Thank you for sharing this!

Unfortunately, I'm working with iridium SMS PDUs, which claim to be formatted as "GSM 04.11 SC address followed by GSM 03.40 TPDU in hexadecimal format." I don't really understand this because, if I'm looking in the right place, §8.2.5.1 of the GSM 04.11 spec claims that the first byte should have 1 unused bit and 7 bits of "RP-Originator Address IEI", when it clearly doesn't -- the first byte is the length of the OA. So when I use your parser, the address type and plan type come up as "unknown" (instead of blowing up), as that value always seems to be 0x00.
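For reference, decode_address() splits the type-of-address octet into an extension bit (bit 7), a type of number (bits 6..4) and a numbering plan (bits 3..0), so an octet of 0x00 comes out as extension bit 0, type unknown, plan unknown, which matches what the parser printed. 23.040 expects bit 7 of a genuine type-of-address octet to be set, so a constant 0x00 is at least unusual in that position. A standalone sketch of the same bit split; the sub name `type_of_address` is hypothetical:

```perl
use strict;
use warnings;

# Same names as @numtypes in the script.
my @numtypes = qw(unknown international national network subscriber alpha abbrev reserved);

# Split a type-of-address octet the way decode_address() does:
# bit 7 = extension bit, bits 6..4 = type of number, bits 3..0 = numbering plan.
sub type_of_address {
    my ($octet) = @_;
    return (($octet >> 7) & 1, $numtypes[($octet >> 4) & 7], $octet & 0xf);
}

for my $octet (0x00, 0x81, 0x91, 0xd0) {
    my ($ext, $type, $plan) = type_of_address($octet);
    printf("%02x: ext=%d type=%s plan=%d\n", $octet, $ext, $type, $plan);
}
# 00: ext=0 type=unknown plan=0
# 81: ext=1 type=unknown plan=1
# 91: ext=1 type=international plan=1
# d0: ext=1 type=alpha plan=0
```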


Anyway, I'm considering just ignoring this and moving on. I don't think I really need these two fields anyway? I'll find out soon enough.

@mwarning

@daniel-santos I do not remember my involvement here. It must have been a long time ago.

@daniel-santos

@mwarning lol, my apologies! I grabbed the wrong user name! 😊

@daniel-santos

@nlitsme I had intended this one for you: Your helpful comments reference section numbers, but don't say which GSM spec version they are in relation to. Could you add that info? Thank you for sharing this!

@nlitsme
Author

nlitsme commented Aug 21, 2022

I added them. It was a long time ago that I wrote this script, around 2008 I think, back when I still used Perl.

@daniel-santos

@nlitsme OK, thank you. This does indeed appear to refer to the GSM 03.40 specification. I'm not a major Perl fan myself, but I've inherited some Perl code. I could do this part in an external executable, but there are no libs that handle Iridium's strange format "off the shelf."
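If the external-executable route is taken, the first step is separating the SC address from the TPDU. A minimal Perl sketch, assuming the Iridium data starts with a one-octet SC-address length counted in octets (the same layout decodesms() reads); `split_sca_tpdu` and the sample hex are illustrative, not taken from Iridium documentation:

```perl
use strict;
use warnings;

# Split hex "SC address + TPDU" data into its two halves, assuming the SC
# address starts with a one-octet length counted in octets (including the
# type-of-address octet), the layout decodesms() reads.
sub split_sca_tpdu {
    my ($hex) = @_;
    my $data = pack("H*", $hex);
    die "empty input\n" if $data eq "";
    my $len = unpack("C", $data);              # SC address length in octets
    die "truncated SC address\n" if length($data) < 1 + $len;
    my $sca  = substr($data, 1, $len);         # type-of-address + BCD digits
    my $tpdu = substr($data, 1 + $len);        # the rest is the TPDU
    return (unpack("H*", $sca), unpack("H*", $tpdu));
}

# illustrative input: SC address 07 91 5155214365f7, then an SMS-SUBMIT TPDU
my ($sca, $tpdu) = split_sca_tpdu("07915155214365f711000b915155214365f70000aa00");
print "SCA:  $sca\nTPDU: $tpdu\n";
# SCA:  915155214365f7
# TPDU: 11000b915155214365f70000aa00
```

The TPDU half can be fed back to decodesms() with an empty SC address prepended ("00" . $tpdu). Pass the direction explicitly in that case, because decodesms() treats an empty SC address as an outgoing PDU when no direction is given.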

@nlitsme
Author

nlitsme commented Aug 23, 2022

GSM 03.40 is the 'old' (1996) number of the newer 'GSM 23.040' (since about 2000) standard, which is also known as 'ETSI TS 123 040'
