tbrowder/proposed-sprintf-doc.txt

=begin pod

=TITLE class Str

=SUBTITLE String of characters

=head2 sub sprintf

=comment Features are still being tested for implementation, so some
of the following features may be removed

The routines in the C<sprintf()> family produce output according to a
format as described below.  Routine C<printf()> writes output to
stdout, the standard output stream, and C<sprintf()> returns its
output as a character string.

These routines produce their output under the control of a format
string that specifies how subsequent arguments are converted for
output.  Neither routine appends a newline, so the user must include
one explicitly in the format string wherever it is wanted.
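
As a minimal sketch of the difference between the two routines (the
strings and values are illustrative only, and the behavior shown
assumes an implementation that supports these basic conversions):

    my $line = sprintf '%s has %d items', 'cart', 3;   # returned, not printed
    say $line;                                # say supplies the newline: cart has 3 items
    printf "%s has %d items\n", 'cart', 3;    # printf needs the explicit \n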

The following features generally follow POSIX descriptions as used in
Perl 5 and C.  Not all features found in Perl 5 or POSIX are
implemented: some are unnecessary because there are no multiple-width
numbers, and others were not wanted for other reasons, such as the
lack of pointers.

=head3 Format of the format string

The format string is a character string (single or double quoted)
composed of zero or more directives: ordinary characters (not %),
which are copied unchanged to the output stream; and conversion
specifications, each of which results in fetching zero or more
subsequent arguments.  Each conversion specification is introduced by
the character %, and ends with a conversion specifier.  In between
there may be (in this order): zero or more flags, an optional minimum
field width, and an optional precision.

The arguments must correspond properly with the conversion specifier.
By default, the arguments are used in the order given, where each '*'
and each conversion specifier asks for the next argument (and it is an
error if insufficiently many arguments are given).  One can also
specify explicitly which argument is taken, at each place where an
argument is required, by writing "%m$" instead of '%' and "*m$"
instead of '*', where the decimal integer m denotes the position in
the argument list of the desired argument, indexed starting from one.
Thus,

    printf "%*d\n", $width, $num;

and

    printf "%2\$*1\$d\n", $width, $num;

are equivalent.  The second style allows repeated references to the
same argument.  If the style using '$' is used, it must be used
throughout for all conversions taking an argument and all width and
precision arguments, but it may be mixed with "%%" formats which do
not consume an argument.  There may be no gaps in the numbers of
arguments specified using '$'; for example, if arguments 1 and 3 are
specified, argument 2 must also be specified somewhere in the format
string. The '$' must be escaped if the format string is double quoted
(as illustrated in the last example).
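
The following sketch shows explicit argument indexes on conversion
specifiers only, assuming an implementation that supports the "%m$"
form.  Single-quoted format strings are used so that the '$' needs no
escaping; the variable names are illustrative.

    my $swapped  = sprintf '%2$s %1$s', 'world', 'hello';
    my $repeated = sprintf '%1$s-%1$s-%2$s', 'a', 'b';
    say $swapped;     # hello world
    say $repeated;    # a-a-b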

For some numeric conversions a radix character ("decimal point") or
thousands' grouping character is used.  The actual character used
depends on the LC_NUMERIC part of the locale.  The POSIX locale uses
'.' as the radix character and does not have a grouping character.
Thus,

        printf "%'.2f\n", 1234567.89;

results in "1234567.89" in the POSIX locale, in "1234567,89" in the
nl_NL locale, and in "1.234.567,89" in the da_DK locale.

=head3 The flag characters

The character % is followed by zero or more of the following flags:

    #     The value should be converted to an "alternate form".
          For o conversions, the first character of the output
          string is made zero (by prefixing a 0 if it was not zero
          already).  For x and X conversions, a nonzero result has
          the string "0x" (or "0X" for X conversions) prepended to
          it.  For a, A, e, E, f, F, g, and G conversions, the
          result will always contain a decimal point, even if no
          digits follow it (normally, a decimal point appears in
          the results of those conversions only if a digit
          follows).  For g and G conversions, trailing zeros are
          not removed from the result as they would otherwise be.
          For other conversions, the result is undefined.

    0     The value should be zero padded.  For d, i, o,
          u, x, X, a, A, e, E, f, F, g, and G conversions, the
          converted value is padded on the left with zeros rather
          than blanks.  If the 0 and - flags both appear, the 0
          flag is ignored.  If a precision is given with a numeric
          conversion (d, i, o, u, x, and X), the 0 flag is ignored.
          For other conversions, the behavior is undefined.

    -     The converted value is to be left adjusted on the field
          boundary.  (The default is right justification.)  The
          converted value is padded on the right with blanks,
          rather than on the left with blanks or zeros.  A -
          overrides a 0 if both are given.

    ' '   (a space) A blank should be left before a positive number
          (or empty string) produced by a signed conversion.

    +     A sign (+ or -) should always be placed before a
          number produced by a signed conversion.  By default a
          sign is used only for negative numbers.  A + overrides
          a space if both are used.

    '     For decimal conversion (i, d, u, f, F, g, G) the output
          is to be grouped with thousands' grouping characters if
          the locale information indicates any.  Note that many
          versions of gcc(1) cannot parse this option and will
          issue a warning.

    I     For decimal integer conversion (i, d, u) the output uses
          the locale's alternative output digits, if any.  For
          example, since glibc 2.2.3 this will give Arabic-Indic
          digits in the Persian ("fa_IR") locale.
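
A few of the flags above can be sketched as follows.  Only the #, 0,
-, + and space flags are shown, on the assumption that they are
implemented as described; the locale-dependent ' and I flags are left
out because their availability is not certain.

    say sprintf '%#x',    255;   # 0xff
    say sprintf '%#o',    8;     # 010
    say sprintf '%05d',   42;    # 00042
    say sprintf '<%-5d>', 42;    # <42   >
    say sprintf '%+d',    42;    # +42
    say sprintf '<% d>',  42;    # < 42>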

=head3 The field width

An optional decimal digit string (with nonzero first digit) specifying
a minimum field width.  If the converted value has fewer characters
than the field width, it will be padded with spaces on the left (or
right, if the left-adjustment flag has been given).  Instead of a
decimal digit string one may write "*" or "*m$" (for some decimal
integer m) to specify that the field width is given in the next
argument, or in the m-th argument, respectively, which must be of type
int.  A negative field width is taken as a '-' flag followed by a
positive field width.  In no case does a nonexistent or small field
width cause truncation of a field; if the result of a conversion is
wider than the field width, the field is expanded to contain the
conversion result.
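
The field width rules can be illustrated with a short sketch (values
are illustrative; the "*" form is assumed to be supported):

    say sprintf '<%6d>',  42;         # <    42>
    say sprintf '<%-6d>', 42;         # <42    >
    say sprintf '<%*d>',  6, 42;      # <    42>  (width taken from the arguments)
    say sprintf '<%2s>',  'abcdef';   # <abcdef>  (expanded, never truncated)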

=head3 The precision

An optional precision, in the form of a period ('.')  followed by an
optional decimal digit string.  Instead of a decimal digit string one
may write "*" or "*m$" (for some decimal integer m) to specify that
the precision is given in the next argument, or in the m-th argument,
respectively, which must be of type int.  If the precision is given as
just '.', the precision is taken to be zero.  A negative precision is
taken as if the precision were omitted.  This gives the minimum number
of digits to appear for d, i, o, u, x, and X conversions, the number
of digits to appear after the radix character for a, A, e, E, f, and F
conversions, the maximum number of significant digits for g and G
conversions, or the maximum number of characters to be printed from a
string for s conversions.
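
As a sketch of how the precision affects different conversions
(values are illustrative; the ".*" form is assumed to be supported):

    say sprintf '%.3d', 7;           # 007
    say sprintf '%.2f', 3.14159;     # 3.14
    say sprintf '%.2e', 1234.5;      # 1.23e+03
    say sprintf '%.3s', 'abcdef';    # abc
    say sprintf '%.*f', 3, 2.5;      # 2.500  (precision taken from the arguments)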

=head3 The conversion specifier

A character that specifies the type of conversion to be applied.  The
conversion specifiers and their meanings are:

  d, i        The int argument is converted to signed decimal
              notation.  The precision, if any, gives the minimum
              number of digits that must appear; if the converted
              value requires fewer digits, it is padded on the left
              with zeros.  The default precision is 1.  When 0 is
              printed with an explicit precision 0, the output is
              empty.

  o, u, x, X  The unsigned int argument is converted to unsigned
              octal (o), unsigned decimal (u), or unsigned
              hexadecimal (x and X) notation.  The letters abcdef
              are used for x conversions; the letters ABCDEF are
              used for X conversions.  The precision, if any,
              gives the minimum number of digits that must appear;
              if the converted value requires fewer digits, it is
              padded on the left with zeros.  The default precision
              is 1.  When 0 is printed with an explicit precision
              0, the output is empty.

  e, E        The double argument is rounded and converted in the
              style [-]d.ddde±dd where there is one digit before
              the decimal-point character and the number of digits
              after it is equal to the precision; if the precision
              is missing, it is taken as 6; if the precision is
              zero, no decimal-point character appears.  An E
              conversion uses the letter E (rather than e) to
              introduce the exponent.  The exponent always contains
              at least two digits; if the value is zero, the
              exponent is 00.

  f, F        The double argument is rounded and converted to
              decimal notation in the style [-]ddd.ddd, where
              the number of digits after the decimal-point
              character is equal to the precision
              specification.  If the precision is missing, it
              is taken as 6; if the precision is explicitly
              zero, no decimal-point character appears.  If a
              decimal point appears, at least one digit appears
              before it.

              Character string representations for infinity and
              NaN are "[-]infinity" for infinity, and a string
              starting with "nan" for NaN, in the case of f
              conversion, and "[-]INFINITY" or "NAN*" in the
              case of F conversion.

  g, G        The double argument is converted in style f or e (or
              F or E for G conversions).  The precision specifies
              the number of significant digits.  If the precision
              is missing, 6 digits are given; if the precision is
              zero, it is treated as 1.  Style e is used if the
              exponent from its conversion is less than -4 or
              greater than or equal to the precision.  Trailing
              zeros are removed from the fractional part of the
              result; a decimal point appears only if it is
              followed by at least one digit.

  a, A        For a conversion, the double argument is converted to
              hexadecimal notation (using the letters abcdef) in
              the style [-]0xh.hhhhp±d; for A conversion the prefix
              0X, the letters ABCDEF, and the exponent separator P
              are used.  There is one hexadecimal digit before the
              decimal point, and the number of digits after it is
              equal to the precision.  The default precision
              suffices for an exact representation of the value
              if an exact representation in base 2 exists and
              otherwise is sufficiently large to distinguish
              values of type double.  The digit before the decimal
              point is unspecified for non-normalized numbers, and
              nonzero but otherwise unspecified for normalized
              numbers.

  c           The int argument is converted to an unsigned char,
              and the resulting character is written.

  s           The argument is expected to be a character string. If
              a precision is specified, no more characters than the
              number specified are written.

  %           A '%' is written.  No argument is converted.  The
              complete conversion specification is '%%'.
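
A brief sketch of the more common specifiers, assuming they behave as
described above (the a, A, g and G conversions are omitted here, and
the values are illustrative):

    say sprintf '%d',   42;        # 42
    say sprintf '%o',   8;         # 10
    say sprintf '%x',   255;       # ff
    say sprintf '%X',   255;       # FF
    say sprintf '%e',   1234.5;    # 1.234500e+03
    say sprintf '%f',   1.5;       # 1.500000
    say sprintf '%c',   65;        # A
    say sprintf '%s',   'text';    # text
    say sprintf '%d%%', 50;        # 50%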


=head3 Miscellaneous Examples

To print Pi to five decimal places:

           printf "pi = %.5f\n", 4 * atan(1.0);

To print a date and time in the form "Sunday, July 3, 10:02", where
$weekday and $month are strings:

           printf "%s, %s %d, %.2d:%.2d\n",
                   $weekday, $month, $day, $hour, $min;

Many countries use the day-month-year order.  Hence, an
internationalized version must be able to print the arguments in an
order specified by the format:

           printf $format,
                   $weekday, $month, $day, $hour, $min;

where $format depends on locale, and may permute the arguments.  With
the value:

           "%1\$s, %3\$d. %2\$s, %4\$d:%5\$.2d\n"

one might obtain "Sonntag, 3. Juli, 10:02".
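
A self-contained sketch of the date examples follows.  The variable
names and values are illustrative, the format strings are single
quoted so the '$' needs no escaping, and support for explicit
argument indexes is assumed.

    my ($weekday, $month, $day, $hour, $min) = 'Sunday', 'July', 3, 10, 2;
    my $us = sprintf '%s, %s %d, %.2d:%.2d', $weekday, $month, $day, $hour, $min;
    my $de = sprintf '%1$s, %3$d. %2$s, %4$d:%5$.2d', 'Sonntag', 'Juli', $day, $hour, $min;
    say $us;    # Sunday, July 3, 10:02
    say $de;    # Sonntag, 3. Juli, 10:02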

=head3 Detailed, Comparative Examples

[WORK IN PROGRESS]

=end pod