Document number | Nnnnn=yy-nnnn |
Date: | yyyy-mm-dd |
Project | WG14 - Programming Language C Working Group(?) |
Reply-to | d3x0r < d3ck0r at gmail > |
- This
- Introduction
- Motivation and Scope
- Impact
- Design Decisions
- Technical Specifications
- Acknowledgements
- References
- Allow the `.` operator to operate on pointers.
- Introduction
- Modernizes the method of member access to match what basically every other language uses. Makes porting between codebases in different languages slightly easier. Simplifies the knowledge required to start programming in C, since there would be just one operator that works the same way to access members of an object.
- All valid code previously written still behaves the same way.
- See remainder of this gist
- Introduction
- C++ Modification discussion
- see remainder of this; there is a rough example implementation against a GCC 9.x compiler.
- None known, no previous proposals have been mentioned.
- C11 standard draft
- Additional footnote contribution
6.5.2.3 Structure and union members
- The first operand of the . operator shall have a qualified or unqualified structure or union type, and the second operand shall name a member of that type.
- The first operand of the -> operator shall have type ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of the type pointed to.
Replace
- The first operand of the . operator shall have a qualified or unqualified structure or union type or ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of that type.
a) (footnote) `X.member` should rewrite to `X->member` if `X` is a pointer. ...
Compatibility notes: right now, applying `.` to a pointer is an error condition that terminates compilation. This error could be demoted to a warning if the secondary condition (the operand is a pointer...) is true and the statement is otherwise valid. It should be possible to disable the warning.
In C/C++, there are two operators, `.` and `->`, used to access members of `struct`, `class` and `union` types, as available. These operators are specified such that each is always paired with a single lvalue type; for example, if the left hand expression is a pointer to a `struct`, `class`, or `union`, then the operator `->` MUST be used. There is no occasion where `.` and `->` may be interchanged.
Other languages indirectly derived from C (JS, Rust, Ruby, Python, Go, C#, Java, Kotlin, Swift, ObjC) have come to just use '.' as the operator; although they also don't have a static struct instance that's a constant, so the behavior of '.' there is really to always dereference the left hand expression.
A leading cause of confusion when first learning C is why `.` and `->` are different, when they both just get a value from a structure.
Another side effect of having different operators shows up when migrating code. Suppose you have an instance of a structure that you pass by reference to read information into, and you later change the code to use a pointer to the buffer instead: every '.' has to be changed to '->' even though it's really the same operation. It should be noted, though, that smart pointers, which provide a `->` operator so that they look like pointers, would still have to use the `->` symbol, because the '.' operator would continue to work as normal on the base structure instance.
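The migration cost described above can be sketched as follows. The names `struct reading`, `fill_local`, and `fill_through_pointer` are hypothetical and chosen only for illustration; both functions do the same work, and only the operator differs.

```c
/* Hypothetical record type, used only for this illustration. */
struct reading {
    int id;
    int value;
};

/* Version 1: the function owns a structure instance and uses '.'. */
int fill_local( void ) {
    struct reading r;
    r.id = 1;
    r.value = 42;
    return r.id + r.value;
}

/* Version 2: the same logic after migrating to a caller-supplied buffer.
 * Every '.' had to become '->' although the operation is unchanged. */
int fill_through_pointer( struct reading *r ) {
    r->id = 1;
    r->value = 42;
    return r->id + r->value;
}
```

Under the proposed rule, the body of the first function could presumably be reused unchanged in the second.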
This is a simple example of current behavior.
```c
struct s {
    int a;
};

void f( void ) {
    struct s S;
    struct s *P = &S;
    S.a = 1;   // member access on a structure instance
    P->a = 2;  // member access through a pointer
}
```
The internal operations of '.' and '->' are nearly identical. `S.a` is just 'take the base address of the structure and add an offset to it', or in shorthand (mem + offset), where mem varies depending on where the function's execution frame is located. On the other hand, `P->a` is also (mem + offset), only here mem is the value stored in the lvalue, instead of just being the address of the lvalue.
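A minimal sketch of the (mem + offset) reading, using the standard `offsetof` macro. The type `struct s_padded`, its `pad` member, and the helper names are hypothetical; the extra member is there only so that `a` sits at a nonzero offset.

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical variant of 'struct s' with a leading member, so that the
 * offset of 'a' is not zero. */
struct s_padded {
    int pad;
    int a;
};

/* Compute the address of member 'a' by hand: base address plus offset. */
int *member_a_manual( void *base ) {
    return (int *)( (char *)base + offsetof( struct s_padded, a ) );
}

/* Returns 1 when both operators resolve to the same (mem + offset) address. */
int same_address( void ) {
    struct s_padded S;
    struct s_padded *P = &S;
    /* S.a:  the base is the address of S itself.
     * P->a: the base is the value stored in P. */
    return &S.a == member_a_manual( &S ) && &P->a == member_a_manual( P );
}
```

The only difference between the two accesses is where the base address comes from.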
It could be that the actual operation of these operators depends on the type of the left hand operand.
```c
struct t {
    struct s *pS;
};
struct v {
    struct t T;
};
struct v *pV;

void f2( void ) {
    // '.' and '->' are never immediately followed by another operator; each
    // must always be followed by an identifier. The only exception is '...'
    // as a parameter, which is an entirely different operator.
    pV.T.pS;    // not valid C today: pV is a pointer
    pV->T->pS;  // not valid C today: T is a struct, not a pointer
}
```
One immediate counterargument was 'clarity': that you know more information if the operators remain as they are. You would know whether the above `T` was a pointer or a struct (whether that information itself is of any use is debatable).
What about the other method of dereferencing? (* identifier)
```c
void g( void ) {
    static struct s S;
    (*pV).T.pS = &S;
    pV->T.pS = &S;
}
```
In either case above, the resulting type of the lvalue is either a `struct` or `union`, or a pointer to a `struct` or a pointer to a `union`, and the token after the operator is an identifier, which is a member of that struct or union.
This is a more complex example. It involves a base class 'b' that has a data state, which is accessed through many levels of indirection in a single expression. (This is not a style or design practice I would recommend; I would rather have the partials saved as the result of the check to see if each one is valid... )
```c
#include <stddef.h>  // NULL

struct b {  // base
    int a;  // a value
};
struct f {  // functional 1
    struct b b;
};
struct g {  // functional 2
    struct b b;
};
struct h {  // merge
    struct f *f;
    struct g g;
};
struct s {
    struct h h;
    struct h *ph;
};

void f( void ) {
    struct s S;
    struct s *pS;
    S.ph = &S.h;
    S.ph->f = NULL;
    S.ph->g.b.a = 3;
    S.ph->f.b.a = 3;   // with new syntax: seg fault (f is NULL), but stylistically -> was used only once
    S.ph->f->b.a = 3;  // you would know you only had to check two pointers.
    S.ph.f.b.a = 3;    // total adoption, JS style: every '.' access is a possible dereference,
                       // and there are 3 places you need to check for a valid pointer, when in reality it's only 2.
}
```
(more on above style)
```c
struct h *ph_;
if( (ph_ = S.ph) ) {
    struct f *pf;
    if( (pf = ph_.f) ) {  // proposed syntax: ph_ is a pointer
        // and really this would be b_set_a( &pf.b, 3 );
        pf.b.a = 3;
    }
}
```
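For comparison, the same checked style can be sketched in C that compiles today. This is a reduced version of the structures above (the `g` and embedded `h` members are left out), and `set_a_checked` is a hypothetical helper, not part of the proposal.

```c
#include <stddef.h>  /* NULL */

struct b { int a; };
struct f { struct b b; };
struct h { struct f *f; };
struct s { struct h *ph; };

/* Hypothetical helper: stores 'value' in the nested member if every pointer
 * on the way is valid. Returns 1 if the value was stored, 0 otherwise. */
int set_a_checked( struct s *S, int value ) {
    struct h *ph_;
    if( (ph_ = S->ph) != NULL ) {       /* first pointer to check */
        struct f *pf;
        if( (pf = ph_->f) != NULL ) {   /* second pointer to check */
            pf->b.a = value;
            return 1;
        }
    }
    return 0;
}
```

Only the two `->` accesses need a check; the `.` in `pf->b.a` cannot fail on its own, which is the information the distinct operators currently carry.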
That does remind me of the other extension, of declared variables with the scope of an expression and its following statement block... like `for( int .. )` or `while( int ... );` why can't I just arbitrarily write `( int a, ( a = otherfactors * otherscalars ) ) {` ... well, I guess `if( ... )`, but that's a different proposal.
There is only a very minor difference in the GCC compiler between the handling of `CPP_DOT` and `CPP_DEREF`, and the routine `build_indirect_ref` is only referenced in that one place (definition of `build_indirect_ref`)... (definition of `convert_lvalue_to_rvalue`). That routine inspects the left hand value to see if it is a pointer, else it returns an error; with a flag check it could instead just return the expression itself, and then that code and the `CPP_DOT` handling code would be identical, resulting in just a slightly different expression tree instead of emitting errors.
The other difference is at the start of the handling: `convert_lvalue_to_rvalue (expr_loc, expr, true, false);` versus `default_function_array_conversion (expr_loc, expr);`. The `convert_lvalue...` routine calls `default_function_array_conversion` as part of its evaluation, and then additionally checks something about an atomic lvalue.
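The flag check described above can be modelled with a small, self-contained sketch. This is not GCC code: the enums, the `classify_access` function, and the `unified` flag are all hypothetical and only illustrate the decision being discussed.

```c
/* Hypothetical model of the left hand operand's type. */
enum operand_kind {
    KIND_STRUCT_OR_UNION,
    KIND_POINTER_TO_STRUCT_OR_UNION,
    KIND_OTHER
};

enum access_op { OP_DOT, OP_ARROW };

enum access_action {
    ACCESS_DIRECT,           /* use the expression itself as the base */
    ACCESS_THROUGH_POINTER,  /* insert an indirection first */
    ACCESS_ERROR             /* reject the expression */
};

/* With unified == 0 this models today's rules; with unified != 0 the
 * operand's type, not the operator, decides whether to dereference. */
enum access_action classify_access( enum access_op op,
                                    enum operand_kind kind,
                                    int unified ) {
    if( kind == KIND_OTHER )
        return ACCESS_ERROR;
    if( unified )
        return kind == KIND_POINTER_TO_STRUCT_OR_UNION
                   ? ACCESS_THROUGH_POINTER
                   : ACCESS_DIRECT;
    if( op == OP_DOT && kind == KIND_STRUCT_OR_UNION )
        return ACCESS_DIRECT;
    if( op == OP_ARROW && kind == KIND_POINTER_TO_STRUCT_OR_UNION )
        return ACCESS_THROUGH_POINTER;
    return ACCESS_ERROR;
}
```

In this model every combination that is accepted today yields the same action with the flag on, which matches the claim that all valid code previously written still behaves the same way.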