d3x0r/C.dot.arrow.md

## C.dot.arrow.md

      
    See C++ Proposal


Document number: Nnnnn=yy-nnnn

Date: yyyy-mm-dd

Project: WG14 - Programming Language C Working Group(?)

Reply-to: d3x0r < d3ck0r at gmail >


I. Table of Contents


This
Introduction

Introduction
C Standards Proposal
C++ Standards Proposal


Motivation and Scope
Impact
Design Decisions
Technical Specifications
Acknowledgements
References

II. Introduction


Allow the . operator to operate on pointers to structures and unions.
Introduction

III. Motivation and Scope


Modernizes member access to match what nearly every other language uses. Makes porting code between codebases in different languages slightly easier. Simplifies what a newcomer must learn to start programming in C: a single operator that works the same way everywhere to access the members of an object.

IV. Impact On the Standard


All valid code previously written still behaves the same way.

V. Design Decisions


See remainder of this gist
Introduction
C++ Modification discussion

VI. Technical Specifications


See the remainder of this gist; there is a rough example implementation against a GCC 9.x compiler.

VII. Acknowledgements


None known, no previous proposals have been mentioned.

VIII. References


C11 standard draft
Additional footnote contribution

C Standards Proposal

6.5.2.3 Structure and union members

The first operand of the . operator shall have a qualified or unqualified structure or union
type, and the second operand shall name a member of that type.
The first operand of the -> operator shall have type ‘‘pointer to qualified or unqualified
structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall
name a member of the type pointed to.

Replace


The first operand of the . operator shall have a qualified or unqualified structure or union type or ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of that type or of the type pointed to.
a) (footnote) X.member should be rewritten to X->member if X is a pointer. ...
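
The intent of the footnote can be approximated in C11 today, which may help in judging how the rewrite would behave. The sketch below is not part of the proposal: `AS_PTR` and `MEMBER` are hypothetical macros, `struct point` is an illustrative type, and the macros assume an unqualified `struct point` or `struct point *` lvalue as the operand.

```c
#include <assert.h>

struct point {
    int x;
    int y;
};

/* Hypothetical helper: yields a pointer to the struct whether `obj` is a
 * struct point lvalue or a struct point * lvalue. Every _Generic branch must
 * be a valid expression, which is why the default branch takes the address. */
#define AS_PTR(obj) _Generic((obj), struct point *: (obj), default: &(obj))

/* Hypothetical stand-in for the proposed `obj.member` on either operand kind. */
#define MEMBER(obj, member) (AS_PTR(obj)->member)

/* Writes through both a struct and a pointer using the same spelling. */
int member_demo(void) {
    struct point value = { 1, 2 };
    struct point *ptr = &value;

    MEMBER(value, x) = 10; /* acts like value.x */
    MEMBER(ptr, y) = 20;   /* acts like ptr->y  */

    assert(value.x == 10 && value.y == 20);
    return MEMBER(value, x) + MEMBER(ptr, y);
}
```

The macro only hides the choice of operator; the proposal would let the compiler make the same choice from the operand type.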


Compatibility notes: at present, applying . to a pointer is an error that terminates compilation. This error could be demoted to a warning when the secondary condition holds (the operand is a pointer to a structure or union) and the statement is otherwise valid. It should be possible to disable the warning.
CXX Standards Proposal

External Gist Link
Introduction

In C/C++, two operators, . and ->, are used to access members of struct, class, and union types, as available. Each operator is specified for exactly one kind of left-hand operand; for example, if the left-hand expression is a pointer to a struct, class, or union, then the -> operator MUST be used. There is no occasion where . and -> may be interchanged.
Other languages (JS, Rust, Ruby, Python, Go, C#, Java, Kotlin, Swift, ObjC), indirectly derived from C, have come to use just '.' as the member-access operator. In many of them objects are always reached through references, so '.' in effect always dereferences its left-hand expression; others, such as Go and Rust, have value-type structs and let '.' dereference pointers automatically.
A common source of confusion when first learning C is why . and -> differ when both simply access a member of a structure.
Another side effect of having two operators shows up when migrating code. Suppose code fills in an instance of a structure that is passed by reference, and is later changed to use a pointer to the buffer instead: every '.' has to be changed to '->' even though the operation is really the same. Note, though, that smart pointers, which provide a -> operator so that they look like pointers, would still have to be used with ->, because the '.' operator would continue to work as normal on the smart pointer object itself.
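
As a small illustration of the migration cost described above, the sketch below shows the same field updates written once against a local instance and once against a pointer parameter. The names `struct reading`, `fill_local`, `fill_through_pointer`, and `fills_agree` are hypothetical; only the operator changes between the two function bodies.

```c
#include <assert.h>

struct reading {
    int id;
    int value;
};

/* Original version: works on a local instance, so every access uses '.'. */
struct reading fill_local(void) {
    struct reading r;
    r.id = 7;
    r.value = 42;
    return r;
}

/* Migrated version: the caller supplies the buffer, so every access must be
 * respelled with '->' even though the operation is the same. */
void fill_through_pointer(struct reading *r) {
    r->id = 7;
    r->value = 42;
}

/* Returns 1 when both versions produce the same field values. */
int fills_agree(void) {
    struct reading a = fill_local();
    struct reading b;
    fill_through_pointer(&b);
    assert(a.id == 7 && a.value == 42);
    return a.id == b.id && a.value == b.value;
}
```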
This is a simple example of current behavior.

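struct s {
   int a;
};

void f( void ) {
    struct s S;
    struct s *P = &S;
    S.a = 1;   // S is a struct object, so '.' is required
    P->a = 2;  // P is a pointer to a struct, so '->' is required
}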

The internal operations of '.' and '->' are nearly identical. S.a means 'take the base address of the structure and add an offset to it', or in shorthand (mem + offset), where mem varies with where the function's execution frame is located. P->a is also (mem + offset), except that mem is the value stored in the lvalue rather than the address of the lvalue itself.
The actual operation performed could therefore depend on the type of the left-hand operand.
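
The (mem + offset) description can be checked directly with `offsetof`. This is a minimal sketch, assuming an illustrative two-member `struct pair` so that the offset is non-zero; `member_address` and `same_address_demo` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct pair {
    int first;
    int second;
};

/* Computes (mem + offset) by hand; `mem` is the address the member access
 * starts from. */
static int *member_address(void *mem, size_t offset) {
    return (int *)((char *)mem + offset);
}

/* Returns 1 when both operators resolve to the same hand-computed address. */
int same_address_demo(void) {
    struct pair S = { 1, 2 };
    struct pair *P = &S;

    /* S.second: mem is the address of S itself. */
    int *from_dot = member_address(&S, offsetof(struct pair, second));
    /* P->second: mem is the value stored in P. */
    int *from_arrow = member_address(P, offsetof(struct pair, second));

    assert(from_dot == &S.second);
    assert(from_arrow == &P->second);
    return from_dot == from_arrow;
}
```

The only difference between the two computations is where `mem` comes from, which the compiler already knows from the type of the left-hand operand.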
struct t {
   struct s *pS;
};

struct v {
   struct t T;
};

struct v * pV;

void f2(void) {
    // '->' is never immediately followed by another '->'; it must always
    // be followed by an identifier. The same holds for '.', except in a parameter
    // list's '...', which is an entirely different token.
   pV.T.pS;   // proposed: '.' on the pointer pV (an error in C today)
   pV->T->pS; // all '->': T is a struct, so the second '->' is an error in C today
}

One immediate counterargument was 'clarity': you have more information if the operators
remain distinct, since you would know whether T above was a pointer or a struct (whether
that information is itself of any use is debatable).
What about the other method of dereferencing, (*identifier)?
void g( void ) {
    static struct s S;
    (*pV).T.pS = &S;
    pV->T.pS = &S;
}

In either case above, the left-hand expression has struct or union type,
or pointer to struct or pointer to union type, and the token after each operator
is an identifier that names a member of that struct or union.
The meaning of -> and . ...

This is a more complex example. It involves a base struct 'b' that holds a data state, which is accessed through several levels of indirection in a single expression. (This is not a style or design practice I would recommend; I would rather save each intermediate pointer as the result of the check that it is valid...)
#include <stddef.h> // NULL

struct b { // base
	int a; // a value
};

struct f { // functional 1
	struct b b;
};

struct g { // functional 2
	struct b b;
};

struct h { // merge
	struct f *f;
	struct g g;
};

struct s { 
	struct h h;
	struct h *ph;
};


void f( void ) {
	struct s S;
	struct s *pS;
	
	S.ph = &S.h;
	S.ph->f = NULL;
	S.ph->g.b.a = 3;
	
	S.ph->f.b.a = 3; // new syntax: f is NULL here, so this would seg fault, but stylistically -> was used only once
	S.ph->f->b.a = 3; // current syntax: you would know you only had to check two pointers.
	S.ph.f.b.a = 3; // total adoption, JS style: every '.' access is potentially invalid,
	              //   so there are 3 places you need to check for a valid pointer, when in reality it's only 2.
}

Continued

(more on above style)
	struct h *ph_;
	if( (ph_ = S.ph) ) {
		struct f *pf;
		if( (pf = ph_.f) ) { // proposed syntax: ph_ is a pointer
			// and really this would be  b_set_a( &pf.b, 3 );
			pf.b.a = 3;
		}
	}
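
For comparison, the same checked style can be written in C as it stands today. This is a sketch under assumptions: the structs are trimmed to the members on the access path, `b_set_a` is the setter named in the comment above with an assumed signature, and `set_a_checked` and `checked_demo` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct b { int a; };
struct f { struct b b; };
struct h { struct f *f; };
struct s { struct h *ph; };

/* Setter named in the fragment above; the signature is an assumption. */
static void b_set_a(struct b *b, int value) {
    b->a = value;
}

/* Stores `value` if both pointers on the path are valid; returns 1 on
 * success and 0 if a NULL pointer was found along the way. */
int set_a_checked(struct s *S, int value) {
    struct h *ph_ = S->ph;
    if (ph_ != NULL) {
        struct f *pf = ph_->f;
        if (pf != NULL) {
            b_set_a(&pf->b, value);
            return 1;
        }
    }
    return 0;
}

/* Exercises the valid path and the NULL path; returns the stored value. */
int checked_demo(void) {
    struct f func = { { 0 } };
    struct h merged = { &func };
    struct s top = { &merged };

    assert(set_a_checked(&top, 3) == 1);
    merged.f = NULL;                     /* break the second link */
    assert(set_a_checked(&top, 5) == 0); /* nothing is written */
    return func.b.a;
}
```

Each saved pointer is tested once, so the two real dereferences are visible regardless of which operator spelling is used.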


That does remind me of another extension: declared variables whose scope is an expression and its following statement block, like for( int ... ) or while( int ... ). Why can't I just arbitrarily write ( int a, ( a = otherfactors * otherscalars ) ) { ... Well, I guess with if( ... ), but that's a different proposal.
C Compiler modification notes

In the GCC compiler there is only a very minor difference
between the handling of CPP_DOT and CPP_DEREF. The routine build_indirect_ref
is referenced in only that one place (definition of build_indirect_ref)... (definition of convert_lvalue_to_rvalue).
It inspects the left-hand value to see if it is a pointer and otherwise returns an error; with a flag check it could instead just return the
expression itself. That code and the CPP_DOT handling code would then be identical, producing a slightly different
expression tree instead of emitting errors.
The other difference is at the start of the handling: convert_lvalue_to_rvalue (expr_loc, expr, true, false); and default_function_array_conversion (expr_loc, expr);
The convert_lvalue_to_rvalue routine calls default_function_array_conversion as part of its evaluation, and then additionally checks
something about an atomic lvalue.
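
The flag check described above can be pictured with a toy model. This is not GCC code: every type and function name below is hypothetical, and the relaxed mode reads the note as letting either operator accept either operand kind, which is an assumption.

```c
#include <assert.h>

/* Toy model only: these types and names are hypothetical, not GCC internals. */
enum operand_kind { KIND_STRUCT, KIND_POINTER_TO_STRUCT, KIND_OTHER };
enum member_op { OP_DOT, OP_ARROW };
enum resolution { USE_DIRECT, USE_DEREFERENCED, REJECT };

/* Decides how the left operand of '.' or '->' is treated. With
 * `relaxed` == 0 this mirrors today's rule; with `relaxed` != 0 either
 * operator accepts either operand kind. */
enum resolution resolve_member_access(enum member_op op,
                                      enum operand_kind kind,
                                      int relaxed) {
    if (kind == KIND_OTHER)
        return REJECT; /* not a struct/union or a pointer to one */
    if (kind == KIND_POINTER_TO_STRUCT)
        return (op == OP_ARROW || relaxed) ? USE_DEREFERENCED : REJECT;
    /* kind == KIND_STRUCT */
    return (op == OP_DOT || relaxed) ? USE_DIRECT : REJECT;
}

/* Returns 1 after confirming that valid code resolves the same way in both modes. */
int model_self_check(void) {
    assert(resolve_member_access(OP_DOT, KIND_STRUCT, 0) ==
           resolve_member_access(OP_DOT, KIND_STRUCT, 1));
    assert(resolve_member_access(OP_ARROW, KIND_POINTER_TO_STRUCT, 0) ==
           resolve_member_access(OP_ARROW, KIND_POINTER_TO_STRUCT, 1));
    return 1;
}
```

In this model the relaxed flag only turns former rejections into accepted forms, which matches the claim that previously valid code keeps its behavior.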
