

@d3x0r
Last active April 2, 2023 03:41
A proposal for extending C to allow the `.` operator to perform indirection through a pointer.

See C++ Proposal

Document number Nnnnn=yy-nnnn
Date: yyyy-mm-dd
Project WG14 - Programming Language C Working Group(?)
Reply-to d3x0r < d3ck0r at gmail >

I. Table of Contents

II. Introduction

III. Motivation and Scope

  • Modernizes member access to match what essentially every other language uses. Makes porting between codebases written in different languages slightly easier. Simplifies what a beginner needs to know to start programming in C: there is just one operator, and it works the same way to access members of an object.

IV. Impact On the Standard

  • All valid code previously written still behaves the same way.

V. Design Decisions

VI. Technical Specifications

  • See the remainder of this document; there is a rough example implementation against a GCC 9.x compiler.

VII. Acknowledgements

  • None known; no previous proposals have been mentioned.

VIII. References

C Standards Proposal

6.5.2.3 Structure and union members

  1. The first operand of the . operator shall have a qualified or unqualified structure or union type, and the second operand shall name a member of that type.
  2. The first operand of the -> operator shall have type ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of the type pointed to.

Replace paragraph 1 with:

  1. The first operand of the . operator shall have a qualified or unqualified structure or union type or ‘‘pointer to qualified or unqualified structure’’ or ‘‘pointer to qualified or unqualified union’’, and the second operand shall name a member of that type.

    a) (footnote) X.member should be rewritten as X->member if X is a pointer. ...

Compatibility notes: currently, applying . to a pointer is an error condition that terminates compilation. This error could instead be demoted to a warning when the secondary condition (the operand is a pointer to a structure or union) is true and the statement is otherwise valid. The warning should be possible to disable.

CXX Standards Proposal

External Gist Link

Introduction

In C/C++, there are two operators, . and ->, used to access members of struct, class, and union types, as available. Each operator accepts only one kind of left-hand operand: if the left-hand expression is a pointer to a struct, class, or union, then the operator -> MUST be used, and if it is the struct, class, or union itself, . must be used. There is no occasion where . and -> may be interchanged.

Other languages indirectly derived from C (JS, Rust, Ruby, Python, Go, C#, Java, Kotlin, Swift, ObjC) have come to use just '.' as the member operator; however, they also don't have a static struct instance that's a constant, and the behavior of '.' in them is really to always dereference the left-hand expression.

A leading cause of confusion when first learning C is why . and -> are different, when both just get a value from a structure.

Another side effect of having different operators shows up when migrating code. Suppose a function has a local instance of a structure that it passes by reference to have information read into it, and the code is later changed to use a pointer to the buffer instead: every member access has to be changed from '.' to '->', even though it is really the same operation. It should be noted, though, that smart pointers, which provide a -> operator so that they look like pointers, would still have to use the -> symbol, because the '.' operator would continue to work as normal on the smart-pointer object itself.
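A minimal sketch of that migration, assuming a hypothetical reader function `read_into` and two hypothetical callers `sum_local` and `sum_buffer`. The member reads are identical in both versions; only the operator changes once the code holds a pointer instead of an instance.

```c
#include <assert.h>
#include <stddef.h>

struct s {
    int a;
    int b;
};

/* Hypothetical reader that fills in a caller-supplied structure. */
static void read_into(struct s *out) {
    assert(out != NULL);
    out->a = 1;
    out->b = 2;
}

/* Before: a local instance is passed by reference to be filled in,
   and its members are read with '.'. */
int sum_local(void) {
    struct s buf;
    read_into(&buf);
    return buf.a + buf.b;
}

/* After: the same logic works on a pointer to a buffer owned elsewhere,
   so every '.' has to become '->' although the reads are unchanged. */
int sum_buffer(struct s *buf) {
    read_into(buf);
    return buf->a + buf->b;   /* was: buf.a + buf.b */
}
```

Under the proposal, the body of `sum_buffer` could keep the original `buf.a + buf.b` spelling unchanged.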

This is a simple example of current behavior.


struct s {
   int a;
};

void f( void ) {
    struct s S;
    struct s *P = &S;
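    S.a = 1;   // S is a structure: members are accessed with '.'
    P->a = 2;  // P is a pointer to a structure: members are accessed with '->'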
}

The internal operations of '.' and '->' are nearly identical. S.a is just 'take the base address of the structure and add an offset to it', a shorthand for (mem + offset), where mem varies depending on where the function's execution frame is located. P->a is also (mem + offset), except that mem comes from the value stored in the lvalue, instead of being the address of the lvalue.
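To make the (mem + offset) description concrete, the sketch below computes a member's address by hand with `offsetof` and checks it against what `.` and `->` produce. The extra member `pad` and the helper `addr_of_a` are hypothetical, added only so that the offset is not zero.

```c
#include <assert.h>
#include <stddef.h>

struct s {
    int pad;   /* hypothetical extra member so offsetof(struct s, a) != 0 */
    int a;
};

/* (mem + offset): the address of member `a`, given the base address mem. */
static int *addr_of_a(struct s *mem) {
    return (int *)((char *)mem + offsetof(struct s, a));
}

void check_member_addresses(void) {
    struct s S;
    struct s *P = &S;

    /* S.a: mem is the address of S itself. */
    assert(addr_of_a(&S) == &S.a);
    /* P->a: mem is the value stored in P. */
    assert(addr_of_a(P) == &P->a);
}
```

Both forms reduce to the same base-plus-offset computation; they differ only in where the base address comes from.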

So the actual operation these operators perform could be chosen from the type of the left-hand operand.

struct t {
   struct s *pS;
};

struct v {
   struct t T;
};

struct v * pV;

void f2(void) {
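    // '->' is never immediately followed by another operator; it must always be
    // followed by an identifier. The same is true of '.', except as the parameter
    // ellipsis '...', which is an entirely different token.
    pV.T.pS;   // proposed: '.' for both the pointer pV and the structure T
    pV->T.pS;  // current: '->' for the pointer pV, '.' for the structure T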
}
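The idea that the operator's effect could be chosen from the left operand's type can be roughly approximated in today's C11 with `_Generic`, but only for one known structure type. This is a sketch, not part of the proposal: the macro name `S_MEMBER` is hypothetical, it handles only `struct s` and non-const `struct s *`, and the operand must be an lvalue, because the unselected association `&(x)` must still be a valid expression even though it is not evaluated.

```c
#include <assert.h>

struct s {
    int a;
};

/* Hypothetical macro (C11 _Generic): if x is a struct s *, use it as is;
   otherwise take its address. Either way the member is then reached with
   '->', mirroring "X.member becomes X->member when X is a pointer".
   x must be an lvalue: the branch that is not selected is still checked. */
#define S_MEMBER(x, m) (_Generic((x), struct s *: (x), default: &(x))->m)

int s_member_demo(void) {
    struct s S = { 0 };
    struct s *P = &S;

    S_MEMBER(S, a) = 5;   /* selects &(S), so this stores through (&S)->a */
    S_MEMBER(P, a) += 2;  /* selects (P), so this updates P->a */
    return S.a;           /* both writes reached the same member */
}
```

A compiler implementing the proposal would make this selection for every structure and union type, without the macro or the lvalue restriction.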

One immediate objection was 'clarity': keeping the operators distinct gives the reader more information. In the example above, you would know whether T was a pointer or a struct (whether that information is itself of any use is debatable).

What about the other method of dereferencing, (* identifier)?

void g( void ) {
    static struct s S;
    (*pV).T.pS = &S;
    pV->T.pS = &S;
}

In either case above, the left operand of each member operator has a struct or union type, or is a pointer to a struct or union, and the token after each operator is an identifier naming a member of that struct or union.
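The spellings being compared really do name the same objects. The C standard notes that if &E is a valid pointer expression, (&E)->MOS is the same as E.MOS; likewise p->m and (*p).m designate the same member. A small check, re-declaring the `struct s`, `struct t`, and `struct v` types from the examples above; the function name `check_equivalent_spellings` is hypothetical.

```c
#include <assert.h>

struct s { int a; };
struct t { struct s *pS; };
struct v { struct t T; };

void check_equivalent_spellings(void) {
    static struct s S;
    struct v V;
    struct v *pV = &V;

    (*pV).T.pS = &S;               /* explicit dereference, then '.' */
    assert(pV->T.pS == &S);        /* '->' names the same member */
    assert(&(*pV).T == &pV->T);    /* same object, not merely an equal value */

    S.a = 3;
    assert((&S)->a == S.a);        /* (&E)->m is the same as E.m */
}
```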

The meaning of -> and . ...

This is a more complex example. It involves a base structure 'b' holding some data state, which is accessed through many levels of indirection in a single expression. (This is not a style or design practice I would recommend; it is better to save the partial results as each one is checked for validity...)

#include <stddef.h> // NULL

struct b { // base
	int a; // a value
};

struct f { // functional 1
	struct b b;
};

struct g { // functional 2
	struct b b;
};

struct h { // merge
	struct f *f;
	struct g g;
};

struct s { 
	struct h h;
	struct h *ph;
};


void f( void ) {
	struct s S;
	struct s *pS;
	
	S.ph = &S.h;
	S.ph->f = NULL;
	S.ph->g.b.a = 3;
	
	S.ph->f.b.a = 3;  // proposed syntax: compiles, but f is NULL, so this faults;
	                  //   stylistically, -> was used only once
	S.ph->f->b.a = 3; // current syntax: you would know you only had to check two pointers
	S.ph.f.b.a = 3;   // total adoption, JS style (proposed syntax): every '.' might be an
	                  //   indirection, so there appear to be 3 places to check for a valid
	                  //   pointer, when in reality there are only 2
}

Continued

(more on above style)

	struct h *ph_;
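	if( ( ph_ = S.ph ) ) {
		struct f *pf;
		if( ( pf = ph_.f ) ) { // proposed syntax: '.' applied to the pointer ph_
			// and really this would be b_set_a( &pf.b, 3 );
			pf.b.a = 3; // proposed syntax: '.' applied to the pointer pf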
		}
	}
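For comparison, here is the guarded style in current C syntax. The helper `b_set_a` is the hypothetical setter named in the comment above; its body, and the function `set_a_checked`, are assumptions for illustration. Only `ph` and `f` are pointer links, so only two checks are needed.

```c
#include <assert.h>
#include <stddef.h>

struct b { int a; };
struct f { struct b b; };
struct g { struct b b; };
struct h { struct f *f; struct g g; };
struct s { struct h h; struct h *ph; };

/* Hypothetical setter, named after the comment in the example above. */
static void b_set_a(struct b *pb, int value) {
    assert(pb != NULL);
    pb->a = value;
}

/* Current-C spelling of the guarded style. h, g and b are embedded
   structures, so only ph and f need a validity check.
   Returns 1 if the value was stored, 0 if a pointer link was NULL. */
int set_a_checked(struct s *pS, int value) {
    struct h *ph_;
    assert(pS != NULL);
    if ((ph_ = pS->ph) != NULL) {
        struct f *pf;
        if ((pf = ph_->f) != NULL) {
            b_set_a(&pf->b, value);
            return 1;
        }
    }
    return 0;
}
```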

That does remind me of another extension: declaring variables scoped to an expression and its following statement block, like for( int ... ) or while( int ... ). Why can't I just arbitrarily write ( int a, ( a = otherfactors * otherscalars ) ) { ...? Well, I guess if( ... ) would be the place, but that's a different proposal.

C Compiler modification notes

In the GCC compiler, the handling of CPP_DOT and CPP_DEREF differs only slightly. The CPP_DEREF handling calls the routine build_indirect_ref (definition of build_indirect_ref) ... (definition of convert_lvalue_to_rvalue), which inspects the left-hand value to see whether it is a pointer and otherwise returns an error. With a flag check, it could instead just return the expression itself; that code and the CPP_DOT handling code would then be identical, resulting in just a slightly different expression tree instead of an error.

The other difference is at the start of the handling: CPP_DEREF calls convert_lvalue_to_rvalue (expr_loc, expr, true, false), while CPP_DOT calls default_function_array_conversion (expr_loc, expr). The convert_lvalue_to_rvalue routine calls default_function_array_conversion as part of its evaluation, and then additionally performs a check related to atomic lvalues.
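A toy model of the parser change described above. It does not use GCC's internals: the node kinds, `build_dot`, `build_arrow`, `free_tree`, and the `allow_dot_indirection` flag are all hypothetical stand-ins. It only illustrates the idea that, when the flag is set and the left operand is a pointer, `.` wraps the operand in an indirection node and produces the same tree shape as `->` instead of reporting an error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical toy expression tree; this is NOT GCC's tree API. */
enum node_kind { VAR_REF, INDIRECT_REF, COMPONENT_REF };
enum type_kind { TYPE_STRUCT, TYPE_POINTER_TO_STRUCT, TYPE_INT };

struct node {
    enum node_kind kind;
    enum type_kind type;    /* type of this expression */
    const char *name;       /* variable or member name, if any */
    struct node *operand;   /* child of INDIRECT_REF / COMPONENT_REF */
};

static struct node *new_node(enum node_kind kind, enum type_kind type,
                             const char *name, struct node *operand) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    n->kind = kind;
    n->type = type;
    n->name = name;
    n->operand = operand;
    return n;
}

/* `base -> member`: dereference first, then select the member. */
static struct node *build_arrow(struct node *base, const char *member,
                                enum type_kind member_type) {
    assert(base != NULL && base->type == TYPE_POINTER_TO_STRUCT);
    struct node *deref = new_node(INDIRECT_REF, TYPE_STRUCT, NULL, base);
    return new_node(COMPONENT_REF, member_type, member, deref);
}

/* `base . member`: with allow_dot_indirection set, a pointer operand gets
   the same tree as `->`; without it, a pointer operand is an error.
   Returns NULL on error. */
static struct node *build_dot(struct node *base, const char *member,
                              enum type_kind member_type,
                              int allow_dot_indirection) {
    assert(base != NULL);
    if (base->type == TYPE_POINTER_TO_STRUCT) {
        if (!allow_dot_indirection) {
            fprintf(stderr, "error: '.' applied to a pointer; use '->'\n");
            return NULL;
        }
        return build_arrow(base, member, member_type);
    }
    if (base->type != TYPE_STRUCT) {
        fprintf(stderr, "error: '.' applied to a non-structure\n");
        return NULL;
    }
    return new_node(COMPONENT_REF, member_type, member, base);
}

/* Frees a node and the chain of operands beneath it. */
static void free_tree(struct node *n) {
    while (n != NULL) {
        struct node *next = n->operand;
        free(n);
        n = next;
    }
}
```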
