mproffitt/thoughts-on-copilot.md

## thoughts-on-copilot.md

      
    Raw
  

              thoughts-on-copilot.md
            
          
    Copilot. Love it or hate AI, copilot is a useful tool but after using it for a couple of weeks, with intense activity over the last few days I have to say I have mixed feelings about it as a tool.
I think like any other tool, or human for that matter, there are times copilot gets it right, and there are times it really gets it wrong.
To give an example of where it seems to consistently get it wrong, I want to look at testing.
Now when it comes to testing in golang I am not at my finest. In previous roles I was almost pious about it. "You will write tests, you must cover every line and every branch" - but after an extended stint in a company where writing tests wasn't just frowned upon but was actively discouraged, I found my own attitude to writing them changed. Therefore, until recently I've had an extremely relaxed attitude towards them which has left me learning how to write them all over again.
As I have come to write more and more in golang, I find myself in a position where I not only need to write tests, but have recovered some of my earlier piousness towards them and actively want to write tests however having never written tests in this language before, I needed some help.
Introduce copilot and the magic generate tests

In one of the projects I'm currently working on I have a HTTP transport package that I'm using throughout the application. Nothing fancy just a simple 4 method interface of Get | Post | Do | DoWithBackoff
type HttpClient interface {
	Get(ctx context.Context, url string, recv any) error

	Post(ctx context.Context, url string, recv, send any) error

	DoWithBackoff(ctx context.Context, req *http.Request, recv any) error

	Do(ctx context.Context, req *http.Request, recv any) error
}
Like I said, nothing fancy.
Alongside this I have defined a mock for testing, again, nothing too clever but allows me to table responses back into the application:
type MockHttpClient struct {
	responses []struct {
		code int
		body string
	}
}

func (m *MockHttpClient) Get(ctx context.Context, url string, recv any) error {
	return m.DoWithBackoff(context.Background(), nil, recv)
}

func (m *MockHttpClient) Post(ctx context.Context, url string, recv, send any) error {
	return m.DoWithBackoff(context.Background(), nil, recv)
}

func (m *MockHttpClient) DoWithBackoff(ctx context.Context, req *http.Request, response any) error {
	return m.Do(context.Background(), req, response)
}

func (m *MockHttpClient) Do(ctx context.Context, req *http.Request, recv any) error {
	if len(m.responses) == 0 {
		return nil
	}

	response := m.responses[0]
	m.responses = m.responses[1:]

	switch response.code {
	case 400, 401, 403, 404, 501:
		return &transport.ErrUnknown{
			Code: response.code,
			Body: []byte(response.body),
		}
	case 429, 500, 502, 503, 504:
		return m.Do(ctx, req, recv)
	}
	err := json.Unmarshal([]byte(response.body), recv)
	return err
}
Now it's worth noting at this point that the mock currently doesn't live inside the transport package but instead lives inside the cmd package that it was originally written for. That's not an oversight - I tend to keep code with the original package, until I need it to be more generic and as soon as I find myself starting to duplicate code, that's when I refactor it out.
So lets look at what I need testing:
Run: func(cmd *cobra.Command, args []string) {
	var (
		password, token string
		err error
		kdf types.KDFInfo
		req *http.Request
		ctx context.Context = context.Background()
	)
	...

	var localAddress string = fmt.Sprintf("https://%s:%d",
			vaultItem.Server, vaultItem.Port)

	if err = transport.HttpClient.Get(
		ctx, localAddress+"/api/v1/kdf", &kdf,
	); err != nil {
		fatal("unable to get kdf info: %q", err)
		return
	}
	...
	
	req, err = http.NewRequest(
		"POST", localAddress+"/api/v1/revoke", nil)
	if err != nil {
		fatal("unable to create request for %s: %q", address, err)
		return
	}
	
	var r struct {
		Code int `json:"statuscode"`
		Message string `json:"message"`
	}
	
	if err = transport.HttpClient.DoWithBackoff(ctx, req, &r); err != nil {
		fatal("unable to send request for %s: %q", address, err)
		return
	}
	
	if r.Code != 204 {
		fatal("Failed to revoke token: %s", r.Message)
		return
	}
	...
},
copilot write me some tests

One of the things I've come to appreciate about copilot is as long as you're paying attention, when you start writing something it auto-completes the line for you and gets it right. This is important to remember as for vast swathes of code you can literally just tab return tab return.
This is perfect up to a point but there are two things to note.

It doesn't always get it right
It really doesn't know when to stop

If you're not careful, you'll find copilot writing code that has absolutely no place in the function you're currently developing or simply inventing stuff that seems to make sense but really doesn't fit your style, pattern or even the behaviour of the code under development.
To try and give an example of this, in one place in my current project I have a secret cache.
For very good reason this is a singleton and must remain a singleton to ensure synchronicity and safety of its contents. Part of the struct is memory locked as it contains a decrypted master key and with this in mind, the singleton pattern fits best to ensure that your master key is not spread all over system memory potentially increasing the attack vector.
Access to the singleton is by calling Instance
var (
	secretCache *SecretCache
	lock     = &sync.Mutex{}
	Instance = instance
)

func instance(masterpw, salt string, kdf types.KDFInfo) (*SecretCache, error) {
	lock.Lock()
	defer lock.Unlock()
	if secretCache != nil {
		return secretCache, nil
	}

	secretCache = &SecretCache{
		KDF: kdf,
	}
	err := secretCache.setMasterPassword(masterpw, email)
	if err != nil {
		err = fmt.Errorf("failed to set master password: %w", err)
	}
	return secretCache, err
}
Now lets look at how copilot recommends accessing this (this is just one example):
expected

secrets, err := cache.Instance(...)
actual

// create a new secret cache if it doesn't already exist
if cache.Secrets == nil {                // non-existant field
	cache.Secrets = cache.NewSecretCache() // non-existant constructor
	if secrets.HashPassword(password) == "" {
	    key, salt = cache.DeriveMasterKey(password, email, kdf)
		secrets.SetMasterPassword(key, salt)
	}
}
secrets = &cache.Secrets
So lets look at this a little.
First off it's blatantly invented details:

cache.Secrets doesn't exist and never will. secretCache does and this is deliberately an un-exported field. I don't want direct access to it, ever.
NewSecretCache is a constructor pattern. I would expect this to give me a new instance each and every time I call it.
HashPassword - OK I can kind of understand this call as it wants to check if the master password has been set but this method actually creates a pbkdf2.Key and returns it as a base64 encoded string which would never be equal to an empty string. In fact you can try a version of this for yourself here: https://go.dev/play/p/2_rZAg-HJNp
cache.DeriveMasterKey This exists yes. In the crypto package.
SetMasterPassword is in reality, the un-exported method setMasterPassword

Now at first glance, apart from the obvious syntax errors, someone looking at this might just think that they need to implement the missing / broken parts of the code. It looks viable. It almost makes sense even though it's completely wrong and would break a critical section of the project.
Whats more worrying about this is that someone new to a project might think that this was OK, implement the missing methods and even if they fixed the obvious bug (call cache.MasterPassword instead of trying to compare HashPassword), this would still fundamentally break a critical part of a codebase in a way that would undermine the safety and security of a password database.
Back to testing

It seems fairly innocuous after the above example but even generate tests has this flaw. In fact when trying to run this, I've seen at least 3 different testing patterns when I'd be happy with the consistency of just one.
In most instances, I've only seen simple methods working out of the box. In many instances they might need one or two tweaks to get working but in the vast majority of cases the tests written by copilot have zero relationship to the code they are trying to test and it would been more help if it had simply given me the empty test stub
func TestSomething (t *testing.T) {
	// put some test code here
}
Is this a fair statement? In my opinion yes.
In the first few tests I had it generate I'd try and fix the errors in the test and make it work but in the majority of cases I found myself ripping out what copilot had written and starting again from scratch.
Whilst on this project, this isn't costing me anything other than my own time, I could imagine this being expensive inside a company. If you can't tell at a glance if something is going to waste hours of debugging, only for it to be rewritten from the ground up then the cost of using that tool outweighs the benefits it might bring.
So how did copilot try and test my Run function?
It made shit up of course.
func TestRevokeToken(t *testing.T) {
	mock := transport.MockHttpClient{}
	mock.Code = 200
	mock.Body = []byte("Token revoked successfully for address 127.0.0.1")
	transport.SetDefaultClient(mock) // doesn't exist

	...
	ctx := context.Background()
	ctx = context.WithValue(ctx, types.AuthToken{}, encryptedToken)

	mock.ExpectedGet(ctx, url).WillReturnJson(`{"message": "kdf info"}`)
	mock.ExpectedPost(ctx, url, data).WillReturnJson(`{"message": "token revoked"}`)

	cmd := revokeCmd
	...
	cmd.Execute()
	if err := mock.AssertExpected(); err != nil {
		t.Errorf("Assertions failed: expected %q actual %q", expected, mock.Assertions())
	}
}
Wrapping up

I'll be fair, this is a little bit of a rant against copilot. Don't get me wrong though, I've seen many cases where copilot really works well. When writing test tables, there's a lot of automation that can be done there and for some tests all I ever had to do was tab > return > tab > return and add in the occasional missing comma where I'd forget tab > tab > return > ...
I can see a powerful future for AI in some aspects of development but I would never trust it near anything complicated.
One of my greatest concerns is that the bugs it writes are subtle. They look like they might work and are simply highlighting missing methods.
This would be a mistake to believe and a trap to fall into.
Honestly, I'm torn on this part because I have personally also found it damn useful on the projects I've tried it on. However I know that code, I wrote it myself, I'm familiar with its concepts and where I want it to go.
That said though, many engineers are not always familiar enough with the architecture of the code they are working with to know that changing from a singleton to a constructor pattern could have disastrous consequences on security of an entire application without some serious forethought around it (and potentially tons of refactoring).
We shouldn't ignore AI completely - It's too powerful a tool to have at our disposal but even as a guide, its guidance should be taken with extreme trepidation. Use it for the easy stuff, it gets that right. Test it with the harder stuff but don't expect it to know what you want, how you want it doing or even what the heck it's doing.
And at the end of the day, I'm perfectly capable of writing my own g'damn bugs.