Copilot. Love it or hate AI, copilot is a useful tool but after using it for a couple of weeks, with intense activity over the last few days I have to say I have mixed feelings about it as a tool.
I think like any other tool, or human for that matter, there are times copilot gets it right, and there are times it really gets it wrong.
To give an example of where it seems to consistently get it wrong, I want to look at testing.
Now when it comes to testing in golang
I am not at my finest. In previous roles I was almost pious about it. "You will write tests, you must cover every line and every branch" - but after an extended stint in a company where writing tests wasn't just frowned upon but was actively discouraged, I found my own attitude to writing them changed. Therefore, until recently I've had an extremely relaxed attitude towards them which has left me learning how to write them all over again.
As I have come to write more and more in golang
, I find myself in a position where I not only need to write tests, but have recovered some of my earlier piousness towards them and actively want to write tests however having never written tests in this language before, I needed some help.
In one of the projects I'm currently working on I have a HTTP transport package that I'm using throughout the application. Nothing fancy just a simple 4 method interface of Get | Post | Do | DoWithBackoff
type HttpClient interface {
Get(ctx context.Context, url string, recv any) error
Post(ctx context.Context, url string, recv, send any) error
DoWithBackoff(ctx context.Context, req *http.Request, recv any) error
Do(ctx context.Context, req *http.Request, recv any) error
}
Like I said, nothing fancy.
Alongside this I have defined a mock for testing, again, nothing too clever but allows me to table responses back into the application:
type MockHttpClient struct {
responses []struct {
code int
body string
}
}
func (m *MockHttpClient) Get(ctx context.Context, url string, recv any) error {
return m.DoWithBackoff(context.Background(), nil, recv)
}
func (m *MockHttpClient) Post(ctx context.Context, url string, recv, send any) error {
return m.DoWithBackoff(context.Background(), nil, recv)
}
func (m *MockHttpClient) DoWithBackoff(ctx context.Context, req *http.Request, response any) error {
return m.Do(context.Background(), req, response)
}
func (m *MockHttpClient) Do(ctx context.Context, req *http.Request, recv any) error {
if len(m.responses) == 0 {
return nil
}
response := m.responses[0]
m.responses = m.responses[1:]
switch response.code {
case 400, 401, 403, 404, 501:
return &transport.ErrUnknown{
Code: response.code,
Body: []byte(response.body),
}
case 429, 500, 502, 503, 504:
return m.Do(ctx, req, recv)
}
err := json.Unmarshal([]byte(response.body), recv)
return err
}
Now it's worth noting at this point that the mock currently doesn't live inside the transport
package but instead lives inside the cmd
package that it was originally written for. That's not an oversight - I tend to keep code with the original package, until I need it to be more generic and as soon as I find myself starting to duplicate code, that's when I refactor it out.
So lets look at what I need testing:
Run: func(cmd *cobra.Command, args []string) {
var (
password, token string
err error
kdf types.KDFInfo
req *http.Request
ctx context.Context = context.Background()
)
...
var localAddress string = fmt.Sprintf("https://%s:%d",
vaultItem.Server, vaultItem.Port)
if err = transport.HttpClient.Get(
ctx, localAddress+"/api/v1/kdf", &kdf,
); err != nil {
fatal("unable to get kdf info: %q", err)
return
}
...
req, err = http.NewRequest(
"POST", localAddress+"/api/v1/revoke", nil)
if err != nil {
fatal("unable to create request for %s: %q", address, err)
return
}
var r struct {
Code int `json:"statuscode"`
Message string `json:"message"`
}
if err = transport.HttpClient.DoWithBackoff(ctx, req, &r); err != nil {
fatal("unable to send request for %s: %q", address, err)
return
}
if r.Code != 204 {
fatal("Failed to revoke token: %s", r.Message)
return
}
...
},
One of the things I've come to appreciate about copilot
is as long as you're paying attention, when you start writing something it auto-completes the line for you and gets it right. This is important to remember as for vast swathes of code you can literally just tab return tab return.
This is perfect up to a point but there are two things to note.
- It doesn't always get it right
- It really doesn't know when to stop
If you're not careful, you'll find copilot
writing code that has absolutely no place in the function you're currently developing or simply inventing stuff that seems to make sense but really doesn't fit your style, pattern or even the behaviour of the code under development.
To try and give an example of this, in one place in my current project I have a secret cache.
For very good reason this is a singleton
and must remain a singleton to ensure synchronicity and safety of its contents. Part of the struct is memory locked as it contains a decrypted master key and with this in mind, the singleton pattern fits best to ensure that your master key is not spread all over system memory potentially increasing the attack vector.
Access to the singleton is by calling Instance
var (
secretCache *SecretCache
lock = &sync.Mutex{}
Instance = instance
)
func instance(masterpw, salt string, kdf types.KDFInfo) (*SecretCache, error) {
lock.Lock()
defer lock.Unlock()
if secretCache != nil {
return secretCache, nil
}
secretCache = &SecretCache{
KDF: kdf,
}
err := secretCache.setMasterPassword(masterpw, email)
if err != nil {
err = fmt.Errorf("failed to set master password: %w", err)
}
return secretCache, err
}
Now lets look at how copilot recommends accessing this (this is just one example):
secrets, err := cache.Instance(...)
// create a new secret cache if it doesn't already exist
if cache.Secrets == nil { // non-existant field
cache.Secrets = cache.NewSecretCache() // non-existant constructor
if secrets.HashPassword(password) == "" {
key, salt = cache.DeriveMasterKey(password, email, kdf)
secrets.SetMasterPassword(key, salt)
}
}
secrets = &cache.Secrets
So lets look at this a little.
First off it's blatantly invented details:
cache.Secrets
doesn't exist and never will.secretCache
does and this is deliberately an un-exported field. I don't want direct access to it, ever.NewSecretCache
is a constructor pattern. I would expect this to give me a new instance each and every time I call it.HashPassword
- OK I can kind of understand this call as it wants to check if the master password has been set but this method actually creates apbkdf2.Key
and returns it as abase64
encoded string which would never be equal to an empty string. In fact you can try a version of this for yourself here: https://go.dev/play/p/2_rZAg-HJNpcache.DeriveMasterKey
This exists yes. In thecrypto
package.SetMasterPassword
is in reality, the un-exported methodsetMasterPassword
Now at first glance, apart from the obvious syntax errors, someone looking at this might just think that they need to implement the missing / broken parts of the code. It looks viable. It almost makes sense even though it's completely wrong and would break a critical section of the project.
Whats more worrying about this is that someone new to a project might think that this was OK, implement the missing methods and even if they fixed the obvious bug (call cache.MasterPassword
instead of trying to compare HashPassword
), this would still fundamentally break a critical part of a codebase in a way that would undermine the safety and security of a password database.
It seems fairly innocuous after the above example but even generate tests
has this flaw. In fact when trying to run this, I've seen at least 3 different testing patterns when I'd be happy with the consistency of just one.
In most instances, I've only seen simple methods working out of the box. In many instances they might need one or two tweaks to get working but in the vast majority of cases the tests written by copilot
have zero relationship to the code they are trying to test and it would been more help if it had simply given me the empty test stub
func TestSomething (t *testing.T) {
// put some test code here
}
Is this a fair statement? In my opinion yes.
In the first few tests I had it generate I'd try and fix the errors in the test and make it work but in the majority of cases I found myself ripping out what copilot
had written and starting again from scratch.
Whilst on this project, this isn't costing me anything other than my own time, I could imagine this being expensive inside a company. If you can't tell at a glance if something is going to waste hours of debugging, only for it to be rewritten from the ground up then the cost of using that tool outweighs the benefits it might bring.
So how did copilot
try and test my Run
function?
It made shit up of course.
func TestRevokeToken(t *testing.T) {
mock := transport.MockHttpClient{}
mock.Code = 200
mock.Body = []byte("Token revoked successfully for address 127.0.0.1")
transport.SetDefaultClient(mock) // doesn't exist
...
ctx := context.Background()
ctx = context.WithValue(ctx, types.AuthToken{}, encryptedToken)
mock.ExpectedGet(ctx, url).WillReturnJson(`{"message": "kdf info"}`)
mock.ExpectedPost(ctx, url, data).WillReturnJson(`{"message": "token revoked"}`)
cmd := revokeCmd
...
cmd.Execute()
if err := mock.AssertExpected(); err != nil {
t.Errorf("Assertions failed: expected %q actual %q", expected, mock.Assertions())
}
}
I'll be fair, this is a little bit of a rant against copilot
. Don't get me wrong though, I've seen many cases where copilot really works well. When writing test tables, there's a lot of automation that can be done there and for some tests all I ever had to do was tab > return > tab > return and add in the occasional missing comma where I'd forget
tab > tab > return > ...
I can see a powerful future for AI in some aspects of development but I would never trust it near anything complicated.
One of my greatest concerns is that the bugs it writes are subtle. They look like they might work and are simply highlighting missing methods.
This would be a mistake to believe and a trap to fall into.
Honestly, I'm torn on this part because I have personally also found it damn useful on the projects I've tried it on. However I know that code, I wrote it myself, I'm familiar with its concepts and where I want it to go.
That said though, many engineers are not always familiar enough with the architecture of the code they are working with to know that changing from a singleton to a constructor pattern could have disastrous consequences on security of an entire application without some serious forethought around it (and potentially tons of refactoring).
We shouldn't ignore AI completely - It's too powerful a tool to have at our disposal but even as a guide, its guidance should be taken with extreme trepidation. Use it for the easy stuff, it gets that right. Test it with the harder stuff but don't expect it to know what you want, how you want it doing or even what the heck it's doing.
And at the end of the day, I'm perfectly capable of writing my own g'damn bugs.