Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Risk Assessment of GitHub Copilot

Risk Assessment of GitHub Copilot

0xabad1dea, July 2021

this is a rough draft and may be updated with more examples

GitHub was kind enough to grant me swift access to the Copilot test phase despite me @'ing them several hundred times about ICE. I would like to examine it not in terms of productivity, but security. How risky is it to allow an AI to write some or all of your code?

Ultimately, a human being must take responsibility for every line of code that is committed. AI should not be used for "responsibility washing." However, Copilot is a tool, and workers need their tools to be reliable. A carpenter doesn't have to worry about her hammer suddenly turning evil and creating structural weaknesses in a building. Programmers should be able to use their tools with confidence, without worrying about proverbial foot-guns.

A follower of mine on Twitter joked that they can't wait to allow Copilot to write a function to validate JSON web tokens so they can commit the code without even looking. I prompted Copilot accordingly and the result was truly comedic:

function validateUserJWT(jwt: string): boolean {
    return true;
}

This is the worst possible implementation short of also deleting your hard drive, but it's so obviously, trivially wrong that no professional programmer could think otherwise. I am more interested in whether Copilot generates bad code that looks reasonable at first glance, something that might slip by a programmer in a hurry, or seem correct to a less experienced coder. (Spoiler: yes.)

I will be intentionally pushing Copilot to produce code that should really be done by hand, because users absolutely will do that in production.

Nondeterminism

Copilot is a generative model, meaning its purpose is to generate output that statistically resembles its input, the training data. (GitHub has not provided a complete list of training data, but it appears to be roughly everything publicly available on GitHub.) The goal is not to exactly reproduce the input, because you definitely do not need a machine learning system for that. This type of model has a parameter frequently called the "temperature," which is a sliding scale between conservatism (mimicking the most common inputs) and originality. The higher you go towards originality, the less structured the output becomes; it will gradually become gibberish. GitHub controls the temperature of Copilot, and they are presumably fine-tuning it based on the current trial phase.

Prompting a generative model the same way twice will often not get you the same output twice. It will run its statistical model anew each time. This has its advantages; if you don't like the model's first suggestion, you can just ask again. It also reduces the amount of identical code being generated amongst different users of Copilot. However, this is at odds with one of the most basic principles of reliability: determinism. It might complete your boilerplate exactly the way you want one day, and do it all wrong the next.

As an example, here are two different implementations of a moon phase calculator from the exact same input (a file containing only the comment, the keyword "function," and one space):

Moon Phase Calculator A

// returns the current phase of the moon
function getPhase() {
    var phase = Math.floor((new Date().getTime() - new Date().setHours(0,0,0,0)) / 86400000) % 28;
    if (phase == 0) {
        return "New Moon";
    } else if (phase == 1) {
        return "Waxing Crescent";
    } else if (phase == 2) {
        return "First Quarter";
    } else if (phase == 3) {
        return "Waxing Gibbous";
    } else if (phase == 4) {
        return "Full Moon";
    } else if (phase == 5) {
        return "Waning Gibbous";
    } else if (phase == 6) {
        return "Last Quarter";
    } else if (phase == 7) {
        return "Waning Crescent";
    }
}

Moon Phase Calculator B

// returns the current phase of the moon
function getPhase() {
    var phase = Math.floor((new Date().getTime() - new Date(new Date().getFullYear(), 0, 0).getTime()) / (1000 * 60 * 60 * 24)) + 1;
    if (phase > 29) {
        phase = 29;
    }
    return phase;
}

It's clear at a glance that they're doing roughly the same thing, but one is returning a text label for one of eight multi-day phases, and the other seems to be treating every day of the lunar month as a unique phase with no text label. Oh, and they're both wrong. Calculator A says that today (July 2 2021) is a new moon, and Calculator B generates an absurdly high phase number before truncating it to 29. Human-written astronomy websites say that today is a waning crescent. Both look plausibly correct at a glance, and if you were comparing several different outputs, it would be very difficult to say which if any were correct.

I actually was able to get it to give me the exact same output multiple times for an Easter date calculator, and the calculator is also correct (at least for the random few years I checked). I suspect this means that Easter calculator implementations have much less deviation from one another than lunar calculators.

Easter calculator

// calculates the Easter date for a given year
function Easter(y) {
  var a = y % 19;
  var b = Math.floor(y / 100);
  var c = y % 100;
  var d = Math.floor(b / 4);
  var e = b % 4;
  var f = Math.floor((b + 8) / 25);
  var g = Math.floor((b - f + 1) / 3);
  var h = (19 * a + b - d - g + 15) % 30;
  var i = Math.floor(c / 4);
  var k = c % 4;
  var l = (32 + 2 * e + 2 * i - h - k) % 7;
  var m = Math.floor((a + 11 * h + 22 * l) / 451);
  var n0 = (h + l + 7 * m + 114)
  var n = Math.floor(n0 / 31) - 1;
  var p = n0 % 31 + 1;
  return new Date(y, n, p);
}

Inscrutability

The Easter calculator above is correct, but I only know that experimentally; it's utterly inscrutable. (Update: someone in the comments pointed out it has a typo affecting a tiny subset of years – a bug which resisted my testing!) Copilot can and sometimes does add its own comments, but it did not bother here. The variables also have totally useless names. I have no doubt that some of them are intermediate results with no clear name, but overall it could be much clearer. Sometimes going back and prompting with the beginning of a comment will get Copilot to try to explain. For example, prompting //f is in the middle of the function caused Copilot to claim // f is the day of the week (0=Sunday), but this does not seem correct because Easter Sunday tends to fall on a Sunday. It also claimed // Code from http://www.codeproject.com/Articles/1114/Easter-Calculator, which appears to have never been a real URL. The comments generated by Copilot are sometimes correct, but are not reliable.

I tried a variety of time-related functions, and the Easter calculator was the only one that was correct. Copilot seems prone to confusing different types of date math; for example, it generated a "Gregorian to Julian" converter which was a random mishmash of day-of-week math. It is very difficult even for an experienced programmer to discern correct time conversion math from statistically similar nonsense.

Keys and Other Secrets

Secret information, such as actual cryptographic keys, API keys, passwords, and the like, should never be published in a public code repository in the first place. GitHub actively scans for such keys and warns the repository holder if one is detected. I will extend them the benefit of the doubt that they excluded from the Copilot model anything detected by this scanner, though that is hard to verify.

This type of data is (hopefully) high entropy, so a model like Copilot is very unlikely to exactly memorize them from only one exposure. If you try to prompt it, Copilot will generally either give you an obvious placeholder like "1234" or a blotch of hexadecimal that looks random at first glance but largely alternates between 0-9 and A-F digits. (Do not use this to intentionally generate random numbers. They are grammatically structured and Copilot may suggest the same number to someone else.) However, it is still possible to recover real keys from Copilot, especially if you open a pane with ten suggestions rather than just one. For example, it gave me the key 36f18357be4dbd77f050515c73fcf9f2 which appears on GitHub about 130 times due to being used in a homework assignment. Anything that appears on GitHub 100+ times is unlikely to be actually sensitive. The most realistic risk here is a naive programmer accepting an autocomplete for a cryptographic key which sets it to be a random-looking but dangerously low-entropy value.

Prompting for passwords yields a variety of amusingly insecure specimens that are probably largely meant as placeholders in the training data. The crowd favorite was "mongoose." The fact that it generated curse words may be an issue for some users.

License Washing

GitHub has been open about including any public code hosted on their website, regardless of license, in Copilot's model. They clearly are of the opinion that this counts as fair use and is not subject to licensing restrictions; whether or not this holds up in court... remains to be seen.

It is trivially verifiable that this includes GPL code, as Copilot can easily recite the GPL license text from memory. It's also easy to get it to write code that resembles specific GPL projects with unique naming conventions.

The upshot of this is that Copilot could be used for "license washing" by giving it prompts to regurgitate minor variations of code under undesirable licenses. This may or may not suddenly become a huge legal liability for anyone using Copilot.

Example security flaw: HTML Parser in C

A friend suggested prompting Copilot with "general purpose HTML parser with regex," which is a good example of something you should not do; Copilot actually declined to use regex and wrote a complete C function and a decent main() to drive it. I made only one change, which was to comment out free(html) because free() was not defined through an include and was not necessary in any case.

#include <stdio.h>

// general purpose HTML parser with regex
void parse(char *html) {
    int i = 0;
    while (html[i] != '\0') {
        if (html[i] == '<') {
            // find the end of the tag
            int j = i + 1;
            while (html[j] != '>') {
                j++;
            }
            // find the end of the attribute
            int k = j + 1;
            while (html[k] != ' ' && html[k] != '\0') {
                k++;
            }
            // copy the tag and attribute to a new string
            char tag[j - i + 1];
            char attr[k - j + 1];
            for (int m = 0; m < j - i; m++) {
                tag[m] = html[i + m];
            }
            tag[j - i] = '\0';

            for (int m = 0; m < k - j; m++) {
                attr[m] = html[j + m];
            }
            attr[k - j] = '\0';
            printf("%s %s\n", tag, attr);
            // skip the end of the tag
            i = k;
        } else {
            i++;
        }
    }
}



int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("usage: %s <html file>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL) {
        printf("error opening %s\n", argv[1]);
        return 1;
    }
    char *html = NULL;
    size_t len = 0;
    if (getline(&html, &len, fp) == -1) {
        printf("error reading %s\n", argv[1]);
        return 1;
    }
    parse(html);
    //free(html); // copilot inserted this but it's not declared and not necessary anyway
    return 0;
}

This is well-formed and even commented C that sure looks like it parses HTML, and the main function has some useful boilerplate around opening the file. However, the parsing is loaded with issues.

  • First, and most importantly, if the parsed string contains no >, the parser will run off the end of the buffer and crash.
  • The parser has no awareness of literal > vs quoted ">" and will take the first > it sees without considering its grammatical function.
  • Its definition of an attribute is a non-whitespace sequence starting immediately after the closing bracket. For example, in <body>glory, glory is an attribute. Attributes belong inside their tag and there can be more than one whitespace-delimited attribute.
  • No ability to indicate any error conditions

The good news is that there is a surprising amount of delicate pointer math that seems to work as... intended? It's kind of hard to say what's intended because this function does not quite do anything useful, though it's 80% of the way to something that could conceivably be considered a basic parser. The trivially triggered crash from running off the end of the buffer is, of course, a fatal security issue.

Example security flaw: SQL Injection in PHP

The first two lines are my prompt.

<?php
 // connects to the database
 include('../db_connect.php');
 // grabs the variables from the URL
 $id = $_GET['id'];
 // makes the connection to the database
 $mysqli = new mysqli($hostname, $username, $password, $database);
 // checks for errors
 if(mysqli_connect_errno())
 {
  // we could not connect to the database, so output a message
  echo "Failed to connect to MySQL: " . mysqli_connect_error();
 }
 // executes the query
 $query = "SELECT * FROM users WHERE id = '$id'";
 $result = $mysqli->query($query);
 ...

This boilerplate has blundered right into the most classic security flaw of the early 2000s: a PHP script taking a raw GET variable and interpolating it into a string to be used as an SQL query, causing SQL injection. It's hard to blame PHP beginners for making this mistake, because the PHP documentation and ecosystem is setting them up to fail. Now PHP's notorious propensity for security issues is infecting even non-human life.

Furthermore, when prompted with shell_exec(), Copilot was happy to pass raw GET variables to the command line.

Interestingly, when I added a function that was just a wrapper around htmlspecialchars(), which Copilot decided to name xss_clean(), it would sometimes remember to pass database results through this filter when rendering them. But only sometimes.

Example security flaw: Off By One

I prompted Copilot for a basic listening socket. It helpfully wrote a large amount of boilerplate and it compiled effortlessly. However, the function that does the actual listening has a basic off-by-one buffer error.

// a function that opens a socket and accepts commands into a buffer
int accept_commands(int sockfd) {
   char buffer[1024];
   int n;

   while (1) {
       n = read(sockfd, buffer, sizeof(buffer));
       if (n < 0) {
           perror("read");
           exit(1);
       }
       if (n == 0) {
           printf("connection closed\n");
           exit(0);
       }
       buffer[n] = '\0';
       printf("%s\n", buffer);
   }
   return 0;
}

buffer[n] can point one past the end of the buffer if it is filled, causing an out-of-bounds NUL write. This is a good example of the sorts of little bugs that grow like weeds in C; it may or may not be exploitable in practice. It would be very easy for a programmer using Copilot to accept this sort of code and not notice the off-by-one.

Conclusion

These three example pieces of flawed code did not require any cajoling; Copilot was happy to write them from straightforward requests for functional code. The inevitable conclusion is that Copilot can and will write security vulnerabilities on a regular basis, especially in memory-unsafe languages.

Copilot excels at producing boilerplate that may bog down programmers trying to get to the good part, and is highly accurate at guessing the correct constants and setup functions and so on and so forth. However, relying on it for application logic can quickly go astray. This is partly due to the fact that Copilot cannot always maintain sufficient context to write correct code across many lines, and partly because there is a lot of buggy code on GitHub. There does not appear to be any systematic separation of professionally produced code from beginners' homework assignments in the model. Neural network see, neural network do.

Treat any application logic produced by Copilot with healthy skepticism. As a code reviewer, I would want clear indications about which code is Copilot-generated. I do not expect this situation to ever be fully resolved; these issues are fundamental to how generative models work. There may be incremental improvements, but Copilot will always be capable of emitting flawed code as long as it's capable of emitting code.

@AshleyPinner

This comment has been minimized.

Copy link

@AshleyPinner AshleyPinner commented Jul 12, 2021

Slightly off-topic, but here we go:

It's hard to blame PHP beginners for making this mistake, because the PHP documentation and ecosystem is setting them up to fail

I mean, the documentation page for mysqli_query literally says this on it in large letters in a red box:

Warning
Security warning: SQL injection
If the query contains any variable input then parameterized prepared statements should be used instead. Alternatively, the data must be properly formatted and all strings must be escaped using the mysqli_real_escape_string() function.

(soft reminder that mysqli_real_escape_string() is a function from the C lib itself, not a fabrication by PHP devs!)

The manual also has a section on SQL injection, so the idea that the documentation sets them up for failure is incorrect.

As for the ecosystem, with things like packagist and frameworks offering libs and ways to interact with databases in ways that prevent such obvious SQL injection, along with 3rd party resources like php the right way, the ecosystem is vastly trying to get people to do things from the get-go in a useful way.

I'd like to know what more can be done? The impression that the ecosystem and documentation don't prevent this, despite the contiarty, indicate the message still isn't reaching people. The language and the ecosystem of PHP has evolved in the past 20 years, how do we communicate that across to people who haven't checked back in with PHP recently?

@Donnie

This comment has been minimized.

Copy link

@Donnie Donnie commented Jul 12, 2021

Using GPT for writing code is like monkey on a typewriter. ba-dum-tsss

I totally agree with you.

@Genroa

This comment has been minimized.

Copy link

@Genroa Genroa commented Jul 12, 2021

I'd like to know what more can be done?

We should never teach people to produce SQL queries by building a string by themselves. They should use PDO or an equivalent to prepare+execute them, and all other solutions should be deprecated and then removed. Building queries by hand using mysqli is still the default explanation in a lot of schools, because the language still lets people do that easily.

@marcan

This comment has been minimized.

Copy link

@marcan marcan commented Jul 12, 2021

I'd like to know what more can be done? The impression that the ecosystem and documentation don't prevent this, despite the contiarty, indicate the message still isn't reaching people. The language and the ecosystem of PHP has evolved in the past 20 years, how do we communicate that across to people who haven't checked back in with PHP recently?

Hi, and welcome to 2021! I recommend traveling back to November 2020, when the mysqli documentation was still teaching people to interpolate strings into queries without proper escaping.

They might've fixed that one in the past few months, but that's about two decades too late, and it's going to take another two decades to undo the damage.

@0xabad1dea

This comment has been minimized.

Copy link
Owner Author

@0xabad1dea 0xabad1dea commented Jul 12, 2021

The manual also has a section on SQL injection, so the idea that the documentation sets them up for failure is incorrect.

It is understandable if you saw the existence of that page and assumed it is helpful, but it is not. The entire concept of parametrized/prepared statements is reduced to one sentence jammed somewhere in the middle, and the single "correct" example does not use prepared statements, it uses sprintf. The average beginner is not going to walk away from this page with a correct understanding, assuming they even find this page in the first place.

And of course, googling "php sql" opens a portal to the hell dimension.

@aarondfrancis

This comment has been minimized.

Copy link

@aarondfrancis aarondfrancis commented Jul 12, 2021

The language and the ecosystem of PHP has evolved in the past 20 years, how do we communicate that across to people who haven't checked back in with PHP recently?

Thank you for saying this. It's really popular to say "PHP iS a FrAcTal of BaD DeSiGn" when you haven't actually worked with the language in 5+ years.

@Gby56

This comment has been minimized.

Copy link

@Gby56 Gby56 commented Jul 12, 2021

It's not like there were enough cheatsheets of proper implementations of things all around OWASP and the SEI CERT Coding Standards (https://wiki.sei.cmu.edu/confluence/display/seccode) right....?
M$ decided they wanted to reuse their previous "learning" Twitter bot experience ?
What I always wonder: how much energy did that consume to build the equivalent of a bad "offline" stack overflow, not even far from the parody https://github.com/hieunc229/copilot-clone

@mrphlip

This comment has been minimized.

Copy link

@mrphlip mrphlip commented Jul 12, 2021

For reference, the inscrutable Easter calculation matches a standard algorithm listed on Wikipedia (and, presumably, elsewhere). I've used it in projects before. So it's not susprising that it has appeared in enough places, unmodified, that Copilot can reproduce it verbatim.

However, it's not, quite. Your listing calculates n0 as h + l + 7m + 114 while 'kipedia gives it as h + l - 7m + 114... Now, I couldn't tell you what that means, this algorithm is inscrutable. But presumably for the test years you tried, whatever m represents is zero? It does look like it would be zero most of the time, since it's adding a bunch of smaller terms, dividing by 451 and taking the floor? Must be some rarely-used correction term of some kind.

Which only goes to prove the point, that these snippets from Copilot can easily have subtle bugs that may not come out in testing, and if no humans involved in the dev process know what the code is doing, they can slip past unnoticed, until suddenly it breaks...

@voltagex

This comment has been minimized.

Copy link

@voltagex voltagex commented Jul 12, 2021

And of course, googling "php sql" opens a portal to the hell dimension.

I've wanted to do a site along the lines of "your tutorial is bad and you should feel bad" but I don't know how to do it without being gatekeeper-y. The way (mostly free) resources are ranked by search engines and then not updated must have led to a lot of problems. I think we're seeing some of that with CoPilot, in a way.

@minimaxir

This comment has been minimized.

Copy link

@minimaxir minimaxir commented Jul 12, 2021

GitHub controls the temperature of Copilot, and they are presumably fine-tuning it based on the current trial phase.

You can change the temperature from 0.0 - 1.0 manually using a VS Code extension setting.

It's unclear what the default temperature is, but the model gets very fun at 1.0.

@plamentotev

This comment has been minimized.

Copy link

@plamentotev plamentotev commented Jul 12, 2021

The Easter calculator raises another interesting question. Orthodox Easter usually is not on the same date as the rest of the Christian world, so if it works for you, most likely it would not work for me (some years the date is the same, but usually it is not). And if you need an example where cultural differences can cause serious issues you can check the cause of the Mars climate orbiter failure.

@Serdan

This comment has been minimized.

Copy link

@Serdan Serdan commented Jul 12, 2021

@aarondfrancis - I've worked with PHP for the past three years and my professional opinion based on very recent experience, is that PHP remains a clusterf* of bad design. That doesn't mean it's unusable or that you can't write good code in PHP (I'd like to think I've done so myself), and the project has certainly come a long way, but removing all the bad design overnight would still break absolutely everything.

To answer @AshleyPinner 's question: Core devs are on the right track. If they keep it up I may actually be able to recommend PHP to other devs in another five years time. The problem is that there are still enough footguns and weird edge-cases where I can't recommend it without a metric ton of asterisks attached, which means I can't recommend it to beginners at all.

@dvogel

This comment has been minimized.

Copy link

@dvogel dvogel commented Jul 12, 2021

The aspect of these examples that is most concerning to me is Github's marketing claim that it will also write tests for you:

Screenshot from 2021-07-12 11-07-20

I suspect this marketing claim is stretching a bit but the promise of models like this is that they will get better over time so it's plausible to me that it could get there. Did you try to get Copilot to generate tests for any of these buggy examples?

@rbeverly

This comment has been minimized.

Copy link

@rbeverly rbeverly commented Jul 12, 2021

Thanks, this is an interesting write-up. It's worth considering that new would-be developers may enter the workforce with the assumption that tools like this will "make their jobs easier." That could definitely cause problems down the line, as they fail to learn the actual concepts and practices they need to write good, secure code.

I can also imagine the people who stuff their personal repos with plagiarized code using this. Weeding out bad hires, contractors and freelancers could get worse, because they'll be able to produce somewhat convincing code that might actually run. And it won't be identical to whatever project they swiped it from (which makes it harder to identify as "not really their code"). A little like camouflage for incompetent coders? If you don't look closely and test their code, you might even mistake them for a real developer.

@rbeverly

This comment has been minimized.

Copy link

@rbeverly rbeverly commented Jul 12, 2021

The aspect of these examples that is most concerning to me is Github's marketing claim that it will also write tests for you:

Screenshot from 2021-07-12 11-07-20

I suspect this marketing claim is stretching a bit but the promise of models like this is that they will get better over time so it's plausible to me that it could get there. Did you try to get Copilot to generate tests for any of these buggy examples?

That seems even more terrifying. Not only is the code insecure, but it writes the tests too. Hey the test passed! Everything must be Ok.

@0xabad1dea

This comment has been minimized.

Copy link
Owner Author

@0xabad1dea 0xabad1dea commented Jul 12, 2021

GitHub controls the temperature of Copilot, and they are presumably fine-tuning it based on the current trial phase.

You can change the temperature from 0.0 - 1.0 manually using a VS Code extension setting.

It's unclear what the default temperature is, but the model gets very fun at 1.0.

Strange, I do not seem to have this setting, even after reloading the plugin. Maybe it's only available to some users right now.

edit: ah, I found it, it doesn't expose it in the interface.

@minimaxir

This comment has been minimized.

Copy link

@minimaxir minimaxir commented Jul 12, 2021

edit: ah, I found it, it doesn't expose it in the interface.

FWIW I already flagged it as an issue to make temperature/stops more visible as they are very helpful features.

@waffletower

This comment has been minimized.

Copy link

@waffletower waffletower commented Jul 12, 2021

The language and the ecosystem of PHP has evolved in the past 20 years, how do we communicate that across to people who haven't checked back in with PHP recently?

Thank you for saying this. It's really popular to say "PHP iS a FrAcTal of BaD DeSiGn" when you haven't actually worked with the language in 5+ years.

It's really popular to say "dumping DDT in the ocean is bad" when you haven't actually dumped DDT in the ocean for 5+ years. My apologies for trolling PHP apologists.

@michaelcullum

This comment has been minimized.

Copy link

@michaelcullum michaelcullum commented Jul 12, 2021

I'd like to know what more can be done? The impression that the ecosystem and documentation don't prevent this, despite the contiarty, indicate the message still isn't reaching people. The language and the ecosystem of PHP has evolved in the past 20 years, how do we communicate that across to people who haven't checked back in with PHP recently?

Hi, and welcome to 2021! I recommend traveling back to November 2020, when the mysqli documentation was still teaching people to interpolate strings into queries without proper escaping.

They might've fixed that one in the past few months, but that's about two decades too late, and it's going to take another two decades to undo the damage.

@marcan That page you linked to casts the variable to an integer to prevent that and even talks about this in the comments:

// ... Let's make it
// default to 1, and cast it to an integer as to avoid SQL injection
// and/or related security problems

And even so, I'm not sure it helps to sling about how things used to be not as good as they are now. As @AshleyPinner said, let be productive and look at what more can be done now and in the future to better coach people down better paths if we're going to be commenting on such things.

@ArthurClune

This comment has been minimized.

Copy link

@ArthurClune ArthurClune commented Jul 12, 2021

The original OpenAI paper describes the training set at a (very) high level: https://arxiv.org/abs/2107.03374

@chx

This comment has been minimized.

Copy link

@chx chx commented Jul 12, 2021

"because the PHP documentation and ecosystem is setting them up to fail."

Sigh where? php.net documentation has been explicitly suggesting good practices since 2005 at least (possibly earlier too). This is offensive to the people working on that documentation. Also, what ecosystem are you talking of? It's been nine years since composer released. Or are you saying there are popular database abstraction libraries which can be had via composer require and are doing the wrong thing? Where?

@marcan

This comment has been minimized.

Copy link

@marcan marcan commented Jul 13, 2021

"because the PHP documentation and ecosystem is setting them up to fail."

Sigh where? php.net documentation has been explicitly suggesting good practices since 2005 at least

php.net documentation has been explicitly teaching people terrible practices in its beginner-friendly documentation until last November, at least.

Screen Shot 2021-07-13 at 13 40 47

@marcan That page you linked to casts the variable to an integer to prevent that and even talks about this in the comments:

Yes, and yet mysqli supports prepared statements which are the correct way to do this, not a cast. The code isn't exploitable, but the practices it teaches are. That's bad. It completely defeats the purpose of using mysqli instead of mysql.

And even so, I'm not sure it helps to sling about how things used to be not as good as they are now.

The PHP community has been teaching people to do the wrong thing for two decades, so you can't expect people to start doing the right thing after you fix the docs. Things are getting better? Great! Now we just need to wait a couple more decades for the damage to be undone. It takes a long time to undo two decades of educational negligence. Broken PHP code is all over the web. You can't fix rampant bad practices in a community overnight. So don't be surprised that people continue to point out the bad habits of the PHP community for many years to come. That is the cost of not caring about this for so long.

@oodler577

This comment has been minimized.

Copy link

@oodler577 oodler577 commented Jul 13, 2021

It be groovy if there was a open source license that barred any of its controlled code or derivatives thereof from being used to train or included in AI generated code.

@thehydrogen

This comment has been minimized.

Copy link

@thehydrogen thehydrogen commented Jul 13, 2021

Very interesting, I hadn't thought about it creating bugs yet. I think a great follow-up would be asking it to create tests for the code it generated, just as someone else suggested.

Also, can all the people in the comments stop arguing about PHP? It's not even remotely related to the main point of this write-up.

@JavierLuna

This comment has been minimized.

Copy link

@JavierLuna JavierLuna commented Jul 13, 2021

It's also easy to get it to write code that resembles specific GPL projects with unique naming conventions.

Could you expand on this with an example? How have you verified that the naming conventions were in fact unique?

As a code reviewer, I would want clear indications about which code is Copilot-generated

I feel like this would unnecessary bias reviewers. We should be always thorough when reviewing code, independent of who (or what) wrote the code.

There may be incremental improvements, but Copilot will always be capable of emitting flawed code as long as it's capable of emitting code.

So.. like humans? 😄

One of the topics I was expecting this risk assessment to cover is what happens when you are coding on your/company's project and you accept one of the suggestion that Copilot gives you. I'm guessing that is used to train the model using some kind of federated learning, but are we consenting to that? If they use some kind of anonymization of our code, is that legal?

@0xabad1dea

This comment has been minimized.

Copy link
Owner Author

@0xabad1dea 0xabad1dea commented Jul 13, 2021

One of the topics I was expecting this risk assessment to cover is what happens when you are coding on your/company's project and you accept one of the suggestion that Copilot gives you. I'm guessing that is used to train the model using some kind of federated learning, but are we consenting to that? If they use some kind of anonymization of our code, is that legal?

This is all spelled out on the Copilot homepage, but:

Your local, private code in the editor does not feed back into the model. Even if GitHub hadn't said they won't do that, it would not make much sense to do that, as the code would usually be incomplete and untested.

They do record which suggestions are accepted. It's unclear if they maintain a record of which user accepted which suggestion; if they did, then in principle, they could infer a lot about the kind of code you're writing.

@JavierLuna

This comment has been minimized.

Copy link

@JavierLuna JavierLuna commented Jul 13, 2021

Could you point me to the bit you are mentioning? (EDIT: Found it 😄 )
All I could find is this bit, which let me a bit more worried than before:

In order to generate suggestions, GitHub Copilot transmits part of the file you are editing to the service

Link

And...

Inspection of the gathered source code will be predominantly automatic, and when humans read it, it is specifically with the aim of improving the model or detecting abuse.

Link

So it does send my company's proprietary code to github and they have the capability to read it, unless I am missing something.

That would be a no no for a close source project and company developed products.

@AshleyPinner

This comment has been minimized.

Copy link

@AshleyPinner AshleyPinner commented Jul 13, 2021

So it does send my company's proprietary code to github and they have the capability to read it, unless I am missing something.

I think the bit they send is the current line and comment above it, eg:

//this generates numbers from the fibonacci sequence
function fibonacci

This is what would be needed to get Copilot to spit out the rest of the code. However, someone would, for sure, need to check the actual data that is being sent!

That would be a no no for a close source project and company developed products.

I think this is true for anyone who isn't writing GPL-licenced code, or maybe even everyone, as you're going to be getting a mix of licences for the code with no ability to know which licence the original code came from. For sure this is something that will end up being tested in court otherwise.

@togakangaroo

This comment has been minimized.

Copy link

@togakangaroo togakangaroo commented Jul 13, 2021

Front-line manager hat: Because the code often times seems to have little bugs and inconsistencies, an interest in using (a more mature version of) CoPilot would also be a good reason to promote the use of mutation testing.

@kennyk

This comment has been minimized.

Copy link

@kennyk kennyk commented Jul 13, 2021

I wonder at what point the model starts to feedback onto itself, and what will be amplified due to this. Meaning, people will publish CoPilot-generated code to GitHub, which will be again ingested by the model, and so on. Or perhaps the volume of generated code will be insufficient to make a difference…

@Eitz

This comment has been minimized.

Copy link

@Eitz Eitz commented Jul 14, 2021

Front-line manager hat: Because the code often times seems to have little bugs and inconsistencies, an interest in using (a more mature version of) CoPilot would also be a good reason to promote the use of mutation testing.

This is exactly where code generation will shine. With a proper test coverage (also AI generated?) plus maybe some mutation testing the implementation, it might become common practice to let the AI do the hard lifting and having the tests ensuring the applicability of the function. The dev would have to ensure proper functional coverage (both positive and negative) of the desired results for this to work well.

@smurfix

This comment has been minimized.

Copy link

@smurfix smurfix commented Jul 15, 2021

I mean, the documentation page for mysqli_query literally says this on it in large letters in a red box:

… while not actually providing an example for doing it right. It just refers to mysqli_real_escape_string, which (of course) uses string assembly instead of parameterized queries.

The habit of bad patterns, which PHP is infamous for, continues. Referring people to mysqli_bind_param instead would have been a welcome surprise, but no.

@AshleyPinner

This comment has been minimized.

Copy link

@AshleyPinner AshleyPinner commented Jul 15, 2021

It just refers to mysqli_real_escape_string

Which is proof you didn't read the message, as the first link in the message points to parameterized queries: https://www.php.net/manual/en/mysqli.quickstart.prepared-statements.php

@smurfix

This comment has been minimized.

Copy link

@smurfix smurfix commented Jul 15, 2021

Which is proof you didn't read the message

I was referring to the PHP documentation for mysqli_query, not to any message, thus whether or not I read the latter is irrelevant.

@AshleyPinner

This comment has been minimized.

Copy link

@AshleyPinner AshleyPinner commented Jul 15, 2021

I was referring to the PHP documentation for mysqli_query, not to any message, thus whether or not I read the latter is irrelevant.

The message I'm on about is also on the documentation page for mysqli_query as well.

@merlinpatt

This comment has been minimized.

Copy link

@merlinpatt merlinpatt commented Jul 16, 2021

I wonder how easily Copilot could be tricked. Since they're using public repos to generate code samples, one could theoretically post some number of repos with intentionally insecure code samples that look legit to then trick users into using bad code. If used for scripting, it could even harm your own computer.

@mesketh

This comment has been minimized.

Copy link

@mesketh mesketh commented Jul 23, 2021

The Moon Phase Calculator B unnecessarily zero's the month component of the baseline date (now - baseline-date) - funny that it got that wrong since it's fine in version A.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment