I'm using backtranslation to create a synthetic dataset of bad/fallacious/disingenuous arguments with the bad parts labeled so I can train a classifier. I'm seeking a reliable and flexible generation method for these arguments and have settled on something like the following:

Model making an argument as a two-step process roughly analogous to type checking followed by logic checking. In the Phil Tetlock/Daniel Kahneman paradigm this would be something like choosing a reference class to get an outside view/prior, and then mentally modeling the specific logical structure to predict counterfactual outcomes in various cases:

  • Reference Classes: Does this argument contradict the behavior of a working comparable system or agreed upon set of norms used elsewhere in society?
  • Mental Models: Does this argument imply a model that captures the behavior of X correctly?

"Fallacies" as traditionally understood are usually only helping with the type check step, which is important but also unclear to what extent this sort of syntactic evaluation is really going to scale. Type checking is more like a search process, mental modeling is more like the construction of a dynamical system that you predict the macro-scale outcomes of.

Let's perform these two steps together, and then you can do the next ones on your own.

We want to start with our controversial subject and position. We then determine salient features of the subject and use them to choose a reference class. From the reference class we make a list of prior arguments implied by how society treats that class. One of these becomes the chosen-argument. We then make a list of differences between the reference class the prior argument comes from and the current subject, and use it to produce the analogical translation of the prior argument into the new situation. Let's try it now with AI regulation:

{"subject":"ai-regulation", "position":"for", "salient-features":["Deep learning requires large capital expenditures", "Deep nets could become smarter than humans or otherwise escape human control", "Deep learning has the possibility to empower small groups more than it empowers large trusted bureaucracies", "Deep nets could be used in the creation of WMDs", "Many experts warn that deep nets could cause human extinction"], "reference-class":"Nuclear Energy", "prior-arguments":["Nuclear technology has an air of secrecy that goes beyond legal statute, it is considered antisocial to share ideas about building things with nuclear power", "Nuclear power is suppressed by government action, various forms of pressure are used to discourage people from finding improvements to nuclear capabilities", "Nuclear technology reaching foreign actors is discussed with negative valence terms like 'proliferation'", "Only the state is allowed to build nuclear weapons", "There are international treaties to restrict and delay the development of nuclear capabilities", "It is illegal to own more than small amounts of material that could go into a nuclear reaction"], "chosen-argument":"Nuclear power is suppressed by government action, various forms of pressure are used to discourage people from finding improvements to nuclear capabilities", "differences":["AI is created from technologies (general purpose computers) that have many other uses, whereas fissile material is mostly useful for making nuclear technology", "From a product standpoint nuclear technologies have one primary use case: Generating energy, which is a commodity market with near perfect competition. Consumers do not notice if their electricity generator changes from coal to nuclear powered steam turbines because electricity is generated in central facilities, by contrast AI products are diverse, differentiated, and have user facing consequences when their design is changed. This makes their regulation much more visible to society.", "Deep learning architectures are usually not terribly complicated to implement and discovering better methods is mostly a matter of testing at small scales guided by the right kind of mathematical intuition"], "analogical-translation":"Given the existential risks posed by AI it's unclear why the government isn't treating this more like nuclear weapons. There are no 'open source' nuclear weapons, nor should there be. There are no nuclear startups, and again it is not clear that there should be. Advancing AI is, in general and without careful direction, bad. It reduces the amount of time we have until AGI that disempowers (and then presumably kills) humanity."}

Now then, our goal here is to generate fallacious arguments. You yourself previously advised that I could do this within this process by the following means (a rough tagging scheme for these corruptions is sketched after the list):

  1. Reference Class Arguments: Generate arguments that contradict established norms or the behavior of comparable systems. These arguments can be created by:

    • Misinterpreting or misapplying established norms or rules.
    • Drawing incorrect comparisons between dissimilar systems or situations.
    • Ignoring relevant context or information when making comparisons.
    • Cherry-picking data or examples that support the argument while ignoring contradictory evidence.
  2. Mental Model Arguments: Generate arguments that imply incorrect or oversimplified models of a given phenomenon. These arguments can be created by:

    • Oversimplifying complex systems or processes.
    • Misunderstanding or misrepresenting cause-and-effect relationships.
    • Ignoring feedback loops or interdependencies between variables.
    • Assuming linear relationships between variables when the relationships are actually nonlinear.
    • Failing to account for randomness or uncertainty in the model.
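If it helps to keep these two families of corruptions machine-readable during generation, a tagging scheme along the following lines could be used; the enum names are my own invention, and only the descriptions come from the list above:

from enum import Enum

class Corruption(str, Enum):
    # Reference-class ("type check") corruptions
    MISAPPLIED_NORM = "misinterpreting or misapplying an established norm or rule"
    FALSE_ANALOGY = "drawing comparisons between dissimilar systems or situations"
    DROPPED_CONTEXT = "ignoring relevant context or information when comparing"
    CHERRY_PICKING = "citing supportive data while ignoring contradictory evidence"
    # Mental-model ("logic check") corruptions
    OVERSIMPLIFICATION = "oversimplifying a complex system or process"
    BAD_CAUSALITY = "misrepresenting cause-and-effect relationships"
    IGNORED_FEEDBACK = "ignoring feedback loops or interdependencies between variables"
    FALSE_LINEARITY = "assuming linear relationships that are actually nonlinear"
    IGNORED_UNCERTAINTY = "failing to account for randomness or uncertainty"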

So here is my proposal. We will take the process I just outlined and add corruption steps at the choice of reference class, at the choice of prior argument, and during the analogical translation. You will:

  1. Pick the least charitable/most negative reference class you can plausibly get away with.
  2. Pick the prior argument which is the greatest stretch/least analogous to the thing you want to translate it to.
  3. Go out of your way in the analogical translation to use sleight of hand and avoid bringing the differences to the reader's mind.
  4. Write out how you did each of these three things in a key appended to the end of the JSON dictionary like {"corruptions":[CORRUPTION1, CORRUPTION2, CORRUPTION3]}.
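As a sanity check on generations, a record produced with these corruption steps should end up carrying the appended key; here is a minimal sketch of such a check, assuming the ArgumentRecord fields sketched earlier plus the new corruptions entry:

def has_labeled_corruptions(record: dict) -> bool:
    """A corrupted record should append a "corruptions" key describing how
    the reference class, the prior argument, and the analogical translation
    were each skewed."""
    corruptions = record.get("corruptions")
    return (
        isinstance(corruptions, list)
        and len(corruptions) == 3
        and all(isinstance(c, str) and c.strip() for c in corruptions)
    )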

Do this for the subject of "Genetically Modified Organisms".
