@chorn
Created June 29, 2017 11:47

footer: @chorn · 2017-06-15 · © RoleModel Software
slidenumbers: true
slidecount: false
autoscale: true
build-lists: false
theme: Merriweather, 8

Fake News

^

  • I saw a news report about the number of bot accounts that are following Trump.
  • It said that whenever he was online and tweeting, fleets of Twitter bots would tweet conspiracy theories at him.

Detection?

^

  • Naturally, my first thought was: wow, that's crazy, how would I do that myself?
  • The politics and conspiracy theories were not the part that interested me; the architecture of a solution was.
  • Could I detect it? That seems like a Google-sized problem.
  • In order to detect it, I should figure out how to generate it.

[.build-lists: true]

Architecture

  1. Automated Twitter Signup
  2. Twitter bot
  3. Command & Control Infrastructure
  4. Content Automation

^

  • #1 is a solution that has no responsible use.
  • #2 and #3 are unnecessary unless you need a production deployment, and they are not interesting.
  • #4 is an interesting problem.

Content Automation

^

  • I set a new goal for my exercise, and started to plan.
  • The idea is that you have a bunch of content, and you want to make more content like that.

ENGLISH WORD RANDOM MAKING

^

  • Random words won't make sense.
  • What field of study will?
  • Natural Language Processing (NLP) seems like a good place to start.

Ruby-NLP [1]


Markov Chains [2]

^

  • A Markov chain is a stochastic process that allows you to predict future states based on the present state.
  • In other words, it's a probability/mathy thing that lets us guess at new things.
  • That means we need something to base our guesses on.
  • To be interesting, you've got to start with a common type, theme, or genre.
  • https://twitter.com/trumpchain
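The idea in these notes can be sketched in a few lines of plain Ruby. This is only an illustration of a word-level Markov chain, not a description of how marky_markov works internally; `build_chain`, `generate`, and the `order`, `max_words`, and `rng` parameters are hypothetical names chosen for the example.

```ruby
# Map each run of `order` consecutive words (the "present state") to the
# list of words that followed it somewhere in the corpus.
def build_chain(text, order = 2)
  chain = {}
  text.split.each_cons(order + 1) do |words|
    state = words[0...order]
    following = words[order]
    (chain[state] ||= []) << following
  end
  chain
end

# Walk the chain: start from a random state, then repeatedly pick one of the
# words that was seen after the current state. Repeated followers are stored
# repeatedly, so common continuations are picked more often.
def generate(chain, max_words: 30, rng: Random.new)
  state = chain.keys.sample(random: rng)
  return '' if state.nil?

  output = state.dup
  while output.size < max_words
    candidates = chain[state]
    break if candidates.nil? # dead end: this state only ends the corpus

    output << candidates.sample(random: rng)
    state = output.last(state.size)
  end
  output.join(' ')
end

if __FILE__ == $PROGRAM_NAME
  corpus = 'the cat sat on the mat and the cat ran off the mat'
  puts generate(build_chain(corpus, 2), max_words: 12)
end
```

A larger `order` copies longer stretches of the source verbatim; a smaller one produces looser, less coherent text.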

Corpus

^

  • The text you base the guesses on is usually called a corpus.
  • For a fake news experiment, building one would require scraping news websites, which is always annoying even with great tools.
  • I don't really want to make fake news.

Mark Twain

^

  • Project Gutenberg is a great resource for public domain works, and they have plain text versions.
  • Mark Twain swears a lot
  • Find something fun, recognizable, and not offensive.
  • I tried picking from various sources to make a mashup-style corpus, and the results were creepy.

Harry Potter

^

  • Luckily I own all of the Harry Potter books.
  • Conversion to text was easy.

Markov Potter

require 'marky_markov'

markov = MarkyMarkov::TemporaryDictionary.new(3)
# Double quotes are required: '\n' in single quotes is a literal backslash and n.
markov.parse_string ARGV.map { |f| IO.read f }.join("\n")

5.times { puts markov.generate_n_sentences 1 }

^

  • Fast: ~30 seconds

  • Dread filled Harry: To repel dementors they would have something to do with the place.” “All the better,” said Tonks, and Harry saw her shudder.
  • Love,” which was taken by Fred Weasley.
  • In the past his scar hurting and you know it.” Professor Umbridge’s face was growing rather red.
  • Hermione squashed this plan by pointing out that, in the event of the current headmaster being unable to read runes, he could not rest while agitating thoughts whirled through his mind: Sirius falling through
  • Dennis Creevey flying as he led a small group of students who would be receiving zero marks for the day’s work.

N-Grams [3]


N-Gram Potter

require 'raingrams'
include Raingrams

ngram = if File.exist? 'model.raingrams'
          Model.open('model.raingrams')
        else
          TrigramModel.build do |model|
            # Double quotes so the files are joined with a real newline.
            model.train_with_text ARGV.map { |f| IO.read f }.join("\n")
          end
        end

ngram.save('model.raingrams')

5.times { puts ngram.random_sentence }

^

  • Slow: ~10 minutes

  • somehow we never asked said Nearly Headless Nick came gliding out of my way downstairs they weren t copying from her limp form and I suppose it was upon it his face Seamus Finnigan sounding revolted.
  • Arithmancy looks terrible said Harry glad of it at least explained why when I drop you off that broom.
  • THERE IS NO HARRY POTTER DISTURBED AND DANGEROUS The boy.
  • Inwardly praying that Neville deserved it Harry look inside your mind remember Yeah I bet my life campaigning against the killin of the sinister winged horses yeh know.
  • Lunch was an ignorant schoolboy.

Recurrent Neural Networks [4]

^

  • RNNs are a centerpiece of machine learning.
  • ML is a huge topic.

[.build-lists: true]

RNNs at home

  1. Character-level Multi-layer RNN [5] with LSTM [6]
  2. Reimplementation [7]
  3. Docker images [8]
  4. NVIDIA Docker Support [9]

^

  • This stuff is powerful, but it has a lot of ugly dependencies.
  • Luckily there's Docker, and I already have a Docker server.
  • Whoa, GPU support? How long is this going to take?

Why

^

  • This is the right time to stop and say, why am I doing this?
  • What does my GPU have to do with Fake News?
  • How many times have I changed my goal?
  • Why am I pursuing this path?
  • Are these valuable skills?
  • Is this difficult?
  • Am I learning?
  • Is this reusable?

Neural Potter

  1. Preprocess
  2. Train
  3. Sample

^

  • Once set up, there are three steps.
  • Each step has various settings.
  • Training has lots of settings; some of them create wildly different results.
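One setting that shows up in the Sample step of this deck is `-temperature`. A common way character-level samplers use temperature is to divide the network's raw scores by it before turning them into probabilities. The sketch below assumes that behaviour rather than quoting sample.lua, and `softmax_with_temperature` is a hypothetical helper name.

```ruby
# Turn raw scores into probabilities, scaled by a temperature.
# Low temperatures sharpen the distribution (safer, more repetitive text);
# high temperatures flatten it (more surprising, more garbled text).
def softmax_with_temperature(scores, temperature)
  scaled = scores.map { |s| s.to_f / temperature }
  max = scaled.max # subtract the max for numerical stability
  exps = scaled.map { |s| Math.exp(s - max) }
  total = exps.sum
  exps.map { |e| e / total }
end

if __FILE__ == $PROGRAM_NAME
  scores = [2.0, 1.0, 0.0] # made-up scores for three candidate characters
  [0.3, 1.0, 2.0].each do |t|
    probs = softmax_with_temperature(scores, t)
    puts format('temperature %.1f -> %s', t, probs.map { |p| p.round(3) }.inspect)
  end
end
```

With the made-up scores above, the top candidate dominates at a low temperature and the three candidates move closer together as the temperature rises.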

Preprocess

$ python scripts/preprocess.py  \
  --input_txt /data/hp.txt \
  --output_h5 /data/hp.h5 \
  --output_json /data/hp.json

Train

$ th train.lua -checkpoint_name /data/hp \
  -input_h5 /data/hp.h5 -input_json /data/hp.json \
  -batch_size 96 -seq_length 128 \
  -rnn_size 700 -num_layers 3 \
  -dropout .3 -max_epochs 128

Sample

$ th sample.lua \
  -checkpoint /data/hp_1st_51000.t7 \
  -temperature .3

  • “Well, I see,” said Harry, “but I was the only one who said to you.”
  • “What are you doing?” said Harry, turning away to see him as though he had been put in his hand. “What do you think you’re talking about?”
  • “I don’t know,” said Harry, “I don’t think you are still alive.”
  • “It must be the one who was allowed to see you at the time,” said Harry. “What do you mean?”
  • “I don’t know,” said Harry angrily.

  • “Well, I don’t think there’s a big look at the moment,” said Harry. “I was too long to tell you that I was all right. . . .”
  • “Well, they were still there with the dementors and say they are going to stay around the castle,” said Dumbledore sharply. “I was a bit of the letter to the Ministry of Magic.”
  • “What d’you mean?” said Harry in a strangely voice. “I don’t know what we were all right. I was starting to stay around the staff table at the moment, then —”
  • “I don’t know what you’re doing,” said Harry. “I was a bit more than you were not allowed to tell you that he was a bit more angry about the prophecy

  • Harry looked at him through the burning and the evening stepped toward him. It was a mistake on the opposite dropping door.
  • “Oh if you look for that to you!” said Harry, jumping away from Ron’s head.
  • “He’s fine,” Harry muttered, now following the skrewts flying back over the undergrowth and turned his back on Harry, “I’d better give us a clue, sir, we’re talking about.”
  • “Professor Snape was my death after work to get to him.”

Rails

  def self.warn(registry)
    @desc = 'localhost'
    @url_constraints.each do |defaults, @cache_size, location, locale, accepts = nil
    if old_table && @decorators.accept(name, options)
      if @block
        @executor.configure
      end
      @exclusive_thread = options[:constraints].map
    end
  end

^ Corpus: the minimal set of gems that Rails requires, excluding tests and comments


Rails

  def release_write(method, callback)
    constants = {}
    @struct_methods = aliases.any?
    @context = options[:controller]
    self.warning = compute_columns
    @required    = extract_delivery_options
    @translations = []
    @current = concurrent
    @wheres = {}
    @tempfile = parent
    @wrapped_string.delete(arg)
    @args.include?(node)
  end

Rails

  def orders
    super
    @stopped = attributes.size
    @non_enum_resent_callbacks = Array(options[:wrapped])
    self.name = true
    super(value, node_hash)
    super
  end
  def on_bjock
    @scope.synchronize { @exception_options[:user] }
  end

Gherkin

Scenario: Search product
  Given I am requement comserssill form it
  And I should see "Pohplete "sent products

Scenario: Editing now page
  And I preade a pustomer has a scorecermon to order
  Then I should see a bereaved for "Lron Now"
  When I open the refund order exists
  Then I should see the order topmeate the new customers fran order fhotal books for the courters
  And I should see "New logged in te my cart
  And I have admin user
  And I should see "St. Ering coupon code and body is to the order as order total of "$29.06"
  Then I should see "John Se /of. 11900 and a link and the ofd cart
  And I log on the credit card corders
  Given I am logged in as a submitted

^

  • Based on our existing feature files

Gherkin

@javascript
 Scenario: Order with refund bue line item
   Given I am logged in as a card
   Then I should see "Date Coupon Add to the customer page

   When I view my cart
   When I visit the order
   Then I should see "Golden Nuggets" in the new sent of a new result path
   When I click on "Gefs manage"
   And I should see "St. Criox Admin Iuttor "0030" and dollowing" product be $000.08
   And I should see a new deceaved
   Then I should have the size list of a predicted version
   And I should see "St. Brook" "Tale St.  Frenter Plus" size Hroduct Page

Readability [10]

^

  • I had planned to experiment with measuring readability
  • A loop that generates random stuff and uses stats to throw some away
  • https://github.com/shivam5992/textstat
  • Running them by hand was more fun
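The generate-and-filter loop from these notes might look something like the following. It is a sketch under assumptions: it uses the Flesch reading-ease formula as a stand-in readability statistic, a deliberately crude vowel-run syllable counter, and a made-up threshold. The helper names (`syllables`, `flesch_reading_ease`, `keep_readable`) are hypothetical, and the textstat library linked above is Python and is not used here.

```ruby
# Very rough syllable estimate: count runs of vowels, with a floor of one.
def syllables(word)
  [word.downcase.scan(/[aeiouy]+/).size, 1].max
end

# Flesch reading ease:
#   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
# Higher scores mean easier text.
def flesch_reading_ease(text)
  words = text.scan(/[A-Za-z']+/)
  return 0.0 if words.empty?

  sentences = [text.scan(/[.!?]+/).size, 1].max
  total_syllables = words.sum { |w| syllables(w) }
  206.835 - 1.015 * (words.size.to_f / sentences) -
    84.6 * (total_syllables.to_f / words.size)
end

# Keep only the candidates that score at or above the threshold.
def keep_readable(candidates, min_score: 60.0)
  candidates.select { |c| flesch_reading_ease(c) >= min_score }
end

if __FILE__ == $PROGRAM_NAME
  candidates = [
    'The cat sat on the mat.',
    'Institutional characterization necessitates extraordinary interdisciplinary collaboration.'
  ]
  puts keep_readable(candidates)
end
```

In a real loop the candidates would come from one of the generators in this deck, and the threshold would need tuning by eye.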

Q & A

Footnotes

  1. https://github.com/diasks2/ruby-nlp

  2. http://setosa.io/ev/markov-chains/

  3. https://en.wikipedia.org/wiki/N-gram

  4. http://karpathy.github.io/2015/05/21/rnn-effectiveness/

  5. https://github.com/karpathy/char-rnn

  6. http://blog.echen.me/2017/05/30/exploring-lstms/

  7. https://github.com/jcjohnson/torch-rnn

  8. https://github.com/crisbal/docker-torch-rnn

  9. https://github.com/NVIDIA/nvidia-docker

  10. https://en.wikipedia.org/wiki/Readability
