Pete epwalsh (@epwalsh), Central Oregon
GitHub gists
epwalsh / scrape.sh
Created June 5, 2016 21:41
Python + R libraries to WordCloud
find ~/ISU-DMC/dmc2016 -name '*.h' -o -name '*.R' > RFILES
find ~/ISU-DMC/dmc2016 -name '*.h' -o -name '*.py' > PYFILES
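The two `find` commands above only produce the file lists (RFILES and PYFILES); turning them into word counts is left implicit. A minimal sketch of that next step, using a hypothetical `tally_words` helper that is not part of the gist:

```python
from collections import Counter
from pathlib import Path
import re

def tally_words(file_list, pattern):
    """Count regex matches of `pattern` across every file named in `file_list`."""
    counts = Counter()
    for name in Path(file_list).read_text().splitlines():
        try:
            text = Path(name).read_text(errors="ignore")
        except OSError:
            continue  # skip files that disappeared since the `find` ran
        counts.update(re.findall(pattern, text))
    return counts

# e.g. tally R `library(...)` calls from the RFILES list produced above:
# tally_words("RFILES", r"library\((\w+)\)")
```

The resulting `Counter` maps each library name to its frequency, which is the kind of input a word-cloud tool expects.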
epwalsh / pull_requests.md
Last active February 22, 2019 18:10
How to make a PR to an open source project without pissing everyone off

Initial setup

Step 1: Fork the repo.

Step 2: Clone your fork locally.

git clone https://github.com/USERNAME/REPO.git
epwalsh / partial_config.jsonnet
Created February 12, 2019 16:27
AllenNLP learning rate schedulers
{
  "trainer": {
    "cuda_device": 0,
    "learning_rate_scheduler": {
      "type": "triangular",
      // total number of epochs, should match the trainer param `num_epochs` below
      "num_epochs": 80,
      // increase LR linearly for 20 epochs
      "warm_up": 20,
      // then decrease LR linearly for 30 epochs
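The shape the config describes can be sketched in plain Python: the learning rate ramps up linearly over the `warm_up` epochs, ramps back down over the cool-down epochs, then stays flat. The `base_lr` and `peak_lr` values here are illustrative assumptions, not taken from the config, and this is a sketch of the schedule's shape rather than AllenNLP's actual implementation:

```python
def triangular_lr(epoch, warm_up=20, cool_down=30, base_lr=0.0001, peak_lr=0.001):
    """Piecewise-linear LR: rise over `warm_up` epochs, fall over `cool_down`
    epochs, then hold at `base_lr`. (base_lr/peak_lr are made-up values.)"""
    if epoch < warm_up:
        frac = epoch / warm_up
    elif epoch < warm_up + cool_down:
        frac = 1.0 - (epoch - warm_up) / cool_down
    else:
        frac = 0.0
    return base_lr + frac * (peak_lr - base_lr)
```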
class CopyNetSeq2Seq(Model):
    # snip...

    def _get_generation_scores(self, state: Dict[str, torch.Tensor]) -> torch.Tensor:
        # `self._output_generation_layer` is just a PyTorch linear layer with an input
        # dimension equal to the decoder hidden state size, and an output dimension
        # equal to the size of the target vocabulary.
        return self._output_generation_layer(state["decoder_hidden"])
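The projection above is just an affine map from the decoder hidden state to one score per target-vocabulary token. A shape-level sketch, using NumPy as a stand-in for the PyTorch linear layer (all dimensions and parameter values here are arbitrary illustrations):

```python
import numpy as np

batch_size, decoder_hidden_dim, target_vocab_size = 4, 16, 100

# Stand-ins for the decoder hidden state and the linear layer's parameters.
decoder_hidden = np.random.randn(batch_size, decoder_hidden_dim)
W = np.random.randn(decoder_hidden_dim, target_vocab_size)
b = np.zeros(target_vocab_size)

# Equivalent of `self._output_generation_layer(state["decoder_hidden"])`:
# one generation score per target-vocabulary token for each batch item.
generation_scores = decoder_hidden @ W + b
```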
class CopyNetSeq2Seq(Model):
    # snip...

    def _get_copy_scores(self, state: Dict[str, torch.Tensor]) -> torch.Tensor:
        # NOTE: here `trimmed_source_length` refers to the input sequence length minus 2,
        # so that the special START and END tokens in the source are ignored. We also need to
        # ignore PAD tokens, but that happens elsewhere using a mask.
        # shape: (batch_size, trimmed_source_length, encoder_output_dim)
        trimmed_encoder_outputs = state["encoder_outputs"][:, 1:-1]
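The `[:, 1:-1]` slice drops exactly the first and last positions along the sequence axis, which is how the START and END token positions are ignored. A small NumPy demonstration of that slicing (shapes chosen arbitrarily for illustration):

```python
import numpy as np

batch_size, source_length, encoder_output_dim = 2, 7, 8

# Encoder outputs over the full source, including START (index 0) and END (index -1).
encoder_outputs = np.random.randn(batch_size, source_length, encoder_output_dim)

# Drop the START and END positions: trimmed_source_length = source_length - 2.
trimmed_encoder_outputs = encoder_outputs[:, 1:-1]
```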
class CopyNetSeq2Seq(Model):
    # snip...

    def _decoder_step(self,
                      last_predictions: torch.Tensor,
                      selective_weights: torch.Tensor,
                      state: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # shape: (group_size, max_input_sequence_length, encoder_output_dim)
        encoder_outputs_mask = state["source_mask"].float()
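The `.float()` call turns the boolean source mask into 1.0/0.0 values that can be multiplied into attention scores to zero out PAD positions. A NumPy sketch of the same conversion (the mask values here are an invented example):

```python
import numpy as np

# A boolean source mask: True at real tokens, False at PAD positions.
source_mask = np.array([[True, True, True, False, False],
                        [True, True, True, True,  True]])

# Equivalent of `state["source_mask"].float()`: 1.0 at real tokens,
# 0.0 at padding, ready to multiply into attention scores.
encoder_outputs_mask = source_mask.astype(np.float32)
```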
class CopyNetSeq2Seq(Model):
    # snip...

    def _get_ll_contrib(self,
                        generation_scores: torch.Tensor,
                        generation_scores_mask: torch.Tensor,
                        copy_scores: torch.Tensor,
                        target_tokens: torch.Tensor,
                        target_to_source: torch.Tensor,
epwalsh / github-labels.sh
Created January 13, 2020 17:13
Create good default labels for a repository. Adapted from https://github.com/amatkivskiy/github-labels-creator.
#!/bin/bash

label_names=(
  'Status: Changes Requested'
  'Status: Do Not Merge'
  'Status: Help Wanted'
  'Status: In Progress'
  'Status: Mergeable'
  'Status: Review Needed'
  'Type: Bug'
local transformer_model = 'bert-base-cased';
local epochs = 1;
local batch_size = 8;

{
  "dataset_reader": {
    "type": "transformer_squad",
    "transformer_model_name": transformer_model,
    "skip_invalid_examples": true,
epwalsh / dataset_reader.py
Created July 1, 2020 18:25
Dataset Reader API
"""
Proposal for new DatasetReader API.
For this to work, all `Instance`s would have to be efficiently serializable.
So `TextField`s, for example, shouldn't contain `TokenIndexer`s.
The flow of data would look like this (boxes represent separate Python processes):
```
+-----------------------------------------------+