#!/usr/bin/env python
# coding=utf-8
# Copyright The HuggingFace Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
@machelreid
machelreid / CNN Daily Mail Tips.md
Created April 28, 2021 00:43
Tips for CNN daily mail abstractive summarization

Abstractive Summarization (CNN-DM)

Here are things that I spent a lot of time on so that you don’t have to, especially with regard to preprocessing data for abstractive summarization. It will be pretty disorganized, but bear with me: there might be something useful in here.

Using Rouge

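Important: Don’t use pure Python implementations of ROUGE!! Their scores can differ from those of the original Perl script, which makes results hard to compare. Use the following Python wrapper for the original Perl package: https://github.com/pltrdy/files2rouge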
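
files2rouge scores two plain-text files line by line, so each summary has to sit on a single line and the two files must be aligned. The sketch below writes such files and builds the command. The helper names `write_line_aligned` and `build_rouge_command`, the file names, and the `reference summary` argument order are assumptions; check them against `files2rouge --help` before relying on the scores.

```python
import os
import tempfile


def write_line_aligned(summaries, path):
    """Write one summary per line, collapsing internal newlines and extra spaces."""
    with open(path, "w", encoding="utf-8") as f:
        for summary in summaries:
            f.write(" ".join(summary.split()) + "\n")


def build_rouge_command(ref_path, hyp_path):
    # Assumed CLI shape: `files2rouge <reference> <summary>`.
    return ["files2rouge", ref_path, hyp_path]


if __name__ == "__main__":
    # Toy data, for illustration only.
    refs = ["the cat sat on the mat .\nit purred ."]
    hyps = ["a cat sat on a mat ."]
    out_dir = tempfile.mkdtemp()
    ref_path = os.path.join(out_dir, "refs.txt")
    hyp_path = os.path.join(out_dir, "hyps.txt")
    write_line_aligned(refs, ref_path)
    write_line_aligned(hyps, hyp_path)
    # Once files2rouge is installed, pass this list to subprocess.run(..., check=True).
    print(" ".join(build_rouge_command(ref_path, hyp_path)))
```

Collapsing newlines matters because a multi-sentence summary written across several lines would shift every later line out of alignment with its reference.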

Preprocessing Data for Abstractive Summarization

@machelreid
machelreid / get_transformer_paramters.py
Created December 8, 2020 06:33
Get the parameters for the vanilla Transformer (Vaswani et al., 2017)
import argparse
def get_enc_params(embed_dim, ffn_dim):
    return embed_dim * embed_dim * 4 + embed_dim * ffn_dim * 2 + embed_dim * 5 + ffn_dim
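
As a quick sanity check of the encoder-layer formula above, the sketch below evaluates it for `embed_dim=512` and `ffn_dim=2048` (the base configuration in Vaswani et al., 2017) and splits the total into its additive terms. The helper `enc_param_breakdown` and the term labels are illustrative readings of the formula, not part of the original script.

```python
def get_enc_params(embed_dim, ffn_dim):
    # Same expression as in the script above.
    return embed_dim * embed_dim * 4 + embed_dim * ffn_dim * 2 + embed_dim * 5 + ffn_dim


def enc_param_breakdown(embed_dim, ffn_dim):
    """Hypothetical helper: split the formula into its additive terms."""
    return {
        "attention_weights": embed_dim * embed_dim * 4,  # four d x d matrices (assumed reading)
        "ffn_weights": embed_dim * ffn_dim * 2,          # d x f and f x d matrices (assumed reading)
        "remaining_terms": embed_dim * 5 + ffn_dim,      # the formula's linear terms
    }


if __name__ == "__main__":
    parts = enc_param_breakdown(512, 2048)
    for name, count in parts.items():
        print(f"{name}: {count:,}")
    print(f"total per encoder layer: {get_enc_params(512, 2048):,}")
```

With these values the quadratic terms contribute 1,048,576 + 2,097,152 parameters and the linear terms 4,608, for 3,150,336 per encoder layer.
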
def get_dec_params(embed_dim, ffn_dim, encoder_embed_dim=None):
    return (
        embed_dim * embed_dim * 4
        + embed_dim * ffn_dim * 2
import sys
import os
import hashlib
import struct
import subprocess
import collections
dm_single_close_quote = u'\u2019' # unicode
dm_double_close_quote = u'\u201d'
#!/bin/bash
TEXT=$1
echo "Finding $TEXT"
grep -iRl "$TEXT" ./