Mentors:
- Morgan Roff
- Sayak Paul
- jaeyounkim
This is a summary of my GSoC 2021 project, in which I tried to produce text embedding modules trained on underrepresented languages such as Arabic and Swahili and publish them on tfhub.dev.
# Copyright 2022 Google LLC.
# SPDX-License-Identifier: Apache-2.0
# Author: Maithra Raghu <maithra@google.com>
import numpy as np

def compute_distance_matrix(patch_size, num_patches, length):
    """Helper function to compute distance matrix."""
    distance_matrix = np.zeros((num_patches, num_patches))
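The preview above stops after the matrix is allocated, so the rest of the helper is not shown. As a rough illustration only, here is one way such a matrix could be filled. It assumes the patches are laid out row by row on a grid that is `length` patches wide, and that distance means the Euclidean distance between grid positions scaled by `patch_size`. The name `compute_distance_matrix_sketch` and both assumptions are mine, not taken from the original gist.

```python
import numpy as np

def compute_distance_matrix_sketch(patch_size, num_patches, length):
    """Pairwise pixel distance between patches (illustrative sketch)."""
    idx = np.arange(num_patches)
    # Assumption: patch i sits at grid row i // length and column i % length.
    coords = np.stack([idx // length, idx % length], axis=1)  # (num_patches, 2)
    # Broadcast to get every pairwise coordinate difference.
    diffs = coords[:, None, :] - coords[None, :, :]  # (num_patches, num_patches, 2)
    # Euclidean distance in grid units, scaled to pixels by the patch size.
    return patch_size * np.linalg.norm(diffs, axis=-1)

# A 2x2 grid of 16-pixel patches.
d = compute_distance_matrix_sketch(patch_size=16, num_patches=4, length=2)
print(d.shape)
```

Under these assumptions the diagonal is zero and the matrix is symmetric, which makes it easy to sanity-check before using it to weight attention maps.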
# Copyright 2021 Google LLC.
# SPDX-License-Identifier: Apache-2.0
import kfp
import json
import time
from google.cloud import bigquery
from google.cloud.exceptions import NotFound
from kfp.v2.google.client import AIPlatformClient
client = bigquery.Client()
import functools
import numpy as np
import tensorflow.compat.v1 as tf
from tensorflow.python.tpu import tpu_function
BATCH_NORM_DECAY = 0.9
BATCH_NORM_EPSILON = 1e-5
To be posted in: https://forums.fast.ai/c/fastai-users/fastai-v2/
Title: Proposed workflow to compare & monitor models using WandbCallback
Content:
Hi,
I’ve been working on WandbCallback for the past few months (with a lot of help from @sgugger) and I’m very excited to show how it works!
import pandas as pd
from sklearn import metrics

def get_classification_report(y_test, y_pred):
    """Return the classification report as a DataFrame sorted by F1 score.

    Source: https://stackoverflow.com/questions/39662398/scikit-learn-output-metrics-classification-report-into-csv-tab-delimited-format
    """
    report = metrics.classification_report(y_test, y_pred, output_dict=True)
    # Transpose so that each class (and each summary row) becomes a DataFrame row.
    df_classification_report = pd.DataFrame(report).transpose()
    df_classification_report = df_classification_report.sort_values(by=['f1-score'], ascending=False)
    return df_classification_report
First: install the CLI program for your distribution: https://cloud.google.com/sdk/install
Modify accordingly:
export REGION='us-central1'
export ZONE='us-central1-f'
export PROJECT_NAME='proj'
word_to_id = imdb.get_word_index()
word_to_id = {k: (v+3) for k, v in word_to_id.items()}
id_to_word = {value: key for key, value in word_to_id.items()}
id_to_word[0] = "" # Padding
id_to_word[1] = "" # Start token
id_to_word[2] = "�" # Unknown
id_to_word[3] = "" # End token
def decode(word):
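The body of `decode` is cut off in this preview. To show how the shifted index might be used, here is a minimal sketch that decodes a whole sequence of ids. It swaps in a tiny hypothetical word index (`toy_word_index`) for `imdb.get_word_index()` so that it runs without downloading the dataset, and `decode_review` is an illustrative name, not the original function.

```python
# Hypothetical toy index standing in for imdb.get_word_index(),
# which maps each word to an integer rank starting at 1.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Shift every id by 3 to leave room for the reserved ids 0..3.
word_to_id = {k: v + 3 for k, v in toy_word_index.items()}
id_to_word = {v: k for k, v in word_to_id.items()}
id_to_word[0] = ""        # Padding
id_to_word[1] = ""        # Start token
id_to_word[2] = "\ufffd"  # Unknown (the replacement character)
id_to_word[3] = ""        # End token

def decode_review(ids):
    """Map a sequence of ids back to text, skipping the empty reserved tokens."""
    # Ids missing from the table fall back to the unknown marker.
    words = (id_to_word.get(i, id_to_word[2]) for i in ids)
    return " ".join(w for w in words if w)

print(decode_review([1, 4, 5, 6, 7, 0, 0]))  # prints "the movie was great"
```

Mapping padding, start, and end to empty strings means they drop out of the decoded text, while unknown ids stay visible as a single marker character.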