# import embeddings
!ls shared/MUSE
import musevecs
mdls = musevecs.MUSE('shared/MUSE/wiki.multi.{0}.vecfull.txt', {'en', 'de', 'fr'}, nmax=200000)
enmdl = mdls.vecmap['en']
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import lxml.etree as et
xml="""
<text:doc xmlns:text="www.doc.com">
<text:span text:style-name="Kursiv_5f_KeiinRecht">
<text:span text:style-name="T25">madhuparká</text:span>
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:text="www.doc.com">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
#! /bin/bash
chunk_input_dir=$1
chunk_output_dir=$2
chunk_prefix=$3
chunk_index=$4
model_paths=$5
batch_size=$6
#!/usr/bin/env bash
# author: Mathias Mueller / mathias.mueller@uzh.ch
# purpose: train word alignment models with fast_align
# usage information
if [ $# -lt 4 ]
then
    echo "[ERROR] Too few arguments. Expected 4 command line arguments." 1>&2
    echo "Usage: $0 <language 1> <language 2> <path to training set without language suffix> <output directory for trained model>" 1>&2
#!/usr/bin/env bash
# author: Mathias Mueller / mathias.mueller@uzh.ch
# purpose: apply word alignment models trained with fast_align
# usage information
if [ $# -lt 6 ]
then
    echo "[ERROR] Too few arguments. Expected 6 command line arguments." 1>&2
    echo "Usage: $0 <language 1> <language 2> <source txt file> <target txt file> <directory of trained model> <output file path>" 1>&2
# Author: Mathias Müller / mmueller@cl.uzh.ch
import mxnet as mx
import numpy as np
from typing import Optional, List
SOFTMAX_NAME = "softmax"
WEIGHTED_CROSS_ENTROPY_NAME = "weighted_cross_entropy"
import mxnet as mx
import numpy as np
import random
from collections import namedtuple
def run(seed: int, init_method: str):
    mx.random.seed(seed)