@magic-lantern
Last active September 6, 2022 19:24
Eureka Notes from beta testing

Eureka New User Guide

General Eureka Information

See https://www.healthdatacompass.org/cloud-analytics-infrastructure/using-eureka-app-vm

Initial Setup

Set up your account on the machine first. This must be done before you can use Google Cloud Source Repositories and several other commands. In a terminal on the VM, run:

gcloud init

Output and steps will look something like this:

Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

Choose the account you would like to use to perform operations for 
this configuration:
 [1] logging-monitoring@<project>.iam.gserviceaccount.com
 [2] Log in with a new account
Please enter your numeric choice:  2


You are running on a Google Compute Engine virtual machine.
It is recommended that you use service accounts for authentication.

You can run:

  $ gcloud config set account `ACCOUNT`

to switch accounts if necessary.

Your credentials may be visible to others with access to this
virtual machine. Are you sure you want to authenticate with
your personal account?

Do you want to continue (Y/n)?  y

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?...

This tool has been deprecated, use 'gio open' instead.
See 'gio help open' for more info.
Checking default browser status ended.
You are logged in as: [first.last@hdcuser.org].

Pick cloud project to use: 
 [1] project1
 [2] project2
 [3] project3
 [4] 
 ...
 [15] Create a new project
Please enter numeric choice or text value (must exactly match list 
item):  12

Your current project has been set to: [selectedproject].

Not setting default zone/region (this feature makes it easier to use
[gcloud compute] by setting an appropriate default value for the
--zone and --region flag).
See https://cloud.google.com/compute/docs/gcloud-compute section on how to set
default compute region and zone manually. If you would like [gcloud init] to be
able to do this for you the next time you run it, make sure the
Compute Engine API is enabled for your project on the
https://console.developers.google.com/apis page.

Your Google Cloud SDK is configured and ready to use!

* Commands that require authentication will use first.last@hdcuser.org by default
* Commands will reference project `selectedproject` by default
Run `gcloud help config` to learn how to change individual settings

This gcloud configuration is called [default]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.

Some things to try next:

* Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command.
* Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting
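
Once gcloud init finishes, a quick check can confirm that an account and a default project are actually set. The following is a minimal sketch rather than part of the original setup: check_gcloud_setup is a hypothetical helper name, and it relies only on the gcloud config get-value command.

```shell
# Sketch: confirm that gcloud has an active account and a default project.
# check_gcloud_setup is a hypothetical helper, not part of the gcloud CLI.
check_gcloud_setup() {
  gc_account=$(gcloud config get-value account 2>/dev/null)
  gc_project=$(gcloud config get-value project 2>/dev/null)
  if [ -z "$gc_account" ]; then
    echo "No active account; run 'gcloud init' first." >&2
    return 1
  fi
  if [ -z "$gc_project" ]; then
    echo "No default project; run 'gcloud init' first." >&2
    return 1
  fi
  echo "account=$gc_account project=$gc_project"
}

# Usage: check_gcloud_setup
```

If the function reports a missing account or project, rerunning gcloud init as described above should fix it.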

Next, before attempting to install anything with R or use the Google BigQuery bq command line tool, run this R script:

run_me_first.R

Internet Access

Before attempting to install an R/Python package or connect to websites such as GitHub, you must first enable outbound Internet using the Eureka Limited Internet application. Command line utilities are available with the prefix eureka-internet- (type the prefix and press the Tab key to see the available list). There is also a desktop application called Eureka Limited Internet App. Once you enable outbound Internet, be prepared to wait 5 minutes before you have access. Access is disabled automatically after about 30 minutes.
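
Because access takes a few minutes to come up, it can help to poll instead of guessing. This is a rough sketch under stated assumptions: wait_for_internet is a hypothetical helper, the default URL, 600-second limit, and 15-second interval are illustrative choices, and it assumes curl is available on the VM.

```shell
# Sketch: poll a URL until outbound access works or a time limit passes.
# wait_for_internet and its defaults are illustrative, not Eureka tools.
wait_for_internet() {
  wfi_url=${1:-https://github.com}
  wfi_max_wait=${2:-600}   # give up after this many seconds
  wfi_interval=${3:-15}    # seconds between attempts
  wfi_waited=0
  while ! curl -sSf --max-time 10 -o /dev/null "$wfi_url" 2>/dev/null; do
    if [ "$wfi_waited" -ge "$wfi_max_wait" ]; then
      echo "No outbound access to $wfi_url after ${wfi_waited}s" >&2
      return 1
    fi
    sleep "$wfi_interval"
    wfi_waited=$((wfi_waited + wfi_interval))
  done
  echo "Outbound access to $wfi_url is up after ${wfi_waited}s"
}

# Usage: wait_for_internet https://github.com 600 15
```

Run it right after enabling outbound Internet; it returns as soon as the first request succeeds.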

R Language Specifics

Installing Packages

Before attempting to install an R package, you need to first enable outbound Internet using the Eureka Limited Internet application. See above for details.

Once you have done this, most packages can simply be installed by running install.packages('packagename') from inside an R session. You can also install packages by using RStudio's GUI.

OS Dependencies/Package Prerequisites

This section only needs to be performed by one person, as a first-time setup for the entire system.

Many R packages are compiled from C/C++ source code and therefore require header files for system libraries. In my experience, some of the most common dependencies are libxml2 and libcurl. To install these libraries together with their header files, run the following:

sudo yum install libcurl-devel
sudo yum install libxml2-devel

If there are other unmet dependencies, they can likely be met by installing a -devel version of the library that the package installation process is missing.
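
To see which -devel packages are still missing before calling yum, something like the following may help. It is only a sketch: missing_packages is a hypothetical helper, and it assumes rpm -q exits non-zero for a package that is not installed.

```shell
# Sketch: print each named package that rpm does not report as installed.
# missing_packages is a hypothetical helper name.
missing_packages() {
  for mp_pkg in "$@"; do
    rpm -q "$mp_pkg" >/dev/null 2>&1 || echo "$mp_pkg"
  done
}

# Usage: missing_packages libcurl-devel libxml2-devel
```

Any names it prints can then be passed to sudo yum install as shown above.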

Manually install packages

Although you should be able to use standard R practices for installing packages, you may occasionally want to install a package manually, for example when it is only available via GitHub.

Most R packages have multiple dependencies. Make sure you install all required dependencies and have necessary OS packages/libraries installed first. If you have many packages to install, this first option will probably be easiest.

  1. Download the desired package(s) from your favorite R mirror or website.
  2. Follow the instructions in the "Getting files to Eureka" section of this gist to copy the files over.
  3. In R run:
    library(tools)
    write_PACKAGES("/path/to/packages/")
    
  4. Then run install.packages("package", contriburl="file:///path/to/packages/")
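
Steps 3 and 4 can also be run from a terminal in one go. The sketch below assumes Rscript is on the PATH and that the package directory is given as an absolute path; install_from_local_repo is a hypothetical helper name.

```shell
# Sketch: index a directory of downloaded source packages, then install
# one package from it. install_from_local_repo is a hypothetical helper.
install_from_local_repo() {
  ilr_dir=$1   # absolute path to the directory holding the .tar.gz files
  ilr_pkg=$2   # name of the package to install
  Rscript \
    -e "tools::write_PACKAGES('$ilr_dir')" \
    -e "install.packages('$ilr_pkg', contriburl = 'file://$ilr_dir')"
}

# Usage: install_from_local_repo /path/to/packages packagename
```

Because the directory is indexed first, install.packages can resolve dependencies that are also present in that directory.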

Alternatively, you can do either

install.packages(path_to_file, repos = NULL, type="source")

or

R CMD INSTALL somepackage.tar.gz

See https://stackoverflow.com/questions/10807804/offline-install-of-r-package-and-dependencies#10841614 and https://stackoverflow.com/questions/1474081/how-do-i-install-an-r-package-from-source

Example of how to query BigQuery using R

Before you can use bigrquery, you must authorize it as a third-party application to access the data you have in BigQuery. You are simply granting yourself access to your own BigQuery data through a non-Google API client.

Checking the optional boxes is required for bigrquery to work:

(Screenshot: Tidyverse Authorization)

bigrquery will not work if you fail to grant access properly. The error you will receive will be something like:

Error: Request had insufficient authentication scopes. [insufficientPermissions]

See gbq_example.R

Python

The Eureka app servers run CentOS 7.5 with Python 2.7.5 and Python 3.6.5. Jupyter is not installed by default.

If you want to use Jupyter, I recommend installing Anaconda's Python distribution. Download the 64-bit Linux version from https://www.anaconda.com/download/

DBUS Warning

Problem: the dbus-daemon in /usr/bin conflicts with the version shipped with Anaconda. The system Python is also in /usr/bin, so if /usr/bin comes first in your PATH, you can use the VNC GUI but not Anaconda. Programs other than VNC may also be affected by Anaconda's dbus.

See https://www.centos.org/forums/viewtopic.php?t=66886

Anaconda must not be first in your PATH when starting VNC.

Options:

  • Customize path in ~/.vnc/xstartup. Immediately after #!/bin/sh add the line: export PATH=/usr/bin:$PATH
  • Add Anaconda to your PATH only when needed
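
For the second option, a small shell function can put Anaconda on the PATH only in the terminal where you need it. This is a sketch with assumptions: use_anaconda and the ANACONDA_HOME variable are hypothetical names, and $HOME/anaconda3 is assumed as the install location, so adjust it to wherever you installed Anaconda.

```shell
# Sketch: prepend Anaconda's bin directory to PATH for the current shell only.
# use_anaconda and ANACONDA_HOME are hypothetical names; the default path
# $HOME/anaconda3 is an assumption about where Anaconda was installed.
use_anaconda() {
  ua_bin="${ANACONDA_HOME:-$HOME/anaconda3}/bin"
  case ":$PATH:" in
    *":$ua_bin:"*) ;;   # already on PATH, nothing to do
    *) PATH="$ua_bin:$PATH"; export PATH ;;
  esac
}

# Usage (in a terminal opened after VNC has started): use_anaconda
```

Defining the function in ~/.bashrc without calling it there leaves the login PATH untouched, so VNC still picks up the system dbus-daemon.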

Getting files to Eureka

There are a couple of options:

  1. Copy files from your machine to https://console.cloud.google.com/storage/ using a browser. Then download them on the app server by running a browser there over VNC.
  2. Use gsutil cp. Documentation is available at https://cloud.google.com/storage/docs/gsutil/commands/cp
  3. gcsfuse - This allows you to access your cloud storage bucket as if it were just a local folder on both your local machine and your app server.
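
For the gsutil option, the round trip might look like the sketch below. The helper names gcs_upload and gcs_download are hypothetical, and the bucket and file names in the usage lines are placeholders.

```shell
# Sketch: copy a file to a bucket and back with gsutil cp.
# gcs_upload and gcs_download are hypothetical helper names.
gcs_upload() {
  # $1 = local file, $2 = bucket name
  gsutil cp "$1" "gs://$2/"
}

gcs_download() {
  # $1 = bucket name, $2 = object name, $3 = destination (default: current dir)
  gsutil cp "gs://$1/$2" "${3:-.}"
}

# Usage on your machine:    gcs_upload mypackage.tar.gz my-bucket
# Usage on the app server:  gcs_download my-bucket mypackage.tar.gz
```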

gcsfuse instructions

Follow these steps on all machines that you want to access your Google Cloud Storage bucket files on:

  • Install gcsfuse; see https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/installing.md. It is already installed on Eureka, so you only need to follow the installation steps on your Mac
  • Run via command line gcloud auth application-default login
  • Create the desired folder to mount Google Cloud Storage to - I call it "gcs" but you can call it whatever you want: mkdir ~/gcs
  • Mount the folder using gcsfuse <gcs bucket> <path from previous step>

When done, you can unmount the folder with fusermount -u <path mounted to>
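
The mount and unmount steps above could be wrapped as follows. This is a sketch: gcs_mount and gcs_unmount are hypothetical helper names, ~/gcs is the folder name used in the example above, and it assumes gcloud auth application-default login has already been run.

```shell
# Sketch: mount a bucket with gcsfuse and unmount it again.
# gcs_mount and gcs_unmount are hypothetical helper names.
gcs_mount() {
  gm_bucket=$1
  gm_dir=${2:-$HOME/gcs}   # mount point, created if missing
  mkdir -p "$gm_dir" && gcsfuse "$gm_bucket" "$gm_dir"
}

gcs_unmount() {
  fusermount -u "${1:-$HOME/gcs}"
}

# Usage: gcs_mount <gcs bucket>
#        gcs_unmount
```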

Installing OS Packages

There are two options:

Download and get to app server via your Google Cloud Storage Bucket

Use a resource such as https://rpmfind.net/linux/RPM/index.html or https://centos.pkgs.org/7/centos-x86_64/ to find your desired package. The environment is running CentOS 7.5 on x86_64.

Once the desired RPM file is available on the app server, run

sudo rpm -ivh myrpm.rpm
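
Since a downloaded RPM may itself depend on packages that are not installed, it can be worth inspecting the file before installing it. The sketch below uses rpm's query mode on a package file; inspect_rpm is a hypothetical helper name.

```shell
# Sketch: show metadata and declared dependencies of an RPM file
# without installing it. inspect_rpm is a hypothetical helper name.
inspect_rpm() {
  rpm -qpi "$1"   # package name, version, architecture, description
  rpm -qpR "$1"   # capabilities the package requires
}

# Usage: inspect_rpm myrpm.rpm
```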

Docker

The Docker Service is not running by default. To start dockerd, run:

sudo systemctl status docker # checks current status of docker
sudo systemctl enable docker
sudo systemctl start docker

Without outbound Internet, the easiest way to get a Docker image onto GCE is to build or pull it on your local machine, export it with docker save, copy the resulting archive over to GCE, and then import it with docker load on the app server. See https://www.percona.com/blog/2017/05/23/how-to-save-and-load-docker-images/.
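
As a rough sketch of that save and load round trip: export_image and import_image are hypothetical helper names, the image and file names in the usage lines are placeholders, and how the archive gets copied is left to the options in "Getting files to Eureka". Depending on how Docker is set up on the app server, the load step may need sudo.

```shell
# Sketch: export an image to a tar archive locally, import it on the server.
# export_image and import_image are hypothetical helper names.
export_image() {
  # $1 = image name (for example myimage:latest), $2 = output tar file
  docker save -o "$2" "$1"
}

import_image() {
  # $1 = tar file previously written by docker save
  docker load -i "$1"
}

# Usage on your machine:    export_image myimage:latest myimage.tar
# Usage on the app server:  import_image myimage.tar
```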

If you have access via ssh proxy, edit /usr/lib/systemd/system/docker.service (sudo vi /usr/lib/systemd/system/docker.service) and add the following line immediately after the line [Service]

Environment=ALL_PROXY=socks5://localhost:1081

Then run:

sudo systemctl daemon-reload
sudo systemctl stop docker
sudo systemctl start docker

And then use docker as normal.

Using Git & Google Cloud Source

To use Git and Google Cloud Source, first create a git repository by navigating to the URL: https://source.cloud.google.com/

Create your new repository via the web UI. Once that is done, the following commands get it working on a Google Cloud VM (via an SSH session):

Configure your git client by setting up your information:

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

Next, make sure your Google Cloud account has been set up locally:

gcloud init

It will ask you several questions and make you log in. For details and an example, see the "Initial Setup" section above.

Now you can clone the repository to your VM. Replace <yourrepo> and <yourproject> with actual values, without the '<>'. Note that you will get some warnings because CentOS ships a very old default version of Git, but it will work!

gcloud source repos clone <yourrepo> --project=<yourproject>

Here's the output:

WARNING:          You are using a Google-hosted repository with a
         git version 1.8.3.1
which is older than 2.0.1. If you upgrade
         to 2.0.1 or later, gcloud can handle authentication to
         this repository. Otherwise, to authenticate, use your Google
         account and the password found by running the following command.
          $ gcloud auth print-access-token
Cloning into '/home/username/yourrepo'...
Username for 'https://source.developers.google.com': username@domain.tld
Password for 'https://username@domain.tld@source.developers.google.com':
fatal: remote error:


Invalid authentication credentials.

Please generate a new identifier:
  https://source.developers.google.com/new-password


ERROR: (gcloud.source.repos.clone) Command '[u'git', u'clone', u'https://source.developers.google.com/p/<project>/r/<dataset>', u'/home/username/yourrepo']' returned non-zero exit status 128

Follow the provided instructions by going to the URL: https://source.developers.google.com/new-password

That page provides a script. Run it in your SSH session, and you should then be able to clone the repository:

gcloud source repos clone <yourrepo> --project=<yourproject>

Once you've done that, you can use standard git commands - no need to use gcloud any more.

Note for first push to brand new repo or new branch

For the very first push to a brand new repository, make sure you use this:

git push -u origin master

If you are working with a new branch, replace master with your actual branch:

git push -u origin mybranch
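
To avoid typing the branch name by hand, the first push can look it up from the current checkout. A minimal sketch, assuming the remote is named origin as in the commands above; push_current_branch is a hypothetical helper name.

```shell
# Sketch: push the currently checked-out branch and set its upstream.
# push_current_branch is a hypothetical helper name.
push_current_branch() {
  pcb_branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  git push -u origin "$pcb_branch"
}

# Usage (from inside the cloned repository): push_current_branch
```

After this first push, a plain git push is enough because the upstream branch is already recorded.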

Cloning locally via SSH

To clone via SSH you must first configure an SSH key in Google's Git environment. Steps are:

Make sure the key you uploaded is loaded in your ssh-agent. For guidance, see "Generating a new SSH key and adding it to the ssh-agent". Once that is done, you should be able to use commands like git clone ssh://
