lemonteaa/proposal.md

## proposal.md

      
Project Proposal: Client-Side and Blockchain-Integrated Learning Management System (Applied to a Tech Bootcamp)

Motivation

I want to have a learning website for a custom-made/DIY tech bootcamp. There are two major requirements:

It should have lots of hands-on lab components so that students can work on the projects/exercises without leaving the browser.
And yet, we want the whole thing to be a static website if possible, because we do not have the resources to manage a server (both cloud costs and maintenance/scalability concerns).

To clarify, using other people's free cloud resources is allowed, but we also want the website to be sustainable and built for longevity: it should mostly rely on well-funded free resources that have a low risk of shutting down, or at least be robust against them folding by having alternatives that we can swap in.

Hands-on components

Based on a preliminary syllabus, there are a few types of hands-on labs. The table below summarizes them along with the proposed free cloud resource providers for each:


| Lab type | Example | Provider |
| --- | --- | --- |
| Linux System Admin and some DevOps | Command line, git, docker, VSCode install | KataCoda, Play With Docker (PWD), Google Cloud Shell |
| Basic Programming Exercises | Intro to programming language | Code snippet/compile tools - e.g. glot.io |
| Development on IDE | Backend Dev | Gitpod, Google Cloud Shell |
| Frontend development | Frontend Dev | Frontend online IDE/snippet tools - e.g. CodeSandbox |


(Note: Replit is not considered due to negative PR relating to the behavior of its CEO. Also, venture-capital-funded startups tend to be a bit riskier, even if the financing is impressive: a large amount of funding often carries an implicit expectation of extreme growth and market capture, which goes against our sustainability requirement.)

Limitations

Providing free resources is expensive. It is also risky, since it can attract abuse. It is therefore reasonable that providers impose various limitations, which we need to be aware of, such as:

Need to sign up for an account first (but we mostly avoid providers that require payment details)
Usage/resource limits or quotas. These come in two basic forms:
Amount of compute/memory/network resources used (e.g. Cloud Shell has a limit of 8GB RAM; other providers have smaller limits)
Amount of time you can use it per month/week (e.g. Gitpod's open source plan is limited to 50 hours/week)
"Mandatory sharing" requirement, e.g. Gitpod's free plan used to be limited to public repositories only.
Session timeout. The provisioned resources spin down automatically after some idle time/inactivity.
Shared host (instead of a hard limit on the cloud resources)
System access restrictions: you are usually not given root access, and having even limited sudo rights is already a competitive edge (Gitpod and Cloud Shell both provide it).
No dedicated public IP address.

Method of Integration

Generally speaking, there are a few methods:

Link to external website: Easiest to do and most widely available, but the UX is not as good, since users need to leave the site and may even need to perform some manual actions to activate the service.
Embed: A bit harder, and we need to beware of cross-origin issues such as CORS, but it gives a better experience.
API (with own UI, or embed + API control): Hardest to do, and not all providers offer this option, but it can make for a seamless experience and enable advanced use cases such as providing feedback on submissions (achieved by using the API for bidirectional data flow between our website and the embed, so that we can control/peek into the state of the embed, etc.)

Instructional text

A special case of integration is text that instructs users what to do. Katacoda and Google Cloud Shell both support this. In addition, something similar can be enabled for any project inside VSCode through an extension called Autodidact.

Alternative considered

There are some experimental projects that explore the possibility of running other programming languages, or even an entire VM, inside the browser. This raises the (im)practical possibility of a client-side-only hands-on lab.
This is an interesting route. On one hand, it certainly has its appeal: as everything runs on the client side, it fulfills our requirement of not having a server, and it is certainly robust against free cloud providers withdrawing their offers, since we would not be using any in the first place!
However, there are some downsides as well. The underlying technologies are very advanced and can be difficult to integrate/adapt, even at a purely application level. We can also expect that it will not be as performant as the cloud equivalent, because it usually involves using the browser's JavaScript/WASM engine as a computational substrate, and it usually needs to cross language boundaries through advanced transpilation techniques. Indeed, experiments suggest a slowdown of at least one full order of magnitude (maybe more) compared to native execution. Finally, due to the browser sandbox, certain features are fundamentally impossible without at least some server component, the main example being network access (in the VM case, this is overcome with a network proxy that the browser connects to via WebSocket).
Because the tech is not yet fully mature, we ultimately opted to be pragmatic and use the free cloud resources instead. That said, this can be kept as a fallback option. Also, in some special cases a client-side-only option is relatively more feasible:

(?) For short code snippet compilation, one can either use the Emscripten project, which compiles LLVM -> JavaScript/WASM, or use a compiler that compiles directly to WASM.
For HTML/CSS/JavaScript, we can often get away with just an iframe for simple use cases.

Finally, for reference, some famous client-side projects in this space:

copy/v86 - Emulates a full VM in the browser (github, homepage)
Iodide - Jupyter-style notebook without a server (github)

Features

Our website should have the following features:

Able to browse content: includes filtering and sorting courses. For official courses, there should be a roadmap-like diagram visualizing the dependency/recommended order
Support for different content types
Text (possibly with interactive diagrams), Video, Screencast, Hands-on, Quiz
Quiz should support question banks and multiple attempts (when failed) with history
Save learning progress: users can click "Mark complete" on a learning unit. Then, on the course's main page, completed units will have a tick mark.
Bookmark: for quickly navigating to pages when there are many units.
Note taking: with support for rich text (through Markdown). Should support code snippets.
Daily Goal: The system should recommend units to take next and track the user's completion of those units. Users should be able to initialize the setup by setting a target number of "work units" per day or, in the more sophisticated case, by setting a target completion date and letting the system infer the daily target. (This may involve estimating the user's efficiency from historical data.)
Learning Track: On the dashboard, a list of courses on the learning track(s) the user has selected should be shown. It should be sorted and filtered based on the courses the user has completed, the dependency order, and the fulfillment of prerequisites. (The dashboard should also show courses the user is currently taking.)
Cloud sync of user data
Special requirement: Users should enjoy privacy and sovereignty over their data
In particular, users should be able to export/import their personal data as a single zip/text file
Also, the app should continue to work even when offline
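
As an illustration of the Learning Track requirement (sorting by dependency order and filtering by fulfilled prerequisites), here is a minimal sketch. The `Course` shape and the function names are assumptions made for this example, not part of any existing code; it uses a plain depth-first topological sort.

```typescript
// Hypothetical course shape: `prerequisites` lists ids of courses to finish first.
interface Course {
  id: string;
  prerequisites: string[];
}

// Order a track so that every course comes after its prerequisites
// (depth-first topological sort). Throws if the prerequisites form a cycle.
function orderTrack(courses: Course[]): string[] {
  const byId: Record<string, Course> = {};
  for (const c of courses) byId[c.id] = c;

  const state: Record<string, "visiting" | "done"> = {};
  const ordered: string[] = [];

  function visit(id: string): void {
    if (state[id] === "done") return;
    if (state[id] === "visiting") throw new Error("Prerequisite cycle at " + id);
    state[id] = "visiting";
    const course = byId[id];
    if (course !== undefined) {
      for (const pre of course.prerequisites) {
        // Prerequisites that are not part of this track are skipped for ordering.
        if (byId[pre] !== undefined) visit(pre);
      }
    }
    state[id] = "done";
    ordered.push(id);
  }

  for (const c of courses) visit(c.id);
  return ordered;
}

// Courses the user can start now: not completed yet, all prerequisites completed.
function availableCourses(courses: Course[], completed: string[]): string[] {
  const isDone = (id: string): boolean => completed.indexOf(id) !== -1;
  const lookup: Record<string, Course> = {};
  for (const c of courses) lookup[c.id] = c;
  return orderTrack(courses).filter((id) => {
    const course = lookup[id];
    return course !== undefined && !isDone(id) && course.prerequisites.every(isDone);
  });
}
```

The same ordering could also feed the roadmap diagram, since it already encodes the recommended order.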

Below are more sophisticated features that might land in a future milestone:

Sharing notes publicly
Community-contributed courses
A course designer

Architecture and Design

We break this section down by basic idea:

Static Site Generator and Client-Side Features

The simplest part is to use an SSG for site content. This will handle the official courses (where at least the core part is fixed, and in any case centralized). We choose Gatsby for its power and flexibility.
In the first iteration, some of the features can be implemented as client-side interactivity in the SPA:

Interactive diagrams (diagram.js, mermaid for program flowcharts, OO and ER diagrams, etc.)
For learning progress and bookmarks, the original idea was to use localStorage. But there are two problems. The first is that localStorage has a size limit and doesn't have a DB structure; a quick search reveals that we should use IndexedDB instead. The second problem is that we still need to handle cloud sync (more on this later).
Daily Goal (just computed from data retrieved from IndexedDB)

For IndexedDB, we will also use the dexie.js library to make it easier to work with. Another plus is that some of its plugins provide advanced functionality that we will need later on.
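
To make the Daily Goal item concrete, here is a minimal sketch of the computation, assuming the unit ids and completion list have already been read from IndexedDB. The function names and the simple "remaining units divided by remaining days" rule are assumptions for illustration; estimating user efficiency from history is left out.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Units per day needed to finish `remainingUnits` by `targetDate`.
// A target date that is today or in the past counts as one remaining day.
function dailyTarget(remainingUnits: number, today: Date, targetDate: Date): number {
  const msLeft = targetDate.getTime() - today.getTime();
  const daysLeft = Math.max(1, Math.ceil(msLeft / MS_PER_DAY));
  return Math.ceil(remainingUnits / daysLeft);
}

// Recommend the next units: the first `target` incomplete units in course order.
function recommendUnits(unitIds: string[], completed: string[], target: number): string[] {
  return unitIds.filter((id) => completed.indexOf(id) === -1).slice(0, target);
}
```

For example, 25 remaining units with 10 days to go gives a daily target of 3 units.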
Content Types


Texts: Use an extension of Markdown known as MDX and let Gatsby read the files in the repo, then generate pages. (This doesn't work directly for community-contributed courses, though.)
Video: The standard answer is to use YouTube. However, due to privacy concerns, we would prefer to use something else, probably some form of decentralized solution. After surveying the options, I found that rolling your own may offer more flexibility. This involves uploading videos to IPFS, then using a JS video player (video.js) to provide advanced functionality on top of what the native HTML5 <video> element provides. Note that there are now both a free Filecoin service (sponsored by Protocol Labs) and a free video transcoding service (Livepeer, with 1000 minutes/month, backed by miners in the backend).
Screencast: Something like asciinema should be used to save bandwidth and file size. Unfortunately, after careful research, it seems that asciinema is the only production-ready solution for our use case. It actually has both an open source component and a cloud service component; the latter involves uploading the recorded file and letting them host it. Since there is a risk of them going out of business, we can opt for the self-hosted route (self-hosting the client-side player, to be precise). Judging from other people's write-ups, though, this is somewhat involved.
Quiz: Still studying the available libraries. Ironically, there is an abundance of tutorials for bootcampers on building exactly this, but not enough libraries: they are either so old that they predate React, or not entirely open source (similar to the situation for JS diagramming libraries).

Cloud Storage of User Data

In the first iteration, the Google Drive API can achieve the goal of per-user private storage. However, due to privacy concerns, we prefer other solutions. Decentralized methods come to mind. After some research, we settled on Blockstack, as it also handles user authentication/identity and is a mature project with a large user base.
Blockstack uses the Gaia protocol, which is a way to connect to any compatible storage provider (which could be conventional S3, for example) in a way that preserves privacy. By default, Gaia connects to the stewarding institute's own storage.
Blockstack can be used through their client JS library.
Blockstack's data model is that of a file system: you read and write individual "files" directly.

Offline first

To fulfill this requirement, our conceptual model is to treat the local IndexedDB as a form of cache and use the stale-while-revalidate pattern. That is, we will always serve user requests from IndexedDB, but will "sync" local data to and from Blockstack's remote storage behind the scenes.
A difficulty here is the impedance mismatch due to the differing data models on local (key-value store) vs remote (file system). So far we don't have any good solution other than ad hoc handling in an attempt to reduce unnecessary data transfer.
Note that for this task we will probably use the export/import plugin of dexie, which allows us to add a filter criterion instead of exporting the whole DB.
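
The stale-while-revalidate idea can be sketched as follows. This is a simplified, synchronous model: a plain object stands in for IndexedDB, and `RemoteFiles` is a hypothetical stand-in for the remote file storage (the real client would be asynchronous). Conflict handling is deliberately left out here.

```typescript
// Hypothetical stand-in for remote "file" storage; the real client is asynchronous.
interface RemoteFiles {
  getFile(path: string): string | undefined;
  putFile(path: string, content: string): void;
}

function createOfflineStore(remote: RemoteFiles) {
  const local: Record<string, string> = {};          // stands in for IndexedDB
  const dirty: Record<string, boolean> = {};         // written locally, not pushed yet
  const toRevalidate: Record<string, boolean> = {};  // read since the last sync

  return {
    // Reads are always served from the local copy, even if it is stale.
    get(key: string): string | undefined {
      toRevalidate[key] = true;
      return local[key];
    },
    // Writes go to the local copy first and are pushed later.
    set(key: string, value: string): void {
      local[key] = value;
      dirty[key] = true;
    },
    // Background step: push local writes, then refresh the keys that were read.
    sync(): void {
      for (const key of Object.keys(dirty)) {
        const value = local[key];
        if (value !== undefined) remote.putFile(key, value);
        delete dirty[key];
      }
      for (const key of Object.keys(toRevalidate)) {
        const fresh = remote.getFile(key);
        if (fresh !== undefined) local[key] = fresh;
        delete toRevalidate[key];
      }
    },
  };
}
```

In a real app, `sync` would run on a timer or when connectivity returns, so the UI never waits on the network.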
Concurrency Control

There could be multiple concurrent sessions for the same user, which could result in edit conflicts. This is a tough problem in general. However, for our use case, which is hopefully a low-stakes situation, it seems clear that always choosing eventual consistency is the right answer. We are in luck here: it seems okay to always use CRDTs (Conflict-free Replicated Data Types).
In particular:

Marking progress should be trivially a CRDT, as it is monotone data (it can only go from incomplete to complete)
Bookmarks can be modelled as a simple set (a two-phase set in CRDT jargon?) (The Automerge library seems nice enough) (See more here)
The case for notes is more complicated. A git model seems more appropriate. In the extreme case, we may use a complete client-side git solution (e.g. isomorphic-git).
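
The first two cases can be sketched without any library. The code below is a hand-rolled illustration, not the Automerge API: progress is a grow-only set merged by union, and bookmarks are a two-phase set (an "added" set plus a "removed" tombstone set). All names are assumptions for this example.

```typescript
// Union of two id lists, de-duplicated and sorted so the result is deterministic.
function unionIds(a: string[], b: string[]): string[] {
  const out: string[] = [];
  for (const x of a.concat(b)) {
    if (out.indexOf(x) === -1) out.push(x);
  }
  return out.sort();
}

// Progress: a grow-only set of completed unit ids. Merging by union is
// commutative, associative and idempotent, so replicas converge.
function mergeProgress(a: string[], b: string[]): string[] {
  return unionIds(a, b);
}

// Bookmarks: two-phase set. An id is present if it was added and never removed.
interface TwoPhaseSet {
  added: string[];
  removed: string[];
}

function mergeBookmarks(a: TwoPhaseSet, b: TwoPhaseSet): TwoPhaseSet {
  return { added: unionIds(a.added, b.added), removed: unionIds(a.removed, b.removed) };
}

function visibleBookmarks(s: TwoPhaseSet): string[] {
  return s.added.filter((id) => s.removed.indexOf(id) === -1);
}
```

One known property of a two-phase set is that a removed element can never be added back, which may or may not be acceptable for bookmarks; an observed-remove set would lift that restriction at the cost of more bookkeeping.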

Some notes are in order on the client-side git approach:

It uses a JS-emulated file system (well-known solutions in this space include BrowserFS, which is one of the supported FS backends for isomorphic-git)
Networking is going to be a problem.

Export/Import data

Just use the export/import plugin of dexie, which implements some advanced functionality such as streaming and progress reporting, which is crucial for a good UX.
Once the conversion between the DB and in-memory JS objects is handled, we still need a way for the user to download and upload the data without a server.
For downloading, download.js should be enough.
For uploading, use the File API in HTML5 (see the Mozilla article for details); in particular, use FileReader on the file selected through the <input type="file"> element.

User on-boarding and learning profile

To provide a smooth user experience, it should still be possible to use the system as a guest and enjoy most of the features (except cloud sync). Then, after signing up, the guest data should be imported into the user profile. However, there is a risk of accidentally deleting user data if the implementation is not careful. Here is a proposed approach to systematically avoid this, at the cost of some complication.
A user can have multiple profiles and can switch between them to choose the "active" one. From a data model point of view, each profile is a completely isolated database containing all the data needed to operate the system. Thus each profile maps to one database in IndexedDB.
A profile can never be overwritten or accidentally cleared; it can only be deleted via an explicit user action with a clear warning. Instead, we have the concepts of cloning/copying and moving a profile.
guest is a special profile. It is the only profile available in the logged-out state. When a user creates a new account and signs in for the first time, the system detects that the account has no profile and gives the user the following options (this is not triggered if the user already has a profile):

Do not import: A new, empty profile is created and associated with the user.
Import and preserve guest: A clone of the guest profile is created and associated with the user.
Import and clear out guest: The ownership of the guest profile is transferred to the user. A new, empty guest profile is created.
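
The three options can be modelled as a pure function over the set of profiles. This is only a sketch under stated assumptions: `Profile.data` is a placeholder for a whole IndexedDB database, and the type and function names are made up for this example. The input is never mutated, which mirrors the rule that profiles are never overwritten in place.

```typescript
type ImportChoice = "none" | "clone-guest" | "move-guest";

interface Profile {
  owner: string | null; // null marks the guest profile
  data: Record<string, string>;
}

interface ProfileStore {
  guest: Profile;
  userProfiles: Profile[];
}

function emptyProfile(owner: string | null): Profile {
  return { owner: owner, data: {} };
}

// Handle the first sign-in of `userId`. Returns a new store; the input is untouched.
function onFirstSignIn(store: ProfileStore, userId: string, choice: ImportChoice): ProfileStore {
  // The prompt is not triggered when the user already has a profile.
  if (store.userProfiles.some((p) => p.owner === userId)) return store;

  if (choice === "none") {
    return { guest: store.guest, userProfiles: store.userProfiles.concat([emptyProfile(userId)]) };
  }
  // Both import options give the user a copy of the guest data ...
  const imported: Profile = { owner: userId, data: { ...store.guest.data } };
  // ... and "move" additionally replaces the guest profile with a fresh, empty one.
  const guest = choice === "move-guest" ? emptyProfile(null) : store.guest;
  return { guest: guest, userProfiles: store.userProfiles.concat([imported]) };
}
```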

In the UI, a Profile page should be provided under the Settings section that allows users to manage their profiles; the option to delete a profile should live here.
TODO: Access control/avoid leaking data when multiple users log in on the same device?

Phase 2 Features

After some study, it seems that implementing these features while staying true to the constraint of not paying for hosting is still feasible. The key is to leverage an application-specific blockchain. While using a blockchain is normally not free, there are some exceptions to this rule, which we explain in the first section. Then we outline how an implementation may proceed.

Background on Hive blockchain

Introduction
The Hive blockchain is a decentralized, censorship-resistant, and free (as in beer) social media platform. It is a hard fork of Steem (the blockchain behind Steemit), created after a fiasco in which a centralized authority successfully performed censorship on what was promised to be a censorship-resistant platform. It uses Delegated Proof-of-Stake (DPoS) as its consensus mechanism; this makes block production faster than with PoW and is considered a suitable trade-off for the specific use case of social media. While blockchain transactions usually cost real money, Hive manages to make them free by using the attention economy as the economic component of the blockchain.

Tokens and economic model
There are three tokens. In short, HIVE is the base, liquid cryptocurrency token, similar to bitcoin. To participate in the network, a user needs to convert it into Hive Power (HP) through a process called "Power Up". HP represents the amount of influence a user has when performing activities. "Cashing out" is possible through the reverse conversion, called "Power Down"; however, it is a slow process that takes 13 weeks (and note that influence goes down along with it). HP is therefore an illiquid asset. As an additional option for more conservative users, a third token called Hive Dollar (HBD) is provided alongside HIVE; it is a stablecoin (pegged to the USD).
Transactions on the blockchain are free. However, to prevent spam, they are rate-limited through a mechanism called "Resource Credits". Each user is given a ration of these credits each week, and every transaction costs Resource Credits instead of money. The tokens play a role here, as HP also influences how many Resource Credits you are rationed each week.
Posts and comments can earn rewards: when a user upvotes you, some of the newly minted HIVE/HP in the pool will be allotted to you. The influence of the upvoting user also affects the size of the reward.
Finally, to allow everyone to participate, new users are given a delegation of some HP for free, so they get a non-zero Resource Credit ration. Note that you do not own this HP and so cannot withdraw it/power down.
Therefore, users can participate without paying by using the initial, delegated HP as a bootstrap. By creating quality content, they will gradually earn rewards, which increases their HP and hence their Resource Credit ration, allowing more activity on the network over time. At the same time, users still have the option to get more HP immediately by paying real money, in a way that is similar to "boosting".

Data model
The basic object in Hive is a comment. To make it flexible and accommodate the different data models of different social media platforms, comments have a tree structure: each comment optionally has a parent comment, and a root-level comment (i.e. one without a parent) is considered a post. It is up to the client side of the app to present this data however it likes.
To make it extensible, a comment also has a json_metadata field, which can contain arbitrary JSON data. For example, communities built on Hive have an implicit convention for implementing a tag system: root-level posts should have a tags field in the metadata. Different communities filter posts by matching a tag that identifies them, creating the impression of something like a subreddit.
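
The tag convention can be illustrated with a small sketch. `HiveComment` below is a simplified view with only the fields the example needs, and it assumes the comments have already been fetched; the helper names are made up for this example.

```typescript
// Simplified view of a Hive comment (only the fields used below).
interface HiveComment {
  author: string;
  permlink: string;
  parent_author: string; // empty string for a root-level post
  json_metadata: string; // JSON text whose content is defined by convention
}

function isPost(c: HiveComment): boolean {
  return c.parent_author === "";
}

// Read `tags` from json_metadata, tolerating missing or malformed metadata.
function tagsOf(c: HiveComment): string[] {
  try {
    const meta: unknown = JSON.parse(c.json_metadata);
    if (typeof meta === "object" && meta !== null) {
      const tags = (meta as { tags?: unknown }).tags;
      if (Array.isArray(tags)) {
        return tags.filter((t): t is string => typeof t === "string");
      }
    }
  } catch (e) {
    // Malformed metadata is treated as "no tags".
  }
  return [];
}

// Root-level posts carrying the given tag, e.g. to build a community feed.
function postsWithTag(comments: HiveComment[], tag: string): HiveComment[] {
  return comments.filter((c) => isPost(c) && tagsOf(c).indexOf(tag) !== -1);
}
```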
As an alternative, some users may prefer a more centralized approach to community. An example implementation is given here. The most important points to note are:

A community is created by creating a normal user account with a special username conforming to a format, e.g. hive-10000. This user account is then the Owner of the community and has full control/rights. It can then begin by assigning admin users, etc.
Operations related to the community are implemented through the custom_json operation on the Hive blockchain, which is a way to make the blockchain extensible.

User Keys and integrating with third-party apps
Each user account has three sets of keys with different levels of authorization: the Posting Key is for everyday activities such as posting, commenting, voting, etc. The Active Key is for more sensitive, admin-like operations such as financial transactions and powering up/down. Finally, the Owner Key has complete access rights (so it is like root in Linux): it can be used to change/replace any key, including itself.
To make integration with third-party apps possible without users having to leak their keys, a special service called "Hivesigner" is provided. It implements an OAuth2 flow, so integrations work as if it were just another Web 2.0 app.

Leveraging Hive Blockchain to implement discussion, notes sharing, and community contributed course registry

In all three cases, the basic mechanism is the same:

A special (root-level) post is created manually/automatically. This post should have some special means of being identified on the blockchain, such as a combination of tag + metadata.
On the client side, the entire post/comment tree is retrieved by first identifying the root post from step 1 using its characteristics.
Additional information can be encoded inside the metadata as well.
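
The steps above can be sketched as follows, assuming the candidate comments have already been fetched. The marker convention (a tag plus an `lms.unit` field in the metadata) is purely hypothetical, as are the type and function names.

```typescript
interface ThreadComment {
  author: string;
  permlink: string;
  parent_author: string;   // empty string for a root-level post
  parent_permlink: string;
  json_metadata: string;
}

// Hypothetical marker: the root post carries a tag plus an `lms` object, e.g.
// {"tags":["lms-notes"],"lms":{"unit":"git-intro"}}.
function isRootFor(c: ThreadComment, tag: string, unit: string): boolean {
  if (c.parent_author !== "") return false;
  try {
    const meta = JSON.parse(c.json_metadata);
    return Array.isArray(meta.tags) && meta.tags.indexOf(tag) !== -1 &&
      typeof meta.lms === "object" && meta.lms !== null && meta.lms.unit === unit;
  } catch (e) {
    return false; // malformed or unexpected metadata never matches
  }
}

// Collect the root post and all of its descendants by following parent links
// (breadth-first over the already fetched comments).
function collectThread(all: ThreadComment[], root: ThreadComment): ThreadComment[] {
  const thread: ThreadComment[] = [];
  const queue: ThreadComment[] = [root];
  while (queue.length > 0) {
    const parent = queue.shift();
    if (parent === undefined) break;
    thread.push(parent);
    for (const c of all) {
      if (c.parent_author === parent.author && c.parent_permlink === parent.permlink) {
        queue.push(c);
      }
    }
  }
  return thread;
}
```

Extra fields under the same metadata object could carry the additional information mentioned above, such as which course a discussion belongs to.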

Hive Blockchain integration

To integrate with Hive, we should use Hivesigner for OAuth. On the client side, the official client app's source code should be forked; the relevant React components can then be extracted and customized for our needs. Fetching data, sending transactions, etc. are handled using the official Hive SDK.

Other implementation details

Need for standard format for course meta-data
There should be a manifest file (say, in JSON format) containing basic information about the course, which will be displayed on the course overview page. A course is simply a repository of files with the manifest, plus the course contents in the form of Markdown files. Special content types can be handled using JSON instead (e.g. to store the URL of a code snippet on a third-party service).
For organization, the content should be put into folders. The folder structure may then implicitly provide the course outline. However, this method may fail once more features are added, such as support for multiple (human) languages (e.g. Chinese + English).
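
As a sketch of what such a manifest and its validation might look like: the field names below (`title`, `description`, `units`, `path`, `type`) are assumptions for illustration only, since the proposal does not fix a format yet.

```typescript
// Hypothetical manifest shape; the field names are illustrative only.
interface CourseManifest {
  title: string;
  description?: string;
  units: { path: string; type: "markdown" | "json" }[];
}

// Parse and check a manifest. `problems` is empty when the manifest looks usable.
function parseManifest(raw: string): { manifest: CourseManifest | null; problems: string[] } {
  let m: any;
  try {
    m = JSON.parse(raw);
  } catch (e) {
    return { manifest: null, problems: ["manifest is not valid JSON"] };
  }
  if (typeof m !== "object" || m === null) {
    return { manifest: null, problems: ["manifest must be a JSON object"] };
  }
  const problems: string[] = [];
  if (typeof m.title !== "string" || m.title === "") problems.push("title is required");
  if (m.description !== undefined && typeof m.description !== "string") {
    problems.push("description must be a string");
  }
  if (!Array.isArray(m.units) || m.units.length === 0) {
    problems.push("at least one unit is required");
  } else {
    for (const u of m.units) {
      const ok = typeof u === "object" && u !== null && typeof u.path === "string" &&
        (u.type === "markdown" || u.type === "json");
      if (!ok) problems.push("invalid unit: " + JSON.stringify(u));
    }
  }
  return { manifest: problems.length === 0 ? (m as CourseManifest) : null, problems: problems };
}
```

Listing units explicitly in the manifest, rather than inferring them from folders, is one way to sidestep the multi-language problem mentioned above.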
Hosting and Displaying Community Contributed Course
In general, any static hosting would work. On the client side, the React components should fetch data by calling the URL, then hydrate the components' content with the result. The static host here is therefore actually a static API.
The URL should have a special format to allow displaying a course from any URL: the app should read parameters using a combination of window.location.search and URLSearchParams.
Note, however, that there will be CORS issues. Hence, using IPFS to host content is recommended, as public IPFS gateways usually support this. In this case, the official library js-ipfs can be used.
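
Reading the course location from the query string might look like the sketch below. The parameter names `course` and `unit` are assumptions for this example; the function takes the search string as an argument so it can be exercised outside the browser.

```typescript
// `course` and `unit` are hypothetical parameter names, e.g.
// ?course=https%3A%2F%2Fexample.org%2Fmanifest.json&unit=2
function readCourseParams(search: string): { courseUrl: string | null; unit: number } {
  const params = new URLSearchParams(search);
  const unit = parseInt(params.get("unit") || "0", 10);
  // Fall back to the first unit when the value is missing or not a number.
  return { courseUrl: params.get("course"), unit: isNaN(unit) ? 0 : unit };
}
```

In the browser this would be called as `readCourseParams(window.location.search)`, and the returned `courseUrl` would then be fetched to obtain the manifest.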
Tools for authoring and submitting course
For convenience, at a minimum some utility tools should be provided:

Course preview - See how the course will render before submitting
Editors for Manifest and Quiz files - editing JSON by hand is cumbersome, so a more intuitive interface provides a good chunk of the potential value of a fully integrated course editor (which could take significantly more effort)
Publish Course - make it easy to submit a course to the public registry

Since it is probably most practical to keep the course content during development in a repo on GitHub (or similar services like GitLab), providing integration with them seems an obvious step. For implementation, the isomorphic-git library can again be used, and this time the git push command should pose no problem; just make sure to read and integrate with GitHub's own API.
As an example of such end-to-end integration, the "Publish Course" utility can automate all three steps:

Pull from GitHub to the browser's storage
Publish to IPFS using some SDK library (and remember the hash)
Submit a post to the Hive blockchain with the needed identifying tags and metadata added automatically (from reading the manifest file)

Plugin for Custom Course Content Type

To make the system more extensible, we should eventually support adding custom content types through a plugin architecture. For example, new free cloud services relevant to learning scenarios may appear in the future.
To enable this, we need to be able to load new JS modules dynamically (see here and here). Either the plugin should specify the exact import needed, or we may require plugins to conform to a certain format, such as having a default export.
Because plugins involve executing arbitrary third-party code, security is a concern, and we should let users control whether plugins are allowed.
Moreover, there should be a plugin registry similar to the course registry, except that it should be more tightly regulated, e.g. always requiring a cryptographic signature of the whole payload.
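
The "conform to a certain format" option can be sketched as a shape check on the loaded module. The plugin contract below (`contentType` plus `render` on the default export) is an assumption for illustration; the module itself would come from a dynamic `import(url)` call, and signature verification is not shown.

```typescript
// Hypothetical plugin contract: the default export describes one content type.
interface ContentTypePlugin {
  contentType: string;
  render: (data: unknown) => string;
}

// Check that a dynamically loaded module conforms to the contract before use.
function asPlugin(mod: unknown): ContentTypePlugin | null {
  if (typeof mod !== "object" || mod === null) return null;
  const candidate = (mod as { default?: unknown }).default;
  if (typeof candidate !== "object" || candidate === null) return null;
  const p = candidate as { contentType?: unknown; render?: unknown };
  if (typeof p.contentType !== "string" || typeof p.render !== "function") return null;
  return candidate as ContentTypePlugin;
}

// Register a plugin by content type, but only if the user has opted in to plugins.
function registerPlugin(
  registry: Record<string, ContentTypePlugin>,
  mod: unknown,
  pluginsAllowed: boolean
): boolean {
  if (!pluginsAllowed) return false;
  const plugin = asPlugin(mod);
  if (plugin === null) return false;
  registry[plugin.contentType] = plugin;
  return true;
}
```

Note that the shape check only guards against malformed plugins, not malicious ones, so the user opt-in and the signed registry remain necessary.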
Signature of user generated content

For community-contributed courses, one idea is that the manifest should include the author's user id/handle on the Hive blockchain. To verify, check that the Hive id declared in the manifest matches the Hive id of the post/submission.
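
A minimal sketch of that check follows. The normalization (trimming, stripping an optional leading "@", lowercasing) is an assumption about how handles might be written in a manifest; the function names are made up for this example.

```typescript
// Normalize a handle as it might appear in a manifest, e.g. " @Alice " -> "alice".
function normalizeHandle(handle: string): string {
  return handle.trim().replace(/^@/, "").toLowerCase();
}

// True when the author declared in the manifest matches the author of the post.
function authorMatches(manifestAuthor: string, postAuthor: string): boolean {
  const declared = normalizeHandle(manifestAuthor);
  return declared !== "" && declared === normalizeHandle(postAuthor);
}
```

The check relies on the post author being authenticated by the blockchain itself, since posting requires the account's key.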

  
## syllabus.md

      
Syllabus

Fundamental Track


Linux Shell (+ssh?)
git intro
Concepts of Programming Language

Then, one of: Python/JS/Go/Java


Plus:

Elementary Data Structure/Algorithm
Basic Libraries
Problem Solving/Project


Tooling and best practice

Package Manager + Integrated Tool
Unit testing Libraries
Intro to CI/CD
Good coding practice + refactoring


IDE

Installing and using VSCode
Debugger


Github/Gitlab

Project Management


Frontend Track


More on JS (Node vs browser environment, concurrency in JS, important browser APIs: DOM + AJAX)
TypeScript
Concepts of Modern Frontend Framework (reactive programming and Flux architecture)

Then, core areas.
Area A: html/css/js/Design


html5
intro to css
CSS frameworks (Bootstrap + Bulma)
SASS
Modern CSS3
CSS good practice
Web Design and Figma

Area B: Frameworks


React
React Routing
Redux (and related)
Next.js

Area C: Build Tool


Concepts
webpack
rollup, parcel
snowpack

Area D: API


BaaS (Parse)
How to use API

Post-development


Storyboard
Jest/Jasmine (testing)

Then,

Options for static hosting
Web hosting in depth

Backend Track


Intro to internet
Concurrent and Network Programming (i.e. multicore + socket)
Concepts of Backend Web Development (Web Server, MVC framework, 3-tier arch)

Then core areas:
Area A: Database


SQL
ORM libraries

Area B: Web Framework

One of:

JS: Express
Python: Flask or Django
Java: Spring

Area C: API


REST/JSON-RPC
OpenAPI + JSON Schema

(optional)

HATEOAS
Webhooks and integration

Area D: Session, auth, and security


Intro to web security
Authentication and authorization (OAuth, OpenID Connect, JWT, RBAC)

Post-development


DevOps/Server admin

Docker
nginx + caddy
K8s (?)
API Gateway (?)


End to end testing
Logging, metrics, and monitoring
Cloud/Deployment

PaaS
IaaS (scripting with Terraform)
Self-hosting with VPS
(Concepts in cloud computing)


## z-kata-idea.md

      
Code Kata Ideas
Sliding windows
Histogram
Interleave
Leaky bucket
Multi-period streaming analytics//Highscore board OOP
Array diff
Read CSV into object
Filtering by tag list
Interval data structure -> Schedule app
Variable length data encoding in binary format
Hydrating object
Object <-> list of tuple representation
POS Order OOP
Game card draw (guaranteed draw mechanics)
Crypto seed phrase generator
Remove duplicates in list
Show path in tree
String substitution?