L1<>L2 architecture

Conventions

  • Naming
    • L1: L1 machine
    • l1: Individual l1 worker process
  • The overview graph lists all involved components. In later graphs, irrelevant components are omitted for readability; this doesn't mean they have been removed

Problems to solve

  • L2->l1 routing
  • l1->L2 routing

Overview

flowchart TB
  subgraph L1
    master
    subgraph workers
      l1
      l1'
    end
  end
  subgraph L2s
    L2
    L2'
  end
  L2s --> |register|workers
  workers --> |request CID|L2s
  L2s --> |send CAR|workers
  client --> |request CID|workers
  workers --> |send CAR|client

sequenceDiagram
  participant Client
  participant l1s
  participant L2s
  L2s->>l1s: Register
  Client->>l1s: Request CID
  l1s->>L2s: Request CID
  L2s->>l1s: Send CAR
  l1s->>Client: Send CAR

Problem: L2->l1 routing

The l1 worker that receives the client request needs to be the one that also receives the CAR file from the L2. If L2s make new HTTP requests to deliver their responses, the Node.js cluster module assigns those requests to a random l1 worker process.

We run into this problem because L2s, which likely live in home networks, aren't directly reachable from the outside and can't act as servers. Therefore the L2 is the only one that can open new connections to the L1. This means that the L1 can't act as an HTTP client, for example requesting data from the L2. The L1 can request data, but unless we figure out how the L2 can reply directly, the reply will hit the L1's load balancer and get assigned to a random l1, which didn't receive the original HTTP request from the client.
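
For illustration, a minimal sketch of the setup that causes this, assuming the usual one-worker-per-core cluster pattern (port and handler are illustrative):

import cluster from 'node:cluster'
import http from 'node:http'
import os from 'node:os'

if (cluster.isPrimary) {
  // One l1 worker per CPU core; the primary distributes incoming
  // connections across the workers (round robin on Linux)
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  http.createServer((req, res) => {
    // Both the client's `GET /ipfs/CID` and an L2's `POST /data/CID`
    // land here, but possibly in different workers
    res.end(`handled by worker ${cluster.worker.id}\n`)
  }).listen(3000)
}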

flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 --> |register|l1
  l1 --> |request CID|L2
  L2 --> |send CAR|l1'
  client --> |request CID|l1
  l1 --> |send CAR|client
  
  linkStyle 2 stroke:red
  linkStyle 4 stroke:red

Proposal: Workers share files

Worker processes share files with each other

  • significantly more complexity
  • more worker load
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 --> |register|l1
  l1 --> |request CID|L2
  L2 --> |send CAR|l1'
  l1' --> |share CAR|l1
  client --> |request CID|l1
  l1 --> |send CAR|client
  
  linkStyle 3 stroke:yellow
  linkStyle 5 stroke:green

sequenceDiagram
  participant Client
  participant l1
  participant l1'
  participant L2
  L2->>l1: Register
  Client->>l1: Request CID
  l1->>L2: Request CID
  L2->>l1': Send CAR
  l1'->>l1: Share CAR
  l1->>Client: Send CAR
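
A rough sketch of how the sharing could work, using cluster IPC through the primary process (message shapes, worker ids and chunk handling are assumptions; serializing chunks for IPC is part of the extra complexity and worker load):

import cluster from 'node:cluster'

if (cluster.isPrimary) {
  const workers = [cluster.fork(), cluster.fork()]
  // The primary relays "share CAR" messages between workers
  for (const worker of workers) {
    worker.on('message', message => {
      if (message.type === 'share-car') {
        workers.find(w => w.id === message.targetWorkerId)?.send(message)
      }
    })
  }
} else {
  process.on('message', message => {
    if (message.type === 'share-car') {
      // This worker owns the client response for message.requestId:
      // write the (serialized) chunk to it here
    }
  })
  // A worker that received a CAR chunk from an L2 but doesn't own the
  // matching client request forwards it, e.g.:
  // process.send({ type: 'share-car', targetWorkerId, requestId, chunk })
}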

Proposal: Workers expose ports

Workers get their own port, so that L2 can reach the right worker (not behind load balancer).

  • needs one extra port per cpu core
  • complex dynamic nginx configuration
  • complex dynamic docker configuration
  • complex dynamic firewall configuration
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 --> |register|l1
  l1 --> |request CID with port|L2
  L2 --> |send CAR to worker port|l1
  client --> |request CID|l1
  l1 --> |send CAR|client
  
  linkStyle 2 stroke:yellow
  linkStyle 1 stroke:yellow
  linkStyle 4 stroke:green
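
A minimal sketch of this proposal, with each worker listening on its own port derived from the worker id (the base port and how the port is communicated to the L2 are assumptions):

import cluster from 'node:cluster'
import http from 'node:http'
import os from 'node:os'

const BASE_PORT = 3000

if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  // Each worker gets its own port, which nginx, docker and the firewall
  // all need to know about
  const port = BASE_PORT + cluster.worker.id
  http.createServer((req, res) => {
    // The worker would include `port` in its "request CID" message,
    // so the L2 can send the CAR back to this exact worker
    res.end(`l1 worker ${cluster.worker.id} listening on ${port}\n`)
  }).listen(port)
}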

Proposal: Nginx sticky routing

Make nginx use sticky routing

  • doesn't work with free nginx, because the incoming client request and the L2's request come from different sources, so stickiness can't be applied per client
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 -->|register|l1
  l1 --> |request CID|L2
  L2 --> |send CAR|l1
  client --> |request CID|l1
  l1 --> |send CAR|client

TODO

  • If we use the hash routing strategy then we can match L1s and L2s by CID URL path segment.

Next strategies to try

Single TCP Connection

If there is a single TCP connection between l1 and L2, then there's no load balancing between the two and the l1 that asks L2 is also the one that gets the response.

  • libp2p multiplexer
    • I couldn't get the JS version to run unfortunately
  • binary websockets
    • Worked in contrived example
    • CAR responses will be chunked and prefixed by fixed-length request ID
    • This is the current state of the L1<>L2 integration PR, and seems to work
  • reverse http
    • protocol implications (same issue with http/1 and http/2)

Custom routing

If we replace the round robin routing in the Node.js cluster module with our own logic, we can ensure that an L2 response always arrives at the right l1, by inspecting the request URL.

  • custom nodejs cluster implementation
    • likely quite complex

Solution to L2->l1 routing

Use binary WebSockets, which allow for a persistent connection between L2 and l1, bypassing any load balancing.

Protocol

L2 registration

0{L2ID} 

0 + arbitrary length l2 id (utf-8)

L1 request

{requestId[36]}{cid}

36 bytes request Id (utf-8) + cid (utf-8)

L2 response

1{requestId[36]}{chunk?}

1 + 36 bytes request Id (utf-8) + chunk?

If the chunk is empty that means the CAR file has been served completely.
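
A small sketch of encoding and decoding these frames (the layouts are the ones described above; the function names are illustrative):

const REQUEST_ID_LENGTH = 36

// L2 registration: 0 + arbitrary length l2 id (utf-8)
const encodeRegistration = l2Id =>
  Buffer.concat([Buffer.from('0'), Buffer.from(l2Id, 'utf-8')])

// L1 request: 36 bytes request id (utf-8) + cid (utf-8)
const encodeRequest = (requestId, cid) =>
  Buffer.concat([Buffer.from(requestId, 'utf-8'), Buffer.from(cid, 'utf-8')])

// L2 response: 1 + 36 bytes request id (utf-8) + chunk
// (an empty chunk signals that the CAR has been served completely)
const encodeResponse = (requestId, chunk = Buffer.alloc(0)) =>
  Buffer.concat([Buffer.from('1'), Buffer.from(requestId, 'utf-8'), chunk])

const decodeResponse = frame => ({
  requestId: frame.subarray(1, 1 + REQUEST_ID_LENGTH).toString('utf-8'),
  chunk: frame.subarray(1 + REQUEST_ID_LENGTH)
})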

TODO

  • Use protocol buffers

Websocket performance

Unfortunately WebSockets are significantly slower than raw HTTP:

$ time node bench.mjs ws
node bench.mjs ws  5.96s user 0.66s system 104% cpu 6.329 total
$ time node bench.mjs http
node bench.mjs http  0.33s user 0.47s system 64% cpu 1.251 total

import WebSocket, { WebSocketServer } from 'ws'
import http from 'node:http'

// bench.mjs: send ~1 GiB in 1 MiB chunks, either over a WebSocket or as an
// HTTP response body, to compare transport overhead
const chunk1MB = Buffer.alloc(1024 * 1024).fill(1)

if (process.argv[2] === 'ws') {
  const wss = new WebSocketServer({ port: 3000 })
  wss.on('connection', ws => {
    ws.on('message', d => {
      if (d.length === 0) {
        process.exit()
      }
    })
  })
  const ws = new WebSocket('ws://127.0.0.1:3000')
  ws.on('open', () => {
    let todo = 1024
    const send = () => {
      if (--todo > 0) {
        ws.send(chunk1MB)
        setTimeout(send, 0)
      } else {
        ws.send(Buffer.alloc(0))
      }
    }
    send()
  })
} else {
  const server = http.createServer((req, res) => {
    let todo = 1024
    const send = () => {
      if (--todo > 0) {
        res.write(chunk1MB)
        setTimeout(send, 0)
      } else {
        res.end()
      }
    }
    send()
  })
  server.listen(3000)
  const req = http.get('http://127.0.0.1:3000', res => {
    res.on('data', () => {})
    res.on('end', () => {
      process.exit()
    })
  })
}

Reverse HTTP

For reference, my attempt at a reverse http implementation (ptth):

import http from 'http'
import stream from 'stream'

const server = http.createServer((req, res) => {
  console.log('l1 received request')
  console.log('l1 tries turning into the client now')
  console.log('l1 request cid')
  req.socket.on('data', d => console.log('l1 receive', d.toString()))
  const cidReq = http.request({
    createConnection: () => req.socket,
    // createConnection: () => stream.Duplex.from({ writable: res, readable: req }),
    path: '/cid/CID'
  }, cidRes => {
    console.log('l1 received cid response')
  })
  cidReq.end()
})

server.listen(3000, () => {
  console.log('l1 make request')
  const req = http.request('http://localhost:3000', {
    method: 'POST'
  }, res => {
    console.log('l2 received response')
  })
  req.on('connect', () => {
    console.log('l2 established connection')
    console.log('l2 tries turning into the server now')
  })
  req.end()
})

$ node http.mjs
l1 make request
l1 received request
l1 tries turning into the client now
l1 request cid
node:events:505
      throw er; // Unhandled 'error' event
      ^

Error: Parse Error: Expected HTTP/
    at Socket.socketOnData (node:_http_client:494:22)
    at Socket.emit (node:events:527:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
Emitted 'error' event on ClientRequest instance at:
    at Socket.socketOnData (node:_http_client:503:9)
    at Socket.emit (node:events:527:28)
    [... lines matching original stack trace ...]
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  bytesParsed: 0,
  code: 'HPE_INVALID_CONSTANT',
  reason: 'Expected HTTP/',
  rawPacket: Buffer(64) [Uint8Array] [
     71,  69,  84,  32,  47,  99, 105, 100,  47,  67,  73,  68,
     32,  72,  84,  84,  80,  47,  49,  46,  49,  13,  10,  72,
    111, 115, 116,  58,  32, 108, 111,  99,  97, 108, 104, 111,
    115, 116,  58,  56,  48,  13,  10,  67, 111, 110, 110, 101,
     99, 116, 105, 111, 110,  58,  32,  99, 108, 111, 115, 101,
     13,  10,  13,  10
  ]
}

Suggestions by Miro:

Regarding the Reverse HTTP code snippet - I think the client (L2) needs to start with an HTTP/1.1 Upgrade request. The server (L1) responds with HTTP status 101 to confirm the upgrade. Only after this initial exchange are you in control of the TCP connection and can start setting up reverse HTTP.
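
A sketch of what that could look like in Node.js, reusing the roles from the snippet above (the ptth/1.0 protocol name and the hand-written response on the L2 side are assumptions):

import http from 'node:http'

// L1: accepts the Upgrade request from the L2 and confirms it with 101
const server = http.createServer()
server.on('upgrade', (req, socket) => {
  socket.write(
    'HTTP/1.1 101 Switching Protocols\r\n' +
    'Upgrade: ptth/1.0\r\n' +
    'Connection: Upgrade\r\n\r\n'
  )
  // After the 101 exchange the raw socket is ours: the L1 turns into the
  // HTTP client and requests a CID over it
  const cidReq = http.request({
    createConnection: () => socket,
    path: '/cid/CID'
  }, cidRes => {
    console.log('l1 received cid response', cidRes.statusCode)
    cidRes.resume()
    cidRes.on('end', () => server.close())
  })
  cidReq.end()
})

server.listen(3000, () => {
  // L2: opens the connection and asks to upgrade
  const req = http.request('http://127.0.0.1:3000', {
    headers: { Connection: 'Upgrade', Upgrade: 'ptth/1.0' }
  })
  req.on('upgrade', (res, socket, head) => {
    // After 101 the L2 acts as the server on this socket: read the L1's
    // request and reply with a hand-written HTTP response (stand-in for a CAR)
    const reply = () =>
      socket.end('HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nCAR')
    if (head.length > 0) {
      console.log('l2 received request\n' + head.toString())
      reply()
    } else {
      socket.once('data', d => {
        console.log('l2 received request\n' + d.toString())
        reply()
      })
    }
  })
  req.end()
})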

Problem: l1->L2 routing

Now that we have solved the load balancing aspect of L2->l1 routing, another problem arises:

Each L2 is connected to one l1. This means that l1s don't always have a working connection to the L2 that should be selected by the consistent hashing algorithm, and have no choice but to request the CID from the wrong L2, which will then also hydrate itself.

flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1'
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1
  
  linkStyle 2 stroke:red
  linkStyle 4 stroke:red
  linkStyle 5 stroke:red

Proposal: Single proxy

Add a single proxy process that sits in between all l1s and their connected L2s. This proxy can forward requests, selecting the right L2s to contact, and also forward responses to the right l1.

flowchart TB
  subgraph L1
    l1
    l1'
    proxy
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|proxy
  L2' --> |register|proxy
  l1 --> |request CID|proxy
  proxy --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|proxy
  proxy --> |send CAR|l1

Proposal: Hash routing

Route L1/L2 requests by CID path segment (GET /ipfs/CID & POST /data/CID), this way connecting the right l1s with L2s. This requires us to ditch the node cluster approach and let nginx load balance between them.

Downside: l1s need to synchronize with each other, so they all know of all connected L2s and can ask the l1 connected to the right L2 to request the CID.

flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1

map $uri $myvar {
    default $uri;
    # pattern "Method1" + "/" + GUID + "/" + target parameter + "/" + HASH;
    "~*/Method1/(.*)/(.*)/(.*)$" $2;
    # pattern "Method2" + "/" + GUID + "/" + target parameter;
    "~*/Method2/(.*)/(.*)$" $2;
}


upstream backend {
    hash $myvar consistent;
    server s1:80;
    server s2:80;
}

Proposal: Single Node.js process

If we switch from one Node.js process per CPU core to a single Node.js HTTP process, then we don't need a routing layer for l1s at all.

This means incoming HTTP requests don't need to be routed to the correct l1, and L2 responses don't need to be routed to the correct l1 either. This also removes the need for persistent connections between l1s and L2s, so we can go back to the NDJSON + HTTP POST protocol.

Downside: If we need to distribute workload across multiple CPU cores, we need to look into strategies like worker pools with shared array buffers, on top of which streaming data verification (and more work) is implemented. For now, this shouldn't be a concern though.

Protocol

L2 registers, L1 requests data
GET https://L1_URL/register/L2_ID
{"requestId":"REQUEST_ID","cid":"CID"}\n
{"requestId":"REQUEST_ID","cid":"CID"}\n
{"requestId":"REQUEST_ID","cid":"CID"}\n
\n                                      <- heartbeat
...

"requestId" is created by the client connecting to the L1, passed down to the L2. It will be attached by L1 and L2 to log lines created as part of the request response cycle, for ease of debugging.

L2 sends data
POST https://L1_URL/data/CID
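
A minimal sketch of the L2 side of this protocol (L1_URL, L2_ID and the stand-in CAR stream are assumptions):

import http from 'node:http'
import { createInterface } from 'node:readline'
import { Readable } from 'node:stream'

const L1_URL = 'http://127.0.0.1:3000'
const L2_ID = 'example-l2-id'

// Stand-in for producing the CAR bytes for a CID
const carStreamFor = cid => Readable.from([Buffer.from(`CAR for ${cid}`)])

// Register: a long-lived GET whose response body is a stream of
// newline-delimited JSON requests (an empty line is a heartbeat)
http.get(`${L1_URL}/register/${L2_ID}`, res => {
  const lines = createInterface({ input: res })
  lines.on('line', line => {
    if (line === '') return // heartbeat
    const { requestId, cid } = JSON.parse(line)
    console.log(`request ${requestId}: cid ${cid}`)

    // Send data: POST the CAR back to the L1, addressed by CID
    const post = http.request(`${L1_URL}/data/${cid}`, { method: 'POST' },
      postRes => postRes.resume())
    carStreamFor(cid).pipe(post)
  })
})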

Architecture

flowchart TB
  subgraph L1
    l1
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1

Potential improvements

Report L2 health through bidirectional ndjson protocol
POST https://L1_URL/register/L2_ID
> {"ok":true,"cpu":0.8,"memory":0.2}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n

@juliangruber commented Aug 16, 2022

  • document Miro's UPGRADE suggestion

@juliangruber commented Aug 16, 2022

  • document potential bidirectional ndjson protocol for reporting l2 health
