raggi/roflscale.txt

## roflscale.txt
08:58 Defi_: can anyone tell me if there are any obvious disadvantages to patching a blocking library to use fibers at the socket level?
08:58 Defi_: its far too much effort to have to rewrite large chunks of every library just to make it async and fiber-aware
08:59 raggi: if it uses non-stack stored state you could end up with concurrency issues (read: race conditions)
08:59 raggi: fibers as implemented in MRI have limited size stacks (4kb)
09:00 Defi_: hmm alright :/
09:01 raggi: if the library is trying to be thread safe, it can make a real mess too
09:01 raggi: Defi_: just use threads.
09:02 Defi_: raggi: not gonna happen...
09:02 Defi_: even if this was a personal project, i wouldnt use threads
09:03 Defi_: guess ill just have to continue rewriting parts of a bunch of libs as i go
09:03 raggi: oh, wait, i remember who you are
09:03 raggi: you're the omniscient guy who needs roflscale aren't you, and you think threads are going to stop you from servicing your roflmillions of users
09:03 Defi_: and i remember that you hate fibers for whatever reasons
09:04 raggi: i don't hate fibers
09:04 raggi: i just think a lot of people are misusing them for silly things
09:05 Defi_: no.. but my boss does want a highly scalable async platform backend, and from experience, i'd rather wrap the async code in fibers, than have tons of callback spaghetti
09:05 raggi: fibers are spaghetti
09:06 raggi: you're still doing regular jumps around code
09:06 Defi_: im sure plenty people misuse them for plenty reasons, but no more than many other things
09:06 raggi: it's just that you have it stored in stacks instead of objects
09:06 Defi_: so what?
09:06 raggi: which is harder to debug
09:06 Defi_: so you'd rather use tons of callbacks, than fibers?
09:07 raggi: i'd be pragmatic about what the critical path is
09:07 raggi: and optimise only the critical path
09:08 Defi_: heh
09:09 raggi: Defi_: what "scale" are you really talking about, because last time you just started blurting the word cloud at me like it meant something special
09:10 Defi_: raggi: im talking reasonably large scale, obviously not right from the beginning, but as functionality and the client base grows, it needs to scale up majorly, with interfaces to pretty much every large social/media related sites apis on the web
09:11 raggi: talk numbers
09:12 Defi_: raggi: i cant talk numbers, the platform is still in early development stages, but it needs to be able to make many hundreds of requests in realtime
09:12 raggi: you don't know what the numbers are, and yet you're saying "hundreds of requests in realtime"
09:12 raggi: FYI, hundreds of requests "in realtime" can work just fine using sync libs
09:13 Defi_: its going to have to either run on its own netblock of ips or use many proxies
09:13 Defi_: ok, lets say thousands of requests in realtime per online client
09:13 Defi_: that should give you a rough idea
09:14 Defi_: its too early in development to be able to be too specific
09:14 raggi: that just means you are doing no real projections, which means it's purely technological masturbation
09:15 raggi: as for thousands of requests per client "in realtime" that's more than likely a completely silly sentiment
09:15 raggi: for the plain and simple reason, that a particular user is unlikely to even want to read all the results of thousands of requests each time they visit
09:15 raggi: and if you're aggregating, then you probably aren't going to be triggering based on "each online client"
09:16 Defi_: again, im not the boss, i just write the code required to make shit happen
09:16 Defi_: but the data fetched over hundreds of thousands of requests will be processed to averages and summed statistics
09:17 raggi: you should tell your boss to pay for some consulting from someone that's built these kinds of systems before
09:18 Defi_: anyway, discussing this doesnt really help anything
09:18 raggi: no, because you're already set in stone that you need roflscale, which is both architecturally wrong and invalid for the business
09:19 raggi: you also seem to be completely certain that threads are completely inappropriate for your use cases
09:19 Defi_: heh, you really dont have any idea of the specifics of the system, so you cannot judge what sort of scaling is required
09:19 raggi: and you clearly don't understand the non-difference between threads and fibers in this kind of context
09:20 Defi_: i understand that threads have more overhead than fibers and any shared data would require locking and synching
09:20 raggi: Defi_: actually, i can, because i've worked on systems that are in these categories
09:20 Defi_: i also understand all the other disadvantages of threads
09:20 raggi: Defi_: you still need locking in fibers
09:20 Defi_: nope
09:20 raggi: Defi_: for shared state
09:20 raggi: yes you do
09:20 raggi: lol
09:20 Defi_: fibers do not run concurrently
09:20 xxxxxx: lawl
09:20 raggi: yes they do
09:20 raggi: they don't run in parallel
09:20 raggi: but that's different
09:20 raggi: and they're cooperatively scheduled
09:21 Defi_: by concurrently, i mean in parallel
09:21 Defi_: i know this...
09:21 raggi: hey xxxxxx :)
09:21 xxxxxx: lawls
09:21 xxxxxx: hi
09:21 xxxxxx: this is fun
09:21 raggi: yes
09:21 raggi: roflscale
09:21 Defi_: you do not need to lock an array, if 2 fibers use it
09:21 raggi: Defi_: i think you need shards bro
09:21 Silex: imho concurrently should mean in parallel by default
09:21 Defi_: because they will never access it at the same time
09:21 Defi_: since its cooperatively scheduled
09:21 Defi_: i know its gonna need to be sharded raggi
09:22 Defi_: which is one of the reasons i've gone with MongoDB
09:22 xxxxxx: shard your fibers
09:22 raggi: Defi_: http://gist.github.com/560087
09:22 raggi: shardnull
09:22 raggi: it's faster than mongodb
09:22 raggi: and more reliable
09:22 raggi: you can be 100% certain of what it will do with your data
09:23 Defi_: it is?
09:23 raggi: it's also atomically consistent even under high concurrency, and 100% available
09:23 raggi: iow it completely defeats the CAP theorem
09:23 locks: raggi: are you certain? mongodb is web scale..
09:23 raggi: locks: shardnull is UNIVERSE SCALE
09:24 Defi_: mongo scales very nicely, is it really worth using shardnull?
09:24 raggi: yes
09:24 Defi_: is there a nice ORM for it like MongoMapper?
09:24 locks: definitely
09:24 xxxxxx: wtf defi_ are you actually trying to re-enact that webscale video ?
09:24 Defi_: because i've already rewritten some of the models for this platform twice
09:24 raggi: yes, it can store any objects directly
09:24 Defi_: first with mongoid, then mongomapper
09:24 raggi: so you don't need an ORM, you can just send it pure marshalled ruby
09:25 raggi: and it consistently atomically stores your objects in parallel in a highly available distributed manner
09:25 locks: and it doesn't lock ever
09:25 raggi: all it does is apply some very "clever" mutations to the data on the way into the store
09:26 Defi_: hmm
09:26 raggi: also, it's only 28 lines of ruby that i wrote using FFI to link libc
09:26 xxxxxx: run it by your boss
09:26 raggi: so it's obviously FAST
09:26 Defi_: and how can the data be queried by the database raggi?
09:26 xxxxxx: its also pretty damn memory efficient
09:26 raggi: `less /dev/null`
09:26 Defi_: how can it process through millions of records
09:26 raggi: standard posix operations
09:26 Defi_: are you sure it'll perform as well as mongodb?
09:27 raggi: Defi_: it'll process millions of writes per second with ease
09:27 locks: raggi: does it work on windows though?
09:27 raggi: it performs better than mongodb
09:27 raggi: locks: my shardnull proxies do
09:27 Defi_: raggi: how does that help with summing up millions of records into whatever data
09:27 raggi: locks: that's actually shardNUL:
09:27 raggi: ;-)
09:27 locks: ohhh
09:27 locks: you're pretty clever man
09:28 raggi: i have the sekrets of the webscale sauce
09:28 Defi_: pulling millions of rows of data into ruby and processing surely isnt nearly as efficient as mongodb raggi?
09:28 raggi: Defi_: you can aggregate teh size of the records just by reading stats from /proc
09:28 raggi: ZOMGLOL
09:28 raggi: i'm dying here
09:29 Defi_: eh.. im failing to see how you would process the data more efficient than with a mongodb query
09:29 raggi: Defi_: no, you can use /proc, so the kernels already done all the work for you! (which is written in C and ASM)
09:29 raggi: i've gotta win some kind of troll-the-troll award, surely?
09:30 xxxxxx: lol
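
Editor's note on the "you still need locking in fibers" exchange: the following is a minimal Ruby sketch, not code from the conversation, of how two cooperatively scheduled fibers can still lose updates to shared state when a switch point falls between a read and a write. The method name `lost_update_demo`, the hash-based counter, and the round-robin loop are all assumptions made for illustration; `Fiber.yield` stands in for wherever a fiber-aware socket library would suspend the fiber.

```ruby
# Hypothetical demo: two fibers each try to increment a shared counter
# `increments` times, but yield between reading and writing it.
def lost_update_demo(fiber_count: 2, increments: 3)
  counter = { value: 0 } # shared, non-stack state

  workers = Array.new(fiber_count) do
    Fiber.new do
      increments.times do
        current = counter[:value]     # read shared state
        Fiber.yield                   # switch point, e.g. waiting on a socket
        counter[:value] = current + 1 # write back a possibly stale value
      end
    end
  end

  # Simple round-robin scheduler. Each fiber needs one resume per yield
  # plus a final resume to run to completion.
  (increments + 1).times do
    workers.each(&:resume)
  end

  counter[:value]
end

puts lost_update_demo
```

With two fibers and three increments each, a correct counter would end at 6; here both fibers read the same value before either writes, so half of the updates are overwritten. Nothing runs in parallel, yet the read-modify-write is not atomic across the yield, which is why shared state touched on both sides of a switch point still needs some form of guarding or restructuring.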