- iOS infinite spinner (service worker related?)
- Update to NWL 2020
- Fix NaN% full score bug
- Add "Net" scoring mode
- Support Super Big Boggle on large screens
- Add in all dice configurations, though leave them unused
- Don't restore application to Dictionary view (i.e. don't save view)
- Finish gathering updated GIFs which reflect up to date UI
- Host canonically under `boggle.scheibo.com` with GitHub Pages
==Marty commented out these lines https://github.com/Antar1011/Smogon-Usage-Stats/blob/master/batchMovesetCounter.py#L146-L147==
- can make adjustment: https://www.smogon.com/forums/threads/gen-8-smogon-university-usage-statistics-discussion-thread.3657197/post-8845077
- Design doc, discuss:
  - list of all possible improvements (address `FIXME`)
  - anonymizing logs (visitor pattern/logs processing framework)
  - compressed directories (ZIP > tar)
  - database store
  - generate (static) web pages instead of ASCII tables
    - table-able syntax (array of arrays? JSON?), use `table` and `html`
  - store only half of encounters/teammates pseudo-symmetric matrix
  - short circuit if weight is zero (but counts?)
  - continuous mode vs. batch/catchup mode
  - split / apply / combine
- update to reflect Marty's changes
- Doubles/Other Metagame rises and drops
- tear out worker infrastructure in favor of:
  - [ ] architecture handles child process OR worker threads
- parse out additional koed+switched information required for 'human-readable' stats from `moveset.txt`
- abstract out the `process` script: need a high level script for setting default args, then let individual workers do the rest. `process` can call the sub command and have it parse the rest of the args
  - takes common (logs) options and path to worker script, passes additional options to worker
- track win percentage (TI request for Random Battles)
- track unique user weights
- track pre-mega ability
- `process` handles `7z`/`tgz`: automatically extract in tmp, delete set of files after checkpoint is finalized
- checkpoints store config information in case of changes
- handle N months' worth of logs at once
  - take begin and end timestamps (a shortcut which allows `2018-02` to be specified) and only include relevant files
  - create a new checkpoint directory each time; if the date range changes and the checkpoints don't, the results will be nonsensical
  - just use `2018-02` `2018-03` each time and it will only incrementally update the `2018-02` reports
- Stats UI: index.html (+ Apache rewrites) to serve pages
- `anon`: worker needs to handle rename as well during reduce stage
- stats sharding logic: input AND output shards
- shard over cutoff, tag (pull up, instead of push down)
- for the output shard only: if one is missing, redo the entire checkpoint
- error if run with different batch size and only one shard is missing
- shards: `format/<shard>/day/file.json`
  - prepare: needs to make folders for all the shards
  - restore: look for all checkpoints
  - deshard: sort, notice if any are missing; if any missing = REDO the whole DAY
  - apply: ONLY GETS format, not shard info?
- shard = input and output => read log file multiple times :( (but much easier)
- shard => output only = read files once (but at least 4x memory...)
- how to avoid reading the same log N times
  - OK to read data N times = different parts of data = desirable for report sharding
- conditional probability tables ('bigram' support) needed for EPOke
  - can arbitrary plugins be added/removed from the process? just specify `--plugins=bigrams` to `process`?
- Leads reports: add N (> 2; 1 good, 1 bad) battles from gen7monotype (same teams) + process tags for teams & one nonexistent
- adjust stats report testing logic to handle monotype
- fix memory error in `zstd`, use for checkpoints
- design doc completed
- verify one month of data against Smogon-Usage-Stats
- rewritten `anon` on top of `@pkmn/protocol`
- all known issues from Smogon-Usage-Stats fixed (no more `FIXME`)
- bigrams published
- daily (hourly?) reports
- Stats UI published
- human stats published
- full run across entire corpus (user with most battles?)
- set cluster: For each analysis set, grab statistics and see how many could match (if spread, convert stats -> spread with assumptions, allow for speed creep/inexact matching)
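
A rough sketch of the "could these stats match this spread" check that set clustering implies, assuming the standard stat formula and a small tolerance for speed creep / inexact matching (the function names and `slack` parameter are illustrative only):

```ts
// Standard stat formula; `nature` is 0.9, 1.0, or 1.1, and HP uses its own variant.
function computeStat(base: number, iv: number, ev: number, level: number, nature = 1.0, hp = false): number {
  const core = Math.floor(((2 * base + iv + Math.floor(ev / 4)) * level) / 100);
  return hp ? core + level + 10 : Math.floor((core + 5) * nature);
}

// Does an observed stat fall within `slack` of what this (base, IV, EV, nature) would produce?
// The slack allows for inexact matching (e.g. speed creep of a point or two).
function couldMatch(observed: number, base: number, iv: number, ev: number, level: number, nature = 1.0, hp = false, slack = 1): boolean {
  return Math.abs(observed - computeStat(base, iv, ev, level, nature, hp)) <= slack;
}
```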
@pkmn/stats

`workflows/` can't be privileged at all: workflows just depend on `@pkmn/logs` + `@pkmn/stats` etc. and run. The `process` script hardcodes logic so that `anon` falls back to `workflows/anon.ts`, but otherwise it will look for a file for the worker.
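
A minimal sketch of that fallback resolution, assuming a hypothetical `BUILTIN` map and `resolveWorker` helper (neither exists in the actual `process` script):

```ts
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical mapping of built-in worker names to their bundled workflow scripts.
const BUILTIN: {[name: string]: string} = {
  anon: path.join(__dirname, 'workflows', 'anon.ts'),
};

// Resolve a worker argument: prefer a built-in workflow, otherwise treat it as a path
// to an arbitrary script that simply depends on @pkmn/logs + @pkmn/stats and runs.
function resolveWorker(name: string): string {
  if (BUILTIN[name]) return BUILTIN[name];
  const file = path.resolve(name);
  if (!fs.existsSync(file)) throw new Error(`Unable to find worker: ${name}`);
  return file;
}
```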
- handle stats `debug.ts`, `config.ts`, `process` script
- work on stats `README.md` and `DESIGN.md`
- parse configuration: `accept(format: ID)` => returns `(shard, weight)`; if weight = 0 then drop, otherwise for each batch spin up workers per shard and pass the shard: `for await (const batch of batches()) { for (const shard of shards) { yield [shard, batch]; } }` (see the sketch after this list)
- what is `weight` doing? `weight` only works if we go by least loaded, not round robin...
  - don't do round robin, just pass work to whoever is ready for it
  - instead, master just sends workers jobs and it's up to them to decide how many to do at once?
- figure out logs that need to be processed, start `yield`-ing anything that matches `accept`
- workers save checkpoints for a shard and (shard or `format shard`)
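
A sketch of that split stage under the assumptions above; `accept`, the `(shard, weight)` result, and the `[shard, batch]` tuples come from these notes, while the generator plumbing and simplified types are hypothetical:

```ts
type ID = string;
type Shard = string;
interface Batch { begin: string; end: string } // simplified stand-in for the Offset-based Batch below

// accept() decides whether a format is processed at all and which shards it fans out to.
type AcceptFn = (format: ID) => {shards: Shard[]; weight: number};

// For every batch of logs in a format, emit one work item per shard so the master
// can hand [shard, batch] pairs to whichever worker is ready for them.
async function* split(
  format: ID,
  accept: AcceptFn,
  batches: (format: ID) => AsyncIterable<Batch>
): AsyncGenerator<[Shard, Batch]> {
  const {shards, weight} = accept(format);
  if (!weight) return; // weight = 0 => drop the format entirely
  for await (const batch of batches(format)) {
    for (const shard of shards) yield [shard, batch];
  }
}
```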
master thread is generating work, passes to least loaded worker
problem => want to be unpacking maximally in parallel, i.e. doing multiple formats in parallel, but this means it will take longer before we are done with a format (because other formats are getting done at the same time)
need a `--sequentialFormats` parameter to ensure all work for a specific format gets done FIRST
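
A sketch of the least-loaded dispatch idea; the `WorkerHandle` shape and `pending` counter are assumptions for illustration, not an existing API:

```ts
// Hypothetical handle the master keeps per worker thread / child process.
interface WorkerHandle {
  pending: number;          // jobs sent but not yet reported as done
  send(job: unknown): void; // post a job to the worker
}

// Pick whichever worker currently has the fewest outstanding jobs; this is what makes
// weighting meaningful at all (round robin would ignore load entirely).
function leastLoaded(workers: WorkerHandle[]): WorkerHandle {
  return workers.reduce((min, w) => (w.pending < min.pending ? w : min));
}

function dispatch(workers: WorkerHandle[], job: unknown) {
  const worker = leastLoaded(workers);
  worker.pending++;
  worker.send(job);
}
```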
- still need to potentially account for checkpoints having happened

`/tmp/checkpoints-2pA7Hjx`: WORKER (up to the worker to determine if checkpoints are still valid)
- `checkpoints/` - actual checkpoint files
- `decompressed/` - decompressed data
- `scratch/` - scratch output from worker

if `--checkpoints` is not passed, the directory is created in `/tmp` and a hook is installed to delete ALL.
- worker is always responsible for cleaning up `scratch/`
- `checkpoints/` - if really concerned about space these can be deleted once a format is done (turned into a marker)
- `decompressed/` - can be deleted when 'done' with them = all shards have been processed, if concerned about space; otherwise useful for future runs and should mirror a hypothetical PS logs directory

DO NOT CLEAN UP CHECKPOINTS ON EXIT IF PASSED IN
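
A sketch of that temp-directory handling with plain Node APIs; the subdirectory names mirror the layout above, but the helper itself is just an assumption, not the real implementation:

```ts
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// If --checkpoints was passed, reuse that directory and never delete it on exit;
// otherwise create a throwaway directory under /tmp and install a hook to delete ALL of it.
function setupCheckpointDir(flag?: string): string {
  if (flag) return flag; // passed in: DO NOT clean up on exit
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'checkpoints-'));
  for (const sub of ['checkpoints', 'decompressed', 'scratch']) {
    fs.mkdirSync(path.join(dir, sub));
  }
  process.on('exit', () => fs.rmSync(dir, {recursive: true, force: true}));
  return dir;
}
```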
- changing input affects checkpoints? = should be able to still use if they overlap. WORKER is the main thing which affects checkpoints
  - ONLY WORKER SHOULD MATTER
- delete checkpoints if:
  - flag not passed in and we created temp
  - worker responsible for cleaning
// 2020-08/gen1ou/2020-08-14/battle-gen1ou-24687621.log.json -> 2020-08-14_gen1ou_24687621
interface Offset {
  day: string; // 2020-08-14
  format: string; // gen1ou
  log: number; // 24687621
}

// CLEANUP: happens at format level, how to pass up?
interface Batch {
  begin: Offset;
  end: Offset;
}

type AcceptFn = (format: ID) => string[] | undefined;

interface LogStorage {
  // Returns:
  //  - offsets to pass to bar
  //  - something to delete
  foo(checkpoints: XXX, accept: AcceptFn, begin?: Date, end?: Date): AsyncIterator<>;
  // Return names which can be passed to read
  bar(begin: Offset, end: Offset): AsyncIterator<string[]>;
  read(log: string): Promise<string>;
}

interface CheckpointStorage {
  read(XXX: string): Promise<string>;
  write(checkpoint: Checkpoint): Promise<void>;
}
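
For illustration, a hypothetical helper that maps the example path in the comment above onto `Offset` (the regex and function name are assumptions, not part of the interface):

```ts
// '2020-08/gen1ou/2020-08-14/battle-gen1ou-24687621.log.json' -> {day, format, log}
function parseOffset(file: string): Offset | undefined {
  const m = /^\d{4}-\d{2}\/([^/]+)\/(\d{4}-\d{2}-\d{2})\/battle-\1-(\d+)\.log\.json$/.exec(file);
  return m ? {day: m[2], format: m[1], log: Number(m[3])} : undefined;
}
```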
`--constrained` mode = only important if concerned about space
= run formats sequentially = how? need completely different architecture to yield earlier..., need coroutine
= wait until format is done (all shards), turn checkpoint into tombstone and delete decompressed data for the format
= delete decompressed formats that we don't accept

want to be able to delete a day when done with it; is day too granular? delete format instead? surely MONTH-format
month-format-day, month-format, month

ASSUME NOT SPACE LIMITED (can expand particular formats at will)
process(format, shard, batch); // may stretch over many months...
// need some indicator that ALL batches are done to know to call combine on the shard
// if all batches are done, can also know to delete data used by batch...

cleanup and serial formats: yield batches | done
if (Array.isArray(yielded)) { /* it's a batch */ } else { done(); }
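
A sketch of that discrimination, assuming the split stage yields either a `[shard, batch]` tuple or a `{done}` marker object (both shapes are taken loosely from these notes; nothing here is final):

```ts
type Work = [string, {begin: string; end: string}]; // [shard, batch]
type Done = {done: () => void};                     // signal that all batches for a unit were yielded

// The consumer tells batches and done-signals apart with Array.isArray, as suggested above.
async function consume(gen: AsyncGenerator<Work | Done>, apply: (work: Work) => Promise<void>) {
  for await (const yielded of gen) {
    if (Array.isArray(yielded)) {
      await apply(yielded); // a [shard, batch] work item
    } else {
      yielded.done();       // everything for this format/shard has been handed out
    }
  }
}
```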
==PROBLEM== format is under year, actually need MONTH-FORMAT; too hard to ever know it's OK to delete?
stream results to workers = only know we can finish a format when all of its months are done... e.g. might find more format data in future months for the worker; better if FORMAT-MONTH-DAY
SELECT format FROM battles WHERE created_at > begin AND created_at <= end;
SELECT id FROM battles WHERE format = ? AND created_at > begin AND created_at <= end;
SELECT output_log FROM battles WHERE id = ?;
SELECT id FROM battles WHERE format = ? AND id >= ? AND id <= ?; -- only works because id is ASC with time

select(format, begin, end) => ids; read(id);
problem for 2 reasons:
1: what if we don't know which formats exist?
2: with compressed logs, select(format, begin, end) is suboptimal because you need to wait for extraction
- if they're compressed in the first place you probably are constrained?
- Must be able to store the entire working set at once
- No cleanups, but won't delete

YYYY-MM
└── format
    └── YYYY-MM-DD
        └── battle-format-N.log.json
BFS to open all YYYY-MM => if 7z you're probably fucked; DFS on formats (find YYYY-MM for each format)
can do formats in parallel or serial
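
A sketch of that traversal over the directory layout above ('BFS' across months to learn which formats exist, then per-format work across its months); the helper is hypothetical and just uses plain Node APIs:

```ts
import * as fs from 'fs';
import * as path from 'path';

// Open every YYYY-MM directory first so we learn which formats exist at all,
// then each accepted format can be processed (serially or in parallel) across its months.
function monthsByFormat(root: string): Map<string, string[]> {
  const formats = new Map<string, string[]>(); // format -> [YYYY-MM, ...]
  for (const month of fs.readdirSync(root)) {
    for (const format of fs.readdirSync(path.join(root, month))) {
      const months = formats.get(format) ?? [];
      months.push(month);
      formats.set(format, months);
    }
  }
  return formats;
}
```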
- signal from split -> master that format is done
- signal from master -> split that format can be cleaned up (if constrained)
WITHIN a format = can be parallel across days/months/etc
Offset -> Offset
yield [] yield [] yield {cleanup()} <= this is the signal that the format is done, master then calls done() to cleanup
HOW DO SHARDS FIT?
= each shard might have different batches (has its own checkpoints), need to instead yield `(format, shard, (begin, end))` | `(format, shard, done())`
==TODO== WORKER file needs to handle shards changing in stats work! `--shard=tag,cutoff`
`--shard=tag` = split out tags but do all cutoffs at once
`--shard=cutoff` = split out cutoffs but do all tags at once
`--shard=tag,cutoff` = shard out everything (`monowater-1500`)
no sharding = do EVERYTHING at once
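
A sketch of how a worker might expand that flag into concrete shard keys; the dimension names and the `monowater-1500` example come from the notes, while the helpers themselves are hypothetical:

```ts
type ShardDim = 'tag' | 'cutoff';

// Parse `--shard=tag,cutoff` (or `tag`, `cutoff`, or nothing) into the dimensions to shard over.
function parseShardFlag(flag?: string): ShardDim[] {
  return flag ? (flag.split(',') as ShardDim[]) : [];
}

// Expand the requested dimensions into concrete shard keys like 'monowater-1500'.
// No dimensions means a single '' shard, i.e. do EVERYTHING at once.
function shardKeys(dims: ShardDim[], tags: string[], cutoffs: number[]): string[] {
  const ts = dims.includes('tag') ? tags : [''];
  const cs = dims.includes('cutoff') ? cutoffs.map(String) : [''];
  const keys: string[] = [];
  for (const t of ts) {
    for (const c of cs) keys.push([t, c].filter(Boolean).join('-'));
  }
  return keys;
}

// shardKeys(['tag', 'cutoff'], ['monowater'], [1500]) => ['monowater-1500']
```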
parse configuration, set up workspace, make sure workspace is compatible with worker, start splitting using LogStorage + CheckpointStorage (= restore) (heavily storage layer dependent)

`storage.logs.process` => open up all months in parallel and find all month/formats for formats we accept. `accept` returns true or shards if accepted
=> if constrained
- process each format serially (though internally days can be processed in parallel), send out a done() for each shard after completion
=> if not constrained
- process each format all together

processing format => expand format dir, if constrained delete days out of range, process days in parallel
process day => find all files in range (don't bother deleting files out of range); with the range of files: for each shard, get the batch from the range of files + get checkpoints for shard + day, then process(format, shard, batch)
// outstanding apply() promises per (format, shard) pair; a real implementation would
// need a string key (e.g. `${format}/${shard}`) since Map compares object keys by identity
const tasks = new Map<{format: ID; shard?: string}, Promise<void>[]>();
await storage.logs.process(accept, (task) => {
  let remaining = tasks.get({format: task.format, shard: task.shard});
  if (task.done) {
    if (!remaining) {
      task.done();
    } else {
      Promise.all(remaining).then(() => {
        combine(task.format, task.shard).then(() => task.done());
      });
    }
  } else {
    if (!remaining) {
      remaining = [];
      tasks.set({format: task.format, shard: task.shard}, remaining);
    }
    remaining.push(apply(task.format, task.shard, task.batch));
  }
}, begin, end);