First Principles Redesign

I want a system that makes easy things trivial, and hard things easy.

Metrics and Artifacts: The Easiest Thing

The easiest thing is to collect metrics and render them in the webui. This MUST work by itself, without adoption of any other part of the system. Not even the master should have to be running!

rb-determined-ai / streaming-updates-erd.md
Last active August 29, 2023 15:37
Streaming Updates ERD

Streaming Updates ERD

This document has been replaced with an embedded DESIGN document that will outlive this project and exist alongside the code it is describing.

(after the streaming-updates branch lands, you should look at this URL instead)

rb-determined-ai / det-basic-arch.png
Last active June 27, 2023 21:14
Determined AI v0.13.3 Architecture Diagram
rb-determined-ai / aws-dev-machine.md
Last active September 1, 2023 20:49
rb's guide to an aws dev machine

Configuring a Cloud Dev Machine

We'll set up a 2-node machine. If you set up two of these machines, you'll have everything you need to test multi-node distributed training, and you'll be able to start them up at a moment's notice.

Table Of Contents

rb-determined-ai / docusaurus-hurdles.md
Last active February 22, 2023 01:03
Docusaurus Hurdles

Docusaurus Hurdles

Semantic References

Semantic references let you link to reference docs by the name of a class, or to a section of a how-to guide by a named anchor in that guide.

I think this is the single greatest feature that Sphinx offers, especially because it allows seamless linking between source docs.

rb-determined-ai / draft_2.md
Created February 3, 2023 00:16
What if core_context.searcher didn't exist (draft 2)

Draft 2: What if core_context.searcher didn't exist

We're dreaming up an alternate universe here.

Link to previous draft.

Experiments can be a simple list of trials

Think of an experiment as a predefined hyperparameter space. No trials are created if no searcher is defined at experiment creation:

rb-determined-ai / draft_1.md
Last active February 3, 2023 00:17
What if core_context.searcher didn't exist

Draft 1 is Obsolete, see Draft 2

We're dreaming up an alternate universe here.

Experiments can be a simple list of trials

Think of an experiment as a predefined hyperparameter space. No trials are created if no searcher is defined at experiment creation:

rb-determined-ai / 1_core_api.py
Last active January 31, 2023 04:43
Proposal: add max_length requirement to searcher provider
"""
max_length() makes core api hpsearch much less invasive
Basically, the object-oriented approach of the current searcher
API just gets in the way if the training loop isn't built around
the core_context.searcher.operations() call.
"""
# original code
def main():
rb-determined-ai / managed_training.md
Last active January 27, 2023 23:57
Managed Training vs Normal Training

Old Worldview

There are three types of training:

  • Cluster Training (experiments and trials)
  • Detached Training (train a model on your laptop but report to master)
  • Local Training (train a model on your laptop)

We've always had Cluster Training, and we'd like to also support Local Training.

rb-determined-ai / README.md
Last active December 8, 2022 23:18
Migrate from data layer to yogadl

Migration Plan: data layer -> yogadl

Determined's data layer feature was deprecated in 0.18.0 (May 2022).

The good news is that the underlying yogadl project may still be used directly. yogadl is owned by Determined and is not under active development, but its current behavior is well-defined, and it is expected to keep working until the underlying TensorFlow interfaces break.

Migration steps: