@joshcox
Last active October 24, 2018 15:15

Introduction

This technical design provides a comprehensive overview of the types and functionality necessary to correlate data gathered by the parsers, run over a variable number of contentDirectories, into a data structure that can both guide the transformations in the generate phase and be persisted across NPMify runs.

Persistent State

The information that NPMify persists across runs is a summary of the packages that have been transformed and what identifiers those packages export from their entrypoints. The structure must be conducive to serialization (via JSON, preferably).

/**
 * What packages have already been transformed? What do those packages export?
 */
interface IPersistentState {
    transformed: string[];
    exports: {
        [identifierName: string]: {
            default: boolean;
            packageName: string;
        }
    };
}

Why is a persisted state object needed?

Consider that NPMify is an iterative process across the framework. It travels through the framework, converting scripts that modify global scope to modules that protect global scope. To avoid disturbing artifacts that have not yet been transformed (at any point in time), NPMify maintains an adapt.js module that imports identifiers that were previously globally available and makes them globally available again.

Note that information on what is globally available is obscured as we transform modules. adapt.js could be read to make assumptions about the original state of the global scope. Alternatively, a state object could be maintained that bridges this knowledge gap without developing overly specific utilities to re-parse what has already been parsed once before.

Example - How do IPersistentState and adapt.js grow as packages are transformed?

Before

// mpage-core.js
window.MP_Util = {};

// persisted-state.json
{
    "transformed": [],
    "exports": {}
}

After

// mpage-core.js
let MP_Util = {};
export { MP_Util }

// adapt.js
import {MP_Util} from "mpage-core";
window.MP_Util = MP_Util;

// persisted-state.json
{
    "transformed": ["mpage-core"],
    "exports": {
        "MP_Util": {
            "default": false,
            "packageName": "mpage-core"
        }
    }
}
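
The growth of the state shown above can be sketched as a pure function. This is a minimal sketch, not part of the design: `recordTransformedPackage` is a hypothetical helper name, and it assumes every recorded export is a named (non-default) export.

```typescript
interface IPersistentState {
    transformed: string[];
    exports: {
        [identifierName: string]: {
            default: boolean;
            packageName: string;
        };
    };
}

// Hypothetical helper (an assumption, not part of the design): returns a new
// state in which `packageName` is marked as transformed and each name in
// `namedExports` is recorded as a non-default export of that package.
const recordTransformedPackage = (
    state: IPersistentState,
    packageName: string,
    namedExports: string[],
): IPersistentState => {
    const exports = { ...state.exports };
    for (const identifierName of namedExports) {
        exports[identifierName] = { default: false, packageName };
    }
    const transformed = state.transformed.indexOf(packageName) === -1
        ? [...state.transformed, packageName]
        : state.transformed;
    return { transformed, exports };
};

const stateBefore: IPersistentState = { transformed: [], exports: {} };
const stateAfter = recordTransformedPackage(stateBefore, "mpage-core", ["MP_Util"]);

// The state is plain data, so it survives a JSON round trip unchanged.
const restoredState: IPersistentState = JSON.parse(JSON.stringify(stateAfter));
```

Returning a new object rather than mutating the input keeps the previous run's state available for comparison, which may or may not matter to the final implementation.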

Parsed Data

This section describes a proposed structure for the data that the Parser implementations within NPMify should provide. The goal is to define structures that give context and clarity to the analysis phase.

Entrypoint

Everything starts with an entrypoint. This is a path, represented as a string, that will serve as the "main" property value in the package.json and will contain an ExportNamedDeclaration that exports all members of a package that are used globally within the codebase.

type entrypoint = string;

Package JSON

The @proteus/plugin-toolkit already declares a type for IPackageJSON.

import { PackageJSON } from "@proteus/plugin-toolkit";
type IPackageJSON = PackageJSON.IPackageJSON;

References and Exposures

Note the following definitions:

  • An Exposure is a value that has been exposed.
  • A Reference is the usage (declared but potentially unused) of a value that has been exposed.

Via these definitions, we can create some base types that can be used to generically describe any give-and-take situation within the application. Note that these types do not contain specifics as to what is being referenced or exposed. The specifics are thrown out in favor of the common: location and kind. Specifics can be included in extensions to the base types.

/**
 * Where is the Reference or Exposure located?
 */
interface ILocation {
    package: string;
    file: string;
}

/**
 * What are the different ways of referencing an exposed value?
 */
enum ReferenceKind { Import, Window }

/**
 * A Reference is the usage (declared but potentially unused) of a value that has been exposed.
 */
interface IReference {
    kind: ReferenceKind;
    location: ILocation;
}

/**
 * What are the different ways of exposing a value?
 */
enum ExposureKind { Export, WindowAssignment }

/**
 * An Exposure is a value that has been exposed.
 */
interface IExposure {
    kind: ExposureKind;
    location: ILocation;
}

Reference Kinds

Window Reference

ReferenceKind.Window references are those that request a value from the global scope.

Examples:

  • window.foo
  • foo
  • window["bar"]
  • bar

When it comes down to brass tacks, the defining characteristic of a ReferenceKind.Window is the identifier. We extend the notion of an IReference with this information.

/**
 * Window References are those that request a value from the global scope.
 */
interface IWindowReference extends IReference {
    kind: ReferenceKind.Window;
    identifier: string | number;
}

/**
 * Is the input [[IReference]] an [[IWindowReference]]?
 */
const isWindowReference = (ref: IReference): ref is IWindowReference =>
    (ref.kind === ReferenceKind.Window);
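
As a usage sketch, the type guard lets a mixed array of references be narrowed without casts. The sample data below (package name, file path, identifier) is made up for illustration.

```typescript
enum ReferenceKind { Import, Window }

interface ILocation {
    package: string;
    file: string;
}

interface IReference {
    kind: ReferenceKind;
    location: ILocation;
}

interface IWindowReference extends IReference {
    kind: ReferenceKind.Window;
    identifier: string | number;
}

const isWindowReference = (ref: IReference): ref is IWindowReference =>
    (ref.kind === ReferenceKind.Window);

// Hypothetical sample data: one window reference and one import reference.
const sampleWindowRef: IWindowReference = {
    kind: ReferenceKind.Window,
    location: { package: "mpage-core", file: "src/util.js" },
    identifier: "MP_Util",
};
const sampleReferences: IReference[] = [
    sampleWindowRef,
    { kind: ReferenceKind.Import, location: { package: "mpage-core", file: "src/util.js" } },
];

// Because `isWindowReference` is a type predicate, `filter` returns
// IWindowReference[] and `identifier` is accessible in the `map` callback.
const referencedGlobals = sampleReferences
    .filter(isWindowReference)
    .map((ref) => ref.identifier);
```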

Import

ReferenceKind.Import references are declarations that request a value from a package or file.

Examples:

  • import { foo } from "foo";
  • import * as f from "foo";
  • import f from "foo";

The defining characteristic of a ReferenceKind.Import is the entire ImportDeclaration node. Since it already contains every specifier and the source package, we will defer to the @babel/types type definition.

/**
 * Import References are those that request a value from a package or file.
 */
interface IImport extends IReference {
    kind: ReferenceKind.Import;
    node: ImportDeclaration;
}

/**
 * Is the input [[IReference]] an [[IImport]]?
 */
const isImport = (ref: IReference): ref is IImport =>
    (ref.kind === ReferenceKind.Import);

Exposure Kinds

Window Assignment

ExposureKind.WindowAssignment Exposures are those that add a value to the global scope.

Examples:

  • window.foo = 5;
  • foo = 5;
  • window["bar"] = 5;
  • bar = 5;

Much like the Window Reference Kind, the defining characteristic of the ExposureKind.WindowAssignment is the identifier that is being assigned to.

/**
 * A Window Assignment Exposure is a value that is exposed to the global scope.
 */
interface IWindowAssignment extends IExposure {
    kind: ExposureKind.WindowAssignment;
    identifier: string | number;
}

/**
 * Is the input [[IExposure]] an [[IWindowAssignment]]?
 */
const isWindowAssignment = (exp: IExposure): exp is IWindowAssignment =>
    (exp.kind === ExposureKind.WindowAssignment);

Export

ExposureKind.Export Exposures are export declarations that make a value within a module accessible to other modules.

Examples:

  • export * from "foo";
  • export { foo } from "foo";
  • export default foo;
  • export const foo = 5;
  • export { foo };

The defining characteristic of an ExposureKind.Export is the entire export node. Exports are more varied than their import counterparts. As with the Import Reference Kind, we will defer to the @babel/types type definitions.

/**
 * An Export Exposure is an export declaration that makes a value within a
 * module accessible to other modules.
 */
interface IExport extends IExposure {
    kind: ExposureKind.Export;
    node: ExportAllDeclaration | ExportDefaultDeclaration | ExportNamedDeclaration;
}

/**
 * Is the input [[IExposure]] an [[IExport]]?
 */
const isExport = (exp: IExposure): exp is IExport =>
    (exp.kind === ExposureKind.Export);

Externals and Aliases

Externals are a utility to encapsulate globals as modules within a bundle; this allows modules within the bundle to import globals as if they were modules. Aliases are a mechanism for simplifying (sub)paths in the source of an ImportDeclaration. Externals and Aliases are both webpack utilities. They allow libraries to give new names to global variables and allow internal module references without the use of relative file paths. Within NPMify, we need to analyze internal references as well as global variable usage, so import declarations that use Externals and Aliases must be unraveled to their grounded values.

Externals and Aliases are fairly simple to convey as types. Both can be expressed as objects where the keys are the aliases used within the project and the values are the grounded value.

interface IExternal {
    [localExternalName: string]: string;
}

interface IAlias {
    [localAliasName: string]: string;
}

Analysis

The analysis phase is provided the current IPersistentState. It collects data from every parser for every contentDirectory, resolves webpack-only patterns, determines dependencies between packages, converts global references and assignments to ImportDeclarations and ExportNamedDeclarations, and generates a new IPersistentState for use during the next NPMify run.

The following subsections each describe the state of our data as it progresses through the analysis phase of NPMify. They build sequentially and therefore should be read as such. The proposed types are provided to help clarify the transition from one step to the next.

Aggregate Parser Data

Collect the parser data into objects. Note that only whitelisted contentDirectories will need the exports and out properties. Collecting these objects into an Array of IParsedData instead of a Map from contentDirectory names to IParsedData will help to avoid the duplication of Object.keys(data).map((key) => data[key]) transformations, leaving us free to map, filter, and reduce at our leisure.

/**
 * Parsed Data is the set of all data parsed for a `contentDirectory`.
 */
interface IParsedData {
    contentDirectory: string;
    packageJSON: IPackageJSON;
    entrypoint: string;
    imports: IImport[];
    exports?: IExport[];
    in: IWindowReference[];
    out?: IWindowAssignment[];
    externals: IExternal;
    aliases: IAlias;
}

/**
 * Map each `contentDirectory` to an [[IParsedData]] instance.
 */
declare function aggregateParsedData(contentDirectories: string[]): IParsedData[];

Resolve Externals and Aliases

Resolve Externals and Aliases to reduce the IParsedData objects to the following structure. Note that both the externals and aliases properties are removed in the resulting structure. This is because the information contained within the IExternal and IAlias maps can be absorbed into the imports and in properties of an IResolvedData instance.

/**
 * Resolved Data is the set of data where:
 * * External `ImportDeclarations` have been resolved to `IWindowReference`s
 * and `IDestructuredWindowReference`s
 * * Aliased `ImportDeclaration`s have been un-aliased
 */
interface IResolvedData {
    contentDirectory: string;
    packageJSON: IPackageJSON;
    entrypoint: string;
    imports: IImport[];
    exports: IExport[];
    in: Array<IWindowReference | IDestructuredWindowReference>;
    out?: IWindowAssignment[];
}

/**
 * Map an array of [[IParsedData]] instances to an array of [[IResolvedData]] instances.
 */
declare function resolveAliasesAndExternals(parsed: IParsedData[]): IResolvedData[];

Resolving Aliases

Resolving an Alias is a matter of finding and replacing keys on the left-hand side of an IAlias map with the values on the right-hand side.

For example, consider the ImportDeclaration import { foo } from "component/util"; when the IAlias map contains { "component": "./src/main/resources/js" }. The alias within the ImportDeclaration node is resolved to import { foo } from "./src/main/resources/js/util".

/**
 * Resolve `ImportDeclarations` with aliased sources in the [[IResolvedData]] `imports`
 * [[IImport]] array to un-aliased `ImportDeclaration`s.
 */
declare function resolveAliases(aliases: IAlias, imports: IImport[], data: IResolvedData): IResolvedData;
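
The find-and-replace step can be sketched on the source string alone. This is a minimal sketch under stated assumptions: `resolveAliasedSource` is a hypothetical helper, it only matches an alias against the whole source or its leading path segment, and it does not handle webpack's trailing `$` exact-match alias syntax.

```typescript
interface IAlias {
    [localAliasName: string]: string;
}

// Hypothetical helper (an assumption, not part of the design): rewrite the
// source of an import when the whole source, or its leading path segment,
// matches a key in the alias map. Unmatched sources are returned unchanged.
const resolveAliasedSource = (aliases: IAlias, source: string): string => {
    for (const alias of Object.keys(aliases)) {
        if (source === alias) {
            return aliases[alias];
        }
        // Require a "/" after the alias so "component" does not match "components/x".
        if (source.indexOf(alias + "/") === 0) {
            return aliases[alias] + source.slice(alias.length);
        }
    }
    return source;
};

const sampleAliases: IAlias = { component: "./src/main/resources/js" };
const resolvedSource = resolveAliasedSource(sampleAliases, "component/util");
```

A full `resolveAliases` would apply this to the source of every ImportDeclaration node in the imports array.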

Resolving Externals

Resolving an external consists of removing ImportDeclarations that reference Externals and replacing them with window references. This can be done because Externals are syntactic sugar around global variables.

For example, consider the ImportDeclaration import MP_Util from "mpUtil" when the IExternal map contains { "mpUtil": "MP_Util" }. The IImport instance within the imports property should be removed and an IWindowReference instance, such as { kind: ReferenceKind.Window, identifier: "MP_Util" }, should be added to the in property to denote that a global variable is referenced.

Since External import declarations can be partially destructured (i.e. import {foo} from "mpUtil";), we want to ensure that we keep this information. IWindowReferences, by themselves, don't have knowledge of any internal references, so we introduce a new ReferenceKind and interface, IDestructuredWindowReference, to maintain this data.

/**
 * What are the different ways of referencing an exposed value?
 */
enum ReferenceKind { DestructuredWindow, Import, Inline, Window }

/**
 * Destructured Window References are those that were expressed as `ImportDeclaration`s
 * with `ImportSpecifier`s.
 */
interface IDestructuredWindowReference extends IReference {
    kind: ReferenceKind.DestructuredWindow;
    identifier: string;
    specifiers: string[];
}

/**
 * Resolve `ImportDeclarations` with external sources to [[IWindowReference]] or
 * [[IDestructuredWindowReference]] instances.
 */
declare function resolveExternals(externals: IExternal, imports: IImport[], data: IResolvedData): IResolvedData;
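
The partitioning performed by this step can be sketched as follows. To stay self-contained, the sketch replaces the @babel/types ImportDeclaration with a hypothetical `ISimpleImport` shape, uses string tags instead of the ReferenceKind enum, and omits location data; all of these are simplifying assumptions.

```typescript
interface IExternal {
    [localExternalName: string]: string;
}

// Hypothetical, simplified stand-in for an ImportDeclaration node.
interface ISimpleImport {
    source: string;
    namedSpecifiers: string[]; // empty for default or namespace imports
}

type ResolvedReference =
    | { kind: "Window"; identifier: string }
    | { kind: "DestructuredWindow"; identifier: string; specifiers: string[] };

interface IExternalResolution {
    imports: ISimpleImport[];
    in: ResolvedReference[];
}

// Imports whose source is an External are removed from `imports` and
// re-expressed as window references; all other imports pass through.
const resolveExternalImports = (
    externals: IExternal,
    imports: ISimpleImport[],
): IExternalResolution => {
    const result: IExternalResolution = { imports: [], in: [] };
    for (const imp of imports) {
        if (!Object.prototype.hasOwnProperty.call(externals, imp.source)) {
            result.imports.push(imp);
            continue;
        }
        const identifier = externals[imp.source];
        if (imp.namedSpecifiers.length === 0) {
            result.in.push({ kind: "Window", identifier });
        } else {
            // Keep the specifiers so the destructuring can be regenerated later.
            result.in.push({ kind: "DestructuredWindow", identifier, specifiers: imp.namedSpecifiers });
        }
    }
    return result;
};

const externalResolution = resolveExternalImports(
    { mpUtil: "MP_Util" },
    [
        { source: "mpUtil", namedSpecifiers: [] },
        { source: "mpUtil", namedSpecifiers: ["foo"] },
        { source: "./local", namedSpecifiers: ["bar"] },
    ],
);
```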

Inject Persisted Data

Inject IPersistentState into the IResolvedData instance. Each identifierName listed within the IPersistentState["exports"] property should be added to the out property within the IResolvedData with a matching package name.

As a side note, to ensure O(n) over O(n^2) complexity, first organize the identifiers in IPersistentState by packages.

/**
 * [[IPersistentState]] exports organized by package name.
 */
interface IExportsByPackage {
    [packageName: string]: Array<{
        default: boolean;
        identifier: string;
    }>;
}

/**
 * Transform an [[IPersistentState]] instance to an [[IExportsByPackage]] instance,
 * where the data is organized by packages.
 */
declare function organizeStateByPackage(state: IPersistentState): IExportsByPackage;

/**
 * Inject exported identifiers from already-transformed packages into the `IResolvedData`
 * `out` properties.
 */
declare function injectPersistentState(state: IPersistentState, data: IResolvedData[]): IResolvedData[];
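
One possible implementation of `organizeStateByPackage` is sketched below. It makes a single pass over the exports, after which each package's identifiers can be looked up directly instead of rescanning every export for every IResolvedData instance. The sample state is made up for illustration.

```typescript
interface IPersistentState {
    transformed: string[];
    exports: {
        [identifierName: string]: {
            default: boolean;
            packageName: string;
        };
    };
}

interface IExportsByPackage {
    [packageName: string]: Array<{
        default: boolean;
        identifier: string;
    }>;
}

// A sketch of one way to group exported identifiers by owning package.
const organizeStateByPackage = (state: IPersistentState): IExportsByPackage => {
    const byPackage: IExportsByPackage = {};
    for (const identifier of Object.keys(state.exports)) {
        const entry = state.exports[identifier];
        if (!Object.prototype.hasOwnProperty.call(byPackage, entry.packageName)) {
            byPackage[entry.packageName] = [];
        }
        byPackage[entry.packageName].push({ default: entry.default, identifier });
    }
    return byPackage;
};

// Hypothetical sample state with two transformed packages.
const sampleState: IPersistentState = {
    transformed: ["mpage-core", "mpage-component"],
    exports: {
        MP_Util: { default: false, packageName: "mpage-core" },
        MP_Core: { default: false, packageName: "mpage-core" },
        MPageComponent: { default: true, packageName: "mpage-component" },
    },
};
const exportsByPackage = organizeStateByPackage(sampleState);
```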

Correlate Global References and Exposures

By this step, each IResolvedData instance contains the complete global state of the contentDirectory it describes. Externals (import aliases for global references) have been transformed into IWindowReference or IDestructuredWindowReferences and the exported members of already-transformed packages have been injected as outs into their respective IResolvedData instances. We are ready to correlate and determine dependencies.

Dependency Graph Implementation

The dependency graph will be split into three classes: CodeBase, Package, and Identifier. A CodeBase contains zero or more Packages. A Package contains zero or more Identifiers. An Identifier maintains a list of Packages that reference it as well as the Package that owns it. These data classes are meant to be lightweight and, as such, only methods that modify the data itself should be implemented. Aggregation, predicates, etc., should be implemented as standalone functions.

/**
 * A data class that contains zero or more [[IPackage]]s.
 */
interface ICodeBase {
    packages: IPackage[];
    add(pkg: IPackage): ICodeBase;
}

/**
 * Locate an [[IIdentifier]] within an [[ICodeBase]]
 */
declare function findIdentifier(identifierName: string, codeBase: ICodeBase): undefined | IIdentifier;

/**
 * A data class that contains zero or more exposures and references.
 */
interface IPackage {
    name: string;
    version: string;
    exposures: IIdentifier[];
    references: IIdentifier[];
    expose(identifier: IIdentifier): IPackage;
    reference(identifier: IIdentifier): IPackage;
}

/**
 * Aggregate dependencies of an [[IPackage]]. Dependencies are the [[IPackage]] instances
 * that own [[IIdentifier]]s referenced by the current [[IPackage]].
 */
declare function dependencies(pkg: IPackage): IPackage[];

/**
 * A data class that contains referencing [[IPackage]] instances.
 */
interface IIdentifier {
    file: string;
    name: string;
    package: IPackage;
    references: IPackage[];
    referencedBy(referencingPackage: IPackage): IIdentifier;
}

/**
 * Are there any references to an [[IIdentifier]] from outside the package the
 * [[IIdentifier]] is contained within?
 */
declare function isGlobal(identifier: IIdentifier): boolean;
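
The two standalone functions `findIdentifier` and `isGlobal` could be implemented roughly as follows. This is a sketch that treats the graph as plain data: the mutating methods (add, expose, reference, referencedBy) are left off the interfaces for brevity, and the sample packages are hypothetical.

```typescript
interface IPackage {
    name: string;
    version: string;
    exposures: IIdentifier[];
    references: IIdentifier[];
}

interface IIdentifier {
    file: string;
    name: string;
    package: IPackage;
    references: IPackage[];
}

interface ICodeBase {
    packages: IPackage[];
}

// Search every package's exposures for an identifier with a matching name.
const findIdentifier = (identifierName: string, codeBase: ICodeBase): IIdentifier | undefined => {
    for (const pkg of codeBase.packages) {
        for (const exposure of pkg.exposures) {
            if (exposure.name === identifierName) {
                return exposure;
            }
        }
    }
    return undefined;
};

// An identifier is global when some package other than its owner references it.
const isGlobal = (identifier: IIdentifier): boolean =>
    identifier.references.some((pkg) => pkg !== identifier.package);

// Hypothetical sample graph: "mpage-core" exposes MP_Util.
const corePackage: IPackage = { name: "mpage-core", version: "1.0.0", exposures: [], references: [] };
const consumerPackage: IPackage = { name: "mpage-component", version: "2.0.0", exposures: [], references: [] };
const mpUtilIdentifier: IIdentifier = { file: "src/util.js", name: "MP_Util", package: corePackage, references: [] };
corePackage.exposures.push(mpUtilIdentifier);
const sampleCodeBase: ICodeBase = { packages: [corePackage, consumerPackage] };

const globalBeforeReference = isGlobal(mpUtilIdentifier);
mpUtilIdentifier.references.push(consumerPackage);
consumerPackage.references.push(mpUtilIdentifier);
const globalAfterReference = isGlobal(mpUtilIdentifier);
```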

Populate the Dependency Graph

Populating the dependency graph consists of registering globally exposed identifier information and then registering global references that match those identifiers that were globally exposed.

For the following pseudo code, assume access to the following:

deps: CodeBase = new CodeBase()
data: IResolvedData[] = [...]

Exposing out information

For every IResolvedData instance, ensure that a Package instance is registered within the codebase. If the IResolvedData instance includes a non-empty out property, create a new Identifier for each and expose them within the Package instance.

for (d in data)
    pkg = deps.find(d.packageJSON.name);
    if (!pkg)
        pkg = new Package({ name: d.packageJSON.name, version: d.packageJSON.version })
        deps.add(pkg)
    if (d.out && d.out.length)
        for (o in d.out)
            pkg.expose(new Identifier({
                name: o.identifier,
                file: o.location.file,
                package: pkg
            }))

Correlating in information to exposed out information

For every IResolvedData instance, locate the Package instance registered within the CodeBase. If the IResolvedData instance includes a non-empty in property, attempt to find the identifier referenced. When the identifier belongs to one of the contentDirectories that are being (or have already been) transformed expect an Identifier instance, else expect an undefined value. If the Identifier is located, register a reference to the current Package instance.

for (d in data)
    pkg = deps.find(d.packageJSON.name);
    if (d.in && d.in.length)
        for (i in d.in)
            identifier = findIdentifier(i.identifier, deps)
            if (identifier)
                pkg.reference(identifier)
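
The two passes above can be sketched in runnable TypeScript. Several things here are assumptions made for the sake of a self-contained example: the `IResolvedDataLite` and `IWindowEntry` shapes are trimmed-down stand-ins for IResolvedData and its in/out entries, the owning package is stored in a field named `owner`, `CodeBase.find` is implemented as a lookup by package name, and `Package.reference` also records the back-reference on the Identifier.

```typescript
interface IWindowEntry { identifier: string; location: { file: string }; }
interface IResolvedDataLite {
    packageJSON: { name: string; version: string };
    in: IWindowEntry[];
    out?: IWindowEntry[];
}

class Identifier {
    public references: Package[] = [];
    constructor(public name: string, public file: string, public owner: Package) {}
    public referencedBy(referencing: Package): this {
        if (this.references.indexOf(referencing) === -1) { this.references.push(referencing); }
        return this;
    }
}

class Package {
    public exposures: Identifier[] = [];
    public references: Identifier[] = [];
    constructor(public name: string, public version: string) {}
    public expose(identifier: Identifier): this { this.exposures.push(identifier); return this; }
    public reference(identifier: Identifier): this {
        this.references.push(identifier);
        identifier.referencedBy(this); // assumption: keep both directions in sync
        return this;
    }
}

class CodeBase {
    public packages: Package[] = [];
    public add(pkg: Package): this { this.packages.push(pkg); return this; }
    public find(name: string): Package | undefined {
        return this.packages.filter((pkg) => pkg.name === name)[0];
    }
}

const findIdentifier = (identifierName: string, codeBase: CodeBase): Identifier | undefined => {
    for (const pkg of codeBase.packages) {
        for (const exposure of pkg.exposures) {
            if (exposure.name === identifierName) { return exposure; }
        }
    }
    return undefined;
};

const populate = (data: IResolvedDataLite[]): CodeBase => {
    const deps = new CodeBase();
    // Pass 1: register every package and expose its `out` identifiers.
    for (const d of data) {
        let pkg = deps.find(d.packageJSON.name);
        if (!pkg) {
            pkg = new Package(d.packageJSON.name, d.packageJSON.version);
            deps.add(pkg);
        }
        for (const o of d.out || []) {
            pkg.expose(new Identifier(o.identifier, o.location.file, pkg));
        }
    }
    // Pass 2: correlate `in` references with the identifiers exposed in pass 1.
    for (const d of data) {
        const pkg = deps.find(d.packageJSON.name);
        if (!pkg) { continue; }
        for (const i of d.in) {
            const identifier = findIdentifier(i.identifier, deps);
            if (identifier) { pkg.reference(identifier); }
        }
    }
    return deps;
};

const populatedGraph = populate([
    { packageJSON: { name: "mpage-core", version: "1.0.0" }, in: [],
      out: [{ identifier: "MP_Util", location: { file: "src/util.js" } }] },
    { packageJSON: { name: "mpage-component", version: "2.0.0" },
      in: [{ identifier: "MP_Util", location: { file: "src/component.js" } },
           { identifier: "jQuery", location: { file: "src/component.js" } }] },
]);
```

Running the exposure pass to completion before the correlation pass matters: a reference can only be matched once the package that exposes the identifier has been registered, regardless of the order of the input array.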

Generate Import and Export Declarations

Our dependency graph is populated; it's time to apply the information within the CodeBase instance by creating AST nodes (ImportDeclarations and ExportNamedDeclarations) and noting package dependencies (with versions).

Before starting to generate nodes, ensure that the array of IResolvedData is filtered to include only contentDirectories that are either currently being transformed or have already been transformed. This is the first stage in which data is prepared directly for the generation phase; it's pointless to generate AST nodes for contentDirectories that are not being transformed.

After filtering the set of IResolvedData instances, map each IResolvedData instance to an IGeneratedNodeData instance. Note the differences:

  • A dependencies property holds new package dependencies.
  • A destructuringAssignments property holds nodes generated from IDestructuredWindowReference objects.
  • IDestructuredWindowReference objects are removed from the in property.
  • The out property is removed.

/**
 * An [`AssignmentExpression`](https://babeljs.io/docs/en/babel-types#assignmentexpression)
 * that destructures a right expression value into a left `LVal` via the `"="` operator.
 */
interface IDestructuringAssignment {
    location: ILocation;
    node: AssignmentExpression;
}

/**
 * `package.json` dependency information
 */
interface IDependency {
    name: string;
    version: string;
}

/**
 * Generated Node Data is the set of data where:
 * * Import Declaration nodes are generated for all Window References registered in the dependency graph.
 * * Dependencies are inferred from the dependency graph
 * * Destructuring Assignment nodes are generated for Destructuring Window References.
 * * Export Named Declaration nodes are generated for Window Assignments
 */
interface IGeneratedNodeData {
    contentDirectory: string;
    packageJSON: IPackageJSON;
    dependencies: IDependency[];
    entrypoint: string;
    imports: IImport[];
    destructuringAssignments: IDestructuringAssignment[];
    exports: IExport[];
    in: IWindowReference[];
}

/**
 * Map an [[IResolvedData]] instance to an [[IGeneratedNodeData]] instance
 */
declare function generateNodes(data: IResolvedData[], graph: CodeBase):
    IGeneratedNodeData[];

IWindowReference to ImportDeclaration

IWindowReference objects with Identifiers registered within the dependency graph will be converted to IImport objects and added to the imports property in the IGeneratedNodeData instance (those not in the dependency graph will persist in the in property).

When the reference is made within the same package as the Exposure, the relative path between the two files can be determined by comparing the file property in the Identifier instance with the location.file property in the IWindowReference instance. When the reference is made outside the package where the Exposure is located, the Package's name and version will be used to create an IDependency instance that will be added to the dependencies property in the IGeneratedNodeData.
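
The choice between a relative source and a package dependency can be sketched as below. The helper names (`relativeSpecifier`, `planImport`) and the `IImportPlan` shape are hypothetical. The sketch assumes file paths are normalized, use "/" separators, and are relative to the package root; whether the file extension is kept in the import source is left open by this design, and the sketch keeps it.

```typescript
// Build a relative module specifier from one file to another in the same package.
const relativeSpecifier = (fromFile: string, toFile: string): string => {
    const fromDir = fromFile.split("/").slice(0, -1);
    const to = toFile.split("/");
    // Count the leading directories the two paths share.
    let shared = 0;
    while (shared < fromDir.length && shared < to.length - 1 && fromDir[shared] === to[shared]) {
        shared++;
    }
    const up = fromDir.slice(shared).map(() => "..");
    const joined = up.concat(to.slice(shared)).join("/");
    return up.length === 0 ? "./" + joined : joined;
};

// Hypothetical result shape: the import source plus an optional new dependency.
interface IImportPlan {
    source: string;
    dependency?: { name: string; version: string };
}

const planImport = (
    reference: { package: string; file: string },
    exposure: { packageName: string; version: string; file: string },
): IImportPlan =>
    reference.package === exposure.packageName
        // Same package: import by relative path, no new dependency.
        ? { source: relativeSpecifier(reference.file, exposure.file) }
        // Different package: import by package name and record the dependency.
        : {
            source: exposure.packageName,
            dependency: { name: exposure.packageName, version: exposure.version },
        };

const mpUtilExposure = { packageName: "mpage-core", version: "1.0.0", file: "src/util/mp-util.js" };
const internalPlan = planImport({ package: "mpage-core", file: "src/components/widget.js" }, mpUtilExposure);
const externalPlan = planImport({ package: "mpage-component", file: "src/index.js" }, mpUtilExposure);
```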

IDestructuredWindowReference to AssignmentExpression

The creation of IImport instances from IDestructuredWindowReference instances is the same as for IWindowReferences. However, IDestructuredWindowReferences differ from IWindowReferences in that the listed specifiers must be destructured regardless of whether the identifier is matched to an Exposure in the dependency graph. Remember, IDestructuredWindowReferences are derived from Externals (see Resolving Externals), so the use of an AssignmentExpression to replace the ImportDeclaration referencing an External helps to facilitate the removal of Externals from the packages being transformed.

For IDestructuredWindowReference instances with Identifiers registered within the dependency graph, an IDestructuringAssignment will be created where the node's (AssignmentExpression) right-hand value will be the identifier imported.

To illustrate, consider the final state of the AST after generation has occurred when foo is exported by package bar:

import { foo } from "bar";
const { baz } = foo;

For IDestructuredWindowReference instances without Identifiers registered within the dependency graph, an IDestructuringAssignment will be created where the node's (AssignmentExpression) right-hand value will be the global reference.

To illustrate, consider the final state of the AST after generation has occurred when foo is a globally scoped value:

const { baz } = window.foo;
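
Both cases can be sketched with a small function. Note the simplification: `renderDestructuring` is a hypothetical helper that renders source text, whereas the analysis phase itself would build AST nodes. The optional `owningPackage` argument stands in for a successful lookup in the dependency graph.

```typescript
interface IDestructuredReferenceLite {
    identifier: string;
    specifiers: string[];
}

// Hypothetical helper: render the statements that replace an External import.
// `owningPackage` is provided only when the identifier was found in the
// dependency graph; otherwise the global reference is destructured directly.
const renderDestructuring = (
    ref: IDestructuredReferenceLite,
    owningPackage?: string,
): string[] => {
    const pattern = `const { ${ref.specifiers.join(", ")} } =`;
    return owningPackage === undefined
        ? [`${pattern} window.${ref.identifier};`]
        : [
            `import { ${ref.identifier} } from "${owningPackage}";`,
            `${pattern} ${ref.identifier};`,
        ];
};

const fooReference: IDestructuredReferenceLite = { identifier: "foo", specifiers: ["baz"] };
const renderedWithGraphMatch = renderDestructuring(fooReference, "bar");
const renderedWithoutGraphMatch = renderDestructuring(fooReference);
```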

IWindowAssignment to ExportNamedDeclaration

IWindowAssignment instances will be converted to ExportNamedDeclaration and added to the IGeneratedNodeData's exports property.

Bundle Analysis

In this last step, we generate the final analysis object and resolve the analysis phase.

interface IAnalysis {
    codeGeneration: ICodeGenerationData[];
    state: IPersistentState;
}

Create Code Generation Data

Reduce the set of IGeneratedNodeData into a structure that contains the set of nodes and dependencies that the generate phase will enforce. Largely, the goal of this step is to restructure the data so that it's easily consumable by the generate phase. However, since we're making a pass over all the data it also provides an opportunity for optimization.

Consider the following type:

/**
 * The set of data that is directly consumable by the `generate` phase.
 */
interface ICodeGenerationData {
    [contentDirectory: string]: {
        entrypoint: string;
        dependencies: {
            [dependencyName: string]: string;
        };
        files: {
            [fileName: string]: {
                imports: ImportDeclaration[];
                destructuringAssignments: AssignmentExpression[];
                exports: ExportNamedDeclaration[];
                remainingGlobalReferences: IWindowReference[];
            }
        };
    };
}

At the top level, data is organized by contentDirectory. Each contentDirectory contains an entrypoint, a dependencies object, and a files object containing a map from file names to the AST structure that the generate phase will enforce. Note that IImport, IDestructuringAssignment, and IExport instances have been simplified to their underlying node types. We can do this because the location information is conveyed via the structure of ICodeGenerationData; no data has been lost. Each file also contains remainingGlobalReferences; these are included so that the generate phase knows to skip a global reference when that reference is exposed in a contentDirectory that has not been transformed.
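
The restructuring by file can be sketched with a generic grouping function. `ILocated` is a hypothetical stand-in for IImport, IExport, or IDestructuringAssignment, and the nodes are plain strings here purely for illustration.

```typescript
// Hypothetical stand-in for any located wrapper around a node of type T.
interface ILocated<T> {
    location: { package: string; file: string };
    node: T;
}

// Group nodes by the file they belong to. The file name becomes the key, so
// the wrapper's location no longer needs to travel with each node.
const groupNodesByFile = <T>(items: Array<ILocated<T>>): { [fileName: string]: T[] } => {
    const files: { [fileName: string]: T[] } = {};
    for (const item of items) {
        const fileName = item.location.file;
        if (!Object.prototype.hasOwnProperty.call(files, fileName)) {
            files[fileName] = [];
        }
        files[fileName].push(item.node);
    }
    return files;
};

const nodesByFile = groupNodesByFile([
    { location: { package: "mpage-core", file: "src/a.js" }, node: "export { a };" },
    { location: { package: "mpage-core", file: "src/b.js" }, node: "export { b };" },
    { location: { package: "mpage-core", file: "src/a.js" }, node: "export { c };" },
]);
```

Applying the same grouping to imports, destructuring assignments, and exports, then merging the results per file name, yields the files object described above.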

Consolidate Export Declarations

All exports for a given file can be combined into a single ExportNamedDeclaration with the exception of existing ExportNamedDeclarations with sources (i.e. export {foo} from "./bar"). When we move exports from IGeneratedNodeData to ICodeGenerationData, there's a good opportunity to reduce these export nodes without sources into a single ExportNamedDeclaration.
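
The reduction can be sketched as follows. `ISimpleExport` is a hypothetical, simplified stand-in for an ExportNamedDeclaration (specifier names plus an optional source), and the sketch assumes duplicate specifiers should be emitted only once.

```typescript
// Hypothetical, simplified stand-in for an ExportNamedDeclaration node.
interface ISimpleExport {
    specifiers: string[];
    source?: string;
}

// Merge all source-less exports of a file into one declaration; exports that
// re-export from another module (those with a `source`) are kept as-is.
const consolidateExports = (exportsForFile: ISimpleExport[]): ISimpleExport[] => {
    const withSource = exportsForFile.filter((exp) => exp.source !== undefined);
    const merged: string[] = [];
    for (const exp of exportsForFile) {
        if (exp.source !== undefined) {
            continue;
        }
        for (const specifier of exp.specifiers) {
            if (merged.indexOf(specifier) === -1) {
                merged.push(specifier);
            }
        }
    }
    return merged.length > 0 ? [{ specifiers: merged }, ...withSource] : withSource;
};

const consolidated = consolidateExports([
    { specifiers: ["a"] },
    { specifiers: ["foo"], source: "./bar" },
    { specifiers: ["b", "a"] },
]);
```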

Entrypoint File and IPersistentState

When creating the ICodeGenerationData, ensure that a file object is created that matches the entrypoint path. By locating the IPackage within the CodeBase, the Exposures that are globally referenced provide the set of identifiers that need to be exported by the entrypoint file.
