This technical design provides a comprehensive overview of the types and functionality necessary to correlate data gathered by the parsers run over a variable number of contentDirectories into a data structure that can both guide the transformations in the generate phase and be persisted across NPMify runs.
- Introduction
- Table of Contents
- Persistent State
- Parsed Data
- Analysis
The information that NPMify persists across runs is a summary of the packages that have been transformed and the identifiers those packages export from their entrypoints. The structure must be conducive to serialization (preferably via JSON).
/**
* What packages have already been transformed? What do those packages export?
*/
interface IPersistentState {
transformed: string[];
exports: {
[identifierName: string]: {
default: boolean;
packageName: string;
}
};
}
Consider that NPMify is an iterative process across the framework. It travels through the framework, converting scripts that modify global scope to modules that protect global scope. To remain passive to all artifacts that have not been transformed (at any point in time), NPMify maintains an adapt.js module that imports identifiers that were previously globally available and makes them globally available again.
Note that information on what is globally available is obscured as we transform modules. adapt.js could be read to make assumptions about the original state of the global scope. Alternatively, a state object could be maintained that bridges this knowledge gap without developing overly specific utilities to re-parse what has already been parsed once.
Before NPMify transforms mpage-core:
// mpage-core.js
window.MP_Util = {};
// persisted-state.json
{
"transformed": [],
"exports": {}
}
After NPMify transforms mpage-core:
// mpage-core.js
let MP_Util = {};
export { MP_Util }
// adapt.js
import {MP_Util} from "mpage-core";
window.MP_Util = MP_Util;
// persisted-state.json
{
"transformed": ["mpage-core"],
"exports": {
"MP_Util": {
"default": false,
"packageName": "mpage-core"
}
}
}
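As a sketch of the persisted-state round trip above, the following assumes a hypothetical recordTransformation helper (not part of any API this design defines) that marks a package as transformed and records its entrypoint exports, then checks that the result survives JSON.stringify and JSON.parse. The interface mirrors IPersistentState, which names the owning-package field packageName.

```typescript
interface IPersistentState {
  transformed: string[];
  exports: {
    [identifierName: string]: {
      default: boolean;
      packageName: string;
    };
  };
}

// Hypothetical helper: returns a new state that marks `packageName` as
// transformed and records the identifiers its entrypoint now exports.
function recordTransformation(
  state: IPersistentState,
  packageName: string,
  exported: Array<{ identifier: string; default: boolean }>
): IPersistentState {
  // Copy so the previous run's state is never mutated.
  const nextExports = { ...state.exports };
  for (const e of exported) {
    nextExports[e.identifier] = { default: e.default, packageName };
  }
  const transformed =
    state.transformed.indexOf(packageName) === -1
      ? state.transformed.concat(packageName)
      : state.transformed;
  return { transformed, exports: nextExports };
}

const before: IPersistentState = { transformed: [], exports: {} };
const after = recordTransformation(before, "mpage-core", [
  { identifier: "MP_Util", default: false },
]);
// The state must survive serialization between NPMify runs.
const restored: IPersistentState = JSON.parse(JSON.stringify(after));
```

Returning a fresh object keeps the previous run's state available for comparison, which is one reasonable choice rather than a requirement of the design.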
This section describes a proposed structure for the data that the Parser implementations within NPMify should provide. The goal is to provide structures that give context and clarity to the analysis phase.
Everything starts with an entrypoint. This is a path, represented as a string, that will serve as the "main" property value in the package.json and will contain an ExportNamedDeclaration that exports all members of a package that are used globally within the codebase.
type entrypoint = string;
The @proteus/plugin-toolkit already declares a type for IPackageJSON.
import { PackageJSON } from "@proteus/plugin-toolkit";
type IPackageJSON = PackageJSON.IPackageJSON;
Note the following definitions:
- An Exposure is a value that has been exposed.
- A Reference is the usage (declared but potentially unused) of a value that has been exposed.
Using these definitions, we can create base types that generically describe any give-and-take situation within the application. These types do not specify what is being referenced or exposed; the specifics are discarded in favor of what every case has in common: location and kind. Specifics can be added in extensions of the base types.
/**
* Where is the Reference or Exposure located?
*/
interface ILocation {
package: string;
file: string;
}
/**
* What are the different ways of referencing an exposed value?
*/
enum ReferenceKind { Import, Window }
/**
* A Reference is the usage (declared but potentially unused) of a value that has been exposed.
*/
interface IReference {
kind: ReferenceKind;
location: ILocation;
}
/**
* What are the different ways of exposing a value?
*/
enum ExposureKind { Export, WindowAssignment }
/**
* An Exposure is a value that has been exposed.
*/
interface IExposure {
kind: ExposureKind;
location: ILocation;
}
ReferenceKind.Window references are those that request a value from the global scope.
Examples:
window.foo
foo
window["bar"]
bar
When it comes down to brass tacks, the defining characteristic of a ReferenceKind.Window reference is the identifier. We extend IReference with this information.
/**
* Window References are those that request a value from the global scope.
*/
interface IWindowReference extends IReference {
kind: ReferenceKind.Window;
identifier: string | number;
}
/**
* Is the input [[IReference]] an [[IWindowReference]]?
*/
const isWindowReference = (ref: IReference): ref is IWindowReference =>
(ref.kind === ReferenceKind.Window);
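To make the ReferenceKind.Window examples concrete, here is a minimal sketch of how the identifier might be extracted from the forms listed above. The node interfaces are simplified stand-ins for the Babel Identifier, StringLiteral, NumericLiteral, and MemberExpression shapes (only the fields used here are modeled), and windowIdentifier is a hypothetical helper. A real parser would also need scope analysis to confirm that a bare foo is not a local binding.

```typescript
// Simplified stand-ins for Babel node shapes (assumption: only the fields
// used below are modeled).
interface IdentifierNode { type: "Identifier"; name: string; }
interface StringLiteralNode { type: "StringLiteral"; value: string; }
interface NumericLiteralNode { type: "NumericLiteral"; value: number; }
interface MemberExpressionNode {
  type: "MemberExpression";
  object: SimpleNode;
  property: SimpleNode;
  computed: boolean;
}
type SimpleNode = IdentifierNode | StringLiteralNode | NumericLiteralNode | MemberExpressionNode;

// Hypothetical helper: the global identifier a node reads, or undefined.
function windowIdentifier(node: SimpleNode): string | number | undefined {
  if (node.type === "Identifier") {
    // Bare `foo`: assumes scope analysis already ruled out a local binding.
    return node.name;
  }
  if (
    node.type === "MemberExpression" &&
    node.object.type === "Identifier" &&
    node.object.name === "window"
  ) {
    const prop = node.property;
    // window.foo
    if (!node.computed && prop.type === "Identifier") return prop.name;
    // window["bar"] or window[0]
    if (node.computed && (prop.type === "StringLiteral" || prop.type === "NumericLiteral")) {
      return prop.value;
    }
  }
  return undefined;
}
```

The string | number return type matches the identifier field of IWindowReference, which covers numeric keys such as window[0].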
ReferenceKind.Import references are declarations that request a value from a package or file.
Examples:
import { foo } from "foo";
import * as f from "foo";
import f from "foo";
The defining characteristic of a ReferenceKind.Import is the entire ImportDeclaration node. Since it already contains every specifier and the source package, we will defer to the @babel/types type definition.
import { ImportDeclaration } from "@babel/types";
/**
* Import References are those that request a value from a package or file.
*/
interface IImport extends IReference {
kind: ReferenceKind.Import;
node: ImportDeclaration;
}
/**
* Is the input [[IReference]] an [[IImport]]?
*/
const isImport = (ref: IReference): ref is IImport =>
(ref.kind === ReferenceKind.Import);
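The type guards pay off when a parser hands back a mixed IReference[]. The sketch below (with a hypothetical partitionReferences helper and a placeholder in place of the @babel/types ImportDeclaration) shows Array.prototype.filter narrowing each result to IImport[] or IWindowReference[] without casts.

```typescript
enum ReferenceKind { Import, Window }
interface ILocation { package: string; file: string; }
interface IReference { kind: ReferenceKind; location: ILocation; }
interface IWindowReference extends IReference {
  kind: ReferenceKind.Window;
  identifier: string | number;
}
// Placeholder for the @babel/types ImportDeclaration; only `source` is modeled.
interface IImport extends IReference {
  kind: ReferenceKind.Import;
  node: { source: { value: string } };
}

const isWindowReference = (ref: IReference): ref is IWindowReference =>
  ref.kind === ReferenceKind.Window;
const isImport = (ref: IReference): ref is IImport =>
  ref.kind === ReferenceKind.Import;

// Hypothetical helper: the guards narrow each filtered array's element type.
function partitionReferences(refs: IReference[]): { imports: IImport[]; windowRefs: IWindowReference[] } {
  return {
    imports: refs.filter(isImport),
    windowRefs: refs.filter(isWindowReference),
  };
}
```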
ExposureKind.WindowAssignment Exposures are those that add a value to the global scope.
Examples:
window.foo = 5;
foo = 5;
window["bar"] = 5;
bar = 5;
Much like the Window Reference Kind, the defining characteristic of an ExposureKind.WindowAssignment is the identifier being assigned to.
/**
* A Window Assignment Exposure is a value that is exposed to the global scope.
*/
interface IWindowAssignment extends IExposure {
kind: ExposureKind.WindowAssignment;
identifier: string | number;
}
/**
* Is the input [[IExposure]] an [[IWindowAssignment]]?
*/
const isWindowAssignment = (exp: IExposure): exp is IWindowAssignment =>
(exp.kind === ExposureKind.WindowAssignment);
ExposureKind.Export Exposures are export declarations that make a value within a module accessible to other modules.
Examples:
export * from "foo";
export { foo } from "foo";
export default foo;
export const foo = 5;
export { foo };
The defining characteristic of an ExposureKind.Export is the entire export node. Exports are more varied than their import counterparts. As with the Import Reference Kind, we will defer to the @babel/types type definitions.
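import { ExportAllDeclaration, ExportDefaultDeclaration, ExportNamedDeclaration } from "@babel/types";
/**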
* An Export Exposure is an export declaration that makes a value within a
* module accessible to other modules.
*/
interface IExport extends IExposure {
kind: ExposureKind.Export;
node: ExportAllDeclaration | ExportDefaultDeclaration | ExportNamedDeclaration;
}
/**
* Is the input [[IExposure]] an [[IExport]]?
*/
const isExport = (exp: IExposure): exp is IExport =>
(exp.kind === ExposureKind.Export);
Externals are a utility to encapsulate globals as modules within a bundle; this allows modules within the bundle to import globals as if they were modules. Aliases are a mechanism for simplifying (sub)paths in the source of an ImportDeclaration. Externals and Aliases are both webpack utilities: they let libraries give new names to global variables and reference internal modules without relative file paths. Within NPMify, we need to analyze internal references as well as global variable usage, so import declarations that use Externals and Aliases must be unraveled to their grounded values.
Externals and Aliases are fairly simple to convey as types. Both can be expressed as objects where the keys are the aliases used within the project and the values are the grounded values.
interface IExternal {
[localExternalName: string]: string;
}
interface IAlias {
[localAliasName: string]: string;
}
The analysis phase is given the current IPersistentState. It collects data from every parser for every contentDirectory, resolves webpack-only patterns, determines dependencies between packages, converts global references and assignments to ImportDeclarations and ExportNamedDeclarations, and generates a new IPersistentState for use during the next NPMify run.
The following subsections each describe the state of our data as it progresses through the analysis phase of NPMify. They build on one another and should be read in order. The proposed types are provided to clarify the transition from one step to the next.
Collect the parser data into objects. Note that only whitelisted contentDirectories need the exports and out properties. Collecting these objects into an Array of IParsedData, rather than a Map from contentDirectory names to IParsedData, avoids repeated Object.keys(data).map((key) => data[key]) transformations and leaves us free to map, filter, and reduce at our leisure.
/**
* Parsed Data is the set of all data parsed for a `contentDirectory`.
*/
interface IParsedData {
contentDirectory: string;
packageJSON: IPackageJSON;
entrypoint: string;
imports: IImport[];
exports?: IExport[];
in: IWindowReference[];
out?: IWindowAssignment[];
externals: IExternal;
aliases: IAlias;
}
/**
* Map each `contentDirectory` to an [[IParsedData]] instance.
*/
declare function aggregateParsedData(contentDirectories: string[]): IParsedData[];
Resolve Externals and Aliases to reduce the IParsedData objects to the following structure. Note that both the externals and aliases properties are removed in the resulting structure, because the information in the IExternal and IAlias maps is absorbed into the imports and in properties of an IResolvedData instance.
/**
* Resolved Data is the set of data where:
* * External `ImportDeclarations` have been resolved to `IWindowReference`s
* and `IDestructuredWindowReference`s
* * Aliased `ImportDeclaration`s have been un-aliased
*/
interface IResolvedData {
contentDirectory: string;
packageJSON: IPackageJSON;
entrypoint: string;
imports: IImport[];
exports: IExport[];
in: Array<IWindowReference | IDestructuredWindowReference>;
out?: IWindowAssignment[];
}
/**
* Map an array of [[IParsedData]] instances to an array of [[IResolvedData]] instances.
*/
declare function resolveAliasesAndExternals(parsed: IParsedData[]): IResolvedData[];
Resolving an Alias is a matter of finding keys from the left-hand side of an IAlias map in import sources and replacing them with the corresponding values on the right-hand side.
For example, consider the ImportDeclaration import { foo } from "component/util"; when the IAlias map contains { "component": "./src/main/resources/js" }. The alias within the ImportDeclaration node is resolved to import { foo } from "./src/main/resources/js/util";
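A minimal sketch of alias resolution for a single import source, assuming a hypothetical resolveAliasSource helper. It matches whole path segments only, so "component" rewrites "component/util" but not "components/util", and it prefers the longest key when aliases overlap. It does not model webpack's trailing-$ exact-match aliases.

```typescript
interface IAlias {
  [localAliasName: string]: string;
}

// Hypothetical helper: rewrite an import source whose leading path segment(s)
// match an alias key; sources without a matching alias are returned unchanged.
function resolveAliasSource(aliases: IAlias, source: string): string {
  const match = Object.keys(aliases)
    .filter((key) => source === key || source.indexOf(key + "/") === 0)
    // Longest key first, so "component/util" beats "component" when both exist.
    .sort((a, b) => b.length - a.length)[0];
  return match === undefined ? source : aliases[match] + source.slice(match.length);
}
```

Applied to every IImport, this rewrites the node's source value and leaves non-aliased imports untouched.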
/**
* Resolve `ImportDeclarations` with aliased sources in the [[IResolvedData]] `imports`
* [[IImport]] array to un-aliased `ImportDeclaration`s.
*/
declare function resolveAliases(aliases: IAlias, imports: IImport[], data: IResolvedData): IResolvedData;
Resolving an External consists of removing ImportDeclarations that reference Externals and replacing them with window references. This works because Externals are syntactic sugar around global variables.
For example, consider the ImportDeclaration import MP_Util from "mpUtil" when the IExternal map contains { "mpUtil": "MP_Util" }. The IImport instance within the imports property should be removed, and an IWindowReference instance, such as { kind: ReferenceKind.Window, identifier: "MP_Util" }, should be added to the in property to denote that a global variable is referenced.
Since External import declarations can be partially destructured (e.g. import { foo } from "mpUtil";), we want to keep this information. IWindowReferences, by themselves, do not record which members are destructured, so we introduce a new ReferenceKind member, DestructuredWindow, and a new interface, IDestructuredWindowReference, to maintain this data.
/**
* What are the different ways of referencing an exposed value?
*/
enum ReferenceKind { DestructuredWindow, Import, Inline, Window }
/**
* Destructured Window References are those that were expressed as `ImportDeclaration`s
* with `ImportSpecifiers`.
*/
interface IDestructuredWindowReference extends IReference {
kind: ReferenceKind.DestructuredWindow;
identifier: string;
specifiers: string[];
}
/**
* Resolve `ImportDeclarations` with external sources to [[IWindowReference]] or
* [[IDestructuredWindowReference]] instances.
*/
declare function resolveExternals(externals: IExternal, imports: IImport[], data: IResolvedData): IResolvedData;
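The external-resolution rule above might look like the following sketch. SimpleImport is a hypothetical, simplified view of an ImportDeclaration (its default binding and named specifiers only; namespace imports and local renames are not modeled), and resolveExternalImports is illustrative rather than the resolveExternals signature declared above. A default import becomes an IWindowReference; named specifiers become a single IDestructuredWindowReference.

```typescript
enum ReferenceKind { DestructuredWindow, Import, Inline, Window }
interface IExternal { [localExternalName: string]: string; }
interface ILocation { package: string; file: string; }
// Hypothetical simplified import: default binding (if any) plus named specifiers.
interface SimpleImport {
  location: ILocation;
  source: string;
  defaultLocal?: string;
  named: string[];
}
interface IWindowReference { kind: ReferenceKind.Window; location: ILocation; identifier: string | number; }
interface IDestructuredWindowReference {
  kind: ReferenceKind.DestructuredWindow;
  location: ILocation;
  identifier: string;
  specifiers: string[];
}
type GlobalRef = IWindowReference | IDestructuredWindowReference;

function resolveExternalImports(
  externals: IExternal,
  imports: SimpleImport[]
): { imports: SimpleImport[]; in: GlobalRef[] } {
  const kept: SimpleImport[] = [];
  const refs: GlobalRef[] = [];
  for (const imp of imports) {
    // Imports whose source is not an External are kept as-is.
    if (!Object.prototype.hasOwnProperty.call(externals, imp.source)) {
      kept.push(imp);
      continue;
    }
    const globalName = externals[imp.source];
    if (imp.defaultLocal !== undefined) {
      refs.push({ kind: ReferenceKind.Window, location: imp.location, identifier: globalName });
    }
    if (imp.named.length > 0) {
      refs.push({
        kind: ReferenceKind.DestructuredWindow,
        location: imp.location,
        identifier: globalName,
        specifiers: imp.named,
      });
    }
  }
  return { imports: kept, in: refs };
}
```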
Inject IPersistentState into the IResolvedData instances. Each identifierName listed in the IPersistentState["exports"] property should be added to the out property of the IResolvedData instance whose package name matches.
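As a side note, to achieve O(n) rather than O(n^2) complexity, first organize the identifiers in IPersistentState by package, so that each IResolvedData instance looks up its own package's exports directly instead of scanning every identifier.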
/**
* [[IPersistentState]] exports organized by package name.
*/
interface IExportsByPackage {
[packageName: string]: Array<{
default: boolean;
identifier: string;
}>;
}
/**
* Transform an [[IPersistentState]] instance to an [[IExportsByPackage]] instance,
* where the data is organized by packages.
*/
declare function organizeStateByPackage(state: IPersistentState): IExportsByPackage;
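organizeStateByPackage can be sketched as a single pass over the exports map, which is what keeps the later injection linear in the number of identifiers. This is one possible implementation under the types declared above, not a prescribed one.

```typescript
interface IPersistentState {
  transformed: string[];
  exports: { [identifierName: string]: { default: boolean; packageName: string } };
}
interface IExportsByPackage {
  [packageName: string]: Array<{ default: boolean; identifier: string }>;
}

function organizeStateByPackage(state: IPersistentState): IExportsByPackage {
  const byPackage: IExportsByPackage = {};
  // One pass over the exports map: O(n) in the number of exported identifiers.
  for (const identifier of Object.keys(state.exports)) {
    const info = state.exports[identifier];
    if (!byPackage[info.packageName]) byPackage[info.packageName] = [];
    byPackage[info.packageName].push({ default: info.default, identifier });
  }
  return byPackage;
}
```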
/**
* Inject exported identifiers from already-transformed packages into the `IResolvedData`
* `out` properties.
*/
declare function injectPersistentState(state: IPersistentState, data: IResolvedData[]): IResolvedData[];
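A minimal sketch of the injection step follows, assuming a trimmed-down IResolvedData with only the fields this step touches. Because IPersistentState does not record which file exposes an identifier, the sketch uses the package's entrypoint as the exposure location; that is an assumption, not something the design specifies.

```typescript
interface ILocation { package: string; file: string; }
enum ExposureKind { Export, WindowAssignment }
interface IWindowAssignment {
  kind: ExposureKind.WindowAssignment;
  location: ILocation;
  identifier: string | number;
}
interface IPersistentState {
  transformed: string[];
  exports: { [identifierName: string]: { default: boolean; packageName: string } };
}
// Assumption: only the fields used by this step are modeled.
interface IResolvedData {
  contentDirectory: string;
  packageJSON: { name: string; version: string };
  entrypoint: string;
  out?: IWindowAssignment[];
}

function injectPersistentState(state: IPersistentState, data: IResolvedData[]): IResolvedData[] {
  // Group once by package so each IResolvedData reads only its own exports.
  const byPackage: { [pkg: string]: string[] } = {};
  for (const identifier of Object.keys(state.exports)) {
    const pkg = state.exports[identifier].packageName;
    if (!byPackage[pkg]) byPackage[pkg] = [];
    byPackage[pkg].push(identifier);
  }
  return data.map((d) => {
    const names: string[] = byPackage[d.packageJSON.name] || [];
    const injected = names.map((identifier): IWindowAssignment => ({
      kind: ExposureKind.WindowAssignment,
      // Assumption: the entrypoint is treated as the exposing file.
      location: { package: d.packageJSON.name, file: d.entrypoint },
      identifier,
    }));
    const existing: IWindowAssignment[] = d.out || [];
    return { ...d, out: existing.concat(injected) };
  });
}
```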
By this step, each IResolvedData instance contains the complete global state of the contentDirectory it describes. Externals (import aliases for global references) have been transformed into IWindowReferences or IDestructuredWindowReferences, and the exported members of already-transformed packages have been injected as outs into their respective IResolvedData instances. We are ready to correlate and determine dependencies.
The dependency graph will be split into three classes: CodeBase, Package, and Identifier. A CodeBase contains zero or more Packages. A Package contains zero or more Identifiers. An Identifier maintains a list of the Packages that reference it as well as the Package that owns it. These data classes are meant to be lightweight, so only methods that modify the data itself should be implemented. Aggregation, predicates, etc., should be implemented as standalone functions.
/**
* A data class that contains zero or more [[IPackage]]s.
*/
interface ICodeBase {
packages: IPackage[];
add(pkg: IPackage): ICodeBase;
}
/**
* Locate an [[Identifier]] within an [[ICodeBase]]
*/
declare function findIdentifier(identifierName: string, codeBase: ICodeBase): undefined | IIdentifier;
/**
* A data class that contains zero or more exposures and references.
*/
interface IPackage {
name: string;
version: string;
exposures: IIdentifier[];
references: IIdentifier[];
expose(identifier: IIdentifier): IPackage;
reference(identifier: IIdentifier): IPackage;
}
/**
* Aggregate dependencies of an [[IPackage]]. Dependencies are the [[IPackage]] instances
* that own [[IIdentifier]]s referenced by the current [[IPackage]].
*/
declare function dependencies(pkg: IPackage): IPackage[];
/**
* A data class that contains referencing [[IPackage]] instances.
*/
interface IIdentifier {
file: string;
name: string;
package: IPackage;
references: IPackage[];
referencedBy(referencingPackage: IPackage): IIdentifier;
}
/**
* Are there any references to an [[IIdentifier]] from outside the package the
* [[IIdentifier]] is contained within?
*/
declare function isGlobal(identifier: IIdentifier): boolean;
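The three data classes and the standalone queries could be sketched as follows. The class bodies are hypothetical; they keep only mutating methods on the classes, as the design asks. This sketch reads dependencies(pkg) as the packages that own identifiers pkg references (what pkg depends on), and isGlobal as "referenced by any package other than its owner".

```typescript
// Hypothetical concrete classes for ICodeBase, IPackage, and IIdentifier.
class Identifier {
  readonly name: string;
  readonly file: string;
  readonly package: Package;
  readonly references: Package[] = [];

  constructor(init: { name: string; file: string; package: Package }) {
    this.name = init.name;
    this.file = init.file;
    this.package = init.package;
  }

  // Record that `referencingPackage` uses this identifier (no duplicates).
  referencedBy(referencingPackage: Package): Identifier {
    if (this.references.indexOf(referencingPackage) === -1) {
      this.references.push(referencingPackage);
    }
    return this;
  }
}

class Package {
  readonly name: string;
  readonly version: string;
  readonly exposures: Identifier[] = [];
  readonly references: Identifier[] = [];

  constructor(init: { name: string; version: string }) {
    this.name = init.name;
    this.version = init.version;
  }

  expose(identifier: Identifier): Package {
    this.exposures.push(identifier);
    return this;
  }

  // Keep both sides of the edge in sync.
  reference(identifier: Identifier): Package {
    if (this.references.indexOf(identifier) === -1) this.references.push(identifier);
    identifier.referencedBy(this);
    return this;
  }
}

class CodeBase {
  readonly packages: Package[] = [];
  add(pkg: Package): CodeBase {
    this.packages.push(pkg);
    return this;
  }
}

// Standalone queries, kept off the data classes.
function findIdentifier(identifierName: string, codeBase: CodeBase): Identifier | undefined {
  for (const pkg of codeBase.packages) {
    for (const id of pkg.exposures) {
      if (id.name === identifierName) return id;
    }
  }
  return undefined;
}

function isGlobal(identifier: Identifier): boolean {
  return identifier.references.some((p) => p !== identifier.package);
}

// Packages owning identifiers that `pkg` references, excluding `pkg` itself.
function dependencies(pkg: Package): Package[] {
  const result: Package[] = [];
  for (const id of pkg.references) {
    if (id.package !== pkg && result.indexOf(id.package) === -1) result.push(id.package);
  }
  return result;
}
```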
Populating the dependency graph consists of registering globally exposed identifier information and then registering global references that match those identifiers that were globally exposed.
For the following code, assume access to:
const deps: ICodeBase = new CodeBase();
const data: IResolvedData[] = [...];
For every IResolvedData instance, ensure that a Package instance is registered within the CodeBase. If the IResolvedData instance includes a non-empty out property, create a new Identifier for each entry and expose it within the Package instance.
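for (const d of data) {
  // ICodeBase has no lookup method, so search its packages directly.
  let pkg = deps.packages.find((p) => p.name === d.packageJSON.name);
  if (!pkg) {
    pkg = new Package({ name: d.packageJSON.name, version: d.packageJSON.version });
    deps.add(pkg);
  }
  for (const o of d.out ?? []) {
    pkg.expose(new Identifier({
      name: String(o.identifier), // IWindowAssignment stores the name as `identifier`
      file: o.location.file,
      package: pkg
    }));
  }
}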
For every IResolvedData instance, locate the Package instance registered within the CodeBase. If the IResolvedData instance includes a non-empty in property, attempt to find each referenced identifier. When the identifier belongs to one of the contentDirectories that are being (or have already been) transformed, expect an Identifier instance; otherwise expect undefined. If the Identifier is located, register a reference from the current Package instance.
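for (const d of data) {
  const pkg = deps.packages.find((p) => p.name === d.packageJSON.name);
  if (!pkg) continue; // every package was registered in the previous loop
  for (const i of d.in) {
    const identifier = findIdentifier(String(i.identifier), deps);
    if (identifier)
      pkg.reference(identifier);
  }
}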
Our dependency graph is populated; it's time to apply the information within the CodeBase instance by creating AST nodes (ImportDeclarations and ExportNamedDeclarations) and noting package dependencies (with versions).
Before generating nodes, ensure that the array of IResolvedData is filtered to include only contentDirectories that are either currently being transformed or have already been transformed. This is the first step in which data is prepared directly for the generation phase; it's pointless to generate AST nodes for contentDirectories that are not being transformed.
After filtering the set of IResolvedData instances, map each IResolvedData instance to an IGeneratedNodeData instance. Note the differences:
- A dependencies property holds new package dependencies.
- A destructuringAssignments property holds nodes generated from IDestructuredWindowReference objects. IDestructuredWindowReference objects are removed from the in property.
- The out property is removed.
import { AssignmentExpression } from "@babel/types";
/**
* An [`AssignmentExpression`](https://babeljs.io/docs/en/babel-types#assignmentexpression)
* that destructures a right expression value into a left `LVal` via the `"="` operator.
*/
interface IDestructuringAssignment {
location: ILocation;
node: AssignmentExpression;
}
/**
* `package.json` dependency information
*/
interface IDependency {
name: string;
version: string;
}
/**
* Generated Node Data is the set of data where:
* * Import Declaration nodes are generated for all Window References registered in the dependency graph.
* * Dependencies are inferred from the dependency graph
* * Destructuring Assignment nodes are generated for Destructuring Window References.
* * Export Named Declaration nodes are generated for Window Assignments
*/
interface IGeneratedNodeData {
contentDirectory: string;
packageJSON: IPackageJSON;
dependencies: IDependency[];
entrypoint: string;
imports: IImport[];
destructuringAssignments: IDestructuringAssignment[];
exports: IExport[];
in: IWindowReference[];
}
/**
* Map [[IResolvedData]] instances to [[IGeneratedNodeData]] instances.
*/
declare function generateNodes(data: IResolvedData[], graph: ICodeBase):
IGeneratedNodeData[];
IWindowReference objects with Identifiers registered within the dependency graph will be converted to IImport objects and added to the imports property of the IGeneratedNodeData instance (those not in the dependency graph will remain in the in property).
When the reference is made within the same package as the Exposure, the relative path between the two files can be determined by comparing the file property of the Identifier instance with the location.file property of the IWindowReference instance. When the reference is made outside the package where the Exposure is located, the Package's name and version will be used to create an IDependency instance that is added to the dependencies property of the IGeneratedNodeData.
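For the same-package case, the relative path can be derived from the two file paths. The sketch below assumes "/"-separated paths relative to a shared package root and uses hypothetical helpers relativeSpecifier and importSourceFor. The cross-package branch imports by package name, relying on the entrypoint export described earlier, and yields the IDependency to record.

```typescript
interface IDependency {
  name: string;
  version: string;
}

// ES module specifier for `toFile` as seen from `fromFile`
// (assumes "/"-separated paths relative to the same package root).
function relativeSpecifier(fromFile: string, toFile: string): string {
  const fromDir = fromFile.split("/").slice(0, -1);
  const target = toFile.split("/");
  // Count shared leading directories (never consume the target's file name).
  let common = 0;
  while (common < fromDir.length && common < target.length - 1 && fromDir[common] === target[common]) {
    common++;
  }
  const ups = fromDir.slice(common).map(() => "..");
  const joined = ups.concat(target.slice(common)).join("/");
  return ups.length === 0 ? "./" + joined : joined;
}

// Hypothetical helper: how a matched window reference becomes an import source.
function importSourceFor(
  reference: { package: string; file: string },
  exposure: { package: { name: string; version: string }; file: string }
): { source: string; dependency?: IDependency } {
  if (reference.package === exposure.package.name) {
    return { source: relativeSpecifier(reference.file, exposure.file) };
  }
  return {
    source: exposure.package.name,
    dependency: { name: exposure.package.name, version: exposure.package.version },
  };
}
```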
IImport instances are created from IDestructuredWindowReference instances in the same way as from IWindowReferences. However, IDestructuredWindowReferences differ from IWindowReferences in that the listed specifiers must be destructured regardless of whether the identifier is matched to an Exposure in the dependency graph. Remember, IDestructuredWindowReferences are derived from Externals (see Resolving Externals), so using an AssignmentExpression to replace the ImportDeclaration that references an External helps facilitate the removal of Externals from the packages being transformed.
For IDestructuredWindowReference instances with Identifiers registered within the dependency graph, an IDestructuringAssignment will be created where the node's (AssignmentExpression) right-hand value is the imported identifier.
To illustrate, consider the final state of the AST after generation has occurred when foo is exported by package bar:
import { foo } from "bar";
const { baz } = foo;
For IDestructuredWindowReference instances without Identifiers registered within the dependency graph, an IDestructuringAssignment will be created where the node's (AssignmentExpression) right-hand value is the global reference.
To illustrate, consider the final state of the AST after generation has occurred when foo is a globally scoped value:
const { baz } = window.foo;
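The two illustrations above can be reproduced with a small string-rendering sketch. destructuringStatements is hypothetical and emits source text rather than AST nodes; an implementation would more likely build nodes with the @babel/types builders. Like the illustrations, it renders the destructuring as a const declaration.

```typescript
// Hypothetical helper: render the statements generated for one
// IDestructuredWindowReference. `importedFrom` is the package that exposes
// the identifier, or undefined when it must be read from the global scope.
function destructuringStatements(identifier: string, specifiers: string[], importedFrom?: string): string[] {
  const pattern = `{ ${specifiers.join(", ")} }`;
  if (importedFrom !== undefined) {
    // Identifier registered in the dependency graph: import it, then destructure.
    return [
      `import { ${identifier} } from "${importedFrom}";`,
      `const ${pattern} = ${identifier};`,
    ];
  }
  // Not in the graph: destructure directly from the global.
  return [`const ${pattern} = window.${identifier};`];
}
```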
IWindowAssignment instances will be converted to ExportNamedDeclarations and added to the IGeneratedNodeData's exports property.
In this last step, we generate the final analysis object and resolve the analysis phase.
interface IAnalysis {
codeGeneration: ICodeGenerationData[];
state: IPersistentState;
}
Reduce the set of IGeneratedNodeData into a structure that contains the set of nodes and dependencies that the generate phase will enforce. Largely, the goal of this step is to restructure the data so that it's easily consumable by the generate phase. However, since we're making a pass over all the data, it also provides an opportunity for optimization.
Consider the following type:
import { AssignmentExpression, ExportNamedDeclaration, ImportDeclaration } from "@babel/types";
/**
* The set of data that is directly consumable by the `generate` phase.
*/
interface ICodeGenerationData {
[contentDirectory: string]: {
entrypoint: string;
dependencies: {
[dependencyName: string]: string;
};
files: {
[fileName: string]: {
imports: ImportDeclaration[];
destructuringAssignments: AssignmentExpression[];
exports: ExportNamedDeclaration[];
remainingGlobalReferences: IWindowReference[];
}
};
};
}
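Reducing IGeneratedNodeData into the per-file shape of ICodeGenerationData is mostly a group-by on location.file. The sketch below is generic over the node types so it runs without @babel/types; groupByFile and ILocated are hypothetical names. Each wrapper is unwrapped to its node, since the file key now carries the location.

```typescript
interface ILocation { package: string; file: string; }
// Anything carrying a location plus an AST node (IImport, IExport, IDestructuringAssignment).
interface ILocated<N> { location: ILocation; node: N; }

interface IFileNodes<I, A, E, R> {
  imports: I[];
  destructuringAssignments: A[];
  exports: E[];
  remainingGlobalReferences: R[];
}

function groupByFile<I, A, E, R extends { location: ILocation }>(
  imports: Array<ILocated<I>>,
  assignments: Array<ILocated<A>>,
  exportNodes: Array<ILocated<E>>,
  remaining: R[]
): { [fileName: string]: IFileNodes<I, A, E, R> } {
  const files: { [fileName: string]: IFileNodes<I, A, E, R> } = {};
  // Create the per-file bucket on first use.
  const bucket = (file: string): IFileNodes<I, A, E, R> => {
    if (!files[file]) {
      files[file] = { imports: [], destructuringAssignments: [], exports: [], remainingGlobalReferences: [] };
    }
    return files[file];
  };
  for (const imp of imports) bucket(imp.location.file).imports.push(imp.node);
  for (const a of assignments) bucket(a.location.file).destructuringAssignments.push(a.node);
  for (const e of exportNodes) bucket(e.location.file).exports.push(e.node);
  // Remaining global references keep their full shape for the generate phase.
  for (const r of remaining) bucket(r.location.file).remainingGlobalReferences.push(r);
  return files;
}
```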
At the top level, data is organized by contentDirectory. Each contentDirectory contains an entrypoint, a dependencies object, and a files object mapping file names to the AST structure that the generate phase will enforce. Note that IImports, IDestructuringAssignments, and IExports have been simplified to their underlying node types. We can do this because the location information they carried is conveyed by the structure of ICodeGenerationData; no data has been lost. Each file also contains remainingGlobalReferences; these are included so that the generate phase knows to skip a global reference when that reference is exposed in a contentDirectory that has not been transformed.
All exports for a given file can be combined into a single ExportNamedDeclaration, except for existing ExportNamedDeclarations with sources (e.g. export { foo } from "./bar"). When we move exports from IGeneratedNodeData to ICodeGenerationData, there's a good opportunity to reduce the export nodes without sources into a single ExportNamedDeclaration.
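The export reduction might be sketched as below, over a simplified export model (exported names plus an optional source) that stands in for ExportNamedDeclaration; mergeExports is a hypothetical helper. Default exports and declarations such as export const are outside this model.

```typescript
// Hypothetical simplified ExportNamedDeclaration: exported names plus optional source.
interface SimpleNamedExport {
  specifiers: string[];
  source?: string;
}

function mergeExports(exportNodes: SimpleNamedExport[]): SimpleNamedExport[] {
  // Re-exports (`export { foo } from "./bar"`) must keep their own source.
  const withSource = exportNodes.filter((e) => e.source !== undefined);
  // Collect local export names in first-seen order, without duplicates.
  const localNames: string[] = [];
  for (const e of exportNodes) {
    if (e.source !== undefined) continue;
    for (const s of e.specifiers) {
      if (localNames.indexOf(s) === -1) localNames.push(s);
    }
  }
  const merged: SimpleNamedExport[] = localNames.length > 0 ? [{ specifiers: localNames }] : [];
  return merged.concat(withSource);
}
```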
When creating the ICodeGenerationData, ensure that a file object is created that matches the entrypoint path. By locating the IPackage within the CodeBase, the Exposures that are globally referenced provide the set of identifiers that need to be exported by the entrypoint file.
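A sketch of deriving the entrypoint's export list, assuming minimal IPkg and IIdent shapes that mirror IPackage and IIdentifier; entrypointExports is hypothetical. It keeps each exposure that some other package references and sorts the names for stable output.

```typescript
// Minimal shapes (assumption) mirroring IPackage and IIdentifier.
interface IPkg { name: string; exposures: IIdent[]; }
interface IIdent { name: string; file: string; package: IPkg; references: IPkg[]; }

// An identifier is global when a package other than its owner references it.
const isGlobal = (id: IIdent): boolean => id.references.some((p) => p !== id.package);

// Hypothetical helper: names the entrypoint file must export.
function entrypointExports(pkg: IPkg): string[] {
  const result: string[] = [];
  for (const id of pkg.exposures) {
    if (isGlobal(id) && result.indexOf(id.name) === -1) result.push(id.name);
  }
  return result.sort();
}
```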