WebNN EP is built on top of ONNX Runtime Web.
Follow https://onnxruntime.ai/docs/build/web.html#prerequisites to install all prerequisites.
WebNN EP depends on JSEP in order to take advantage of the jsepWrapAsync and runAsync implementations in pre-jsep.js, which enable the use of WebNN's async APIs in the WebNN EP.
- Use the following command line to build wasm with SIMD + multi-thread support:
- Build Release:
C:\code\onnxruntime>.\build.bat --config Release --build_wasm --enable_wasm_simd --enable_wasm_threads --use_jsep --use_webnn --target onnxruntime_webassembly --skip_tests
- Build Debug (optional, for debugging purposes):
C:\code\onnxruntime>.\build.bat --config Debug --build_wasm --enable_wasm_simd --enable_wasm_threads --use_jsep --use_webnn --target onnxruntime_webassembly --enable_wasm_debug_info --skip_tests
- Prepare js/web:
C:\code\onnxruntime>cd js
C:\code\onnxruntime\js>npm ci
C:\code\onnxruntime\js>cd common
C:\code\onnxruntime\js\common>npm ci
C:\code\onnxruntime\js\common>cd ..\web
C:\code\onnxruntime\js\web>npm ci
C:\code\onnxruntime\js\web>npm run pull:wasm
- For Release:
C:\code\onnxruntime\js\web>copy /Y ..\..\build\Windows\Release\ort-wasm-simd-threaded.jsep.mjs .\dist\
C:\code\onnxruntime\js\web>copy /Y ..\..\build\Windows\Release\ort-wasm-simd-threaded.jsep.wasm .\dist\
- For Debug:
C:\code\onnxruntime\js\web>copy /Y ..\..\build\Windows\Debug\ort-wasm-simd-threaded.jsep.mjs .\dist\
C:\code\onnxruntime\js\web>copy /Y ..\..\build\Windows\Debug\ort-wasm-simd-threaded.jsep.wasm .\dist\
- Build Web artifacts:
C:\code\onnxruntime\js\web>npm run build
This generates the final JavaScript bundle files to use. They are located under <ORT_ROOT>/js/web/dist.
- Run with Chrome Stable (supports only the WebNN CPU deviceType at present):
C:\code\onnxruntime\js\web>npm test -- model <path-to-ort-model-test-folder> -b=webnn --wasm-number-threads=1 --debug
- Run with Chrome Canary (supports both the WebNN CPU and GPU deviceTypes):
- Set CHROME_BIN ENV:
C:\code\onnxruntime\js\web>set CHROME_BIN=<path-to-chrome-canary>
- Run test:
C:\code\onnxruntime\js\web>npm test -- model <path-to-ort-model-test-folder> -b=webnn --wasm-number-threads=1 --debug
e.g. npm run test -- suite1 -b=webnn --wasm-number-threads 1 --debug --webnn-device-type gpu
Make sure the files mentioned above are placed in the dist folder.
- Import from HTML's <script> tag:
<script src="./dist/ort.all.min.js"></script>
or
<script src="./dist/ort.all.js"></script>
- Import from source code inside a <script type="module"> tag (ESM):
<script type="module">
import * as ort from "./dist/ort.all.min.mjs";
</script>
or
<script type="module">
import * as ort from "./dist/ort.all.mjs";
</script>
- Import in a CommonJS project (CJS format, resolve from package.json "exports" field):
const ort = require('onnxruntime-web/all');
- Import in an ESM project (ESM format, resolve from package.json "exports" field):
import * as ort from 'onnxruntime-web/all';
- Use a released version:
WebNN EP has been released in the onnxruntime-web dev channel starting from 1.18.0-dev.20240126; you can use either this version or a newer one.
<script src="https://cdnjs.cloudflare.com/ajax/libs/onnxruntime-web/1.18.0-dev.20240126-fc44f96ad5/ort.all.min.js" integrity="sha512-YZnoZqAi/xvZBkTDkyLGRAcNST4wpq/vtIJ+0NCvC8j0qJ9WyVWNfqSLe6co1VOoKfX+zc415jLZaCQuVu/QqA==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
or
npm i onnxruntime-web@1.18.0-dev.20240126-fc44f96ad5
- Set ORT Web ENV:
ort.env.wasm.numThreads = 4; // multi-threading is only available when the page is cross-origin isolated (SharedArrayBuffer enabled); see the server sketch below
ort.env.wasm.proxy = false; // set to true to run the Wasm backend inside a Web Worker proxy
ort.env.logLevel = 'warning'; // severity level for logging: 'verbose' | 'info' | 'warning' | 'error' | 'fatal'
ort.env.debug = false; // set to true when using the Debug Wasm build
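Because multi-threading depends on SharedArrayBuffer, the page that loads ORT Web must be cross-origin isolated. The following is a minimal sketch, not part of ONNX Runtime, of a Node.js static server that sends the required headers; the port, file paths, and MIME table are illustrative assumptions:
// Sketch: static file server that makes the page cross-origin isolated,
// so SharedArrayBuffer (and ort.env.wasm.numThreads > 1) becomes available.
const http = require('http');
const fs = require('fs');
const path = require('path');
const mime = { '.html': 'text/html', '.js': 'text/javascript', '.mjs': 'text/javascript', '.wasm': 'application/wasm' };
http.createServer((req, res) => {
  const file = path.join(__dirname, req.url === '/' ? 'index.html' : req.url);
  fs.readFile(file, (err, data) => {
    if (err) { res.writeHead(404); res.end(); return; }
    res.writeHead(200, {
      'Content-Type': mime[path.extname(file)] || 'application/octet-stream',
      'Cross-Origin-Opener-Policy': 'same-origin', // these two headers enable
      'Cross-Origin-Embedder-Policy': 'require-corp', // cross-origin isolation
    });
    res.end(data);
  });
}).listen(8080);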
- Create ORT inference session:
const options = {
executionProviders: [
{
name: 'webnn',
deviceType: 'cpu', // 'cpu', 'gpu' or 'npu'
powerPreference: "default",
},
],
}
// Free dimension overrides. Only needed for models with dynamic input shapes.
options.freeDimensionOverrides = {
batch_size: 1, // "batch_size" is an example of a dynamic dimension name in the input shape
};
options.logSeverityLevel = 0; // 0: kVERBOSE|1: kINFO|2: kWARNING|3: kERROR|4: kFATAL
// Check https://github.com/microsoft/onnxruntime/blob/main/js/common/lib/inference-session.ts for more options.
// Create inference session.
const session = await ort.InferenceSession.create(modelPath, options);
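If you are unsure whether the browser exposes WebNN at all, you can feature-detect it and fall back to the default wasm EP. This is a hedged, standalone variant of the snippet above rather than an official recommendation; it only assumes that WebNN-capable browsers expose navigator.ml and that executionProviders accepts a list tried in order:
// Sketch: prefer WebNN when navigator.ml exists, otherwise fall back to the wasm EP.
const options = {
  executionProviders: ('ml' in navigator)
    ? [{ name: 'webnn', deviceType: 'cpu', powerPreference: 'default' }, 'wasm']
    : ['wasm'],
};
const session = await ort.InferenceSession.create(modelPath, options);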
- Run ORT inference session:
// prepare inputs
const dataA = new Float32Array(12).fill(1);
const dataB = new Float32Array(12).fill(2);
const tensorA = new ort.Tensor('float32', dataA, [3, 4]);
const tensorB = new ort.Tensor('float32', dataB, [4, 3]);
// prepare feeds. use model input names as keys.
const feeds = {
a: tensorA,
b: tensorB
};
// feed inputs and run
const results = await session.run(feeds);
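Each entry of results is an ort.Tensor keyed by a model output name (session.outputNames lists them). As a small illustration, assuming a single output named 'c' (a hypothetical name, not taken from the model above):
// 'c' is a hypothetical output name; check session.outputNames for the real ones.
const output = results.c;
console.log(output.dims); // the output shape
console.log(output.data); // a typed array (e.g. Float32Array) holding the output values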
For more examples of API usage, see https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js.