I'm currently testing with Chrome v137.0.7151.69 and macOS 13.0. I'm using [email protected]. I can't seem to run the new v3 model with the WebGPU ONNX execution provider without a huge performance degradation compared to the older models.
Code:
https://gist.github.com/mattdesl/30bc5de23eb6edfd7362d91d43170922
(change the "provider" and "model" variables in main.js)
I'm testing three models:
{
  // The "new" v3 model
  v3: {
    image_encoder: "uform3-image-text-english-small/image_encoder.onnx",
    text_encoder: "uform3-image-text-english-small/text_encoder.onnx",
  },
  // The "old" models ...
  fp16: {
    text_encoder: "uform-vl-english-small-gpu-fp16/text_encoder.onnx",
    image_encoder: "uform-vl-english-small-gpu-fp16/image_encoder.onnx",
  },
  fp32: {
    text_encoder: "uform-vl-english-small-cpu-fp32/text_encoder.onnx",
    image_encoder: "uform-vl-english-small-cpu-fp32/image_encoder.onnx",
  },
}
Using webgpu backend, testing only image encoding / inference time:
v3 ~7000 ms
fp16 ~800 ms
fp32 ~750 ms
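For anyone reproducing these numbers: the first run on a GPU backend may include one-time setup cost (for example shader compilation), so it can help to discard a warm-up run and report the median of several runs. The following is only a sketch of that idea, not the code from the gist; `timeRuns`, `median`, and the `runOnce` callback are hypothetical names introduced here.

```javascript
// Median of a list of timings in milliseconds.
function median(times) {
  const sorted = [...times].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Time an async function, discarding the first `warmup` runs.
// `runOnce` is a hypothetical callback supplied by the caller,
// e.g. () => session.run(feeds) for an already-created session.
async function timeRuns(runOnce, { warmup = 1, runs = 5 } = {}) {
  const times = [];
  for (let i = 0; i < warmup + runs; i++) {
    const start = performance.now();
    await runOnce();
    const elapsed = performance.now() - start;
    if (i >= warmup) times.push(elapsed); // keep only post-warm-up timings
  }
  return {
    median: median(times),
    min: Math.min(...times),
    max: Math.max(...times),
  };
}
```

If the warm-up run is the only slow one, the ~7000 ms figure would mostly reflect setup rather than steady-state inference; if every run is slow, the problem is more likely in how the model executes on that backend.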
The v3 model seems to produce inaccurate/incorrect cosine similarity in webgpu mode.
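One way to quantify "inaccurate" is to encode the same image with both backends and compare the two embeddings directly. The sketch below assumes the embeddings are available as plain arrays or `Float32Array`s of equal length; `cosineSimilarity` and `maxAbsDiff` are hypothetical helper names, not functions from the gist or from onnxruntime-web.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("embedding length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Largest element-wise absolute difference between two vectors.
function maxAbsDiff(a, b) {
  if (a.length !== b.length) throw new Error("embedding length mismatch");
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}
```

A cosine similarity close to 1 between the cpu and webgpu embeddings of the same input would suggest the backends agree; a much lower value would point at the webgpu path producing different outputs for this model.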
Using cpu backend:
v3 ~6500 ms
fp16 N/A
fp32 ~7000 ms
I'm hoping I've simply done something wrong that causes the v3 model on webgpu to both produce incorrect results and run very slowly. Is there anything obvious I'm missing?