Drop support for non-SIMD version #945

Open
Balearica opened this issue Aug 25, 2024 · 0 comments

Note: As of 8/24 there are no immediate plans to drop the non-SIMD version; this issue is for planning purposes.

The Tesseract.js-core package currently includes 4 different versions of the Tesseract.js WebAssembly build, one for each combination of Legacy+LSTM vs. LSTM-only and SIMD vs. non-SIMD. This makes build times long and has bloated the Tesseract.js-core npm package to ~31MB. While the separate Legacy+LSTM and LSTM-only builds will always need to exist, we will eventually be able to drop support for the non-SIMD version, which will reduce the total number of builds from 4 to 2.

The latest versions of every major browser on every major platform now support WebAssembly SIMD. Therefore, this is simply a question of waiting for user adoption of the latest browsers/devices to become sufficiently high. There are 2 sources of data that can inform this decision: (1) general stats on browser adoption, and (2) our own stats on Tesseract.js usage from the JSDelivr CDN.
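For context on what "supports WebAssembly SIMD" means at runtime, here is a minimal sketch of feature detection, assuming the common approach of validating a tiny module that uses a `v128` instruction. The probe bytes, the function names `supportsWasmSimd` and `pickBuild`, and the build labels are illustrative assumptions, not the code or file names Tesseract.js actually uses.

```typescript
// Tiny WebAssembly module whose single function returns a v128 value and uses
// SIMD opcodes (i8x16.splat, i8x16.popcnt). Engines without SIMD reject it.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,        // "\0asm" magic + version 1
  1, 5, 1, 96, 0, 1, 123,             // type section: () -> v128
  3, 2, 1, 0,                         // function section: one function of type 0
  10, 10, 1, 8, 0,                    // code section: one body, no locals
  65, 0, 253, 15, 253, 98, 11,        // i32.const 0; i8x16.splat; i8x16.popcnt; end
]);

// Returns true only when the engine has WebAssembly and accepts the SIMD probe.
function supportsWasmSimd(): boolean {
  // Read through globalThis so the check also works where WebAssembly is absent.
  const wasm = (globalThis as any).WebAssembly;
  if (!wasm || typeof wasm.validate !== "function") return false;
  return wasm.validate(SIMD_PROBE) === true;
}

// Hypothetical labels for the 2 x 2 build matrix described in this issue.
function pickBuild(simd: boolean, lstmOnly: boolean): string {
  const engine = lstmOnly ? "lstm-only" : "legacy+lstm";
  const variant = simd ? "simd" : "non-simd";
  return `${engine}/${variant}`;
}

console.log(pickBuild(supportsWasmSimd(), true));
```

If the non-SIMD builds were dropped, the `non-simd` branch would disappear and a failed probe would mean the device is unsupported rather than served a fallback.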

Regarding the former, according to caniuse.com, as of 8/24, 92.23% of users have browsers that support WebAssembly SIMD. However, this is misleadingly low (as it relates to WebAssembly SIMD specifically), as it includes ancient browsers such as Internet Explorer that would not be supported by any version of Tesseract.js. When we use the 96.78% of browsers that support WebAssembly as the denominator, the percentage that support SIMD is 95.3%. The largest group of browsers that supports WebAssembly but not WebAssembly SIMD is Safari for iOS, which accounts for ~2% of total users.
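The change of denominator above can be checked with a short calculation. This is only a sketch of the arithmetic using the two caniuse.com figures quoted in this comment; the function name `shareAmongBase` is made up for illustration.

```typescript
// Percentage of a base population that also has a feature, given both as
// percentages of all users.
function shareAmongBase(featurePct: number, basePct: number): number {
  return (featurePct / basePct) * 100;
}

const simdOfAllUsers = 92.23; // users whose browser supports WebAssembly SIMD
const wasmOfAllUsers = 96.78; // users whose browser supports WebAssembly at all

const simdAmongWasm = shareAmongBase(simdOfAllUsers, wasmOfAllUsers);
console.log(simdAmongWasm.toFixed(1)); // "95.3"

// Users with WebAssembly but without SIMD, in percentage points of all users.
const wasmWithoutSimd = wasmOfAllUsers - simdOfAllUsers;
console.log(wasmWithoutSimd.toFixed(2)); // "4.55"
```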

Regarding the second data source, according to JSDelivr, the CDN used by default, in Q2 2024 the 2 most commonly used versions of Tesseract.js-core were 5.0.0 and 4.0.4. These two versions had a combined 11,256,421 hits for the SIMD builds and 86,202 hits for the non-SIMD builds, which means the SIMD builds accounted for 99.2% of the total.
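The 99.2% figure follows directly from the two hit counts. A minimal sketch of that calculation, using only the numbers quoted above (the function name `hitShare` is illustrative):

```typescript
// Share of `part` within `part + rest`, as a percentage.
function hitShare(part: number, rest: number): number {
  return (part / (part + rest)) * 100;
}

const simdHits = 11256421;  // combined SIMD hits for 5.0.0 and 4.0.4, Q2 2024
const nonSimdHits = 86202;  // combined non-SIMD hits for the same versions

console.log(hitShare(simdHits, nonSimdHits).toFixed(1)); // "99.2"
console.log(hitShare(nonSimdHits, simdHits).toFixed(1)); // "0.8"
```

Rerunning this with fresh JSDelivr numbers is one way to revisit the decision later.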

I have no plans to drop the non-SIMD versions for now. However, we can revisit these stats down the line, and hopefully the share of devices that support WebAssembly but not WebAssembly SIMD will drop close to 0%.
