Tesseract.js Bug on IBM i Server #930
Comments
Do you have reason to believe that this issue is specific to IBM i, or is that simply extra context? Can you provide an example repo (or standalone file) that is sufficient to reproduce this error?
On a completely unrelated note, it is generally inadvisable in real applications to create new workers inside the function that runs recognition. Doing so adds overhead every time the function runs, and it places no limit on the number of workers that can end up being created. As a result, you either run 1 recognition job at a time to be safe (which is slow), or allow an unlimited number of jobs to run at the same time, which can crash your application.
The recommended approach is to create a scheduler. This allows you to define a fixed number of workers (say, 4) that persist between jobs, and use them to run recognition in parallel. See this guide for an explanation. |
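The scheduler approach described above can be sketched roughly as follows. This is a minimal sketch, not code from the thread: the helper names `createOcrScheduler` and `recognizeAll` are hypothetical, the worker count of 4 is the example figure from the comment, and the library is passed in as a parameter so the helpers can be tried with or without tesseract.js installed. It assumes the tesseract.js v5 calls `createScheduler()`, `createWorker(lang)`, `scheduler.addWorker()`, `scheduler.addJob('recognize', image)` and `scheduler.terminate()`.

```javascript
// Hypothetical helper: build one scheduler backed by a fixed number of
// persistent workers. `tesseract` is the module object, e.g. require('tesseract.js').
async function createOcrScheduler(tesseract, workerCount = 4, lang = 'eng') {
  const scheduler = tesseract.createScheduler();
  for (let i = 0; i < workerCount; i++) {
    // Workers are created once here, not once per recognition job.
    const worker = await tesseract.createWorker(lang);
    scheduler.addWorker(worker);
  }
  return scheduler;
}

// Hypothetical helper: queue every image on the shared scheduler.
// The scheduler hands each job to the next idle worker, so at most
// `workerCount` recognitions run at the same time.
async function recognizeAll(scheduler, images) {
  const results = await Promise.all(
    images.map((image) => scheduler.addJob('recognize', image))
  );
  return results.map((result) => result.data.text);
}

// Usage (assumes tesseract.js v5 is installed; file names are placeholders):
//   const scheduler = await createOcrScheduler(require('tesseract.js'));
//   const texts = await recognizeAll(scheduler, ['page1.png', 'page2.png']);
//   await scheduler.terminate(); // only when the application shuts down
```

Creating the scheduler once at application startup and reusing it for every request is what keeps the number of workers bounded.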
Repository
I already tried it on a Windows server and it works just fine. Thanks for the advice; yes, I definitely need to correct this! |
Several of the dependencies in your repo's
import { createWorker } from 'tesseract.js';
(async () => {
  const worker = await createWorker('eng');
  const ret = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(ret.data.text);
  await worker.terminate();
})();
If the issue is with |
Thanks for confirming. Unless there is something particular about your settings, it sounds like there is indeed some platform-specific issue with IBM i. It looks like the code in question is not originally from either the Tesseract.js or Tesseract.js-core repos, but rather is code added by Emscripten, the compiler used to go from C/C++ to WebAssembly. I am currently not sure what is happening here. It appears to be something filesystem-related. In a brief search of the Emscripten issues I did not see any references to IBM i. |
OK, so there's no solution to it? |
So is there going to be a new patch soon? |
It is likely that this can be fixed, however that would require troubleshooting by you or another IBM i user to figure out what the root cause is. I am not able to troubleshoot a platform-specific issue on a proprietary platform that I do not have access to. |
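As one possible starting point for the troubleshooting mentioned above, an IBM i user could print what Node.js reports about the platform and its paths, then compare the output with a machine where the code works. This is only a sketch: the function name `collectEnvironmentInfo` is hypothetical, it uses nothing but Node.js built-ins, and it makes no assumption about the root cause.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: gather platform and path values worth
// attaching to a bug report about a platform-specific failure.
function collectEnvironmentInfo() {
  return {
    nodeVersion: process.version,   // e.g. the v20.11.1 mentioned in the report
    platform: process.platform,     // platform identifier as seen by Node.js
    arch: process.arch,
    osType: os.type(),
    osRelease: os.release(),
    cwd: process.cwd(),
    tmpDir: os.tmpdir(),
    pathSeparator: path.sep,
    resolvedCwd: path.resolve('.'), // sanity check that Node's own path.resolve works
  };
}

console.log(JSON.stringify(collectEnvironmentInfo(), null, 2));
```

Running this on both the IBM i machine and the working Windows server gives a side-by-side view of the values that differ.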
Thank you. By the way, there is a public server available to anyone: https://pub400.com You can create credentials and test the code directly there! It is the same working environment as on my machine. Thank you for your help, it's much appreciated! |
Environment
I am using tesseract.js@5.1.0 running on Node.js (v20.11.1)
My issue
I am getting the same error, TypeError [Error]: Arguments to path.resolve must be string...Emitted 'error' event on Worker instance at..., whenever my application runs and reaches the createWorker call inside the performOCR function:
performOCR(fs.readFileSync(filePath));
async function performOCR(imageFile) {
  const worker = await createWorker("eng"); // <--- issue occurring here
  const ret = await worker.recognize(imageFile);
  await worker.terminate();
  console.log(ret.data.text);
}
Package.json
{
  "name": "file-reader",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "start": "node backend/index.js",
    "dev": "nodemon backend/index.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "description": "",
  "dependencies": {
    "cors": "^2.8.5",
    "dotenv": "^16.4.5",
    "express": "^4.19.2",
    "ibm_db": "^3.2.4",
    "idb-pconnector": "^1.1.1",
    "multer": "^1.4.5-lts.1",
    "mysql2": "^3.10.0",
    "pdf-parse": "^1.1.1",
    "tesseract.js": "^5.1.0",
    "tesseract.js-core": "^5.1.0"
  }
}
Specs
I tried removing and reinstalling tesseract.js as well as tesseract.js-core, and using an image path instead of a stream, but I still get the exact same error. Any help would be much appreciated!