worker.recognize fails when run before previous job finishes #875
Comments
Thanks for reporting. I was able to replicate. A minimal reproducible example is below.
This appears to occur when calling Regarding why this would not happen using If you want to run jobs in parallel, the best way to handle this is using schedulers. Schedulers allow for using a defined number of workers in parallel to process jobs. Schedulers are explained here, and an example using schedulers is here.
Thanks for getting back so quickly! Actually, I was awaiting the result. I just tried again like this, and unfortunately I just keep waiting and waiting for the result of the second image...
I also tried this method, and it works fine and is quite fast as well. But I assume starting a new worker and terminating it is quite inefficient? Even if I terminate it every time? On my machine it works quite fast though:
When you use
The reason why this is fast is because your application is running recognition on different workers in parallel--if you have 5 URLs then it is making 5 workers that run at the same time. As noted in my comment above, this is a problematic way to implement parallel processing because (among other things) the number of Tesseract.js workers it creates is undefined--you can end up creating 12 workers at the same time and crashing your application. If you want to run multiple workers at the same time--which is the most efficient way to do things--you should implement a scheduler using the resources I linked in my previous comment.
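To illustrate the scheduler approach recommended above, here is a minimal sketch rather than the maintainers' own example. It assumes the v5-style `createWorker(lang)` signature together with `createScheduler()`, `scheduler.addWorker()`, `scheduler.addJob('recognize', image)`, and `scheduler.terminate()`. The function name `recognizeAll`, its parameters, and the default of two workers are hypothetical; the Tesseract.js module is passed in as an argument so the number of workers stays explicit and the function can be exercised with a stub.

```javascript
// Sketch (hypothetical helper): recognize many images with a fixed-size pool of workers.
// `tesseract` is the imported tesseract.js module (assumed v5-style API).
async function recognizeAll(tesseract, images, workerCount = 2, lang = 'eng') {
  const scheduler = tesseract.createScheduler();
  try {
    // Create a bounded, known number of workers instead of one per image.
    for (let i = 0; i < workerCount; i++) {
      const worker = await tesseract.createWorker(lang);
      scheduler.addWorker(worker);
    }
    // Queue every image; the scheduler hands each job to an idle worker.
    const results = await Promise.all(
      images.map((image) => scheduler.addJob('recognize', image))
    );
    return results.map((result) => result.data.text);
  } finally {
    // Terminating the scheduler also terminates the workers added to it.
    await scheduler.terminate();
  }
}
```

With the real library this would be called as `recognizeAll(require('tesseract.js'), urls, 4)`, where `urls` is your own list of images and the worker count is whatever your machine can comfortably run.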
The root cause of this bug is that Tesseract.js workers only store one promise for each https://github.com/naptha/tesseract.js/blob/master/src/createWorker.js#L51-L71 This causes the first This should be easily fixable by making the identifier for all promises unique, which can be achieved by appending
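The failure mode described above can be reproduced in a few lines without the library. The sketch below is a simplified, hypothetical model, not the actual `createWorker.js` code: it assumes pending jobs are stored in a map keyed by the action name, and it uses an incrementing counter as the unique suffix. Both are assumptions made for illustration, since the exact key and suffix are not shown in the comment.

```javascript
// Hypothetical model of a worker's table of pending jobs.
// uniqueIds = false: one slot per action name (the buggy behaviour).
// uniqueIds = true: every job gets its own slot (the proposed fix).
function createJobTracker(uniqueIds) {
  const pending = new Map(); // job id -> resolve function
  let counter = 0;

  function start(action) {
    const id = uniqueIds ? `${action}-${counter++}` : action;
    // With a shared id, this overwrites the resolver of any earlier job.
    const promise = new Promise((resolve) => pending.set(id, resolve));
    return { id, promise };
  }

  function finish(id, result) {
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(result);
    }
  }

  return { start, finish, pendingCount: () => pending.size };
}

// Two overlapping jobs with a shared id leave only one resolver stored,
// so the first job's promise can never settle.
const shared = createJobTracker(false);
shared.start('recognize');
shared.start('recognize');
console.log(shared.pendingCount()); // 1

// With unique ids, both jobs stay tracked and can be resolved independently.
const unique = createJobTracker(true);
unique.start('recognize');
unique.start('recognize');
console.log(unique.pendingCount()); // 2
```

In the shared-id case an awaited first call hangs forever, which matches the symptom reported in this issue of recognition stopping without an error.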
The fix described above has been implemented in the master branch and will be included in the next patch release.
Thanks a lot for implementing it that quickly! I am using schedulers now.
Hello!
I have just set up the OCR function and it seems to work if I use this (the old, deprecated call):
const result = await Tesseract.recognize(url, "eng");
But it does not work if I use this (the new call):
const result = await worker.recognize(url);
When I run it on a document, the new call randomly stops at certain pages or images and never finishes, without throwing an error.
See full code below.
As I understood from the migration guide, this is the only line I had to change?!
Thanks so much for maintaining this library!