Describe the bug
When uploading pictures that are rotated to the left, to the right, or upside down, Tesseract.js successfully reports 90, 180 or 270 degrees in most cases. But when reading the text, it only writes "rubbish" like this:
The same image uploaded with correct rotation gives this output, which is great:
FOR VIDEREGÅENDE OPPLÆRING Navn: — Wilhelm Khalid Tjemsland Sletvold Fødselsnummer : 21126423464 har gjennomført opplæring som omfatter i utdanningsprogram for MEMOK ]1---- Medjer og konunuqikasjon, lår Studiespesialisering bestått MEMOK2---- — Medier og kommunikasjon, 2. år Studiespesialisering bestått MEMOK3---- — Medier og kommunikasjon, 3. år Studiespesialisering fullført (...)
To Reproduce
Save the following code in a .html file, and load it in the browser. Upload the image below.
Thanks for making this new issue, and providing a sample document. I was able to replicate this using the provided code and image.
I am not sure why this is happening; however, it appears to be a bug inherited from the main Tesseract codebase rather than something introduced in the Tesseract.js repo. I tested with my local version of the Tesseract CLI and saw the same behavior.
Regarding a path forward, we should check for existing issues in the Tesseract GitHub page to see if this has already been reported. I would assume a bug this notable would have already been reported at some point. The possible outcomes as they pertain to Tesseract.js are:
1. Ideally the issue has already been reported or fixed, or a fix is in process.
   - The version of Tesseract used by Tesseract.js lags behind the main release by a small amount, so there is a non-zero chance that we can fix this by simply updating to the latest version of Tesseract.
2. If not, then a fix should be developed and contributed to Tesseract.
   - The scope of Tesseract.js is a JavaScript/WebAssembly port of Tesseract; as a rule, we try to stay as close to that codebase as possible.
   - I have contributed patches to Tesseract in the past, so I may have time to look into this if needed.
3. If the Tesseract maintainers are (for some unforeseen reason) resistant to patching, then we can consider implementing something within this codebase.
I am not sure what the root cause is, but I tested this image at various angles, and it appears to recognize correctly at 0 degrees and 90 degrees, but incorrectly at 180 and 270 degrees. Therefore, it appears that orientation is sometimes working as intended, which makes this more perplexing.
matsklevstad changed the title from "Detects correct rotating/degress, but fails to read the text" to "Detects correct rotation/degrees, but fails to read the text" on Aug 1, 2024
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
https://cdn.jsdelivr.net/npm/tesseract.js@5/dist/tesseract.min.js
Image used:
Expected behavior
Uploading the image gives this:
osdAngle: 270 (degrees)
autoRotateAngle: 0 (degrees)
totalAngle: 270 (degrees)
That is the correct angle. But shouldn't Tesseract.js be able to rotate the image back to 0 degrees, then perform OCR and produce more accurate output?
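Until this is fixed upstream, one possible workaround is to rotate the pixels upright yourself before running recognition. The sketch below is only an illustration: `rotateClockwise` and `correctionAngle` are hypothetical helpers, not part of Tesseract.js, and it assumes the reported `totalAngle` is how far the page has been turned clockwise from upright. If the convention is the other way around, drop the `360 -` in `correctionAngle`.

```javascript
// Hypothetical helper: the clockwise turn that undoes a reported angle.
// ASSUMPTION: `totalAngle` is the clockwise rotation of the page from upright.
function correctionAngle(totalAngle) {
  return (360 - (((totalAngle % 360) + 360) % 360)) % 360;
}

// Hypothetical helper: rotate an RGBA pixel buffer clockwise by a multiple
// of 90 degrees. `image` has the same shape as a canvas ImageData object:
// { width, height, data } with 4 bytes (R, G, B, A) per pixel.
function rotateClockwise(image, degrees) {
  const turns = (((degrees % 360) + 360) % 360) / 90;
  if (!Number.isInteger(turns)) {
    throw new RangeError('degrees must be a multiple of 90');
  }
  let { width, height, data } = image;
  for (let t = 0; t < turns; t++) {
    const out = new Uint8ClampedArray(data.length);
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        const src = (y * width + x) * 4;
        // A 90-degree clockwise turn moves pixel (x, y) to (height - 1 - y, x);
        // the rotated image is `height` pixels wide.
        const dst = (x * height + (height - 1 - y)) * 4;
        for (let c = 0; c < 4; c++) {
          out[dst + c] = data[src + c];
        }
      }
    }
    data = out;
    [width, height] = [height, width];
  }
  return { width, height, data };
}
```

In a browser, the input could come from `ctx.getImageData(...)` on a canvas holding the uploaded picture, and the rotated buffer could be written back with `ctx.putImageData(...)` to a second canvas sized to the new width and height before passing that canvas to recognition. For the example above, `correctionAngle(270)` gives 90 under the stated assumption.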
Device Version: