Suppressing "Corrupt JPEG data: 1 extraneous bytes before marker 0xd9" output #929

Closed
matthieuEv opened this issue Jun 3, 2024 · 5 comments

Comments

@matthieuEv

Tesseract.js version 5.1.0

Hello,

I am using Tesseract.js for a project and have encountered an issue where the library outputs the following message to the console:

Corrupt JPEG data: 1 extraneous bytes before marker 0xd9

While I understand that this message indicates a problem with the JPEG data, it does not affect the functioning of my application. However, the frequent appearance of this message in the console clutters the output and makes it difficult to debug other issues.

I would like to know if there is a way to suppress this specific output from being logged to the console. I have attempted to use custom logger functions, but the message still appears. Temporarily overriding console.log, console.warn, and console.error works, but it feels like a hack rather than a solution.

Steps to Reproduce:

  • Use Tesseract.js to recognize text from a JPEG image.
  • Ensure the image has minor corruption that triggers the "Corrupt JPEG data" warning.
  • Observe the console output.

Expected Behavior:
There should be an option to disable or suppress specific log messages, such as the "Corrupt JPEG data" warning.

Actual Behavior:
The message is printed to the console regardless of custom logger settings or other configurations.

Request:
Could you please provide a way to suppress this specific log message, or guide me on how to properly handle it without resorting to overriding global console methods?

Thank you for your assistance!

@ImSpyke

ImSpyke commented Jun 3, 2024

I am working on this project with matthieu, so don't hesitate to ping me too if you have any solutions ^^

@matthieuEv
Author

Any news? :)

@Balearica
Member

Please provide an example image that can be used to replicate this issue.

@Balearica
Member

> I would like to know if there is a way to suppress this specific output from being logged to the console. I have attempted to use custom logger functions, but the message still appears. Temporarily overriding console.log, console.warn, and console.error works, but it feels like a hack rather than a solution.

Since these warning messages are created by Tesseract rather than being something added by Tesseract.js, disabling specific warning messages at the source is outside of the scope of this repo. Indeed, the only way to filter messages Tesseract prints to stderr without modifying the Tesseract codebase would be to implement the hacky solutions you describe (e.g. overriding console.log).

However, Tesseract does support saving warning messages as an output, rather than printing them to stderr (which maps to console.warn in JavaScript). Therefore, by redirecting the messages to the output, you should be able to implement something similar to what you are describing. To capture debugging messages in the output object, rather than printing them to the console, set debug to true within the output options (the third argument to worker.recognize).

const worker = await Tesseract.createWorker("eng");
const ret = await worker.recognize(files[i], undefined, {debug: true});
console.log(ret.data.text);
console.log(`Captured message: ${ret.data.debug}`);
Captured message: Estimating resolution as 390

Once you have the debug output string, you can either ignore it or save it to a string--essentially suppressing all warning messages from the console--or write a wrapper function that prints warnings only when they meet certain criteria.
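
The wrapper idea described above could be sketched roughly as follows. This is only an illustration, not part of Tesseract.js: `filterDebugMessages`, `recognizeQuietly`, and `SUPPRESSED` are hypothetical names, and the sketch assumes the captured debug output is a newline-separated string in which the "Corrupt JPEG data" warning appears as its own line.

```javascript
// Hypothetical helper (not part of Tesseract.js): splits the captured debug
// string into lines and drops every line matching a suppression pattern.
function filterDebugMessages(debugText, suppressPatterns) {
  return String(debugText ?? "")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !suppressPatterns.some((pattern) => pattern.test(line)));
}

// Patterns for messages to hide; this one targets the warning from this issue.
const SUPPRESSED = [/^Corrupt JPEG data:/];

// Hypothetical wrapper: runs recognition with debug capture enabled, then
// prints only the captured messages that were not suppressed.
async function recognizeQuietly(worker, image) {
  const ret = await worker.recognize(image, undefined, { debug: true });
  for (const message of filterDebugMessages(ret.data.debug, SUPPRESSED)) {
    console.warn(message);
  }
  return ret;
}
```

With something like this, `await recognizeQuietly(worker, files[i])` would stand in for the direct `worker.recognize` call, and further patterns can be added to `SUPPRESSED` as needed.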

@matthieuEv
Author

Hello, sorry for my inactivity here😅, this method seems to work for me, thanks again 😁
