@osv
Created October 16, 2021 12:13
html -> canvas -> tesseract example
// Walk the table rows, capture each rendered cell as an image, and recognize its text with Tesseract. Pretty slow, but it works.
// https://habr.com/ru/news/t/578832/comments/#comment_23499710
// Run this script in the dev console on this page: http://www.izbirkom.ru/region/region/karachaev-cherkess?action=show&root=92000011&tvd=4094002721588&vrn=100100225883172&region=9&global=&sub_region=9&prver=0&pronetvd=null&vibid=4094002721588&type=242
(async ()=> {
	// Load both libraries as ES modules straight from the esm.sh CDN
	const { default: capture } = await import( 'https://esm.sh/html2canvas' )
	const { default: { recognize } } = await import( 'https://esm.sh/tesseract.js' )
	const rows = document.querySelectorAll('.table-responsive tr')
	const result = []
	for( const row of rows ) {
		// The third cell holds the values; render it to a canvas
		const source = row.children[2]
		const image = await capture( source, { imageTimeout: 1 } )
		// Preview the captured canvas in the console as a CSS background image
		console.log( `%c `, `font-size:1px;padding: ${image.height/2}px ${image.width/2}px; background: url(${ image.toDataURL() })` )
		// OCR the canvas and log the raw recognized text
		const { data: { text } } = await recognize( image )
		console.log( text )
		// One value per non-empty line, prefixed with the row label from the second cell
		const values = text.split( /\n/g ).filter( Boolean )
		result.push([ row.children[1].textContent, ... values ])
	}
	console.table( result )
})()
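
The script above only splits the recognized text on newlines and drops empty lines. OCR output often carries stray whitespace, so a small cleanup step may help before the values go into the table. The helpers below are a minimal sketch and not part of the original gist: the names `parseOcrLines` and `toInteger` are hypothetical, and the sketch assumes each recognized line is meant to be a non-negative integer, which may not hold for every cell.

```javascript
// Hypothetical helper (not in the original gist): split raw OCR text into
// trimmed, non-empty lines. Handles both \n and \r\n line endings.
function parseOcrLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);
}

// Hypothetical helper: interpret a line as a non-negative integer, tolerating
// whitespace between digits. Returns null when the line contains anything
// other than digits and whitespace, so bad reads are easy to spot.
function toInteger(line) {
  const digits = line.replace(/\s+/g, '');
  return /^\d+$/.test(digits) ? Number(digits) : null;
}
```

In the loop, `text.split( /\n/g ).filter( Boolean )` could then become `parseOcrLines( text ).map( toInteger )`. Lines that come back as `null` mark cells worth checking by eye.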