{"id":1179,"date":"2012-05-30T23:57:20","date_gmt":"2012-05-31T03:57:20","guid":{"rendered":"https:\/\/lowtek.ca\/roo\/?p=1179"},"modified":"2012-05-30T23:57:20","modified_gmt":"2012-05-31T03:57:20","slug":"building-pdfs-with-imagemagick","status":"publish","type":"post","link":"https:\/\/lowtek.ca\/roo\/2012\/building-pdfs-with-imagemagick\/","title":{"rendered":"Building PDFs with ImageMagick"},"content":{"rendered":"<p>I&#8217;ve flipped back and forth between reading physical books and <a href=\"http:\/\/en.wikipedia.org\/wiki\/E-book\">eBooks<\/a> over the last couple of years. I&#8217;m currently in an eBook phase, and it may stick this time. <a href=\"https:\/\/twitter.com\/andrew_low\/status\/184446086746877953\">A sale on Kobo<\/a> let me grab a few I had been meaning to read for next to nothing; now that I&#8217;ve bought a few, I&#8217;m more likely to buy more.<\/p>\n<p>Sometimes you want to move some content into a format that can be easily read using one of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/E-book_reader\">eReaders<\/a>. Let&#8217;s consider two scenarios: a) you have a paper copy of something you want to scan and convert, or b) there is a web resource that is formatted as pages but isn&#8217;t in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Pdf\">PDF<\/a> format. Under Ubuntu I like <a href=\"https:\/\/launchpad.net\/simple-scan\">Simple Scan<\/a>, which allows you to easily scan multi-page documents. If dealing with a web resource, a full-screen browser window and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Print_screen\">Alt-Print Screen<\/a> will perform a screen capture, allowing you to save a series of pages quickly.<\/p>\n<p>Simple Scan will save multiple scanned pages with filenames (Scanned Document-1.jpg) which sort nicely in order of scan. The screen shot utility uses filenames in the format &#8220;Screenshot at YYYY-MM-DD HH:MM:SS.png&#8221; so again we have perfect alphabetic sorting in the directory. 
Having the files in the directory in the correct order will be helpful later on.<\/p>\n<p>Now with both scanning and screen capture there will be elements in the image that we want to crop out. As we&#8217;re likely dealing with tens of pages, we don&#8217;t want to have to open <a href=\"http:\/\/www.gimp.org\/\">GIMP<\/a> on each of them and edit. Enter <a href=\"http:\/\/www.imagemagick.org\/\">ImageMagick<\/a> &#8211; a command line friendly tool for image processing. My screen resolution is 1680&#215;1050 and the screen shots were all 1680&#215;1026 (due to the Ubuntu desktop title bar). The screen shot contained the browser &#8220;<a href=\"http:\/\/en.wikipedia.org\/wiki\/User_interface_chrome#User_interface_and_interaction_design\">chrome<\/a>&#8221; as well as portions of the page I didn&#8217;t want. Using GIMP I was able to determine the upper left (491, 126) and lower right (1170, 1026) corners of the region I wanted to keep; a little math (1170 - 491 = 679 wide, 1026 - 126 = 900 high) told me the cropped image size was 679&#215;900. I made a copy of one of the images and called it x.png; this let me experiment to make sure I got it right.<\/p>\n<p><code>$ convert x.png -crop 679x900+491+126 y.png<\/code><\/p>\n<p>Excellent, the resulting y.png file is properly cropped. Now I want to convert all of the files in the directory, and in fact I want to mutate them in place. It turns out <a href=\"http:\/\/www.imagemagick.org\/www\/mogrify.html\">mogrify<\/a> is the solution:<\/p>\n<p><code>$ mogrify -crop 679x900+491+126 *.png<\/code><\/p>\n<p>This will modify all of the images &#8220;in place&#8221; in the directory I&#8217;m using. For scanned images the process is pretty much the same, though the cropping dimensions will be different.<\/p>\n<p>At this point I jumped the gun and converted all of the files in the directory into a PDF. 
Here is a screen capture of the PDF viewer showing a simple example to demonstrate the problem:<\/p>\n<p><a href=\"https:\/\/lowtek.ca\/roo\/wp-content\/uploads\/2012\/05\/pdf-example.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1180\" title=\"pdf-example\" src=\"https:\/\/lowtek.ca\/roo\/wp-content\/uploads\/2012\/05\/pdf-example.png\" alt=\"\" width=\"500\" height=\"326\" \/><\/a><\/p>\n<p>So while the cropped .png displays properly with no whitespace around it, the PDF clearly has additional whitespace. The ImageMagick <a href=\"http:\/\/www.imagemagick.org\/www\/identify.html\">identify<\/a> utility helps explain what&#8217;s wrong here:<\/p>\n<p><code>$ identify Screenshot\\ at\\ 2012-05-29\\ 20\\:26\\:25.png<br \/>\nScreenshot at 2012-05-29 20:26:25.png PNG 679x900 1680x1026+491+126 8-bit DirectClass 1.263MB 0.050u 0:00.050<\/code><\/p>\n<p>Ah, so the pixel data has been cropped to 679x900, but the image still records the original 1680x1026 canvas along with the +491+126 offset of the crop. It turns out I want to apply an additional processing step to the images, <a href=\"http:\/\/www.imagemagick.org\/script\/command-line-options.php#repage\">+repage<\/a>, to completely remove\/reset the virtual canvas meta-data from the images:<\/p>\n<p><code>$ mogrify +repage *.png<\/code><\/p>\n<p><code>$ identify Screenshot\\ at\\ 2012-05-29\\ 20\\:26\\:25.png<br \/>\nScreenshot at 2012-05-29 20:26:25.png PNG 679x900 679x900+0+0 8-bit DirectClass 1.263MB 0.050u 0:00.050<\/code><\/p>\n<p>Now I&#8217;m ready to create a PDF file:<\/p>\n<p><code>$ convert *.png book.pdf<\/code><\/p>\n<p>This works like a charm because my files are in the correct order. The resulting PDF size is a little bit bigger than the sum of the individual image files. 
I did explore ways to reduce this, but all of them resulted in lower quality images in the PDF and that impacted readability.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve flipped back and forth between reading physical books and eBooks over the last couple of years. I&#8217;m currently in an eBook phase, and it may stick this time. A sale on Kobo let me grab a few I had been meaning to read for next to nothing, now that I&#8217;ve bought a few I&#8217;m &hellip; <a href=\"https:\/\/lowtek.ca\/roo\/2012\/building-pdfs-with-imagemagick\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Building PDFs with ImageMagick&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,12],"tags":[],"class_list":["post-1179","post","type-post","status-publish","format-standard","hentry","category-computing","category-how-to"],"_links":{"self":[{"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/posts\/1179","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/comments?post=1179"}],"version-history":[{"count":5,"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/posts\/1179\/revisions"}],"predecessor-version":[{"id":1185,"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/posts\/1179\/revisions\/1185"}],"wp:attachment":[{"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/media?parent=1179"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lowtek.ca\/roo\/wp-json\/wp\/v2\/categories?post=1179"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lowtek.ca\/
roo\/wp-json\/wp\/v2\/tags?post=1179"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}