Ubuntu - paperless office on a budget

By Leo Gaggl

August 19, 2013

Since paper and I have never gotten on well, I have always dreamed of a paperless office. A while ago I purchased a Fujitsu ScanSnap S1500 scanner for the office, after doing some research on which multipage, duplex scanners with an Automatic Document Feeder (ADF) were both affordable and supported on Linux.

[![Paperless office? by Terry Freedman, on Flickr](http://farm9.static.flickr.com/8383/8510145300_155fe844c8.jpg "Paperless office? by Terry Freedman, on Flickr")](http://www.flickr.com/photos/terryfreedman/8510145300/) [![Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 Generic License](http://i.creativecommons.org/l/by-nc-nd/2.0/80x15.png "Creative Commons Attribution-Noncommercial-No Derivative Works 2.0 Generic License")](http://creativecommons.org/licenses/by-nc-nd/2.0/) by [Terry Freedman](http://www.flickr.com/people/terryfreedman/)

It took me a while to get around to setting all of this up, but the result is that the scanner is now connected to a headless Ubuntu VM, and a press of the scanner button will:

scan the document
perform OCR to convert to text
combine the text with PDF to create a searchable PDF
OPTIONAL – send the resulting document into Alfresco Document Management Server via FTP

Install dependencies

NOTE: PPA is only required for support of Fujitsu ScanSnap S1500
    sudo apt-add-repository ppa:rolfbensch/sane-git
    sudo apt-get update
    sudo apt-get install sane sane-utils imagemagick tesseract-ocr pdftk libtiff-tools libsane-extras exactimage wput unpaper
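Before moving on, it can help to confirm that the tools the rest of this setup relies on actually ended up on the PATH. The following is a minimal sketch and not part of the original setup: the function name `missing_tools` is made up for illustration, and the tool names in the usage comment are an assumption about which binaries the packages above provide.

```shell
# Hypothetical helper: print every command name from the arguments
# that cannot be found on the PATH (prints nothing if all are present).
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible use for the binaries called by the scan script:
# missing_tools scanimage unpaper mogrify tesseract hocr2pdf pdftk wput
```

An empty result means every listed tool was found; any name that is printed still needs to be installed.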

Install scanbuttond

Download the “Debian Experimental” package from http://pkgs.org/download/scanbuttond and install it:

    sudo dpkg -i scanbuttond_0.2.3.cvs20090713-14_i386.deb

This step is only needed for Fujitsu ScanSnap support. For other scanners you can probably install scanbuttond from the Ubuntu repository.

Scanner config

    vim 40-libsane.rules
    # add this line
    ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="11a2", ENV{libsane_matched}="yes"
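The vendor and product IDs in that rule are the scanner's USB IDs, which `lsusb` prints in the form `ID vendor:product`. As a rough sketch for adapting the rule to a different scanner (the helper name `sane_udev_rule` is hypothetical, not something SANE ships), the line could be generated like this:

```shell
# Hypothetical helper: build a libsane udev rule line from a USB
# vendor ID ($1) and product ID ($2), four hex digits each as shown by lsusb.
sane_udev_rule() {
    printf 'ATTRS{idVendor}=="%s", ATTRS{idProduct}=="%s", ENV{libsane_matched}="yes"\n' "$1" "$2"
}

# The ScanSnap S1500 values used in this post:
# sane_udev_rule 04c5 11a2
#
# After editing the rules file, udev has to re-read it, for example with:
# sudo udevadm control --reload-rules
```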

Permissions

sudo adduser saned scanner
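To confirm that the `saned` user really ended up in the `scanner` group, something like the following can be used. This is only a small sketch; `in_group` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
# "id -nG" lists the user's group names separated by spaces.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Possible use for this setup:
# in_group saned scanner && echo "saned can access the scanner"
```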

Useful command lines for troubleshooting

Since I had some trouble getting this scanner to work properly, I found the following commands very useful for locating the issue.

    man sane-usb
    sane-find-scanner
    scanimage -L
    dmesg
    tail /var/log/udev

NOTE: If you are using a notebook, be careful: I spent quite a few hours troubleshooting an error when opening the device from saned. It turned out that USB power management on the Toshiba notebook was causing havoc with saned (http://askubuntu.com/questions/55140/error-during-device-i-o-when-using-usb-scanner). Switching to the desktop that now houses the scanner fixed the problem. Thank you, VirtualBox (I ended up setting up a dedicated VM for this task)!
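On Linux, each USB device exposes a `power/control` file in sysfs, where `auto` lets the kernel suspend the device and `on` keeps it powered. The sketch below shows one way to inspect that setting for the scanner. It is not the fix used in this post (that was moving the scanner to a desktop), and the function name and the optional sysfs-directory argument are made up for illustration.

```shell
# Hypothetical helper: print the power/control setting of every USB
# device whose vendor ID matches $1. $2 optionally overrides the sysfs
# directory (default /sys/bus/usb/devices), which is handy for testing.
usb_power_control() {
    local vendor="$1" root="${2:-/sys/bus/usb/devices}" dev
    for dev in "$root"/*; do
        # Interfaces and hubs without an idVendor file are skipped.
        [ -r "$dev/idVendor" ] || continue
        if [ "$(cat "$dev/idVendor")" = "$vendor" ]; then
            echo "$dev: $(cat "$dev/power/control")"
        fi
    done
}

# Fujitsu's vendor ID, as used in the udev rule earlier:
# usb_power_control 04c5
```

If the scanner shows `auto`, writing `on` to that file as root keeps the device from being autosuspended until the next reboot or replug.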

Configure scanbuttond

    sudo vim /etc/default/scanbuttond
    # change this line from no to yes
    RUN=yes
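On a headless machine the same change can be made without opening an editor. This is a sketch under the assumption that the file contains a single line starting with `RUN=`; the function name `enable_scanbuttond` is hypothetical.

```shell
# Hypothetical helper: set the RUN= line in the given defaults file to
# RUN=yes. Relies on GNU sed's in-place editing (-i), as found on Ubuntu.
enable_scanbuttond() {
    sed -i 's/^RUN=.*/RUN=yes/' "$1"
}

# Possible use on the real file (needs root):
# sudo sed -i 's/^RUN=.*/RUN=yes/' /etc/default/scanbuttond
```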

    cd /etc/scanbuttond
    sudo cp initscanner.sh.example initscanner.sh
    sudo vim initscanner.sh

Uncomment or copy any scanner init command(s).

    sudo cp buttonpressed.sh.example buttonpressed.sh
    sudo vim buttonpressed.sh

Copy the contents of the scan script below. The script is also hosted on GitHub (https://github.com/leogaggl/misc-scripts/blob/master/buttonpressed.sh)

Scan script

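    #!/bin/bash
    # Called by scanbuttond when the scanner button is pressed.
    OUT_DIR=/output/directory/name
    TMP_DIR=$(mktemp -d)
    FILE_NAME=scan_$(date +%Y%m%d-%H%M%S)

    cd "$TMP_DIR" || exit 1

    echo "################## Scanning ###################"
    scanimage --resolution 150 --batch=scan_%03d.pnm --format=pnm --mode Gray \
        --device-name "fujitsu:ScanSnap S1500:67953" --source "ADF Duplex" \
        --page-width 210 --page-height 297 --sleeptimer 1 -y 297 -x 210

    echo "################## Cleaning ###################"
    for f in ./*.pnm; do
        unpaper --size "a4" --overwrite "$f" "$f"
    done

    echo "############## Converting to TIF ##############"
    mogrify -format tif *.pnm

    echo "################ OCR ################"
    for f in ./*.tif; do
        # Writes the hOCR output next to the image
        # (older tesseract 3.x releases name it "$f.html", newer ones "$f.hocr").
        tesseract "$f" "$f" -l eng hocr
        # hocr2pdf reads the hOCR data from standard input.
        hocr2pdf -i "$f" -s -o "$f.pdf" < "$f.html"
    done
    # See the GitHub copy linked above for the rest of the script
    # (merging the pages and the optional upload).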
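The OCR loop leaves one PDF per page. The remaining steps from the list at the top of this post (combining the pages and the optional upload to Alfresco) could look roughly like the sketch below. This is an illustration, not the script from the GitHub repository: the function names, the FTP URL and the credentials are placeholders, and it assumes the per-page files are named `*.tif.pdf`, as the `-o "$f.pdf"` option in the loop produces.

```shell
# Hypothetical helper: merge all per-page PDFs in the current directory
# into the file named by $1. pdftk's "cat output" concatenates its inputs.
merge_pages() {
    pdftk ./*.tif.pdf cat output "$1"
}

# Hypothetical helper: upload file $1 to FTP URL $2 with wput.
# wput derives the remote name from the local path as given, so this
# changes into the file's directory first and passes only the file name.
upload_scan() {
    ( cd "$(dirname "$1")" && wput "$(basename "$1")" "$2" )
}

# Possible use at the end of the scan script (URL is a placeholder):
# merge_pages "$OUT_DIR/$FILE_NAME.pdf"
# upload_scan "$OUT_DIR/$FILE_NAME.pdf" "ftp://user:password@alfresco.example.com/Inbox/"
```

Because the shell expands `./*.tif.pdf` in sorted order, the zero-padded `scan_%03d` names keep the pages in scan order.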

Credits:

A big thank you & hat tip to the authors of the following pages:

EDIT (2013-09-16): I found this link describing how to remove empty pages: http://philipp.knechtges.com/?p=190 – might have to investigate this when I have some time.