Paperless Office using the Raspberry Pi

By Leo Gaggl

March 23, 2015

This is a follow-up to an older blog post that used Ubuntu.

[![r by rosmary, on Flickr](https://farm6.static.flickr.com/5290/5328014910_0b3bdd6718.jpg "r by rosmary, on Flickr")](https://www.flickr.com/photos/rvoegtli/5328014910/)

"r" by rosmary, on Flickr, licensed under [Creative Commons Attribution 2.0 Generic](http://creativecommons.org/licenses/by/2.0/)

Raspberry Pi Prerequisites

Since this will be a purely headless install designed to sit in a corner behind the scanner, I am using a base Raspbian (Debian Wheezy) install. I personally like the clean minimal install via https://github.com/debian-pi/raspbian-ua-netinst best.

apt-get install sudo vim wget wput libusb-dev build-essential git-core

Add non-privileged user account(s)

adduser USERNAME
adduser USERNAME sudo
groupadd scanner
usermod -a -G scanner USERNAME
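To confirm that the account really ended up in the scanner group, the group list printed by `id -nG` can be checked. The helper below is only a minimal sketch: `has_group` is a hypothetical name, not part of any package, and it simply looks for a whole-word match in a space-separated list. Note that a new group membership only applies to sessions started after the change.

```shell
# has_group GROUP_LIST GROUP
# Succeeds if GROUP appears as a whole word in the space-separated GROUP_LIST
# (the format printed by `id -nG`). has_group is a hypothetical helper name.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example usage (USERNAME is the placeholder used throughout this post):
#   has_group "$(id -nG USERNAME)" scanner && echo "USERNAME is in the scanner group"
```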

Install Sane

The version of SANE in the Raspbian repos does not work with the Fujitsu ScanSnap range, so it needs to be built from source.

git clone git://git.debian.org/sane/sane-backends.git
cd sane-backends
BACKENDS=epjitsu ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
make install

Install S1300i Driver

You need to get the driver file (‘1300i_0D12.nal’) from the CD that came with the scanner. If you still have access to a CD-ROM drive, that is. :(

mkdir -p /usr/share/sane/epjitsu/
cp 1300i_0D12.nal /usr/share/sane/epjitsu/

Check /etc/sane.d/epjitsu.conf and see if the following lines are there (in my case they were already created by the SANE build).

# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300i_0D12.nal
usb 0x04c5 0x128d
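A mismatch between the path in the config and the place the firmware file was copied to is an easy mistake to make. The following is a minimal sketch for cross-checking the two; `missing_firmware` is a hypothetical helper name, and it assumes firmware paths contain no spaces.

```shell
# missing_firmware CONF
# Prints every path named on a "firmware" line in CONF that does not exist
# on disk. Prints nothing when all referenced firmware files are present.
# missing_firmware is a hypothetical helper name.
missing_firmware() {
    # Commented-out lines start with "#", so their first field is never
    # exactly "firmware"; the path is the second whitespace-separated field.
    awk '$1 == "firmware" { print $2 }' "$1" |
    while IFS= read -r path; do
        [ -f "$path" ] || printf '%s\n' "$path"
    done
}

# Example usage:
#   missing_firmware /etc/sane.d/epjitsu.conf
```

The config file may list firmware for several scanner models, so paths for scanners you do not own can show up in the output. Only the 1300i_0D12.nal entry matters here.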

sane-find-scanner -q

found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003
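Only the first line is the scanner. USB ID 0x0424:0xec00 is the Pi's on-board SMSC Ethernet controller, which sane-find-scanner reports as a possible scanner. If you want to pick out just the Fujitsu device, the output can be filtered by vendor ID. This is a minimal sketch that assumes the output format shown above; `fujitsu_devices` is a hypothetical helper name.

```shell
# fujitsu_devices
# Reads `sane-find-scanner -q` output on stdin and prints the libusb address
# of every device whose USB vendor ID is 0x04c5 (Fujitsu).
# fujitsu_devices is a hypothetical helper name.
fujitsu_devices() {
    sed -n 's/^found USB scanner (vendor=0x04c5[ ,].* at \(libusb:[0-9:]*\).*/\1/p'
}

# Example usage:
#   sane-find-scanner -q | fujitsu_devices
```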

scanimage -L

device `epjitsu:libusb:001:004' is a FUJITSU ScanSnap S1300i scanner

Copy the libsane udev rules from the SANE build directory into the udev rules directory.
sudo cp sane-backends/tools/udev/libsane.rules /etc/udev/rules.d/60-libsane.rules

Log out and log in as the non-privileged user account previously created.

If the scanimage -L command works as above, you have fully configured the scanner to work under that user account.

Start saned on boot-up

Edit the /etc/rc.local file and add the following line before the ‘exit 0’ line so that saned is started as the non-privileged user after every reboot.

saned -a USERNAME
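If you would rather script this edit than open an editor, something along these lines should work. It is a minimal sketch, not a tested installer: `add_before_exit` is a hypothetical helper name, and it assumes the file ends with a plain `exit 0` line as the stock rc.local does. It needs to run as root to modify /etc/rc.local.

```shell
# add_before_exit FILE LINE
# Inserts LINE before the first "exit 0" line in FILE, or appends it if no
# such line exists. Does nothing if FILE already contains LINE verbatim.
# add_before_exit is a hypothetical helper name.
add_before_exit() {
    file=$1
    line=$2
    # Nothing to do if the exact line is already present.
    grep -qxF -- "$line" "$file" && return 0
    tmp=$(mktemp) || return 1
    # Write via cat so the original file keeps its permissions.
    awk -v line="$line" '
        /^exit 0/ && !inserted { print line; inserted = 1 }
        { print }
        END { if (!inserted) print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example usage (as root):
#   add_before_exit /etc/rc.local "saned -a USERNAME"
```

Running it twice leaves only one copy of the line, so it is safe to repeat.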

Installing Conversion Tools

sudo apt-get install imagemagick bc exactimage pdftk tesseract-ocr tesseract-ocr-eng unpaper

You can add other languages such as tesseract-ocr-deu if you require OCR support for those.
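Tesseract can take more than one language in a single `-l` argument, with the codes joined by `+` (for example `-l eng+deu`). This assumes a tesseract release with multi-language support, so check your installed version. A small sketch for building that argument, where `join_langs` is a hypothetical helper name:

```shell
# join_langs LANG...
# Joins tesseract language codes with "+", e.g. "eng deu" becomes "eng+deu".
# join_langs is a hypothetical helper name.
join_langs() {
    # "$*" joins the arguments with the first character of IFS; the
    # subshell keeps the IFS change from leaking into the caller.
    ( IFS=+; printf '%s\n' "$*" )
}

# Example usage:
#   LANGUAGE=$(join_langs eng deu)
#   tesseract page.tif page -l "$LANGUAGE" hocr
```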

Scan to Repository Script

The script is hosted on GitHub: https://github.com/leogaggl/misc-scripts/blob/master/scan2repo.sh

#!/bin/bash
# Thanks to Andreas Gohr (http://www.splitbrain.org/) for the initial work
# https://github.com/splitbrain/paper-backup/

OUT_DIR=~/scan
TMP_DIR=$(mktemp -d)
FILE_NAME=scan_$(date +%Y%m%d-%H%M%S)
LANGUAGE="eng"

echo 'scanning...'
scanimage --resolution 300 \
          --batch="$TMP_DIR/scan_%03d.pnm" \
          --format=pnm \
          --mode Gray \
          --source 'ADF Duplex'
echo "Output saved in $TMP_DIR/scan*.pnm"

cd "$TMP_DIR"

# cut borders
echo 'cutting borders...'
for i in scan_*.pnm; do
    mogrify -shave 50x5 "${i}"
done

# check for blank pages
echo 'checking for blank pages...'
for f in ./*.pnm; do
    # write the cleaned page to a new file name (scan_unpaper_NNN.pnm)
    unpaper --size "a4" --overwrite "$f" "$(echo "$f" | sed 's/scan/scan_unpaper/g')"
    # need to rename and delete original since newer versions of unpaper can't use same file name
    rm -f "$f"
done

# apply text cleaning and convert to tif
echo 'cleaning pages...'
for i in scan_*.pnm; do
    echo "${i}"
    convert "${i}" -contrast-stretch 1% -level 29%,76% "${i}.tif"
done

# Starting OCR
echo 'doing OCR...'
for i in scan_*.pnm.tif; do
    echo "${i}"
    tesseract "$i" "$i" -l $LANGUAGE hocr
    hocr2pdf -i "$i" -s -o "$i.pdf"