Paperless Office using the Raspberry Pi
By Leo Gaggl
This is a follow-up to an older blog post that used Ubuntu.
Image: https://farm6.static.flickr.com/5290/5328014910_0b3bdd6718.jpg by rosmary, on Flickr (https://www.flickr.com/photos/rvoegtli/5328014910/), Creative Commons Attribution 2.0 Generic License (http://creativecommons.org/licenses/by/2.0/)
Raspberry Pi Prerequisites
Since this will be a purely headless install designed to sit in a corner behind the scanner, I am using a base Raspbian (Debian Wheezy) install. I personally like the clean, minimal install via https://github.com/debian-pi/raspbian-ua-netinst the best.
apt-get install sudo vim wget wput libusb-dev build-essential git-core
Add non-privileged user account(s)
adduser USERNAME
adduser USERNAME sudo
groupadd scanner
usermod -a -G scanner USERNAME
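To double-check that the account really ended up in the scanner group, something like the sketch below might help. The function name has_group is my own invention for illustration; it simply looks for an exact group name in the space-separated list that `id -nG` prints.

```shell
# Hypothetical helper: reads a space-separated group list on stdin
# (the format printed by `id -nG`) and succeeds only if the group
# named in $1 appears as an exact match.
has_group() {
    tr ' ' '\n' | grep -qx -- "$1"
}
```

Usage would be along the lines of `id -nG USERNAME | has_group scanner && echo ok`. Group changes only take effect for new login sessions.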
Install Sane
The version of SANE in the Raspbian repos does not work with the Fujitsu ScanSnap range, so it needs to be built from source.
git clone git://git.debian.org/sane/sane-backends.git
cd sane-backends
BACKENDS=epjitsu ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
make install
Install S1300i Driver
You need to get the driver file (‘1300i_0D12.nal’) from the CD that came with the scanner. If you still have access to a CDROM drive that is. :(
mkdir -p /usr/share/sane/epjitsu/
cp 1300i_0D12.nal /usr/share/sane/epjitsu/
Check /etc/sane.d/epjitsu.conf and see if the following lines are there (in my case they were already created by the SANE build).
# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300i_0D12.nal
usb 0x04c5 0x128d
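If you would rather not eyeball the config file, a small check along these lines could do it. This is only a sketch: check_epjitsu_conf is a made-up name, and it assumes the firmware path and USB IDs are exactly the ones shown above and sit on uncommented lines.

```shell
# Hypothetical helper: succeeds if the given epjitsu config (default
# /etc/sane.d/epjitsu.conf) contains both the S1300i firmware line and
# the matching usb line, ignoring lines that are commented out.
check_epjitsu_conf() (
    conf="${1:-/etc/sane.d/epjitsu.conf}"
    grep -Eq '^[[:space:]]*firmware[[:space:]]+/usr/share/sane/epjitsu/1300i_0D12\.nal[[:space:]]*$' "$conf" &&
    grep -Eq '^[[:space:]]*usb[[:space:]]+0x04c5[[:space:]]+0x128d[[:space:]]*$' "$conf"
)
```

Running `check_epjitsu_conf && echo "config looks fine"` should then tell you whether the entries need to be added by hand.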
sane-find-scanner -q
found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003
scanimage -L
device `epjitsu:libusb:001:004' is a FUJITSU ScanSnap S1300i scanner
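For scripting the detection step, the output of sane-find-scanner can be reduced to plain vendor:product pairs. The sketch below assumes the output format shown above; list_usb_ids and has_s1300i are hypothetical names. The second device in my output (0x0424/0xec00) is most likely the Pi's on-board USB Ethernet chip rather than a scanner, which is why the check looks for the S1300i IDs specifically.

```shell
# Hypothetical helper: reads `sane-find-scanner -q` output on stdin and
# prints one "vendor:product" pair per detected USB device.
list_usb_ids() {
    sed -n 's/.*vendor=\(0x[0-9a-fA-F]*\).*product=\(0x[0-9a-fA-F]*\).*/\1:\2/p'
}

# Hypothetical helper: succeeds if the ScanSnap S1300i IDs
# (vendor 0x04c5, product 0x128d) are among the detected devices.
has_s1300i() {
    list_usb_ids | grep -qx '0x04c5:0x128d'
}
```

For example: `sane-find-scanner -q | has_s1300i || echo "scanner not found"`.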
Copy libsane rules from the sane build directory to udev rules.
sudo cp sane-backends/tools/udev/libsane.rules /etc/udev/rules.d/60-libsane.rules
Log out and log back in as the non-privileged user account created earlier.
If the scanimage -L command works as above, you have fully configured the scanner to work under that user account.
Start saned on boot-up
Edit the /etc/rc.local file and add the following line before the ‘exit 0’ line so that saned runs as the non-privileged user after every reboot.
saned -a USERNAME
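If you prefer to script this edit instead of opening an editor, the following is a minimal sketch. The function name add_saned_to_rc_local is made up for illustration, and it assumes rc.local ends with a plain `exit 0` line as on a stock Debian install. Running it twice does not add the line twice.

```shell
# Hypothetical helper: inserts "saned -a USER" before the first
# "exit 0" line of an rc.local-style file. Appends the line at the end
# if no "exit 0" is found. Does nothing if the line already exists.
add_saned_to_rc_local() (
    user="$1"
    rc="${2:-/etc/rc.local}"
    line="saned -a $user"
    # Already present: nothing to do.
    grep -Fqx -- "$line" "$rc" && exit 0
    tmp=$(mktemp) || exit 1
    awk -v line="$line" '
        /^exit 0[ \t]*$/ && !inserted { print line; inserted = 1 }
        { print }
        END { if (!inserted) print line }
    ' "$rc" > "$tmp" && cat "$tmp" > "$rc"   # cat keeps the file mode of rc.local
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

As root this would be called as `add_saned_to_rc_local USERNAME`.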
Installing Conversion Tools
sudo apt-get install imagemagick bc exactimage pdftk tesseract-ocr tesseract-ocr-eng unpaper
You can add other languages such as tesseract-ocr-deu if you require OCR support for those.
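When more than one language is installed, the package names and the value passed to tesseract's -l option can be derived from the same list of language codes. This is a rough sketch under two assumptions: that the language packages follow the tesseract-ocr-CODE naming seen above, and that your tesseract version accepts several languages joined with "+" (as far as I know, 3.02 and later do). Both function names are hypothetical.

```shell
# Hypothetical helper: joins language codes with "+" to form a value
# for tesseract's -l option.
tesseract_lang_arg() {
    ( IFS=+; printf '%s\n' "$*" )
}

# Hypothetical helper: prints the assumed Debian package name for each
# language code, one per line.
tesseract_lang_packages() {
    for code in "$@"; do
        printf 'tesseract-ocr-%s\n' "$code"
    done
}
```

So `sudo apt-get install $(tesseract_lang_packages eng deu)` would install both language packs, and `LANGUAGE=$(tesseract_lang_arg eng deu)` could then be used in the scan script.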
Scan to Repository Script
The script is hosted on Github: https://github.com/leogaggl/misc-scripts/blob/master/scan2repo.sh
#!/bin/bash
# Thanks to Andreas Gohr (http://www.splitbrain.org/) for the initial work
# https://github.com/splitbrain/paper-backup/
OUT_DIR=~/scan
TMP_DIR=$(mktemp -d)
FILE_NAME=scan_$(date +%Y%m%d-%H%M%S)
LANGUAGE="eng"
echo 'scanning...'
scanimage --resolution 300 \
 --batch="$TMP_DIR/scan_%03d.pnm" \
 --format=pnm \
 --mode Gray \
 --source 'ADF Duplex'
echo "Output saved in $TMP_DIR/scan*.pnm"
cd "$TMP_DIR"
# cut borders
echo 'cutting borders...'
for i in scan_*.pnm; do
 mogrify -shave 50x5 "${i}"
done
# check if there are blank pages
echo 'checking for blank pages...'
for f in ./*.pnm; do
 # write the cleaned page to a new file name (scan_ becomes scan_unpaper_)
 unpaper --size "a4" --overwrite "$f" "$(echo "$f" | sed 's/scan/scan_unpaper/g')"
 # need to rename and delete original since newer versions of unpaper can't use same file name
 rm -f "$f"
done
# apply text cleaning and convert to tif
echo 'cleaning pages...'
for i in scan_*.pnm; do
 echo "${i}"
 convert "${i}" -contrast-stretch 1% -level 29%,76% "${i}.tif"
done
# Starting OCR
echo 'doing OCR...'
for i in scan_*.pnm.tif; do
 echo "${i}"
 tesseract "$i" "$i" -l $LANGUAGE hocr
 hocr2pdf -i "$i" -s -o "$i.pdf"