Paperless Office using the Raspberry Pi
By Leo Gaggl
This is a follow-up to an older blog post that used Ubuntu.
Raspberry Pi Prerequisites
Since this will be a purely headless install designed to sit in a corner behind the scanner, I am using a base Raspbian (Debian Wheezy) install. I personally like the clean minimal install via https://github.com/debian-pi/raspbian-ua-netinst best.
apt-get install sudo vim wget wput libusb-dev build-essential git-core
Add non-privileged user account(s)
adduser USERNAME
adduser USERNAME sudo
groupadd scanner
usermod -a -G scanner USERNAME
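To confirm that the account really ended up in the scanner group, a small check along these lines should do. It is only a sketch: the `in_group` helper is my own name for illustration, and USERNAME is the same placeholder as above.

```shell
#!/bin/sh
# Return success if the user (arg 1) belongs to the group (arg 2).
# in_group is a hypothetical helper, not part of any package.
in_group() {
    # id -nG prints the user's group names separated by spaces
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# USERNAME is the placeholder used in this post; replace it with the real account.
if in_group USERNAME scanner; then
    echo "USERNAME is in the scanner group"
else
    echo "USERNAME is not in the scanner group (yet)"
fi
```

Group changes only apply to new login sessions, so the account has to log in again before it picks up the new group.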
Install Sane
The version of SANE in the Raspbian repos does not work with the Fujitsu ScanSnap range, so it needs to be built from source.
git clone git://git.debian.org/sane/sane-backends.git
cd sane-backends
BACKENDS=epjitsu ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
make install
Install S1300i Driver
You need to get the driver file (‘1300i_0D12.nal’) from the CD that came with the scanner. If you still have access to a CDROM drive that is. :(
mkdir -p /usr/share/sane/epjitsu/
cp 1300i_0D12.nal /usr/share/sane/epjitsu/
Check /etc/sane.d/epjitsu.conf and see if the following lines are there (in my case they were already created by the SANE build).
# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300i_0D12.nal
usb 0x04c5 0x128d
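If you would rather script that check than eyeball the file, something like the following should work. It is a minimal sketch: `has_firmware_line` is a made-up helper name, and the paths are the ones used in this post.

```shell
#!/bin/sh
# Return success if the config file (arg 1) has an active "firmware" line
# pointing at the given firmware file (arg 2). Commented-out lines do not count.
# has_firmware_line is a hypothetical helper for illustration.
has_firmware_line() {
    awk -v fw="$2" '
        $1 == "firmware" && $2 == fw { found = 1 }
        END { exit !found }
    ' "$1"
}

conf=/etc/sane.d/epjitsu.conf
fw=/usr/share/sane/epjitsu/1300i_0D12.nal
if [ -r "$conf" ] && has_firmware_line "$conf" "$fw"; then
    echo "firmware line found in $conf"
else
    echo "firmware line missing from $conf"
fi
```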
sane-find-scanner -q
found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003
scanimage -L
device `epjitsu:libusb:001:004' is a FUJITSU ScanSnap S1300i scanner
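The sane-find-scanner output above can also be filtered by the vendor and product IDs, which is handy when several USB devices show up. This is only a sketch that assumes the output format shown above; `find_device` is a hypothetical helper.

```shell
#!/bin/sh
# Read sane-find-scanner output on stdin and print the device address
# (the last field) of every line matching vendor ID (arg 1) and product ID (arg 2).
# find_device is a made-up helper name.
find_device() {
    grep -F "vendor=$1" | grep -F "product=$2" | awk '{ print $NF }'
}

# Uses the S1300i IDs from this post; prints nothing if no such scanner is attached.
if command -v sane-find-scanner >/dev/null 2>&1; then
    sane-find-scanner -q | find_device 0x04c5 0x128d
fi
```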
Copy libsane rules from the sane build directory to udev rules.
sudo cp sane-backends/tools/udev/libsane.rules /etc/udev/rules.d/60-libsane.rules
Log out and log in as the non-privileged user account previously created.
If the scanimage -L command works as above you have fully configured the scanner to work under that user account.
Start saned on boot-up
Edit the /etc/rc.local file and add the following line before the ‘exit 0’ line to ensure saned runs as the non-privileged user after a reboot.
saned -a USERNAME
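If you prefer not to edit the file by hand, the line can be inserted with a small helper. This is a sketch rather than a tested installer: `add_before_exit` is a name I made up, and it assumes rc.local ends with a plain `exit 0` line (if there is none, the line is appended at the end).

```shell
#!/bin/sh
# Insert a line (arg 2) before the first "exit 0" in a file (arg 1),
# unless the exact line is already present.
# add_before_exit is a hypothetical helper for illustration.
add_before_exit() {
    # -x: whole-line match, -F: fixed string; nothing to do if already there
    grep -qxF -- "$2" "$1" && return 0
    awk -v line="$2" '
        /^exit 0/ && !done { print line; done = 1 }
        { print }
        END { if (!done) print line }
    ' "$1" > "$1.new" &&
        cat "$1.new" > "$1" &&   # cat keeps the file mode (rc.local must stay executable)
        rm -f "$1.new"
}

# Example (run as root, USERNAME is the placeholder from this post):
#   add_before_exit /etc/rc.local 'saned -a USERNAME'
```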
Installing Conversion Tools
sudo apt-get install imagemagick bc exactimage pdftk tesseract-ocr tesseract-ocr-eng unpaper
You can add other languages such as tesseract-ocr-deu if you require OCR support for those.
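Tesseract accepts several languages at once when they are joined with a plus sign (for example `-l eng+deu`). A tiny helper like the one below can build that string. The `join_langs` name is just for illustration, and it assumes the matching tesseract-ocr-* packages are installed.

```shell
#!/bin/sh
# Join language codes into the "eng+deu" form accepted by tesseract's -l option.
# join_langs is a hypothetical helper.
join_langs() {
    out=""
    for l in "$@"; do
        # add a "+" separator only when something is already in $out
        out="${out:+$out+}$l"
    done
    printf '%s\n' "$out"
}

LANGUAGE=$(join_langs eng deu)
echo "$LANGUAGE"
```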
Scan to Repository Script
The script is hosted on Github: https://github.com/leogaggl/misc-scripts/blob/master/scan2repo.sh
#!/bin/bash
# Thanks to Andreas Gohr (http://www.splitbrain.org/) for the initial work
# https://github.com/splitbrain/paper-backup/
OUT_DIR=~/scan
TMP_DIR=$(mktemp -d)
FILE_NAME=scan_$(date +%Y%m%d-%H%M%S)
LANGUAGE="eng"
echo 'scanning...'
scanimage --resolution 300 \
    --batch="$TMP_DIR/scan_%03d.pnm" \
    --format=pnm \
    --mode Gray \
    --source 'ADF Duplex'
echo "Output saved in $TMP_DIR/scan*.pnm"
cd "$TMP_DIR" || exit 1
# cut borders
echo 'cutting borders...'
for i in scan_*.pnm; do
    mogrify -shave 50x5 "${i}"
done
# check if there are blank pages
echo 'checking for blank pages...'
for f in ./*.pnm; do
    # need to rename and delete original since newer versions of unpaper can't use same file name
    unpaper --size "a4" --overwrite "$f" "$(echo "$f" | sed 's/scan/scan_unpaper/g')"
    rm -f "$f"
done
# apply text cleaning and convert to tif
echo 'cleaning pages...'
for i in scan_*.pnm; do
    echo "${i}"
    convert "${i}" -contrast-stretch 1% -level 29%,76% "${i}.tif"
done
# Starting OCR
echo 'doing OCR...'
for i in scan_*.pnm.tif; do
    echo "${i}"
    tesseract "$i" "$i" -l "$LANGUAGE" hocr
    # hocr2pdf reads the hOCR from stdin; Tesseract 3.02 (Wheezy) writes it to $i.html,
    # newer releases use $i.hocr instead
    hocr2pdf -i "$i" -s -o "$i.pdf" < "$i.html"
done
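The listing above stops after the OCR step and never uses OUT_DIR or FILE_NAME. One plausible way to finish is to merge the per-page PDFs with pdftk (installed earlier) and move the result into the output directory. This is my own sketch of that step, not necessarily what the script on GitHub does; `final_pdf_path` and `merge_pages` are hypothetical helpers.

```shell
#!/bin/sh
# Build the destination path from an output directory (arg 1) and a base name (arg 2).
final_pdf_path() {
    printf '%s/%s.pdf\n' "$1" "$2"
}

# Merge the per-page PDFs found in a directory (arg 1) into a single file (arg 2).
# The glob sorts lexically, which matches page order thanks to the %03d numbering.
merge_pages() {
    mkdir -p "$(dirname "$2")" &&
        pdftk "$1"/scan_*.pnm.tif.pdf cat output "$2"
}

# Possible usage at the end of the scan script (assumption, not from the original):
#   merge_pages "$TMP_DIR" "$(final_pdf_path "$OUT_DIR" "$FILE_NAME")" && rm -rf "$TMP_DIR"
```

Removing the temporary directory only after a successful merge keeps the raw scans around if something goes wrong.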
Thanks go to Andi Gohr @ Splitbrain for the excellent blog that helped me to get over the sane problems and also gave me some ideas to make the scan script better (as unpaper was not doing such a good job): http://www.splitbrain.org/blog/2014-08/23-paper_backup_1_scanner_setup