A friend of mine was spending about an hour a day on manual data entry for his daily appointments. His administrative assistant would book appointments using commercial (and rather expensive) software. The software did not provide a straightforward method of exporting the appointment information; however, it did allow the admin assistant to print the appointments as a PDF document. My friend would then read the printed PDF documents and type the fields into an Excel spreadsheet. This method was error-prone and tedious, and it took him considerable time (he wasn't a fast typist to begin with).
My goal was to automate his workflow, and to produce a .csv file that he could easily incorporate into his database.
Due to the sensitive nature of the system, there were numerous constraints:
- No new software could be installed on his Windows computer
- Creating an intranet server (e.g. REST service) was prohibited by the IT department
- Transferring the data out to an external server was prohibited
- The system had to be simple enough for his administrative assistant to use it. The admin had limited technical experience: for the device to be useful, it had to "just work"
- Any data created / modified could not be saved for confidentiality reasons
My solution was to create a "Raspberry Pi as a Service." The idea was to set up a Raspberry Pi as a USB On-The-Go ethernet (i.e. g_ether) device which would provide a portable web server. The portable server could then run a website that could take the PDF document, extract the necessary information, and convert it into a .csv. The uploaded data would be held only in RAM.
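Whether uploaded files really live only in RAM depends on how the upload directory is mounted on the Pi. The following is a minimal sketch (Python 3, Linux only) of one way to check this by reading /proc/mounts. The function names and the idea of checking for `tmpfs` or `ramfs` are my own illustration, not part of the original build.

```python
import os


def filesystem_type(path, mounts_file="/proc/mounts"):
    """Return the filesystem type of the mount point that contains `path`."""
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    with open(mounts_file) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) < 3:
                continue
            mount_point, fs_type = fields[1], fields[2]
            prefix = mount_point.rstrip("/") + "/"
            # Keep the longest mount point that contains the path; on a tie,
            # the later entry wins because it was mounted on top.
            if path == mount_point or path.startswith(prefix):
                if len(mount_point) >= len(best_mount):
                    best_mount, best_type = mount_point, fs_type
    return best_type


def is_ram_backed(path, mounts_file="/proc/mounts"):
    """True if `path` sits on a RAM-backed filesystem (tmpfs or ramfs)."""
    return filesystem_type(path, mounts_file) in ("tmpfs", "ramfs")


if __name__ == "__main__":
    print("/tmp is RAM-backed:", is_ram_backed("/tmp"))
```

If this reports that the directory is not RAM-backed, files written there would survive a power cycle, so it is worth running once on the device.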
As an additional step, I needed a system that did not look complicated. A solution that seemed hard to use would be rapidly abandoned. I wanted the solution to look elegant, with no chance of losing wires.
- Raspbian was flashed using Etcher. My development machine was an Ubuntu 18.04 XPS 13 laptop
- I created a USB On-The-Go `g_ether` device using Raspbian as follows:
  - add: `dtoverlay=dwc2` on a new line in `config.txt`
  - add: `modules-load=dwc2,g_ether` after `rootwait` in `cmdline.txt`
  - add: an empty `ssh` file to `/boot`
  - Configure: `/etc/network/interfaces` to have a static IP for `usb0`
  - add:
- After starting the device, I could SSH into it by calling `ssh pi@raspberrypi.local` on my host computer
- After configuring the wireless internet on the Pi, I ran `sudo apt-get install dnsmasq`. The Pi was then set up to be a DHCP server. After rebooting the Pi, my host computer recognized the Pi as an ethernet device and was assigned an IP
- I then installed the back-end by running:
  - `sudo apt-get install poppler-utils` (for `pdftotext`)
  - `sudo apt-get install python-pip`
  - `pip install flask` and `pip install flask-dropzone`
- I then modified the `complete-redirect` example of the flask-dropzone project to save the file to the `/tmp` folder. This ensured that unplugging the Pi would destroy any uploaded data
- The Flask website was configured to launch at boot. I found that the easiest solution was to use `crontab -e` to add an `@reboot` entry
- The administrative assistant simply had to open a browser and click on a bookmark to access a pre-defined IP/port: `https://192.168.10.1:4200`. The PDF document was dragged into the site. Once it was uploaded, the server ran `pdftotext -layout` to convert the PDF to a text file, and the user was then prompted to save the .csv to their computer
- I wrote a small `awk` program to convert the output of `pdftotext` to a .csv file
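The conversion step above can be sketched in Python 3 as follows. This is not the `awk` program I wrote; it is a stand-in that shows the same idea. It assumes the columns in the `pdftotext -layout` output are separated by runs of two or more spaces, which will not hold for every PDF. The function names are hypothetical.

```python
import csv
import io
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run `pdftotext -layout` and return the extracted text.

    Passing "-" as the output file makes pdftotext write to stdout,
    so no intermediate text file is created.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout.decode("utf-8")


def layout_text_to_csv(text):
    """Convert column-aligned text into CSV.

    Assumption: fields are separated by two or more whitespace
    characters, so single spaces inside a field are preserved.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines and page breaks
        writer.writerow(re.split(r"\s{2,}", line))
    return out.getvalue()
```

A call such as `layout_text_to_csv(pdf_to_layout_text("/tmp/appointments.pdf"))` would then produce the CSV text to send back to the browser; the file name here is made up. Using the `csv` module rather than joining on commas means fields that themselves contain commas get quoted correctly.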
To solve my aesthetics issue, the Raspberry Pi was then connected to a MakerFocus Pi Zero USB-A Addon Board V1.1. This not only made the device look super cool, but eliminated the need for any wires. The elegant thing about this device is that it doesn't require any soldering!
- The original idea for this project was to use the `g_mass_storage` device rather than the `g_ether` device. The idea would be to have the assistant drag-and-drop the PDF document to the device, and for a CSV file to automagically appear. After a few hours of troubleshooting, I identified two problems. First, when the host computer copied data to the Pi mass storage device, Raspbian would not see the data unless the mass storage partition was re-mounted. I wasn't able to come up with an elegant solution short of constantly mounting and re-mounting the device. Second, if the host and Raspbian computers both mounted the mass storage device, the file invariably got corrupted. The `g_ether` approach was a compromise, and in my opinion is less elegant, but it seemed to work well
- I learned to save time by pre-configuring the `wpa_supplicant.conf` file on the host machine, prior to the first boot. But if I specified a static IP using `/etc/network/interfaces` for the `iface usb0` device, I found that I also needed to specify the `iface wlan0` device in order for the wireless card to work. For security reasons, I prevented the `wlan0` device from coming up at startup (by not typing `auto wlan0`)
- I tried several methods to get the flask server to run on startup. My first approach was to edit `/etc/rc.local`. My next approach was to create a `systemd` service that launched on startup once the `usb0` device was loaded. Both of these failed. It turned out the issue was that the command to open the flask server failed when run as root, and only worked when run as the pi user. I suspect this has to do with how I called `pip` during the installation process. The easiest solution was to run `crontab` as the `pi` user
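I never confirmed the root-versus-pi suspicion, but a small diagnostic like the one below is one way to test it. This is a sketch in Python 3 under the assumption that the packages were installed per-user; the function name is hypothetical. Running it once as `pi` and once with `sudo` shows whether each user can actually import `flask` and where it is loaded from.

```python
import importlib.util
import os
import site


def describe_environment(module_name="flask"):
    """Report whether a top-level module is importable for the current user."""
    spec = importlib.util.find_spec(module_name)
    return {
        # os.geteuid() only exists on Unix-like systems such as Raspbian.
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
        # Per-user install directory; it differs between pi and root.
        "user_site": site.getusersitepackages(),
        "module": module_name,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If `importable` is true for `pi` but false under `sudo`, and `location` points inside the `user_site` directory, that would support the idea that the packages were only visible to the `pi` user.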