Rasppi-search

This may seem like a simple task in the "Google" world we find ourselves living in, but an initial criterion for this deployment of the Raspberry Pi was the lack of network connectivity, so off-loading the search function, or even the search index, was not an option. Thus a local search of local content using local machine resources was required.

On Linux, a natural candidate for searching file content for a specific text string is the "grep" command. For the OER collection, searching every file (movies, programs, etc.) is not necessary, and skipping such files reduces both the consumption of system resources and the search latency. The "grep" command does not appear to easily support an "ignore these types of files" capability. After exploring other options, I chose "ack-grep" as the search tool.
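To make the "ignore these types of files" idea concrete, here is a minimal Python sketch of a content search that never opens files with certain extensions. It is not ack-grep and not the code used on the Pi; the function name `search_tree` and the extension list `SKIP_EXTENSIONS` are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical skip list: these extensions are illustrative assumptions,
# not the set actually configured for the OER collection.
SKIP_EXTENSIONS = {".mp4", ".avi", ".exe", ".zip", ".iso"}


def search_tree(root, needle, skip_extensions=SKIP_EXTENSIONS):
    """Return the sorted paths under root whose text contains needle.

    The match is case-insensitive. Files whose extension is in
    skip_extensions are never opened, which is where the savings in
    memory and search time come from.
    """
    needle = needle.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in skip_extensions:
                continue  # skip movies, programs, and similar files
            path = os.path.join(dirpath, name)
            try:
                # Read line by line so a large file is never held in memory.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    if any(needle in line.lower() for line in handle):
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it rather than abort
    return sorted(hits)
```

A call such as `search_tree("/path/to/collection", "photosynthesis")` (the path is a placeholder) would return the matching file paths while leaving the skipped file types untouched.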

See also: ack-grep; ack-grep versus grep; ack-grep documentation.

There were three file types (thanks to Kathleen for pointing this out!) that were not successfully searched by either ack-grep or grep:


 * .docx
 * .pptx
 * .pdf

These files store their text in compressed form (.docx and .pptx are ZIP archives of XML, and PDF content streams are usually compressed), so a normal "search file for text string" failed. At this point I was motivated to revisit my exploration of the Apache "Solr" and "Lucene" capability. Because those search functions need a Java environment, with a memory and CPU commitment that was incompatible with the Raspberry Pi, I went back to exploring other options. I discovered several "search PDF" tools, and after performance comparisons I chose pdfgrep. For reading *.docx and *.pptx files, I found some suggestive code, which I altered to fit the Raspberry Pi environment; the result is a PHP script named look.php.
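The general approach to reading *.docx and *.pptx files can be sketched as follows: open the file as a ZIP archive, parse the XML parts that hold the text, and collect the text runs. This is a Python illustration using only the standard library, not the contents of look.php; the function names `office_text` and `office_contains` are hypothetical.

```python
import re
import zipfile
from xml.etree import ElementTree


def office_text(path):
    """Return the text runs of a .docx or .pptx file joined by spaces."""
    with zipfile.ZipFile(path) as archive:
        if path.lower().endswith(".docx"):
            # Word keeps the main body in a single XML part.
            members = ["word/document.xml"]
        else:
            # PowerPoint keeps one XML part per slide.
            members = sorted(
                name for name in archive.namelist()
                if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
            )
        pieces = []
        for member in members:
            root = ElementTree.fromstring(archive.read(member))
            for element in root.iter():
                # Text runs are <w:t> in Word and <a:t> in PowerPoint;
                # strip the namespace and compare the local tag name only.
                if element.tag.rsplit("}", 1)[-1] == "t" and element.text:
                    pieces.append(element.text)
    # Simplification: runs are joined with spaces, so a word that the
    # editor split across two runs will not match as a whole word.
    return " ".join(pieces)


def office_contains(path, needle):
    """Case-insensitive test for needle in the document's text."""
    return needle.lower() in office_text(path).lower()
```

For PDF files this sketch does not apply; those were handled with pdfgrep as described above.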