When working with files that should be somewhere on an accessible file system, I have quite often been in a situation where I
- couldn't find a document by name but remembered attributes such as 'document' (format unclear), '> 12 pages', and dates from '2011 - 2015'
- needed to remove a lot of duplicates, both to save disk space and to clean up directory structures
- wanted to search for 'similar' files (where 'similar' has to be defined per content type)
Therefore, I initiated a project that consists of three elements:
- a database that holds not only the file attributes stored in a file system, but also additional values such as content type, checksum, and type-specific characteristics
- a script to manage that database (import, cleanup, ...)
- a frontend to search for documents of interest
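To make the idea more concrete, here is a minimal sketch of how the database and the import script could fit together. It is not the project's actual implementation: the use of SQLite, the `files` table layout, and the function names `import_tree`, `find_duplicates`, and `search_by_type` are all assumptions chosen for illustration. Type-specific characteristics such as page counts are left out, since they would need format-specific parsers.

```python
import hashlib
import mimetypes
import sqlite3
from pathlib import Path

# Hypothetical schema: file system attributes plus content type and checksum.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path         TEXT PRIMARY KEY,
    size         INTEGER NOT NULL,
    mtime        REAL NOT NULL,
    content_type TEXT,
    sha256       TEXT NOT NULL
)
"""


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_tree(conn, root):
    """Walk `root` and store one row per regular file; return the file count."""
    conn.execute(SCHEMA)
    count = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        # Content type is guessed from the file name only (an assumption;
        # inspecting the file content would be more reliable).
        content_type, _ = mimetypes.guess_type(path.name)
        conn.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
            (str(path), stat.st_size, stat.st_mtime, content_type, sha256_of(path)),
        )
        count += 1
    conn.commit()
    return count


def find_duplicates(conn):
    """Return lists of paths that share the same checksum."""
    rows = conn.execute(
        "SELECT sha256, path FROM files WHERE sha256 IN "
        "(SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1) "
        "ORDER BY sha256, path"
    )
    groups = {}
    for checksum, path in rows:
        groups.setdefault(checksum, []).append(path)
    return list(groups.values())


def search_by_type(conn, type_prefix):
    """Return paths whose guessed content type starts with `type_prefix`."""
    rows = conn.execute(
        "SELECT path FROM files WHERE content_type LIKE ? || '%' ORDER BY path",
        (type_prefix,),
    )
    return [path for (path,) in rows]
```

With a connection from `sqlite3.connect("files.db")` (the file name is arbitrary), calling `import_tree(conn, "/some/dir")` fills the table, `find_duplicates(conn)` lists groups of byte-identical files, and `search_by_type(conn, "application/pdf")` is the kind of query a search frontend could build on.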
Looking for mad skills in:
patternmatching databases hpc
This project is part of:
Hack Week 17