Kami
One of the largely overlooked problems in today's technological era is information overload - information has become so pervasive and easy-to-access that our lifestyles demand us to consume increasing amounts of information in a rapidly declining amount of time. With our submission to BridgeHacks, we hoped to alleviate some of this strain by allowing users to more efficiently and rapidly intake and process information - condensing endless filler down to its most important points.
- We first made an API that was served using the Flask framework. Its function was twofold.
- We utilized the PyTesseract OCR engine to read image input and convert it to rawtext.
- We used NLTK's tokenization function to deconstruct the rawtext. By using the TF/IDF algorithms, we were able to weight each clause (indicating how "important" it is to the document as a whole). By setting a threshold for set weight, we were able to condense the input text.
- Our backend API was hosted on AWS, using one of their EC2 machines. We used Gunicorn to convert our development Flask server into a production web server, then served that using nginx.
- Users were able to interact with our API using an array of applications that we made during the competition period.
- We offered native iOS/Android apps by using the Flutter framework. From a user perspective, these apps could take a picture of any document, convert it into rawtext, then summarize that text.
- We also had a web application made with Angular.js.