Our fourth project at Metis is called Fletcher. It involves unsupervised learning and natural language processing (NLP), along with MongoDB as a NoSQL database and a few different web APIs. We also have the option of continuing this project as our final project, so we were challenged to think big for this one!
There was definitely a bit of pressure in choosing this last project knowing that it might end up being my final project! So here is my idea for both Fletcher (the first few steps) and the final project, Kojak (the rest)!
Collect location-based tweets (includes longitude/latitude) from Twitter's streaming API
Use MongoDB to store/query this information.
Build a system to classify longitude/latitude into a US county. The only real way to do this right now is to use the FCC's API because Google's API is no longer online. So I want to build my own that won't require an API, since I will be processing a large volume of tweets and don't want the API to be my bottleneck.
Classify tweets by sentiment and topic using several different NLP tools.
Also figure out Tweet Momentum (momenTWEETum!) for each county based on current trends. This can possibly be used as a tool to figure out where cool things are happening in real time.
Use sentiment/topic to predict voting patterns based on tweets and trained with the 2012 election results.
Create an intuitive (and realtime) UI to show ALL THIS using D3.js on a web application hosted on an EC2 instance.
If time allows for it, combine it with my last census data project.
Okay, this might sound a bit ambitious, but I think it's definitely in the realm of possibility!
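To make the county-lookup step a little more concrete, here is a minimal sketch of how a local, API-free lookup could work. It uses the standard ray-casting point-in-polygon test with a cheap bounding-box check first. Everything specific here is an assumption for illustration: the function names, the `counties` dictionary, and the two toy rectangular "counties" are hypothetical, and real county boundaries would have to be loaded from a boundary file (which this sketch does not do).

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going east crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            # Longitude where this edge crosses the point's latitude.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a polygon."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def find_county(lon: float, lat: float,
                counties: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first county containing the point, or None."""
    for name, polygon in counties.items():
        min_lon, min_lat, max_lon, max_lat = bounding_box(polygon)
        # Cheap rejection before the more expensive polygon test.
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy data: two adjacent rectangles standing in for counties.
counties = {
    "County A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "County B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

print(find_county(1.0, 1.0, counties))
print(find_county(3.0, 1.0, counties))
print(find_county(5.0, 5.0, counties))
```

With a few thousand real county polygons, a linear scan like this would probably still be too slow for a live tweet stream, so the bounding boxes would likely need to be precomputed and put in some kind of spatial index. The point of the sketch is only that the lookup can run locally without a network call per tweet.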
What I Learned Today:
Earlier this year, NASCAR driver Kurt Busch, who has been accused by his ex-girlfriend Patricia Driscoll of domestic assault, testified in a court hearing Tuesday that he believes Driscoll is a trained assassin and that she once showed up wearing a gown covered in blood.