2017

Creating a better revenue model for MBTA

The Massachusetts Bay Transportation Authority (MBTA) is the public transit agency that operates most transit in the Greater Boston area, including buses, subways, and trains. The MBTA works with high-level averages of revenue data, but does not have access to a detailed model of fares across different routes, times and dates, modes of transit, passenger profiles, and other characteristics. The goal of this project is to create a more granular revenue model using existing passenger transaction data.

Such a model can be used to analyze bus route...

Analyzing unfulfilled query data in TripAdvisor

TripAdvisor is one of the largest travel websites built on user-generated content, with nearly 50 million monthly visitors and millions of business reviews. We aim to improve the search experience by learning fine-grained information about each business, namely the sentiments that users express toward specific entities in their reviews. Our approach allows us to rank these sentiments and implement a novel search experience where results are sorted by sentiment intensity toward the item of interest. These search results can further be enriched by...

Power of Words: Lyric-based music recommendation

The goal of this project is to leverage the rich content of song lyrics to connect each song with relatable concepts such as moods, occasions, and themes. A direct application of this automatic tagging system would be to produce playlists associated with different emotions or serving specific purposes (post-breakup songs, holiday music, party mixes, et cetera). An initial target for the final product would be a collection of moods and topics that a user can select to retrieve an associated list of songs. A more advanced version would allow the user to type in a specific emotion or adjective...

Spotify playlist prediction

Spotify is a music, podcast, and video streaming service with 100 million active users. The company curates playlists that are followed by millions of users. These playlists are created by a combination of algorithmic and human-driven processes. The aim of our project is to make use of machine learning algorithms to improve the effectiveness of algorithmically curated playlists and to analyze what audio features contribute to the popularity of playlists.

Spotify attempts to direct the most relevant songs to users based on their preferences, moods, etc. An enhanced version of...

Social media engagement for cosmetic brands

Tribe Dynamics is a San Francisco-based startup that measures social media engagement for cosmetic brands. Online content creation led by beauty bloggers is one of the key predictors of offline revenue in this industry. This project focuses on investigating how hashtag usage spreads across a social network of Instagram users who post about beauty products. The goal of the project is to build a probabilistic model of each person’s propensity to use a hashtag based on whether their friends also use the hashtag, and to determine the characteristics of a successful marketing campaign using hashtags...
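
As a rough illustration of the kind of estimate described above, and not the project's actual model, the sketch below computes an empirical adoption rate: among users with exactly k friends who used a hashtag, what fraction used it themselves. The function name, the toy friendship data, and the choice to ignore the timing of posts are all assumptions made for this example.

```python
from collections import defaultdict


def adoption_rate_by_exposure(friends, adopters):
    """Map k -> fraction of users with exactly k adopting friends who adopted.

    friends:  dict of user -> set of that user's friends (hypothetical input)
    adopters: set of users who used the hashtag at least once
    """
    totals = defaultdict(int)  # users seen at each exposure level k
    hits = defaultdict(int)    # of those, users who adopted the hashtag
    for user, user_friends in friends.items():
        k = sum(1 for f in user_friends if f in adopters)
        totals[k] += 1
        if user in adopters:
            hits[k] += 1
    return {k: hits[k] / totals[k] for k in sorted(totals)}


# Hypothetical toy network; names and edges are invented for illustration.
friends = {
    "ana": {"bea", "cy"},
    "bea": {"ana", "cy"},
    "cy": {"ana", "bea", "dee"},
    "dee": {"cy"},
    "eli": set(),
}
adopters = {"ana", "bea", "cy"}

rates = adoption_rate_by_exposure(friends, adopters)
for k, rate in rates.items():
    print(f"{k} adopting friends: adoption rate {rate:.2f}")
```

A fuller treatment would likely account for when each post was made, so that only friends who used the hashtag earlier count as exposure; this sketch deliberately leaves that out.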

Machine learning-assisted medical image annotation

Machine learning has emerged in recent years as a powerful tool for many tasks across a wide range of disciplines. This has held true in biomedical imaging, where machine learning-based technologies have the potential to improve the efficiency and accuracy of imaging specialists by automatically identifying and measuring key findings within image data. Unfortunately, those automatic tools do not exist yet, and manual annotation remains the common, time-consuming standard. The purpose of this project is to develop a medical image annotation tool that...

Image emotion classification in social media websites

Automatic image emotion classification is challenging because it requires models capable of recognizing emotional content in images, which can vary substantially. In addition, until 2016 there was no image dataset with high-quality labels that was large enough for learning these models. We have designed the System for Emotion Data Management and Analysis (SEDMA) not only to predict image emotion but also to actively improve the process of building high-quality, manually labeled datasets. SEDMA can potentially be used in a wide range of...

Predicting Alzheimer's Disease

Alzheimer’s Disease (AD) ravages the cognitive ability of more than 5 million Americans and creates an enormous strain on the health care system. Our research explores prediction of AD without medical imaging, in hopes of earlier and cheaper diagnoses. We construct a classification pipeline whose best model achieves greater than 90% accuracy and recall in predicting AD. This model generalizes well to sub-studies of our main dataset, ADNI, as well as to another AD dataset, AIBL. We also find that we can get close to 79% accuracy with data from only one clinical visit...

Data Collection, Management and Cleaning

The City of Como project is a collaboration with Fluxedo, an Italian startup working in partnership with the municipality of Como to model the dynamic flow of people in the city. The overall aim of the project is to integrate multiple and diverse data sources to build a picture of the way people live and move around the city. Using historical telecom and social media data along with other geolocated data, the team will form a coherent picture of the daily movements of different demographic groups throughout Como, depending on the day, time, and other factors such as weather and events...

Sentiment Analysis and Predictive Models

Moleskine’s philosophy centers on culture, travel, memory, imagination, and personal identity. The goal of this project is to find influencers by looking at users' interactions and to target them across different social platforms. For example, we will look at how people connect on Twitter and create a weighted graph using both follower counts and @mentions. We will look at all platforms and cluster groups of posts by trending topics using LDA. This can be applied to all sources of media. We will then try to identify whether trending topics and influencers are common across social platforms...
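
To make the weighted-graph idea concrete, here is a minimal sketch under stated assumptions: follow and @mention relationships are given as (source, target) pairs, each follow adds a weight of 1.0 and each @mention a weight of 2.0, and users are ranked by weighted in-degree. The weights, the function names, and the toy data are hypothetical and are not taken from the project.

```python
from collections import Counter

# Assumed weights for illustration; the project does not specify them.
FOLLOW_WEIGHT = 1.0
MENTION_WEIGHT = 2.0


def build_weighted_edges(follows, mentions):
    """Combine follow and @mention pairs into directed edge weights."""
    edges = Counter()
    for src, dst in follows:
        edges[(src, dst)] += FOLLOW_WEIGHT
    for src, dst in mentions:
        # Repeated mentions accumulate, so frequent interaction weighs more.
        edges[(src, dst)] += MENTION_WEIGHT
    return edges


def weighted_in_degree(edges):
    """Score each user by the total weight of edges pointing at them."""
    scores = Counter()
    for (_, dst), weight in edges.items():
        scores[dst] += weight
    return scores


# Hypothetical toy data: (source, target) pairs.
follows = [("a", "c"), ("b", "c"), ("c", "a")]
mentions = [("a", "c"), ("b", "c"), ("b", "c"), ("a", "b")]

edges = build_weighted_edges(follows, mentions)
scores = weighted_in_degree(edges)
top_user = scores.most_common(1)[0][0]
print(top_user, scores[top_user])
```

Weighted in-degree is only one simple way to rank candidates; other centrality measures could be applied to the same edge weights.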
