Creating a better revenue model for MBTA

The Massachusetts Bay Transportation Authority (MBTA) is the public agency that operates most transit in the Greater Boston area, including buses, subways, and trains. The MBTA works from high-level averages of revenue data and does not have a detailed model of fares across routes, times and dates, modes of transit, passenger profiles, and other characteristics. The goal of this project is to create a more granular revenue model using existing passenger transaction data.

Analyzing unfulfilled query data in Tripadvisor

On Tripadvisor, customers can search for anything from restaurants and tour guides to flights and hotel bookings. The website surfaces related reviews, pictures, ratings, and suggestions from other visitors. The client also supports online booking, allowing customers to book their travel with one click. Our project is to implement an online clustering algorithm for customer-written review data. At regular intervals, we would like to classify new clusters of those reviews into categories and also connect them with sentiment analysis, e.g.

Power of Words: Lyric-based music recommendation

The goal of this project is to leverage the rich content of song lyrics to connect each song with relatable concepts such as moods, occasions, and themes. A direct application of this automatic tagging system would be to produce playlists that are associated with different emotions or that serve specific purposes (post-breakup songs, holiday music, party mixes, et cetera). An initial target for the final product would be a collection of moods and topics that a user can select to retrieve an associated list of songs.

Spotify playlist prediction

Spotify is a music, podcast, and video streaming service with 100 million active users. The company curates playlists that are followed by millions of users. These playlists are created by a combination of algorithmic and human-driven processes. The aim of our project is to use machine learning algorithms to improve the effectiveness of algorithmically curated playlists and to analyze which audio features contribute to the popularity of playlists.

Social media engagement for cosmetic brands

Tribe Dynamics is a San Francisco-based startup that measures social media engagement for cosmetic brands. Online content creation led by beauty bloggers is one of the key predictors of offline revenue in this industry. This project focuses on investigating how hashtag usage spreads across a social network of Instagrammers who post about beauty products. The goal of the project is to build a probabilistic model of each person’s propensity to use a hashtag based on whether their friends also use the hashtag, and to determine the characteristics of a successful marketing campaign using hashtags.
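One simple starting point for the propensity question is an empirical adoption rate: the fraction of users who use a hashtag, grouped by how many of their friends use it. The sketch below is illustrative only and assumes a static snapshot with no timestamps, so it measures association rather than spread. The function name, the `friends` mapping, and the toy data are hypothetical and do not come from the project.

```python
from collections import defaultdict


def adoption_rate_by_exposure(friends, adopters):
    """Estimate P(user uses hashtag | k friends use it) from one snapshot.

    friends:  dict mapping each user to a list of their friends (assumed input)
    adopters: set of users who used the hashtag (assumed input)
    Returns a dict mapping exposure k to the observed adoption rate.
    """
    counts = defaultdict(lambda: [0, 0])  # exposure -> [adopters, total users]
    for user, user_friends in friends.items():
        exposure = sum(1 for f in user_friends if f in adopters)
        counts[exposure][1] += 1
        if user in adopters:
            counts[exposure][0] += 1
    return {k: used / total for k, (used, total) in sorted(counts.items())}


# Toy data, invented for illustration.
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["c"], "e": []}
adopters = {"a", "b"}
rates = adoption_rate_by_exposure(friends, adopters)
```

With timestamped posts, the same grouping could be restricted to friends who used the hashtag before the user did, which is closer to a model of spread.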

Image emotion classification in social media websites

Legendary Entertainment is a media company that produces blockbuster films. While advertising an upcoming film, they need to know which audience is responding to their ads and how. To this end, they scrape data from social media websites to monitor conversations about their ads, and they adjust their campaigns accordingly to maximise effectiveness. Today, users are increasingly using images to express emotions and feelings on social media.

A/B Testing and Predictive Models

“Amyloid positivity” is a key risk indicator of Alzheimer’s disease. Amyloid status is considered positive when amyloid beta (Aβ) protein, also referred to as amyloid plaque, has accumulated in the brain with sufficient density to meet a threshold. The goal of this capstone project is to use machine learning and other advanced analytics approaches to construct a model that predicts whether a single individual is amyloid positive or negative. A potential outcome of this project is that your deliverables are integrated into Biogen’s Alzheimer’s treatment pipeline.

Data Collection, Management and Cleaning

The City of Como project is a collaboration with Fluxedo, an Italian startup working in partnership with the municipality of Como to model human dynamic flow in the city. The overall aim of the project is to integrate multiple and diverse data sources to build a picture of the way people live and move around the city. Using historical telecom and social media data along with other geolocated data, the team will form a coherent picture of the daily movements of different demographic groups throughout Como, depending on the day, time, and other factors such as weather and events.

Sentiment Analysis and Predictive Models

Moleskine’s philosophy is culture, travel, memory, imagination and personal identity. The goal of this project is to find influencers by looking at users’ interactions and to target them across the different social platforms. For example, we will look at how people connect on Twitter and create a weighted graph using both follower counts and @mentions. We will look at all platforms and cluster groups of posts by trending topics using LDA. This can be applied to all sources of media. We will then try to determine whether trending topics and influencers are common across social platforms.
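The weighted graph mentioned above could be sketched as follows. This is a minimal illustration, not the project's method: it assumes the data arrives as (source, target) pairs, and the function names, the edge weights (1.0 per @mention, 0.5 per follow), and the toy accounts are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_influence_graph(mentions, follows, mention_weight=1.0, follow_weight=0.5):
    """Combine @mentions and follow relations into one weighted directed graph.

    mentions: iterable of (author, mentioned_user) pairs (assumed input format)
    follows:  iterable of (follower, followed_user) pairs (assumed input format)
    Returns a dict mapping (source, target) to the accumulated edge weight.
    """
    weights = defaultdict(float)
    for author, mentioned in mentions:
        weights[(author, mentioned)] += mention_weight
    for follower, followed in follows:
        weights[(follower, followed)] += follow_weight
    return dict(weights)


def rank_by_incoming_weight(graph):
    """Rank users by total incoming edge weight, a crude proxy for influence."""
    totals = defaultdict(float)
    for (_, target), weight in graph.items():
        totals[target] += weight
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration.
mentions = [("ana", "bo"), ("cy", "bo"), ("bo", "ana")]
follows = [("ana", "bo"), ("cy", "ana")]
graph = build_influence_graph(mentions, follows)
ranking = rank_by_incoming_weight(graph)
```

Incoming weight is only one possible influence score; a centrality measure such as PageRank over the same edges would be a natural alternative.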