Student Research Projects

Spring 2017 Projects

A/B Testing and Predictive Models

“Amyloid positivity” is a key risk indicator of Alzheimer’s disease. Amyloid status is considered positive when Amyloid Beta (Aβ) protein, also referred to as amyloid plaque, has accumulated in the brain with sufficient density to meet a threshold. The goal of this capstone project is to use machine learning and other advanced analytics approaches to construct a model that predicts whether a single individual is amyloid positive or negative. Your deliverables could potentially be integrated into Biogen’s Alzheimer’s treatment pipeline.

See also: Biogen, 2017

Data Collection, Management and Cleaning

The Como project will focus on the city of Como, a small medieval town beautifully located on Lake Como in Northern Italy, with a large walking area in the downtown district and along the lakeshore. The project consists of collecting and analyzing data about the city and the way people live and move in it by integrating multiple and diverse data sources.

The problems to be addressed are:

  • Providing a reliable estimate of the overall picture of people density
  • Predicting the impact of future events positioned in time and space
  • Given a constrained budget and a cost model for sensor deployment

Sentiment Analysis and Predictive Models

The project will explore the popularity and success of different Moleskine products co-branded with other famous brands (also known as special editions) and launched during specific periods of time. The main field of analysis is measuring the impact of different products on social media channels and correlating that to sales.

See also: Moleskine, 2017

Past Projects

Developing a Pipeline for Identifying Potentially Hazardous Asteroids

Potentially hazardous objects (PHOs) are currently defined based on parameters that measure the object's potential to make threatening and close approaches to the Earth. To be considered a PHO, objects generally have an Earth minimum orbit intersection distance (MOID) of 0.05 AU or less and an absolute magnitude (H) of 22.0 or brighter (a rough indicator of large size). In this project students will develop the full pipeline which includes data management, algorithmic development and probabilistic predictions of impact.
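
The two thresholds quoted above can be sketched as a simple filter. This is only an illustration of the stated criteria, not the students' pipeline; the SmallBody record, its field names, and the catalog entries are hypothetical and made up for the example.

```python
from dataclasses import dataclass

# Thresholds taken from the description above.
MOID_LIMIT_AU = 0.05   # Earth minimum orbit intersection distance, in AU
H_LIMIT = 22.0         # absolute magnitude; smaller H means brighter


@dataclass
class SmallBody:
    """Hypothetical record for one catalogued object."""
    name: str
    moid_au: float
    abs_magnitude: float


def is_potentially_hazardous(body: SmallBody) -> bool:
    """Apply both criteria: close enough orbit AND bright enough."""
    return body.moid_au <= MOID_LIMIT_AU and body.abs_magnitude <= H_LIMIT


# Invented example objects, for illustration only.
catalog = [
    SmallBody("example-a", moid_au=0.02, abs_magnitude=19.5),  # meets both
    SmallBody("example-b", moid_au=0.30, abs_magnitude=18.0),  # orbit too distant
    SmallBody("example-c", moid_au=0.04, abs_magnitude=24.1),  # too faint
]

flagged = [b.name for b in catalog if is_potentially_hazardous(b)]
print(flagged)
```

Because brighter objects have smaller absolute magnitudes, "22.0 or brighter" translates to H <= 22.0 in the comparison.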

Negotiation Tool for Airbnb

Airbnb is a global marketplace for apartment rentals that reaches 190 countries and 34,000 cities. On Airbnb, people post rental offers and rent their own apartments to others, creating a market parallel to traditional hotel-based offers. We propose to integrate data from Airbnb with data from other sources, including open data, census information, real estate data, information about the district and the house interiors, and social sources such as Instagram and Twitter, so as to develop a new scoring system for Airbnb offers, similar to the hotel star system.

Restaurant Photo Classification Algorithm and Business Viability Tool for TripAdvisor

TripAdvisor users write reviews and upload photos from their restaurant visits. These photos can be categorized and analysed to reveal information about the restaurant's menu, dishes, pricing, etc. The first step in this analysis is the classification of photos into simple, broad groups: food, drinks, menus, and inside and outside photos of the establishment. The students' goal for this project was to build an image classifier using Convolutional Neural Networks and images acquired by the students themselves.

Dynamic Factor Selection for Determining Market Exposure

Market exposure is a key concept in quantitative finance. It is classically measured by estimating a beta coefficient in a linear equation that expresses an asset's returns in terms of the market's returns, where beta is the exposure. Returns with low exposure to the market are desirable, as they are less affected by market downturns. This exposure modeling can be generalized to multiple factors, and the resulting factor exposures are used to determine whether a strategy or asset is sufficiently protected from changes in certain risk factors, and to purchase hedges that cancel out this risk exposure.
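
As a minimal sketch of the single-factor case, beta can be estimated by ordinary least squares on the relation asset_return = alpha + beta * market_return + noise. The function name and the return series below are assumptions made for illustration; the project's dynamic factor selection method is not described here and is not what this shows.

```python
def estimate_beta(asset_returns, market_returns):
    """Ordinary least squares fit of asset = alpha + beta * market.

    Returns (alpha, beta), where beta = cov(asset, market) / var(market).
    """
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")

    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n

    # Sums of cross-deviations and squared deviations (the 1/n factors cancel).
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")

    beta = cov / var
    alpha = mean_a - beta * mean_m
    return alpha, beta


# Synthetic series, invented for the example: the asset moves half as much
# as the market, plus a small constant return.
market = [0.01, -0.02, 0.015, 0.005, -0.01]
asset = [0.001 + 0.5 * m for m in market]

alpha, beta = estimate_beta(asset, market)
print(round(alpha, 6), round(beta, 6))
```

The multi-factor generalization replaces the single market series with several factor series and solves the corresponding least squares problem for one exposure per factor.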

Nester for Design

Nester is a platform where companies can find the best designs for a project, using Kaggle. Kaggle is a platform that hosts machine learning competitions where companies and researchers post data and pose challenges. Data scientists from all over the world compete to answer the questions and to produce the best results, in effect, crowdsourcing the most efficient technique or solution to the questions.

Through Nester, companies post brief design challenges. Designers then propose solutions and vote for other people's projects. Experts refine projects. Companies give feedback, refine, and select the best ideas. Finally, users pledge for their favorite product and we have a winner!


Stochastic Query Optimization and Bias Characterization for Large Scale Text Search

Legendary is a leading film production company, with 43 feature films released, 6 films currently in production, and 13 billion in box office through 2015. Identifying the correct search terms to find social media posts about an entity or concept is a highly challenging task. For instance, the word Fargo may refer to a place (in North Dakota), a TV show, a movie, or a bank (Wells Fargo). The student team analysed 4 million tweets to produce a text-query generation and optimization system. The search index query, constructed from combinations of text tokens constrained to simple logical operators, returns a highly pure set of text documents relevant to a property, such as a film, and also provides a characterization of the query bias.
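
To make the idea of a token query and its purity concrete, here is a small sketch using the Fargo example. It assumes a query is a set of required tokens, optional tokens, and excluded tokens, and measures purity as the share of matched posts that are labelled relevant. The query structure, function names, and the four posts are all invented for illustration; the team's actual query language and optimization procedure are not specified above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, text):
    """True if text has all required tokens, at least one optional token
    (when any are given), and none of the excluded tokens."""
    tokens = tokenize(text)
    return (query["all_of"] <= tokens
            and (not query["any_of"] or bool(query["any_of"] & tokens))
            and not (query["none_of"] & tokens))


def purity(query, labelled_posts):
    """Fraction of matched posts that are labelled relevant (0.0 if no match)."""
    hits = [relevant for text, relevant in labelled_posts if matches(query, text)]
    return sum(hits) / len(hits) if hits else 0.0


# Invented posts; the label says whether the post is about the film.
posts = [
    ("Fargo the movie is a classic", True),
    ("Rewatched Fargo last night, great film", True),
    ("Wells Fargo closed my account", False),
    ("Driving through Fargo, North Dakota", False),
]

broad = {"all_of": {"fargo"}, "any_of": set(), "none_of": set()}
narrow = {"all_of": {"fargo"}, "any_of": set(), "none_of": {"wells", "dakota"}}

print(purity(broad, posts), purity(narrow, posts))
```

Excluding tokens raises purity here but could also drop relevant posts that happen to mention the excluded words, which is one way a query can become biased.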

An Attempt to Improve the MBTA Through Data

The MBTA serves 4.8 million people throughout the Boston metro area and facilitates approximately 1.3 million trips each weekday. Aggregated entry and exit data is collected for each rail station at 15-minute intervals. Since commuting is one of the most habitual acts a metropolitan citizen performs, this data provides an excellent means to predict ridership throughout the week.
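
One simple way to exploit that habitual pattern is a baseline that averages past counts for the same weekday and 15-minute slot. The sketch below assumes observations arrive as (timestamp, entries) pairs for a single station; the function names and the sample counts are hypothetical, and this is not presented as the method the students used.

```python
from collections import defaultdict
from datetime import datetime


def slot_key(ts):
    """Map a timestamp to (weekday, index of its 15-minute slot in the day)."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // 15


def fit_profile(observations):
    """Average the observed entries for every (weekday, slot) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for ts, entries in observations:
        acc = totals[slot_key(ts)]
        acc[0] += entries
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def predict(profile, ts):
    """Return the historical average for the slot, or None if never observed."""
    return profile.get(slot_key(ts))


# Invented counts for one station, for illustration only.
history = [
    (datetime(2017, 1, 2, 8, 0), 100),  # Monday, 08:00 slot
    (datetime(2017, 1, 9, 8, 0), 120),  # Monday, 08:00 slot
    (datetime(2017, 1, 3, 8, 0), 90),   # Tuesday, 08:00 slot
]

profile = fit_profile(history)
forecast = predict(profile, datetime(2017, 1, 16, 8, 5))  # a later Monday
print(forecast)
```

A profile like this is mainly useful as a reference point: any more elaborate model should at least beat the per-slot weekly average.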