Automated Error Correction tool for POS tags of Hindi Data

MT-NLP Lab at the International Institute of Information Technology

Built an automated error-correction tool for the POS tags of Hindi data, so that higher-quality annotated data is available for use. Hindi is one of the most widely spoken languages in India. Achieved an accuracy of 94.02% with a neural network that takes the probabilistic predictions of Support Vector Machines, Long Short-Term Memory networks, and Conditional Random Fields as its inputs. Published a paper on this work at PACLIC 2018 and presented it in Hong Kong.
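
The stacking idea described above can be sketched as follows. This is an illustration, not the project's code: the three-tag set, the example probabilities, and the hand-set weights are all hypothetical, and a single dense layer stands in for the trained neural network. With these placeholder weights the layer simply sums each tag's probability across the three base taggers.

```python
import math

TAGS = ["NN", "VM", "JJ"]  # hypothetical, tiny tag set for illustration


def stack_features(svm_probs, lstm_probs, crf_probs):
    """Concatenate the per-tag probabilities from the three base taggers."""
    return list(svm_probs) + list(lstm_probs) + list(crf_probs)


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_predict(features, weights, biases):
    """One dense layer plus softmax; `weights` holds one row per tag."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TAGS[best], probs


# Placeholder weights (an assumption): row i adds up the probability that
# each base tagger assigns to tag i. A trained network would learn these.
n = len(TAGS)
weights = [[1.0 if col % n == row else 0.0 for col in range(3 * n)] for row in range(n)]
biases = [0.0] * n

# Hypothetical base-tagger outputs for a single token.
features = stack_features([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.6, 0.1])
tag, probs = meta_predict(features, weights, biases)
print(tag, [round(p, 3) for p in probs])
```

Here the SVM prefers one tag while the LSTM and CRF prefer another; the meta-layer sees all nine probabilities at once, which is what lets it overrule a single confident but mistaken base tagger.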


Cornell Tech in association with Estee Lauder

For a challenge posed by Estee Lauder, "How might we use data to develop in-the-moment personalized shopping experiences for e-commerce customers?", we built the product "dayly". It is an app that serves as a skin care diary, allowing customers to track the effects of the products in their skin care routine. How well the products perform on the skin conditions customers hope to improve can then inform the recommendations they receive. Improvements in their skin can be analysed using Computer Vision and Deep Learning. During this project we conducted user interviews to identify the real problems, built the prototype, and presented the final project at Estee Lauder and at Cornell Tech's Open Studio.

Echo Cares

Tech Media and Democracy at Cornell Tech in association with Columbia University, Parsons School of Design, New York University

A website that aggregates and verifies snippets of stories and posts written by people from all over the world about surviving the pandemic, or about doing or witnessing an act of kindness during it. People everywhere are being overloaded with negative news; the aim of the website is to give users a sense of strength and hope. The content is scraped from public posts on various social media platforms and manually verified before being uploaded. By running sentiment analysis first, we cut the volume of content down to a quarter of the original, which makes it possible to upload verified positive content to the website. We hope that this site will form an online safe space where users do not have to worry about accidentally coming across negative content, while also serving as a curated collection of stories on how humans coped with and grew through the pandemic.
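
The sentiment pre-filter can be illustrated with a toy example. The word lists, the threshold, and the function names below are assumptions made for this sketch; the description above does not say which sentiment model the project used, and a real system would rely on a trained classifier rather than a hand-written lexicon.

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"kind", "hope", "grateful", "recovered", "helped"}
NEGATIVE = {"died", "fear", "angry", "lost", "riot"}


def sentiment_score(text):
    """Count positive words minus negative words in a post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def shortlist(posts, threshold=1):
    """Keep only posts positive enough to be worth verifying by hand."""
    return [post for post in posts if sentiment_score(post) >= threshold]


posts = [
    "A neighbour helped me and I am grateful.",
    "I lost my job and feel angry.",
    "The shop opens at nine.",
]
for post in shortlist(posts):
    print(post)
```

Only the shortlisted posts go on to manual verification, which is where the reduction in volume comes from.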

Use of Bots to sow Political Discord using Misinformation and Disinformation

Tech Media and Democracy at Cornell Tech in association with Columbia University, Parsons School of Design, New York University

Political misinformation and disinformation have been used rampantly to sow discord in Indian politics. We scraped Twitter for certain hashtags to understand how the activity correlated with events happening in India from 01/2020 to 03/2020. During this time, every corner of the country was being torn apart by riots. We processed a large volume of data in Python, analysed both the tweets and the users, and discovered a considerable number of bots that constantly tweet misinformation and highly disruptive content. Some of these bots are located within India and some outside the country, and they rapidly spread aggressive content to bring about communal divide. The article below discusses our findings in detail and the methodologies we used to reach them.

Smart Glasses

Cornell Tech

Elderly people with hearing impairments and weakened vocal cords have trouble conversing. For the question "How might we create a communication solution for older citizens with hearing and vocal impairments in a loud urban environment?" we propose Smart Glasses. Smart Glasses employ sound conduction through bone. The arms of the glasses that fall beneath the ear lobes can conduct sound through the mastoid. A microphone near the bridge of the spectacles assists in amplifying the sound. The report attached below details the design process followed for this product and includes an instruction manual for building it.

Detecting Political Bias Using Deep Learning

Cornell Tech

The political ecosystem in the United States is dynamic and complex. There are two major political parties, and much of the political discourse in the country can be classified as leaning either conservative or liberal. This paper details our approach to identifying political bias that can be inferred from text. We refer to existing research in the area and aim to replicate two previous approaches to this classification problem: a Recursive Neural Network and a Long Short-Term Memory (LSTM) architecture. We then assess how well the models transfer to similar datasets. We detail the datasets used for this purpose and the metrics for evaluating the results achieved.

Part of Speech Tagging of Konkani Using Semi-Supervised Learning

MT-NLP Lab at the International Institute of Information Technology

Built a command line tool for Part of Speech tagging of Konkani (a data-scarce language spoken by a small section of Indians) with 77% accuracy, using semi-supervised machine learning techniques such as Active Learning and Self-Training.
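
Self-training in general works as sketched below: train on the labelled data, tag the unlabelled data, keep only the confident predictions as new training examples, and repeat. The suffix-counting tagger, the English toy words, and the 0.9 threshold are assumptions chosen to keep the sketch short; they are not the model, data, or settings used in the project.

```python
from collections import Counter, defaultdict


def train(examples, suffix_len=2):
    """Count how often each word suffix carries each tag."""
    counts = defaultdict(Counter)
    for word, tag in examples:
        counts[word[-suffix_len:]][tag] += 1
    return counts


def predict(model, word, suffix_len=2):
    """Return (tag, confidence), or (None, 0.0) for an unseen suffix."""
    tag_counts = model.get(word[-suffix_len:])
    if not tag_counts:
        return None, 0.0
    tag, freq = tag_counts.most_common(1)[0]
    return tag, freq / sum(tag_counts.values())


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Grow the labelled set with the model's own confident predictions."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, remaining = [], []
        for word in pool:
            tag, confidence = predict(model, word)
            if confidence >= threshold:
                confident.append((word, tag))
            else:
                remaining.append(word)
        if not confident:
            break  # nothing new was learned, so stop early
        labelled.extend(confident)
        pool = remaining
    return train(labelled), pool


# Toy data (hypothetical, in English for readability).
seed = [("walking", "VERB"), ("talking", "VERB"), ("quickly", "ADV")]
model, leftover = self_train(seed, ["running", "slowly", "table"])
print(predict(model, "running"), leftover)
```

The words left in `leftover` are the ones the model could not tag confidently. In an active learning setup, items like these are the natural candidates to send to a human annotator.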

Morphological Analyzer for Kannada

Manipal Institute of Technology

Built a Morphological Analyzer for Kannada, an agglutinative language spoken in Karnataka, India, using Support Vector Machines for classification, and achieved an accuracy of 85%. The morphological components of the language provide valuable information about the properties of a word (e.g. plurality, gender), which can be used as data for other complex ML tasks. Published a paper on this work in IJET 2018.

Soccer Summary Generator

Manipal Institute of Technology

Built a Python command line tool that generates a textual summary, in the form of a narrative, of a football match from video imagery. Using convolutional neural networks (CNNs), features present in a frame of the video were identified and then captioned using recurrent neural networks (RNNs). By captioning one frame in every 50, we were able to generate a summary of the match. However, the successful generation was solely due to the small size of our training data, and in the blog post linked below I expand on why this project is a good example of an ML project giving a false sense of success due to overfitting.
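
The frame-sampling step can be sketched as follows. The function names are hypothetical, and `caption_frame` is a stand-in for the CNN and RNN captioning model, which is not reproduced here; integers stand in for decoded video frames.

```python
def sample_every(frames, step=50):
    """Yield (index, frame) for every `step`-th frame, starting at frame 0."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame


def summarise(frames, caption_frame, step=50):
    """Caption the sampled frames and return the captions in video order."""
    return [caption_frame(frame) for _, frame in sample_every(frames, step)]


# Stand-in data: 120 "frames" and a dummy captioner that names the frame.
summary = summarise(range(120), lambda frame: f"frame {frame}")
print(summary)
```

Because `sample_every` is a generator, the frames can be streamed from a video reader one at a time instead of being loaded into memory together.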