Simmi Mourya

I am a Computer Science graduate student at the University of Pennsylvania. Previously, I worked as a Software Developer/Data Scientist at ESRI, New Delhi, where I explored applications of deep learning to geospatial imagery and their integration into the ArcGIS Python API. Before ESRI, I was a researcher at SBILab, working with Dr. Anubha Gupta at IIIT Delhi. Most of my research there involved developing novel deep learning methods for medical imaging.

I completed my bachelor's degree in Computer Science and Applied Mathematics at Cluster Innovation Centre, University of Delhi, where I was advised by Dr. Samir K. Brahmachari on my semester-long project and mentored by Dr. Shobha Bagai on course projects. During my junior year, I contributed to Cyvlfeat as a Google Summer of Code intern at Portland State University, advised by Simon Niklaus. More details about the project can be found here. I also mentored pre-university students in the Google Code-In program for the Apache Mifos Initiative project. I spent half of my senior year at Pitney Bowes as a Data Science Intern.

Email / CV (SDE) / CV (DS) / Google Scholar / LinkedIn / Github / Medium / Photography / Published Work

Software Projects

Search Engine: The goal of this project was to build a scalable search engine hosted on Amazon AWS, complete with a crawler, an indexer, PageRank, and a front end. Our objectives were (1) a functioning, reasonable search engine that retrieves relevant pages and (2) meaningful indexes and PageRank scores for all crawled web pages. I worked mainly on running and scaling the indexer, and on DevOps for Gradle, EMR, Hadoop, and EMRFS, with minor Hadoop DevOps for PageRank. (Team size: 4)
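
The PageRank step of such a pipeline can be sketched as power iteration over an adjacency list. This is a minimal, single-machine sketch assuming the common damping factor of 0.85 and uniform redistribution of rank from dangling pages (pages with no out-links); it is not the project's EMR/Hadoop implementation, and the function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict {page: [outgoing pages]}.

    Dangling pages (no out-links) spread their rank uniformly over all pages.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank mass held by dangling pages is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

Because every iteration conserves total rank, the scores always sum to 1; in the usage line, page `c`, which both other pages link to, ends up with the highest score.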

Multi-threaded web server and Service framework: A multi-threaded, HTTP/1.1-compliant web server written in Java from scratch, later merged with a custom-built web service framework that emulates the behaviour of Java Spark. Key services implemented: route registration, session/cookie management, filter handling, query parameter handling, and request and response handlers.
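
Route registration with named path parameters, which Spark Java exposes as `get("/hello/:name", ...)`, can be sketched in a few lines. The original framework is written in Java; this Python sketch only illustrates the segment-matching idea, and the `Router` class and its `add`/`match` methods are hypothetical names, not the project's API.

```python
class Router:
    """Minimal route table in the style of Spark Java's named path parameters."""

    def __init__(self):
        self._routes = []  # list of (method, pattern segments, handler)

    def add(self, method, pattern, handler):
        # Store the pattern pre-split so matching is a segment-by-segment compare.
        self._routes.append((method.upper(), pattern.strip("/").split("/"), handler))

    def match(self, method, path):
        """Return (handler, params) for the first matching route, or (None, {})."""
        segments = path.strip("/").split("/")
        for route_method, pattern, handler in self._routes:
            if route_method != method.upper() or len(pattern) != len(segments):
                continue
            params = {}
            for pat, seg in zip(pattern, segments):
                if pat.startswith(":"):
                    params[pat[1:]] = seg  # a named parameter captures the segment
                elif pat != seg:
                    break  # a literal segment differs, so try the next route
            else:
                return handler, params
        return None, {}


router = Router()
router.add("GET", "/hello/:name", lambda params: "Hello, " + params["name"])
handler, params = router.match("GET", "/hello/world")
```

Routes are tried in registration order, so an earlier, more general pattern shadows a later, more specific one; a real framework would also need query-string stripping and percent-decoding before matching.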

Cross Lingual NER using Multilingual Word Embeddings: In this project we implement and evaluate various cross-lingual NER models using bilingual and multilingual word embeddings. Our current method uses a bi-LSTM deep neural network for the NER task. Our reimplementation of the published baseline achieves an F1 score of 54.02 on the test set; with some extensions, we boosted the F1 score to 63.70. (Team size: 4)
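
NER F1 scores are commonly computed at the entity level (CoNLL style), where a predicted entity counts only if its type and exact span match the gold annotation. Assuming that convention (the page does not say which variant was used), the metric can be sketched over BIO tag sequences as follows; `bio_spans` and `entity_f1` are hypothetical helper names.

```python
def bio_spans(tags):
    """Convert a BIO tag sequence (e.g. B-PER, I-PER, O) to a set of (type, start, end) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the current entity continues
        if etype is not None:
            spans.add((etype, start, i))  # close the open entity (end is exclusive)
            etype = None
        if tag.startswith(("B-", "I-")):
            # A B- tag, or an I- tag that does not continue the current entity, opens a span.
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1: a prediction counts only on an exact span-and-type match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Under this metric a correctly bounded entity with the wrong type earns no credit, which is one reason entity-level F1 is usually lower than token-level accuracy.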

Inferring Cuisines from Cooking Recipe Descriptions: In this project, we aim to predict the cuisine of a dish from its ingredients. The data comes from Kaggle (What's cooking?) and consists of ingredient lists (in text form) and the corresponding cuisine. We apply natural language processing techniques to vectorize the ingredient text for downstream tasks such as classification. We find that TF-IDF features with an RBF-kernel SVM yield the highest classification accuracy. (Team size: 3)
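
The TF-IDF representation and the RBF kernel an SVM would apply to it can be sketched without any library. This is an illustrative sketch assuming each ingredient string is treated as a single token; the recipes are made up, the helper names are hypothetical, and training the SVM itself (for example with scikit-learn's `SVC(kernel="rbf")`) is left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, L2-normalised."""
    n = len(docs)
    # Document frequency: in how many recipes each ingredient appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, the form scikit-learn uses by default: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2) between two sparse vectors."""
    sq_dist = sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in set(u) | set(v))
    return math.exp(-gamma * sq_dist)


recipes = [
    ["soy sauce", "ginger", "garlic", "rice"],
    ["soy sauce", "ginger", "scallions", "rice"],
    ["tomato", "basil", "olive oil", "garlic"],
]
vecs = tfidf_vectors(recipes)
```

With these toy recipes, the two that share soy sauce, ginger, and rice get a larger kernel value than either does with the tomato-and-basil recipe; an RBF SVM builds its decision function from exactly these pairwise similarities.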

Reinforcement Learning using the CarRacing-v0 environment from OpenAI Gym: In this project we implement and evaluate various reinforcement learning methods to train an agent for the OpenAI Gym CarRacing-v0 environment. Our current method uses a fully connected Deep Q-Network and achieves an average reward of 210.92 over 10 evaluation steps. Our best-performing model, which contains 70,475 parameters, was trained for 570 episodes. (Team size: 2)
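
The two pieces of DQN training that sit outside the network itself, epsilon-greedy action selection and one-step temporal-difference targets, can be sketched as below. CarRacing-v0 has continuous steering/gas/brake controls, so this sketch assumes they have been discretized into a small set of actions, as a DQN requires; the function names and the discount factor are assumptions, not the project's code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy (highest-Q) one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step DQN targets y = r + gamma * max_a' Q(s', a'), with no bootstrap at episode end."""
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


# A toy mini-batch of two transitions over three discrete actions.
targets = td_targets(
    rewards=[1.0, -0.5],
    next_q_values=[[0.2, 0.8, 0.1], [0.3, 0.0, 0.4]],
    dones=[False, True],
    gamma=0.9,
)
```

The network is then regressed toward these targets on the Q-values of the actions actually taken; in a standard DQN, `next_q_values` come from a periodically synced target network to keep the targets stable.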

Cyvlfeat (Google Summer of Code 2016): Designed and developed 12 new features for Cyvlfeat, a high-performance Python/Cython wrapper of the VLFeat computer vision library. The additions include algorithms for image understanding and for local feature extraction and matching, such as LBP, SIFT, hierarchical k-means, and SLIC.
Progress over the three-month project was documented in this blog.

Research

I'm interested in computer vision, machine learning, statistics, image processing, and computational photography. Much of my research deals with applications of deep learning to medical imaging and geospatial imagery. I have also worked on natural language processing.

LeukoNet: DCT-based CNN architecture for the classification of normal versus Leukemic blasts in B-ALL Cancer
Simmi Mourya*, Sonaal Kant*, Pulkit Kumar*, Ritu Gupta, Anubha Gupta

A deep learning framework for classifying immature leukemic blasts and normal cells by fusing Discrete Cosine Transform (DCT) domain features extracted via CNN with the Optical Density (OD) space features.
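
The two feature spaces named above can be illustrated with their textbook definitions: the orthonormal 2-D DCT-II of an image patch and the Beer-Lambert optical density of a pixel intensity. This is a sketch under stated assumptions (an 8-bit background intensity I0 = 255 and a direct O(n^4) DCT written for clarity rather than speed); it is not LeukoNet's feature extractor, and `dct2` and `optical_density` are hypothetical names.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (list of lists), computed from the definition."""
    n = len(block)

    def alpha(k):
        # Normalisation that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def optical_density(intensity, background=255.0):
    """Beer-Lambert optical density OD = -log10(I / I0), clamping I to at least 1 to avoid log(0)."""
    return -math.log10(max(intensity, 1.0) / background)


coeffs = dct2([[10.0] * 8 for _ in range(8)])
```

For a flat patch, all energy lands in the DC coefficient `coeffs[0][0]` (equal to 8 x 10 = 80 for this 8x8 block) and every other coefficient is zero up to rounding, which is why DCT coefficients separate smooth regions from fine texture. Optical density is 0 for unstained background and grows as staining absorbs more light.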

Classification of normal vs malignant cells in B-ALL white blood cancer microscopic images. IEEE International Symposium on Biomedical Imaging (ISBI) 2019 Challenges
Anubha Gupta, Ritu Gupta, Shiv Gehlot, Simmi Mourya

A challenge on classifying normal versus malignant cells in microscopic images of B-ALL white blood cancer. The challenge can be found here.

C_NMC_2019 Dataset: ALL Challenge dataset of ISBI 2019
Gupta, A., & Gupta, R. (2019). ALL Challenge dataset of ISBI 2019 [Data set]. The Cancer Imaging Archive.

Contributed to the procurement and post-processing of the validation data under the guidance of Dr. Anubha Gupta and Dr. Ritu Gupta. The data is hosted on The Cancer Imaging Archive. This dataset was also used for our IEEE ISBI 2019 conference challenge: Classification of Normal vs Malignant Cells in B-ALL White Blood Cancer Microscopic Images.
