Rajvardhan Oak

About Me

Hi! My name is Raj, and I am a graduate student at UC Berkeley. I obtained my undergraduate degree in Computer Science from the University of Pune, India, in 2018. Presently, I am pursuing my master's degree in Information Management Systems at the School of Information at UC Berkeley. My interests are data science, machine learning, and computer security. I am advised by Prof. Dawn Song and Prof. Hany Farid. I also work as a teaching assistant with Prof. Zachary Pardos. I have been extremely fortunate to have worked with Dr. Sadia Afroz.

I like using my data science and coding skills to solve real-world problems and uncover hidden patterns in large datasets. I have experience working with natural language processing, log analysis, malware detection, and adversarial machine learning. I have previously worked on problems such as fake news detection and hate speech classification.

Presently, I am working with Facebook AI Research to develop a zero-shot learning embedding space for hoax articles and to build classifiers for fake news detection.


Publications


Research

I am interested in the intersection of machine learning and security. I like to explore problems such as vulnerability detection, anomaly detection, and intrusion detection using machine learning. Recently, I have also become interested in using machine learning to combat disinformation online. This is a broad interest that covers deep fakes, hoax news, authorship fake news, bot accounts on social media, etc.
Here are some of my recent projects:

Malware Detection in Highly Imbalanced Datasets. Malware is a malicious program that can steal sensitive information (financial details, passwords) and send it to adversaries, or use system resources for an adversary’s benefit (e.g. bitcoin mining). In the real world, the proportion of labeled malware samples ranges from about 0.1% to 2%, leading to highly imbalanced datasets, which is a problem for machine learning methods. My research addressed the problem of high imbalance in malware detection. Inspired by the results shown by transfer learning in NLP and image processing tasks, we use a model called BERT, which relies on extensive pretraining to generate contextual embeddings. Using BERT, we are able to detect malware samples with an F-1 score of more than 90%, even when our data contains only 5% malware samples. An interesting finding from our research is that pretraining on NLP data helps improve the F-1 score by 3%.

Lifelong Anomaly Detection. Anomaly detection is essential to ensuring system security and reliability. We find that existing approaches cannot easily adopt new knowledge, such as newly labeled samples, to improve system performance. We propose novel approaches to handle the challenges associated with lifelong anomaly detection. In particular, we propose a framework called unlearning, which can effectively correct the model when a false negative (or a false positive) is labeled. To this end, we develop several novel techniques to tackle two challenges referred to as exploding loss and catastrophic forgetting. We evaluate our approach using two state-of-the-art zero-positive deep learning anomaly detection architectures and three real-world tasks. The results show that the proposed approach is able to significantly reduce the number of false positives and false negatives through unlearning.


Teaching

Fall 2019

Spring 2019

Fall 2018