Kushal Kafle Personal Homepage

Kushal Kafle
Kushal Kafle Linkedin Kushal Kafle Google Scholar Kushal Kafle Twitter

Ph.D. Student at Center for Imaging Science, RIT
kk6055_AT_rit.edu

About Me


I am a fourth-year Ph.D. student at the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology. I work in the Machine and Neuromorphic Perception Laboratory (a.k.a. klab), which is directed by my advisor, Dr. Christopher Kanan.

The overarching goal of my research is to develop robust vision and language models. Along the way, I have published both novel datasets and algorithms for VQA (CVPR 2016, CVPR 2018) and analyses of the robustness of VQA algorithms in the presence of dataset bias (ICCV 2017, CVIU 2017). Currently, I am exploring techniques that can help vision and language models go beyond the traditional bounds of classification-based approaches.

I am married to Jwala Dhamala, also a Ph.D. student at RIT, who does fascinating research in machine learning for computational biomedicine.



Kushal Kafle

Timeline Events


May 2018: I was recognized as an outstanding reviewer for CVPR 2018!

Feb 2018: Our new paper "DVQA: Understanding Data Visualizations via Question Answering" was accepted to CVPR 2018!

Jul 2017: Our new paper "An Analysis of Visual Question Answering" was accepted to ICCV 2017! Also available on arXiv.

Jun 2017: Our short paper "Data Augmentation for Visual Question Answering" was accepted to INLG 2017.

Jun 2017: Our Visual Question Answering (VQA) survey paper, "Visual Question Answering: Datasets, Algorithms, and Future Challenges," was accepted to the Computer Vision and Image Understanding (CVIU) journal. Also available on arXiv.

May 2017: Started working as a Research Intern at Adobe Research.

May 2016: My application to the Deep Learning Summer School 2016 was accepted with a scholarship.

Apr 2016: Launched an online web demo for Visual Question Answering. (New! Added a demo for DVQA as well!)

Mar 2016: Our paper "Answer-Type Prediction for Visual Question Answering" was accepted to CVPR 2016!

Mar 2016: Our Amazon Web Services (AWS) grant proposal was accepted. Awarded $15K worth of AWS credits.

Jul 2015: Started working at the Machine and Neuromorphic Perception Laboratory under Dr. Christopher Kanan.

Publications


DVQA

DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle, Scott Cohen, Brian Price, and Christopher Kanan

Bar charts are an effective way for humans to convey information to each other, but today's algorithms cannot parse them. Existing methods fail when faced with minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. DVQA also serves as an important proxy task for several critical AI abilities, such as attention, working memory, visual reasoning, and the ability to handle dynamic and out-of-vocabulary (OOV) labels.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)
Paper Project Page Bibtex
@inproceedings{kafle2018dvqa,
  title={DVQA: Understanding Data Visualizations via Question Answering},
  author={Kafle, Kushal and Cohen, Scott and Price, Brian and Kanan, Christopher},
  booktitle={CVPR},
  year={2018}
}



Different Tasks in VQA

An Analysis of Visual Question Answering Algorithms
Kushal Kafle and Christopher Kanan

Analyzing and comparing different VQA algorithms is notoriously opaque and difficult. In this paper, we analyze existing VQA algorithms using a new dataset that contains over 1.6 million questions organized into 12 different categories including questions that are meaningless for a given image. We also propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.

The IEEE International Conference on Computer Vision (ICCV 2017)
Paper Project Page Bibtex
@inproceedings{kafle2017analysis,
  title={An Analysis of Visual Question Answering Algorithms},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={ICCV},
  year={2017}
}



OVERVIEW

Data Augmentation for Visual Question Answering
Kushal Kafle, Mohammed Yousefhussien and Christopher Kanan

In this short paper, we describe two simple means of producing new training data for visual question answering algorithms. Data augmentation using these methods yields increased performance in both baseline and state-of-the-art VQA algorithms, including a pronounced increase on counting questions, which remain one of the most difficult problems in VQA.

International Natural Language Generation Conference (INLG 2017)
Paper Bibtex
@inproceedings{kafle2017data,
  title={Data Augmentation for Visual Question Answering},
  author={Kafle, Kushal and Yousefhussien, Mohammed and Kanan, Christopher},
  booktitle={INLG},
  year={2017}
}



Common VQA Framework

Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle and Christopher Kanan

Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.

Computer Vision and Image Understanding (CVIU)
Paper Bibtex
@article{kafle2017visual,
  title={Visual question answering: Datasets, algorithms, and future challenges},
  author={Kafle, Kushal and Kanan, Christopher},
  journal={Computer Vision and Image Understanding},
  year={2017}
}



OVERVIEW

Answer-Type Prediction for Visual Question Answering
Kushal Kafle and Christopher Kanan

In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results (at the time of publication) on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016)
Paper Bibtex
@inproceedings{kafle2016answer,
  title={Answer-type prediction for visual question answering},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={CVPR},
  year={2016}
}


MISC


I have an Erdős number!
What follows is some incredibly long-winded collaboration lineage hunting, but you can't deny the facts! I have an Erdős number of 5.
Kushal Kafle -> Christopher Kanan -> Nathan Cahill -> Darren Narayan -> Renu C. Laskar -> Paul Erdős

I have some impressive academic lineage!
Norbert Wiener is my grand-grand-advisor. David Hilbert and Bertrand Russell are my grand-grand-grand-advisors. It only gets better from there! If you follow David Hilbert, you eventually encounter some really big-shot names: Laplace, Fourier, Poisson, Lagrange, Dirichlet. Talk about standing on the shoulders of giants! Here's the link.