Kushal Kafle Personal Homepage

Kushal Kafle
LinkedIn · Google Scholar · Twitter

Ph.D. Candidate at Center for Imaging Science, RIT
kk6055 (AT) rit.edu

About Me

I am a Ph.D. candidate at the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology. I work in the Machine and Neuromorphic Perception Laboratory (a.k.a. klab), which is directed by my advisor, Dr. Christopher Kanan.

The overarching goal of my research is to develop robust vision and language models. Along the way, I have published on both novel datasets and algorithms for VQA (CVPR 2016, CVPR 2018) and on the robustness of VQA algorithms in the presence of dataset bias (ICCV 2017, CVIU 2017). I am currently exploring models that demonstrate a good understanding of visual content and are right for the right reasons.

I am married to Jwala Dhamala, who is also a Ph.D. student at RIT and does fascinating research in machine learning and computational biomedicine.


Timeline Events

Nov 2018: I successfully defended my thesis proposal (a.k.a. advancement to candidacy).

Nov 2018: Our new paper "TallyQA: Answering Complex Counting Questions" was accepted to AAAI 2019!

July 2018: I am co-organizing the Workshop on Shortcomings in Vision and Language (SiVL) at ECCV 2018.

May 2018: I was recognized as an outstanding reviewer for CVPR 2018!

Feb 2018: Our new paper "DVQA: Understanding Data Visualizations via Question Answering" was accepted to CVPR 2018!

Jul 2017: Our new paper "An Analysis of Visual Question Answering Algorithms" was accepted to ICCV 2017! Also available on arXiv.

Jun 2017: Our short paper "Data Augmentation for Visual Question Answering" was accepted to INLG 2017.

Jun 2017: Our Visual Question Answering (VQA) survey paper, "Visual Question Answering: Datasets, Algorithms, and Future Challenges," was accepted to Computer Vision and Image Understanding (CVIU). Also available on arXiv.

May 2017: Started working as Research Intern at Adobe Research.

May 2016: My application to the Deep Learning Summer School 2016 was accepted with a scholarship.

Apr 2016: Launched an online web demo for Visual Question Answering. (New: added a demo for DVQA as well!)

Mar 2016: Our paper "Answer-Type Prediction for Visual Question Answering" was accepted to CVPR 2016!

Mar 2016: Our Amazon Web Services (AWS) grant proposal was accepted. Awarded $15K worth of AWS credits.

Jul 2015: Started working at the Machine and Neuromorphic Perception Laboratory under Dr. Christopher Kanan.



TallyQA: Answering Complex Counting Questions
Manoj Acharya, Kushal Kafle, and Christopher Kanan

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.

Association for the Advancement of Artificial Intelligence (AAAI 2019)
Paper Project Page Bibtex
@inproceedings{acharya2019tallyqa,
  title={TallyQA: Answering Complex Counting Questions},
  author={Acharya, Manoj and Kafle, Kushal and Kanan, Christopher},
  booktitle={AAAI},
  year={2019}
}
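The counting approach above pairs a relation network with region proposals: every pair of proposed regions is scored jointly with the question, and the pairwise responses are pooled. The sketch below illustrates that core idea with NumPy; the dimensions, the ReLU/linear layers, and the sum-pooling are illustrative placeholders, not the paper's exact architecture.

```python
import numpy as np

def relation_count_score(regions, question_emb, W1, W2):
    """Score every ordered pair of region features jointly with the
    question embedding, then pool the pairwise responses -- the core
    relation-network-over-proposals idea.
    regions: (N, D) array of region-proposal features
    question_emb: (Q,) question embedding
    W1: (2*D + Q, H) hidden-layer weights; W2: (H, 1) output weights
    """
    n = regions.shape[0]
    pair_scores = []
    for i in range(n):
        for j in range(n):
            # each (region_i, region_j, question) triple is processed
            # by the same small network (g_theta in relation networks)
            pair = np.concatenate([regions[i], regions[j], question_emb])
            h = np.maximum(pair @ W1, 0.0)  # ReLU hidden layer
            pair_scores.append(h @ W2)
    # pooling stage (f_phi) simplified here to a plain sum
    return float(np.sum(pair_scores))
```

Because only region proposals (rather than every spatial grid cell) are paired, the number of pairs stays small even for high-resolution images, which is what makes the relation-network machinery tractable in this setting.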


DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan

Bar charts are an effective way for humans to convey information to each other, but today's algorithms cannot parse them. Existing methods fail when faced with minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. DVQA also serves as an important proxy task for several critical AI abilities, such as attention, working memory, visual reasoning, and the ability to handle dynamic and out-of-vocabulary (OOV) labels.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)
Paper Project Page Bibtex
@inproceedings{kafle2018dvqa,
  title={DVQA: Understanding Data Visualizations via Question Answering},
  author={Kafle, Kushal and Price, Brian and Cohen, Scott and Kanan, Christopher},
  booktitle={CVPR},
  year={2018}
}
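The "dynamic and out-of-vocabulary labels" challenge comes from bar charts whose axis and legend labels may never appear in training. One general way to handle this is a chart-local dynamic dictionary that maps unseen words to a small set of slot tokens. The sketch below shows that general idea; the function name, slot scheme, and vocabulary layout are assumptions for illustration, not the paper's exact encoding.

```python
def encode_with_dynamic_slots(question_words, global_vocab, n_slots=4):
    """Encode a question word-by-word, mapping words missing from the
    global vocabulary to chart-local slot tokens so a model can refer
    to bar labels it has never seen during training.
    global_vocab: dict word -> id, with id 0 reserved for <unk>.
    Returns (token ids, the chart-local word -> slot-id mapping)."""
    local = {}
    ids = []
    for w in question_words:
        if w in global_vocab:
            ids.append(global_vocab[w])
        else:
            # assign the next free slot id after the global vocabulary
            if w not in local and len(local) < n_slots:
                local[w] = len(global_vocab) + len(local)
            ids.append(local.get(w, 0))  # fall back to <unk> if slots run out
    return ids, local
```

At answer time the same local mapping can be inverted, letting the model output a chart-specific label as its answer rather than being restricted to a fixed answer vocabulary.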

Different Tasks in VQA

An Analysis of Visual Question Answering Algorithms
Kushal Kafle and Christopher Kanan

Analyzing and comparing different VQA algorithms is notoriously difficult. In this paper, we analyze existing VQA algorithms using a new dataset that contains over 1.6 million questions organized into 12 categories, including questions that are meaningless for a given image. We also propose new evaluation schemes that compensate for over-represented question types and make it easier to study the strengths and weaknesses of algorithms. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and show how simple models (e.g., an MLP) can surpass more complex ones (e.g., MCB) simply by learning to answer large, easy question categories.

The IEEE International Conference on Computer Vision (ICCV 2017)
Paper Project Page Bibtex
@inproceedings{kafle2017analysis,
  title={An Analysis of Visual Question Answering Algorithms},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={ICCV},
  year={2017}
}
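The evaluation idea in the abstract (compensating for over-represented question types) boils down to computing accuracy within each question category and then averaging the per-category accuracies, so one huge, easy category cannot dominate the score. A minimal sketch (the category names in the example are placeholders, not the dataset's twelve):

```python
from collections import defaultdict

def mean_per_type_accuracy(predictions, answers, qtypes):
    """Arithmetic mean-per-type accuracy: accuracy is computed within
    each question category, then averaged across categories, giving
    every category equal weight regardless of its size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, ans, qt in zip(predictions, answers, qtypes):
        total[qt] += 1
        correct[qt] += int(pred == ans)
    per_type = [correct[qt] / total[qt] for qt in total]
    return sum(per_type) / len(per_type)
```

Compare with plain overall accuracy: a model that aces a category holding 80% of the questions but fails everywhere else scores near 0.8 overall, yet much lower under a mean-per-type scheme, exposing exactly the "large, easy category" effect the abstract describes.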


Data Augmentation for Visual Question Answering
Kushal Kafle, Mohammed Yousefhussien, and Christopher Kanan

In this short paper, we describe two simple methods of producing new training data for visual question answering algorithms. Data augmentation using these methods yields increased performance for both baseline and state-of-the-art VQA algorithms, including a pronounced increase on counting questions, which remain one of the most difficult problems in VQA.

International Natural Language Generation Conference (INLG 2017)
Paper Bibtex
@inproceedings{kafle2017data,
  title={Data Augmentation for Visual Question Answering},
  author={Kafle, Kushal and Yousefhussien, Mohammed and Kanan, Christopher},
  booktitle={INLG},
  year={2017}
}

Common VQA Framework

Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle and Christopher Kanan

Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.

Computer Vision and Image Understanding (CVIU)
Paper Bibtex
@article{kafle2017visual,
  title={Visual question answering: Datasets, algorithms, and future challenges},
  author={Kafle, Kushal and Kanan, Christopher},
  journal={Computer Vision and Image Understanding},
  year={2017}
}


Answer-Type Prediction for Visual Question Answering
Kushal Kafle and Christopher Kanan

In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results (at the time of publication) on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016)
Paper Bibtex
@inproceedings{kafle2016answer,
  title={Answer-type prediction for visual question answering},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={CVPR},
  year={2016}
}
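The key insight above — predicting the form of the answer from the question — can be expressed as marginalizing over answer types: P(answer | question, image) = Σ_t P(answer | type t, question, image) · P(type t | question). The sketch below shows that mixture computation; the function name and array shapes are illustrative, not the paper's exact Bayesian model.

```python
import numpy as np

def type_marginalized_answer_scores(p_type_given_q, p_ans_given_type):
    """Combine per-type answer distributions using the predicted
    answer-type distribution:
        P(answer) = sum_t P(answer | type t) * P(type t | question)
    p_type_given_q: (T,) distribution over answer types from the question
    p_ans_given_type: (T, A) answer distribution under each type
    Returns an (A,) mixture distribution over answers."""
    # weighted mixture: each type's answer distribution is weighted by
    # how likely the question is to call for that answer type
    return p_type_given_q @ p_ans_given_type
```

Taking the argmax of the returned scores picks the final answer; because the type posterior concentrates mass on compatible answer types (e.g., a "how many" question favors number answers), implausible answers are suppressed before the final decision.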


I have an Erdős number of four!
What follows is some long-winded collaboration-lineage hunting, but you can't deny the facts: I have an Erdős number of 4.
Kushal Kafle -> Christopher Kanan -> Kostas Daniilidis -> Pavel Valtr -> Paul Erdős
Here's the link.

I have some impressive academic lineage!
Norbert Wiener is my grand-grand-advisor. David Hilbert and Bertrand Russell are my grand-grand-grand-advisors. It only gets better from there! If you follow David Hilbert's lineage, you eventually encounter some really big-shot names: Laplace, Fourier, Poisson, Lagrange, Dirichlet. Talk about standing on the shoulders of giants! Here's the link.