Kushal Kafle Personal Homepage

Kushal Kafle

Ph.D. Candidate at Center for Imaging Science, RIT
kk6055 (AT) rit.edu

About Me


I am a Ph.D. candidate in the Chester F. Carlson Center for Imaging Science at Rochester Institute of Technology. I work at the Machine and Neuromorphic Perception Laboratory (a.k.a. klab), which is directed by my advisor, Dr. Christopher Kanan.

The overarching goal of my research is to develop robust vision and language models. Along the way, I have published both novel datasets and algorithms for VQA (CVPR 2016, CVPR 2018, CVPR 2019) and analyses of the robustness of VQA algorithms in the presence of dataset bias (ICCV 2017, CVIU 2017). I am currently exploring models that demonstrate a good understanding of visual content and are right for the right reasons.




Timeline Events


Sept 2019: Our paper describing a novel state-of-the-art chart question answering algorithm was accepted to appear at WACV.

Mar 2019: I will be working as a research intern at Microsoft Research, Redmond this summer.

Mar 2019: Our new paper "Answer Them All! Toward Universal Visual Question Answering Models" was accepted to CVPR 2019!

Jan 2019: I am co-organizing the second edition of the "Workshop on Shortcomings in Vision and Language (SiVL)" at NAACL 2019.

Nov 2018: I successfully defended my thesis proposal (a.k.a. advancement to candidacy).

Nov 2018: Our new paper "TallyQA: Answering Complex Counting Questions" was accepted to AAAI 2019!

Jul 2018: I am co-organizing the "Workshop on Shortcomings in Vision and Language (SiVL)" at ECCV 2018.

May 2018: I was recognized as an outstanding reviewer for CVPR 2018!

Feb 2018: Our new paper "DVQA: Understanding Data Visualizations via Question Answering" was accepted to CVPR 2018!

Jul 2017: Our new paper "An Analysis of Visual Question Answering Algorithms" was accepted to ICCV 2017! Also available on arXiv.

Jun 2017: Our short paper "Data Augmentation for Visual Question Answering" was accepted to INLG 2017.

Jun 2017: Our Visual Question Answering (VQA) survey paper, "Visual Question Answering: Datasets, Algorithms, and Future Challenges", was accepted to the journal Computer Vision and Image Understanding (CVIU). Also available on arXiv.

May 2017: Started working as a research intern at Adobe Research.

May 2016: My application to the Deep Learning Summer School 2016 was accepted with a scholarship.

Apr 2016: Launched an online web demo for Visual Question Answering. (New: added a demo for DVQA as well!)

Mar 2016: Our paper "Answer-Type Prediction for Visual Question Answering" was accepted to CVPR 2016!

Mar 2016: Our Amazon Web Services (AWS) grant proposal was accepted. Awarded $15K worth of AWS credits.

Jul 2015: Started working at the Machine and Neuromorphic Perception Laboratory under Dr. Christopher Kanan.

Publications




REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes*, Kushal Kafle*, Robik Shrestha*, Manoj Acharya, and Christopher Kanan
* denotes equal contribution

In lifelong machine learning, an agent must be incrementally updated with new knowledge, instead of having distinct train and deployment phases. For incrementally training convolutional neural network models, prior work has enabled replay by storing raw images, but this is memory intensive and not ideal for embedded agents. Here, we propose REMIND, a tensor quantization approach that enables efficient replay with tensors. Unlike other methods, REMIND is trained in a streaming manner, meaning it learns one example at a time rather than in large batches containing multiple classes. Our approach achieves state-of-the-art results for incremental class learning on the ImageNet-1K dataset. We demonstrate REMIND's generality by pioneering multi-modal incremental learning for visual question answering (VQA), which cannot be readily done with comparison models.

Under Review, available on arXiv (2019)
Paper Bibtex
@article{hayes2019remind,
  title={REMIND Your Neural Network to Prevent Catastrophic Forgetting},
  author={Hayes, Tyler L and Kafle, Kushal and Shrestha, Robik and Acharya, Manoj and Kanan, Christopher},
  journal={arXiv preprint arXiv:1910.02509},
  year={2019}
}


Parallel Recurrent Fusion for Chart Question Answering

Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kushal Kafle, Robik Shrestha, and Christopher Kanan

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.

IEEE Winter Conference on Applications of Computer Vision (WACV 2020)
Paper Bibtex
@inproceedings{kafle2020prefil,
  title={Answering Questions about Data Visualizations using Efficient Bimodal Fusion},
  author={Kafle, Kushal and Shrestha, Robik and Kanan, Christopher},
  booktitle={WACV},
  year={2020}
}



Bias amplification in VQA

Challenges and Prospects in Vision and Language Research
Kushal Kafle, Robik Shrestha, and Christopher Kanan

Language-grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of capabilities that integrate computer vision, reasoning, and natural language understanding. However, rather than behaving as visual Turing tests, recent studies have demonstrated that state-of-the-art systems are achieving good performance by exploiting flaws in datasets and evaluation procedures. We review the current state of affairs and outline a path forward.

Frontiers in Artificial Intelligence - Language and Computation (Accepted, 2019)
Paper Bibtex
@article{kafle2019challenges,
  title={Challenges and Prospects in Vision and Language Research},
  author={Kafle, Kushal and Shrestha, Robik and Kanan, Christopher},
  journal={arXiv preprint arXiv:1904.09317},
  year={2019}
}



RAMEN

Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha, Kushal Kafle, and Christopher Kanan

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Paper Code (Coming Soon) Bibtex
@inproceedings{shrestha2019ramen,
  title={Answer Them All! Toward Universal Visual Question Answering Models},
  author={Shrestha, Robik and Kafle, Kushal and Kanan, Christopher},
  booktitle={CVPR},
  year={2019}
}



TallyQA

TallyQA: Answering Complex Counting Questions
Manoj Acharya, Kushal Kafle, and Christopher Kanan

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.

Association for the Advancement of Artificial Intelligence (AAAI 2019)
Paper Project Page Bibtex
@inproceedings{acharya2019tallyqa,
  title={TallyQA: Answering Complex Counting Questions},
  author={Acharya, Manoj and Kafle, Kushal and Kanan, Christopher},
  booktitle={AAAI},
  year={2019}
}



DVQA

DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle, Brian Price, Scott Cohen, and Christopher Kanan

Bar charts are an effective way for humans to convey information to each other, but today's algorithms cannot parse them. Existing methods fail when faced with minor variations in appearance. Here, we present DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework. Unlike visual question answering (VQA), DVQA requires processing words and answers that are unique to a particular bar chart. State-of-the-art VQA algorithms perform poorly on DVQA, and we propose two strong baselines that perform considerably better. DVQA also serves as an important proxy task for several critical AI abilities, such as attention, working memory, visual reasoning, and the ability to handle dynamic and out-of-vocabulary (OOV) labels.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)
Paper Project Page Bibtex
@inproceedings{kafle2018dvqa,
  title={DVQA: Understanding Data Visualizations via Question Answering},
  author={Kafle, Kushal and Price, Brian and Cohen, Scott and Kanan, Christopher},
  booktitle={CVPR},
  year={2018}
}



Different Tasks in VQA

An Analysis of Visual Question Answering Algorithms
Kushal Kafle and Christopher Kanan

Analyzing and comparing different VQA algorithms is notoriously opaque and difficult. In this paper, we analyze existing VQA algorithms using a new dataset that contains over 1.6 million questions organized into 12 different categories including questions that are meaningless for a given image. We also propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.

The IEEE International Conference on Computer Vision (ICCV 2017)
Paper Project Page Bibtex
@inproceedings{kafle2017analysis,
  title={An Analysis of Visual Question Answering Algorithms},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={ICCV},
  year={2017}
}
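
The idea of compensating for over-represented question types, mentioned in the abstract above, can be illustrated with a mean-per-type (MPT) accuracy. This is only a minimal sketch, assuming each result is a (question_type, is_correct) pair; the function names and toy data are hypothetical and are not taken from the paper's released code.

```python
from collections import defaultdict
from statistics import harmonic_mean, mean


def per_type_accuracy(results):
    """Return accuracy computed separately for each question type."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for qtype, is_correct in results:
        totals[qtype] += 1
        correct[qtype] += int(is_correct)
    return {t: correct[t] / totals[t] for t in totals}


def summarize(results):
    """Compare plain accuracy with arithmetic and harmonic mean-per-type."""
    acc = list(per_type_accuracy(results).values())
    overall = sum(int(c) for _, c in results) / len(results)
    return {
        "overall": overall,               # dominated by frequent types
        "arithmetic_mpt": mean(acc),      # every type weighted equally
        "harmonic_mpt": harmonic_mean(acc),  # penalizes weak types more
    }


# Hypothetical toy data: a large, easy type and a small, hard type.
toy = [("color", True)] * 8 + [("counting", True), ("counting", False)]
summary = summarize(toy)
print(summary)
```

In this toy case the plain accuracy looks high because the easy type dominates, while both per-type means drop, which is the effect such metrics are meant to expose.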




Data Augmentation for Visual Question Answering
Kushal Kafle, Mohammed Yousefhussien, and Christopher Kanan

In this short paper, we describe two simple means of producing new training data for visual question answering algorithms. Data augmentation using these methods shows increased performance in both baseline and state-of-the-art VQA algorithms, including a pronounced increase on counting questions, which remain one of the most difficult problems in VQA.

International Natural Language Generation Conference (INLG 2017)
Paper Bibtex
@inproceedings{kafle2017data,
  title={Data Augmentation for Visual Question Answering},
  author={Kafle, Kushal and Yousefhussien, Mohammed and Kanan, Christopher},
  booktitle={INLG},
  year={2017}
}



Common VQA Framework

Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle and Christopher Kanan

Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.

Computer Vision and Image Understanding (CVIU)
Paper Bibtex
@article{kafle2017visual,
  title={Visual question answering: Datasets, algorithms, and future challenges},
  author={Kafle, Kushal and Kanan, Christopher},
  journal={Computer Vision and Image Understanding},
  year={2017}
}




Answer-Type Prediction for Visual Question Answering
Kushal Kafle and Christopher Kanan

In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results (at the time of publication) on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.

IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016)
Paper Bibtex
@inproceedings{kafle2016answer,
  title={Answer-type prediction for visual question answering},
  author={Kafle, Kushal and Kanan, Christopher},
  booktitle={CVPR},
  year={2016}
}


MISC


I have an Erdős number of four!
What follows is some long-winded collaboration lineage hunting, but you can't deny the facts! I have an Erdős number of 4.
Kushal Kafle -> Christopher Kanan -> Kostas Daniilidis -> Pavel Valtr -> Paul Erdős
Here's the link.

I have some impressive academic lineage!
Norbert Wiener is my grand-grand-advisor. David Hilbert and Bertrand Russell are my grand-grand-grand-advisors. It only gets better from there! If you follow David Hilbert, you eventually encounter some really big-shot names: Laplace, Fourier, Poisson, Lagrange, Dirichlet. Talk about standing on the shoulders of giants! Here's the link.