Nilesh Kulkarni

I am a first-year CSE Ph.D. student at the University of Michigan, advised by David Fouhey and Justin Johnson. I previously received my master's degree from Carnegie Mellon University, where I was advised by Abhinav Gupta. Before that, I was an undergraduate in the Computer Science and Engineering department at IIT Bombay.

I have collaborated closely with Shubham Tulsiani and Ishan Misra. I spent two years as a Research Engineer at Samsung AI Research in Seoul, South Korea. My research focuses on understanding and learning the 3D structure of the visual world with minimal supervision.

Email | GitHub | Google Scholar | CV

nileshk AT umich DOT edu


[Feb 2020] A-CSM is accepted at CVPR 2020
[Sep 2019] Started as a Ph.D. student at the University of Michigan
[Jun 2019] Defended my master's thesis at Robotics@CMU
[Jun 2019] Two papers accepted at ICCV 2019


[New] Articulation-aware Canonical Surface Mapping
Nilesh Kulkarni, Abhinav Gupta, David Fouhey, Shubham Tulsiani
CVPR, 2020
pdf   abstract   bibtex   code

We tackle two tasks: 1) predicting a Canonical Surface Mapping (CSM) that maps 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, and we can obtain a supervisory signal by enforcing consistency among the predictions. We present results across a diverse set of animate object categories, showing that our method can learn articulation and CSM prediction from image collections using only foreground mask labels for training. We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing consistency with the predicted CSM is similarly critical for learning meaningful articulation.

  title={Articulation-aware Canonical Surface Mapping},
  author={Kulkarni, Nilesh and Gupta, Abhinav and Fouhey, David and Tulsiani, Shubham},
  booktitle={Computer Vision and Pattern Recognition (CVPR)}

Canonical Surface Mapping via Geometric Cycle Consistency
Nilesh Kulkarni, Abhinav Gupta*, Shubham Tulsiani*
ICCV, 2019
pdf   project page   abstract   bibtex   video   code

We explore the task of Canonical Surface Mapping (CSM). Specifically, given an image, we learn to map pixels on the object to their corresponding locations on an abstract 3D model of the category. But how do we learn such a mapping? A supervised approach would require extensive manual labeling, which does not scale beyond a few hand-picked categories. Our key insight is that the CSM task (pixel to 3D), when combined with 3D projection (3D to pixel), completes a cycle. Hence, we can exploit a geometric cycle consistency loss, allowing us to forgo dense manual supervision. Our approach allows us to train a CSM model for a diverse set of classes without sparse or dense keypoint annotation, using only foreground mask labels for training. We show that our predictions also allow us to infer dense correspondence between two images, and we compare the performance of our approach against several methods that predict correspondence using varying amounts of supervision.

  title={Canonical Surface Mapping via Geometric Cycle Consistency},
  author={Kulkarni, Nilesh and Gupta, Abhinav and Tulsiani, Shubham},
  booktitle={International Conference on Computer Vision (ICCV)}
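A minimal sketch of the pixel to 3D to pixel cycle described in the abstract, assuming a weak-perspective camera (scale, 3x3 rotation, 2D translation) and treating the CSM prediction as a given list of 3D template points, one per foreground pixel. The camera model and the names `project` and `cycle_loss` are illustrative assumptions, not the paper's implementation; the full method also has to predict the camera, among other details omitted here.

```python
# Hypothetical sketch of a geometric cycle-consistency loss (not the authors' code).
# Assumes a weak-perspective camera: scale, 3x3 rotation, 2D translation.

def project(point, scale, rot, trans):
    """Rotate a 3D template point, drop its depth, then scale and translate to pixels."""
    x, y, _depth = (sum(rot[r][c] * point[c] for c in range(3)) for r in range(3))
    return (scale * x + trans[0], scale * y + trans[1])

def cycle_loss(pixels, csm_points, scale, rot, trans):
    """Mean squared distance between each pixel and the reprojection of its CSM point."""
    total = 0.0
    for (u, v), point in zip(pixels, csm_points):
        pu, pv = project(point, scale, rot, trans)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total / len(pixels)

# Usage: CSM points that reproject exactly onto their pixels give zero loss.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
template_points = [(0, 0, 0), (1, 0, 5), (0, 1, -3)]
pixels = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]
print(cycle_loss(pixels, template_points, 2.0, identity, (1.0, 1.0)))  # 0.0
```

The loss needs no keypoints: only the pixel locations, the predicted 3D points, and a camera. In a training setup it would be written in an autodiff framework so that it can supervise both the CSM and the camera predictions.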

3D-RelNet: Joint Object and Relational Network for 3D Prediction
Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
ICCV, 2019
pdf   project page   abstract   bibtex   code

We propose an approach to predict the 3D shape and pose of the objects present in a scene. Existing learning-based methods that pursue this goal make independent predictions per object and do not leverage the relationships among them. We argue that reasoning about these relationships is crucial, and present an approach to incorporate them in a 3D prediction framework. In addition to independent per-object predictions, we predict pairwise relations in the form of relative 3D pose, and demonstrate that these can be easily incorporated to improve object-level estimates. We report performance across different datasets (SUNCG, NYUv2), and show that our approach significantly improves over independent prediction approaches while also outperforming alternative implicit reasoning methods.

  title={3D-RelNet: Joint Object and Relational Network for 3D Prediction},
  author={Kulkarni, Nilesh and Misra, Ishan and Tulsiani, Shubham and Gupta, Abhinav},
  booktitle={International Conference on Computer Vision (ICCV)}
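The abstract states that pairwise relative-pose predictions can be incorporated to improve per-object estimates. One simple way to do this for translations, shown here as an assumption rather than as the paper's exact formulation, is linear least squares: keep each object's translation close to its independent prediction while making pairwise differences match the predicted relative translations. The names `combine_translations`, `solve`, and `weight` are hypothetical; the sketch handles one axis and would be applied once per axis in 3D.

```python
# Hypothetical sketch: fuse per-object and pairwise translation predictions
# along one axis with linear least squares (apply once per axis for 3D).

def solve(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def combine_translations(unary, relations, weight=1.0):
    """Minimize sum_i (t_i - u_i)^2 + weight * sum_(i,j) (t_j - t_i - r_ij)^2.

    unary: independent per-object translation predictions u_i.
    relations: (i, j, r_ij) triples, where r_ij is the predicted offset t_j - t_i.
    """
    n = len(unary)
    a = [[0.0] * n for _ in range(n)]
    b = [float(u) for u in unary]
    for i in range(n):
        a[i][i] = 1.0  # unary term (t_i - u_i)^2
    for i, j, r in relations:
        # Normal-equation contributions of the pairwise residual (t_j - t_i - r).
        a[i][i] += weight
        a[j][j] += weight
        a[i][j] -= weight
        a[j][i] -= weight
        b[i] -= weight * r
        b[j] += weight * r
    return solve(a, b)

# Object 0 is predicted at 0.0 and object 1 at 2.0, but the pair predicts an offset of 1.0.
print(combine_translations([0.0, 2.0], [(0, 1, 1.0)]))  # approximately [1/3, 5/3]
```

The fused offset (4/3) lands between the independent estimate (2.0) and the relative prediction (1.0); `weight` controls how strongly the pairwise predictions pull on the result.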

On-Device Neural Language Model based Word Prediction
Seunghak Yu*, Nilesh Kulkarni*, Haejun Lee, Jihie Kim
COLING: System Demonstrations, 2018
pdf   abstract   bibtex

Recent developments in deep learning applied to language modeling have led to success in text processing, summarization, and machine translation. However, deploying large language models in on-device mobile keyboards is bottlenecked by the devices' limited computational capacity. In this work, we propose an on-device neural language model based word prediction method that optimizes run-time memory and provides real-time prediction. Our model is 7.40 MB in size and has an average prediction time of 6.47 ms. The proposed model outperforms existing word prediction methods in terms of keystroke savings and word prediction rate, and has been successfully commercialized.

  title={On-device neural language model based word prediction},
  author={Yu, Seunghak and Kulkarni, Nilesh and Lee, Haejun and Kim, Jihie},
  booktitle={Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations},

Syllable-level Neural Language Model for Agglutinative Language
Seunghak Yu*, Nilesh Kulkarni*, Haejun Lee, Jihie Kim
EMNLP workshop on Subword and Character level models in NLP (SCLeM), 2017
pdf   abstract   bibtex

Language models for agglutinative languages have long been hindered by the myriad agglutinations that various affixes can produce for any given word. We propose a method to diminish the out-of-vocabulary problem by introducing an embedding derived from syllables and morphemes that leverages the agglutinative property. Our model outperforms character-level embedding in perplexity by 16.87 with 9.50M parameters. The proposed method achieves state-of-the-art performance over existing input prediction methods in terms of keystroke savings and has been commercialized.

  title={Syllable-level neural language model for agglutinative language},
  author={Yu, Seunghak and Kulkarni, Nilesh and Lee, Haejun and Kim, Jihie},
  journal={arXiv preprint arXiv:1708.05515},

Robust kernel principal nested spheres
Suyash Awate*, Manik Dhar*, Nilesh Kulkarni*
ICPR, 2016
pdf   abstract   bibtex

Kernel principal component analysis (kPCA) learns nonlinear modes of variation in the data by nonlinearly mapping the data to kernel feature space and performing (linear) PCA in the associated reproducing kernel Hilbert space (RKHS). However, several widely used Mercer kernels map data to a Hilbert sphere in RKHS. For such directional data in RKHS, linear analyses can be unnatural or suboptimal. Hence, we propose an alternative to kPCA by extending principal nested spheres (PNS) to RKHS without needing the explicit lifting map underlying the kernel, relying solely on the kernel trick. It generalizes the model for the residual errors by penalizing the L_p norm/quasi-norm to enable robust learning from corrupted training data. Our method, termed robust kernel PNS (rkPNS), relies on the Riemannian geometry of the Hilbert sphere in RKHS. Building on rkPNS, we propose novel algorithms for dimensionality reduction and classification (with and without outliers in the training data). Evaluation on real-world datasets shows that rkPNS compares favorably to the state of the art.

  title={Robust kernel principal nested spheres},
  author={Suyash P. Awate and Manik Dhar and Nilesh Kulkarni},
  journal={2016 23rd International Conference on Pattern Recognition (ICPR)},
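The premise that widely used Mercer kernels map data onto a Hilbert sphere can be checked with the kernel trick alone. For the Gaussian (RBF) kernel, k(x, x) = 1, so every feature image φ(x) has unit norm, and the geodesic (great-circle) distance between φ(x) and φ(y) is arccos k(x, y). The sketch below uses hypothetical function names and illustrates only this premise; it is not an implementation of rkPNS.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def feature_norm(x, kernel):
    """Norm of phi(x) in the RKHS, computed as sqrt(k(x, x)) without the explicit map phi."""
    return math.sqrt(kernel(x, x))

def geodesic_distance(x, y, kernel):
    """Angle between phi(x) and phi(y); equals the great-circle distance on the unit sphere."""
    cos_angle = kernel(x, y) / (feature_norm(x, kernel) * feature_norm(y, kernel))
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp guards against rounding

print(feature_norm((3.0, -1.0), gaussian_kernel))  # 1.0: the point maps onto the unit sphere
print(geodesic_distance((0.0, 0.0), (1.0, 0.0), gaussian_kernel))  # arccos(exp(-0.5))
```

Because the Gaussian kernel is always positive, every pairwise angle computed this way is below π/2.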


Electronic apparatus for compressing language model, electronic apparatus for providing recommendation word and operation methods thereof
Seunghak Yu, Nilesh Kulkarni, Haejun Lee
US Patent App. 15/888,442
patent   abstract

An electronic apparatus for compressing a language model is provided, the electronic apparatus including a storage configured to store a language model which includes an embedding matrix and a softmax matrix generated by a recurrent neural network (RNN) training based on basic data including a plurality of sentences, and a processor configured to convert the embedding matrix into a product of a first projection matrix and a shared matrix, the product of the first projection matrix and the shared matrix having a same size as a size of the embedding matrix, and to convert a transposed matrix of the softmax matrix into a product of a second projection matrix and the shared matrix, the product of the second projection matrix and the shared matrix having a same size as a size of the transposed matrix of the softmax matrix, and to update elements of the first projection matrix, the second projection matrix and the shared matrix by performing the RNN training with respect to the first projection matrix, the second projection matrix and the shared matrix based on the basic data.
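The parameter savings from sharing one matrix between the two factorizations can be worked out directly. The abstract does not give matrix shapes, so this sketch assumes (hypothetically) that the embedding matrix and the transposed softmax matrix are both d × V, with V the vocabulary size, and that each is rewritten as a d × k projection matrix times a shared k × V matrix, so the vocabulary-sized factor is stored only once. The helper names and the sizes in the usage lines are made-up illustrative values, not figures from the patent or the COLING paper, and the RNN training that updates the three matrices is not shown.

```python
import random

def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x p) list-of-lists matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def parameter_counts(vocab, dim, rank):
    """Parameters before and after factorizing with a shared (rank x vocab) matrix."""
    original = 2 * dim * vocab                  # embedding + transposed softmax, each dim x vocab
    factorized = 2 * dim * rank + rank * vocab  # two dim x rank projections + one shared matrix
    return original, factorized

# Shape check on tiny matrices: each projection (dim x rank) times the shared matrix
# (rank x vocab) has the same size as the matrix it replaces (dim x vocab).
rng = random.Random(0)
shared = random_matrix(4, 10, rng)
embedding = matmul(random_matrix(6, 4, rng), shared)
softmax_t = matmul(random_matrix(6, 4, rng), shared)
print(len(embedding), len(embedding[0]))   # 6 10

# Illustrative (made-up) sizes: vocab 10,000, dim 600, rank 200.
print(parameter_counts(10_000, 600, 200))  # (12000000, 2240000)
```

Under these assumed shapes, storing the vocabulary-sized factor once is what makes the compression substantial: the cost grows with rank × V instead of 2 × d × V.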