Resource-constrained image generation and visual understanding: an interview with Aniket Roy

In the latest in our series of interviews meeting the AAAI/SIGAI Doctoral Consortium participants, we caught up with Aniket Roy to find out more about his research on generative models for computer vision tasks.

Tell us a bit about your PhD – where did you study, and what was the topic of your research?

I recently completed my PhD in Computer Science at Johns Hopkins University, where I worked under the supervision of Bloomberg Distinguished Professor Rama Chellappa. My research primarily focused on developing methods for resource-constrained image generation and visual understanding. In particular, I explored how modern generative models can be adapted to operate efficiently while maintaining strong performance.

During my PhD, I worked broadly at the intersection of generative AI, multimodal learning, and few-shot learning. Much of my work involved designing methods that enable models to learn new concepts or perform complex visual tasks with limited data or computational resources. This included research on diffusion models, personalized image generation, and multimodal representation learning. Overall, my work aims to make advanced vision and generative AI systems more adaptable, efficient, and practical for real-world applications.

Could you give us an overview of the research you carried out during your PhD?

During my PhD, my research broadly focused on improving the adaptability, efficiency, and quality of modern generative models for computer vision tasks. The rapid progress in generative AI, particularly diffusion models and vision-language models, has created new opportunities to address long-standing challenges such as data scarcity, controllable generation, and personalized image synthesis. My work aimed to develop methods that allow these large models to adapt effectively with limited data and computational resources while maintaining high visual fidelity.

One line of my research addressed learning in data-constrained settings. For example, I proposed FeLMi, a few-shot learning framework that leverages uncertainty-guided hard mixup strategies to improve robustness and generalization when only a small number of labeled samples are available. Building on this idea of improving training data quality, I also developed Cap2Aug, which introduces caption-guided multimodal augmentation. This approach uses textual descriptions to guide synthetic image generation, improving visual diversity while reducing the domain gap between real and generated data.

Overview of Cap2Aug.
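
As a rough illustration of the hard-mixup idea mentioned for FeLMi, the sketch below combines standard mixup (a convex blend of two samples and their labels) with an entropy-based ranking that picks the samples a classifier is least sure about. This is an assumption-laden toy, not the FeLMi procedure itself: the function names, the use of predictive entropy as the uncertainty measure, and the fixed mixing coefficient are all hypothetical choices made for the example.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    eps = 1e-12  # avoids log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_hard(probs, k):
    """Indices of the k samples with the highest predictive entropy (assumed notion of 'hard')."""
    return np.argsort(-predictive_entropy(probs))[:k]

def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two inputs and their one-hot labels."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

# Toy usage: four 8-dimensional feature vectors from two classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
labels = np.eye(2)[[0, 1, 0, 1]]
x_mix, y_mix = mixup(features[0], labels[0], features[1], labels[1], lam=0.7)

# Hypothetical classifier outputs; the first row is the most uncertain.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]])
hard_idx = select_hard(probs, k=1)
```

In a few-shot setting, only the mixed samples ranked as hard would be added to the training set, so that the extra data concentrates near the decision boundary.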

Another aspect of my research focused on improving the perceptual quality of images generated by diffusion models. In this direction, I proposed DiffNat, a plug-and-play regularization method based on the kurtosis-concentration property observed in natural images. By incorporating this principle into diffusion models through a KC loss, the generated images exhibit more natural texture statistics and improved perceptual realism, which also benefits downstream vision tasks.

A major part of my work explored personalization and efficient adaptation of large generative models. I introduced DuoLoRA, a parameter-efficient framework for composing low-rank adapters that enables fine-grained control over content and style without requiring full retraining of the base model. I further extended personalization to zero-shot settings using a training-free textual inversion approach that allows arbitrary objects to be customized directly during generation. Finally, I proposed MultiLFG, a frequency-guided multi-LoRA composition framework that uses wavelet-domain representations and timestep-aware weighting to enable accurate and training-free fusion of multiple concepts in diffusion models.

Overview of DuoLoRA.
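
For readers unfamiliar with low-rank adapters, the sketch below shows the simplest way two of them can be combined: each adapter contributes a low-rank update B·A to a frozen weight matrix, and the updates are added with scalar weights. This plain weighted sum is only a common baseline, shown here under the assumption of one linear layer; it is not the DuoLoRA or MultiLFG composition scheme, and all names and shapes are hypothetical.

```python
import numpy as np

def lora_delta(B, A):
    """Low-rank weight update: B (d_out x r) times A (r x d_in)."""
    return B @ A

def compose_loras(W0, adapters, weights):
    """Add a weighted sum of low-rank updates to the frozen base weights W0."""
    W = W0.astype(float)  # astype returns a copy, so W0 is left untouched
    for (B, A), w in zip(adapters, weights):
        W += w * lora_delta(B, A)
    return W

# Toy usage: a 4x6 base layer with a rank-2 "content" and a rank-2 "style" adapter.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(4, 6))
content = (rng.normal(size=(4, 2)), rng.normal(size=(2, 6)))
style = (rng.normal(size=(4, 2)), rng.normal(size=(2, 6)))

W_merged = compose_loras(W0, [content, style], weights=[1.0, 0.5])
```

Only the small B and A matrices are trained and stored per concept, which is what makes this family of methods parameter-efficient. Naive sums like this one can let adapters interfere with each other, which is the kind of problem more careful composition methods aim to address.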

Overall, my research contributes towards building generative systems that are more efficient, adaptable, and controllable, enabling high-quality image generation and understanding even in data-limited or resource-constrained scenarios.

Was there a project or aspect of your research that was particularly interesting?

One project that I found particularly interesting during my PhD is DiffNat, which was published in TMLR 2025. Diffusion models have become the backbone of many modern generative AI systems and have achieved impressive results in generating and editing realistic images. However, improving the perceptual quality and naturalness of generated images remains an important challenge.

Overview of DiffNat.

In this work, we introduced a simple but effective regularization technique called the kurtosis concentration (KC) loss, which can be integrated into standard diffusion model pipelines as a plug-and-play component. The idea was inspired by a statistical property of natural images: when an image is decomposed into different band-pass filtered versions (for example, using the Discrete Wavelet Transform), the kurtosis values across these frequency bands tend to be relatively consistent. In contrast, generated images often show large discrepancies across these bands. Our method reduces the gap between the highest and lowest kurtosis values across the frequency components, encouraging the generated images to follow more natural image statistics.
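
The quantity described above, the gap between the highest and lowest kurtosis across band-pass versions of an image, can be sketched in a few lines. The example below is a minimal illustration under several assumptions that are not taken from the paper: it uses a single-level Haar wavelet transform written by hand, treats only the three detail sub-bands as the band-pass components, works on a grayscale array with even height and width, and uses NumPy rather than a differentiable framework. The function names are hypothetical.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an even-sized grayscale image.

    Returns the approximation band and three detail bands
    (differences across columns, across rows, and along the diagonal).
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    d_cols = (a - b + c - d) / 2.0
    d_rows = (a + b - c - d) / 2.0
    d_diag = (a - b - c + d) / 2.0
    return approx, d_cols, d_rows, d_diag

def kurtosis(x):
    """Pearson kurtosis: fourth central moment divided by squared variance."""
    centered = x.ravel() - x.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / (var ** 2 + 1e-12)

def kc_gap(img):
    """Difference between the largest and smallest kurtosis over the detail bands."""
    _, d_cols, d_rows, d_diag = haar_dwt2(img)
    values = np.array([kurtosis(band) for band in (d_cols, d_rows, d_diag)])
    return values.max() - values.min()

# Toy usage: Gaussian noise has kurtosis close to 3 in every band, so the gap is small.
rng = np.random.default_rng(0)
noise = rng.normal(size=(256, 256))
gap = kc_gap(noise)
```

Used as a regularizer, a term like this would be added to the usual diffusion training objective with a small weight, so that generated images are nudged towards a smaller gap.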

In addition, we introduced a condition-agnostic perceptual guidance strategy during inference that further improves image fidelity without requiring additional training signals. We evaluated the method across several diverse tasks, including personalized few-shot finetuning with text guidance, unconditional image generation, image super-resolution, and blind face restoration. Across these tasks, incorporating the KC loss and perceptual guidance consistently improved perceptual quality, measured by metrics such as FID and MUSIQ, as well as by human evaluation.

What I particularly liked about this project is that it connects classical image statistics with modern diffusion models. It shows that relatively simple statistical insights about natural images can still play a powerful role in improving large generative models.

What are your plans for building on the PhD – where are you working now and what will you be investigating next?

During my PhD, I discovered that I genuinely enjoy the process of research, especially the moment when an intuition or idea turns out to work in practice. That process of exploring new ideas and pushing the boundaries of what we know is something I find very motivating.

To continue pursuing this, I will be joining NEC Laboratories America as a Research Scientist. In this role, I hope to build on my PhD work by developing new methods for generative models and exploring how these models can interact with broader multimodal systems. In particular, I’m interested in advancing research at the intersection of generative models, vision-language-action models, and embodied AI. More broadly, my goal is to contribute to the development of intelligent systems that can understand, generate, and interact with the visual world more effectively, while also continuing to push forward the scientific understanding of these models.

I’m interested in how you got into the field. What inspired you to study computer vision and machine learning?

My interest in computer vision and machine learning started during my undergraduate studies, when I took courses in signal processing and image processing. I found these subjects particularly fascinating because they allowed you to experiment with algorithms and immediately see their effects on images. That visual and intuitive aspect made the field very engaging, and it helped me appreciate how mathematical concepts can directly translate into meaningful visual results.

At the same time, I was also curious about how the human brain processes visual information: how we’re able to recognize objects, understand scenes, and interpret complex visual signals so effortlessly. That curiosity led me to wonder whether we could design computational models that mimic aspects of human perception and enable machines to understand visual data in a similar way.

A major influence during this time was my professor, Dr. Kuntal Ghosh, who encouraged me to think more deeply about these problems and approach them with a scientific mindset. His mentorship played an important role in shaping my interest in research. Since then, that curiosity about visual perception and intelligent systems has continued to drive my work in computer vision and machine learning.

What was your experience of the Doctoral Consortium at AAAI?

Unfortunately, I was not able to attend the AAAI Doctoral Consortium in person due to visa-related issues. However, a colleague kindly helped present my poster on my behalf during the event. Even though I couldn’t be there physically, I was very encouraged by the response my work received. Several researchers reached out to me after seeing the poster, and we had some very insightful discussions about the ideas and potential future directions of the research. In that sense, I still found the experience quite rewarding. The Doctoral Consortium is a great platform for sharing early-stage ideas, receiving feedback from the community, and connecting with other researchers working on related problems. I appreciated the opportunity to engage with people who were interested in the work, and those interactions helped spark new perspectives and collaborations.

Could you tell us an interesting (non-AI related) fact about you?

Outside of research, I’m a big fan of music and stand-up comedy, and I really enjoy traveling whenever I get the chance. Exploring new places, cultures, and perspectives is something I find refreshing; it’s a great way to recharge and stay curious about the world beyond work. I also enjoy writing poetic satire from time to time, and I occasionally perform it. It’s a fun creative outlet that allows me to mix humor and storytelling, which is quite different from the analytical nature of the research work I usually do.

About Aniket Roy

Aniket is currently a Research Scientist at NEC Labs America. He obtained his PhD from the Computer Science department at Johns Hopkins University under the guidance of Bloomberg Distinguished Professor Rama Chellappa. Prior to that, he did a Master’s at the Indian Institute of Technology Kharagpur. He was recognized with the Best Paper Award at IWDW 2016 and the Markose Thomas Memorial Award for the best research paper at the Master’s level. During his PhD, he explored few-shot learning, multimodal learning, diffusion models, LLMs, and LoRA merging, with publications in leading venues such as NeurIPS, ICCV, TMLR, WACV, and CVPR, and 3 US patents filed. He also gained industrial experience through several internships at Amazon, Qualcomm, MERL, and SRI International. He was named an Amazon Fellow (2023-24) at JHU and selected to participate in the ICCV’25 and AAAI’26 doctoral consortia.

AIhub is a non-profit dedicated to connecting the AI community to the public by providing free, high-quality information in AI.
