A Multimodal LLM-based Assistant for User-Centric Interactive Machine Learning


This paper introduces a novel interactive system powered by a multimodal large language model (MLLM), designed to democratize machine learning development for non-experts. Unlike traditional ML development workflows, our MLLM-based assistant engages users in natural conversation accompanied by visual feedback, helping them articulate their requirements and generate high-quality training data. A key challenge in ML development is that novice users often struggle to create comprehensive, well-balanced training datasets that accurately reflect their intended use cases. Our system addresses this challenge by actively monitoring the user’s interactions and providing real-time guidance, translating abstract user requirements into concrete ML specifications through multimodal dialogue. Our user studies demonstrate that this approach significantly improves training-data quality and leads to more successful ML model development outcomes.