ExpertSUM: Expert-Level Text Summarization from Fine-Grained Multimedia Analytics
Special Session at MMM 2025
Nara, Japan


In recent years, video data has been increasingly used for safety management and operational efficiency across a variety of industries, including mobility, disaster prevention, manufacturing, healthcare, food computing, and retail. Manually checking videos and writing expert-quality reports takes an enormous number of hours. Recently, generative AI, including Large Language Models (LLMs) and Vision-and-Language Models (VLMs), has attracted enormous public interest, as users can interact with AI chatbots and ask open-ended questions about long documents. Although generative AI can now produce explanatory text for still images, the generated text does not reach the level of detail required by industry. For example, it is difficult with current VLMs to generate a vehicle collision report that a property insurance company could use to determine the percentage of fault.

Against this background, this special session invites papers on fine-grained multimedia analytics that help generate detailed descriptions for expert-quality reports. Specifically, the session focuses on two key aspects:

  • Fine-grained multimedia analytics covering various types of details, e.g., open-vocabulary object detection, action recognition, and scene graph generation.
  • Bridging details from fine-grained multimedia analytics into text generation.

This special session aims to bring together researchers from academia and industry to discuss topics related to expert-level text generation from fine-grained multimedia analytics, its applications, and other open problems in this field. Specifically, the key research problems related to this session include, but are not limited to: (1) how to accurately recognize events and their details in a particular domain, and (2) how to create expert-quality texts from collections of fine-grained multimedia analytics data in a particular domain.

Call for Papers

This special session welcomes submissions from diverse research domains such as mobility, disaster prevention, manufacturing, healthcare, food computing, and retail. Topics of interest for the research community include, but are not limited to:

  • Multimedia analytics for a particular domain
  • Transfer learning and domain adaptation
  • Multimedia datasets for a particular domain
  • Multimodal language models, image/video captioning, and question answering
  • Integration of diverse multimedia analytics data and large language models
  • Applications of multimedia data and large language models

Important Dates

The submission deadline is 11:59 p.m. Anywhere on Earth (AoE) on the stated deadline date.

  • Call for submission: 1st April 2024
  • Paper submission deadline: 22nd July 2024
  • Notification to authors: 24th September 2024
  • Camera-ready submission: 23rd October 2024

  • Submission link: TBA
    All submissions to this special session must be original work not under review at any other workshop, conference, or journal.
  • Paper format: 12 content pages, including all figures, tables, and appendices, in the Springer LNCS style. An additional 2 pages containing only cited references are allowed.
    All paper submissions must conform to the formatting instructions of Springer Verlag, LNCS series, and must strictly adhere to the submission schedule.
  • Blinding & review: All papers will undergo the same review process and review period as MMM 2025. Submitted papers must conform to the “double-blind” review policy.

Invited Speakers
To be announced.

On-site venue: TBA
Date & time: TBA

Organizing Committee

Session Chair

Satoshi Yamazaki
NEC Corporation, Japan

Takahiro Komamizu
Nagoya University, Japan

Program Committee