Three PhD students from the Department of Computer and Information Science at the Faculty of Science and Technology, University of Macau, have had their papers accepted at the 32nd ACM International Conference on Multimedia (ACM Multimedia 2024). ACM Multimedia is a leading conference in multimedia research and is rated a Class A conference by the China Computer Federation. This year’s acceptance rate was 26.2%, which underscores the University of Macau’s achievements in talent development and research innovation in the field.
The three students whose papers were accepted are Guo Xiaojiao, Han Wencheng, and Liao Haicheng. Guo Xiaojiao, supervised by Pun Chi Man, wrote the paper “Dual-Hybrid Attention Network for Specular Highlight Removal.” The research presents a new deep learning model, the Dual-Hybrid Attention Network (DHAN-SHR), which removes specular highlights to improve the quality and interpretability of images and videos. This in turn improves the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. DHAN-SHR uses a hybrid attention mechanism to capture features at different scales and in different regions of an image, without relying on additional prior knowledge or manual annotations. Extensive experiments show that DHAN-SHR outperforms 18 state-of-the-art methods both quantitatively and qualitatively, setting a new benchmark for related technologies.
Another paper, written by Han Wencheng under the supervision of Shen Jianbing, is titled “Prior Metadata-Driven RAW Reconstruction: Eliminating the Need for Per-Image Metadata.” The research focuses on RAW image processing and proposes a RAW image reconstruction method based on prior metadata. By using metadata extracted in advance from reference images, it achieves efficient, high-quality RAW image reconstruction without relying on metadata stored for each individual image. The method simplifies the RAW image processing workflow through a three-stage process (pixel matching, data compression, and image reconstruction) while achieving higher reconstruction quality than traditional methods. Applying this technology can significantly reduce storage requirements and transmission burdens in digital photography, opening a new era for RAW image processing.
The third paper, written by Liao Haicheng under the supervision of Xu Chengzhong and Li Zhenning, is titled “When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models.” The research proposes a novel accident identification, localization, and feedback system that extends accident identification from the traditional “When” and “What” dimensions to include the “Where” dimension. By integrating multimodal large models to analyze cross-modal semantic information in complex traffic scenarios, the system can identify the time and location of an accident and the objects involved before it occurs, and it uses large language models to give passengers timely warnings of potential accidents. The research team also designed a chain attention mechanism (DOA) combined with a Markov chain noise model. Through dynamic routing, the mechanism continually optimizes and updates feature representations so that the system focuses on high-risk objects in multi-agent scenarios, which strengthens its ability to understand and respond to dynamic driving situations. Extensive experiments on real-world scenarios show that the system performs well on key metrics such as the timeliness of accident identification and the accuracy of localization, setting a new benchmark for autonomous driving safety and human-machine interaction.
These research achievements by the three PhD students reflect the University of Macau’s strength and accomplishments in the field of multimedia research. Not only have they gained recognition in the international academic community, but they have also driven innovation and application in multimedia technology, contributing to technological advancements and industry upgrades in image processing, video editing, and object recognition. In the future, the University of Macau will continue to foster internationally competitive research talent through an internationalized research environment and faculty team, injecting new momentum into the development of academia and industry.