A research team led by Xu Cheng-Zhong, chair professor, and Xu Huanle, assistant professor, in the Faculty of Science and Technology (FST) at the University of Macau (UM) has made a significant breakthrough in cloud computing. The team has designed an innovative resource management system called Erms, which addresses the challenges posed by the large-scale shared microservices that artificial intelligence applications bring. Erms is the world’s first system to dynamically optimize resource scaling for shared microservices, and the first to implement priority scheduling in microservice scenarios.
The system models the latency of each microservice as a piecewise linear function of workload, resource occupancy, and system interference. It also uses a depth-first traversal algorithm to simplify the call dependencies between microservices, which keeps the system scalable. Building on this model, Erms performs global optimization to set precise latency targets for microservices with intricate interdependencies. Erms also introduces a new set of scheduling strategies that optimize resource allocation for shared microservices, significantly improving the efficiency of resource use.
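The two ideas in the paragraph above can be illustrated with a minimal Python sketch. It is not the Erms implementation: the class `PiecewiseLatency`, the function `end_to_end_latency`, the two-segment shape, every parameter value, and the assumption that downstream calls run in parallel are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiecewiseLatency:
    """Hypothetical two-segment piecewise linear latency model (milliseconds)."""
    knee: float               # per-container workload where the slope changes
    base: float               # latency at zero workload
    slope_low: float          # ms per unit of workload below the knee
    slope_high: float         # ms per unit of workload above the knee
    interference_gain: float  # how strongly interference steepens both slopes

    def latency(self, workload: float, containers: int,
                interference: float = 0.0) -> float:
        if containers <= 0:
            raise ValueError("containers must be positive")
        per_container = workload / containers
        scale = 1.0 + self.interference_gain * interference
        below = min(per_container, self.knee)
        above = max(per_container - self.knee, 0.0)
        return self.base + scale * (self.slope_low * below
                                    + self.slope_high * above)


def end_to_end_latency(calls: dict, latency_ms: dict, root: str) -> float:
    """Depth-first traversal of an acyclic call graph.

    Assumes downstream calls run in parallel, so a node contributes its own
    latency plus that of its slowest downstream branch.
    """
    memo = {}

    def visit(node: str) -> float:
        if node not in memo:
            downstream = max((visit(c) for c in calls.get(node, [])),
                             default=0.0)
            memo[node] = latency_ms[node] + downstream
        return memo[node]

    return visit(root)


# Illustrative values only; "db" is shared by two upstream microservices.
model = PiecewiseLatency(knee=100.0, base=5.0, slope_low=0.02,
                         slope_high=0.2, interference_gain=0.5)
calls = {"frontend": ["cart", "search"], "cart": ["db"], "search": ["db"]}
latency_ms = {"frontend": 2.0, "cart": 3.0, "search": 5.0, "db": 4.0}
total = end_to_end_latency(calls, latency_ms, "frontend")
```

In this sketch, adding containers lowers the per-container workload and moves a microservice back onto the flatter segment of its curve, which is the kind of trade-off a resource scaler has to weigh against a latency target.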
Compared with existing microservice systems, Erms can reduce the probability of SLA violations to one-fifth of its original level and cut CPU resource usage by a factor of nearly 1.6. The paper is also the first research work to fully address microservice sharing (multiplexing) scenarios, and it offers important inspiration for subsequent in-depth research on cloud-native systems. The team’s paper, “Optimizing Resource Management for Shared Microservices: A Scalable System Design”, was recently published in the top-tier computer science journal ACM Transactions on Computer Systems (ToCS). It is the 10th article published by authors from China (including Hong Kong and Macau) since the journal’s inception in 1983, and the first from the Guangdong-Hong Kong-Macau Greater Bay Area. ToCS enjoys a prestigious reputation in the field of computer systems, and many milestone achievements in operating systems, databases, and distributed systems were first published in this journal.
The paper follows the 2021 publication by Prof Xu Cheng-Zhong’s team, “Characterizing Dependence and Performance of Microservices”, another significant work. It was presented at the ACM Symposium on Cloud Computing, the premier international conference on cloud computing, and won the conference’s only Best Paper award that year. It was also the first time a scholar from China (including Hong Kong, Macau, and Taiwan) had won the award since the conference’s inception in 2009. Mei Hong, chairman of the China Computer Federation, director of the Academic Committee of the State Key Laboratory of Internet of Things for Smart City, and academician of the Chinese Academy of Sciences, believes that the publication of these two important papers shows that UM’s research in cloud computing has reached an internationally leading level.
Both papers are the result of the team’s collaboration with the Shenzhen Institutes of Advanced Technology of the Chinese Academy of Sciences and Alibaba, a leading global cloud computing company. The first author of both papers, Luo Shutian, is a doctoral graduate jointly trained by UM and the Chinese Academy of Sciences (now a postdoctoral fellow at Yale University), with Xu Cheng-Zhong and Xu Huanle serving as the corresponding authors. The research has been funded by Alibaba’s “Innovative Research Program” for five consecutive years and received the Alibaba Outstanding Project Cooperation Award in 2022. The work was also supported by the Macao Science and Technology Development Fund (0024/2022/A1), the Ministry of Science and Technology’s Key Research and Development Program (No. 2019YFB2102100), and the Guangdong Province Key Research and Development Program (No. 2020B010164003).
The link to the preprint of the paper: https://doi.org/10.1145/3631607
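A team led by Chair Professor Xu Cheng-Zhong and Assistant Professor Xu Huanle of UM’s Faculty of Science and Technology has achieved a breakthrough in cloud computing research. The team proposed an innovative resource management system to address the challenges of deploying microservices at large scale. The results were published in the top-tier computer science journal ACM Transactions on Computer Systems (ToCS). The journal enjoys a prestigious reputation in the field of computer systems, and many milestone results in operating systems, networking, databases, and distributed systems were published there. In the 40 years since its founding, it has accepted only a very small number of articles from China, and this is the first from the Guangdong-Hong Kong-Macau Greater Bay Area.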
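Artificial intelligence is undergoing rapid technological change. Faced with massive demand for computing resources, the cloud computing industry is challenged to maximize resource utilization and support more computation, and the key to higher utilization is efficient resource management and scheduling. To handle the large-scale, new type of shared microservices that AI applications bring, the UM team designed Erms, a new resource management system that is the first to dynamically allocate and scale resources according to the actual workload, and the first to implement priority scheduling in microservice scenarios. The system models the latency of each microservice as a piecewise linear function of workload, resource occupancy, and system interference. It also uses a depth-first traversal algorithm to simplify microservice call dependencies, which keeps the system scalable. Building on this model, Erms uses global optimization to set precise latency targets for microservices with intricate interdependencies. The team also designed a new set of scheduling strategies to optimize resource allocation for shared microservices, greatly improving the efficiency of resource use. Compared with existing microservice systems, Erms can reduce the probability of SLA violations to one-fifth of its original level and cut CPU resource usage by a factor of nearly 1.6. The paper, “Optimizing Resource Management for Shared Microservices: A Scalable System Design”, is the first research work to fully address microservice sharing (multiplexing) scenarios, and it offers important inspiration for subsequent in-depth research on cloud-native systems.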
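The paper is another notable work following “Characterizing Dependence and Performance of Microservices”, published by Xu Cheng-Zhong’s research team in 2021. That work was presented at the ACM Symposium on Cloud Computing, the Association for Computing Machinery’s top cloud computing conference, and won the only Best Paper award for 2021. It was also the first time a scholar from China (including Hong Kong, Macau, and Taiwan) had won the award since the conference began in 2009. Academician Mei Hong, chairman of the China Computer Federation and director of the Academic Committee of the State Key Laboratory of Internet of Things for Smart City, said: “In the wave of smart cities and large AI models, cloud computing plays a core enabling role. The publication of these two important papers shows that the University of Macau’s research in cloud computing has reached an internationally leading level.”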
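Both papers are the result of the team’s collaboration with the Shenzhen Institutes of Advanced Technology of the Chinese Academy of Sciences and Alibaba, a leading global cloud computing company. The first author, Luo Shutian, is a doctoral graduate jointly trained by the University of Macau and the Chinese Academy of Sciences (now a postdoctoral fellow at Yale University), and Xu Cheng-Zhong and Xu Huanle are the co-corresponding authors. The research has been funded by Alibaba’s “Innovative Research Program” for five consecutive years and received the Alibaba Outstanding Project Cooperation Award in 2022. It was also supported by the Macao Science and Technology Development Fund (0024/2022/A1), the Ministry of Science and Technology’s Key Research and Development Program (No. 2019YFB2102100), and the Guangdong Province Key Research and Development Program (No. 2020B010164003).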
The paper can be read here: https://doi.org/10.1145/3631607