Neural machine translation (UM-MT) systems developed by the Natural Language Processing and Portuguese-Chinese Machine Translation (NLP2CT) Laboratory of the University of Macau (UM) recently won first, second, third, and fifth prizes in the constrained English-to-Chinese machine translation campaign organised under the 13th China Workshop on Machine Translation (CWMT 2017).
This year’s competition received a total of 73 submissions from 18 companies and universities, including Sogou, Toshiba, Beihang University, Xiamen University, and the Chinese Academy of Sciences. The evaluation campaign covered six language pairs, namely English-to-Chinese, Chinese-to-English, Mongolian-to-Chinese, Uyghur-to-Chinese, Tibetan-to-Chinese, and Japanese-to-Chinese, in four translation domains: news, patents, daily expressions, and government documents. The English-to-Chinese and Chinese-to-English translation tasks were co-organised by CWMT 2017 and the Second Conference on Machine Translation (WMT 2017).
Under the supervision of Associate Professor Derek Wong and Assistant Professor Lidia Chao from the Faculty of Science and Technology (FST), three systems from UM won the top three prizes in the constrained category.
In addition, whereas other systems were trained on the large dataset (25 million sentences) provided by both CWMT and WMT, UM’s systems were trained on the small dataset (9 million sentences) provided by CWMT only, and still won the second, third, and fifth prizes in the overall ranking. The first prize went to Sogou.
During the conference, Um2T, UM’s online interactive Portuguese-Chinese machine translation system, received positive feedback and recognition from other participants. Um2T is based on state-of-the-art neural machine translation architecture and technology and has many advanced features. The system is now available online for public use at http://nlp2ct.cis.umac.mo/NMT/.
The NLP2CT lab places great emphasis on training students in both theoretical research and engineering practice. Its research achievements have been published in a number of top international journals and at international conferences, including journals and conferences of the Institute of Electrical and Electronics Engineers/Association for Computing Machinery (IEEE/ACM) and the Association for Computational Linguistics (ACL), as well as the Conference on Empirical Methods in Natural Language Processing (EMNLP) and the International Conference on Computational Linguistics (COLING). The models and algorithms developed by UM have achieved good rankings in many national and international evaluation campaigns, including second place in the English-to-Chinese and Chinese-to-English news translation tasks of CWMT 2015 and first place in the English-to-German, Czech-to-English, and French-to-English medical translation tasks of WMT 2014.
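Several neural machine translation systems (UM-MT) developed by the Natural Language Processing and Portuguese-Chinese Machine Translation (NLP2CT) Laboratory of the University of Macau stood out in the English-to-Chinese machine translation evaluation organised under the 13th China Workshop on Machine Translation (CWMT 2017), taking first, second, third, and fifth places in the constrained-data category.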
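This year's machine translation evaluation attracted 18 companies and universities, including Sogou, Toshiba, Beihang University, Xiamen University, and the Chinese Academy of Sciences, which together submitted 73 systems. The evaluation covered six language pairs (English-to-Chinese, Chinese-to-English, Mongolian-to-Chinese, Uyghur-to-Chinese, Tibetan-to-Chinese, and Japanese-to-Chinese) and four domains (news, patents, daily expressions, and government documents). The English-to-Chinese and Chinese-to-English evaluations were jointly organised by CWMT 2017 and the Second Conference on Machine Translation (WMT 2017).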
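Under the guidance of Associate Professor Derek Wong and Assistant Professor Lidia Chao of UM's Faculty of Science and Technology, the UM research team submitted several neural-network-based machine translation systems. Three of these systems were trained only on the 9 million sentences of parallel data provided by the workshop, and they swept the top three places in the constrained-data category.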
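In addition, compared with systems from other participants that were trained on the full 25 million sentences of parallel data provided for the evaluation, UM's systems trained on the small dataset still took second, third, and fifth places in the overall competition, behind only the first-placed Sogou system, which was trained on the large dataset.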
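During the product demonstration session of this year's workshop, Um2T, the Chinese-Portuguese online neural machine translation system developed in-house at UM, attracted wide attention and praise. Um2T adopts the latest neural network architecture and technology, builds on the laboratory's existing experience in Chinese-Portuguese machine translation, and integrates a number of innovative features. The system is now online at http://nlp2ct.cis.umac.mo/NMT/ for use by the public and the academic community.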
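The NLP2CT Laboratory has always placed importance on training students in both theoretical research and engineering practice. It has published many times in top international conferences and journals, including those of IEEE/ACM, ACL, EMNLP, and COLING, and it has also carried its research results into practice, achieving excellent results in machine translation evaluations over the years, including second place in the Chinese-to-English and English-to-Chinese tasks of CWMT 2015 and first place in the English-to-German, Czech-to-English, and French-to-English tasks of WMT 2014.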