I. Choose one of the following passages and translate the English into Chinese. 1. Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of n "experts" who make predictions on each day. The goal on each day is to process these predictions, and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set.
The classical algorithm for this problem is the multiplicative weights algorithm, which has been well-studied in many fields since as early as the 1950s. Variations of this algorithm have been applied to and optimized for a broad range of problems, including boosting an ensemble of weak learners in machine learning, and approximately solving linear and semi-definite programs. However, every application, to our knowledge, relies on storing weights for every expert and uses Ω(n) memory. There is little work on understanding the memory required to solve the online learning with expert advice problem (or run standard sequential prediction algorithms, such as multiplicative weights) in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large.
We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem using a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show novel ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms. We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
Online learning with expert advice is a fundamental problem in sequential prediction. In this problem, the algorithm has access to a set of n "experts" who make predictions each day. The goal on each day is to process these predictions and make a prediction at minimum cost. After making a prediction, the algorithm sees the actual outcome for that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it performs relative to the best expert in the set.
The classical algorithm for this problem is the multiplicative weights algorithm, which has been studied extensively in many fields since as early as the 1950s. Variants of this algorithm have been applied to, and optimized for, a broad range of problems, including boosting an ensemble of weak learners in machine learning and approximately solving linear and semi-definite programs. However, to our knowledge, every application relies on storing a weight for every expert and therefore uses Ω(n) memory. There is little work on understanding the memory required to solve the online learning with expert advice problem (or to run standard sequential prediction algorithms such as multiplicative weights) in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large.
We initiate the study of the learning with expert advice problem in the streaming setting, and give both lower and upper bounds. Our lower bound for i.i.d., random-order, and adversarial-order streams uses a reduction to a custom-built problem via a novel masking technique, showing a smooth trade-off between regret and memory. Our upper bounds give new ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thereby reducing the required memory. For random-order streams, we show that our upper bound is tight up to low-order terms. We hope that these results and techniques will find broad application in online learning, and can inspire algorithms based on standard sequential prediction techniques, such as multiplicative weights, for a wide range of other problems in the memory-constrained setting.
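As a concrete illustration of the multiplicative weights method the abstract above refers to, here is a minimal sketch. The learning rate eta, the losses-in-[0, 1] model, and the function name are illustrative assumptions; this is the classical Ω(n)-memory algorithm, not the paper's memory-constrained variant.

```python
def multiplicative_weights(expert_losses, eta=0.5):
    """Run multiplicative weights over a stream of per-day expert losses.

    expert_losses: list of days; each day is a list of losses in [0, 1],
                   one entry per expert.
    Returns (algorithm's expected total loss, best expert's total loss).
    """
    n = len(expert_losses[0])
    weights = [1.0] * n            # one weight per expert: Omega(n) memory
    alg_loss = 0.0
    expert_totals = [0.0] * n
    for day in expert_losses:
        total = sum(weights)
        # Expected loss when following an expert sampled in proportion
        # to its current weight.
        alg_loss += sum(w * l for w, l in zip(weights, day)) / total
        # Multiplicatively penalize experts that incurred loss today.
        weights = [w * (1.0 - eta * l) for w, l in zip(weights, day)]
        expert_totals = [t + l for t, l in zip(expert_totals, day)]
    return alg_loss, min(expert_totals)
```

On a short stream where expert 0 is always right and expert 1 always wrong, the algorithm's expected loss quickly approaches that of the best expert, which is the regret guarantee the abstract trades off against memory.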
2. Many recent models in software engineering introduce deep neural models based on the Transformer architecture, or use Transformer-based Pre-trained Language Models (PLMs) trained on code. Although these models achieve state-of-the-art results in many downstream tasks such as code summarization and bug detection, they are based on Transformers and PLMs, which are mainly studied in the Natural Language Processing (NLP) field. The current studies rely on the reasoning and practices from NLP for these models on code, despite the differences between natural languages and programming languages. There is also limited literature on explaining how code is modeled.
Here, we investigate the attention behavior of PLM on code and compare it with natural language. We pre-trained BERT, a Transformer based PLM, on code and explored what kind of information it learns, both semantic and syntactic. We run several experiments to analyze the attention values of code constructs on each other and what BERT learns in each layer. Our analyses show that BERT pays more attention to syntactic entities, specifically identifiers and separators, in contrast to the most attended token [CLS] in NLP. This observation motivated us to leverage identifiers to represent the code sequence instead of the [CLS] token when used for code clone detection. Our results show that employing embeddings from identifiers increases the performance of BERT by 605% and 4% F1-score in its lower layers and the upper layers, respectively. When identifiers' embeddings are used in CodeBERT, a code-based PLM, the performance is improved by 21-24% in the F1-score of clone detection. The findings can benefit the research community by using code-specific representations instead of applying the common embeddings used in NLP, and open new directions for developing smaller models with similar performance.
Many recent models in software engineering introduce deep neural models based on the Transformer architecture, or use Transformer-based pre-trained language models (PLMs) trained on code. Although these models achieve state-of-the-art results in many downstream tasks such as code summarization and bug detection, they are based on Transformers and PLMs, which are studied mainly in the natural language processing (NLP) field. Despite the differences between natural languages and programming languages, current studies rely on reasoning and practices from NLP when applying these models to code. The literature explaining how code is modeled is also limited.
Here, we investigate the attention behavior of a PLM on code and compare it with natural language. We pre-trained BERT, a Transformer-based PLM, on code and explored what kinds of information it learns, both semantic and syntactic. We ran several experiments to analyze the attention values that code constructs place on one another and what BERT learns in each layer. Our analysis shows that BERT pays more attention to syntactic entities, specifically identifiers and separators, in contrast to the most-attended token [CLS] in NLP. This observation motivated us to use identifiers, rather than the [CLS] token, to represent the code sequence for code clone detection. Our results show that using embeddings from identifiers improves BERT's F1-score by 605% in its lower layers and 4% in its upper layers, respectively. When identifiers' embeddings are used in CodeBERT, a code-based PLM, the F1-score of clone detection improves by 21-24%. These findings can benefit the research community by using code-specific representations instead of the common embeddings used in NLP, and they open new directions for developing smaller models with similar performance.
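The representation change described above (pooling identifier-token embeddings instead of taking the [CLS] embedding) can be sketched in a few lines. The array shapes, function names, and the assumption that identifier positions are known (e.g. from a code tokenizer) are illustrative; this is not the paper's code.

```python
import numpy as np

def cls_representation(embeddings):
    """Conventional NLP choice: the [CLS] token at position 0.

    embeddings: (sequence_length, hidden_size) array of per-token
    embeddings from one layer of a PLM.
    """
    return embeddings[0]

def identifier_representation(embeddings, identifier_mask):
    """Mean-pool the embeddings at identifier positions.

    identifier_mask: boolean array of length sequence_length, True at
    positions holding identifier tokens (assumed known in advance).
    """
    return embeddings[identifier_mask].mean(axis=0)

def cosine(a, b):
    """Cosine similarity between two code representations, as a clone
    detector might compare two code fragments."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For clone detection, two code fragments would each be reduced to a single vector with `identifier_representation` and then compared with `cosine`; the reported gains come from this choice of pooling, not from changing the model.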
II. Choose one of the following passages and translate the Chinese into English. 1. Machine learning is a shared research focus of artificial intelligence and pattern recognition, and its theory and methods have been widely applied to solve complex problems in engineering applications and scientific fields. The 2010 Turing Award went to Professor Leslie Valiant of Harvard University, one of whose award-winning contributions was establishing the theory of Probably Approximately Correct (PAC) learning; the 2011 Turing Award went to Professor Judea Pearl of UCLA, whose main contribution was establishing artificial intelligence methods grounded in probability and statistics. These research achievements have all promoted the development and prosperity of machine learning.
Machine learning is the science of how to use computers to simulate or realize human learning activities, and it is one of the most characteristically intelligent and most cutting-edge research areas in artificial intelligence. Since the 1980s, machine learning, as a route to realizing artificial intelligence, has attracted wide interest in the AI community. In the past decade or so in particular, research in machine learning has developed rapidly, and it has become one of the important topics in artificial intelligence. Machine learning has been applied not only in knowledge-based systems, but also in natural language understanding, non-monotonic reasoning, machine vision, pattern recognition, and many other fields. Whether a system has the ability to learn has become one mark of whether it possesses "intelligence". Research on machine learning falls mainly into two directions: the first is research on traditional machine learning, which mainly studies learning mechanisms, with an emphasis on exploring and simulating human learning mechanisms; the second is research on machine learning in the big-data environment, which mainly studies how to use information effectively, with an emphasis on extracting hidden, valid, and understandable knowledge from massive data.
Machine learning is a common research focus in the fields of artificial intelligence and pattern recognition. Its theory and methods have been widely used to solve complex problems in engineering applications and scientific fields. The 2010 Turing Award went to Professor Leslie Valiant of Harvard University; one of his award-winning contributions was establishing the theory of Probably Approximately Correct (PAC) learning. The 2011 Turing Award went to Professor Judea Pearl of UCLA, whose main contribution was establishing artificial intelligence methods based on probability and statistics. These research results have promoted the development and prosperity of machine learning.
Machine learning is a science that studies how to use computers to simulate or realize human learning activities. It is one of the most intelligent and cutting-edge research fields in artificial intelligence. Since the 1980s, machine learning, as a way to realize artificial intelligence, has aroused extensive interest in the field of artificial intelligence. Especially in the past decade, research in the field of machine learning has developed rapidly, and it has become one of the important topics of artificial intelligence. Machine learning is not only used in knowledge-based systems, but is also widely used in natural language understanding, non-monotonic reasoning, machine vision, pattern recognition, and many other fields. Whether a system has learning ability has become a sign of whether it has "intelligence". Research on machine learning mainly falls into two categories: the first category is research on traditional machine learning, which focuses on studying learning mechanisms, with an emphasis on exploring and simulating human learning mechanisms; the second category is research on machine learning in the big-data environment, which focuses on how to use information effectively and how to obtain hidden, valid, and understandable knowledge from huge amounts of data.
2. Although there are many different criteria for classifying network types, classification by geographical scope is a generally accepted standard. By this standard, networks can be divided into four types: local area networks, metropolitan area networks, wide area networks, and the Internet.
Local Area Network (LAN): the familiar "LAN" refers to a local area network, which is the most common and most widely used type of network. With the development and improvement of computer network technology as a whole, LANs have been fully applied and popularized: almost every organization has its own LAN, and even some households have their own small LANs. Clearly, a LAN is a network within a local area, and the area it covers is relatively small. There are few restrictions on the number of computers in a LAN; it can have as few as two or as many as several hundred.
Metropolitan Area Network (MAN): this kind of network generally interconnects computers within one city but not within the same small geographical area. The connection distance can range from 10 to 100 km, and it adopts the IEEE 802.6 standard. MANs mostly use ATM technology for the backbone network. ATM is a high-speed network transmission method for data, voice, video, and multimedia applications. ATM comprises an interface and a protocol, and the protocol can switch between constant-bit-rate and variable-bit-rate traffic on a conventional transmission channel.
Wide Area Network (WAN): this kind of network, also called a long-haul network, covers a wider area than a MAN. It generally interconnects LANs or MANs in different cities, and its geographic range can span from hundreds to thousands of kilometers.
Although there are various criteria for classifying network types, classification by geographical scope is a generally accepted standard. By this standard, networks can be divided into four types: LAN, MAN, WAN, and the Internet.
Local Area Network (LAN): Generally, the familiar "LAN" refers to a local area network, which is the most common and most widely used kind of network. With the development and improvement of computer network technology as a whole, the LAN has been fully applied and popularized: almost every organization has its own LAN, and even some households have their own small LANs. Obviously, a LAN is a network within a local area, covering a relatively small region. There are few limits on the number of computers in a LAN; it can have as few as two or as many as several hundred.
Metropolitan Area Network (MAN): This kind of network generally interconnects computers within one city but not within the same small geographical area. The connection distance can range from 10 to 100 km, and it adopts the IEEE 802.6 standard. MANs mostly use ATM technology for the backbone network. ATM is a high-speed network transmission method for data, voice, video, and multimedia applications. ATM includes an interface and a protocol, which can switch between constant-bit-rate and variable-bit-rate traffic on a conventional transmission channel.
Wide Area Network (WAN): This network, also known as a long-haul network, covers a wider area than a metropolitan area network (MAN). It generally interconnects LANs or MANs in different cities, with a geographic range from hundreds to thousands of kilometers.
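The range-based classification above can be summarized in a small sketch. The distance cutoffs below are the illustrative figures from the text (MAN roughly 10-100 km, WAN beyond that), not a formal standard, and the function name is hypothetical.

```python
def classify_network(distance_km):
    """Classify a network by its geographic span, using the rough
    ranges given in the text (illustrative cutoffs, not a standard)."""
    if distance_km < 10:
        return "LAN"   # local area: a room, building, or campus
    elif distance_km <= 100:
        return "MAN"   # metropolitan area: within a city, 10-100 km
    else:
        return "WAN"   # wide area: between cities, hundreds to thousands of km
```

For example, a 1 km campus network falls under "LAN", a 50 km city backbone under "MAN", and a 500 km inter-city link under "WAN"; the Internet, as the text notes, is treated as its own fourth category rather than a point on this scale.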