Q-Learning in Regularized Mean-field Games
In: Dynamic games and applications: DGA
ISSN: 2153-0793
The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence. ; Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech.
BASE
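As a rough illustration of the Pareto-dominance notion that multiobjective Q-learning methods such as MPQ-learning build on (the function names and sample vectors below are illustrative, not taken from the paper), the core filtering step keeps only non-dominated value vectors:

```python
def dominates(u, v):
    """u Pareto-dominates v: at least as good in every objective, strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors):
    """Keep only the non-dominated value vectors (the Pareto-optimal set)."""
    return [u for u in vectors if not any(dominates(v, u) for v in vectors if v != u)]
```

In a vector-valued Q-learning scheme, a filter of this kind is applied to the sets of value vectors attached to each state-action pair, so that only candidates for Pareto-optimal policies are propagated.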
In: RAND Journal of Economics, Forthcoming
SSRN
Working paper
In: This is a pre-print of an article published in Communications in Nonlinear Science and Numerical Simulation (2021). The final authenticated version is available online at DOI: doi.org/10.1016/j.cnsns.2021.105805
SSRN
Working paper
In: The Rand journal of economics, Band 52, Heft 3, S. 538-558
ISSN: 1756-2171
Prices are increasingly set by algorithms. One concern is that intelligent algorithms may learn to collude on higher prices even in the absence of the kind of coordination necessary to establish an antitrust infringement. However, exactly how this may happen is an open question. I show how, in simulated sequential competition, competing reinforcement learning algorithms can indeed learn to converge to collusive equilibria when the set of discrete prices is limited. When this set increases, the algorithm considered increasingly converges to supra-competitive asymmetric cycles. I show that results are robust to various extensions and discuss practical limitations and policy implications.
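A toy version of such a simulation (the price grid, parameters, and winner-takes-all demand rule are illustrative assumptions, not the paper's actual setup) can be sketched as two independent epsilon-greedy Q-learners repeatedly posting prices from a small discrete set:

```python
import random

def price_duopoly(prices=(1.0, 1.5, 2.0), episodes=5000,
                  alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Two independent epsilon-greedy Q-learners in a repeated pricing game.
    Each seller's state is the index of the rival's previous price."""
    rng = random.Random(seed)
    n = len(prices)
    Q = [[[0.0] * n for _ in range(n)] for _ in range(2)]  # [seller][state][action]
    state = [0, 0]
    for _ in range(episodes):
        acts = []
        for i in (0, 1):
            if rng.random() < eps:
                acts.append(rng.randrange(n))          # explore
            else:
                row = Q[i][state[i]]
                acts.append(row.index(max(row)))       # exploit
        p0, p1 = prices[acts[0]], prices[acts[1]]
        # Winner-takes-all demand: the cheaper seller serves the market, ties split it.
        if p0 < p1:
            rewards = (p0, 0.0)
        elif p1 < p0:
            rewards = (0.0, p1)
        else:
            rewards = (p0 / 2.0, p1 / 2.0)
        for i in (0, 1):
            nxt = acts[1 - i]  # the rival's current price index becomes the next state
            q = Q[i][state[i]][acts[i]]
            Q[i][state[i]][acts[i]] = q + alpha * (rewards[i] + gamma * max(Q[i][nxt]) - q)
            state[i] = nxt
    return Q
```

Inspecting the greedy policies induced by the learned Q-tables is how such studies check whether play settles on competitive prices, collusive prices, or price cycles.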
In: International journal of academic research, Band 4, Heft 4, S. 89-94
ISSN: 2075-7107
In: Amsterdam Law School Research Paper No. 2022-25
SSRN
The electricity markets restructuring process encouraged the use of computational tools to allow the study of different market mechanisms and the relationships between the participating entities. Automated negotiation plays a crucial role in decision support for energy transactions due to the constant need for players to engage in bilateral negotiations. This paper proposes a methodology to estimate bilateral contract prices, which is essential to support market players in their decisions, enabling adequate risk management of the negotiation process. The proposed approach uses an adaptation of the Q-learning reinforcement learning algorithm to choose the best from a set of possible contract price forecasts, determined using several methods such as artificial neural networks (ANN) and support vector machines (SVM). The learning process assesses the probability of success of each forecasting method by comparing the expected negotiation price with the historic contract data of competitor players. The negotiation scenario identified as the one the player will most probably face is the one with the highest expected utility value. This approach allows the supported player to prepare for the negotiation scenario most likely to be a reliable approximation of the actual negotiation environment. ; This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 703689 (project ADAPT) and No 641794 (project DREAM-GO); NetEfficity Project (P2020 − 18015); and UID/EEA/00760/2013 funded by FEDER Funds through the COMPETE program and by National Funds through FCT.
BASE
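In the same spirit, a stateless Q-learning scorer over competing forecasters can be sketched as follows (the method names, accuracy-based reward, and parameters are assumptions for illustration, not the paper's exact formulation):

```python
import random

def select_forecaster(forecasts, realized, alpha=0.2, eps=0.1, seed=1):
    """Learn which forecasting method to trust via a one-state Q-table.

    forecasts: dict mapping method name -> list of predicted contract prices
    realized:  list of observed contract prices (e.g. historic competitor data)
    """
    rng = random.Random(seed)
    methods = list(forecasts)
    q = {m: 0.0 for m in methods}
    for t, actual in enumerate(realized):
        # epsilon-greedy choice of which forecaster to trust this round
        m = rng.choice(methods) if rng.random() < eps else max(q, key=q.get)
        reward = 1.0 / (1.0 + abs(forecasts[m][t] - actual))  # accuracy as reward
        q[m] += alpha * (reward - q[m])                       # stateless Q-update
    return max(q, key=q.get), q
```

The Q-values then play the role of the success scores the abstract describes: the method with the highest learned value is the one whose forecasts have tracked realized contract prices most closely.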
Electricity markets are complex environments that have been undergoing continuous transformation due to the increase of renewable-based generation and the introduction of new players in the system. In this context, players are forced to re-think their behavior and learn how to act in this dynamic environment in order to get as much benefit as possible from market negotiations. This paper introduces a new learning model that enables players to identify the expected prices of future bilateral agreements, as a way to improve decision-making when choosing the opponent players to approach for actual negotiations. The proposed model introduces a contextual dimension into the well-known Q-learning algorithm, and includes a simulated annealing process to accelerate convergence. The model is integrated in a multi-agent decision support system for electricity market players' negotiations, enabling experimental validation using real data from the Iberian electricity market. ; This work has received funding from the European Union's Horizon 2020 research and innovation programme under project DOMINOES (grant agreement No 771066) and from FEDER Funds through the COMPETE program and from National Funds through FCT under the project UID/EEA/00760/2019.
BASE
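A compact sketch of combining a contextual Q-table with an annealing schedule (the contexts, opponent names, reward function, and cooling parameters below are all illustrative assumptions, not the paper's model):

```python
import math
import random

def contextual_q(contexts, opponents, episodes, reward_fn,
                 alpha=0.1, T0=1.0, decay=0.995, seed=0):
    """Contextual Q-table over (context, opponent) pairs, with a simulated-annealing
    temperature that cools Boltzmann exploration over time."""
    rng = random.Random(seed)
    Q = {(c, o): 0.0 for c in contexts for o in opponents}
    T = T0
    for _ in range(episodes):
        c = rng.choice(contexts)  # observed market context for this episode
        # Boltzmann (softmax) selection at temperature T (clamped to avoid overflow)
        ws = [math.exp(Q[(c, o)] / max(T, 0.05)) for o in opponents]
        r, acc, pick = rng.random() * sum(ws), 0.0, opponents[-1]
        for o, w in zip(opponents, ws):
            acc += w
            if r <= acc:
                pick = o
                break
        Q[(c, pick)] += alpha * (reward_fn(c, pick) - Q[(c, pick)])
        T *= decay  # annealing: exploration fades as the temperature cools
    return Q
```

As the temperature drops, selection sharpens toward the opponents whose negotiations have paid off in each context, which is the convergence-acceleration role the abstract attributes to the simulated annealing process.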
Underwater Wireless Sensor Networks (UWSNs) have recently aroused increasing interest among researchers in industry, the military, commerce, and academia. Due to the harsh underwater environment, energy efficiency is a significant concern for routing in UWSNs. Underwater positioning is also a particularly tricky task, given the high attenuation of radio-frequency signals underwater. In this paper, we propose an energy-efficient depth-based opportunistic routing algorithm with Q-learning (EDORQ) for UWSNs to guarantee energy-saving and reliable data transmission. It combines the respective advantages of the Q-learning technique and opportunistic routing (OR) without full-dimensional location information to improve network performance in terms of energy consumption, average network overhead, and packet delivery ratio. In EDORQ, the void detection factor, residual energy, and depth information of candidate nodes are jointly considered when defining the Q-value function, which contributes to proactively detecting void nodes in advance while reducing energy consumption. In addition, a simple and scalable void-node recovery mode is proposed for the selection of the candidate set, so as to rescue packets that get stuck at void nodes. Furthermore, we design a novel method to set the holding time for the scheduling of packet forwarding based on Q-values, so as to alleviate packet collisions and redundant transmissions. We conduct extensive simulations to evaluate the performance of the proposed algorithm and compare it with three other routing algorithms on the Aqua-Sim platform (NS2). The results show that the proposed algorithm significantly improves performance in terms of energy efficiency, packet delivery ratio, and average network overhead without sacrificing too much average packet delay.
BASE
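The Q-value-to-holding-time idea can be illustrated with a minimal monotone mapping (the linear form and the parameters are assumptions for illustration, not EDORQ's actual formula):

```python
def holding_time(q_value, q_max=1.0, t_max=2.0):
    """Map a candidate node's Q-value to a forwarding holding time (seconds):
    higher-Q candidates wait less, so the best relay forwards first and the
    other candidates overhear its transmission and cancel their own."""
    q = min(max(q_value, 0.0), q_max)  # clamp to [0, q_max]
    return t_max * (1.0 - q / q_max)
```

Any strictly decreasing mapping with this shape yields the coordination the abstract describes: staggered timers ordered by Q-value suppress duplicate forwarding and reduce packet collisions among candidates.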