Contextual Bandit | Pollux - Fachinformationsdienst Politikwissenschaft

23 results

Article (electronic) #1, 2021

Smoothness-Adaptive Contextual Bandits

In: Operations Research 70 (6), 3198–3216

Gur, Yonatan; Momeni, Ahmadreza; Wager, Stefan

Open Access

SSRN

Open Access #2, 2015

Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits ; Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires

Galichet, Nicolas

This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, defined by Lai and Robbins in the 50s. During the last decade, many theoretical and algorithmic studies have addressed the exploration vs exploitation tradeoff at the core of MABs, where exploitation is biased toward the best options visited so far while exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, the exploratory trajectories of a robot can cause physical damage to the robot or its environment. We thus define the exploration vs exploitation vs safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. First, under strong assumptions, the MIN algorithm provides robust behavior with guarantees of logarithmic regret, matching the state of the art while being highly robust w.r.t. hyper-parameter settings (as opposed to, e.g., UCB (Auer 2002)). Second, the MARAB algorithm aims at optimizing the cumulative 'Conditional Value at Risk' (CVaR) of the rewards, a criterion originating in economics, with excellent empirical performance compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. 
The second contribution concerns the contextual bandit setting, where additional information is provided to support the decision making, such as the user details in the content recommendation domain, or the patient history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'optimism in the face of the unknown' principle selects the arm with the maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014) and called BESA, proceeds instead by subsampling the larger sample set without replacement. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior. ; This thesis falls within the field of sequential decision making in unknown environments, and more specifically within the multi-armed bandit (MAB) framework, defined by Robbins and Lai in the 1950s. Since the 2000s, this framework has been the subject of much theoretical and algorithmic research centered on the tradeoff between exploration and exploitation: exploitation consists of repeating as often as possible the choices that have proven best so far; exploration consists of trying choices that have rarely been tried, to verify that the best choices have indeed been identified. Applications of MAB approaches range from the choice of medical treatments to recommendation in e-commerce, by way of the search for optimal energy policies. 
The contributions presented in this manuscript address the exploration vs exploitation tradeoff from two specific angles. The first concerns taking risk into account. Any exploration in an unknown context can lead to undesirable consequences; for example, exploring a robot's behaviors can result in damage to the robot or its environment. In this context, the objective is to obtain a tradeoff between exploration, exploitation, and risk-taking (EER). Several original algorithms are proposed within the EER tradeoff. Under strong assumptions, the MIN algorithm offers state-of-the-art logarithmic regret guarantees; it is also highly robust, in contrast to the strong sensitivity to hyper-parameter values of, e.g., (Auer et al. 2002). The MARAB algorithm addresses a criterion inspired by the economics literature (Conditional Value at Risk) and shows excellent empirical performance compared to (Sani et al. 2012), but without theoretical guarantees. Finally, the MARABOUT algorithm modifies the estimation of the CVaR criterion to obtain theoretical guarantees while retaining good empirical behavior. The second line of research concerns the contextual bandit, where additional information about the decision context is available; for example, the state variables of the patient in a medical context or of the user in a recommendation context. The study focuses on the choice between arms that have previously been pulled a different number of times. The choice generally relies on the notion of optimism, comparing the upper bounds of the confidence intervals associated with the arms under consideration. Another approach called BESA, which relies on sub-sampling the values drawn for the most-visited arms and thus reduces to the case where all arms have been pulled the same number of times, was proposed by (Baransi et al. 2014).
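
The sub-sampling idea the abstract attributes to BESA can be illustrated with a minimal two-arm sketch. This is not the thesis's CL-BESA algorithm and uses no context: it only shows the comparison step, in which the arm with more samples is sub-sampled without replacement down to the size of the smaller sample set before the empirical means are compared. The function name `besa_choose`, the handling of empty histories, and the tie-breaking rule (favoring the less-sampled arm) are assumptions made for illustration.

```python
import random
from statistics import fmean


def besa_choose(rewards_a, rewards_b, rng=None):
    """Return 0 to pull arm a next, 1 to pull arm b (hypothetical helper)."""
    rng = rng or random.Random()
    n = min(len(rewards_a), len(rewards_b))
    if n == 0:
        # Assumption: an arm with no samples yet is pulled first.
        return 0 if len(rewards_a) == 0 else 1
    # Sub-sample both histories to the common size n, without replacement.
    # For the smaller history this simply uses all of its samples.
    sub_a = rng.sample(rewards_a, n)
    sub_b = rng.sample(rewards_b, n)
    mean_a, mean_b = fmean(sub_a), fmean(sub_b)
    if mean_a != mean_b:
        return 0 if mean_a > mean_b else 1
    # Assumption: ties go to the arm that has been sampled less often.
    return 0 if len(rewards_a) <= len(rewards_b) else 1
```

Calling `besa_choose([1.0, 1.0, 1.0, 1.0], [0.0])` compares one sub-sampled reward from the first arm with the single reward of the second, so both arms are judged on the same number of samples rather than through confidence bounds of different widths.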

Open Access

BASE

Article (electronic) #3

Contextual Combinatorial Conservative Bandits

In: TCS-D-23-01119

Zhang, Xiaojin; Liu, Weiwen; Li, Shuai; Zhang, Shengyu

Open Access

SSRN

Article (electronic) #4, 26 April 2024

FuzzyBandit An Autonomous Personalized Model Based on Contextual Multi Arm Bandits Using Explainable AI

In: Defence science journal: DSJ, vol. 74, no. 4, pp. 496-504

Sharma, Kapil; Bala, Manju; Bansal, Nipun

ISSN: 0011-748X

In the era of artificial cognizance, context-aware decision-making problems have attracted significant attention. Contextual bandits address these problems by resolving the exploration versus exploitation dilemma in order to provide customized solutions according to the user's preferences. However, a high level of accountability is required, and there is a need to understand the underlying mechanism of the black-box contextual bandit algorithms proposed in the literature. To overcome these shortcomings, an explainable AI (XAI) based FuzzyBandit model is proposed, which maximizes the cumulative reward by optimizing the decision at each trial based on the rewards received in previous observations and, at the same time, generates explanations for the decision made. The proposed model uses an adaptive neuro-fuzzy inference system (ANFIS) to address the vague nature of arm selection in contextual bandits and uses a feedback mechanism to adjust its parameters based on the relevance and diversity of the features to maximize reward generation. The FuzzyBandit model has also been empirically compared with seven of the most popular state-of-the-art models from the literature on four benchmark datasets over nine criteria, namely recall, specificity, precision, prevalence, F1 score, Matthews Correlation Coefficient (MCC), Fowlkes–Mallows index (FM), Critical Success Index (CSI) and accuracy.
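
For readers unfamiliar with the nine criteria, the sketch below computes them from the four counts of a binary confusion matrix using their standard textbook definitions. The abstract does not say how the paper aggregates the criteria over arms or classes, so the binary setting, the function name `binary_metrics`, and the example counts are assumptions made for illustration.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (hypothetical helper, no guards).
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)       # positive predictive value
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "prevalence": (tp + fn) / total,   # share of actual positives
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "fm": sqrt(precision * recall),    # Fowlkes-Mallows index
        "csi": tp / (tp + fn + fp),        # Critical Success Index
        "accuracy": (tp + tn) / total,
    }


# Illustrative counts only, not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC and CSI are reported alongside accuracy because, unlike accuracy, they do not reward a model for simply predicting the majority class when prevalence is far from 0.5.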

Open Access

Article (electronic) #5, 2023

Bandits for Algorithmic Trading with Signals

Cartea, Álvaro; Drissi, Fayçal; Osselin, Pierre

Open Access

SSRN

Article (electronic) #6, 2020

Spoiled for Choice? Personalized Recommendation for Healthcare Decisions: A Multi-Armed Bandit Approach

In: Kelley School of Business Research Paper No. 19-25

Zhou, Tongxin; Wang, Yingfei; Yan, Lu (Lucy); Tan, Yong

Open Access

SSRN

Working paper

Article (electronic) #7, 2016

Feature-Based Dynamic Pricing

Cohen, Maxime C.; Lobel, Ilan; Paes Leme, Renato

Open Access

SSRN

Article (electronic) #8, 2024

Incentivized Exploration via Filtered Posterior Sampling

Kalvit, Anand; Slivkins, Aleksandrs; Gur, Yonatan

Open Access

SSRN

Article (electronic) #9, 2020

Cold Start to Improve Market Thickness on Online Advertising Platforms: Data-Driven Algorithms and Field Experiments

Ye, Zikun; Zhang, Dennis; Zhang, Heng; Zhang, Renyu (Philip); Chen, Xin; Xu, Zhiwei

Open Access

SSRN

Working paper

Article (electronic) #10, 2020

Data-driven Hospital Admission Control: A Learning Approach

In: Mohammad Zhalechian, Esmaeil Keyvanshokooh, Cong Shi, Mark P. Van Oyen (2023) Data-Driven Hospital Admission Control: A Learning Approach. Operations Research 0(0).

Zhalechian, Mohammad; Keyvanshokooh, Esmaeil; Shi, Cong; Van Oyen, Mark P.

Open Access

SSRN

Working paper

Article (electronic) #11, 2020