
Showing papers on "Decision problem published in 2020"


Journal ArticleDOI
TL;DR: This paper defines new operational laws based on the Dombi t-norm and t-conorm and develops a decision-making algorithm using spherical fuzzy set information, which is shown to be suitable and effective for selecting the best alternative.
Abstract: Spherical fuzzy sets (SFSs), recently proposed by Ashraf, are one of the most important concepts for describing fuzzy information in decision-making processes. In SFSs the sum of the squares of the membership grades lies in the closed unit interval, so they accommodate more uncertainty; this set therefore outperforms the existing structures of fuzzy sets. In real decision-making problems, decision-makers often express a neutral attitude in addition to membership and non-membership degrees. To obtain a fair decision during the process, in this paper we define new operational laws based on the Dombi t-norm and t-conorm. In the present study, we propose Spherical fuzzy Dombi weighted averaging (SFDWA), Spherical fuzzy Dombi ordered weighted averaging (SFDOWA), Spherical fuzzy Dombi hybrid weighted averaging (SFDHWA), Spherical fuzzy Dombi weighted geometric (SFDWG), Spherical fuzzy Dombi ordered weighted geometric (SFDOWG), and Spherical fuzzy Dombi hybrid weighted geometric (SFDHWG) aggregation operators, and discuss several properties of these aggregation operators. These operators are widely applicable to the solution of decision problems. An algorithm using spherical fuzzy set information in the decision matrix is then developed and applied to a decision-making problem to illustrate its applicability and effectiveness. Through this algorithm, we show that the proposed approach is practical and gives decision-makers more mathematical insight before they choose among their options. Besides this, a systematic comparison with existing methods is conducted to reveal the advantages of our method. The results indicate that the proposed method is suitable and effective for the decision process of evaluating the best alternative.
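The Dombi operations the abstract builds on reduce to two scalar primitives, the Dombi t-norm and t-conorm. Below is a minimal Python sketch of just those primitives, not the paper's full SFDWA/SFDWG operators; the parameter name k and the boundary guards are our own assumptions.

```python
# Minimal sketch of the Dombi t-norm and t-conorm underlying the
# SFDWA-family operators above. k (often written lambda > 0) controls
# how strict the aggregation is; the exact spherical fuzzy operator
# formulas are given in the paper itself.

def dombi_tnorm(a: float, b: float, k: float = 1.0) -> float:
    """Dombi t-norm for a, b in [0, 1] (conjunctive aggregation)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return 1.0 / (1.0 + (((1 - a) / a) ** k + ((1 - b) / b) ** k) ** (1.0 / k))

def dombi_tconorm(a: float, b: float, k: float = 1.0) -> float:
    """Dombi t-conorm, the dual of the t-norm (disjunctive aggregation)."""
    if a == 1.0 or b == 1.0:
        return 1.0
    return 1.0 - 1.0 / (1.0 + ((a / (1 - a)) ** k + (b / (1 - b)) ** k) ** (1.0 / k))

if __name__ == "__main__":
    print(dombi_tnorm(0.6, 0.7, k=2.0))
    print(dombi_tconorm(0.6, 0.7, k=2.0))
```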

118 citations


Journal ArticleDOI
TL;DR: Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods.
Abstract: Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods ...
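For reference, the incremental algorithm the abstract alludes to is compact enough to sketch in full; the Gym-style env interface and all hyperparameter values below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Tabular Q-learning for an infinite-horizon discounted decision problem.
# `env` is assumed to expose reset() -> state and step(a) -> (state, reward, done).
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # incremental update toward the Bellman optimality target
            target = r + gamma * (0.0 if done else Q[s_next].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```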

94 citations


Journal ArticleDOI
TL;DR: A healthcare device selection problem is analyzed from the perspectives of clinicians, biomedical engineers, and healthcare investors and a comparison of the results derived from different multi-criteria solution processes is presented.

90 citations


Journal ArticleDOI
03 Apr 2020
TL;DR: A novel first-order policy optimization method is proposed, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions inspired by the interior-point method and can handle general types of cumulative multi-constraint settings.
Abstract: In this paper, we study reinforcement learning (RL) algorithms to solve real-world decision problems with the objective of maximizing the long-term reward as well as satisfying cumulative constraints. We propose a novel first-order policy optimization method, Interior-point Policy Optimization (IPO), which augments the objective with logarithmic barrier functions, inspired by the interior-point method. Our proposed method is easy to implement with performance guarantees and can handle general types of cumulative multi-constraint settings. We conduct extensive evaluations to compare our approach with state-of-the-art baselines. Our algorithm outperforms the baseline algorithms, in terms of reward maximization and constraint satisfaction.
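A hedged sketch of the core idea: a cumulative-constraint bound J_C(π) ≤ d is folded into the objective through a logarithmic barrier, so the optimizer is repelled from the constraint boundary. The function names and the scalar form below are our simplification of the policy-optimization setting, not the paper's exact surrogate objective.

```python
import numpy as np

def barrier_augmented_objective(policy_return: float,
                                constraint_return: float,
                                limit: float,
                                t: float = 10.0) -> float:
    """Reward objective augmented with a log barrier enforcing
    constraint_return <= limit.

    As the constraint return approaches the limit from below, the
    barrier term tends to -infinity, pushing the optimizer back into
    the feasible region; t trades off reward against constraint slack.
    """
    slack = limit - constraint_return
    if slack <= 0.0:
        return -np.inf  # infeasible point: the barrier is undefined here
    return policy_return + (1.0 / t) * np.log(slack)
```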

85 citations


Journal ArticleDOI
03 Apr 2020
TL;DR: This work enables decision-focused learning for the broad class of problems that can be encoded as a Mixed Integer Linear Program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables.
Abstract: Machine learning components commonly appear in larger decision-making pipelines; however, the model training process typically focuses only on a loss that measures average accuracy between predicted values and ground truth values. Decision-focused learning explicitly integrates the downstream decision problem when training the predictive model, in order to optimize the quality of decisions induced by the predictions. It has been successfully applied to several limited combinatorial problem classes, such as those that can be expressed as linear programs (LP), and submodular optimization. However, these previous applications have uniformly focused on problems with simple constraints. Here, we enable decision-focused learning for the broad class of problems that can be encoded as a mixed integer linear program (MIP), hence supporting arbitrary linear constraints over discrete and continuous variables. We show how to differentiate through a MIP by employing a cutting planes solution approach, an algorithm that iteratively tightens the continuous relaxation by adding constraints removing fractional solutions. We evaluate our new end-to-end approach on several real world domains and show that it outperforms the standard two phase approaches that treat prediction and optimization separately, as well as a baseline approach of simply applying decision-focused learning to the LP relaxation of the MIP. Lastly, we demonstrate generalization performance in several transfer learning tasks.
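To make "quality of decisions induced by the predictions" concrete, the sketch below measures decision regret on a toy LP, the continuous case that earlier decision-focused work handled; the paper's contribution (differentiating through a MIP via cutting planes) is substantially more involved. The constraint set and values are invented.

```python
import numpy as np
from scipy.optimize import linprog

def decision_regret(v_true, v_pred, A_ub, b_ub):
    """Value lost (under the true item values) by optimizing the
    predicted objective instead of the true one. linprog minimizes,
    so the maximization objectives are negated."""
    x_pred = linprog(-v_pred, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1)).x
    x_best = linprog(-v_true, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1)).x
    return float(v_true @ x_best - v_true @ x_pred)

# Toy LP: choose a fractional subset of 3 items under total capacity 2.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])
print(decision_regret(np.array([1.0, 2.0, 3.0]),
                      np.array([2.5, 2.4, 2.6]), A, b))  # regret = 1.0
```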

81 citations


Journal ArticleDOI
Andre Barreto, Shaobo Hou, Diana Borsa, David Silver, Doina Precup
TL;DR: It is argued that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel, and that, by associating each task with a reward function, this decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism.
Abstract: The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
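The reduction to linear regression mentioned above can be illustrated directly: if per-step rewards are approximately linear in features shared across tasks, the new task's weight vector is a least-squares fit. The feature dimension and data below are toy assumptions.

```python
import numpy as np

# Toy illustration: a new task's reward is (approximately) linear in
# features shared across tasks, r(s, a) ~ phi(s, a) . w. Recovering w
# is then ordinary least squares rather than a full RL problem.
rng = np.random.default_rng(0)
phi = rng.normal(size=(1000, 4))            # features of observed transitions
w_true = np.array([0.5, -1.0, 0.0, 2.0])    # unknown task weights
r = phi @ w_true + 0.01 * rng.normal(size=1000)  # noisy observed rewards

w_hat, *_ = np.linalg.lstsq(phi, r, rcond=None)
print(w_hat)  # close to w_true; reusable with precomputed task solutions
```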

76 citations


Journal ArticleDOI
05 May 2020
TL;DR: In this article, a taxonomy of multi-objective multi-agent decision-making settings is presented, based on reward structures and utility functions, and it is argued that trade-offs between conflicting objectives should be analysed on the basis of the utility they have for the users of a system.
Abstract: Many real-world decision problems are inherently multi-objective in nature and concern multiple actors, making multi-objective multi-agent systems a key domain to study. We argue that trade-offs between conflicting objective functions should be analysed on the basis of the utility that these trade-offs have for the users of a system. We develop a new taxonomy which classifies multi-objective multi-agent decision making settings, on the basis of the reward structures and utility functions. We analyse which solution concepts apply to the different settings in our taxonomy, which allows us to offer a structured view of the field and identify promising directions for future research.

76 citations


Journal ArticleDOI
TL;DR: This paper attempts to extend the traditional MABAC method to the Z-information environment by introducing the directed distance and regret theory; the proposed method simultaneously considers the randomness and fuzziness of Z-numbers.
Abstract: Decision makers (DMs) have different cognitive levels in practical experience, information reserve, and thinking ability. Thus, decision information is often not completely reliable. As a tool that can effectively represent information reliability, the Z-number has been studied by many scholars in recent years. Current research on Z-numbers assumes that differences in the various parts of a Z-number can complement one another. However, in many cases, the preference of DMs for each part is difficult to determine, or DMs believe that the differences in the various parts cannot be complementary. Therefore, to solve such decision problems, this paper attempts to extend the traditional MABAC method to the Z-information environment by introducing the directed distance and regret theory. The proposed method simultaneously considers the randomness and fuzziness of Z-numbers. An example of regional circular economy development program selection is provided to illustrate the feasibility of the proposed method. Results show that the proposed method can solve complex decision problems rationally and effectively, and it has broad application prospects.
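For orientation, the classical crisp MABAC procedure that the paper extends scores alternatives by their distance from a border approximation area. This is a sketch of that baseline only, not the Z-number/regret extension; the sample matrix and weights are invented.

```python
import numpy as np

def mabac(X, w, benefit):
    """Classical MABAC: X is (alternatives x criteria), w are criterion
    weights, benefit[j] is True for benefit criteria, False for cost."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # direction-aware min-max normalization
    N = np.where(benefit, (X - lo) / (hi - lo), (hi - X) / (hi - lo))
    V = w * (N + 1.0)                          # weighted matrix
    G = np.prod(V, axis=0) ** (1.0 / len(X))   # border approximation area
    Q = V - G                                  # distances from the border
    return Q.sum(axis=1)                       # higher score = better

scores = mabac([[3, 200, 4], [5, 150, 3], [4, 180, 5]],
               w=np.array([0.5, 0.3, 0.2]),
               benefit=np.array([True, False, True]))
print(scores.argsort()[::-1])  # ranking, best first
```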

66 citations


Journal ArticleDOI
TL;DR: A regret-based three-way decision model under interval type-2 fuzzy environment is constructed and it is shown that the proposed model can effectively solve uncertain decision problems.
Abstract: Three-way decision provides a new perspective for dealing with uncertainty and complexity in decision-making problems. However, the behavior of decision-makers may be influenced by different risk attitudes in reality. To address this problem, we construct a regret-based three-way decision model under an interval type-2 fuzzy environment. Basically, regret theory and interval type-2 fuzzy sets are utilized to improve three-way decision in coping with risk and uncertainty. The two core issues are the determination of decision rules and the estimation of conditional probabilities for different decision-makers under an interval type-2 fuzzy environment. The maximum-utility decision rules are derived based on regret theory. An interval type-2 fuzzy technique for order preference by similarity to ideal solution (TOPSIS) method is utilized to estimate the conditional probability. The results of the illustrative example show that the proposed model can effectively solve uncertain decision problems. Comparative analysis and experimental evaluations are utilized to elaborate on the performance of the regret-based three-way decision model.
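The conditional-probability step relies on TOPSIS; a minimal crisp TOPSIS implementation is sketched below for reference (the paper's interval type-2 fuzzy variant layers membership-function machinery on top of this skeleton). The decision matrix is invented.

```python
import numpy as np

def topsis(X, w, benefit):
    """Classical TOPSIS closeness coefficients (higher is better)."""
    X = np.asarray(X, dtype=float)
    R = X / np.linalg.norm(X, axis=0)           # vector normalization
    V = R * w                                    # weighted normalized matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)    # distance to ideal solution
    d_neg = np.linalg.norm(V - anti, axis=1)     # distance to anti-ideal
    return d_neg / (d_pos + d_neg)

cc = topsis([[7, 9, 9], [8, 7, 8], [9, 6, 8]],
            w=np.array([0.4, 0.35, 0.25]),
            benefit=np.array([True, True, False]))
print(cc)
```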

56 citations


Proceedings Article
01 Jan 2020
TL;DR: A new Variational Policy Gradient Theorem for RL with general utilities is derived, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. We prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, even though the optimization problem is nonconvex. We also establish its rate of convergence of the order $O(1/t)$ by exploiting the hidden convexity of the problem, and prove that it converges exponentially when the problem admits hidden strong convexity. Our analysis applies to the standard RL problem with cumulative rewards as a special case, in which case our result improves the available convergence rate.
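In symbols, the variational rewriting behind the theorem can be sketched as follows (our notation, hedged; not a verbatim statement from the paper):

```latex
% Hedged sketch of the variational rewriting (our notation):
% F is the concave utility, F^* its (concave) Fenchel conjugate,
% \lambda_\theta the state-action occupancy measure of \pi_\theta.
\max_{\theta} \, F(\lambda_\theta)
  \;=\; \max_{\theta} \, \min_{z}
  \Big[ \langle z, \lambda_\theta \rangle - F^{*}(z) \Big]
```

Because $\langle z, \lambda_\theta \rangle$ is just an expected cumulative reward with per-step reward $z(s,a)$, the gradient in $\theta$ at the inner solution reduces to a standard policy gradient, which is what makes the saddle-point form estimable from sample paths.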

54 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a proactive handover framework for millimeter-wave networks, where handover timings are optimized while obstacle-caused data rate degradations are predicted before the degradation occurs.
Abstract: For millimeter-wave networks, this paper presents a paradigm shift for leveraging time-consecutive camera images in handover decision problems. While making handover decisions, it is important to predict future long-term performance—e.g., the cumulative sum of time-varying data rates—proactively to avoid making myopic decisions. However, this study experimentally observes that time-variations in the received powers are not necessarily informative for proactively predicting the rapid degradation of data rates caused by moving obstacles. To overcome this challenge, this study proposes a proactive framework wherein handover timings are optimized while obstacle-caused data rate degradations are predicted before the degradations occur. The key idea is to expand the state space to include time-consecutive camera images, which comprise informative features for predicting such data rate degradations. To overcome the difficulty in handling the large dimensionality of the expanded state space, we use deep reinforcement learning to decide the handover timings. The evaluations performed based on the experimentally obtained camera images and received powers demonstrate that the expanded state space facilitates (i) the prediction of obstacle-caused data rate degradations from 500 ms before the degradations occur and (ii) superior performance to a handover framework without the state space expansion.

Journal ArticleDOI
TL;DR: A review of past work focused on identifying types of formulations, classifying objectives, and categorising solution methods indicates that there is no standard formulation, that passenger-oriented objectives are most common, and that more recent works are multi-objective.
Abstract: Assigning aircraft to gates is an important decision problem that airport professionals face every day. The solution of this problem has attracted significant research effort, and many variants of the problem have been studied. In this paper, we review past work with a focus on identifying types of formulations, classifying objectives, and categorising solution methods. The review indicates that there is no standard formulation, that passenger-oriented objectives are most common, and that more recent works are multi-objective. In terms of solution methods, heuristic and metaheuristic approaches are dominant, which provides an opportunity to develop exact and approximate approaches for both the single- and multi-objective problems.

Journal ArticleDOI
TL;DR: This article found that a substantial fraction of participants follow a simple "what you see is all there is" heuristic, according to which participants exclusively consider information that is right in front of them, and directly use the sample mean to estimate the population mean.
Abstract: News reports and communication are inherently constrained by space, time, and attention. As a result, news sources often condition the decision of whether to share a piece of information on the similarity between the signal and the prior belief of the audience, which generates a sample selection problem. This article experimentally studies how people form beliefs in these contexts, in particular the mechanisms behind errors in statistical reasoning. I document that a substantial fraction of experimental participants follows a simple “what you see is all there is” heuristic, according to which participants exclusively consider information that is right in front of them, and directly use the sample mean to estimate the population mean. A series of treatments aimed at identifying mechanisms suggests that for many participants, unobserved signals do not even come to mind. I provide causal evidence that the frequency of such incorrect mental models is a function of the computational complexity of the decision problem. These results point to the context dependence of what comes to mind and the resulting errors in belief updating.

Journal ArticleDOI
11 Dec 2020-Energies
TL;DR: A methodological framework is proposed as the basis for analysing the relevance of individual criteria in any considered decision model; recommendations on the reference criteria set for the studied decision problem demonstrate the practical usefulness of the authors’ approach.
Abstract: The paper undertakes the problem of proper structuring of multi-criteria decision support models. To achieve that, a methodological framework is proposed. The authors’ framework is the basis for the relevance analysis of individual criteria in any considered decision model. The formal foundations of the authors’ approach provide a reference set of Multi-Criteria Decision Analysis (MCDA) methods (TOPSIS, VIKOR, COMET) along with their similarity coefficients (Spearman correlation coefficients and the WS coefficient). In the empirical research, a practical MCDA-based wind farm location problem was studied. Reference rankings of the decision variants were obtained, followed by a set of rankings in which particular criteria were excluded. This was the basis for testing the similarity of the obtained solution sets, as well as for recommendations on both the high significance and the possible elimination of individual criteria in the original model. When carrying out the analyses, both the positions in the final rankings and the corresponding values of the utility functions of the decision variants were studied. As a result of the detailed analysis of the obtained results, recommendations were presented on the reference criteria set for the considered decision problem, thus demonstrating the practical usefulness of the authors’ approach. It should be pointed out that the presented study of criteria relevance is an important factor for the objectification of multi-criteria decision support processes.
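The two similarity coefficients named above (and reused in several papers below) have compact closed forms. The sketch follows the formulas as commonly published by Sałabun and co-authors, transcribed here with hedging; the sample rankings are invented.

```python
import numpy as np

def ws_coefficient(x, y):
    """WS rank similarity: asymmetric, weighting disagreements at the
    top of the reference ranking x more heavily. x, y are 1-based rank
    vectors of the same alternatives."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    denom = np.maximum(np.abs(x - 1.0), np.abs(x - n))
    return 1.0 - np.sum(2.0 ** (-x) * np.abs(x - y) / denom)

def weighted_spearman(x, y):
    """Symmetric weighted Spearman coefficient r_w, which penalizes
    rank differences near the top of either ranking more strongly."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    num = 6.0 * np.sum((x - y) ** 2 * ((n - x + 1.0) + (n - y + 1.0)))
    return 1.0 - num / (n**4 + n**3 - n**2 - n)

print(ws_coefficient([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))
print(weighted_spearman([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))
```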

Journal ArticleDOI
TL;DR: This work proposes two new approaches to determining the relevance of particular decision criteria effectively in sustainable transport problems using four different approaches and evaluates their effectiveness using a reference ranking and popular multi-criteria decision analysis methods.
Abstract: Problems related to sustainable urban transport have gained in importance with the rapid growth of urban agglomerations. There is, therefore, a need to support decision-making processes in this area, a trend that is visible in the literature. Many methods have already been presented as useful decision-making tools in this field. However, it is still a significant challenge to properly determine the relevance of the criteria, because this is one of the most critical points of many techniques for solving decision problems. In this work, we propose two new approaches to effectively determining the relevance of particular decision criteria in sustainable transport problems. For this purpose, we examine a case study evaluating electric bikes against eight criteria taken from earlier work. We calculate the relevance of each criterion using four different approaches and then evaluate their effectiveness using a reference ranking and popular multi-criteria decision analysis methods. The results are compared with each other by using similarity coefficients. Finally, we summarize the results obtained and set out further directions of development.

Posted Content
TL;DR: This paper proposes partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs, along with fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator.
Abstract: Robust Markov decision processes (MDPs) allow computing reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with robust value iteration.
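A hedged sketch of the inner worst-case step behind a robust Bellman operator, for the unweighted sa-rectangular $L_1$ case: the adversary shifts at most κ/2 probability mass onto the lowest-value state, draining the highest-value states first. The paper's weighted-norm algorithms and partial policy iteration go beyond this simple baseline.

```python
import numpy as np

def worst_case_expectation(p_bar, v, kappa):
    """min over { p in simplex : ||p - p_bar||_1 <= kappa } of p . v,
    via the standard mass-shifting argument: move up to kappa/2
    probability onto the lowest-value state, removing it from the
    highest-value states first. O(n log n) due to the sort."""
    p = np.array(p_bar, dtype=float)
    i_min = int(np.argmin(v))
    budget = min(kappa / 2.0, 1.0 - p[i_min])  # mass we may add at i_min
    p[i_min] += budget
    for j in np.argsort(v)[::-1]:              # drain highest values first
        if j == i_min or budget <= 0.0:
            continue
        take = min(p[j], budget)
        p[j] -= take
        budget -= take
    return float(p @ v)

print(worst_case_expectation([0.25, 0.25, 0.25, 0.25],
                             np.array([1.0, 2.0, 3.0, 4.0]), kappa=0.4))
```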

Journal ArticleDOI
TL;DR: An approach based on a morphological matrix is put forward to satisfy sustainability requirements in the product conceptual stage, handling vagueness and uncertainty through intuitionistic fuzzy numbers; the feasibility and validity of the proposed method are demonstrated.

Journal ArticleDOI
TL;DR: A site selection framework for the BSS based on an MCDM (multi-criteria decision-making) method is proposed, providing a critical tool for investors to select the most appropriate alternative.

Journal ArticleDOI
TL;DR: It is shown that a self-learning agent can successfully manage time constraints with the agent performing better than the traditional benchmark, a time-constraint heuristic combining due date deviations and a classical first-in-first-out approach.
Abstract: Reinforcement learning (RL) offers promising opportunities to handle the ever-increasing complexity in managing modern production systems. We apply a Q-learning algorithm in combination with a process-based discrete-event simulation in order to train a self-learning, intelligent, and autonomous agent for the decision problem of order dispatching in a complex job shop with strict time constraints. For the first time, we combine RL in production control with strict time constraints. The simulation represents the characteristics of complex job shops typically found in semiconductor manufacturing. A real-world use case from a wafer fab is addressed with a developed and implemented framework. The performance of the RL approach and benchmark heuristics are compared. It is shown that RL can be successfully applied to manage order dispatching in a complex environment including time constraints. An RL agent with a gain function that rewards selecting the least critical order with respect to time constraints beats heuristic rules that strictly pick the most critical lot first. Hence, this work demonstrates that a self-learning agent can successfully manage time constraints, with the agent performing better than the traditional benchmark, a time-constraint heuristic combining due date deviations and a classical first-in-first-out approach.

Journal ArticleDOI
TL;DR: An assessment framework built on a hesitant fuzzy linguistic multi-criteria decision-making technique is introduced to collectively consider the parameters affecting the eventual decision in the Smart Watch (SW) selection problem.

Journal ArticleDOI
TL;DR: A new weighting method called D-BWM, which integrates BWM and D numbers, is proposed to evaluate renewable energy alternatives under uncertainty; it provides an easy and systematic approach that can be straightforwardly extended to handle many decision problems in the field of sustainable development.

Journal ArticleDOI
01 Sep 2020-Symmetry
TL;DR: This work presents two approaches to identifying fuzzy models from partially incomplete data, empirically compares their accuracy, and determines the sensitivity of the obtained models to the criteria used.
Abstract: A significant challenge in current decision-making methods is the class of problems in which the decision-maker makes decisions based on partially incomplete data. Classic methods of multicriteria decision analysis are used to analyze alternatives described by numerical values, while fuzzy set modifications are usually used to include uncertain data in the decision-making process. However, data incompleteness is something else. In this paper, we show two approaches to identifying fuzzy models with partially incomplete data. The monolithic approach assumes creating one model, which requires many queries to the expert. In the structured approach, the problem is decomposed into several interrelated models. The main aim of the work is to compare their accuracy empirically and to determine the sensitivity of the obtained model to the criteria used. For this purpose, a case study is presented. In order to compare the proposed approaches and analyze the significance of the decision criteria, we use two ranking similarity coefficients, i.e., the symmetric rw and the asymmetric WS. The limitations of each approach are presented, and the results show great similarity despite the use of two structurally different approaches. Finally, we show an example of calculations performed for alternatives with partially incomplete data.

Journal ArticleDOI
TL;DR: A detection framework based on Turing machines is developed to detect those scenarios in which the jammer is not able to disrupt the communication and it is shown that additional coordination resources such as common randomness make the communication robust against such attacks.
Abstract: Wireless communication systems are inherently vulnerable to intentional jamming. In this paper, two classes of such jammers are considered: those with partial and full knowledge. While the first class accounts for those jammers that know the encoding and decoding function, the latter accounts for those that are further aware of the actual transmitted message. Of particular interest are so-called denial-of-service (DoS) attacks in which the jammer is able to completely disrupt any transmission. Accordingly, it is of crucial interest for the legitimate users to detect such adversarial DoS attacks. This paper develops a detection framework based on Turing machines. Turing machines have no limitations on computational complexity, computing capacity, or storage, and can simulate any given algorithm. For both scenarios of a jammer with partial and full knowledge, it is shown that there exists no Turing machine which can decide whether or not a DoS attack is possible for a given channel, and the corresponding decision problem is undecidable. On the other hand, it is shown for both scenarios that it is possible to algorithmically characterize those channels for which a DoS attack is not possible. This means that it is possible to detect those scenarios in which the jammer is not able to disrupt the communication. For all other channels, the Turing machine does not stop and runs forever, making this decision problem semidecidable. Finally, it is shown that additional coordination resources such as common randomness make the communication robust against such attacks.

Journal ArticleDOI
TL;DR: This work combines game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to online learn the Nash equilibrium policy for two-player zero-sum Markov games (TZMGs) and proves the effectiveness of the proposed algorithm on TZMG problems.
Abstract: The Nash equilibrium is an important concept in game theory. It describes the least exploitability of one player by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy online for two-player zero-sum Markov games (TZMGs). The problem is first formulated as a Bellman minimax equation, and generalized policy iteration (GPI) provides a double-loop iterative way to find the equilibrium. Then, neural networks are introduced to approximate Q functions for large-scale problems. An online minimax Q network learning algorithm is proposed to train the network with observations. Experience replay, dueling network, and double Q-learning are applied to improve the learning process. The contributions are twofold: 1) DRL techniques are combined with GPI to find the TZMG Nash equilibrium for the first time, and 2) the convergence of the online learning algorithm with a lookup table and experience replay is proven; the proof is not only useful for TZMGs but also instructive for single-agent Markov decision problems. Experiments on different examples validate the effectiveness of the proposed algorithm on TZMG problems.
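The Bellman minimax equation referred to above has the standard form below (our transcription of the usual two-player zero-sum formulation; π ranges over the maximizer's mixed strategies):

```latex
% Bellman minimax equation for a two-player zero-sum Markov game
% (standard form; notation ours):
V^{*}(s) \;=\; \max_{\pi \in \Delta(A)} \, \min_{b \in B}
  \sum_{a \in A} \pi(a)
  \Big[ r(s,a,b) + \gamma \sum_{s'} P(s' \mid s,a,b) \, V^{*}(s') \Big]
```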

Journal ArticleDOI
Mengdi Wang
TL;DR: A novel randomized linear programming algorithm is proposed for approximating the optimal policy of discounted-reward and average-reward Markov decision problems by leveraging the value–policy relationship.
Abstract: We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted-reward and average-reward Markov decision problems. By leveraging the value–policy ...
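For context, the exact linear program that randomized algorithms of this kind approximate is the classical discounted-MDP LP (standard textbook form, our notation):

```latex
% Classical primal LP for a discounted MDP (standard textbook form;
% alpha is any strictly positive initial-state distribution):
\min_{v \in \mathbb{R}^{|S|}} \; \sum_{s \in S} \alpha(s)\, v(s)
\quad \text{s.t.} \quad
v(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, v(s')
\quad \forall\, s \in S,\; a \in A
% The optimal policy can be recovered from the dual (occupancy-measure)
% variables of this program.
```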

Journal ArticleDOI
TL;DR: To determine the significant supply quality criteria of public transportation, a hybrid Analytic Hierarchy Process (AHP) combined with the Best Worst Method (BWM) is applied, and the adopted model highlights the most significant service quality criteria that influence urban bus transport systems.
Abstract: Big cities suffer from serious complex problems such as air pollution, congestion, and traffic accidents. Developing public transport quality in such cities is considered an efficient remedy for these critical issues. This paper aims to determine the significant supply quality criteria of public transportation. As a methodology, a hybrid Analytic Hierarchy Process (AHP) combined with the Best Worst Method (BWM) is applied. The proposed model is basically a hierarchical structure with at least a 5 × 5 pairwise comparison matrix or larger. A real-world complex problem (public transport quality improvement) was examined to validate the created model. An urban bus transport system in the Jordanian capital city, Amman, was used as a case study; three stakeholder groups (passengers, non-passengers, and representatives of the local government) participated in the evaluation process. The conventional Analytic Hierarchy Process (AHP) leads to weak consistency when 5 × 5 or larger pairwise comparison matrices are present, particularly in estimating complex problems. To avoid this critical issue in AHP, we used Best Worst Method (BWM) comparisons, which make the evaluation process easier for decision makers; moreover, this saves survey time and provides more consistency compared to AHP pairwise comparisons. The adopted model highlighted the most significant service quality criteria that influence urban bus transport systems. Furthermore, the sensitivity analysis conducted detected the stability of the criteria ranking in the three levels of the hierarchical structure. Since the proposed AHP–BWM model (which is the sole example of this sort of combination) is independent of the decision attributes, it can be applied to arbitrary hierarchically structured decision problems with a relatively large number of pairwise comparisons.
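The BWM step admits a standard linearized formulation that can be solved with any LP solver. The sketch below is our rendering of that generic model, not the authors' implementation; the comparison vectors are invented and fully consistent, so the consistency indicator ξ comes out zero.

```python
import numpy as np
from scipy.optimize import linprog

def bwm_weights(a_B, a_W, best, worst):
    """Linearized Best Worst Method: minimize xi subject to
    |w_best - a_B[j] * w_j| <= xi and |w_j - a_W[j] * w_worst| <= xi,
    with sum(w) = 1 and w >= 0. Returns (weights, xi)."""
    a_B, a_W = np.asarray(a_B, float), np.asarray(a_W, float)
    n = len(a_B)
    A_ub, b_ub = [], []
    for j in range(n):
        for sign in (1.0, -1.0):
            # +/-(w_best - a_B[j] * w_j) - xi <= 0
            row = np.zeros(n + 1)
            row[best] += sign
            row[j] -= sign * a_B[j]
            row[n] = -1.0
            A_ub.append(row); b_ub.append(0.0)
            # +/-(w_j - a_W[j] * w_worst) - xi <= 0
            row = np.zeros(n + 1)
            row[j] += sign
            row[worst] -= sign * a_W[j]
            row[n] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    c = np.zeros(n + 1); c[n] = 1.0                    # minimize xi
    A_eq = [np.append(np.ones(n), 0.0)]; b_eq = [1.0]  # weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return res.x[:n], res.x[n]

# Illustrative (invented) comparisons over 4 criteria;
# criterion 0 is best, criterion 3 is worst.
w, xi = bwm_weights(a_B=[1, 2, 4, 8], a_W=[8, 4, 2, 1], best=0, worst=3)
print(np.round(w, 3), round(xi, 3))
```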

Proceedings ArticleDOI
19 Oct 2020
TL;DR: To train from weak supervision, the proposed Director-Actor-Critic framework is based on hierarchical reinforcement learning with intrinsic motivation; to accelerate training, the Critic is pre-trained with high-reward trajectories generated by hand-crafted rules, and curriculum learning gradually increases the complexity of questions during query graph generation.
Abstract: Knowledge Graph Question Answering aims to automatically answer natural language questions via well-structured relation information between entities stored in knowledge graphs. When faced with a complex question with compositional semantics, query graph generation is a practical semantic parsing-based method. But existing works rely on heuristic rules with limited coverage, making them impractical on more complex questions. This paper proposes a Director-Actor-Critic framework to overcome these challenges. Through options over a Markov Decision Process, query graph generation is formulated as a hierarchical decision problem. The Director determines which types of triples the query graph needs, the Actor generates corresponding triples by choosing nodes and edges, and the Critic calculates the semantic similarity between the generated triples and the given questions. Moreover, to train from weak supervision, we base the framework on hierarchical Reinforcement Learning with intrinsic motivation. To accelerate the training process, we pre-train the Critic with high-reward trajectories generated by hand-crafted rules, and leverage curriculum learning to gradually increase the complexity of questions during query graph generation. Extensive experiments conducted over widely-used benchmark datasets demonstrate the effectiveness of the proposed framework.

Journal ArticleDOI
TL;DR: A new style of pairwise comparison, the DLPR with incomplete symbolic proportions, is proposed to represent DMs’ comparison information and applied to GDM situations, as demonstrated by solving a GDM problem of evaluating and selecting research projects.

Journal ArticleDOI
TL;DR: A cost-sensitive multigranulation sequential three-way decision model, based on BWM and MULTIMOORA across multiple levels of granularity, is proposed to deal with multi-attribute group decision-making problems under uncertainty.

Journal ArticleDOI
TL;DR: This paper defines the modified PSVNHFS and introduces two aggregation operators, the probabilistic single valued neutrosophic hesitant fuzzy weighted arithmetic average and weighted geometric average operators, based on the algebraic properties presented in the paper.
Abstract: Recently, there has been great interest in single valued neutrosophic hesitant fuzzy set theory. Compared with single valued neutrosophic sets, it is more convenient for real-life situations. But even in this case, there are still missing data for some decision problems. Probabilistic single valued neutrosophic hesitant fuzzy sets (PSVNHFSs) are defined to solve this problem. Even though they contain more information, they need some improvements. In this paper, the modified PSVNHFS is defined and some improvements in the theory of PSVNHFSs are proposed. We also improve some algebraic properties of this set theory and define a distance operator for PSVNHFSs. Then we introduce two aggregation operators, called the probabilistic single valued neutrosophic hesitant fuzzy weighted arithmetic average (PSVNHFWA) operator and the probabilistic single valued neutrosophic hesitant fuzzy weighted geometric average (PSVNHFWG) operator, related to the algebraic properties presented in this paper. We also extend the MABAC method to the probabilistic single valued neutrosophic hesitant fuzzy setting. Finally, we give an illustrative example to demonstrate the stability and reliability of the proposed theory.