2000 Fiscal Year Final Research Report Summary

Studies on optimal theory and its application in probabilistic decision processes with general utility functions.

Research Project

Project/Area Number: 11640118
Research Category: Grant-in-Aid for Scientific Research (C)
Allocation Type: Single-year Grants
Section: General
Research Field: General mathematics (including Probability theory/Statistical mathematics)
Research Institution: Wakayama Univ.

Principal Investigator: KADOTA Yoshinobu, Wakayama Univ., Faculty of Education, Professor (90116294)
Co-Investigator (Kenkyū-buntansha): YASUDA Masami, Chiba Univ., Faculty of Science, Professor (00041244)
Co-Investigator (Kenkyū-buntansha): KURANO Masami, Chiba Univ., Faculty of Education, Professor (70029487)
Project Period (FY): 1999 – 2000
Keywords: Markov / decision / stopping / utility / optimal / concave / risk-averse / non-discounted
Research Abstract

A stopped decision process is a model that combines Markov decision processes (MDPs) with the stopping problem. An MDP is specified by a countable state set $S$, a compact action space $A(i)$ assigned to each state $i \in S$, transition probabilities $q = (q_{ij}(a))$, and a uniformly bounded immediate reward function $r(i, a, j)$, where $q_{ij}(a)$ and $r(i, a, j)$ are continuous in $a \in A(i)$ for any $i, j \in S$. A policy $\pi$ is a sequence of probabilities on $A(i_t)$ conditioned on each history $(i_0, a_0, i_1, \ldots, i_t)$ for $t = 0, 1, \ldots$. Denote by $\sigma$ a stopping time and by $g$ a utility function.
Let $B(t) = \sum_{k=1}^{t} r(X_{k-1}, \Delta_{k-1}, X_k)$, where $X_t$ and $\Delta_t$ are the state and action at time $t$, respectively. The pair $(\pi, \sigma)$ is called $(i_0, \alpha_0)$-optimal if it maximizes $E^{\pi}_{i_0}[g(\alpha_0 + B(\sigma))]$, where $E^{\pi}_{i_0}$ is the expectation with respect to the probability measure on the sample space $\Omega = (S \times A)^{\infty}$ for an initial state $i_0$.
It is assumed that $g$ is non-decreasing, concave and bounded above, or that $g$ has a bounded derivative on any compact subset of the real line $\mathbb{R}$ and satisfies $E^{\pi}_{i}[\sup_{t \ge 0} g^{+}(\alpha_0 + B(t))] < \infty$ for any $\pi, i$, where $g^{+}$ is the positive part of $g$. Let $v(i, \alpha) = \max_{(\pi, \sigma)} E^{\pi}_{i}[g(\alpha + B(\sigma))]$. Then we have the following results.
1. For any $i \in S$ and $\alpha$, $v(i, \alpha)$ satisfies the optimality equation
$$v(i, \alpha) = \max\Bigl\{ g(\alpha),\ \max_{a \in A(i)} \sum_{j \in S} q_{ij}(a)\, v(j, \alpha + r(i, a, j)) \Bigr\} \tag{1}$$
Furthermore, suppose $(\pi, \sigma)$ satisfies $P^{\pi}_{i_0}(\sigma > 1) = 1$.
2. If $(\pi, \sigma)$ is an $(i_0, \alpha_0)$-optimal pair, then $E^{\pi}_{i_0}[g(\alpha_0 + B(\sigma))]$ satisfies (1).
3. If $E^{\pi}_{i_0}[g(\alpha_0 + B(\sigma))]$ satisfies (1), then $(\pi, \sigma)$ is $(i_0, \alpha_0)$-optimal.
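
As an illustration only, optimality equation (1) can be approximated numerically by forcing the process to stop no later than time n and applying the recursion n times. The sketch below is not taken from the report: the two-state, two-action model (`Q`, `R`), the exponential utility `g`, and the function name `value` are all hypothetical choices made for this example, and the finite-horizon truncation is an assumption of the sketch rather than the authors' method.

```python
import math
from functools import lru_cache

# Hypothetical toy model (not from the report): 2 states, 2 actions.
# Q[i][a][j] is the transition probability q_ij(a);
# R[i][a][j] is the immediate reward r(i, a, j).
Q = [
    [[0.5, 0.5], [0.9, 0.1]],
    [[0.2, 0.8], [0.6, 0.4]],
]
R = [
    [[1.0, -1.0], [0.0, 0.5]],
    [[-0.5, 2.0], [0.5, -1.0]],
]
STATES = range(2)
ACTIONS = range(2)


def g(x):
    """Assumed utility: non-decreasing, concave, bounded above by 0."""
    return -math.exp(-x)


@lru_cache(maxsize=None)
def value(i, alpha, n):
    """Horizon-n approximation of v(i, alpha): stopping is forced by time n."""
    stop = g(alpha)  # utility of stopping now with accumulated reward alpha
    if n == 0:
        return stop
    # Best expected value of taking one more step, as in the inner max of (1).
    cont = max(
        sum(Q[i][a][j] * value(j, alpha + R[i][a][j], n - 1) for j in STATES)
        for a in ACTIONS
    )
    return max(stop, cont)


if __name__ == "__main__":
    for n in range(6):
        print(n, value(0, 0.0, n))
```

Because each step takes the maximum of stopping and continuing, the approximations are non-decreasing in n and never fall below g(α). In this toy model, continuing from state 0 with α = 0 under action 1 already beats stopping immediately.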

  • Research Products


All Publications (6 results)

  • [Publications] Y. Kadota, M. Kurano and M. Yasuda: "Stopped decision processes in conjunction with general utility." To appear in J. Information and Optimization Science. (2001)

    • Description
      From the "Research Report Summary (Japanese)"
  • [Publications] Y. Kadota, M. Kurano and M. Yasuda: "Risk-averse stopped Markov decision processes." Proceedings of the 4th Information and Statistical Science (BIC) Symposium. (1999)

    • Description
      From the "Research Report Summary (Japanese)"
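  • [Publications] Y. Kadota: "Deviation matrix, Laurent series and Blackwell optimality in countable state Markov decision processes." Lecture notes of the Research Institute for Mathematical Sciences (Kōkyūroku), "Problems and prospects of dynamic programming theory with uncertain models" (to appear). (2001)

    • Description
      From the "Research Report Summary (Japanese)"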
  • [Publications] Y. Kadota, M. Kurano and M. Yasuda: "Stopped decision processes in conjunction with general utility." To appear in J. Inform. & Optim. Sci. (2001)

    • Description
      From the "Research Report Summary (English)"
  • [Publications] Y. Kadota, M. Kurano and M. Yasuda: "Risk-averse stopped Markov decision processes." The 4th BIC (Bull. Inform. & Cybernet.) symposium. (1999)

    • Description
      From the "Research Report Summary (English)"
  • [Publications] Y. Kadota: "Deviation matrix, Laurent series and Blackwell optimality in countable state Markov decision processes." To appear. Lecture note in Institute of Math. Anal. in Kyoto Univ. (2001)

    • Description
      From the "Research Report Summary (English)"

Published: 2002-03-26  
