Introduction to CS & Engineering (0CCS0CSE)
Assignment 23: Episode
1 Value Function
Implementing Eq. 1 can cause confusion because V(S) appears on both sides of the equation
and, in Python, V(S) is stored in a dictionary. This document explains lines 23-25 of Algorithm 1.
$$V(S_t) = V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \tag{1}$$
Although lines 23 and 24 of Algorithm 1 appear to update the valueFunction dictionary,
they do not. Lines 23 and 24 only retrieve information from the value function dictionary.
Introducing two new variables, v_st1 and v_st0, in place of V(St+1) and V(St) makes it
clear that only line 25 changes the dictionary:
v_st1 ⇐ GetValueOf(board)
v_st0 ⇐ GetValueOf(previousState)
V(St) ⇐ v_st0 + session.learningRate × (reward + (session.discountRate × v_st1) − v_st0)
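The three lines above can be sketched in Python as follows. This is a minimal illustration, not the assignment's reference solution: the function name `td_update`, its parameters and the string keys are hypothetical, and `dict.setdefault` stands in for GetValueOf.

```python
def td_update(value_function, prev_key, next_key, reward, learning_rate, discount_rate):
    """Apply Eq. 1 once to the entry for prev_key and return the new value."""
    # Lines 23 and 24: read the two values into local variables.
    # setdefault adds a missing key with value 0.0 but never alters an existing entry.
    v_st1 = value_function.setdefault(next_key, 0.0)
    v_st0 = value_function.setdefault(prev_key, 0.0)

    # Line 25: the only statement that changes a stored value.
    value_function[prev_key] = v_st0 + learning_rate * (reward + discount_rate * v_st1 - v_st0)
    return value_function[prev_key]


# Hypothetical states "s0" (previous) and "s1" (current) with made-up values.
values = {"s0": 0.5, "s1": 1.0}
new_value = td_update(values, "s0", "s1", reward=0.0, learning_rate=0.1, discount_rate=0.9)
```

With these made-up numbers the update is 0.5 + 0.1 × (0 + 0.9 × 1.0 − 0.5), which is approximately 0.54, and the entry for "s1" is left untouched.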
Furthermore, GetValueOf(...) is a multistep process:
(1) get the key from the board;
(2) check whether the key is in valueFunction:
    i. if the key is in valueFunction, return the value associated with it in the dictionary, e.g., return self.valueFunction[key];
    ii. if the key is not in valueFunction, add the key to the dictionary, initialise its value to zero and return 0.
It would be best to add a new method, getValueOf(self, board), which does all of this.
In lines 23 and 24 of Algorithm 1, both board and previousState are TicTacToe objects.
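One possible shape for such a method is sketched below. The class name `RLAgent`, the helper `getKey` and the `cells` attribute are assumptions made for illustration; the handout does not say how a key is derived from a TicTacToe object, so a stand-in board is used here.

```python
from types import SimpleNamespace


class RLAgent:
    """Hypothetical agent class holding the state-value dictionary."""

    def __init__(self):
        self.valueFunction = {}

    def getKey(self, board):
        # Assumption: the board exposes its nine cells as a flat sequence of
        # one-character strings called `cells`; the real TicTacToe class may differ.
        return "".join(board.cells)

    def getValueOf(self, board):
        key = self.getKey(board)            # (1) get the key from the board
        if key in self.valueFunction:       # (2) i. known state: return its value
            return self.valueFunction[key]
        self.valueFunction[key] = 0.0       # (2) ii. new state: initialise to zero
        return 0.0


# Stand-in for a TicTacToe object, used only to exercise the method.
agent = RLAgent()
board = SimpleNamespace(cells=list("XO  X   O"))
first = agent.getValueOf(board)
```

Because the lookup and the zero-initialisation live in one place, every caller sees the same behaviour for unseen states.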
Algorithm 1: This method executes a single tic-tac-toe game and updates the state-value
table after every move played by the RL agent.
 1: procedure episode(board, opponent, session)
 2:
 3:     result ⇐ True
 4:     turn ⇐ 0
 5:     previousState ⇐ CopyBoard()
 6:
 7:     while not board.isGameOver() and result do
 8:         if turn > 1 then
 9:             turn ⇐ 0
10:         end if
11:
12:         agentMoved ⇐ False
13:
14:         if (turn is 0 and session.agentFirst) or (turn is 1 and not session.agentFirst) then
15:             result ⇐ makeTrainingMove(board, session.epsilon)
16:             agentMoved ⇐ True
17:         else
18:             result ⇐ opponent.makeMove(board)
19:         end if
20:
21:         if agentMoved then
22:             reward ⇐ getReward(board)
23:             V(St+1) ⇐ GetValueOf(board)
24:             V(St) ⇐ GetValueOf(previousState)
25:             V(St) ⇐ V(St) + session.learningRate × (reward + (session.discountRate × V(St+1)) − V(St))
26:             previousState ⇐ CopyBoard()
27:
28:         end if
29:
30:         turn ⇐ turn + 1
31:     end while
32:
33:     reward ⇐ getReward(board)
34:     V(St+1) ⇐ GetValueOf(board)
35:     V(St+1) ⇐ V(St+1) + session.learningRate × reward
36: end procedure
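A rough Python transcription of Algorithm 1 is sketched below so the control flow can be traced end to end. Several things here are assumptions rather than part of the handout: `ToyBoard` and `ToyAgent` are toy stand-ins (the "game" simply ends after a fixed number of moves), `getKey` is a hypothetical helper, `episode` takes the agent as an explicit argument, and `copy.deepcopy` plays the role of CopyBoard.

```python
import copy
from types import SimpleNamespace


class ToyBoard:
    """Toy stand-in for TicTacToe: the game ends after a fixed number of moves."""

    def __init__(self, moves_left=3):
        self.moves_left = moves_left

    def isGameOver(self):
        return self.moves_left == 0

    def play(self):
        self.moves_left -= 1
        return True


class ToyAgent:
    """Toy agent exposing the helpers that Algorithm 1 calls."""

    def __init__(self):
        self.valueFunction = {}

    def getKey(self, board):
        return board.moves_left

    def getValueOf(self, board):
        # Missing keys are added with value 0, as the handout describes.
        return self.valueFunction.setdefault(self.getKey(board), 0.0)

    def getReward(self, board):
        return 1.0 if board.isGameOver() else 0.0

    def makeTrainingMove(self, board, epsilon):
        return board.play()  # epsilon is ignored in this toy


def episode(agent, board, opponent, session):
    result = True
    turn = 0
    previousState = copy.deepcopy(board)

    while not board.isGameOver() and result:
        if turn > 1:
            turn = 0

        agentMoved = False
        if (turn == 0 and session.agentFirst) or (turn == 1 and not session.agentFirst):
            result = agent.makeTrainingMove(board, session.epsilon)
            agentMoved = True
        else:
            result = opponent.makeMove(board)

        if agentMoved:
            reward = agent.getReward(board)
            v_st1 = agent.getValueOf(board)          # line 23: read only
            v_st0 = agent.getValueOf(previousState)  # line 24: read only
            key = agent.getKey(previousState)
            # Line 25: the only write inside the loop.
            agent.valueFunction[key] = v_st0 + session.learningRate * (
                reward + session.discountRate * v_st1 - v_st0
            )
            previousState = copy.deepcopy(board)

        turn += 1

    # Lines 33-35: final adjustment of the terminal state's value.
    reward = agent.getReward(board)
    v_final = agent.getValueOf(board)
    agent.valueFunction[agent.getKey(board)] = v_final + session.learningRate * reward


agent = ToyAgent()
session = SimpleNamespace(learningRate=0.5, discountRate=1.0, epsilon=0.1, agentFirst=True)
opponent = SimpleNamespace(makeMove=lambda board: board.play())
episode(agent, ToyBoard(3), opponent, session)
```

In this toy run the agent moves first and third, so the dictionary is written only after those two moves and once more after the loop ends; the state reached by the opponent's move never gets an entry.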
 
