Reconciling my understanding of dynamic programming and Markov decision process

Algorithms like policy iteration and value iteration are often classified as dynamic programming methods that try to solve the Bellman optimality equations.

My current understanding of dynamic programming is this:

It is a method applied to optimization problems. DP problems exhibit optimal substructure, i.e., the optimal solution to a problem contains optimal solutions to its subproblems. These subproblems are not independent of each other but overlap. There are two approaches: bottom-up (tabulation) and top-down (recursion with memoization). I have the following questions:

Is this understanding of DP comprehensive? Does every DP algorithm have an optimal substructure with overlapping subproblems?

How do policy iteration and value iteration fit into this scheme? Can we call them bottom-up or top-down?
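For reference, the two approaches described above can be sketched with the classic Fibonacci example (my illustration, not from the post):

```python
from functools import lru_cache

# Top-down: start from the full problem and recurse, caching (memoizing)
# the overlapping subproblems so each one is solved only once.
@lru_cache(maxsize=None)
def fib_top_down(n: int) -> int:
    if n < 2:
        return n
    return fib_top_down(n - 1) + fib_top_down(n - 2)

# Bottom-up: tabulate subproblems from smallest to largest.
def fib_bottom_up(n: int) -> int:
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]
```

In this vocabulary, value iteration looks roughly bottom-up (repeated sweeps updating a table of values), while recursively evaluating the Bellman equation with caching would be the top-down flavor.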

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/b3anz129
πŸ“…︎ Apr 15 2021
🚨︎ report
Below you will find a link to a Zoom recording where our team discusses Reinforcement Learning. Topics covered: Markov Decision Process, Double Q-Learning, the math behind Q-Learning, and the Bellman Equation. We also walk through the algorithms and provide code examples.

Topic: Reinforcement Learning Math Discussion

Meeting Recording:

https://us02web.zoom.us/rec/share/xcdlLPLzrmxLfNbNuFHud4UtFaTVeaa823IYr6dYzUw-uzo3Q0gjSQwweD9oLgzf

πŸ‘︎ 43
πŸ’¬︎
πŸ‘€︎ u/davidstroud1123
πŸ“…︎ May 19 2020
🚨︎ report
Hey everyone! Tried writing a small introduction to the Markov Decision Process. This is my first technical blog post! Feedback and suggestions would be greatly appreciated. medium.com/@mitesh_shah/i…
πŸ‘︎ 14
πŸ’¬︎
πŸ‘€︎ u/mitesh1612
πŸ“…︎ Apr 27 2020
🚨︎ report
Reinforcement Learning: The Markov Decision Process

How is the Markov Decision Process used in AI? All that has been explained in a simple fashion in Reinforcement learning Part 3: Introduction to Markov Process. Take a look πŸ‘‰https://medium.com/ai%C2%B3-theory-practice-business/reinforcement-learning-part-3-the-markov-decision-process-9f5066e073a2

πŸ‘︎ 32
πŸ’¬︎
πŸ‘€︎ u/cdossman
πŸ“…︎ Oct 30 2019
🚨︎ report
Markov decision process [minor Part 2 Spoiler]

So in the final episode we get to see the code to get into HAP's Lab. I'm not just reading into every little thing here; this was deliberate. The camera panned directly at it and he stood to the side.

268 #62

Now maybe this is just an easter egg or maybe I've been looking at this stuff for too long and everything can apply, but I see so many coincidences that are always somewhat applicable.

This led me to a book on Signal Processing for Cognitive Radios.

There it talks about infinite-horizon value functions and how they can be approximated by a piecewise linear finite-horizon value function when computed for a sufficiently long horizon.

Also on that page it references something called the Markov Decision Process.

>A Markov decision process (MDP) is a discrete time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming and reinforcement learning.

>They are used in many disciplines, including robotics, automatic control, economics and manufacturing.

>The probability that the process moves into its new state is influenced by the chosen action. The next state depends on the current state and the decision maker's action, but is conditionally independent of all previous states and actions.

It all sounds remarkably similar to the type of system that could propel forking life decisions and dimension jumps forward.

Also the whole robotics and automation stuff.
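As an aside, the conditional-independence property in the quoted definition is easy to make concrete: the distribution of the next state is a function of only the current state and action, never the history. A toy sketch (all states, actions, and probabilities invented):

```python
import random

# Toy MDP transition table: P[(state, action)] is the distribution
# over next states. No history appears anywhere in the lookup.
P = {
    ("s0", "a"): {"s0": 0.7, "s1": 0.3},
    ("s0", "b"): {"s0": 0.1, "s1": 0.9},
    ("s1", "a"): {"s0": 0.5, "s1": 0.5},
    ("s1", "b"): {"s0": 0.2, "s1": 0.8},
}

def step(state, action):
    # Sample the next state; it depends only on (state, action).
    outcomes = P[(state, action)]
    return random.choices(list(outcomes), weights=list(outcomes.values()))[0]
```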

πŸ‘︎ 9
πŸ’¬︎
πŸ‘€︎ u/Cicer
πŸ“…︎ Mar 30 2019
🚨︎ report
[P] An introduction to Markov Decision Process [RL]

I have recently managed to set aside some time to clean up a couple of drafts that have been sitting on my laptop for the last couple of months.

It's a series of three articles on Markov Decision Processes, a piece of the mathematical framework underlying Reinforcement Learning techniques. A couple more are in the process of being written, but I believe the material could already be useful to anyone interested in taking a look at the nitty-gritty math formulation.

Link to the first article: https://www.lpalmieri.com/posts/rl-introduction-00/

Link to the index: https://www.lpalmieri.com/

πŸ‘︎ 185
πŸ’¬︎
πŸ‘€︎ u/LukeMathWalker
πŸ“…︎ Sep 01 2018
🚨︎ report
Are there works that provide a geometric/topological viewpoint on Markov Decision Processes?
πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/hmi2015
πŸ“…︎ Jul 25 2019
🚨︎ report
Is there a way to represent a Markov Decision Process as a BayesNet?
πŸ‘︎ 8
πŸ’¬︎
πŸ‘€︎ u/quickMLQuestion
πŸ“…︎ Mar 18 2019
🚨︎ report
[D] Simple intro to Markov Decision Process via Game of Thrones youtube.com/watch?v=Kllu_…
πŸ‘︎ 28
πŸ’¬︎
πŸ‘€︎ u/jaleyhd
πŸ“…︎ Apr 11 2018
🚨︎ report
Partially observable Markov decision process: one of the best explanations I have come across youtube.com/watch?v=bVT7Q…
πŸ‘︎ 111
πŸ’¬︎
πŸ‘€︎ u/saltedcashew
πŸ“…︎ Apr 09 2018
🚨︎ report
Has anyone learned about the Markov Decision Process at UNLV? (Looking for a professor that teaches it)

Did you like the professor? Who was it? What class was it?

Thanks!

πŸ‘︎ 4
πŸ’¬︎
πŸ‘€︎ u/g3t0nmyl3v3l
πŸ“…︎ Oct 11 2018
🚨︎ report
[D] Reinforcement Learning - Markov Decision Process oneraynyday.github.io/ml/…
πŸ‘︎ 53
πŸ’¬︎
πŸ‘€︎ u/OneRaynyDay
πŸ“…︎ May 20 2018
🚨︎ report
Solving the egg drop puzzle in python using brute force, dynamic programming, and a Markov Decision Process declanoller.com/2018/09/0…
πŸ‘︎ 25
πŸ’¬︎
πŸ‘€︎ u/diddilydiddilyhey
πŸ“…︎ Sep 11 2018
🚨︎ report
The Markov Property, Chain, Reward Process and Decision Process xaviergeerinck.com/markov…
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/qznc_bot
πŸ“…︎ May 20 2018
🚨︎ report
Category Theory of Markov Decision Processes thenewflesh.net/2020/02/1…
πŸ‘︎ 69
πŸ’¬︎
πŸ‘€︎ u/hoj201
πŸ“…︎ Feb 12 2020
🚨︎ report
Some questions regarding Markov Decision Processes

Hi! I am currently taking the Reinforcement Learning specialization by the University of Alberta on Coursera (auditing, no money T_T)

Please refer to this image before reading queries

https://imgur.com/a/t9qYB0D

The Markov property states that we have all the information needed in the current state and action to predict the next state.

Queries :

  1. If the Markov property holds and we have all the information required to predict the next state, then why do we sum the rewards over ALL POSSIBLE NEXT STATES? Isn't the next state known, given the action, according to the Markov property?
  2. Why are we summing over all rewards for EACH state? Isn't the reward fixed for each state, even if we are not given the action?
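On query 1: the Markov property only fixes the *distribution* of the next state given the current state and action; the transition itself can still be random. That is why the Bellman equations sum over every possible (s′, r) pair weighted by p(s′, r | s, a). A tiny numeric sketch (all probabilities invented):

```python
# Toy distribution p(s', r | s, a) for one fixed state-action pair.
# The Markov property fixes this distribution, but the realized
# next state is still random.
p = {
    ("s1", 1.0): 0.6,
    ("s2", 0.0): 0.3,
    ("s2", 5.0): 0.1,
}

# Expected immediate reward: weight each possible (next state, reward)
# pair by its probability -- hence the sum over ALL possible next states.
expected_reward = sum(prob * r for (s_next, r), prob in p.items())
```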
πŸ‘︎ 4
πŸ’¬︎
πŸ‘€︎ u/CSGOvelocity
πŸ“…︎ Jun 28 2020
🚨︎ report
"A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming", Ferreira et al 2017 arxiv.org/abs/1706.01417
πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/gwern
πŸ“…︎ Jun 11 2017
🚨︎ report
Policy in Markov Decision Processes

I am currently going through the University of Alberta's course on RL on Coursera.

Confusion :

In MDPs the next state and the reward associated with it are stochastic. Given the current state, every state in the set of possible states has a nonzero probability of occurring. Then how do you choose an optimal policy? I understand that we are trying to maximise the expected discounted return (the sum of all future rewards).

Do you evaluate multiple policies over episodes and then choose?

And how are actions picked if transitions are stochastic?
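One way to see it: an optimal policy maximizes the *expected* return, averaged over the randomness, so you don't need to enumerate and test policies one by one; value iteration computes the expectation directly. A minimal sketch on a hypothetical two-state MDP (all numbers invented):

```python
# Toy value iteration. The optimal policy maximizes the EXPECTED
# discounted return, so no single random outcome decides it.
# P[s][a] = list of (probability, next_state, reward) triples.
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)],
           "go":   [(1.0, "s0", 0.0)]},
}
gamma = 0.9

def backup(V, s, a):
    # Expected one-step return of taking action a in state s.
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

V = {s: 0.0 for s in P}
for _ in range(200):  # repeated sweeps converge geometrically
    V = {s: max(backup(V, s, a) for a in P[s]) for s in P}

# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: backup(V, s, a)) for s in P}
```

Here "stay" in s1 pays 2 forever, so V(s1) converges to 2/(1-0.9) = 20, and the greedy policy picks the action with the best expected backup in each state.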

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/CSGOvelocity
πŸ“…︎ Jul 26 2020
🚨︎ report
I summarised my experiences with learning a Partially Observable Markov Decision Process given input/output data - in case it helps anyone. danielmescheder.wordpress…
πŸ‘︎ 13
πŸ’¬︎
πŸ‘€︎ u/danielMe
πŸ“…︎ Dec 13 2011
🚨︎ report
Markov Decision Processes (MDPs) ?

Does anyone have any idea about Markov Decision Processes (MDPs)? I am making a voice assistant Android app and need a middleware for the backend to link everything, so I was wondering whether an MDP would be a good option.

πŸ‘︎ 3
πŸ’¬︎
πŸ‘€︎ u/javaliciouz
πŸ“…︎ Apr 10 2020
🚨︎ report
The Mathematics of 2048: Optimal Play with Markov Decision Processes jdlm.info/articles/2018/0…
πŸ‘︎ 1k
πŸ’¬︎
πŸ‘€︎ u/begnini
πŸ“…︎ Apr 10 2018
🚨︎ report
Markov Decision Process in R for a song suggestion software?

Okay, so I'm not exactly sure if this belongs here, but this is my problem: we have a music player that has different playlists and automatically suggests songs from the current playlist I'm in. What I want the program to learn is that if I skip a song, it should decrease the probability of that song being played in this playlist again. I think this is what's called reinforcement learning, and I've read a bit about the algorithms, deciding that MDP seems to be exactly what we have here. I know that in an MDP there is more than one state, so I figured for this case that would mean the different playlists. Depending on the state (playlist) I'm in, it chooses the songs it thinks fit best and gets "punished" (by skipping) if it has chosen wrongly.

So what I'm asking is, if you guys think this is the right approach? Or would you suggest a different algorithm? Does all of this even make any sense, should I provide more information?

If it does sound right, I'd like to ask for some tutorials or starting points on getting into MDPs in R. I've searched online but have only found the MDP Toolbox in R, and it kind of doesn't really make sense to me. Do you have any suggestions? I'm really grateful for any kind of advice. :)
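For the behavior described (skip lowers a song's probability within a playlist), a full MDP may be heavier than needed; a bandit-style weight update per playlist already captures it. A sketch of that simpler idea (hypothetical names and numbers; the post asks about R, so treat this as pseudocode for the approach):

```python
import random

# Per-playlist song weights; a skip multiplies the song's weight down,
# a full listen nudges it up. Playlist = the "state" in the post's framing.
weights = {"focus": {"songA": 1.0, "songB": 1.0, "songC": 1.0}}

def suggest(playlist):
    # Sample a song proportionally to its current weight.
    songs = list(weights[playlist])
    return random.choices(songs, weights=[weights[playlist][s] for s in songs])[0]

def feedback(playlist, song, skipped, decay=0.5, boost=1.05):
    # Punish skips, gently reward completed plays.
    weights[playlist][song] *= decay if skipped else boost
```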

πŸ‘︎ 2
πŸ’¬︎
πŸ‘€︎ u/edevcimot
πŸ“…︎ May 19 2015
🚨︎ report
What does E[...|...;...] mean in a mathematical formula? E.g. the state value function in Markov decision processes

I'm trying to implement this paper about Markov decision processes but am struggling with some of the formulas, for example the state value function definition at the end of the second page. I can understand everything in it except the double-struck E[...] notation at the start. I've never seen it before, can't derive what it means from the formula, and don't know what to look up. The only thing I can think of is set-builder notation, considering the pipe "|" and the big double-struck letter, but that wouldn't make any sense here. Could anyone help me out with this? Thanks a lot for reading!
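For what it's worth, the double-struck E is the expectation operator: 𝔼[X | s; θ] reads "the probability-weighted average of X given condition s, under parameters θ", not set-builder notation. A quick Monte Carlo illustration (toy distribution, not from the paper):

```python
import random

# E[X | condition; parameters] is an expectation: a probability-weighted
# average. Empirically it is approximated by the sample mean over many draws.
random.seed(0)
samples = [random.gauss(3.0, 1.0) for _ in range(100_000)]  # X ~ N(3, 1)
estimate = sum(samples) / len(samples)  # approximates E[X] = 3
```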

πŸ‘︎ 21
πŸ’¬︎
πŸ‘€︎ u/LetsGetTrashed
πŸ“…︎ May 25 2019
🚨︎ report
