Getting Started with Markov Decision Process
A Markov Decision Process is used in planning and decision making where the decision-maker can face repeatable situations.
For example, a business can be in one of two states: "High Sale" or "Low Sale". The business owner can perform two actions: "Maximize advertisement budget" or "Minimize advertisement budget". In either state, the chosen action influences what happens next: "Low Sale" can turn into "High Sale", or "High Sale" can turn into "Low Sale". A Markov Decision Process computes a policy that tells the business owner which action to perform in which state. Should he or she maximize the advertisement budget when the sale is low? Should he or she minimize the advertisement budget when the sale is high? The policy computed from the Markov Decision Process answers these questions.
Once the Markov model is defined, a policy can be computed using Value Iteration or Policy Iteration. The policy is derived from the expected reward of each state.
The Markov Decision Process can be defined as
MDP = 〈S,A,T,R,γ〉
Here, S is the set of states, A is the set of actions, T is the set of transition probabilities, and R is the reward received in a given state. γ is a discount factor that reduces the importance of future rewards. The transition probabilities are the probabilities of going from one state to another when a given action is performed.
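As a reference point, the elements of the tuple are usually tied together by the Bellman optimality equation. The form below is the standard textbook one for rewards attached to states, as in this tutorial. The symbols V* (optimal value of a state) and π* (optimal policy) are introduced here for illustration only; the text does not say which exact formulation Rational Will uses internally.

```latex
\begin{aligned}
V^{*}(s)   &= R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s') \\
\pi^{*}(s) &= \arg\max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^{*}(s')
\end{aligned}
```

With γ close to 0, only the immediate reward matters; with γ close to 1, rewards far in the future count almost as much as immediate ones.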
Therefore, in order to apply the Markov Decision Process, you need to have
1. A predefined set of states.
2. A set of actions that you know you can take in those states.
3. A reward defined for each state.
4. The probabilities of going from one state to another when a specific action is performed in a state.
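The four ingredients listed above can be written down as plain data before any tool is involved. The sketch below is a hypothetical encoding of the High Sale / Low Sale example, not the tool's internal format. The names `states`, `actions`, `rewards`, `transitions`, and `validate_mdp` are made up for illustration, the 0-to-1 reward scale is an assumption, and every probability except the 90% figure used later in this tutorial is an assumed value.

```python
# Hypothetical encoding of the example; not the tool's own data format.
states = ["High Sale", "Low Sale"]
actions = ["Maximize advertisement", "Minimize advertisement"]

# Reward per state. Assumed scale: 1.0 = slider at full value, 0.0 = minimum.
rewards = {"High Sale": 1.0, "Low Sale": 0.0}

# transitions[state][action][next_state] = probability.
# Only the 0.9 entry comes from the tutorial; the rest are illustrative assumptions.
transitions = {
    "High Sale": {
        "Maximize advertisement": {"High Sale": 0.9, "Low Sale": 0.1},
        "Minimize advertisement": {"High Sale": 0.5, "Low Sale": 0.5},
    },
    "Low Sale": {
        "Maximize advertisement": {"High Sale": 0.6, "Low Sale": 0.4},
        "Minimize advertisement": {"High Sale": 0.1, "Low Sale": 0.9},
    },
}


def validate_mdp(states, actions, rewards, transitions, tol=1e-9):
    """Return a list of problems found; an empty list means the model is consistent."""
    problems = []
    for s in states:
        if s not in rewards:
            problems.append(f"missing reward for {s!r}")
        for a in actions:
            row = transitions.get(s, {}).get(a)
            if row is None:
                problems.append(f"missing transitions for ({s!r}, {a!r})")
                continue
            if any(p < 0 for p in row.values()):
                problems.append(f"negative probability for ({s!r}, {a!r})")
            if abs(sum(row.values()) - 1.0) > tol:
                problems.append(f"probabilities for ({s!r}, {a!r}) do not sum to 1")
    return problems


print(validate_mdp(states, actions, rewards, transitions))
```

A check like this catches the most common modeling mistake, a row of transition probabilities that does not sum to 1, before any policy is computed.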
Once you have all this information ready, you can start modeling the Markov Decision Process. Say you are the business owner who wants to use the Markov Decision Process to get a policy for what to do in the High Sale and Low Sale states. Click the Markov Decision Process button on the dashboard. You will be greeted with the following screen.
Now, add the states you know. Enter "High Sale" and "Low Sale". Then click the Proceed button. Once you click the Proceed button, you will be asked the following question.
Click "Yes" because you want to set up your actions. Once you click "Yes", you will be asked to enter the actions you can take. Enter "Maximize advertisement" and "Minimize advertisement".
Then click "Proceed". Now, you will be asked the following question. Click "Yes".
This time, you know that the same set of actions can be taken in the Low Sale state. So, set up the actions as shown below.
Click "Proceed". Now, you will be asked to define an objective, because that objective will be used to measure your reward in a state. For simplicity, let's use an objective like "Maximize Satisfaction". Then click "Proceed".
Once you click the "Proceed" button, you will be asked for the type of your objective. For this simple example, use the "Subjective" type.
Then you will be asked if you have any more objectives. Rational Will supports multi-criteria decision analysis, so you can define many objectives. But for this simple problem, we will stick to one objective. So, click "No".
Now set the reward in the High Sale state. Obviously, you will be very happy in the 'High Sale' state, so you can set the Satisfaction slider to its full value. Then click the Proceed button.
Once you click the Proceed button, you will be asked to set the reward for the 'Low Sale' state. Obviously, you do not want the 'Low Sale' state, and you will be very unhappy in it. So, set the slider to its minimum value. Then click the "Proceed" button.
Once you click the "Proceed" button, you will be asked to set the transition probabilities. Say, when you are in the High Sale state, maximizing advertisement keeps the High Sale state intact with 90% probability. So, let's define it like this.
Now, click the Proceed button. You will be asked for the transition probabilities when you Minimize advertisement. Say you have set up the transition probabilities like this:
Once you click the "Proceed" button, you will be asked to define the transition probabilities for the Low Sale state with the Maximize advertisement action.
Once you click the Proceed button, you will be asked for the remaining transition probabilities, for the Low Sale state with the Minimize advertisement action. You may set them up as shown below.
Once you click the "Proceed" button, you will be shown the following view.
According to the calculated policy, you need to Maximize advertisement in both states. This example is simple enough that the policy can be found by common sense, without any complicated calculation. But modeling a problem with a known answer is a way to validate that the system works. If you have a more complicated scenario, with more states, more actions, and less extreme rewards, the Markov Decision Process can be a very useful tool for finding a policy that can be applied in real life. Now, click the "Finish" button. Once you do that, you will see the following view.
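The common-sense answer can also be cross-checked outside the tool with a few lines of value iteration. This is a minimal sketch under stated assumptions, not the algorithm Rational Will necessarily runs. The function name `value_iteration` is hypothetical, the rewards use an assumed 0-to-1 scale, the discount factor of 0.9 is assumed, and every transition probability other than the 90% figure from this walkthrough is an illustrative guess.

```python
def value_iteration(states, actions, rewards, transitions, gamma=0.9,
                    tol=1e-8, max_iter=10_000):
    """Return (values, policy) for an MDP whose rewards are attached to states."""

    def expected_next_value(s, a, values):
        # Expected value of the next state when action a is taken in state s.
        return sum(p * values[s2] for s2, p in transitions[s][a].items())

    values = {s: 0.0 for s in states}
    for _ in range(max_iter):
        new_values = {
            s: rewards[s] + gamma * max(expected_next_value(s, a, values)
                                        for a in actions)
            for s in states
        }
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:  # stop once the values have stopped changing
            break

    # Greedy policy: in each state, pick the action with the best expected next value.
    policy = {
        s: max(actions, key=lambda a: expected_next_value(s, a, values))
        for s in states
    }
    return values, policy


states = ["High Sale", "Low Sale"]
actions = ["Maximize advertisement", "Minimize advertisement"]
rewards = {"High Sale": 1.0, "Low Sale": 0.0}  # assumed slider scale
transitions = {  # only the 0.9 entry comes from the tutorial; the rest are assumed
    "High Sale": {
        "Maximize advertisement": {"High Sale": 0.9, "Low Sale": 0.1},
        "Minimize advertisement": {"High Sale": 0.5, "Low Sale": 0.5},
    },
    "Low Sale": {
        "Maximize advertisement": {"High Sale": 0.6, "Low Sale": 0.4},
        "Minimize advertisement": {"High Sale": 0.1, "Low Sale": 0.9},
    },
}

values, policy = value_iteration(states, actions, rewards, transitions, gamma=0.9)
for state in states:
    print(state, "->", policy[state])
```

With these assumed numbers, the sketch selects "Maximize advertisement" in both states, because High Sale is always worth more than Low Sale and maximizing is the action more likely to lead there. Different probabilities or rewards could produce a different policy.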
The above screen lets you modify all the input that you entered in the wizard. Based on the evaluated policy, a Markov Chain is also formed. Based on that Markov Chain, the forecast chart is displayed in the carousel, as you can see in the above screen.
For the Markov Chain, an initial state needs to be defined, so you can do that from this tab as shown here.
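To make the link between the policy, the Markov Chain, and the forecast concrete: once an action is fixed for every state, the transition probabilities collapse into a single matrix, and a forecast is that matrix applied repeatedly to the initial state distribution. The sketch below is only an illustration of that idea. The helper names `policy_chain` and `forecast` are hypothetical, the probabilities are assumed apart from the 90% figure, and how the tool computes its own chart is not stated in this tutorial.

```python
def policy_chain(states, policy, transitions):
    """Row i is the next-state distribution from states[i] under the policy."""
    return [
        [transitions[s][policy[s]].get(s2, 0.0) for s2 in states]
        for s in states
    ]


def forecast(initial, matrix, steps):
    """Return the state distribution at step 0, 1, ..., steps."""
    dist = list(initial)
    history = [dist]
    n = len(dist)
    for _ in range(steps):
        # One step of the chain: new[j] = sum over i of dist[i] * matrix[i][j]
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        history.append(dist)
    return history


states = ["High Sale", "Low Sale"]
policy = {"High Sale": "Maximize advertisement",
          "Low Sale": "Maximize advertisement"}
transitions = {  # only the 0.9 entry comes from the tutorial; the rest are assumed
    "High Sale": {"Maximize advertisement": {"High Sale": 0.9, "Low Sale": 0.1}},
    "Low Sale": {"Maximize advertisement": {"High Sale": 0.6, "Low Sale": 0.4}},
}

chain = policy_chain(states, policy, transitions)
initial = [0.0, 1.0]  # assumed initial state: start in Low Sale with certainty
for step, dist in enumerate(forecast(initial, chain, steps=5)):
    print(step, [round(p, 3) for p in dist])
```

Changing `initial` plays the same role as choosing a different initial state in the tool: the early steps of the forecast shift, while the long-run distribution depends only on the chain itself.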
You can see the Decision Graph by clicking the "Decision Graph" button from the Ribbon.
Once you click the Decision Graph button, you will see the decision graph as shown below.