Getting Started with the Markov Decision Process
A Markov Decision Process (MDP) is used in planning and decision-making when the decision-maker faces recurring situations.
For example, a business can be in one of two states: "High Sales" or "Low Sales". The business owner can take one of two actions: "Maximize advertisement budget" or "Minimize advertisement budget". In either state, the chosen action influences whether "Low Sales" turns into "High Sales" or "High Sales" turns into "Low Sales". A Markov Decision Process is a decision-making process that produces a policy, so that the business owner knows which action to take in which state. Should they maximize the advertisement budget when sales are low? Should they minimize the advertisement budget when sales are high? The policy produced by the Markov Decision Process answers these questions.
Once the Markov model is defined, a policy can be computed with Value Iteration or Policy Iteration. The policy is calculated from the expected reward of each state.
The Markov Decision Process can be defined as
MDP = 〈S,A,T,R,γ〉
Here, S is the set of states, A the set of actions, T the transition probabilities, and R the reward in a given state. γ is a discount factor that reduces the importance of future rewards. The transition probabilities are the probabilities of going from one state to another when a given action is taken.
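As a sketch of how these pieces fit together, and assuming (as in the definition above) that the reward depends only on the state, the standard Bellman optimality equation gives the value of a state and the policy that follows from it. The notation T(s, a, s') for the probability of moving from s to s' under action a, and the symbols V* and π*, are introduced here for illustration only.

```latex
V^*(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s')

\pi^*(s) = \arg\max_{a \in A} \sum_{s' \in S} T(s, a, s')\, V^*(s')
```

Value Iteration applies the first equation repeatedly until the values stop changing. The policy then picks, in each state, the action with the highest expected value.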
Therefore, to apply the Markov Decision Process, you need:
1. A predefined set of states.
2. A set of actions that you can take in those states.
3. A reward for each state.
4. The probabilities of going from one state to another, given that a specific action is performed in a state.
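The four inputs listed above could be written down as plain data before any modeling starts. The following is a minimal sketch, not the tool's internal format: the variable names and every number are illustrative assumptions.

```python
# Hypothetical encoding of the four inputs. All names and numbers are
# illustrative assumptions, not values taken from the tool.

# 1. The predefined set of states.
states = ["High Sale", "Low Sale"]

# 2. The actions available in those states.
actions = ["Maximize advertisement", "Minimize advertisement"]

# 3. The reward for being in each state (1.0 = best, 0.0 = worst).
rewards = {"High Sale": 1.0, "Low Sale": 0.0}

# 4. transitions[state][action][next_state] = probability of that move.
transitions = {
    "High Sale": {
        "Maximize advertisement": {"High Sale": 0.9, "Low Sale": 0.1},
        "Minimize advertisement": {"High Sale": 0.5, "Low Sale": 0.5},
    },
    "Low Sale": {
        "Maximize advertisement": {"High Sale": 0.6, "Low Sale": 0.4},
        "Minimize advertisement": {"High Sale": 0.1, "Low Sale": 0.9},
    },
}

# Look up one entry: the chance of staying in High Sale while advertising heavily.
print(transitions["High Sale"]["Maximize advertisement"]["High Sale"])
```

For every state and action, the probabilities over the next states should add up to 1, since the process always lands in some state.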
Step by step modeling example
Once you have all this information ready, you can start modeling the Markov Decision Process. Say you are a business owner who wants to use the Markov Decision Process to get a policy for what to do in the High Sale and Low Sale states. Click the Markov Decision Process button on the dashboard. You will be greeted with the following screen.
Now, add the states. Enter "High Sale" and "Low Sale", then click the Proceed button. To change a state, for example to rename or delete it, select the state and right-click to open the context menu. You can also double-click a state name to switch to edit mode.
Once you click the Proceed button, you will be asked the following question.
Click 'Yes' because you want to set up your actions. Once you click Yes, you will be asked to enter the actions you can take. Enter "Maximize advertisement" and "Minimize advertisement".
Then click "Proceed". Now, you will be asked the following question. Click "Yes".
This time, you know that the same set of actions can be taken in the Low Sale state. So, set up the actions as shown below.
Click "Proceed". Now, you will be asked to define an objective, because that objective will be used to measure your Reward in a State. For simplicity, let's use an Objective like "Maximize Satisfaction". Then click "Proceed".
Identifying Objectives for Reward assignment
Once you click the "Proceed" button, you will be asked for the type of your objective. For this simple example, use the "Subjective" type.
Then you will be asked if you have any more objectives. As you know, Rational Will supports multi-criteria decision analysis. So, you can define many objectives. But for this simple problem, we will stick to one objective. So, click NO.
Now set the reward in the 'High Sale' state. Obviously, you will be very happy in the 'High Sale' state, so you can set the Satisfaction slider to its full value. Then click the Proceed button.
Once you click the Proceed button, you will be asked to set the reward for the 'Low Sale' state. Obviously, you do not want the 'Low Sale', and you will be very unhappy in that state. So, set the slider value to its minimum value. Then click the "Proceed" button.
Once you click the "Proceed" button, you will be asked to set the transition probabilities. Say, when you are in the High Sale state, maximizing advertisement keeps the High Sale state intact with 90% probability. So, let's define it like this.
Now, click the Proceed button. You will be asked for the transition probabilities when you Minimize advertisement in the High Sale state. Say you have set up the transition probabilities like this:
Once you click the "Proceed" button, you will be asked to define the transition probabilities for the Low Sale state with the Maximize advertisement action.
Once you click the Proceed button, you will be asked for the transition probabilities for the Low Sale state with the Minimize advertisement action. You may set them up as shown below.
Once you click the "Proceed" button, you will be shown the following view.
According to the calculated policy, you should Maximize advertisement in both states. This example is simple enough that the policy can be found by common sense, with no need for complicated calculation. However, modeling an example with a known answer is a way to validate that the system works. In a more complicated scenario, with more states, more actions, and less extreme rewards, the Markov Decision Process can be a very useful tool for finding a policy that can be applied in real life. Now, click the "Finish" button. Once you do that, you will see the following view.
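To illustrate how a policy like the one in this example can be computed, here is a minimal Value Iteration sketch in Python. It is not the tool's implementation. Only the 90% probability comes from the walkthrough; the other transition probabilities, the 1.0 and 0.0 rewards standing in for the slider positions, the discount factor, and all names are assumptions made for illustration.

```python
# Value Iteration sketch for the two-state example.
# Only the 0.9 probability comes from the walkthrough; every other
# number (and every name) is an assumption for illustration.
STATES = ["High Sale", "Low Sale"]
ACTIONS = ["Maximize advertisement", "Minimize advertisement"]
REWARD = {"High Sale": 1.0, "Low Sale": 0.0}  # slider at maximum / minimum
GAMMA = 0.9  # assumed discount factor

# T[state][action] = probabilities of landing in (High Sale, Low Sale)
T = {
    "High Sale": {
        "Maximize advertisement": [0.9, 0.1],
        "Minimize advertisement": [0.5, 0.5],
    },
    "Low Sale": {
        "Maximize advertisement": [0.6, 0.4],
        "Minimize advertisement": [0.1, 0.9],
    },
}


def expected_value(state, action, values):
    """Expected value of the next state after taking `action` in `state`."""
    return sum(p * values[s2] for p, s2 in zip(T[state][action], STATES))


def value_iteration(tolerance=1e-9):
    """Repeat the Bellman update until the values change by less than tolerance."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: REWARD[s] + GAMMA * max(expected_value(s, a, values) for a in ACTIONS)
            for s in STATES
        }
        change = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if change < tolerance:
            break
    # The policy picks, in each state, the action with the best expected value.
    policy = {
        s: max(ACTIONS, key=lambda a: expected_value(s, a, values))
        for s in STATES
    }
    return values, policy


values, policy = value_iteration()
print(policy)
```

With these assumed numbers, the sketch selects "Maximize advertisement" in both states, matching the common-sense answer described above.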
Realization of the "Markov Decision Process"
From the above screen, you can see that every input you entered in the wizard can still be modified. A Markov Chain is also formed from the evaluated policy, and the forecast chart based on that Markov Chain is displayed in the carousel.
The Markov Chain needs an initial state, which you can define from this tab as shown here.
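As an illustration of why the initial state matters, the sketch below forecasts the state probabilities for a few iterations of the chain obtained by always choosing "Maximize advertisement". It is a simplified sketch, not how the tool computes its chart: the Low Sale row of the matrix and all function names are assumptions.

```python
STATES = ["High Sale", "Low Sale"]

# Row i holds the probabilities of moving from STATES[i] to each state
# when "Maximize advertisement" is always chosen.
# 0.9 / 0.1 follows the walkthrough; the Low Sale row is an assumed placeholder.
CHAIN = [
    [0.9, 0.1],
    [0.6, 0.4],
]


def step(distribution, chain):
    """Advance the state probabilities by one iteration of the chain."""
    n = len(chain)
    return [sum(distribution[i] * chain[i][j] for i in range(n)) for j in range(n)]


def forecast(initial, chain, iterations):
    """Return the state probabilities at iteration 0, 1, ..., iterations."""
    history = [list(initial)]
    for _ in range(iterations):
        history.append(step(history[-1], chain))
    return history


initial = [0.0, 1.0]  # assumed initial state: Low Sale with certainty
for t, dist in enumerate(forecast(initial, CHAIN, 5)):
    print(t, dict(zip(STATES, (round(p, 4) for p in dist))))
```

Changing `initial` to `[1.0, 0.0]` starts the forecast from High Sale instead, which changes the early iterations of the forecast.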
You can see the Decision Graph by clicking the "Decision Graph" button from the Ribbon.
Once you click the Decision Graph button, you will see the decision graph as shown below.
A plethora of charts for the generated Markov Chain
When a Markov Decision Process is modeled, a Markov Chain is also generated on the assumption that the user always selects the optimum action. Various charts can be displayed in the chart section, such as the Steady State (long-term) forecast or the forecast for a specific number of iterations.
A Custom Expression can even be used to find a calculated State forecast, such as "[State A] AND [State B] AND NOT [State C]".
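The steady-state (long-term) forecast mentioned above can be approximated by applying the chain repeatedly until the state probabilities stop changing. The sketch below uses plain power iteration and is not the tool's method. It assumes the chain settles to a single long-run distribution; the Low Sale row of the matrix and the function name are also assumptions.

```python
# Assumed chain for the "always Maximize advertisement" policy.
# Rows: from High Sale, from Low Sale. Columns: to High Sale, to Low Sale.
CHAIN = [
    [0.9, 0.1],  # from the walkthrough
    [0.6, 0.4],  # assumed placeholder
]


def steady_state(chain, tolerance=1e-12, max_iterations=10_000):
    """Approximate the long-run state probabilities by power iteration."""
    n = len(chain)
    dist = [1.0 / n] * n  # start from a uniform guess
    for _ in range(max_iterations):
        nxt = [sum(dist[i] * chain[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tolerance:
            return nxt
        dist = nxt
    raise RuntimeError("the chain did not settle within max_iterations")


long_run = steady_state(CHAIN)
print([round(p, 4) for p in long_run])
```

For this assumed matrix, the long-run probability of High Sale works out to 6/7 (about 0.857), regardless of the initial state.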