Title: How reinforcement learning plays pivotal role in determination of short term action policies of a consumer in order to reach long term goal

by Joyjeet Raha, M.Phil student (University of Calcutta); e-mail: joyjeet9@gmail.com

Abstract: This paper depicts an application of reinforcement learning in an environment where training institutes offer multiple short-term courses as professional career solutions, showing a student's consecutive action policy in line with his personal long-term goal. A student who was not getting his desired type of job visits a job fair and, at the stall of one of the training institutes, discusses the scope of the courses with an academician with respect to his long-term goal. He then follows the coach's suggestions and realizes his long-term dream, although he sacrifices some immediate short-term rewards. He obtains a return larger than his investment.

Keywords: Value, Pay-Off, State, Reinforcement, Policy, Goal, Reward, Cost, Learning, Consumer Behavior.

Introduction: The candidate's goal is to teach Research Methodology as a trainer, whether in a corporate setting or at an academic institute. He has completed an MBA in Systems. Because he was not getting his desired type of job, he visits a job fair in which professional training institutes also participate. He visits the stall of one such professional training course and interacts with an academician.

Part-(A): Qualitative Analysis of Scenario

State-1: Unplanned meeting
(a) Candidate: Describes his academic background and long-term goal.
(b) Academic-Advisor: Advises him to learn Analytics through the different software programmes taught at his training institute. The course sequence starts with SAS programming (the course with the lowest fee) and ends with the Big-Data & Hadoop package. The prerequisites are knowledge of the C language and familiarity with menu-driven statistical software. Gives him free study materials and manuals with statistical software.
Mentions the current status of his past students and describes the present and future opportunities and advantages of taking this course.

State-2: Appointment-based meeting (at the institute's office)
(a) Candidate: Arrives with a basic idea of how to operate the various menu-driven software packages needed to implement algorithms of Statistical Machine Learning, and brings some conceptual doubts.
(b) Academic-Advisor: Answers all his queries and, after explaining some theoretical basics of SAS programming, sends him to class for practical work.

State-3: The student receives a job offer from an academic research institute for the post of Research Analyst, as a SAS programmer, but the satisfied student chooses to learn the next level of the Analytics course instead of taking the undesired type of job.
(a) Candidate (satisfied): Returns to the coaching centre with new queries about Analytics programming in the R language and deposits money.
(b) Academic-Advisor: Asks him to come back after brushing up on the C++ language and provides him with web links and study materials. Advises him to learn Big Data with the Hadoop package on completing R, and assures him that he will get a bigger return on this investment.

State-4: After completing the course, he clears an interview for the post of trainer of SAS and R at an Analytics training institute. That institute, however, requires basic knowledge of Big Data from a newly appointed trainer and is even ready to raise the salary n-fold for it. The candidate joins the faculty job and, in parallel, registers again to learn the Big-Data + Hadoop package in the weekend batch.

Part-(B): Quantitative Perspective of Research

Figure 1: Top view of the whole scenario.
Set of actions = {right, up, down}. To account for accidental changes in the ongoing situation, in other words for things that are outside his control, we introduce (hypothetical) transition probabilities: (1) to move right, $t_r = 0.995$; (2) to move up, $t_u = 0.0025$; (3) to move down, $t_d = 0.0025$. We suppose that the net rewards associated with the states reached by moving up or down are all 1, i.e. $R_a = R_b = R_c = R_d = 1$. Compared with movement to the right, the transition probabilities for the other directions are almost negligible, so we treat the movements as almost deterministic here. This can be interpreted as the candidate being totally convinced by the academician's advice, because he finds that it falls in line with his long-term goal. We are therefore mainly interested in collecting the net rewards of the states on the horizontal path ($R_1 = 5$, $R_2 = 10$, $R_3 = 15$, $R_4 = 20$). Based on those, we compute the raw expected payoff, ignoring the discount factor and the cost of the prevailing state and action. The expected outcome follows this pattern; for example, for the 1st cell it is $R_2 t_r + R_a t_u + R_b t_d$.

Figure 2: Resultant direction of the policy derived from the expected outcome.

Conclusion: The optimal action here is to move in the right (east) direction, that is, to follow the suggestions of the Analytics coach, which keeps him on track toward his long-term goal. The optimal policy for long-term reward is determined from the greatest value of payoff. The optimal policy shows the shortest sequence of actions from start to goal, corresponding to minimum cost (in any format or sense). By following this pattern of action we gradually suppress the end-point variance. Many times we forgo an immediate reward in the hope of getting a bigger reward at the end.
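The expected-payoff pattern described in Part (B) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names `expected_payoff` and `greedy_direction` are hypothetical, every cell on the horizontal path is assumed to have one upper and one lower neighbour with net reward 1, and the direction rule (pick the neighbour with the largest net reward) is one plausible reading of how the policy in Figure 2 is derived.

```python
# Hypothetical transition probabilities quoted in Part (B).
T_RIGHT, T_UP, T_DOWN = 0.995, 0.0025, 0.0025

# Net rewards of the four states on the horizontal path (R1..R4).
PATH_REWARDS = [5, 10, 15, 20]

# Net reward of every off-path state (R_a..R_d); the text sets these to 1.
OFF_PATH_REWARD = 1


def expected_payoff(next_reward, up_reward=OFF_PATH_REWARD, down_reward=OFF_PATH_REWARD):
    """Raw expected payoff of one cell: no discount factor, no state or action cost."""
    return next_reward * T_RIGHT + up_reward * T_UP + down_reward * T_DOWN


def greedy_direction(next_reward, up_reward=OFF_PATH_REWARD, down_reward=OFF_PATH_REWARD):
    """Assumed rule: move toward the neighbouring state with the largest net reward."""
    candidates = {"right": next_reward, "up": up_reward, "down": down_reward}
    return max(candidates, key=candidates.get)


# Cells 1..3 each look ahead to the next state on the path (R2, R3, R4).
payoffs = [expected_payoff(r) for r in PATH_REWARDS[1:]]
policy = [greedy_direction(r) for r in PATH_REWARDS[1:]]

for cell, (value, direction) in enumerate(zip(payoffs, policy), start=1):
    print(f"cell {cell}: expected payoff = {value:.3f}, direction = {direction}")
```

For the 1st cell this evaluates $10 \times 0.995 + 1 \times 0.0025 + 1 \times 0.0025 = 9.955$, and under these assumptions the chosen direction is "right" in every cell, which agrees with the conclusion that the candidate should keep following the horizontal path.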