Construction of Optimal Prediction Intervals for Load Forecasting Problems

Deakin Research Online — Deakin University's institutional research repository

This is the published version (version of record) of: Khosravi, Abbas, Nahavandi, Saeid and Creighton, Doug 2010-08, Construction of optimal prediction intervals for load forecasting problems, IEEE Transactions on Power Systems, vol. 25, no. 3, pp. 1496-1503. Available from Deakin Research Online. Reproduced with kind permission of the copyright owner. © 2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Copyright: 2010, IEEE.

IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 25, NO. 3, AUGUST 2010

Construction of Optimal Prediction Intervals for Load Forecasting Problems

Abbas Khosravi, Member, IEEE, Saeid Nahavandi, Senior Member, IEEE, and Doug Creighton, Member, IEEE

Abstract— Short-term load forecasting is fundamental for the reliable and efficient operation of power systems. Despite its importance, accurate prediction of loads is problematic and remains elusive. Uncertainties often significantly degrade the performance of load forecasting models. Furthermore, no index is available to indicate the reliability of predicted values. The objective of this study is to construct prediction intervals for future loads instead of forecasting their exact values. The delta technique is applied for constructing prediction intervals for the outcomes of neural network models. Statistical measures are developed for quantitative and comprehensive evaluation of prediction intervals. Based on these measures, a new cost function is designed for shortening the length of prediction intervals without compromising their coverage probability.
Simulated annealing is used to minimize this cost function and to adjust the neural network parameters. The demonstrated results clearly show that the proposed method for constructing prediction intervals outperforms the traditional delta technique. Moreover, it yields prediction intervals that are practically more reliable and useful than exact point predictions.

Index Terms— Load forecasting, neural network, prediction interval.

I. INTRODUCTION

TO REMAIN competitive in the privatized and deregulated markets of power generation, it is vital for companies to reduce their operating costs. Overestimation of loads may lead to excess supply and, consequently, increased operational costs. On the other hand, underestimation may result in a loss of reliability for supplying utilities. Therefore, formulating optimal strategies and schedules for generating power is of utmost importance for utility companies. Such planning can potentially save millions of dollars per year for utility companies [1]. Furthermore, many operational activities within power systems, including, among others, unit commitment, economic dispatch, automatic generation control, security assessment, maintenance scheduling, and energy commercialization, are usually scheduled on the basis of short-term load forecasting (STLF). The lead time of the forecast may vary from minutes to days. Motivated by these needs, a great number of numerical studies have been reported in the scientific and industrial literature. Loosely speaking, all STLF methods can be divided into two

Manuscript received October 12, 2009. First published March 11, 2010; current version published July 21, 2010. This work was supported by the Centre for Intelligent Systems Research (CISR) at Deakin University. Paper no.
TPWRS-00805-2009. The authors are with the Centre for Intelligent Systems Research (CISR), Deakin University, Geelong, Vic 3117, Australia. Color versions of one or more of the figures in this paper are available online. Digital Object Identifier 10.1109/TPWRS.2010.2042309

broad categories: statistical (parametric) methods and artificial intelligence-based (nonparametric) techniques. Statistical methods include regression models (linear or piecewise-linear) [2], the Kalman filter [3], and time series (autoregressive moving average) models [4], [5]. The inherent complexity and nonlinearity of the relationships between electric loads and their exogenous variables make the application of these techniques to load forecasting problematic. Forecasters developed based on these techniques are often prone to bias [6]. On the contrary, artificial intelligence-based techniques, and in particular neural networks (NNs), possess an excellent capability of learning and approximating nonlinear relationships to any arbitrary degree of accuracy (universal approximators) [7]. Applications of expert systems [8], [9], NNs [10]–[12] (and references therein), fuzzy systems [13], and neuro-fuzzy systems [14] have proliferated for STLF within the last two decades. It has also been stated that the majority of commercial STLF packages used by utility companies have been developed based on artificial intelligence-based techniques (mainly NNs) [15], [16]. A good review of NN-based STLF can be found in [11], [12], and references therein. Recently reported review studies indicate that in many engineering and science fields NNs significantly outperform their traditional rivals in terms of prediction and classification accuracy [17]. There is, however, some skepticism about the performance of NNs for STLF [12] (and references therein). It has been mentioned that in the majority of conducted studies, NN models have been 1) unnecessarily large and 2) overfitted.
The first problem can be easily managed by developing NNs in a constructive approach [18]; i.e., NN complexity is increased only when the network does not satisfy the prediction requirements. Practicing this principle satisfactorily guarantees the minimality of the NN size. Overfitting can also be avoided through theoretically well-established methods such as the Bayesian learning algorithm or the weight decay cost function technique [7]. Despite countless reports on the successful application of NNs for STLF, here we argue that modelers have often lost sight of a basic characteristic of NNs. NN models are theoretically deterministic [7], and therefore their application to predicting the future of stochastic systems is always in doubt and questionable [19]. It is empirically very important to note that loads often show strongly nonlinear, and in some cases chaotic, behavior. Their fluctuations through time are erratic and are influenced by many known and unknown factors, and information about these influencing factors is often uncertain. The unreliability of forecasts of weather conditions and temperature variations is often high. Although local system failures can be compensated for by considering a power generation surplus, they may dramatically change system behavior and stability. Uncertainties and probabilistic events contribute strongly to the degradation of the performance of NN models for load forecasting. The negative consequences arising from the stochastic nature of power systems cannot be compensated for solely by increasing the NN size (neither hidden layers nor neurons) or by repeating its training procedure. With the presence, occurrence, and accumulation of these uncertainties and probabilistic events, power systems look like stochastic systems with volatile behavior in terms of future load demands.
As there is more than one probable reality for the future of these systems (future load demands), any claim about the accuracy of a future prediction is dubious and untrustworthy. Seeking to remedy these defects, the construction of prediction intervals (PIs) has been proposed in the literature. By definition, a PI with a confidence level of $(1-\alpha)\%$ is a random interval $[L_i, U_i]$, developed based on past observations, for a future observation $t_i$, such that $P(t_i \in [L_i, U_i]) = 1-\alpha$. PIs indicate the expected error between the prediction and the actual targets. Furthermore, they convey more meaningful information than predicted point values. Of utmost importance is the level of confidence, which gives PIs an indication of their reliability. In the literature, different schools of methods exist for the construction of PIs: 1) the delta technique [20], [21]; 2) the Bayesian technique [7], [22]; 3) the bootstrap [23], [24]; and 4) mean-variance estimation [25]. The cornerstone of the delta technique lies in interpreting NNs as nonlinear regression models and linearizing them based on a Taylor series expansion [20]. The Bayesian technique interprets the NN parameter uncertainty in terms of probability distributions and integrates them to obtain the probability distribution of the target conditional on the observed training set [7]. The bootstrap technique is essentially a resampling method whose computational requirements are massive. The fourth school is implemented by developing two NNs for prediction of the mean and variance of the targets. The selection of any of these techniques for constructing PIs depends on the problem domain, the computational burden, the number of available samples, and the purpose of the analysis. The construction of PIs has been a subject of much attention in recent years.
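The defining property of a PI — that the future observation falls inside the interval with probability $1-\alpha$ — can be checked with a small Monte Carlo sketch. The noise model, forecast value, and quantile below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# For a point forecast contaminated by Gaussian noise, the interval
# forecast +/- 1.96 * sigma is a 95% PI: it should cover roughly 95%
# of the possible realizations of the target.
rng = np.random.default_rng(1)
sigma = 2.0                                        # assumed noise level
t_hat = 10.0                                       # assumed point forecast
t_real = t_hat + sigma * rng.standard_normal(100_000)
lower, upper = t_hat - 1.96 * sigma, t_hat + 1.96 * sigma
coverage = np.mean((t_real >= lower) & (t_real <= upper))
print(f"empirical coverage: {coverage:.3f}")       # close to the nominal 0.95
```

With enough replicates the empirical coverage converges to the nominal level, which is exactly the asymptotic argument the delta and related techniques rely on.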
Examples are temperature prediction [26], travel time prediction in a baggage handling system [27], [28], watershed simulation [29], the solder paste deposition process [30], and time series forecasting [31]. To the best of our knowledge, the power engineering field, and in particular the STLF domain, lacks supporting theories and applications of PIs. Motivated by these gaps in practical and scientific research, one fold of this study aims at applying the delta technique to the STLF problem. Instead of developing and exploiting NNs for yielding exact load forecasts, PIs with a high confidence level are constructed for future loads. In experiments with real data, it is demonstrated that PIs are empirically more useful and reliable than exact point predictions. Another fold of this research concentrates on designing practical indices and measures for the quantitative evaluation of PIs. The literature only offers a measure for evaluating the coverage probability of PIs; discussion of the length of PIs (and, similarly, of confidence intervals) is often ignored or represented ambiguously [30], [32]–[34]. Here, we propose a new measure for quantitative evaluation of PIs that covers both aspects: length and coverage probability. With regard to this new index, a new cost function is developed for improving the quality of PIs (squeezing PIs without compromising their coverage probability). Ample care is exercised in the definition of the new cost function to keep the fundamental assumptions of the delta technique valid. As the calculation of the mathematical characteristics of this new cost function is very problematic (if not impossible), gradient-based optimization methods are not applicable for its minimization; therefore, stochastic optimization techniques should be employed. In this study, simulated annealing (SA) is adopted for minimization of this cost function in order to adjust the NN parameters.
It is shown that PIs developed using the optimized NNs are effectively narrower, with at least the same coverage probability, than PIs constructed using NNs trained with traditional techniques such as the Levenberg-Marquardt algorithm [7]. The rest of this paper is organized as follows. Section II provides a brief review of the fundamental theories of the delta technique. The new PI assessment measure is explained in Section III. Section IV presents the new cost function and its minimization procedure. Experimental results are demonstrated in Section V. Finally, Section VI concludes the paper with some remarks for further study in this domain.

II. THEORY AND BACKGROUND

A. Delta Technique for PI Construction

The delta technique is based on the representation and interpretation of NNs as nonlinear regression models. This allows standard asymptotic theory to be applied to them for constructing PIs. Accordingly, one may represent them as follows:

$$t_i = f(x_i, \theta^*) + \epsilon_i, \qquad i = 1, \ldots, n \qquad (1)$$

$x_i$ and $t_i$ are, respectively, the $i$th set of inputs (independent variables) and the corresponding target (dependent variable). $f(\cdot, \theta^*)$ is the nonlinear function representing the true regression function, and $n$ is the number of observations. $\hat{\theta}$, an estimate of $\theta^*$, can be obtained through minimization of the sum of squared errors (SSE) cost function

$$E_{SSE} = \sum_{i=1}^{n} (t_i - \hat{y}_i)^2 \qquad (2)$$

where $\hat{y}_i = f(x_i, \hat{\theta})$. A first-order Taylor expansion of $\hat{y}_i$ around the true values of the model parameters can be expressed as

$$\hat{y}_i \approx f(x_i, \theta^*) + g_i^T (\hat{\theta} - \theta^*) \qquad (3)$$

where $g_i$ is the gradient of $f$ (here, the NN model) with respect to its parameters, $\theta$, calculated for $x_i$. With the assumption that the $\epsilon_i$ in (1) are independently and normally distributed $N(0, \sigma^2)$, the $(1-\alpha)\%$ PI for $\hat{y}_i$ is

$$\hat{y}_i \pm t^{n-p}_{1-\frac{\alpha}{2}}\, s \sqrt{1 + g_i^T \left(J^T J\right)^{-1} g_i} \qquad (4)$$

$t^{n-p}_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of a cumulative t-distribution function with $n-p$ degrees of freedom, where $n-p$ is the difference between the number of training samples, $n$, and the number of NN parameters, $p$, and $s$ is an estimate of the standard deviation of the noise $\epsilon_i$.
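To make (1)–(4) concrete, the sketch below fits a tiny one-unit tanh model (a stand-in for a full NN) by Gauss-Newton and builds 95% PIs from its Jacobian. The model, synthetic data, and the tabulated t-quantile are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 60)
t_obs = np.tanh(1.5 * x) + 0.1 * rng.standard_normal(x.size)  # targets, as in (1)

def f(theta, x):
    # tiny stand-in "network": a bias plus one tanh unit
    return theta[0] + theta[1] * np.tanh(theta[2] * x)

def jacobian(theta, x):
    # analytic Jacobian of f with respect to theta, one row per sample
    h = np.tanh(theta[2] * x)
    return np.column_stack([np.ones_like(x), h, theta[1] * (1.0 - h**2) * x])

# minimize the SSE cost (2) with a few damped Gauss-Newton steps
theta = np.array([0.0, 1.0, 1.0])
for _ in range(50):
    J = jacobian(theta, x)
    r = t_obs - f(theta, x)
    theta = theta + np.linalg.solve(J.T @ J + 1e-9 * np.eye(3), J.T @ r)

# delta-technique 95% PIs, as in (4)
n, p = x.size, theta.size
J = jacobian(theta, x)
s2 = np.sum((t_obs - f(theta, x))**2) / (n - p)   # residual variance estimate
A_inv = np.linalg.inv(J.T @ J)
tq = 2.002   # tabulated t-quantile, 0.975 level, n - p = 57 dof
y_hat = f(theta, x)
half = tq * np.sqrt(s2 * (1.0 + np.einsum('ij,jk,ik->i', J, A_inv, J)))
lower, upper = y_hat - half, y_hat + half
picp = np.mean((t_obs >= lower) & (t_obs <= upper))
print(f"mean half-width: {half.mean():.3f}, training coverage: {picp:.2f}")
```

The `einsum` call evaluates $g_i^T (J^T J)^{-1} g_i$ for every sample at once; for a real NN the Jacobian would instead be obtained by backpropagation with respect to the weights.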
$J$ is the Jacobian matrix of the NN model with respect to its parameters. The cost function defined in (2) is only related to the prediction errors and does not penalize the network size or constrain the parameter magnitudes. This may result in singularity of the matrix $J^T J$, which in turn makes the computed PIs less reliable. The inclusion of weight decay terms in (2) can potentially solve this problem. The new cost function is therefore the weight decay cost function

$$E_{WD} = \sum_{i=1}^{n} (t_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \theta_j^2 \qquad (5)$$

where $\lambda$ is the regularizing factor [7]. Adjusting NN parameters through minimization of this cost function often improves the NN generalization. Rebuilding the PIs based on (5) yields

$$\hat{y}_i \pm t^{n-p}_{1-\frac{\alpha}{2}}\, s \sqrt{1 + g_i^T A^{-1} J^T J A^{-1} g_i} \qquad (6)$$

where $A = J^T J + \lambda I$. The calculation of $s$ in (6) is as follows:

$$s^2 = \frac{1}{n-p} \sum_{i=1}^{n} (t_i - \hat{y}_i)^2 \qquad (7)$$

B. Simulated Annealing

SA is a gradient-free optimization technique first introduced in [35]. SA is inspired by the annealing of metals: if a metal is cooled slowly, its molecules settle into a crystal structure, which represents the minimum energy state. Essentially, SA is a Monte Carlo technique that can be used for seeking out the global minimum. Its effectiveness is attributed to its ability to explore the design space through a neighborhood structure and to escape local minima by probabilistically allowing uphill moves. The primary virtues of the SA method for optimization are as follows: first, since no derivative information is needed during the search, SA performs well with nondifferentiable cost functions; second, SA is stochastic, so it has a better chance of exploring the entire design space and reaching the global optimum. The SA system is initialized at a temperature $T_0$ with a configuration whose energy is evaluated to be $E_0$. A new configuration, with new energy level $E_{new}$, is constructed by applying a random change. The decision to accept or reject the new configuration is made based on the difference in energy level, $\Delta E = E_{new} - E_{current}$. The new configuration is unconditionally accepted if it lowers the energy of the system ($\Delta E \le 0$).
If the energy of the system is increased by the change, the new configuration is accepted with probability $P = e^{-\Delta E / (k_B T)}$, where $k_B$ is the Boltzmann factor. If $P > r$, where $r$ is a random number between 0 and 1, the new configuration is approved. This process is repeated a sufficient number of times at the current temperature to sample the search space, and then the temperature is decreased according to a cooling schedule. This procedure continues until one of the stopping criteria is met. Examples of cooling schedules are geometric and exponential. Generally, the higher the temperature, the more likely the acceptance of an uphill transition; this means that in the early stages of optimization, SA behaves like a random walk. Mathematically, $T_0$ should be chosen so that the initial acceptance probability is close to one. As $T$ decreases, SA becomes a greedy optimization search looking for the global optimum. When $T \to 0$, SA becomes totally greedy and only accepts good changes. Further information about SA and its fundamental theories can be found in [35] and [36].

III. QUANTITATIVE MEASURES FOR PI ASSESSMENT

As discussed before, the literature does not offer a suitable measure for comprehensive assessment of PIs. In this section a new general examination measure is proposed that covers both important aspects of PIs: length and coverage probability. As the proposed measure is general and is developed based on features of the PIs themselves (not on the method used to construct them), it can be applied in other relevant studies as well. Theoretically, one can characterize PIs based on their length and coverage probability. One approach to the quantitative assessment of PI lengths is to normalize each interval length with regard to the range of the targets. Following this, a measure called the normalized mean prediction interval length (NMPIL) can be obtained as follows:

$$\mathrm{NMPIL} = \frac{1}{nR} \sum_{i=1}^{n} (U_i - L_i) \qquad (8)$$

where $L_i$ and $U_i$ are the lower and upper bounds of the $i$th PI and $R$ is the range of the targets. Normalization of the PI length by the range of the targets makes the objective comparison of PIs possible, regardless of the techniques used for their construction or the magnitudes of the underlying targets.
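The SA procedure of Section II-B — random perturbation, Metropolis acceptance of uphill moves, geometric cooling — can be sketched generically. The test energy function, neighborhood size, and schedule constants below are illustrative choices, not the paper's settings (the Boltzmann factor is folded into the temperature):

```python
import math
import random

def simulated_annealing(energy, x0, T0=1.0, alpha=0.95, moves_per_T=50, T_min=1e-3):
    """Generic SA: random neighbourhood moves, Metropolis acceptance,
    geometric cooling T <- alpha * T."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    T = T0
    while T > T_min:
        for _ in range(moves_per_T):
            cand = x + random.uniform(-0.5, 0.5)     # random change
            dE = energy(cand) - e
            # downhill moves always accepted; uphill with prob. exp(-dE/T)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, e = cand, e + dE
                if e < best_e:
                    best_x, best_e = x, e
        T *= alpha                                    # geometric cooling schedule
    return best_x, best_e

random.seed(0)
E = lambda v: v * v + 2.0 * math.sin(5.0 * v) + 2.0   # multimodal test energy
x_star, e_star = simulated_annealing(E, x0=3.0)
print(f"found x = {x_star:.3f}, E = {e_star:.3f}")
```

At high temperature nearly every move is accepted (random walk); as the temperature falls the search turns greedy, which is the behavior described above. In the paper's setting the configuration is the NN weight vector and the energy is the cost function of Section IV.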
The upper bound of NMPIL is one, obtained when the minimum and maximum of the targets are taken as the lower and upper bounds of the PIs for all targets. Usually, the smaller the NMPIL, the more useful the PIs. The lower bound of NMPIL is model dependent and is dominated by the mean squared error (MSE) of the NN models. Assuming that, in the ideal case, the gradient term in (4) and (6) vanishes for unobserved samples, one can obtain the lower bound of NMPIL for the delta technique as follows:

$$\mathrm{NMPIL}_{lb} = \frac{2\, t^{n-p}_{1-\frac{\alpha}{2}}\, s}{R} \qquad (9)$$

Practically, achieving $\mathrm{NMPIL}_{lb}$ for PIs is far from attainable. This stems from the fact that the gradient terms in (4) and (6) are not negligible; indeed, they are often large for unobserved (test) samples, as these samples are not used in the training stage of the NNs. Empirically, it is desirable to have PIs whose NMPIL is as small as possible. Although it is possible to
Other issuessuch as under-fitting and over-fitting [which are direct results of using (very) small or big NNs] also contribute to the unsatisfac-tory smallness of PICP.PIs whose PICP is the highest possible value are a matter of interest. Such high PICP can be simply achieved through con-sidering target ranges as PIs for all samples. Needless to say,wide PIs like these ones are practically useless. This argumentmakesclearthatjudgmentaboutPIsbasedonPICPwithoutcon-sidering length of PIs (here, NMPIL) is always subjective andbiased. It is essential to evaluate PIs simultaneously based ontheir both key measures: length (NMPIL) and coverage proba-bility (PICP). Put in other words, these two measures should beread and interpreted in conjunction with each other.Generally,PIlengthsandPICPhaveadirectrelationship.Thewider the PIs, the higher the corresponding PICP. This meansthat as soon as PIs are squeezed, some targets will lie out of PIs,which results in a lower PICP. According to this discussion, thefollowing coverage-length-based criterion (CLC) is proposedfor comprehensive evaluation of PIs in term of their coverageprobability and lengths(11)where is the sigmoidal function defined as follows:(12)and are two controlling parameters determining howsharply and where the sigma function rises. The level of confidence that PIs have been constructed based on can beappropriately used as a guide for selecting hyperparametersof CLC. One reasonable principle is that we highly penalizePIs that their PICP is less than %. This is based on thetheory that the coverage probability of PIs in an infinite numberof replicates will approach towards %.Generally, as increases, the sigmoid function drops moresharply in higher values of PICP. The exact area of fall can becontrolled by values of . The critical values of PICP are deter-mined based on the confidence level of PIs, %. 
For instance, if the confidence level is 90%, the values of $\eta$ and $\mu$ can easily be adjusted to guarantee a sharp drop of the sigmoid function for $\mathrm{PICP} < 90\%$. Based on this, the CLC will greatly increase, no matter what the length of the PIs is. In this way, PIs with unsatisfactorily low coverage probability are heavily penalized. Generally, a small CLC is an indication of the goodness of the constructed PIs (simultaneously achieving a small NMPIL and a high PICP). Whether a CLC value counts as small or large is entirely case dependent. However, if PICP is sufficiently high, CLC and NMPIL will be almost the same.

IV. PI OPTIMIZATION PROCEDURE

As discussed in Section I, the literature (with the exception of the power engineering domain) is rich in applications of (4) and (6) for constructing PIs. Despite these reports, many issues remain unarticulated in this domain. One issue, which is in fact the main motivation for conducting this research, is how PIs can be constructed to have the minimum length with the highest coverage probability. The motivating argument here is that PI construction in the scientific literature has always been investigated from a point-prediction perspective. As our focus here is on PIs, it is more reasonable to develop a cost function based on the explanatory features of PIs (length and coverage probability). This new cost function can then be used to adjust the NN parameters. We believe that such an attitude is one step forward in turning the focus from point prediction to optimally constructed PIs. The first problem in the definition of a new cost function is that the delta technique is based on minimization of the traditional cost functions defined in (2) and (5). All supporting theories of the delta technique are valid when NN parameters are adjusted based on these cost functions. For both of these cost functions, the design principle is minimization of the prediction error.
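Equations (8), (10), and (11)–(12) translate directly into code. The toy targets and the $\eta$, $\mu$ values below are illustrative, and the CLC expression follows the reconstruction above (NMPIL divided by the sigmoid of the coverage), so treat the exact form as an assumption:

```python
import numpy as np

def nmpil(lower, upper, targets):
    # eq. (8): mean PI width normalized by the target range
    return np.mean(upper - lower) / (targets.max() - targets.min())

def picp(lower, upper, targets):
    # eq. (10): fraction of targets falling inside their PI
    return np.mean((targets >= lower) & (targets <= upper))

def clc(lower, upper, targets, eta=50.0, mu=0.90):
    # eqs. (11)-(12): length penalized by a sigmoid of the coverage;
    # eta/mu chosen so PICP below the 90% nominal level is punished hard
    sig = 1.0 / (1.0 + np.exp(-eta * (picp(lower, upper, targets) - mu)))
    return nmpil(lower, upper, targets) / sig

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
wide_lo, wide_hi = t - 0.5, t + 0.5          # wide intervals, full coverage
tight_lo, tight_hi = t - 0.05, t + 0.05      # much narrower intervals
tight_hi[2] = t[2] - 0.01                    # ... but one target drops out

print(picp(wide_lo, wide_hi, t), clc(wide_lo, wide_hi, t))
print(picp(tight_lo, tight_hi, t), clc(tight_lo, tight_hi, t))
```

Although the tight intervals are roughly ten times narrower, their sub-nominal coverage (PICP = 0.8 against a 90% nominal level) makes their CLC an order of magnitude worse than that of the wide set — precisely the trade-off the cost function of this section is designed to exploit.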
To keep those theories valid, any effort to design a new cost function needs to somehow cover the prediction error. With regard to this discussion, and with the purpose of optimizing the length and coverage probability of PIs, the following PI-error-based cost function (PICF) is introduced for training the parameters of NNs:

$$\mathrm{PICF} = \mathrm{CLC} + e^{\,E_{WD}(\tilde{\theta}) - E_{WD}(\hat{\theta})} \qquad (13)$$

The first term on the right side of (13) was defined in (11); it corresponds to the basic characteristics of PIs, NMPIL and PICP, as defined in (8) and (10), respectively. The second term is an exponential of the difference of the weight decay cost function (5) calculated for two sets of NN parameters: $\tilde{\theta}$, obtained through minimization of (13), and $\hat{\theta}$, obtained through minimization of (5). The exponential term in (13) converts small differences in WDCF into large values (potentially much larger than the CLC term in (13)). Therefore, any action (here