Investment Modelling Using Value at Risk Bayesian Mixture Modelling Approach and Backtesting to Assess Stock Risk

Background: Stock investment has been gaining momentum in recent years due to technological development, and people invested more during the pandemic lockdown. On the one hand, stock investment offers high potential profitability; on the other, it is equally risky. Therefore, a value at risk (VaR) analysis is needed. One approach to calculating VaR is the Bayesian mixture model, which has been shown to handle heavy-tailed cases. The VaR's accuracy then needs to be tested, for example by backtesting with the Kupiec test. Objective: This study aims to determine the VaR model of PT NFC Indonesia Tbk (NFCX) return data using Bayesian mixture modelling and backtesting. On a practical level, this study can provide empirically grounded information about the potential risks of investing. Methods: NFCX data retrieved from Yahoo Finance were modelled with mixture models based on the normal and Laplace distributions. The VaR was then calculated and its accuracy tested by backtesting. Results: The test results showed that the VaR from the mixture Laplace autoregressive (MLAR) approach, MLAR(2;[2],[4]), was accurate at the 5% and 1% quantiles, while the mixture normal autoregressive (MNAR) approach, MNAR(2;[2],[2,4]), was accurate only at the 5% quantile. Conclusion: Based on backtesting with the Kupiec test, the better performing NFCX VaR model in this study is MLAR(2;[2],[4]).


I. INTRODUCTION
Economic uncertainty during the current pandemic has changed some aspects of consumer behaviour in Indonesia. Stock investment is on the rise among Indonesian people today. This is shown by a DBS Bank survey, which states that people will choose to save and invest their money rather than spend it on unnecessary consumer products or services [1]. When investing, people hope to obtain a return after a certain period of time, but they also face a risk of loss. Risk assessment is needed to minimize this risk of loss.
As most people were in lockdown and online shopping was considered more convenient, e-commerce sales soared to 66% during the COVID-19 pandemic [1]. This makes risk analysis of e-commerce stocks an interesting research subject. As of 2020, three e-commerce companies were listed on the Indonesia Stock Exchange (IDX), one of which is PT NFC Indonesia Tbk (NFCX) [2]. This company is the newest e-commerce company to join the IDX, and no risk analysis research on it has yet been conducted [2]. Therefore, this study chose NFCX as its subject.
Value at Risk (VaR) is a tool commonly used to measure financial risk when the distribution of potential losses is not generally known [3]. VaR is generally estimated by assuming that returns are normally distributed. However, this may not be appropriate, because in reality the return distribution is not always normal [4]. Visualizations of stock returns often indicate that the data distribution is leptokurtic, which tends to resemble the Laplace distribution. This does not mean that the Laplace distribution is necessarily suitable for determining VaR; it is only partially suitable, because stock return data may show long tails and asymmetry. Such financial data can be handled by the asymmetric Laplace distribution [5]. Research shows that parametric methods based on skewed and fat-tailed distributions are the best for determining VaR, especially when time variation is taken into account and the assumption of independent and identically distributed returns is relaxed [6].
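To illustrate why the distributional assumption matters for VaR, the following minimal sketch compares the lower quantiles of a normal and a Laplace distribution scaled to the same mean and variance. It is an illustration, not the paper's method: the helper names are hypothetical, and the Laplace distribution is parameterised by its variance σ² (scale b = σ/√2).

```python
import math
from statistics import NormalDist


def normal_quantile(p, mu=0.0, sigma=1.0):
    """p-quantile of N(mu, sigma^2)."""
    return NormalDist(mu, sigma).inv_cdf(p)


def laplace_quantile(p, mu=0.0, sigma=1.0):
    """p-quantile of a Laplace distribution with mean mu and variance sigma^2."""
    b = sigma / math.sqrt(2.0)  # scale b such that the variance 2*b^2 equals sigma^2
    if p < 0.5:
        return mu + b * math.log(2.0 * p)
    return mu - b * math.log(2.0 * (1.0 - p))


for p in (0.05, 0.01):
    print(f"p={p}: normal {normal_quantile(p):.3f}, Laplace {laplace_quantile(p):.3f}")
```

With unit variance, the two quantiles are close at 5% (about −1.64 versus −1.63) but diverge at 1% (about −2.33 versus −2.77). A normal assumption therefore understates the 1% tail loss of a Laplace-like return series.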
A mixture model approach has been used in several studies to handle problems that occur in unimodal data [7] [8]. By accounting for skewness and heterogeneity, the mixture model can improve predictive accuracy [7] [8]. It has been shown to perform better than separate (single-distribution) models [9] [10] and is suitable for application to the autoregressive (AR) model, a time series model. Meanwhile, the mixture normal autoregressive (MNAR) model has been developed both to accommodate a mixture of normal distributions and to analyse time series that show regime-switching behaviour. The method combines mixing probabilities and weights with past values, so that a) the stationarity and ergodicity of the underlying stochastic process can be easily established; and b) explicit expressions for the low-order stationary marginal distributions are available. This is in contrast with most other nonlinear autoregressive models [11]. The ability of MNAR to overcome the heavy-tailed problems that occur in unimodal data makes it applicable to VaR modelling [12] [13] [14] [15].
As for actual market conditions, the Laplace distribution is able to capture their pattern in mixture autoregressive models [16]. It has been shown to be more robust than the normal distribution in linear mixture models [17]. Since the return distribution tends to be sharply peaked, similar to the asymmetric Laplace distribution, the current research uses the mixture Laplace autoregressive (MLAR) model for VaR modelling [18]. Research [18] showed how the Bayesian MLAR approach models the VaR of an Islamic stock and compared its performance with a Bayesian MNAR analysis; the Bayesian MLAR model performed better than the Bayesian MNAR model.
In practice, the results of VaR modelling such as this should be evaluated using a backtesting method in order to determine the best model. The Kupiec test, developed in [19], is a backtesting procedure commonly used in VaR modelling studies [6] [20] [21]. Accordingly, this study aims to determine the NFCX VaR model using Bayesian mixture modelling and backtesting.

II. METHODS
The analysis was carried out using daily closing price data retrieved from https://finance.yahoo.com/quote/NFCX.JK/history?p=NFCX.JK for 12 June 2018 to 4 August 2020. Generally, there are four steps in the risk-return analysis: component identification of the mixture autoregressive models; analysis of the Bayesian mixture autoregressive models; VaR modelling; and evaluation of the VaR model using the Kupiec test. Before conducting the risk-return analysis, the return of NFCX stock is calculated by (1).

$$R_t = \ln\left(\frac{P_t}{P_{t-1}}\right) \qquad (1)$$

where $R_t$ is the stock price return on day $t$, $P_t$ is the stock price on day $t$, and $P_{t-1}$ is the stock price on day $t-1$ [22].
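Assuming returns are computed as log returns $R_t = \ln(P_t/P_{t-1})$, the following is a minimal sketch of this preprocessing step; the function name and the price values are hypothetical.

```python
import math


def log_returns(prices):
    """Daily log returns R_t = ln(P_t / P_{t-1}) from consecutive closing prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices[:-1], prices[1:])]


closes = [100.0, 110.0, 99.0]  # hypothetical closing prices, not NFCX data
print(log_returns(closes))
```

A series of n prices yields n − 1 returns, so the first trading day in the sample has no return.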
The first step of the return-risk analysis is the component identification of the mixture autoregressive models. At this stage, the number of mixture components is determined from the histogram of the returns, and the AR(p) components are determined by autoregressive modelling. The AR(p) model [23] is given in (2).

$$R_t = \phi_1 R_{t-1} + \phi_2 R_{t-2} + \cdots + \phi_p R_{t-p} + a_t \qquad (2)$$

Here, $R_{t-i}$ is the return at time $t-i$, $p$ is the autoregressive order, $\phi_i$ is the autoregressive parameter, and $a_t$ is white noise. The order of the AR(p) model is determined after ensuring that the return data are stationary in mean and variance. The parameters of the specified AR(p) models are then estimated and tested for significance. Only the significant ones qualify to be mixed.
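The AR(p) models mixed in MNAR and MLAR modelling are the same; the difference is that the MNAR model uses the normal distribution and the MLAR model uses the Laplace distribution. In the MNAR$(K; p_1, \ldots, p_K)$ model, the conditional distribution of $R_t \mid \mathbf{R}_{t-1}; \boldsymbol{\theta}$ is a normal mixture with conditional density function (3).

$$f(R_t \mid \mathbf{R}_{t-1}; \boldsymbol{\theta}) = \sum_{k=1}^{K} \pi_k \, N\!\left(R_t; \mu_{k,t}, \sigma_k^2\right), \qquad \mu_{k,t} = \sum_{i=1}^{p_k} \phi_{k,i} R_{t-i} \qquad (3)$$

Here, $\pi_k$ is the proportion of the $k$-th mixture component, and $N(R_t; \mu_{k,t}, \sigma_k^2)$ is the density function of $N(\mu_{k,t}, \sigma_k^2)$; in subset models such as MNAR(2;[2],[2,4]), only the listed lags enter $\mu_{k,t}$. Let $\boldsymbol{\theta} = (\pi_1, \pi_2, \ldots, \pi_K, \boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_K, \sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2)$ denote the parameter vector, where $\boldsymbol{\phi}_k$ is the vector of autoregressive parameters of component $k$ [15]. In the MLAR$(K; p_1, \ldots, p_K)$ model, the conditional distribution of $R_t \mid \mathbf{R}_{t-1}; \boldsymbol{\theta}$ is a Laplace mixture with conditional density function (4),

$$f(R_t \mid \mathbf{R}_{t-1}; \boldsymbol{\theta}) = \sum_{k=1}^{K} \pi_k \, L\!\left(R_t; \mu_{k,t}, \sigma_k^2\right) \qquad (4)$$

where (5) is the Laplace density function with mean $\mu$ and variance $\sigma^2$,

$$L(R_t; \mu, \sigma^2) = \frac{1}{\sqrt{2}\,\sigma} \exp\left(-\frac{\sqrt{2}\,\lvert R_t - \mu \rvert}{\sigma}\right) \qquad (5)$$

and (6) is the parameter vector of the mixture model.

$$\boldsymbol{\theta} = (\pi_1, \ldots, \pi_K, \boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_K, \sigma_1^2, \ldots, \sigma_K^2) \qquad (6)$$

The residuals of the MLAR model are also assumed to follow a Laplace distribution [17].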
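As an illustration of how a normal or Laplace mixture autoregressive model assigns a conditional density to the next return, the following minimal sketch evaluates such a density. The parameter values are hypothetical, chosen only to mimic a two-component model with lag sets [2] and [4]; they are not the estimates reported in this study. The Laplace component is parameterised by its mean and variance, as described above.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def laplace_pdf(x, mu, var):
    # Laplace with mean mu and variance var: scale b = sqrt(var / 2)
    b = math.sqrt(var / 2.0)
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def mixture_ar_density(r_t, past, weights, lag_sets, coefs, variances, kind="normal"):
    """Conditional density of R_t given past returns (past[-1] is R_{t-1})."""
    pdf = normal_pdf if kind == "normal" else laplace_pdf
    total = 0.0
    for w, lags, phi, var in zip(weights, lag_sets, coefs, variances):
        mu = sum(c * past[-i] for c, i in zip(phi, lags))  # component AR mean
        total += w * pdf(r_t, mu, var)
    return total


# Hypothetical two-component model with lag sets [2] and [4] (illustrative values only)
params = dict(weights=[0.7, 0.3], lag_sets=[[2], [4]], coefs=[[0.3], [-0.2]],
              variances=[0.01 ** 2, 0.05 ** 2])
past = [0.01, -0.02, 0.005, 0.03]  # hypothetical R_{t-4}, ..., R_{t-1}
for kind in ("normal", "laplace"):
    print(kind, mixture_ar_density(0.0, past, kind=kind, **params))
```

The low-variance component produces the sharp peak and the high-variance component produces the heavy tails. This is the leptokurtic/platykurtic pairing that the mixture is meant to capture.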
Markov chain Monte Carlo (MCMC) is a Bayesian inference algorithm commonly used to estimate parameters by generating samples from a given distribution. Each new sample is drawn based on the previous one, starting from an initial value chosen at the beginning of sampling. As a result, the samples $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(T)}$ form a Markov chain: the distribution of $\theta^{(t)}$ given all preceding values depends only on the most recent value, $\theta^{(t-1)}$. The generated samples are not independent, but they are identically distributed if the Markov chain is stationary [24]. The draws are thus progressively corrected so that they approach the target posterior distribution $p(\theta \mid \mathbf{R})$. The Gibbs sampler is an MCMC method that can handle multidimensional problems. In the normal and Laplace cases, $\theta = (\pi, \phi, \sigma^2)$ and the posterior is $p(\pi, \phi, \sigma^2 \mid \mathbf{R})$. The Gibbs sampler estimates $\pi$, $\phi$, and $\sigma^2$ iteratively using the following sampling scheme.

1. Choose initial values $\pi^{(0)}$, $\phi^{(0)}$, and $\sigma^{2\,(0)}$.
2. At iteration $t$, draw $\pi^{(t)} \sim p(\pi \mid \phi^{(t-1)}, \sigma^{2\,(t-1)}, \mathbf{R})$, then $\phi^{(t)} \sim p(\phi \mid \pi^{(t)}, \sigma^{2\,(t-1)}, \mathbf{R})$, then $\sigma^{2\,(t)} \sim p(\sigma^2 \mid \pi^{(t)}, \phi^{(t)}, \mathbf{R})$.
3. Repeat step 2 $T$ times, with $T \to \infty$.
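The Gibbs scheme above can be sketched for the normal (MNAR) case. This is a minimal sketch, not the authors' implementation. It swaps in a standard data-augmentation approach with latent component labels $z_t$, which adds an allocation step that the text does not describe. It also assumes a uniform Dirichlet prior on the weights, $N(0, \tau^2)$ priors on the AR coefficients, and inverse-gamma priors on the variances. The Laplace (MLAR) case needs different conditional updates and is not covered. All function names and the simulated parameters are hypothetical.

```python
import numpy as np


def lag_matrix(r, lags, start):
    """Columns R_{t-i} (i in lags) for t = start, ..., len(r) - 1."""
    return np.column_stack([r[start - i: len(r) - i] for i in lags])


def gibbs_mnar(r, lag_sets, n_iter=2000, burn=500, tau2=1.0, a0=2.0, b0=1e-4, seed=0):
    """Data-augmentation Gibbs sampler for a K-component normal mixture AR model."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    start = max(max(lags) for lags in lag_sets)
    y = r[start:]
    X = [lag_matrix(r, lags, start) for lags in lag_sets]
    K = len(lag_sets)
    pi = np.full(K, 1.0 / K)
    phi = [np.zeros(len(lags)) for lags in lag_sets]
    sig2 = (np.std(y) * np.geomspace(0.5, 2.0, K)) ** 2  # ordered initial variances
    keep_pi, keep_phi, keep_sig2 = [], [], []
    for it in range(n_iter):
        # Allocation: P(z_t = k | rest) is proportional to pi_k * N(y_t; X_k phi_k, sig2_k)
        logp = np.column_stack([
            np.log(pi[k]) - 0.5 * np.log(2 * np.pi * sig2[k])
            - 0.5 * (y - X[k] @ phi[k]) ** 2 / sig2[k] for k in range(K)])
        prob = np.exp(logp - logp.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        u = rng.random((len(y), 1))
        z = np.minimum((u > np.cumsum(prob, axis=1)).sum(axis=1), K - 1)
        # Weights: Dirichlet(1 + n_k) posterior under a uniform Dirichlet prior
        pi = rng.dirichlet(1.0 + np.bincount(z, minlength=K))
        for k in range(K):
            Xk, yk = X[k][z == k], y[z == k]
            # AR coefficients: conjugate normal update with prior N(0, tau2 * I)
            cov = np.linalg.inv(Xk.T @ Xk / sig2[k] + np.eye(Xk.shape[1]) / tau2)
            cov = (cov + cov.T) / 2.0
            phi[k] = rng.multivariate_normal(cov @ Xk.T @ yk / sig2[k], cov)
            # Variance: inverse-gamma update with prior IG(a0, b0)
            resid = yk - Xk @ phi[k]
            sig2[k] = 1.0 / rng.gamma(a0 + len(yk) / 2.0, 1.0 / (b0 + resid @ resid / 2.0))
        if it >= burn:
            keep_pi.append(pi)
            keep_phi.append([p.copy() for p in phi])
            keep_sig2.append(sig2.copy())
    pi_mean = np.mean(keep_pi, axis=0)
    phi_mean = [np.mean([d[k] for d in keep_phi], axis=0) for k in range(K)]
    sigma_mean = np.sqrt(np.array(keep_sig2)).mean(axis=0)
    return pi_mean, phi_mean, sigma_mean


def simulate_mnar(n, pi, lag_sets, coefs, sigmas, seed=1):
    """Simulate a mixture AR series from hypothetical parameters."""
    rng = np.random.default_rng(seed)
    m = max(max(lags) for lags in lag_sets)
    r = np.zeros(n + m)
    for t in range(m, n + m):
        k = rng.choice(len(pi), p=pi)
        r[t] = sum(c * r[t - i] for c, i in zip(coefs[k], lag_sets[k])) + sigmas[k] * rng.standard_normal()
    return r[m:]


r = simulate_mnar(3000, [0.7, 0.3], [[2], [4]], [[0.5], [-0.4]], [0.01, 0.05])
pi_hat, phi_hat, sigma_hat = gibbs_mnar(r, [[2], [4]])
print(pi_hat, phi_hat, sigma_hat)
```

The initial variances are deliberately ordered (small, then large) so that the narrow, peaked component and the wide component keep their labels throughout sampling. This is a simple guard against label switching.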
In estimating the mixture parameters, step 2 must draw $\pi_k$, $\phi_k$, and $\sigma_k^2$ for each of the $K$ mixture components. After convergence, the samples generated by the above algorithm form a stationary pattern and follow their respective target distributions [18] [25]. The Markov chain must be confirmed to be ergodic to ensure that a limiting distribution exists; this involves three properties, namely irreducibility, aperiodicity, and recurrence (as opposed to transience) [26]. A parameter significance test is then used to select suitable parameters for the model. For parameters estimated with Bayesian MCMC, the null hypothesis is $\theta = 0$ and the alternative hypothesis is $\theta \neq 0$. The null hypothesis is rejected if the $(1-\alpha)$ posterior credible interval does not contain zero [27]. After the MNAR and MLAR models are obtained, the deviance information criterion (DIC) of each MNAR and MLAR model is calculated, and the model with the smallest DIC is selected. The DIC formula is shown in (8).

$$\mathrm{DIC} = \bar{D} + p_D \qquad (8)$$
Here, $\bar{D}$ is the posterior mean of the deviance, defined as $D(\theta) = -2 \log p(\mathbf{R} \mid \theta)$, and $p_D$ is the effective number of parameters, given by $p_D = \bar{D} - D(\bar{\theta})$ [29]. The best MNAR and MLAR models are then used to determine the VaR of each mixture model, which is calculated by (9); the model-specific term for the best MNAR model is given in (10), and that for the best MLAR model in (11).
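Equations (9) to (11) are not reproduced here, so the following is only a minimal sketch under a stated assumption. It assumes that the VaR at level α is read off the α-quantile $q_\alpha$ of the fitted one-step-ahead mixture AR distribution and reported as the positive loss $-q_\alpha$ per unit invested. That sign convention, the bisection method, the function names, and the parameter values are all hypothetical, not the paper's formulas or estimates.

```python
import math
from statistics import NormalDist


def component_cdf(x, mu, var, kind):
    if kind == "normal":
        return NormalDist(mu, math.sqrt(var)).cdf(x)
    b = math.sqrt(var / 2.0)  # Laplace scale for variance var
    z = (x - mu) / b
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def mixture_quantile(alpha, past, weights, lag_sets, coefs, variances, kind="normal"):
    """alpha-quantile of the one-step-ahead mixture AR distribution, found by bisection."""
    mus = [sum(c * past[-i] for c, i in zip(phi, lags)) for lags, phi in zip(lag_sets, coefs)]

    def cdf(x):
        return sum(w * component_cdf(x, m, v, kind) for w, m, v in zip(weights, mus, variances))

    spread = 50.0 * math.sqrt(max(variances))
    lo, hi = min(mus) - spread, max(mus) + spread
    for _ in range(100):  # the mixture CDF is monotone, so bisection converges
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters (not the paper's estimates)
params = dict(weights=[0.7, 0.3], lag_sets=[[2], [4]], coefs=[[0.3], [-0.2]],
              variances=[0.01 ** 2, 0.05 ** 2])
past = [0.01, -0.02, 0.005, 0.03]
for kind in ("normal", "laplace"):
    for alpha in (0.05, 0.01):
        q = mixture_quantile(alpha, past, kind=kind, **params)
        print(kind, alpha, "VaR =", -q)
```

Bisection is used because the quantile of a mixture has no closed form, even when each component's quantile does.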
The last stage is evaluating the VaR model by backtesting, a statistical procedure that systematically compares the actual gains and losses with the estimated VaR. The most widely used backtest is the Kupiec test, also known as the POF (proportion of failures) test, which measures whether the number of exceptions is consistent with the quantile [30]; the number of exceptions follows a binomial distribution. The information needed to perform the Kupiec test is therefore the number of observations ($T$), the number of exceptions ($x$), and the quantile [2]. The null hypothesis of this test is $p = \hat{p}$, where $\hat{p} = x/T$, and the alternative hypothesis is $p \neq \hat{p}$. The test statistic is the likelihood ratio ($LR_{POF}$) [19] given in (12).

$$LR_{POF} = -2 \ln\left[\frac{(1-p)^{T-x}\, p^{x}}{\left(1-\frac{x}{T}\right)^{T-x} \left(\frac{x}{T}\right)^{x}}\right] \qquad (12)$$

Here, $p$ is the probability of failure at the chosen quantile. $LR_{POF}$ is asymptotically chi-square ($\chi^2$) distributed with one degree of freedom. The null hypothesis is rejected if $LR_{POF}$ is greater than the critical value $\chi^2_{1}$. Accordingly, the VaR model is declared valid if the null hypothesis is not rejected. Finally, the result of this analysis shows the VaR that investors in PT NFC Indonesia Tbk (NFCX) may face.
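The Kupiec POF test described above can be computed with the Python standard library alone. The following minimal sketch has hypothetical function names and a hypothetical backtest. It uses the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper-tail probability is $\operatorname{erfc}(\sqrt{LR/2})$.

```python
import math


def _xlogy(a, b):
    """a * ln(b), with the convention 0 * ln(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def kupiec_pof(T, x, p):
    """Kupiec POF likelihood ratio for x exceptions in T observations at failure rate p."""
    if T <= 0 or not 0 <= x <= T or not 0 < p < 1:
        raise ValueError("need T > 0, 0 <= x <= T and 0 < p < 1")
    p_hat = x / T
    log_null = _xlogy(T - x, 1.0 - p) + _xlogy(x, p)
    log_alt = _xlogy(T - x, 1.0 - p_hat) + _xlogy(x, p_hat)
    return max(0.0, -2.0 * (log_null - log_alt))  # clip tiny negative rounding error


def chi2_1_pvalue(lr):
    """Upper-tail probability of a chi-square(1) variable: P(Z^2 > lr) = erfc(sqrt(lr / 2))."""
    return math.erfc(math.sqrt(lr / 2.0))


def kupiec_test(T, x, p, level=0.05):
    lr = kupiec_pof(T, x, p)
    pval = chi2_1_pvalue(lr)
    return lr, pval, pval >= level  # True means the VaR model is not rejected


# Hypothetical backtest: 250 trading days with 6 exceptions at the 1% quantile
print(kupiec_test(250, 6, 0.01))
```

Comparing the p-value with the significance level is equivalent to comparing $LR_{POF}$ with the $\chi^2_1$ critical value (about 3.84 at the 5% level).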

III. RESULTS
This section presents the results of the VaR modelling process, which consists of four steps: identification, analysis, modelling, and backtesting. The best mixture models based on the normal and Laplace distributions are selected by comparing several mixture VaR models.

A. Component Identification of Mixture Autoregressive Model
This step determines the number of mixture components and which AR(p) components are to be mixed. The number of mixture autoregressive components is detected from the histogram, whereas the mixable AR(p) components are those with significant parameters.
The histogram of the NFCX return data, presented in Fig. 1, shows that the data contain outliers, which appear in the histogram as a right-skewed (positively skewed) pattern. Fig. 1 can also be used to identify the shape of the frequency distribution, which may be leptokurtic, platykurtic, or mesokurtic: if the peak of the curve is higher than that of the normal distribution, it is leptokurtic; if it is lower, it is platykurtic; and if it is the same, it is mesokurtic [31]. The histogram shows that the data have a higher peak than the normal distribution, so they are leptokurtic and closer to the Laplace distribution (blue line) than to the normal distribution (red line). However, the outliers and the high variance make a single Laplace distribution a poor fit. Variability in a histogram is higher when the taller bars are spread away from the mean and lower when they are closer to the mean. To address this, a two-component mixture is formed for each distribution, with one leptokurtic and one platykurtic component. The two Laplace components are shown by the black and yellow dashed lines, and the two normal components by the green and magenta dashed lines. The platykurtic component is expected to capture the high variation.

Journal of Information Systems Engineering and Business Intelligence, 2021, 7 (1), 11-21

Fig. 1 The Histogram of NFCX Return

After the number of components was obtained, the AR(p) components were selected using parameter significance. The order of AR(p) was determined after ensuring that the data were stationary in mean and variance. Stationarity in mean was checked using a time series plot and the augmented Dickey-Fuller (ADF) test. The data in the time series plot, presented in Fig. 2, fluctuate around the mean, indicating that the data are stationary in mean. This result was confirmed by the ADF test.
In the ADF test, the null hypothesis is that the data are not stationary in mean, and the alternative hypothesis is that they are stationary in mean. Because the p-value (0.01) is less than the significance level (0.05), the null hypothesis is rejected. The data must also be stationary in variance. Stationarity in variance was checked using the rounded Box-Cox value ($\lambda$); the data are stationary in variance when the rounded $\lambda$ equals 1. The NFCX return data did not satisfy this assumption, since $\lambda = -1$; thus, the data were Box-Cox transformed until $\lambda = 1$. Once $\lambda = 1$, the order of AR(p) can be identified. The AR(p) order is determined from the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, using the lags at which both plots cut off. A lag is treated as a cut-off when it falls outside the blue confidence limits. Both the ACF and the PACF cut off at lags 1, 2, 4, and 9. The orders of the AR(p) models can be seen in Table 1.
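The order-selection rule above (lags at which both the ACF and the PACF leave the confidence band) can be sketched in NumPy. This is an illustrative sketch. It assumes the band is the usual approximate white-noise bound of ±1.96/√n and computes the PACF with the Durbin-Levinson recursion. The function names and the simulated series are hypothetical. In the study, the rule is applied to the Box-Cox transformed returns.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations rho_0, ..., rho_nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [x[k:] @ x[:-k] / denom for k in range(1, nlags + 1)])


def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi = np.array([])  # coefficients phi_{k-1,1..k-1} of the previous AR fit
    for k in range(1, nlags + 1):
        phi_kk = (rho[k] - phi @ rho[k - 1:0:-1]) / (1.0 - phi @ rho[1:k])
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf


def common_cutoff_lags(x, nlags=20, z=1.96):
    """Lags where both ACF and PACF fall outside the approximate bounds +/- z / sqrt(n)."""
    bound = z / np.sqrt(len(x))
    acf, pacf = sample_acf(x, nlags), sample_pacf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(acf[k]) > bound and abs(pacf[k]) > bound]


# Hypothetical series in which only lag 2 drives the dynamics
rng = np.random.default_rng(0)
r = np.zeros(2000)
for t in range(2, len(r)):
    r[t] = 0.6 * r[t - 2] + rng.standard_normal()
print(common_cutoff_lags(r, nlags=10))
```

With roughly 5% of lags expected to cross a 95% band by chance, an isolated crossing at a high lag (such as lag 9) is worth checking for significance before it is kept, which is what the parameter test that follows does.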
Parameters were estimated using the Bayesian method with the likelihood $R_t \sim N(\phi_1 R_{t-1} + \phi_2 R_{t-2} + \cdots + \phi_p R_{t-p}, \sigma^2)$. The prior distribution for the autoregressive parameters ($\phi$) was a conjugate prior, and that for $\sigma$ was a noninformative prior. A conjugate prior pairs a relatively simple likelihood function with a prior of suitable form, so that the posterior belongs to the same family as the prior [27]. A noninformative prior is a distribution whose range of uncertainty is wider than the range of reasonable parameter values [32]. The prior for the autoregressive parameters is a normal distribution, while that for the standard deviation ($\sigma$) is an inverse gamma distribution. The parameter estimates and their significance are shown in Table 1. The 95% credible interval is bounded by the 2.5th and 97.5th percentiles of the posterior, a convention used by several authors and software packages [27]. Table 1 shows that all parameters of the autoregressive models that contain lag 1 (p = 1) are not significant because their credible intervals contain 0, whereas all autoregressive models that do not contain lag 1 are significant.
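As a minimal sketch of this estimation and significance step, the following code runs a Gibbs sampler for a single subset AR model and then applies the credible-interval rule (2.5th and 97.5th percentiles, significant if zero is excluded). Two assumptions differ from, or go beyond, the text. The inverse-gamma prior is placed on the variance $\sigma^2$, the usual conjugate choice, rather than on $\sigma$. The prior hyperparameters (`tau2`, `a0`, `b0`) are also hypothetical. The simulated series is hypothetical too.

```python
import numpy as np


def bayes_subset_ar(r, lags, n_iter=3000, burn=1000, tau2=1.0, a0=0.01, b0=0.01, seed=0):
    """Gibbs sampler for R_t ~ N(sum_i phi_i R_{t-i}, sigma^2) over the given lags.

    Assumed priors: phi ~ N(0, tau2 * I) and sigma^2 ~ InvGamma(a0, b0).
    Returns post-burn-in draws of phi with shape (n_iter - burn, len(lags)).
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    m = max(lags)
    y = r[m:]
    X = np.column_stack([r[m - i: len(r) - i] for i in lags])
    XtX, Xty = X.T @ X, X.T @ y
    sig2 = np.var(y)
    draws = []
    for it in range(n_iter):
        cov = np.linalg.inv(XtX / sig2 + np.eye(len(lags)) / tau2)
        cov = (cov + cov.T) / 2.0  # guard against tiny asymmetry from inversion
        phi = rng.multivariate_normal(cov @ Xty / sig2, cov)
        resid = y - X @ phi
        sig2 = 1.0 / rng.gamma(a0 + len(y) / 2.0, 1.0 / (b0 + resid @ resid / 2.0))
        if it >= burn:
            draws.append(phi)
    return np.array(draws)


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from the (1 - level)/2 and (1 + level)/2 posterior percentiles."""
    tail = 100.0 * (1.0 - level) / 2.0
    return np.percentile(draws, [tail, 100.0 - tail], axis=0)


def is_significant(draws, level=0.95):
    """A parameter is significant when its credible interval excludes zero."""
    lower, upper = credible_interval(draws, level)
    return (lower > 0) | (upper < 0)


# Hypothetical series in which only lag 2 matters
rng = np.random.default_rng(42)
r = np.zeros(2000)
for t in range(2, len(r)):
    r[t] = 0.5 * r[t - 2] + 0.02 * rng.standard_normal()
draws = bayes_subset_ar(r, [1, 2])
print(draws.mean(axis=0), credible_interval(draws).T, is_significant(draws))
```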

B. Analysis of Bayesian Mixture Autoregressive Model
The mixture models are created by mixing two components of the significant AR(p) models presented in Table 1. There are 42 mixed models, consisting of 21 MNAR and 21 MLAR models; only six of them (three MNAR and three MLAR) are shown in this study. Parameter estimation began by creating a directed acyclic graph (DAG) for each model; the DAGs for MNAR and MLAR can be seen in Fig. 4. The MNAR models are specified as follows. The prior distribution for the autoregressive parameters ($\phi$) is the conjugate prior for the MNAR models, whereas the MLAR models use pseudo priors, whose values are determined from frequentist estimation [33]. The pseudo priors of the MLAR models are based on the parameter estimates of the AR(p) models. The prior distributions for the other parameters are noninformative.
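The count of 21 models per family can be reproduced under an assumption. Suppose the candidate lags are 1, 2, 4, and 9, and every AR model containing lag 1 is not significant. Then the significant AR models would be the seven non-empty lag subsets of {2, 4, 9}, and a two-component mixture pairs two distinct models. This is a hypothetical reconstruction of Table 1, not the table itself.

```python
from itertools import combinations

# Assumption: the significant AR models are the non-empty subsets of {2, 4, 9}
# (lags from the ACF/PACF, with every model containing lag 1 not significant).
candidate_lags = [2, 4, 9]
significant_ar = [c for k in range(1, len(candidate_lags) + 1)
                  for c in combinations(candidate_lags, k)]

# A two-component mixture pairs two distinct significant AR models
pairs = list(combinations(significant_ar, 2))

print(len(significant_ar), len(pairs))  # 7 21
for first, second in pairs[:3]:
    print(f"MNAR(2;{list(first)},{list(second)})")
```

Both selected models, MNAR(2;[2],[2,4]) and MLAR(2;[2],[4]), appear among these pairs.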
The parameter estimates obtained with the DAG structure in Fig. 4 for each mixture autoregressive model are shown in Table 2. All parameters of the MLAR and MNAR models are significant because their 95% credible intervals do not contain 0. The MNAR and MLAR models were selected using DIC, with the smallest DIC indicating the best model. Table 3 presents the best MNAR model, MNAR(2; [2], [2,4]), and the best MLAR model, MLAR(2; [2], [4]), with DICs of -3293.10 and -3698.88, respectively. The DIC of the MLAR(2; [2], [4]) model is smaller than that of the MNAR(2; [2], [2,4]) model. However, DIC alone does not establish which model is best for estimating VaR, so both models are used to calculate VaR.
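DIC is used above to choose among the MNAR and MLAR models. The following minimal sketch computes DIC from MCMC draws using the standard definitions $\bar{D}$ (posterior mean deviance) and $p_D = \bar{D} - D(\bar{\theta})$. The toy model is hypothetical: a normal mean with known unit variance and a flat prior, so the posterior of $\mu$ is $N(\bar{y}, 1/n)$ and $p_D$ should be close to 1.

```python
import numpy as np


def dic(deviance, draws):
    """Return (DIC, pD) with DIC = Dbar + pD and pD = Dbar - D(posterior mean)."""
    draws = np.asarray(draws, dtype=float)
    d_bar = np.mean([deviance(theta) for theta in draws])
    p_d = d_bar - deviance(draws.mean(axis=0))
    return d_bar + p_d, p_d


# Toy check with hypothetical data: one free parameter, so pD should be about 1
rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=200)


def normal_deviance(mu):
    return -2.0 * np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - mu) ** 2)


mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=20000)
dic_value, p_d = dic(normal_deviance, mu_draws)
print(round(dic_value, 2), round(p_d, 3))
```

For the mixture models, the same function would take the mixture log-likelihood and the stored Gibbs draws. The model with the smaller DIC value is preferred.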

IV. DISCUSSION
This research is limited in that VaR accuracy is tested only for a one-day investment horizon, following several previous studies [34] [21]. Although it is known that the longer the investment, the greater the risk, this information is not sufficient. The risk over the other investment periods determined in this study, namely the five-day and twenty-day periods, also needs to be tested for accuracy so that investors can invest with more confidence. In addition to knowing the accuracy of the risks faced during a specific investment period, a reanalysis is needed to corroborate the findings of this study.
The results of this study indicate that, in terms of backtesting, the VaR obtained with the Bayesian MLAR approach is more accurate than that of the Bayesian MNAR approach. This is in line with the DIC comparison of the mixture models, in which the best mixture model also produced an accurate VaR model. However, the best mixture model does not necessarily produce an accurate VaR, as can be seen in the research by [2]. Hence, backtesting should be performed on all of the mixture models that have been formed. The use of a single backtesting method also limits the interpretation of the study, because the Kupiec test is not always reliable; backtesting methods other than the Kupiec test could therefore be added to strengthen the accuracy assessment. Lastly, the analysis is based only on past stock data. Future research should also consider the factors that influence changes in share prices; several factors are believed to affect stock prices, including oil and gold market prices and their volatilities [35].

V. CONCLUSIONS
This research was conducted to obtain a VaR model for NFCX stock investment (the NFCX VaR model) using the Bayesian mixture model approach. The best VaR model results from the MLAR(2; [2], [4]) approach, which consists of a two-component Laplace mixture distribution with one leptokurtic and one platykurtic component. Based on backtesting with the Kupiec test, this model is accurate at the 5% and 1% quantiles, with accuracy tested only for a one-day horizon.
Based on the discussion, several further studies can be developed. One is to test the VaR's accuracy not only for one day but over a longer predetermined horizon. Backtesting methods other than the Kupiec test should also be used to better assess the model's accuracy. Finally, modelling should use not only historical stock price data but also data on the factors that affect stock prices.