Crypto-sentiment Detection in Malay Text Using Language Models with an Attention Mechanism

Background: Due to the increased interest in cryptocurrencies, opinions on cryptocurrency-related topics are shared on news and social media. The enormous amount of sentiment data released frequently makes data processing and analytics on such important issues more challenging. In addition, existing sentiment models in the cryptocurrency domain focus primarily on English, with minimal work on the Malay language, which further complicates the problem. Objective: This study examines the performance of a sentiment regression model in forecasting sentiment scores for Malay news and tweets. Methods: Malay news headlines and tweets on Bitcoin and Ethereum are used as the input. A hybrid of the Generalized Autoregressive Pretraining for Language Understanding (XLNet) language model and the Bidirectional Gated Recurrent Unit (Bi-GRU) deep learning model is applied in the proposed sentiment regression implementation. The effectiveness of the proposed sentiment regression model is also investigated using the multi-head self-attention mechanism. Then, a comparative analysis using Bidirectional Encoder Representations from Transformers (BERT) is carried out. Results: The experimental results demonstrate that the number of attention heads is vital in improving the performance of the XLNet-GRU sentiment model. There are slight improvements of 0.03 in the adjusted R² values, with an average MAE of 0.163 (Malay news) and 0.174 (Malay tweets). In addition, average RMSEs of 0.267 and 0.255 were obtained for Malay news and tweets, respectively, which show that the proposed XLNet-GRU sentiment model outperforms the BERT sentiment model with lower prediction errors. Conclusion: The proposed model contributes to predicting sentiment on cryptocurrency. Moreover, this study introduces two carefully curated Malay corpora, CryptoSentiNews-Malay and CryptoSentiTweets-Malay, extracted from news and tweets, respectively.
Future work will expand the Malay news and tweet corpora on cryptocurrency-related issues and apply the proposed XLNet Bi-GRU deep learning model for greater financial insight.


I. INTRODUCTION
Currency is an essential component of financial transactions. Historically, money has evolved from its most primitive form, the barter trading system, to the current money system known as fiat money. This developed currency system makes trade more convenient than before, since it can be used both physically and digitally via online transfer [1]. Over the years, currency has evolved further with the introduction of cryptocurrency, starting with Bitcoin, a digital currency introduced by Nakamoto [2] in 2008. Malaysia started to adopt a cryptocurrency system in 2019, following the policies and regulations of the Securities Commission Malaysia (SC) and Bank Negara Malaysia (BNM) [3], [4]. To support rising interest in the cryptocurrency industry among Malaysian merchants [4], the Malaysian government has approved a digital exchange platform named "Luno" that enables Malaysians to trade any of the nine currently supported cryptocurrency coins: Bitcoin, Ethereum, Ripple, Litecoin, Cardano, Solana, Chainlink, Uniswap, and Bitcoin Cash [5].
Over the years, the cryptocurrency market has been highly influenced by cryptocurrency enthusiasts' comments and social media presence. These enthusiasts make remarks, and the sentiment of the remark will sway the

II. LITERATURE REVIEW
Positive and negative news have been observed to influence cryptocurrency values and movements, particularly during a bubble period [21]. Cryptocurrency prices typically decline when news or social media posts have a negative tone, and vice versa [22]. One of the earliest works is presented in [23], which used IBM Watson and Stanford CoreNLP to analyze the sentiment of Reddit and Twitter users and reported that sentiment captured from both social media sources contributed to cryptocurrency price changes in the market. Therefore, it is worthwhile to investigate how sentiment affects the price of cryptocurrencies.
Existing sentiment analysis methods can be broadly divided into two categories: the lexicon-based approach and the machine learning approach, which also includes deep learning. Prior studies mainly focused on the lexicon-based approach for sentiment analysis of cryptocurrency-related words or sentences. Examples include applying existing financial dictionaries [24], [25] or sentiment-based dictionaries such as the Affective Norms for English Words (ANEW) [26] to determine the sentiment polarity of each word or sentence [27]-[30].
In more recent work, the Valence Aware Dictionary and Sentiment Reasoner (VADER) [31] is applied to label English sentiment texts based on four types of scores (i.e., positive, negative, neutral, and compound). The VADER positive and negative scores were used in [32]-[34] for online forums and tweets, while the VADER compound score for tweets was used in [35], [36]. These studies report that the sentiment features are statistically significant and can be used as inputs for cryptocurrency price forecasting. Another comparable tool for sentiment analysis is TextBlob [37], [38], which determines the polarity and subjectivity of a sentence. Polarity ranges from -1 to +1, with -1 denoting negative sentiment and +1 denoting positive sentiment. Subjectivity ranges from 0 to 1, with 0 denoting very objective and 1 denoting very subjective text; a higher subjectivity suggests that the text contains more subjective information, such as personal opinion, than factual information [37]. Unfortunately, although these tools are well developed for English, they are not readily available for Malay text.
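To make the lexicon-based idea concrete, the sketch below sums word valences from a tiny hand-made Malay lexicon and squashes the total into (-1, +1) with the normalization x/√(x² + α), α = 15, which VADER applies to its summed valences to obtain the compound score. The lexicon `MALAY_LEXICON` and its values are hypothetical and only illustrate why such tools depend on a language-specific dictionary; this is not VADER itself and ignores its negation, intensifier, and punctuation rules.

```python
import math

# Hypothetical Malay valence lexicon: illustrative values only, not a published resource.
MALAY_LEXICON = {
    "naik": 1.5,    # "up"
    "untung": 2.0,  # "profit"
    "turun": -1.5,  # "drop"
    "rugi": -2.0,   # "loss"
    "jatuh": -1.8,  # "fall"
}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into (-1, 1), as VADER does for its compound score."""
    return score / math.sqrt(score * score + alpha)


def lexicon_score(text: str, lexicon: dict[str, float] = MALAY_LEXICON) -> float:
    """Sum the valences of known words and normalize; unknown words contribute 0."""
    total = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    return normalize(total)


if __name__ == "__main__":
    for sentence in ["Harga bitcoin naik hari ini", "Harga ethereum turun dan pelabur rugi"]:
        print(f"{sentence!r}: {lexicon_score(sentence):+.3f}")
```

Any word missing from the lexicon scores zero, so coverage of Malay (and of mixed Malay, English, and Indonesian) vocabulary directly limits such an approach.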
Machine learning methods can also be used for sentiment analysis. However, such an approach requires sentiment-labeled corpora and word embeddings. For example, Lamon et al. [39] applied traditional machine learning algorithms with a bag-of-words (BoW) representation to news headlines and tweets labeled with binary labels (1 or 0), reporting that logistic regression achieved higher accuracy than Support Vector Machine (SVM) and Naïve Bayes. Another study [40] used multi-class classification to identify sentiment levels ranging from highly negative to very positive using Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) deep learning approaches. However, the results were not promising because most news items and tweets were labeled neutral, which reduced their impact on cryptocurrency prices. Therefore, developing accurate sentiment models requires appropriate language resources and tools so that the sentiment features are useful inputs for predicting cryptocurrency prices.
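The bag-of-words pipeline of the kind used in [39] can be sketched in plain Python: each headline becomes a vector of word counts over a shared vocabulary, and a logistic regression classifier is fit by batch gradient descent on the log-loss. The toy Malay headlines, labels, learning rate, and epoch count are hypothetical and only illustrate the representation; they are not the data or settings of [39].

```python
import math


def build_vocabulary(docs: list[str]) -> dict[str, int]:
    """Map every distinct lower-cased token to a column index."""
    vocab: dict[str, int] = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(doc: str, vocab: dict[str, int]) -> list[float]:
    """Count occurrences of each vocabulary word; unseen words are ignored."""
    vec = [0.0] * len(vocab)
    for token in doc.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, target in zip(X, y):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * gw / len(X) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(doc, vocab, w, b):
    """Return 1 (positive) or 0 (negative)."""
    x = bag_of_words(doc, vocab)
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Hypothetical toy headlines with binary labels (1 = positive, 0 = negative).
train_docs = ["harga bitcoin naik", "ethereum naik tinggi", "harga bitcoin turun", "ethereum turun teruk"]
train_labels = [1, 1, 0, 0]

vocab = build_vocabulary(train_docs)
X = [bag_of_words(d, vocab) for d in train_docs]
w, b = train_logistic_regression(X, train_labels)
print(predict("bitcoin naik lagi", vocab, w, b))
```

Because BoW discards word order and context, the model can only learn which words co-occur with each label, which is one motivation for the contextual language models used later in this study.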
The dynamics of cryptocurrency and its market behavior depend heavily on sentiment, which influences the behavior of cryptocurrency traders in the market [27], [35], [41], [42]. Thus, this study attempts to further develop the sentiment analysis method by examining cryptocurrency topics on social media (Twitter) and news platforms. Malay web news and tweets on Bitcoin and Ethereum were selected as the input because of the glaring need for Malay language resources.

III. METHODS
The research framework consists of data collection, pre-processing, and experiments on XLNet language models to develop the sentiment regression model. Fig. 1 presents an overview of the proposed solution. The details of each module are presented in the following subsections.

A. Data collection and Pre-processing
Malay news headlines and tweets are collected for data acquisition. Malay news headlines were extracted using the Parsehub tool from various websites, including Intraday.my, Utusan Malaysia, Berita Harian, Harian Metro, Astro Awani, and Malaysiakini. Malay tweets were extracted using the TwitterSearchScrapper Python package. The keywords "bitcoin," "btc," "ethereum," and "eth" were used to collect information about Bitcoin and Ethereum. The data was collected for one year, from January 1 to December 31, 2021. This period was chosen because of the COVID-19 pandemic, which began at the end of 2019 and peaked throughout 2021; data from this period are therefore well suited to exploring the economic impact through public sentiment, particularly towards cryptocurrency-related matters [43], [44]. During this period, social media and online news were used heavily because the Movement Control Order (MCO) restricted people from going out. Most activities were performed online, and people used online transactions very actively. This also aligns with [45], which stated that the years 2020 to 2021 received intense media focus worldwide. Examples of the raw Malay news headlines and tweets that were extracted are shown in Figs. 2(a) and 2(b), respectively.

Before the annotation task began, the raw Malay text was pre-processed to clean the texts. All special characters, URLs, and hashtags were removed during pre-processing. Next, the emoticons in the messages were handled using a Python tool called Emot. Emot was originally created to handle English text, so its dictionary was modified by translating the English descriptions into Malay; each emoticon is thus converted into Malay text that expresses the corresponding emotion.
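A minimal sketch of the cleaning steps described above follows. It assumes a small hand-made emoticon-to-Malay mapping, `EMOTICON_TO_MALAY`, as a stand-in for the modified Emot dictionary, whose actual contents and API are not reproduced here; the regular expressions are likewise illustrative assumptions. In this sketch emoticons are translated first, because the later removal of special characters would otherwise delete symbols such as ":)" before they can be converted.

```python
import re

# Hypothetical emoticon/emoji -> Malay emotion text mapping
# (stand-in for the modified Emot dictionary, not its real contents).
EMOTICON_TO_MALAY = {
    ":)": "senyum",          # smile
    ":(": "sedih",           # sad
    "\U0001F602": "ketawa",  # face with tears of joy -> laughing
    "\U0001F621": "marah",   # pouting face -> angry
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
SPECIAL_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace
SPACES_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    # 1. Translate emoticons into Malay words before punctuation is stripped.
    for emoticon, malay in EMOTICON_TO_MALAY.items():
        text = text.replace(emoticon, f" {malay} ")
    # 2. Remove URLs and hashtags (the whole "#tag" token).
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # 3. Remove remaining special characters and collapse whitespace.
    text = SPECIAL_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


print(clean_text("Bitcoin naik lagi :) https://t.co/abc #BTC"))
```

Python's `\w` is Unicode-aware for `str` patterns, so Malay letters are kept, while unmapped emoji are dropped as special characters.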

B. Malay cryptocurrency news and tweets sentiment corpora
After the texts were cleaned, all non-Malay news items and tweets were manually removed. All cleaned instances were then selected for the news corpus, named CryptoSentiNews-Malay. For the tweets corpus, named CryptoSentiTweets-Malay, only 6,000 documents were randomly selected from each of the Bitcoin and Ethereum datasets. Table 1 shows the total size of each Malay sentiment corpus. Each text unit (a news item or a tweet) was manually annotated by three trained annotators who are native Malay speakers. Sentiment scores range from -1 (highly negative) to +1 (very positive). To reach an inter-annotator agreement of more than 60%, measured with Krippendorff's alpha [46], all annotators were given a codebook for the annotation task and underwent several rounds of training. Because it is very difficult to judge the degree of positivity or negativity expressed in words, an agreement above 60% is deemed appropriate for sentiment scoring [47]. The final sentiment score for each news headline and tweet was obtained by averaging the scores of the three annotators.
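The agreement check and the final averaging can be sketched as follows. The function computes Krippendorff's alpha, α = 1 − D_o/D_e, with the interval distance metric (squared difference) for the case where every item is scored by all annotators. The interval metric is an assumption suited to continuous scores in [-1, +1], since the text does not state which metric was used, and the annotator scores below are hypothetical.

```python
from itertools import permutations
from statistics import mean


def krippendorff_alpha_interval(units: list[list[float]]) -> float:
    """Krippendorff's alpha (interval metric); `units` holds one score list per item."""
    pairable = [u for u in units if len(u) >= 2]  # single scores carry no agreement information
    values = [v for u in pairable for v in u]
    n = len(values)
    # Observed disagreement: squared differences between ordered pairs within each item,
    # each item weighted by 1 / (m_u - 1), averaged over all n pairable values.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1) for u in pairable
    ) / n
    # Expected disagreement: squared differences between all ordered pairs of pooled values.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all scores are identical")
    return 1.0 - d_o / d_e


# Hypothetical scores from three annotators for four headlines.
scores = [
    [0.6, 0.5, 0.7],
    [-0.8, -0.6, -0.7],
    [0.1, 0.0, 0.2],
    [-0.3, -0.4, -0.2],
]
alpha = krippendorff_alpha_interval(scores)
final_scores = [mean(item) for item in scores]  # gold label = mean of the annotators' scores
print(f"alpha = {alpha:.3f}")
print("final scores:", [round(s, 3) for s in final_scores])
```

An item would pass the agreement criterion described above only if alpha over the annotated set exceeds 0.6; averaging then turns the three scores into one continuous regression target.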

C. Fine-tuning process
First, to create the vector representations, each sentence in the CryptoSentiNews-Malay and CryptoSentiTweets-Malay corpora was tokenized using the XLNetTokenizer of the pre-trained Malay XLNet model "xlnet-base-bahasa-cased", which has 12 transformer layers, a hidden size of 768, and 12 self-attention heads [48]. Then, the model was fine-tuned separately on each dataset, namely BTC Malay news, ETH Malay news, BTC Malay tweets, and ETH Malay tweets, before being used in the sentiment regression models. Fine-tuning updates the weights of the pre-trained XLNet language model, obtained from Hugging Face (https://huggingface.co/), with the sentiment information learned from the annotated corpora. The fine-tuning steps are shown in Algorithm 1 and were applied to all four datasets.
Once the datasets are cleaned, the regression task is performed. This research evaluates the CryptoSentiNews-Malay and CryptoSentiTweets-Malay corpora using a Bidirectional Gated Recurrent Unit (Bi-GRU) deep learning model. This approach was adopted because of its comparable results in Cantonese tweet rumour detection [49]. In addition, multi-head self-attention mechanisms with different numbers of heads are applied to the Bi-GRU model to determine whether performance can be enhanced.
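To illustrate what the Bi-GRU layer adds on top of the language model, the sketch below runs a GRU over a sequence of token vectors in both directions, concatenates the two final hidden states, and maps them to a single score through a tanh output unit so that predictions fall in (-1, +1). It assumes the token vectors come from the fine-tuned XLNet encoder (replaced here by random toy vectors) and uses 16 hidden units per direction, following Section D. Training and dropout are omitted, and the weights are random, so the printed score is meaningless; this is an illustrative sketch, not the authors' implementation.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """GRU cell with update gate z, reset gate r and candidate state n."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {
            gate: (random_matrix(hidden_dim, input_dim, rng),
                   random_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in ("z", "r", "n")
        }

    def step(self, x: Vector, h: Vector) -> Vector:
        (wz, uz, bz), (wr, ur, br), (wn, un, bn) = (self.params[g] for g in ("z", "r", "n"))
        z = [sigmoid(a + c + d) for a, c, d in zip(matvec(wz, x), matvec(uz, h), bz)]
        r = [sigmoid(a + c + d) for a, c, d in zip(matvec(wr, x), matvec(ur, h), br)]
        # Candidate state: the reset gate scales the contribution of the previous state.
        n = [math.tanh(a + ri * c + d) for a, ri, c, d in zip(matvec(wn, x), r, matvec(un, h), bn)]
        # New state interpolates between the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, sequence: list[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


class BiGRURegressor:
    """Bi-GRU followed by a single tanh output unit producing a score in (-1, 1)."""

    def __init__(self, input_dim: int, hidden_dim: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.forward_cell = GRUCell(input_dim, hidden_dim, rng)
        self.backward_cell = GRUCell(input_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.b_out = 0.0

    def encode(self, sequence: list[Vector]) -> Vector:
        # Concatenate the last forward state and the last backward state (2 * hidden_dim values).
        return self.forward_cell.run(sequence) + self.backward_cell.run(sequence[::-1])

    def predict(self, sequence: list[Vector]) -> float:
        features = self.encode(sequence)
        return math.tanh(sum(w * f for w, f in zip(self.w_out, features)) + self.b_out)


# Toy stand-in for contextual token embeddings (XLNet-base produces 768-dimensional vectors).
rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
model = BiGRURegressor(input_dim=8)
print(len(model.encode(tokens)), round(model.predict(tokens), 4))
```

The tanh output is what keeps every prediction inside the annotated score range, so no clipping is needed at inference time.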

D. XLNet-GRU sentiment model
The fine-tuned XLNet model is then connected to a Bi-GRU layer. The sentiment model is expected to output, for each sentence, a continuous sentiment score between -1 and +1 for regression analysis. For each dataset, four different sentiment models were created. Five-fold cross-validation, with an 80% training and 20% testing split in each fold, is used to obtain generalizable results. A Bi-GRU with 16 hidden units, a dropout rate of 0.5, and one output layer with a tanh activation function was adopted. Grid search with 5-fold cross-validation was used to find the optimal hyperparameter values; the hyperparameter settings for both the Malay news and tweet sentiment models are outlined in Table 2. Fig. 4(a) presents the XLNet-GRU sentiment model architecture, and Fig. 4(b) illustrates the architecture of XLNet-GRU with the multi-head self-attention mechanism.
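The evaluation protocol can be sketched as k-fold index splitting combined with an exhaustive grid search. With k = 5, every fold holds out 20% of the samples for testing and trains on the remaining 80%, and each hyperparameter combination is scored by its mean error across folds. The grid values and the `toy_evaluate` callback are hypothetical placeholders: in the setting described above, the callback would train the XLNet-GRU model on `train_idx` and return its error on `test_idx`, and the settings actually selected are those reported in Table 2.

```python
import itertools
import random
from statistics import mean
from typing import Callable


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; each sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, folds[i]


def grid_search(param_grid: dict[str, list], n_samples: int,
                evaluate: Callable[[dict, list[int], list[int]], float], k: int = 5):
    """Return (best_params, best_mean_error) over every combination in `param_grid`."""
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        error = mean(evaluate(params, tr, te) for tr, te in k_fold_indices(n_samples, k))
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical grid and a stand-in error surface (smallest at learning_rate=3e-5, batch_size=16).
grid = {"learning_rate": [1e-5, 3e-5, 5e-5], "batch_size": [16, 32]}


def toy_evaluate(params, train_idx, test_idx):
    return abs(params["learning_rate"] - 3e-5) * 1e4 + params["batch_size"] / 100


print(grid_search(grid, n_samples=100, evaluate=toy_evaluate))
```

Reusing the same seed for every combination keeps the folds identical across the grid, so differences in mean error reflect the hyperparameters rather than the split.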

E. XLNet-GRU sentiment model with an attention mechanism
To investigate potential performance improvements of the sentiment model, the self-attention mechanism [17] was added to the GRU layer of the model architecture. The multi-head self-attention mechanism aims to learn the semantic information of sentences by capturing the significant words related to sentiment or to specific features of the issue [50]. It can also help prevent the model from overfitting [51]. This study uses four different numbers of attention heads: single-head, 2-head, 4-head, and 8-head attention. These values were selected based on the findings of [50], which showed that with more than eight attention heads, some heads learn the same attention weights, which introduces noise into the sentiment regression.
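Standard multi-head self-attention splits each d-dimensional token vector into h heads of size d/h, applies scaled dot-product attention softmax(QKᵀ/√(d/h))V within each head, and concatenates the head outputs. The sketch below follows that recipe in plain Python but, for brevity, uses identity projections (Q = K = V = the head's slice of the input) instead of learned query, key, value, and output matrices, so it shows only the data flow. Applying it to 32-dimensional vectors (for example, concatenated outputs of a 16-unit Bi-GRU) is an assumption; 32 happens to be divisible by 1, 2, 4, and 8, the head counts tested here.

```python
import math
import random


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens: list[list[float]], num_heads: int):
    """Return (outputs, weights); weights[h][i][j] is how much token i attends to token j in head h.

    Simplification: identity projections, i.e. Q = K = V = the head's slice of each token vector.
    """
    d_model = len(tokens[0])
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    outputs = [[] for _ in tokens]
    weights = []
    for h in range(num_heads):
        head = [t[h * d_head:(h + 1) * d_head] for t in tokens]
        head_weights = []
        for i, q in enumerate(head):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head) for k in head]
            attn = softmax(scores)
            head_weights.append(attn)
            # Weighted sum of value vectors; appending per head concatenates the heads per token.
            outputs[i].extend(sum(w * v[c] for w, v in zip(attn, head)) for c in range(d_head))
        weights.append(head_weights)
    return outputs, weights


rng = random.Random(0)
sentence = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(6)]  # 6 tokens, 32 dimensions
for heads in (1, 2, 4, 8):
    out, attn_weights = multi_head_self_attention(sentence, heads)
    print(heads, len(out[0]), len(attn_weights))
```

The output width stays at 32 whatever the head count; more heads only means each head attends over a narrower slice of the representation, which is one way redundant heads can arise when h grows too large.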
The multi-head self-attention mechanism produces a weight for each word in a sentence, and the weight increases with the degree of focus on that word. Fig. 5 and Fig. 6 illustrate the sentiment words identified by the attention weights generated by the XLNet-GRU sentiment model. Words with a darker color draw more attention than words with a lighter color, which indicates a more significant effect on the sentence. Fig. 5 shows that the Malay word "turun" ("drop" in English) has the highest attention weight in the sentence. This indicates that the attention mechanism allows the model to focus on the sentiment words that are important for predicting the overall negative sentiment. Similar patterns are observed for positive sentiment in the sentence shown in Fig. 6, where the Malay word "naik" ("up" in English) carries more weight than the other words in the sentence. These two examples show simple situations in which the attention mechanism can be helpful in sentiment regression. However, even though attention mechanisms have been quite successful recently, our findings indicate that adding an attention layer does not always ensure a noticeable performance gain.
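The darker and lighter shading in Fig. 5 and Fig. 6 requires reducing an attention tensor to one weight per word. A common choice, assumed here because the text does not specify how the figures were produced, is to total the attention each word receives over all heads and query positions and normalize the totals so they sum to 1. The attention weights below are made-up numbers, not values from the model.

```python
def word_importance(tokens, weights):
    """Return (token, share) pairs: total attention each token receives over all heads
    and query positions, normalized so the shares sum to 1.

    weights[h][i][j] is the attention from query token i to key token j in head h.
    """
    received = [0.0] * len(tokens)
    for head in weights:
        for row in head:
            for j, w in enumerate(row):
                received[j] += w
    total = sum(received)
    return [(tok, r / total) for tok, r in zip(tokens, received)]


# Made-up attention weights for "harga bitcoin turun hari ini" (roughly "Bitcoin price drops today");
# two heads, five query rows each, every row sums to 1.
tokens = ["harga", "bitcoin", "turun", "hari", "ini"]
weights = [
    [[0.10, 0.15, 0.50, 0.15, 0.10]] * 5,  # head 1
    [[0.20, 0.10, 0.40, 0.20, 0.10]] * 5,  # head 2
]
for token, share in word_importance(tokens, weights):
    # Text "shading": longer bars correspond to darker colors in a heat map.
    print(f"{token:>8} {share:.3f} {'#' * round(share * 40)}")
```

Returning pairs rather than a dictionary keeps repeated words in a sentence as separate entries.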

Fig. 6 Visualization of attention weights on a positive sentiment word

F. XLNet-GRU sentiment model vs BERT-GRU sentiment model
BERT is used as the benchmark for comparative analysis because it has demonstrated strong performance in sentiment analysis in the cryptocurrency domain [8], and its pre-trained language model ('bert-base-bahasa-cased') can be applied to Malay text. Motivated by [52], which tested Malay-BERT on tweets for sentiment classification using hierarchical attention mechanisms, this work compares the XLNet-GRU sentiment model with the BERT-GRU sentiment model using an identical pipeline and hyperparameter configuration. Consequently, the next section presents results for the following six experiments:

1) XLNet sentiment model (XLNet language model only, without any deep learning model)
2) XLNet-GRU sentiment model (XLNet language model with the Bi-GRU deep learning model)
3) XLNet-GRU+ sentiment model (XLNet-GRU with the multi-head self-attention mechanism)
4) BERT sentiment model (BERT language model only, without any deep learning model)
5) BERT-GRU sentiment model (BERT language model with the Bi-GRU deep learning model)
6) BERT-GRU+ sentiment model (BERT-GRU with the multi-head self-attention mechanism)
G. Evaluation metrics
Three metrics are used to evaluate the proposed sentiment regression model: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and adjusted R². RMSE is well suited to evaluating unbiased forecasts, but because the errors are squared, it penalizes large errors heavily. MAE is less sensitive to outliers and therefore more robust than RMSE, making it an appropriate metric for error assessment [53]. In addition, adjusted R² was selected as a performance metric because it is more accurate and informative [54]. Unlike R², adjusted R² accounts for the number of independent variables in the model, penalizing predictors that do not improve the fit. Therefore, adjusted R² is favoured by most investors because it provides a more accurate assessment of the correlation between one variable and another [55].
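For reference, with n test samples, true scores yᵢ, predictions ŷᵢ, and p predictors: RMSE = √(Σ(yᵢ − ŷᵢ)²/n), MAE = Σ|yᵢ − ŷᵢ|/n, R² = 1 − SS_res/SS_tot, and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below implements these standard definitions; the number of predictors p used for the adjustment is not stated in the text, so it is left as a parameter, and the scores are toy values.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)              # total sum of squares
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_predictors):
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than n_predictors + 1")
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_predictors - 1)


# Toy sentiment scores in [-1, 1].
y_true = [0.8, -0.5, 0.1, -0.9, 0.4, 0.0]
y_pred = [0.6, -0.4, 0.2, -0.7, 0.5, -0.1]
print(f"RMSE={rmse(y_true, y_pred):.3f} MAE={mae(y_true, y_pred):.3f} "
      f"R2={r2(y_true, y_pred):.3f} adjR2={adjusted_r2(y_true, y_pred, 1):.3f}")
```

RMSE is always at least as large as MAE, and the gap widens when a few predictions are far off, which is the sense in which RMSE penalizes large errors more heavily. For p ≥ 1, adjusted R² never exceeds R².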

IV. RESULTS
This section first examines the XLNet sentiment model, then the XLNet-GRU sentiment model, and then the XLNet-GRU sentiment model with different numbers of attention heads in the multi-head self-attention mechanism. Finally, results are presented for three BERT experiments: the BERT sentiment model, the BERT-GRU sentiment model, and the BERT-GRU sentiment model using the best-performing number of attention heads found for XLNet. The experiments are performed independently for news and tweets.

A. Malay News
The results of the XLNet sentiment model for Bitcoin and Ethereum Malay news, with and without the Bi-GRU deep learning model, are shown in Table 3. Adding the Bi-GRU deep learning model to XLNet yields a minor improvement in adjusted R² of 0.025 (Bitcoin Malay news) and 0.008 (Ethereum Malay news). The BERT sentiment model yields similar results, as shown in Table 4.
Adding the Bi-GRU to the BERT language model increases the adjusted R² by 0.051 for Bitcoin Malay news and 0.004 for Ethereum Malay news, as presented in Table 4. These results show that a deep learning model can improve performance. Additionally, in both the XLNet and BERT applications, the adjusted R² for Ethereum Malay news is higher than that for Bitcoin Malay news. This might be because Bitcoin Malay news contains more texts than Ethereum Malay news, and more texts introduce more noise.
For Bitcoin and Ethereum Malay news, the XLNet-GRU sentiment regression results with and without an attention mechanism are shown in Tables 5 and 6, respectively. Without an attention mechanism, the adjusted R² was 0.628 for Bitcoin Malay news and 0.653 for Ethereum Malay news. By incorporating single-head attention into the model, a modest improvement in performance is observed, with adjusted R² values of 0.608 and 0.631 for Bitcoin and Ethereum Malay news, respectively. After testing different numbers of attention heads, 2-head attention yields the best performance for Bitcoin Malay news (adjusted R² = 0.628) and 8-head attention gives the best result for Ethereum Malay news (adjusted R² = 0.653).

B. Malay Tweets
Table 7 shows the results of the XLNet sentiment model with and without the Bi-GRU deep learning model for Bitcoin and Ethereum Malay tweets. For Bitcoin Malay tweets, XLNet-GRU achieved an adjusted R² of 0.426, which is 0.014 higher than the 0.412 obtained by XLNet, whereas for Ethereum Malay tweets an improvement of 0.019 was obtained (adjusted R² = 0.381). The same experimental setup was used with BERT, and the findings are displayed in Table 8. Similarly, the Bi-GRU deep learning model delivers a modest improvement over the BERT sentiment model, attaining an adjusted R² of 0.415 for Bitcoin Malay tweets and 0.366 for Ethereum Malay tweets. These results differ from those for Malay news: with both XLNet and BERT, the adjusted R² for Bitcoin-related Malay tweets is higher than for Ethereum-related Malay tweets. Since the Bitcoin and Ethereum Malay tweet sets contain the same number of documents, this may indicate that Bitcoin tweets have more understandable Malay content than Ethereum tweets.
Then, the XLNet-GRU sentiment regression model with and without an attention mechanism was tested on Malay tweets about Bitcoin and Ethereum, and the results are consistent with the observations for Malay news. The XLNet-GRU sentiment model with 2-head attention again yielded the best performance for Bitcoin Malay tweets, with an adjusted R² of 0.457, a slight improvement of 0.031 over the no-attention model (adjusted R² = 0.426). The 8-head attention XLNet-GRU sentiment model performed best on Ethereum Malay tweets, achieving an adjusted R² of 0.418, an improvement of 0.037 over the no-attention XLNet-GRU sentiment model (adjusted R² = 0.381). The results are presented in Tables 9 and 10.

Tables 11 and 12 present the performance of XLNet without Bi-GRU, the XLNet-GRU sentiment regression model without an attention mechanism, XLNet-GRU with the best number of attention heads (XLNet-GRU+), and the BERT-GRU sentiment model with an attention mechanism (BERT-GRU+) for Bitcoin and Ethereum Malay news and tweets, respectively. The XLNet-GRU+ model with a multi-head attention mechanism outperformed the BERT sentiment model for both Bitcoin and Ethereum news and tweets. Model performance on news is consistently higher than on tweets, which can be attributed to the tweet corpus naturally containing more noise. However, the performance gap between news and tweets is smaller for Bitcoin and noticeably wider for Ethereum. The results demonstrate that the Malay news and tweets for Bitcoin (2-head attention) and Ethereum (8-head attention) perform better with the multi-head self-attention technique. Averaging across Bitcoin and Ethereum gives a mean adjusted R² of 0.620 (no attention) and 0.641 (multi-head self-attention) for Malay news, and a mean adjusted R² of 0.404 (no attention) and 0.438 (multi-head self-attention) for Malay tweets.
The results indicate that the multi-head self-attention mechanism can slightly improve the goodness of fit of the XLNet-GRU sentiment regression model, which is in line with the experiment in [49]. The XLNet language model combined with a deep learning model produces promising results for sentiment analysis. To the best of our knowledge, no existing study has reported a sentiment regression evaluation on cryptocurrency-related matters in Malay text. Therefore, we compare our results with related work in similar domains that addressed sentiment analysis as a regression problem: sentiment analysis on financial news headlines by [15], which reported results using the R² metric, and sentiment analysis on financial microblogs by [16], which used MAE and RMSE as evaluation metrics. However, these studies used English text. The work by [15] implemented simple regression models such as linear regression, support vector regression, and XGBoost regression and obtained a maximum R² of 0.38 for English financial news headlines, while our XLNet-GRU+ model achieved higher adjusted R² values of 0.628 (Bitcoin Malay news) and 0.653 (Ethereum Malay news). Since adjusted R² is always less than or equal to R², this suggests that, despite the limited language resources for Malay, our XLNet-GRU+ model performs well on Malay text, albeit on different datasets.
This is further supported by comparing the MAE and RMSE reported in [16] with ours. The authors of [16] applied basic machine learning regressors such as linear regression, decision trees, and neural network regression to English microblog texts such as tweets, reporting a lowest MAE of 0.143 (using neural network regression) and a lowest RMSE of 0.183. Although our Malay tweets only achieved an average MAE of 0.174 and an average RMSE of 0.255, these results are still considered good given that tweet texts contain more noise than news headlines and that Malay is a low-resource language. These findings suggest potential for further research on the Malay language, particularly in the financial domain for predicting market prices.
The experiment provides new insight into the relationship between language models with and without deep learning model integration, as well as the effect of using different numbers of attention heads. Although the increase in adjusted R² is only slight, it is still considered a beneficial outcome, as the model focuses more on the important terms in sentiment texts. The study covers two sources (i.e., online news and tweets). However, manual annotation was performed on a limited number of documents due to limited computational resources, time constraints, and a shortage of qualified annotators.

VI. CONCLUSIONS
This study explicitly builds two curated Malay sentiment corpora (CryptoSentiNews-Malay and CryptoSentiTweets-Malay) for the cryptocurrency domain. These corpora will motivate future research to evaluate the effect of sentiment features in cryptocurrency price prediction using various contextual word embeddings and deep learning methods.
Due to computational resource constraints, we could only report the best hyperparameter settings within the investigated value ranges, and these settings are by no means exhaustive. Additionally, some Malay sentences contain a mixture of Indonesian and English words, and the lack of a cryptocurrency-related dictionary for Malay spelling correction when processing tweets during pre-processing prevents the model from achieving substantially better performance.
More experiments can be conducted to test different embeddings and deep learning algorithms. Despite a year of data collection, the curated Malay sentiment corpora, especially the news corpus, are still limited in size compared to the sentiment corpora available in resource-rich languages such as English. Nevertheless, despite their modest size and the lack of language resources, the performance achieved on the Malay corpora is comparable to that of other similar experiments on sentiment regression.
Therefore, this study has addressed its primary objective, which is to construct a Malay sentiment regression model. A regression model provides a fine-grained sentiment score rather than a coarse positive/negative polarity, which makes it more informative than classification. Furthermore, this study provides a new dataset of Malay cryptocurrency text gathered during the peak of the COVID-19 pandemic, a period that depicts the volatile and chaotic cryptocurrency market.
For future work, we plan to continue to increase the size of the Malay news and tweet corpora and include more text sources relevant to cryptocurrency that can help advance research, particularly in the financial domain.Additionally, this model can be integrated as a part of the engine for cryptocurrency price prediction to obtain the

Fig. 5 Visualization of attention weights on a negative sentiment word

TABLE 7 XLNET SENTIMENT REGRESSION RESULTS FOR BITCOIN & ETHEREUM MALAY TWEETS WITH AND WITHOUT BI-GRU DEEP LEARNING MODEL