Crypto-sentiment Detection in Malay Text Using Language Models with an Attention Mechanism


November 1, 2023


Background: Due to the increased interest in cryptocurrencies, opinions on cryptocurrency-related topics are shared on news and social media. The enormous amount of sentiment data that is frequently released makes data processing and analytics on such important issues more challenging. In addition, the present sentiment models in the cryptocurrency domain are primarily focused on English with minimal work on Malay language, further complicating problems.

Objective: The performance of the sentiment regression model to forecast sentiment scores for Malay news and tweets is examined in this study.

Methods: Malay news headlines and tweets on Bitcoin and Ethereum are used as the input. A hybrid Generalized Autoregressive Pretraining for Language Understanding (XLNet) language model in combination with Bidirectional-Gated Recurrent Unit (Bi-GRU) deep learning model is applied in the proposed sentiment regression implementation. The effectiveness of the proposed sentiment regression model is also investigated using the multi-head self-attention mechanism. Then, a comparison analysis using Bidirectional Encoder Representations from Transformers (BERT) is carried out.

Results: The experimental results demonstrate that the number of attention heads is vital in improving the XLNet-GRU sentiment model performance. There are slight improvements of 0.03 in the adjusted R2 values with an average MAE of 0.163 (Malay news) and 0.174 (Malay tweets). In addition, an average RMSE of 0.267 and 0.255 were obtained respectively for Malay news and tweets, which show that the proposed XLNet-GRU sentiment model outperforms the BERT sentiment model with lesser prediction errors.

Conclusion: The proposed model contributes to predicting sentiment on cryptocurrency. Moreover, this study also introduced two carefully curated Malay corpora, CryptoSentiNews-Malay and CryptoSentiTweets-Malay, which are extracted from news and tweets, respectively. Further works to enhance Malay news and tweets corpora on cryptocurrency-related issues will be expended with implementing the proposed XLNet Bi-GRU deep learning model for greater financial insight.

Keywords: Cryptocurrency, Deep learning model, Malay text, Sentiment analysis, Sentiment regression model