Author

Ibrahim Said Ahmad

Other affiliations: National University of Malaysia
Bio: Ibrahim Said Ahmad is an academic researcher from Bayero University Kano. The author has contributed to research in topics: Computer science & Sentiment analysis. The author has an h-index of 4 and has co-authored 13 publications receiving 51 citations. Previous affiliations of Ibrahim Said Ahmad include National University of Malaysia.

Papers
Proceedings Article
20 Jan 2022
TL;DR: This work introduces the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria—Hausa, Igbo, Nigerian-Pidgin, and Yorùbá—consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets.
Abstract: Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria—Hausa, Igbo, Nigerian-Pidgin, and Yorùbá—consisting of around 30,000 annotated tweets per language, including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.

48 citations
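A minimal sketch of the kind of baseline the abstract describes: fine-tuning a pre-trained multilingual model for three-way tweet sentiment classification with the Hugging Face Trainer. The checkpoint name, label mapping, and the tiny in-memory examples are illustrative assumptions, not the authors' released configuration or data.

```python
# Illustrative sketch only: fine-tuning a pretrained model for 3-way tweet
# sentiment classification, in the spirit of the paper's baselines.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "Davlan/afro-xlmr-base"  # assumed checkpoint; any multilingual PLM can stand in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tiny in-memory examples standing in for the released Hausa/Igbo/Pidgin/Yoruba splits.
train = Dataset.from_dict({
    "text": ["ina son wannan waka", "this film no make sense at all"],
    "label": [2, 0],  # 0 = negative, 1 = neutral, 2 = positive (assumed mapping)
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="naijasenti-baseline",
                           per_device_train_batch_size=8,
                           num_train_epochs=3),
    train_dataset=train,
)
trainer.train()
```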

Journal ArticleDOI
TL;DR: The findings indicate that trust in information sources, such as institutional and media information or interpersonal communication related to distance learning programs, is correlated with awareness and readiness, and that readiness strongly influences the adoption of distance learning amid the COVID-19 pandemic.

31 citations

Journal ArticleDOI
TL;DR: The AfriSenti dataset consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers.
Abstract: Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers. The data is used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We describe the data collection methodology, annotation process, and related challenges when curating each of the datasets. We conduct experiments with different sentiment classification baselines and discuss their usefulness. We hope AfriSenti enables new work on under-represented languages. The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023 and can also be loaded with the Hugging Face datasets library (https://huggingface.co/datasets/shmuhammad/AfriSenti).

31 citations
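Since the abstract points to a Hugging Face hosting of the data, a short sketch of loading it with the datasets library may be useful. The per-language configuration name and the column names shown here are assumptions; the dataset card at the URL above lists the exact ones.

```python
# Minimal sketch of loading AfriSenti with the Hugging Face datasets library,
# using the repository id given in the abstract.
from datasets import load_dataset

afrisenti_hausa = load_dataset("shmuhammad/AfriSenti", "hau")  # "hau" is an assumed config name
print(afrisenti_hausa)               # expected splits such as train/validation/test
print(afrisenti_hausa["train"][0])   # e.g. a tweet text field plus a sentiment label
```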

Journal ArticleDOI
TL;DR: This paper builds a model for movie revenue prediction prior to a movie's release using YouTube trailer reviews; the approach outperforms three baseline approaches and achieves a relative absolute error of 29.65%.
Abstract: The increase in acceptability and popularity of social media has made extracting information from the data generated on social media an emerging field of research. An important branch of this field is predicting future events using social media data. This paper is focused on predicting the box-office revenue of a movie by mining people's intention to purchase a movie ticket, termed purchase intention, from trailer reviews. Movie revenue prediction is important because of the financial risks involved in movie production, compounded by its high cost. Previous studies in this domain focus on the use of Twitter data and IMDb reviews for the prediction of movies that have already been released. In this paper, we build a model for movie revenue prediction prior to the movie's release using YouTube trailer reviews. Our model consists of novel methods of calculating purchase intention, the positive-to-negative sentiment ratio, and the like-to-dislike ratio for movie revenue prediction. Our experimental results demonstrate the superiority of our approach compared to three baseline approaches, achieving a relative absolute error of 29.65%.

29 citations
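To make the feature construction concrete, here is a hedged sketch of the general idea: turn trailer review sentiment and like/dislike counts into ratio features and regress revenue on them. The numbers are made-up toy values, the regressor choice is arbitrary, and the purchase-intention score is replaced by a simple count, so this is an illustration of the idea rather than the paper's method.

```python
# Toy illustration: ratio features from trailer engagement, then a regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def ratio(a, b):
    """Smoothed ratio so trailers with zero negative reviews/dislikes don't blow up."""
    return (a + 1.0) / (b + 1.0)

# One row per trailer: positive reviews, negative reviews, likes, dislikes,
# comments expressing intent to watch. All numbers are made up for illustration.
trailers = np.array([
    [820, 140, 52000, 3100, 410],
    [310, 260,  9800, 4200,  60],
    [1500, 90, 91000, 2500, 980],
    [120, 310,  4100, 5200,  25],
])
revenue_musd = np.array([310.0, 42.0, 540.0, 18.0])  # toy targets, in millions USD

features = np.column_stack([
    ratio(trailers[:, 0], trailers[:, 1]),  # positive-to-negative sentiment ratio
    ratio(trailers[:, 2], trailers[:, 3]),  # like-to-dislike ratio
    trailers[:, 4],                         # crude stand-in for purchase intention
])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(features, revenue_musd)
pred = model.predict(features)
print("in-sample MAE (illustration only):", mean_absolute_error(revenue_musd, pred))
```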

Journal ArticleDOI
TL;DR: Two metaheuristic algorithms, the Magnetic Optimization Algorithm (MOA) and Particle Swarm Optimization (PSO), have been hybridized to propose a new method, MOA-PSO, which exhibits promising results with faster and more accurate prediction, reaching 99.7% accuracy.
Abstract: Corporate bankruptcy prediction is an important task in the determination of corporate solvency, that is, whether a company can meet its financial obligations or not. It is widely studied as it has a significant effect on employees, customers, management, stockholders, bank lending assessments, and profitability. In recent years, machine learning techniques, particularly the Artificial Neural Network (ANN), have been widely studied for bankruptcy prediction since they have proven to be good predictors, especially in financial applications. A critical process in learning a network is weight training. Although the ANN is mathematically efficient, it has a complex weight training process, especially in computation time when large training datasets are involved. Many studies have improved ANN weight training using metaheuristic algorithms such as Evolutionary Algorithms (EA) and Swarm Intelligence (SI) approaches for bankruptcy prediction. In this study, two metaheuristic algorithms, the Magnetic Optimization Algorithm (MOA) and Particle Swarm Optimization (PSO), have been hybridized to propose a new method, MOA-PSO. Hybrid algorithms have been proven capable of solving optimization problems faster and with better accuracy. The MOA-PSO was used to train the ANN in order to improve its performance in bankruptcy prediction. The performance of the hybrid MOA-PSO was compared with that of four existing algorithms. The proposed hybrid MOA-PSO algorithm exhibits promising results, with faster and more accurate prediction at 99.7% accuracy.

28 citations
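As a rough illustration of metaheuristic weight training, the sketch below optimizes the weights of a tiny feed-forward network with plain Particle Swarm Optimization on synthetic data. It covers only the PSO half; the MOA-PSO hybridization is not detailed in the abstract, so nothing here should be read as that algorithm.

```python
# Sketch: train a small 2-5-1 neural network's weights with standard global-best PSO.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary "bankruptcy" data: 2 financial ratios -> solvent (0) / bankrupt (1).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

N_HIDDEN = 5
DIM = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1   # weights and biases of a 2-5-1 net

def unpack(w):
    i = 0
    W1 = w[i:i + 2 * N_HIDDEN].reshape(2, N_HIDDEN); i += 2 * N_HIDDEN
    b1 = w[i:i + N_HIDDEN]; i += N_HIDDEN
    W2 = w[i:i + N_HIDDEN].reshape(N_HIDDEN, 1); i += N_HIDDEN
    b2 = w[i:]
    return W1, b1, W2, b2

def loss(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2).ravel()))
    return np.mean((p - y) ** 2)               # mean squared error as fitness

# Standard global-best PSO over the flattened weight vector.
N_PARTICLES, ITERS, INERTIA, C1, C2 = 30, 200, 0.7, 1.5, 1.5
pos = rng.normal(scale=0.5, size=(N_PARTICLES, DIM))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([loss(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(ITERS):
    r1, r2 = rng.random((N_PARTICLES, DIM)), rng.random((N_PARTICLES, DIM))
    vel = INERTIA * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([loss(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

W1, b1, W2, b2 = unpack(gbest)
pred = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2).ravel())) > 0.5
print("training accuracy:", (pred == y.astype(bool)).mean())
```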


Cited by
Journal ArticleDOI


08 Dec 2001 - BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first, an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Book ChapterDOI
01 Jan 2005
TL;DR: The goal is to help developers find the most suitable ontology language for their representation needs on the Semantic Web, which requires languages to represent its semantic information.
Abstract: Ontologies are being used in many applications to explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web—known as the Semantic Web—which has been defined as "the conceptual structuring of the Web in an explicit machine-readable way."1 This definition does not differ too much from the one used for defining an ontology: "An ontology is an explicit, machine-readable specification of a shared conceptualization."2 In fact, new ontology-based applications and knowledge architectures are being developed for this new Web. A common claim for all of these approaches is the need for languages to represent the semantic information that this Web requires, solving heterogeneous data exchange in this heterogeneous environment. Here, we do not decide which language is best for the Semantic Web. Rather, our goal is to help developers find the most suitable language for their representation needs.

212 citations

Journal ArticleDOI
01 Feb 2021
TL;DR: Supervised machine learning models for COVID-19 infection were developed in this work with learning algorithms including logistic regression, decision tree, support vector machine, naive Bayes, and artificial neural network, using an epidemiology-labeled dataset of positive and negative COVID-19 cases from Mexico.
Abstract: COVID-19 or 2019-nCoV is no longer pandemic but rather endemic, with more than 651,247 people around the world having lost their lives after contracting the disease. Currently, there is no specific treatment or cure for COVID-19, and thus living with the disease and its symptoms is inevitable. This reality has placed a massive burden on limited healthcare systems worldwide, especially in developing nations. Although neither an effective, clinically proven antiviral strategy nor an approved vaccine exists to eradicate the COVID-19 pandemic, there are alternatives that may reduce the huge burden on not only limited healthcare systems but also the economic sector; the most promising include harnessing non-clinical techniques such as machine learning, data mining, deep learning, and other artificial intelligence approaches. These alternatives would facilitate diagnosis and prognosis for COVID-19 patients. Supervised machine learning models for COVID-19 infection were developed in this work with learning algorithms including logistic regression, decision tree, support vector machine, naive Bayes, and artificial neural network, using an epidemiology-labeled dataset of positive and negative COVID-19 cases from Mexico. A correlation coefficient analysis between the various dependent and independent features was carried out to determine the strength of the relationship between each dependent and independent feature of the dataset prior to developing the models. 80% of the dataset was used for training the models while the remaining 20% was used for testing them. The performance evaluation showed that the decision tree model has the highest accuracy of 94.99%, while the support vector machine model has the highest sensitivity of 93.34% and the naive Bayes model has the highest specificity of 94.30%.

185 citations
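A hedged sketch of the evaluation protocol described above: an 80/20 train/test split, the five listed classifiers, and accuracy, sensitivity, and specificity computed from the confusion matrix. The Mexican epidemiology dataset is not reproduced here, so a synthetic classification problem stands in for it.

```python
# Sketch of the 80/20 split and per-model accuracy/sensitivity/specificity scoring.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the epidemiology-labeled COVID-19 dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "support vector machine": SVC(),
    "naive Bayes": GaussianNB(),
    "artificial neural network": MLPClassifier(max_iter=500, random_state=42),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.3f} "
          f"sens={sensitivity:.3f} spec={specificity:.3f}")
```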

Journal ArticleDOI
21 Jul 2021
TL;DR: A hybrid convolutional neural network-long short-term memory (CNN-LSTM) model is proposed for sentiment analysis; the results demonstrate that the proposed model outperforms classical baselines with 91.3% accuracy.
Abstract: With the rapid growth of information and communication technology (ICT), the availability of web content on social media platforms is increasing day by day. Sentiment analysis of online reviews is drawing attention from researchers in academia, government, and private industry. Sentiment analysis has been a hot research topic in Machine Learning (ML) and Natural Language Processing (NLP). Currently, Deep Learning (DL) techniques are applied to sentiment analysis to obtain excellent results. This study proposes a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model for sentiment analysis. The proposed model applies dropout, max pooling, and batch normalization. Experimental analysis was carried out on the Airlinequality and Twitter airline sentiment datasets. We employed the Keras word embedding approach, which converts texts into vectors of numeric values, where similar words have small vector distances between them. We calculated various metrics, such as accuracy, precision, recall, and F1-measure, to evaluate the model's performance. These metrics are better for the proposed model than for classical ML models in sentiment analysis. Our analysis of the results demonstrates that the proposed model outperforms these baselines with 91.3% accuracy.

51 citations
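A compact sketch of a hybrid CNN-LSTM text classifier of the kind described above, with an embedding layer, dropout, max pooling, and batch normalization. The vocabulary size, sequence length, layer widths, and three-class output are assumptions rather than the paper's exact hyperparameters.

```python
# Sketch of a hybrid CNN-LSTM sentiment classifier in Keras.
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN, EMBED_DIM = 20000, 100, 128  # assumed hyperparameters

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),              # word index -> dense vector
    layers.Conv1D(64, kernel_size=5, activation="relu"),  # local n-gram features
    layers.BatchNormalization(),
    layers.MaxPooling1D(pool_size=2),                      # downsample feature maps
    layers.Dropout(0.3),
    layers.LSTM(64),                                       # long-range dependencies
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),                 # negative / neutral / positive
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```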

Proceedings Article
13 Apr 2022
TL;DR: This paper performs multilingual adaptive fine-tuning on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning.
Abstract: Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is language adaptive fine-tuning (LAFT): fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform multilingual adaptive fine-tuning (MAFT) on the 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we removed vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.

42 citations
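To illustrate the vocabulary-reduction step, the sketch below drops embedding rows for tokens whose characters fall outside a small set of kept scripts and slices the input embedding matrix of XLM-R accordingly. The script ranges, the choice of what to keep, and the omission of tokenizer rebuilding are all simplifying assumptions; the authors' release contains the actual pipeline.

```python
# Rough sketch of pruning embedding rows for tokens in unwanted writing scripts.
import re
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Keep tokens whose characters fall in Latin, Ge'ez (Ethiopic), and general
# punctuation/digit ranges, plus the SentencePiece word-boundary marker.
# This range list is an assumed proxy for "African writing scripts".
KEEP = re.compile(r"^[\u0000-\u024F\u1E00-\u1EFF\u2000-\u206F\u1200-\u137F\u2581]*$")

vocab = tokenizer.get_vocab()                       # token string -> token id
keep_ids = sorted(i for tok, i in vocab.items() if KEEP.match(tok))
print(f"keeping {len(keep_ids)} of {len(vocab)} tokens")

# Slice the input embedding matrix down to the kept rows. A full pipeline would
# also rebuild the tokenizer, remap token ids, and adjust any tied output head;
# those steps are omitted in this sketch.
old_embed = model.get_input_embeddings().weight.data
new_embed = torch.nn.Embedding(len(keep_ids), old_embed.size(1))
new_embed.weight.data = old_embed[keep_ids].clone()
model.set_input_embeddings(new_embed)
model.config.vocab_size = len(keep_ids)
```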