What are the recommendations for the inaccurate-data challenge in cyberbullying detection?
To address the challenge of inaccurate data in cyberbullying detection, the research suggests several recommendations. One approach is to leverage state-of-the-art NLP techniques to enhance automated detection. Another is to use a novel pronunciation-based convolutional neural network (PCNN) that corrects spelling errors without altering pronunciation, thus reducing noise in the data. Additionally, a principled framework that identifies and blocks the influence of confounders, known as p-confounders, can improve the robustness and causal interpretability of cyberbullying detection models. Together, these recommendations aim to make cyberbullying detection systems more accurate and effective by mitigating the impact of inaccurate data.
What are the recommendations for the class-imbalance challenge in cyberbullying detection datasets?
Several recommendations for handling class imbalance in cyberbullying detection datasets can be drawn from the provided contexts. First, techniques such as class weighting, SMOTE (Synthetic Minority Over-sampling Technique), and artificial neural networks can help handle imbalanced data effectively. Second, auto-encoders, semi-supervised learning, and GAN-based algorithms such as SGAN can help overcome the obstacles posed by high dispersion and imbalanced classes in the dataset. Third, Convolutional Neural Networks and Long Short-Term Memory models can make cyberbullying detection systems more efficient. Overall, a combination of these approaches, including ensemble techniques, can significantly improve the accuracy and performance of cyberbullying detection systems on imbalanced datasets.
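As a rough illustration of the class-weighting idea mentioned above, the sketch below computes inverse-frequency weights using the common heuristic n_samples / (n_classes * class_count). It also shows plain random oversampling, which is a simpler stand-in for SMOTE: SMOTE synthesizes new minority points by interpolating between neighbors, whereas this sketch only duplicates existing ones. The function names and the toy data are hypothetical and do not come from the cited studies.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until every class matches the largest.

    Note: this is random oversampling, not SMOTE.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy data: 8 benign posts and 2 bullying posts.
labels = ["benign"] * 8 + ["bullying"] * 2
texts = [f"post {i}" for i in range(len(labels))]

weights = balanced_class_weights(labels)
balanced_texts, balanced_labels = random_oversample(texts, labels)
```

With this toy data the rare class receives a weight four times larger than the common class, so a loss function that multiplies each example's error by its class weight would penalize missed bullying posts more heavily.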
What are the main advantages of machine learning approaches in cyberbullying detection?
Machine learning approaches offer several advantages in cyberbullying detection. They allow a variety of features to be extracted, including textual, behavioral, and demographic ones, which improves detection accuracy. Models trained on known cyberbullying incidents can automatically classify new instances, significantly reducing investigation time and effort. These approaches can also flag bullying content proactively, before it is posted, helping to prevent harm and harassment on social media. Additionally, algorithms such as SVM, logistic regression, and Naive Bayes have shown high accuracy in identifying cyberbullying in text data from platforms like Twitter and Wikipedia, demonstrating their effectiveness in real-world scenarios.
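To make the "train on known incidents, then classify new instances" idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. The function names, the whitespace tokenizer, and the four training posts are illustrative assumptions, not the setup used in the cited work; a real system would use far more data and richer features.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log-posterior for the document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples standing in for known incidents.
docs = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "great game last night",
    "thanks for the help today",
]
labels = ["bullying", "bullying", "benign", "benign"]

model = train_naive_bayes(docs, labels)
first = predict(model, "you are a loser")
second = predict(model, "thanks for the great game")
```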
Which datasets are used in machine learning applied to cybersecurity?
Machine learning in cybersecurity draws on a variety of datasets, such as BRON, KDD Cup 99, NSL-KDD, UNSW-NB15, and Kyoto. Researchers are moving away from supervised learning and exploring clustering and other algorithms to detect unknown and zero-day attacks. The use of hybrid algorithms has also increased significantly. While some articles still treat KDD Cup 99 and its reduced variant as the principal training datasets for intrusion detection systems (IDSs), modern datasets are increasingly used to keep pace with evolving cyber threats and with technologies such as cloud computing, IoT, and IPv6. The choice of dataset is crucial in IDS design because it influences the selection of machine learning algorithms.
What types of cyberattack datasets exist?
Cyberattack datasets can be categorized into different types based on their characteristics. One type is created by removing certain attack samples from the training dataset and including them only in the testing dataset, in order to evaluate the performance of machine learning-based intrusion detection systems (IDS). Another type is collected through virtual machines or simulated environments, which may not accurately represent real-world networks. There are also datasets specifically designed for detecting intrusions in Internet of Things (IoT) devices, such as the IoT-23 dataset, which includes network flows from devices like a Somfy door lock, Philips Hue, and Amazon Echo. Finally, there are cybersecurity entity alignment datasets that integrate vulnerability information from different channels, enabling comprehensive threat assessment. Overall, the type of dataset used depends on the specific research focus and the goals of the intrusion detection or vulnerability assessment system.
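The first dataset type above, where some attack samples appear only in the test set, can be sketched as a simple split function. This is a minimal illustration under stated assumptions: the record layout, the attack labels ("dos", "botnet"), the "bytes" feature, and the rule of sending every fifth remaining record to the test set are all hypothetical choices, not taken from any specific dataset.

```python
def split_with_unseen_attacks(records, held_out, test_every=5):
    """Split records so that held-out attack types occur only in the test set.

    records: list of (features, label) pairs.
    held_out: set of attack labels the model must never see in training.
    test_every: every test_every-th remaining record also goes to the test
        set, so the test set still contains benign and known-attack traffic.
    """
    train, test = [], []
    kept = 0
    for features, label in records:
        if label in held_out:
            test.append((features, label))  # unseen during training
        else:
            target = test if kept % test_every == 0 else train
            target.append((features, label))
            kept += 1
    return train, test


# Hypothetical flow records: 10 benign, 5 "dos", 3 "botnet".
records = (
    [({"bytes": 100 + i}, "benign") for i in range(10)]
    + [({"bytes": 900 + i}, "dos") for i in range(5)]
    + [({"bytes": 500 + i}, "botnet") for i in range(3)]
)

train, test = split_with_unseen_attacks(records, held_out={"botnet"})
```

Evaluating a detector on `test` then shows how it behaves on an attack type it never saw in `train`, which approximates the zero-day setting.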
What are the current state-of-the-art models for cyberbullying detection?
Current state-of-the-art models for cyberbullying detection include a real-time system for Twitter that combines Natural Language Processing (NLP) and Machine Learning (ML). Another approach uses text mining and machine learning algorithms to proactively detect bullying text by extracting textual, behavioral, and demographic features. Emotion detection models have also been proposed, in which emotions and sentiment are extracted from cyberbullying datasets and used as features for detection. Additionally, a model combining parallel BERT and Bi-LSTM has been proposed, along with the use of Contrastive Self-Supervised Learning to augment training data from unlabeled sources. These models have been reported to outperform previous approaches and to achieve high F1 scores.