
Showing papers on "Phrase" published in 2022


Journal ArticleDOI
TL;DR: A predictive ensemble model to classify idioms and literals using BERT and RoBERTa, fine-tuned with the TroFi dataset, is proposed; it outperforms the baseline models on the metrics considered, such as F-score and accuracy.
Abstract: An idiom is a common phrase that means something other than its literal meaning. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots, and automatic idiom detection therefore plays an important role in all of these applications. A fundamental NLP task is text classification, which categorizes text into structured categories; this is also known as text labeling or categorization. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks, although models like BERT and RoBERTa have not previously been applied specifically to idiom and literal classification. We propose a predictive ensemble model to classify idioms and literals using BERT and RoBERTa, fine-tuned with the TroFi dataset. The model is tested on a newly created in-house dataset of idioms and literal expressions, numbering 1,470 in all and annotated by domain experts. Our model outperforms the baseline models on the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.
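The ensembling step can be illustrated with one common scheme, soft voting, in which the two models' class probabilities are averaged. This is a sketch, not necessarily the paper's exact combination rule, and the probability vectors below are invented stand-ins for fine-tuned BERT and RoBERTa outputs.

```python
# Illustrative soft-voting ensemble for idiom-vs-literal classification.
# The paper fine-tunes BERT and RoBERTa; the probability vectors here
# are made-up stand-ins for those models' softmax outputs.

def soft_vote(prob_a, prob_b, labels=("literal", "idiom")):
    """Average two class-probability vectors and return the argmax label."""
    avg = [(pa + pb) / 2 for pa, pb in zip(prob_a, prob_b)]
    return labels[avg.index(max(avg))], avg

# "kick the bucket": both hypothetical models lean towards the idiomatic reading.
label, avg = soft_vote([0.30, 0.70], [0.20, 0.80])
print(label)  # idiom
```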

44 citations


MonographDOI
09 Sep 2022
TL;DR: This book presents a novel viewpoint on automatic sequences, and more generally on combinatorics on words, by introducing a decision method through which many new results in combinatorics and number theory can be automatically proved or disproved with little or no human intervention.
Abstract: Automatic sequences are sequences over a finite alphabet generated by a finite-state machine. This book presents a novel viewpoint on automatic sequences, and more generally on combinatorics on words, by introducing a decision method through which many new results in combinatorics and number theory can be automatically proved or disproved with little or no human intervention. This approach to proving theorems is extremely powerful, allowing long and error-prone case-based arguments to be replaced by simple computations. Readers will learn how to phrase their desired results in first-order logic, using free software to automate the computation process. Results that normally require multipage proofs can emerge in milliseconds, allowing users to engage with mathematical questions that would otherwise be difficult to solve. With more than 150 exercises included, this text is an ideal resource for researchers, graduate students, and advanced undergraduates studying combinatorics, sequences, and number theory.
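As a minimal illustration of the book's subject, the Thue–Morse sequence is the canonical automatic sequence: its n-th term is the output of a two-state machine reading the binary digits of n, which reduces to the parity of the number of 1-bits.

```python
def thue_morse(n):
    """n-th term of the Thue-Morse sequence: parity of 1-bits in n,
    i.e. the output of a 2-state automaton reading n in binary."""
    return bin(n).count("1") % 2

print([thue_morse(n) for n in range(8)])  # [0, 1, 1, 0, 1, 0, 0, 1]
```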

33 citations


Journal ArticleDOI
E. Sravani
01 Jan 2022
TL;DR: This article proposed a phrase dependency graph attention network (PD-RGAT) for aspect-based sentiment analysis, which aggregates directed dependency edges and phrase information, and showed that incorporating direction information can enhance the capture of aspect-sentiment polarities on the ABSA task.
Abstract: Aspect-based Sentiment Analysis (ABSA) is a subclass of sentiment analysis, which aims to identify the sentiment polarity such as positive, negative, or neutral for specific aspects or attributes that appear in a sentence. Previous studies have focused on extracting aspect-sentiment polarity pairs based on dependency trees, ignoring edge labels and phrase information. In this paper, we instead propose a phrase dependency graph attention network (PD-RGAT) on the ABSA task, which is a relational graph attention network constructed based on the phrase dependency graph, aggregating directed dependency edges and phrase information. We perform experiments with two pre-trained models, GloVe and BERT. Experimental results on the benchmarking datasets (i.e., Twitter, Restaurant, and Laptop) demonstrate that our proposed PD-RGAT has comparable effectiveness to a range of state-of-the-art models and further illustrate that the graph convolutional structure based on the phrase dependency graph can capture both syntactic information and short- and long-range word dependencies. It also shows that incorporating directed edge labels and phrase information can enhance the capture of aspect-sentiment polarities on the ABSA task.
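The graph-attention aggregation underlying models like PD-RGAT can be sketched, much simplified, as a softmax-weighted sum over a node's neighbors in a dependency graph. The single parameter-free head and toy features below are illustrative only; the paper's model additionally uses edge labels, phrase nodes, and learned projections.

```python
import math

def attention_aggregate(node_feats, edges, target):
    """Aggregate features of `target`'s in-neighbors in a directed graph,
    weighted by a softmax over dot-product scores (one parameter-free head)."""
    neighbors = [src for src, dst in edges if dst == target]
    scores = [sum(a * b for a, b in zip(node_feats[n], node_feats[target]))
              for n in neighbors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(node_feats[target])
    return [sum(w * node_feats[n][d] for w, n in zip(weights, neighbors))
            for d in range(dim)]

# Tiny dependency graph: "service" (node 0) modified by "great" (1) and "fast" (2).
feats = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
edges = [(1, 0), (2, 0)]
print(attention_aggregate(feats, edges, 0))
```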

26 citations


Journal ArticleDOI
TL;DR: The authors found an EEG spectral power peak elicited at a frequency that only 'tagged' covert, implicit prosodic change, but not any major syntactic constituents, and concluded that processing of both overt and covert prosody is reflected in the frequency-tagged neural responses at sentence constituent frequencies.
Abstract: Recent neurophysiological research suggests that slow cortical activity tracks hierarchical syntactic structure during online sentence processing. Here we tested an alternative hypothesis: that electrophysiological activity peaks at constituent-phrase and sentence frequencies reflect cortical tracking of overt or covert (implicit) prosodic grouping. Participants listened to a series of sentences presented in three conditions while electroencephalography (EEG) was recorded. First, prosodic cues in the sentence materials were neutralized. We found an EEG spectral power peak elicited at a frequency that only 'tagged' covert, implicit prosodic change, but not any major syntactic constituents. In the second condition, participants listened to a series of sentences with overt prosodic grouping cues that either aligned or misaligned with the syntactic phrasing in the sentences (initial overt prosody trials). Following each overt prosody trial, participants were presented with a second series of sentences lacking overt prosodic cues (instructed prosody trial) and were instructed to imagine the prosodic contour present in the previous, overt prosody trial. The EEG responses reflected an interactive relationship between syntactic processing and prosodic tracking at the frequencies of syntactic constituents (sentences and phrases): alignment of syntax and prosody boosted EEG responses, whereas their misalignment had the opposite effect. This was true for both overt and imagined prosody conditions. We conclude that processing of both overt and covert prosody is reflected in the frequency-tagged neural responses at sentence constituent frequencies. These findings need to be incorporated in any account that aims to identify neural markers reflecting syntactic processing.

17 citations


Journal ArticleDOI
Abstract: The power of language to modify the reader’s perception of biomedical results should not be underestimated. Misreporting and misinterpretation are pressing problems in randomized controlled trial (RCT) output. This may be partially related to the statistical significance paradigm used in clinical trials, centered on a P < 0.05 cutoff. Strict use of this cutoff may lead clinical researchers to describe results with P values approaching but not reaching the threshold as “almost significant.” The question is how phrases expressing nonsignificant results have been reported in RCTs over the past 30 years. To this end, we conducted a quantitative analysis of English full texts of 567,758 RCTs recorded in PubMed between 1990 and 2020 (81.5% of all published RCTs in PubMed). We determined the exact presence of 505 predefined phrases denoting results that approach but do not cross the line of formal statistical significance (P < 0.05). We modeled temporal trends in phrase data with Bayesian linear regression. Evidence for temporal change was obtained through Bayes factor (BF) analysis. In a randomly sampled subset, the associated P values were manually extracted. We identified 61,741 phrases in 49,134 RCTs indicating almost significant results (8.65%; 95% confidence interval (CI): 8.58% to 8.73%). The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being “marginally significant” (in 7,735 RCTs), “all but significant” (7,015), “a nonsignificant trend” (3,442), “failed to reach statistical significance” (2,578), and “a strong trend” (1,700).
The strongest evidence for an increased temporal prevalence was found for “a numerical trend,” “a positive trend,” “an increasing trend,” and “nominally significant.” In contrast, the phrases “all but significant,” “approaches statistical significance,” “did not quite reach statistical significance,” “difference was apparent,” “failed to reach statistical significance,” and “not quite significant” decreased over time. In a randomly sampled subset of 29,000 phrases, 11,926 corresponding P values were manually identified; 68.1% of these ranged between 0.05 and 0.15 (CI: 67. to 69.0; median 0.06). Our results show that RCT reports regularly contain specific phrases describing marginally nonsignificant results to report P values close to but above the dominant 0.05 cutoff. The fact that the prevalence of these phrases remained stable over time indicates that this practice of broadly interpreting P values close to a predefined threshold remains prevalent. To enhance responsible and transparent interpretation of RCT results, researchers, clinicians, reviewers, and editors may reduce the focus on formal statistical significance thresholds, stimulate the reporting of P values with corresponding effect sizes and CIs, and focus on the clinical relevance of the statistical differences found in RCTs.
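The study's core measurement, scanning article texts for predefined "almost significant" phrases and computing their prevalence, can be sketched with a regex scan. The phrase list below is a small subset of the paper's 505 phrases, and the two example documents are invented.

```python
import re

# A few of the predefined phrases from the paper (it uses 505 in total).
PHRASES = ["marginally significant", "a nonsignificant trend",
           "failed to reach statistical significance", "a strong trend"]

def phrase_prevalence(texts, phrases=PHRASES):
    """Return (number of texts containing any phrase, fraction of texts)."""
    pattern = re.compile("|".join(map(re.escape, phrases)), re.IGNORECASE)
    hits = sum(1 for t in texts if pattern.search(t))
    return hits, hits / len(texts)

docs = ["The difference was marginally significant (P = 0.06).",
        "Treatment significantly improved survival (P = 0.01)."]
print(phrase_prevalence(docs))  # (1, 0.5)
```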

17 citations


Proceedings ArticleDOI
30 Jan 2022
TL;DR: In this paper, fine-grained contextual knowledge selection (FineCoS) is proposed to reduce the uncertainty of token predictions by re-normalizing the attention weights of the most relevant phrases at inference time to obtain more focused phrase-level contextual representations.
Abstract: Nowadays, most methods for end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may suffer from the confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating confusion problems with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase selection to narrow the range of phrase candidates, and then conduct token attention on the tokens in the selected phrase candidates. Moreover, we re-normalize the attention weights of most relevant phrases in inference to obtain more focused phrase-level contextual representations, and inject position information to help model better discriminate phrases or tokens. On LibriSpeech and an in-house 160,000-hour dataset, we explore the proposed methods based on an all-neural biasing method, collaborative decoding (ColDec). The proposed methods further bring at most 6.1% relative word error rate reduction on LibriSpeech and 16.4% relative character error rate reduction on the in-house dataset.
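The inference-time re-normalization described above can be sketched as a top-k filter over a vector of phrase relevance weights. The weights below are invented; in FineCoS they come from attention-based relevance modeling.

```python
def renormalize_topk(weights, k):
    """Zero all but the k largest weights, then rescale to sum to 1,
    yielding a more focused distribution over phrase candidates."""
    top = set(sorted(range(len(weights)),
                     key=weights.__getitem__, reverse=True)[:k])
    kept = [w if i in top else 0.0 for i, w in enumerate(weights)]
    total = sum(kept)
    return [w / total for w in kept]

# Attention over four context phrases; keep the two most relevant.
print(renormalize_topk([0.1, 0.4, 0.1, 0.4], k=2))  # [0.0, 0.5, 0.0, 0.5]
```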

15 citations


Journal ArticleDOI
TL;DR: This paper found that the posterior superior temporal sulcus (pSTS) is a highly specialized region composed of sparsely interwoven heterogeneous constituents that encodes both lower- and higher-level linguistic features.
Abstract: The ability to comprehend phrases is an essential integrative property of the brain. Here, we evaluate the neural processes that enable the transition from single-word processing to a minimal compositional scheme. Previous research has reported conflicting timing effects of composition, and disagreement persists with respect to inferior frontal and posterior temporal contributions. To address these issues, 19 patients (10 male, 9 female) implanted with penetrating depth or surface subdural intracranial electrodes heard auditory recordings of adjective-noun, pseudoword-noun, and adjective-pseudoword phrases and judged whether the phrase matched a picture. Stimulus-dependent alterations in broadband gamma activity, low-frequency power, and phase-locking values across the language-dominant left hemisphere were derived. This revealed a mosaic located on the lower bank of the posterior superior temporal sulcus (pSTS), in which closely neighboring cortical sites displayed exclusive sensitivity to either lexicality or phrase structure, but not both. Distinct timings were found for effects of phrase composition (210-300 ms) and pseudoword processing (∼300-700 ms), and these were localized to neighboring electrodes in pSTS. The pars triangularis and temporal pole encoded anticipation of composition in broadband low frequencies, and both regions exhibited greater functional connectivity with pSTS during phrase composition. Our results suggest that the pSTS is a highly specialized region composed of sparsely interwoven heterogeneous constituents that encodes both lower- and higher-level linguistic features. This hub in pSTS for minimal phrase processing may form the neural basis for the human-specific computational capacity for forming hierarchically organized linguistic structures. SIGNIFICANCE STATEMENT: Linguists have claimed that the integration of multiple words into a phrase demands a computational procedure distinct from single-word processing.
Here, we provide intracranial recordings from a large patient cohort, with high spatiotemporal resolution, to track the cortical dynamics of phrase composition. Epileptic patients volunteered to participate in a task in which they listened to phrases (red boat), word-pseudoword or pseudoword-word pairs (e.g., red fulg). At the onset of the second word in phrases, greater broadband high gamma activity was found in posterior superior temporal sulcus in electrodes that exclusively indexed phrasal meaning and not lexical meaning. These results provide direct, high-resolution signatures of minimal phrase composition in humans, a potentially species-specific computational capacity.

14 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated whether cortical tracking of phrase structures is modulated by the extent to which these structures compositionally determine meaning, using EEG recordings of 38 native speakers who listened to naturally spoken Dutch stimuli in different conditions, which parametrically modulated the degree to which syntactic structure and lexical semantics determine sentence meaning.
Abstract: Recent research has established that cortical activity “tracks” the presentation rate of syntactic phrases in continuous speech, even though phrases are abstract units that do not have direct correlates in the acoustic signal. We investigated whether cortical tracking of phrase structures is modulated by the extent to which these structures compositionally determine meaning. To this end, we recorded electroencephalography (EEG) of 38 native speakers who listened to naturally spoken Dutch stimuli in different conditions, which parametrically modulated the degree to which syntactic structure and lexical semantics determine sentence meaning. Tracking was quantified through mutual information between the EEG data and either the speech envelopes or abstract annotations of syntax, all of which were filtered in the frequency band corresponding to the presentation rate of phrases (1.1–2.1 Hz). Overall, these mutual information analyses showed stronger tracking of phrases in regular sentences than in stimuli whose lexical-syntactic content is reduced, but no consistent differences in tracking between sentences and stimuli that contain a combination of syntactic structure and lexical content. While there were no effects of compositional meaning on the degree of phrase-structure tracking, analyses of event-related potentials elicited by sentence-final words did reveal meaning-induced differences between conditions. Our findings suggest that cortical tracking of structure in sentences indexes the internal generation of this structure, a process that is modulated by the properties of its input, but not by the compositional interpretation of its output.

13 citations


Proceedings ArticleDOI
27 Feb 2022
TL;DR: Cluster-assisted contrastive learning (CCL) is proposed, which largely reduces noisy negatives by selecting negatives from clusters and further improves phrase representations for topics accordingly.
Abstract: High-quality phrase representations are essential to finding topics and related terms in documents (a.k.a. topic mining). Existing phrase representation learning methods either simply combine unigram representations in a context-free manner or rely on extensive annotations to learn context-aware knowledge. In this paper, we propose UCTopic, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTopic is pretrained in a large scale to distinguish if the contexts of two phrase mentions have the same semantics. The key to the pretraining is positive pair construction from our phrase-oriented assumptions. However, we find traditional in-batch negatives cause performance decay when finetuning on a dataset with small topic numbers. Hence, we propose cluster-assisted contrastive learning (CCL) which largely reduces noisy negatives by selecting negatives from clusters and further improves phrase representations for topics accordingly. UCTopic outperforms the state-of-the-art phrase representation model by 38.2% NMI in average on four entity clustering tasks. Comprehensive evaluation on topic mining shows that UCTopic can extract coherent and diverse topical phrases.
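Cluster-assisted negative selection can be sketched as: assign each phrase instance to its nearest cluster, then draw negatives for an anchor only from other clusters, so that same-topic instances are not used as negatives. The 2-D "embeddings" and centroids below are toy stand-ins for learned phrase representations.

```python
def nearest_centroid(point, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def cluster_negatives(anchor_idx, embeddings, centroids):
    """Candidate negatives: instances assigned to a different cluster
    than the anchor, reducing false (same-topic) negatives."""
    labels = [nearest_centroid(e, centroids) for e in embeddings]
    return [i for i, lab in enumerate(labels)
            if i != anchor_idx and lab != labels[anchor_idx]]

embs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]   # two "food" phrases, one "sport"
cents = [(1.0, 0.0), (0.0, 1.0)]
print(cluster_negatives(0, embs, cents))  # [2] -- index 1 shares the anchor's cluster
```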

12 citations



Journal ArticleDOI
30 May 2022 - PeerJ
TL;DR: This research presents several experiments validating that context-based keyword extraction improves significantly on traditional methods, and the KeyBERT model outperformed traditional approaches in producing keywords similar to the authors’ provided keywords.
Abstract: A document’s keywords provide high-level descriptions of the content that summarize the document’s central themes, concepts, ideas, or arguments. These descriptive phrases make it easier for algorithms to find relevant information quickly and efficiently. Keyword extraction plays a vital role in document processing tasks such as indexing, classification, clustering, and summarization. Traditional keyword extraction approaches rely for the most part on statistical distributions of key terms in a document. Given contemporary technological breakthroughs, however, contextual information is critical in determining the semantics of the text at hand, and context-based features may likewise be beneficial for keyword extraction. For example, simply indicating the previous or next word of the phrase of interest might be used to describe the context of a phrase. This research presents several experiments to validate that context-based keyword extraction improves significantly on traditional methods. The proposed KeyBERT-based methodology also yields improved results. The proposed work relies on identifying a group of important words or phrases from the document’s content that can reflect the authors’ main ideas, concepts, or arguments. It also uses contextual word embeddings to extract keywords. Finally, the findings are compared to those obtained using older approaches such as TextRank, RAKE, Gensim, YAKE, and TF-IDF. The Journal of Universal Computer Science (JUCS) dataset was employed in our research. Only data from abstracts were used to produce keywords for each research article, and the KeyBERT model outperformed traditional approaches in producing keywords similar to the authors’ provided keywords. The average similarity of our approach with author-assigned keywords is 51%.
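The ranking step of KeyBERT-style extraction, scoring candidate phrases by the similarity of their embedding to the document embedding, can be sketched as follows. Bag-of-words count vectors stand in for contextual BERT embeddings here, and all inputs are invented.

```python
import math
from collections import Counter

def bow(text):
    """Toy bag-of-words 'embedding' (a stand-in for BERT vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_candidates(doc, candidates, top_n=1):
    """Return the top_n candidate phrases most similar to the document."""
    dv = bow(doc)
    return sorted(candidates, key=lambda c: cosine(dv, bow(c)), reverse=True)[:top_n]

doc = "keyword extraction finds phrases that summarize a document"
print(rank_candidates(doc, ["keyword extraction", "random forest"]))  # ['keyword extraction']
```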


Journal ArticleDOI
TL;DR: A subject‐independent application of brain–computer interfacing (BCI) is proposed, and it is revealed that the alpha band can recognize imagined speech (SI) better than other EEG frequency bands.
Abstract: This article proposes a subject‐independent application of brain–computer interfacing (BCI). A 32‐channel Electroencephalography (EEG) device is used to measure imagined speech (SI) of four words (sos, stop, medicine, washroom) and one phrase (come‐here) across 13 subjects. A deep long short‐term memory (LSTM) network has been adopted to recognize the above signals in seven EEG frequency bands individually in nine major regions of the brain. The results show a maximum accuracy of 73.56% and a network prediction time (NPT) of 0.14 s which are superior to other state‐of‐the‐art techniques in the literature. Our analysis reveals that the alpha band can recognize SI better than other EEG frequencies. To reinforce our findings, the above work has been compared by models based on the gated recurrent unit (GRU), convolutional neural network (CNN), and six conventional classifiers. The results show that the LSTM model has 46.86% more average accuracy in the alpha band and 74.54% less average NPT than CNN. The maximum accuracy of GRU was 8.34% less than the LSTM network. Deep networks performed better than traditional classifiers.

Journal ArticleDOI
TL;DR: This article conducted a systematic mapping review of how cognitive load theory has been used across a number of leading computing education research (CER) forums since 2010 and found that the most common reason to cite CLT is to mention it briefly as a design influence; authors predominantly cite old versions of the theory; hypotheses phrased in terms of cognitive load components are rare; and only a small selection of cognitive load measures have been applied, sparsely.
Abstract: One of the most commonly cited theories in computing education research is cognitive load theory (CLT), which explains how learning is affected by the bottleneck of human working memory and how teaching may work around that limitation. The theory has evolved over a number of decades, addressing shortcomings in earlier versions; other issues remain and are being debated by the CLT community. We conduct a systematic mapping review of how CLT has been used across a number of leading computing education research (CER) forums since 2010. We find that the most common reason to cite CLT is to mention it briefly as a design influence; authors predominantly cite old versions of the theory; hypotheses phrased in terms of cognitive load components are rare; and only a small selection of cognitive load measures have been applied, sparsely. Overall, the theory’s evolution and recent themes in CLT appear to have had limited impact on CER so far. We recommend that studies in CER explain which version of the theory they use and why; clearly distinguish between load components (e.g., intrinsic and extraneous load); phrase hypotheses in terms of load components a priori; look further into validating different measures of cognitive load; accompany cognitive load measures with complementary constructs, such as motivation; and explore themes such as collaborative CLT and individual differences in working-memory capacity.

Journal ArticleDOI
TL;DR: This article used EEG data from N = 31 native speakers of Mandarin and found robust delta synchronization to syntactically well-formed isochronous speech, consistent with the hierarchical, but not the lexical, accounts.
Abstract: Neural responses appear to synchronize with sentence structure. However, researchers have debated whether this response in the delta band (0.5–3 Hz) really reflects hierarchical information or simply lexical regularities. Computational simulations in which sentences are represented simply as sequences of high-dimensional numeric vectors that encode lexical information seem to give rise to power spectra similar to those observed for sentence synchronization, suggesting that sentence-level cortical tracking findings may reflect sequential lexical or part-of-speech information, and not necessarily hierarchical syntactic information. Using electroencephalography (EEG) data and the frequency-tagging paradigm, we develop a novel experimental condition to tease apart the predictions of the lexical and the hierarchical accounts of the attested low-frequency synchronization. Under a lexical model, synchronization should be observed even when words are reversed within their phrases (e.g., “sheep white grass eat” instead of “white sheep eat grass”), because the same lexical items are preserved at the same regular intervals. Critically, such stimuli are not syntactically well-formed; thus a hierarchical model does not predict synchronization of phrase- and sentence-level structure in the reversed phrase condition. Computational simulations confirm these diverging predictions. EEG data from N = 31 native speakers of Mandarin show robust delta synchronization to syntactically well-formed isochronous speech. Importantly, no such pattern is observed for reversed phrases, consistent with the hierarchical, but not the lexical, accounts.
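The frequency-tagging logic, in which structure at a given linguistic level appears as a spectral peak at that level's presentation rate, can be illustrated on a synthetic signal with a naive single-frequency DFT. The 1 Hz (sentence), 2 Hz (phrase), and 4 Hz (word) components below are invented stand-ins for neural data.

```python
import math

def dft_power(signal, freq, fs):
    """Naive single-frequency DFT power of a real signal sampled at fs Hz."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return (re * re + im * im) / n

fs, dur = 100, 10                       # 100 Hz sampling, 10 s of "data"
t = [i / fs for i in range(fs * dur)]
# Word rate (4 Hz), phrase rate (2 Hz), and sentence rate (1 Hz) all present.
sig = [math.sin(2 * math.pi * 1 * x) + math.sin(2 * math.pi * 2 * x)
       + math.sin(2 * math.pi * 4 * x) for x in t]
for f in (1.0, 1.5, 2.0):               # peaks appear only at tagged rates
    print(f, round(dft_power(sig, f, fs), 2))
```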


Journal ArticleDOI
TL;DR: Wang et al. proposed the Key Phrase Aware Transformer (KPAT) model, which incorporates prior knowledge of key phrases into a transformer-based summarization model and guides the model to encode key phrases.
Abstract: Abstractive summarization aims to generate a concise summary covering salient content from single or multiple text documents. Many recent abstractive summarization methods are built on the transformer model to capture long-range dependencies in the input text and achieve parallelization. In the transformer encoder, calculating attention weights is a crucial step for encoding input documents. Input documents usually contain some key phrases conveying salient information, and it is important to encode these phrases completely. However, existing transformer-based summarization works did not consider key phrases in input when determining attention weights. Consequently, some of the tokens within key phrases only receive small attention weights, which is not conducive to encoding the semantic information of input documents. In this paper, we introduce some prior knowledge of key phrases into the transformer-based summarization model and guide the model to encode key phrases. For the contextual representation of each token in the key phrase, we assume the tokens within the same key phrase make larger contributions compared with other tokens in the input sequence. Based on this assumption, we propose the Key Phrase Aware Transformer (KPAT), a model with the highlighting mechanism in the encoder to assign greater attention weights for tokens within key phrases. Specifically, we first extract key phrases from the input document and score the phrases’ importance. Then we build the block diagonal highlighting matrix to indicate these phrases’ importance scores and positions. To combine self-attention weights with key phrases’ importance scores, we design two structures of highlighting attention for each head and the multi-head highlighting attention. Experimental results on two datasets (Multi-News and PubMed) from different summarization tasks and domains show that our KPAT model significantly outperforms advanced summarization baselines. 
We conduct more experiments to analyze the impact of each part of our model on the summarization performance and verify the effectiveness of our proposed highlighting mechanism.
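The block-diagonal highlighting matrix can be sketched directly: tokens inside the same key-phrase span share a block holding that phrase's importance score, and the matrix can then be combined with the self-attention weights (for example by adding it to the attention logits; the paper defines two such combination structures). The spans and scores below are invented.

```python
def highlighting_matrix(n_tokens, phrase_spans):
    """Block-diagonal matrix: entry (i, j) holds a phrase's importance
    score when tokens i and j fall inside the same key-phrase span.
    phrase_spans: list of (start, end, score) with half-open [start, end)."""
    m = [[0.0] * n_tokens for _ in range(n_tokens)]
    for start, end, score in phrase_spans:
        for i in range(start, end):
            for j in range(start, end):
                m[i][j] = score
    return m

# 5 tokens; tokens 1-2 form one key phrase (score 0.9), token 4 another (0.4).
for row in highlighting_matrix(5, [(1, 3, 0.9), (4, 5, 0.4)]):
    print(row)
```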

Journal ArticleDOI
TL;DR: Li et al. construct a phrase bank by pooling all phrases extracted from a corpus, assign phrase candidates to new documents via a simple partial matching algorithm, and then rank these candidates by their relevance to the document from both lexical and semantic perspectives.
Abstract: Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
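The abstract does not spell out the "simple partial matching algorithm"; one plausible reading, under which a bank phrase becomes a candidate whenever all of its tokens occur somewhere in the document, can be sketched as follows. All data below are illustrative.

```python
def match_candidates(doc_tokens, phrase_bank):
    """A bank phrase becomes a candidate for the document when every one
    of its tokens appears in the document (possibly non-contiguously),
    so even phrases absent as a whole can still be proposed."""
    vocab = set(doc_tokens)
    return [p for p in phrase_bank if set(p.split()) <= vocab]

doc = "we train a neural model to generate keyphrases from documents".split()
bank = ["neural model", "keyphrase generation", "generate documents"]
print(match_candidates(doc, bank))  # ['neural model', 'generate documents']
```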

Journal ArticleDOI
TL;DR: This article found that all language regions respond more strongly during sentence production than comprehension, suggesting that production incurs a greater cost for the language network, and that language comprehension and sentence production draw on the same knowledge representations, which are stored in a distributed manner within the language-selective network and are used to both interpret and generate linguistic utterances.
Abstract: A fronto-temporal brain network has long been implicated in language comprehension. However, this network’s role in language production remains debated. In particular, it remains unclear whether all or only some language regions contribute to production, and which aspects of production these regions support. Across 3 functional magnetic resonance imaging experiments that rely on robust individual-subject analyses, we characterize the language network’s response to high-level production demands. We report 3 novel results. First, sentence production, spoken or typed, elicits a strong response throughout the language network. Second, the language network responds to both phrase-structure building and lexical access demands, although the response to phrase-structure building is stronger and more spatially extensive, present in every language region. Finally, contra some proposals, we find no evidence of brain regions—within or outside the language network—that selectively support phrase-structure building in production relative to comprehension. Instead, all language regions respond more strongly during production than comprehension, suggesting that production incurs a greater cost for the language network. Together, these results align with the idea that language comprehension and production draw on the same knowledge representations, which are stored in a distributed manner within the language-selective network and are used to both interpret and generate linguistic utterances.

Journal ArticleDOI
TL;DR: This paper found that lexical frequency predicted the word-by-word elicited MEG signal in a widespread cortical network, irrespective of sentential context, while index (ordinal word position) was more strongly encoded in sentence words, in left fronto-temporal areas.
Abstract: Typical adults read remarkably quickly. Such fast reading is facilitated by brain processes that are sensitive to both word frequency and contextual constraints. It is debated whether these attributes have additive or interactive effects on language processing in the brain. We investigated this issue by analysing existing magnetoencephalography data from 99 participants reading intact and scrambled sentences. Using a cross-validated model comparison scheme, we found that lexical frequency predicted the word-by-word elicited MEG signal in a widespread cortical network, irrespective of sentential context. In contrast, index (ordinal word position) was more strongly encoded in sentence words, in left fronto-temporal areas. This confirms that frequency influences word processing independently of predictability, and that contextual constraints affect word-by-word brain responses. With a conservative multiple-comparisons correction, only the interaction between lexical frequency and surprisal survived, in anterior temporal and frontal cortex; the interactions between lexical frequency and entropy, and between lexical frequency and index, did not. Interestingly, however, the uncorrected index × frequency interaction revealed an effect in left frontal and temporal cortex that reversed in time and space for intact compared to scrambled sentences. Finally, we provide evidence to suggest that, in sentences, lexical frequency and predictability may independently influence early (<150 ms) and late stages of word processing, but also interact during late stages of word processing (>150-250 ms), thus helping to reconcile previously contradictory eye-tracking and electrophysiological literature. Current neurocognitive models of reading would benefit from accounting for these differing effects of lexical frequency and predictability on different stages of word processing.

Proceedings ArticleDOI
22 Mar 2022
TL;DR: EyeSayCorrect, an eye gaze and voice based hands-free text correction method for mobile devices that uses a Bayesian approach for determining the selected word given an eye-gaze trajectory, is presented.
Abstract: Text correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye-gaze and voice-based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first uses the gaze location on the screen to select a word, then speaks the new phrase. EyeSayCorrect then infers the user's correction intention from these inputs and the text context. We use a Bayesian approach to determine the selected word given an eye-gaze trajectory: for each sampling point in the trajectory, the posterior probability of selecting each word is calculated and accumulated, and a word is selected once its accumulated posterior exceeds a threshold. Misspelt words are given higher priors. Our user studies showed that using priors for misspelt words reduced the task completion time by up to 23.79% and the text selection time by up to 40.35%, and that EyeSayCorrect is a feasible hands-free text correction method on mobile devices.
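The accumulation scheme can be sketched as follows (our illustrative reconstruction; the screen layout, Gaussian noise model, prior values, and threshold are assumptions, not the paper's parameters):

```python
# At each gaze sample, compute an unnormalised posterior over words from a
# Gaussian likelihood around each word's on-screen centre times a per-word
# prior (higher for misspelt words), normalise it, and accumulate until one
# word's running total crosses a threshold.
import math

def select_word(gaze_samples, word_centres, priors, sigma=30.0, threshold=2.0):
    accumulated = {w: 0.0 for w in word_centres}
    for gx, gy in gaze_samples:
        # Unnormalised posterior: prior x Gaussian likelihood of the gaze point.
        post = {w: priors[w] * math.exp(-((gx - cx) ** 2 + (gy - cy) ** 2)
                                        / (2 * sigma ** 2))
                for w, (cx, cy) in word_centres.items()}
        z = sum(post.values())
        for w in post:
            accumulated[w] += post[w] / z
        winner = max(accumulated, key=accumulated.get)
        if accumulated[winner] >= threshold:
            return winner
    return None  # not enough evidence yet

centres = {"the": (40, 100), "qick": (120, 100), "fox": (200, 100)}
priors = {"the": 1.0, "qick": 3.0, "fox": 1.0}  # misspelt word: higher prior
gaze = [(118, 98), (125, 103), (119, 101)]      # fixations near "qick"
print(select_word(gaze, centres, priors))       # -> qick
```

With these assumed parameters, three fixations near the misspelt word are enough to cross the threshold; a single fixation returns `None`, mimicking the evidence-accumulation behaviour the abstract describes.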

Proceedings ArticleDOI
23 May 2022
TL;DR: Measurement of invasive electrocorticogram signals from epilepsy patients as they spoke a sentence consisting of multiple phrases revealed that the proposed model with the Transformer achieved significantly better decoding accuracy than a conventional long short-term memory model.
Abstract: Invasive brain–machine interfaces (BMIs) are a promising neurotechnological venture for achieving direct speech communication from the human brain, but they face many challenges. In this paper, we measured invasive electrocorticogram (ECoG) signals from seven participating epilepsy patients as they spoke a sentence consisting of multiple phrases. A Transformer encoder was incorporated into a "sequence-to-sequence" model to decode spoken sentences from the ECoG. The decoding test revealed that the Transformer model achieved a minimum phrase error rate (PER) of 16.4%, with a median (±standard deviation) across the seven participants of 31.3% (±10.0%). Moreover, the proposed model with the Transformer achieved significantly better decoding accuracy than a conventional long short-term memory model.
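For reference, phrase error rate is the phrase-level analogue of word error rate: the Levenshtein distance between the reference and decoded phrase sequences, divided by the reference length. A minimal sketch (our illustration, not the authors' evaluation code):

```python
# Edit distance over phrase sequences via standard dynamic programming,
# normalised by the number of reference phrases.
def phrase_error_rate(reference, hypothesis):
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / n

ref = ["good", "morning", "how", "are", "you"]
hyp = ["good", "evening", "how", "are", "you"]
print(phrase_error_rate(ref, hyp))  # -> 0.2 (one substitution in five phrases)
```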

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an approach by which appropriate keyphrases are assigned to scientific video lectures, in which the textual content of video frames along with the text extracted from audio signal are merged together, and a new keyphrase extraction method is proposed.
Abstract: Due to the growth of technology, the expansion of communication infrastructure and the crisis of the COVID-19 pandemic, e-learning and virtual education are expanding. One of the best ways to access and organize this information is indexing using automatic intelligent methods. Indexing requires assigning keywords or keyphrases to each video to represent its content. The main focus of this research is to propose an approach by which appropriate keyphrases are assigned to scientific video lectures. For this purpose, a new algorithm called LVTIA, Lecture Video Text mining-based Indexing Algorithm, is proposed, in which the textual content of video frames and the text extracted from the audio signal are merged, and a new keyphrase extraction method is proposed. The proposed method considers new local and global features for each candidate phrase, along with a new feature reflecting the occurrence of each phrase in the audio signal or video frames. The method is implemented using five distinct data sets in English and Persian. The results are evaluated on the precision, recall, F1-measure and MAP@K metrics and compared with some of the well-known keyphrase extraction algorithms. Based on the results, the best MAP@K for English videos is achieved by the LVTIA algorithm, with values of 0.7912, 0.8069 and 0.8069 for k = 5, 10, 15, respectively. In addition, LVTIA provides the best MAP@K for Persian videos: 0.6367, 0.6866 and 0.6874 for k = 5, 10, 15, respectively. According to the Friedman nonparametric statistical test, the performance of the other algorithms on the precision, recall and F1-measure metrics is statistically different from that of LVTIA as well.
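MAP@K, the ranking metric reported above, is the mean over documents of average precision at cutoff K, where precision is taken at each rank that yields a relevant keyphrase. A minimal sketch of the standard metric (our illustration with made-up phrases, not LVTIA code):

```python
def average_precision_at_k(ranked, relevant, k):
    """Average precision of one ranked list against a gold set, cut at K."""
    hits, score = 0, 0.0
    for rank, phrase in enumerate(ranked[:k], start=1):
        if phrase in relevant:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_ranked, all_relevant, k):
    """Mean of per-document average precisions."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)

ranked = [["deep learning", "lecture video", "indexing", "audio", "ocr"]]
gold = [{"lecture video", "indexing"}]
print(map_at_k(ranked, gold, k=5))
# ~0.583: hits at ranks 2 and 3 give (1/2 + 2/3) / 2.
```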

Journal ArticleDOI
TL;DR: In this article , a cross-sectional study assesses patient understanding of common medical jargon terms using results of a survey conducted at the Minnesota State Fair (MSF) in 2011.
Abstract: This cross-sectional study assesses patient understanding of common medical jargon terms using results of a survey conducted at the Minnesota State Fair.

Journal ArticleDOI
TL;DR: In this paper , the discursive manifestation forms of the cultural and ideological content of cancel culture and woke movements are analyzed in terms of discursive semantics, which is mainly concerned with the critics' narratives of the above phenomena, the positions of their proponents being poorly verbalized.
Abstract: The article deals with the discursive forms in which the cultural and ideological content of the cancel culture and woke movements is manifested. It is suggested that these phenomena are connected by a cause (woke) and effect (cancel) relation. Both are analyzed in terms of discursive semantics, which is mainly concerned with the critics' narratives of the above phenomena, since the positions of their proponents are poorly verbalized. Nevertheless, as an example, we present a linguistic analysis of a petition posted by US undergraduate students. The category of proponents also includes representatives of management structures, described by the words government (AmE) and illiberal bureaucracies, whose meanings are extremely broad in American culture and may therefore cause misunderstanding. The translation interpretation of the woke and cancel cultures is also supported by linguo-stylistic and cultural information from English and Russian dictionaries on the words cancel and culture. As a result, it is shown that the phrase cancel culture can have the following adequate context-related correspondences: "culture" of cancellation and "cancellation" of culture. However, in the opposite direction (from Russian to English), for similar content, the equivalents woke, wokeness and wokeism appear to be much more precise, especially if cultural or social values are implied rather than pop celebrities.

Proceedings ArticleDOI
23 May 2022
TL;DR: This study tackles the problem of long-term, phrase-level symbolic melody inpainting by equipping a sequence prediction model with phrase- level representation and contrastive loss and shows that this method significantly outperforms the baselines.
Abstract: Deep generative modeling has already become the leading technique for music automation. However, long-term generation remains a challenging task, as most methods fall short in preserving a natural structure and the overall musicality when the generation scope exceeds several beats. In this study, we tackle the problem of long-term, phrase-level symbolic melody inpainting by equipping a sequence prediction model with phrase-level representation (as an extra condition) and contrastive loss (as an extra optimization term). The underlying ideas are twofold. First, to predict phrase-level music, we need phrase-level representations as a better context. Second, we should predict notes and their high-level representations simultaneously, with the contrastive loss serving as a better target for the abstract representations. Experimental results show that our method significantly outperforms the baselines. In particular, the contrastive loss plays a critical role in generation quality, and the phrase-level representation further enhances the structure of long-term generation.
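The contrastive term can be illustrated with a generic InfoNCE-style loss over representation vectors (a sketch of the general technique; the paper's exact formulation, temperature, and choice of negatives may differ):

```python
# Matching (prediction, ground-truth) representation pairs sit on the
# diagonal of a similarity matrix; the loss pulls them together and pushes
# each prediction away from the other representations in the batch.
import numpy as np

def info_nce_loss(pred, target, temperature=0.1):
    """pred, target: (batch, dim); matching rows are the positive pairs."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = pred @ target.T / temperature        # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))    # positives on the diagonal

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))                  # phrase-level representations
matched = info_nce_loss(target, target)           # predictions match their phrases
mismatched = info_nce_loss(np.roll(target, 1, axis=0), target)  # shuffled pairing
print(matched < mismatched)  # -> True: aligned representations score lower loss
```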

Journal ArticleDOI
Zhong Ji, Junhua Hu, Deyin Liu, Yuan Wu, Ye Zhao 
TL;DR: Li et al. as discussed by the authors presented a transformer-based model to extract multi-scale representations, and performed asymmetric cross-scale alignment (ACSA) to precisely align the two modalities.
Abstract: Text-based person search (TBPS) is of significant importance in intelligent surveillance; it aims to retrieve pedestrian images with high semantic relevance to a given text description. This retrieval task is characterized by both modal heterogeneity and fine-grained matching. To implement it, one needs to extract multi-scale features from both the image and text domains, and then perform cross-modal alignment. However, most existing approaches only consider alignment confined to individual scales, e.g., an image-sentence or a region-phrase scale. Such a strategy adopts a presumable alignment in feature extraction while overlooking cross-scale alignment, e.g., image-phrase. In this paper, we present a transformer-based model to extract multi-scale representations, and perform Asymmetric Cross-Scale Alignment (ACSA) to precisely align the two modalities. Specifically, ACSA consists of a global-level alignment module and an asymmetric cross-attention module, where the former aligns an image and texts on a global scale, and the latter applies the cross-attention mechanism to dynamically align cross-modal entities at the region/image-phrase scales. Extensive experiments on the two benchmark datasets CUHK-PEDES and RSTPReid demonstrate the effectiveness of our approach. Code is available at https://github.com/mul-hjh/ACSA.
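The cross-attention step can be sketched generically as scaled dot-product attention with phrase features as queries over image-region features (an illustration of the mechanism, not the authors' implementation; the feature vectors here are random stand-ins):

```python
# Each phrase query attends over the detected image regions, producing one
# region-attended visual vector per phrase that downstream alignment losses
# can compare against the phrase representation.
import numpy as np

def cross_attention(phrases, regions):
    """phrases: (P, d) queries; regions: (R, d) keys/values."""
    d = phrases.shape[1]
    scores = phrases @ regions.T / np.sqrt(d)        # (P, R) affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ regions                         # (P, d) aligned features

rng = np.random.default_rng(2)
phrase_feats = rng.normal(size=(3, 16))  # e.g. "red jacket", "blue jeans", ...
region_feats = rng.normal(size=(5, 16))  # detected image regions
aligned = cross_attention(phrase_feats, region_feats)
print(aligned.shape)  # -> (3, 16): one region-attended vector per phrase
```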

Proceedings ArticleDOI
01 Jan 2022
TL;DR: The authors propose a pre-training objective based on question answering for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.
Abstract: We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context. To this end, we train a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model on 80 million synthesized QA pairs. By encoding QA-relevant information, the bi-encoder’s token-level representations are useful for non-QA downstream tasks without extensive (or in some cases, any) fine-tuning. We show large improvements over both RoBERTa-large and previous state-of-the-art results on zero-shot and few-shot paraphrase detection on four datasets, few-shot named entity recognition on two datasets, and zero-shot sentiment analysis on three datasets.
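The cross-encoder-to-bi-encoder distillation can be sketched as a KL-matching objective (a generic sketch of the technique; the encoders here are stand-in random vectors, whereas the real model uses learned transformer encoders over 80 million synthesized QA pairs):

```python
# The bi-encoder scores each passage token by a dot product with the
# independently encoded question vector; those scores are trained to match
# the soft span-start distribution produced by the more accurate
# cross-encoder teacher.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(question_vec, token_vecs, teacher_logits):
    """KL(teacher || student) over passage token positions."""
    student_logits = token_vecs @ question_vec  # bi-encoder: dot products only
    p, q = softmax(teacher_logits), softmax(student_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(1)
tokens = rng.normal(size=(6, 4))   # one vector per passage token
question = rng.normal(size=4)      # independently encoded question
teacher = rng.normal(size=6)       # cross-encoder's span-start logits
loss = distillation_loss(question, tokens, teacher)
print(loss >= 0.0)  # -> True: KL divergence is non-negative
```

Minimising this loss drives the bi-encoder's token-level scores toward the teacher's, which is what makes the resulting representations useful for downstream tasks without a cross-encoder at inference time.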