Author

Li Nguyen

Bio: Li Nguyen is an academic researcher at the University of Cambridge. The author has contributed to research on the topics of code-switching and news media. The author has an h-index of 2 and has co-authored 6 publications receiving 23 citations. Previous affiliations of Li Nguyen include the University of Canberra and the Australian National University.

Papers
Journal ArticleDOI
05 Jul 2016
TL;DR: The authors provide a socio-cognitive discourse analysis of Australian news media's use of certain metaphoric concepts to represent maritime asylum seekers (MASs) and discuss how such metaphorical constructions function to shape shared knowledge and legitimise certain immigration policies.
Abstract: This article provides a socio-cognitive discourse analysis of Australian news media’s use of certain metaphoric concepts to represent maritime asylum seekers (MASs) and discusses how such metaphorical constructions function to shape shared knowledge and legitimise certain immigration policies. The article argues that Australian news media feature a range of figurative language that discursively and consistently depicts MAS as an ‘uncontrollable danger’. Two major metaphoric themes are identified: MAS as water or water catastrophe, and Australia as an invaded home. These metaphorical constructions appear to have emerged at the expense of earlier concerns regarding assimilation and difference and the metaphorical use of the queue, suggesting a recent shift in the immigration discourse in Australia. We conclude that both the water catastrophe and the home metaphors cognitively concretise and socially amplify the lin...

16 citations

Journal ArticleDOI
TL;DR: This study explores the use of kin terms in a corpus of Vietnamese-English bilingual spontaneous conversation.
Abstract: This study explores the use of kin terms in a corpus of Vietnamese–English bilingual spontaneous conversation. While the corpus features a range of single Vietnamese lexical items in otherwise Engl...

11 citations

Journal ArticleDOI
TL;DR: A data-driven position paper that discusses the current state of affairs, the difficulties of existing educational natural language processing (NLP) tools for code-switching (CSW), and possible directions for future work. It presents empirical user cases of how CSW manifests and suggests possible technological solutions for each.
Abstract: Code-switching (CSW) is the phenomenon where speakers use two or more languages in a single discourse or utterance, an increasingly recognised natural product of multilingualism in many settings. In language teaching and learning in particular, code-switching has been shown to bring in many pedagogical benefits, including accelerating students’ confidence, increasing their access to content, as well as improving their participation and engagement. Unfortunately, however, current educational technologies are not yet able to keep up with this ‘multilingual turn’ in education and are partly responsible for the constraint of this practice to only classroom contexts. In an effort to make progress in this area, we offer a data-driven position paper discussing the current state of affairs, difficulties of the existing educational natural language processing (NLP) tools for CSW and possible directions for future work. We specifically focus on two cases of feedback and assessment technologies, demonstrating how the current state of the art in these domains fails with code-switching data due to a lack of appropriate training data, lack of robust evaluation benchmarks and lack of end-to-end user-facing educational applications. We present some empirical user cases of how CSW manifests and suggest possible technological solutions for each of these scenarios.

4 citations

Journal ArticleDOI
25 Jun 2021
TL;DR: The principles behind the CanVEC toolkit were applied to data from the ICON-2016 shared task, which consists of social media posts that have been annotated with language and POS tags, and showed great potential for effectively automating the annotation of code-switched corpora, on different language combinations, and in different genres.
Abstract: Natural Language Processing (NLP) tools typically struggle to process code-switched data and so linguists are commonly forced to annotate such data manually. As this data becomes more readily available, automatic tools are increasingly needed to help speed up the annotation process and improve consistency. Last year, such a toolkit was developed to semi-automatically annotate transcribed bilingual code-switched Vietnamese-English speech data with token-based language information and POS tags (hereafter the CanVEC toolkit, L. Nguyen & Bryant, 2020). In this work, we extend this methodology to another language pair, Hindi-English, to explore the extent to which we can standardise the automation process. Specifically, we applied the principles behind the CanVEC toolkit to data from the International Conference on Natural Language Processing (ICON) 2016 shared task, which consists of social media posts (Facebook, Twitter and WhatsApp) that have been annotated with language and POS tags (Molina et al., 2016). We used the ICON-2016 annotations as the gold-standard labels in the language identification task. Ultimately, our tool achieved an F1 score of 87.99% on the ICON-2016 data. We then evaluated the first 500 tokens of each social media subset manually, and found almost 40% of all errors were caused entirely by problems with the gold-standard, i.e., our system was correct. It is thus likely that the overall accuracy of our system is higher than reported. This shows great potential for effectively automating the annotation of code-switched corpora, on different language combinations, and in different genres. We finally discuss some limitations of our approach and release our code and human evaluation together with this paper.

4 citations

Proceedings Article
01 May 2020
TL;DR: The Canberra Vietnamese-English Code-switching corpus (CanVEC), an original corpus of natural mixed speech that the authors semi-automatically annotated with language information, part of speech (POS) tags and Vietnamese translations, is introduced.
Abstract: This paper introduces the Canberra Vietnamese-English Code-switching corpus (CanVEC), an original corpus of natural mixed speech that we semi-automatically annotated with language information, part of speech (POS) tags and Vietnamese translations. The corpus, which was built to inform a sociolinguistic study on language variation and code-switching, consists of 10 hours of recorded speech (87k tokens) between 45 Vietnamese-English bilinguals living in Canberra, Australia. We describe how we collected and annotated the corpus by pipelining several monolingual toolkits to considerably speed up the annotation process. We also describe how we evaluated the automatic annotations to ensure corpus reliability. We make the corpus available for research purposes.

3 citations


Cited by
Journal ArticleDOI
TL;DR: Discusses Imagined Communities: Reflections on the Origin and Spread of Nationalism (History of European Ideas, Vol. 21, No. 5, pp. 721-722).

13,842 citations

30 Jun 2004
TL;DR: The article reviews previous studies of sociolinguistics, defined as "the study of language as it is used by real speakers in social and situational contexts of use", and identifies four characteristics of the discipline, among them the view that criteria of correct language usage should rest on societal norms as well as purely grammatical standards.
Abstract: Kim, Hyesouk. 2004. Theories and Developments of Sociolinguistics. The Sociolinguistic Journal of Korea, 12(1). The purpose of this article is to understand the current status of sociolinguistics by reviewing previous studies and attempting to see the future of the discipline within linguistics. As Milroy and Milroy (1990:485) have defined, sociolinguistics is "the study of language as it is used by real speakers in social and situational contexts of use." It has four characteristics. (1) Those who study sociolinguistics are linguists, but they have great interest in adding social variables to pure linguistics. Sociolinguists believe that criteria of correct language usage should be based not only upon pure grammatical standards but also upon societal norms in terms of relevance and general acceptance. (2) The goal of sociolinguistics is to identify a co-variance between language and society and to establish a theory of language performance. (3) Sociolinguistics regards synchronic and diachronic traits as an identical frame. (4) Sociolinguistics pays attention to language usage in societal contexts and extends language competence, which is the main subject of pure linguistics, to communicative competence. D. Hymes predicts that the core area of linguistics is actually sociolinguistics and, thus, the prefix 'socio' will not be necessary. Although we still have that prefix, it is true that sociolinguistics already has its own identity and is growing rapidly as an independent discipline. In conclusion, this paper argues that sociolinguistics will receive more attention from linguists and play a key role in linguistics by explaining variation in language more systematically, and by interpreting and eliminating language conflict in everyday life.

744 citations

Journal ArticleDOI
TL;DR: Chafe demonstrates how the study of language and consciousness together can provide an unexpectedly broad understanding of the way the mind works. Relying on close analyses of conversational speech, as well as written fiction and non-fiction, he investigates both the flow of ideas through consciousness and the displacement of consciousness by way of memory and imagination.
Abstract: In this text, the author demonstrates how the study of language and consciousness together can provide an unexpectedly broad understanding of the way the mind works. Relying on close analyses of conversational speech, as well as written fiction and non-fiction, he investigates both the flow of ideas through consciousness and the displacement of consciousness by way of memory and imagination. Chafe draws on several decades of research to demonstrate that understanding the nature of consciousness is essential to understanding many linguistic phenomena, such as pronouns, tense, clause structure and intonation, as well as stylistic usages, such as the historical present and the free indirect style. While the book focuses on English, there are also discussions of the North American Indian language, Seneca, and the music of Mozart and of the Seneca people.

113 citations

Journal ArticleDOI
TL;DR: Covers the book African American English in the Diaspora by Shana Poplack and Sali Tagliamonte (Blackwell, 2001).
Abstract: African American English in the Diaspora: By Shana Poplack and Sali Tagliamonte. Malden, MA. Blackwell. 2001.

71 citations

Journal ArticleDOI
TL;DR: Covers the volume Codes and Consequences: Choosing Linguistic Varieties, edited by Carol Myers-Scotton (Oxford University Press, 1998).
Abstract: Codes and Consequences: Choosing Linguistic Varieties. Carol Myers-Scotton. ed. New York: Oxford University Press, 1998. 219 pp.

57 citations