Author

Hamed Nilforoshan

Other affiliations: Stanford University
Bio: Hamed Nilforoshan is an academic researcher from Columbia University. The author has contributed to research in the topics of computer science and quality (business). The author has an h-index of 4 and has co-authored 8 publications receiving 69 citations. Previous affiliations of Hamed Nilforoshan include Stanford University.

Papers
Posted Content
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

76 citations

Journal ArticleDOI
TL;DR: This paper found that higher access to grocery stores, lower access to fast food, and higher income and college education are independently associated with higher consumption of fresh fruits and vegetables, lower consumption of fast food and soda, and lower likelihood of being affected by overweight and obesity.
Abstract: An unhealthy diet is a major risk factor for chronic diseases including cardiovascular disease, type 2 diabetes, and cancer [1-4]. Limited access to healthy food options may contribute to unhealthy diets [5,6]. Studying diets is challenging, typically restricted to small sample sizes, single locations, and non-uniform design across studies, and has led to mixed results on the impact of the food environment [7-23]. Here we leverage smartphones to track diet health, operationalized through the self-reported consumption of fresh fruits and vegetables, fast food and soda, as well as body-mass index status in a country-wide observational study of 1,164,926 U.S. participants (MyFitnessPal app users) and 2.3 billion food entries to study the independent contributions of fast food and grocery store access, income and education to diet health outcomes. This study constitutes the largest nationwide study examining the relationship between the food environment and diet to date. We find that higher access to grocery stores, lower access to fast food, higher income and college education are independently associated with higher consumption of fresh fruits and vegetables, lower consumption of fast food and soda, and lower likelihood of being affected by overweight and obesity. However, these associations vary significantly across zip codes with predominantly Black, Hispanic or white populations. For instance, high grocery store access has a significantly larger association with higher fruit and vegetable consumption in zip codes with predominantly Hispanic populations (7.4% difference) and Black populations (10.2% difference) in contrast to zip codes with predominantly white populations (1.7% difference). Policy targeted at improving food access, income and education may increase healthy eating, but intervention allocation may need to be optimized for specific subpopulations and locations.
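The "independent contributions" the abstract describes amount to fitting all predictors jointly, so that each coefficient is adjusted for the others. Below is a minimal sketch of that kind of analysis on fully synthetic data; every column name and coefficient is a hypothetical placeholder, not the study's actual variables or results.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "grocery_access": rng.normal(size=n),    # e.g., z-scored store density (hypothetical)
    "fast_food_access": rng.normal(size=n),
    "income": rng.normal(size=n),
    "college_educated": rng.integers(0, 2, size=n),
})
# Simulate a binary diet outcome loosely consistent with the reported directions.
score = (0.5 * df["grocery_access"] - 0.4 * df["fast_food_access"]
         + 0.3 * df["income"] + 0.2 * df["college_educated"])
df["eats_fresh_produce"] = rng.random(n) < 1 / (1 + np.exp(-score))

# Fitting all predictors together gives each factor's association adjusted for
# the others, i.e., its "independent" contribution.
X = sm.add_constant(df[["grocery_access", "fast_food_access", "income", "college_educated"]])
model = sm.Logit(df["eats_fresh_produce"].astype(float), X).fit(disp=0)
print(model.params)  # log-odds change per unit of each predictor

Stratifying such a model by zip-code demographics, as the study does, is what surfaces the subpopulation differences reported above.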

24 citations

Proceedings ArticleDOI
12 Jul 2022
TL;DR: In this article, the authors assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (i) those that constrain the effects of decisions on counterfactual disparities; and (ii) those that constrain the effects of legally protected characteristics on decisions.
Abstract: Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always (in a measure-theoretic sense) result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.
[Figure caption, partially recovered: ...admissions, with the Pareto frontier depicted by the solid purple curve. For path-specific fairness, Π is set to the single path A → E → T → D and W = X. Each depicted constrained policy is strongly Pareto dominated, meaning there is an alternative feasible policy that simultaneously achieves greater student-body diversity and higher college degree attainment. Under mild distributional assumptions, every policy constrained to satisfy these causal fairness definitions is strongly Pareto dominated.]
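A toy numerical illustration of the Pareto-dominance claim, not the paper's measure-theoretic argument: a policy forced to admit every applicant with the same probability (the lottery the abstract describes) is beaten on both diversity and degree attainment by an unconstrained alternative. All distributions and the 55% seat split are invented for this sketch.

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
group = rng.integers(0, 2, size=n)                        # 1 = underrepresented (hypothetical)
prep = rng.normal(np.where(group == 1, -0.2, 0.2), 1.0)   # academic preparedness
p_degree = 1 / (1 + np.exp(-prep))                        # P(degree | admitted)

def outcomes(admit):
    """Return (diversity, attainment) for an admitted index set."""
    return group[admit].mean(), p_degree[admit].mean()

k = int(0.3 * n)                                          # 30% admission budget

# Constrained policy: admit all applicants with the same probability (a lottery).
lottery = rng.choice(n, size=k, replace=False)

# Unconstrained alternative: fill 55% of seats with the highest-prep group-1
# applicants and the rest with the highest-prep group-0 applicants.
k1 = int(0.55 * k)
idx1, idx0 = np.where(group == 1)[0], np.where(group == 0)[0]
top1 = idx1[np.argsort(-prep[group == 1])[:k1]]
top0 = idx0[np.argsort(-prep[group == 0])[:k - k1]]
alt = np.concatenate([top1, top0])

print("lottery (diversity, attainment):", outcomes(lottery))
print("alternative                    :", outcomes(alt))
# The alternative is higher on BOTH axes, i.e., it strongly Pareto dominates.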

21 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work proposes the SliceNDice algorithm, which enables efficient extraction of highly suspicious entity groups, and demonstrates its practicality in production through strong detection performance and discoveries on Snapchat's large advertiser ecosystem.
Abstract: Given the reach of web platforms, bad actors have considerable incentives to manipulate and defraud users at the expense of platform integrity. This has spurred research in numerous suspicious behavior detection tasks, including detection of sybil accounts, false information, and payment scams/fraud. In this paper, we draw the insight that many such initiatives can be tackled in a common framework by posing a detection task which seeks to find groups of entities which share too many properties with one another across multiple attributes (sybil accounts created at the same time and location, propaganda spreaders broadcasting articles with the same rhetoric and with similar reshares, etc.). Our work makes four core contributions. Firstly, we posit a novel formulation of this task as a multi-view graph mining problem, in which distinct views reflect distinct attribute similarities across entities, and contextual similarity and attribute importance are respected. Secondly, we propose a novel suspiciousness metric for scoring entity groups given the abnormality of their synchronicity across multiple views, which obeys intuitive desiderata that existing metrics do not. Thirdly, we propose the SliceNDice algorithm, which enables efficient extraction of highly suspicious entity groups. Finally, we demonstrate its practicality in production, in terms of strong detection performance and discoveries on Snapchat's large advertiser ecosystem (89% precision and numerous discoveries of real fraud rings), marked outperformance of baselines (over 97% precision/recall in simulated settings), and linear scalability.
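To make the multi-view intuition concrete, here is an illustrative proxy (not the paper's actual suspiciousness metric): score a group of entities by how unusually often they share attribute values in each "view", compared to the chance rate in the full population. Entities and attribute names are hypothetical.

from collections import Counter
import math

entities = [
    {"signup_week": "w1", "ip_block": "a", "ad_text": "x"},
    {"signup_week": "w1", "ip_block": "a", "ad_text": "x"},
    {"signup_week": "w1", "ip_block": "a", "ad_text": "y"},
    {"signup_week": "w2", "ip_block": "b", "ad_text": "z"},  # background entity
]

def view_score(group, population, view):
    """Surprise of the group's most common value in one attribute view."""
    values = [e[view] for e in group]
    top_value, top_count = Counter(values).most_common(1)[0]
    base_rate = sum(e[view] == top_value for e in population) / len(population)
    expected = base_rate * len(group)
    # Higher when the group shares a value far more often than chance predicts.
    return top_count * math.log(top_count / max(expected, 1e-9))

def suspiciousness(group, population, views):
    return sum(view_score(group, population, view) for view in views)

views = ["signup_week", "ip_block", "ad_text"]
print(suspiciousness(entities[:3], entities, views))  # synchronized trio scores high
print(suspiciousness(entities[1:], entities, views))  # mixed group scores lower

SliceNDice searches for high-scoring groups efficiently rather than enumerating them as this toy does.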

12 citations

Proceedings Article
01 Jan 2018
TL;DR: In this paper, a perturbation-based explanation method for tree-ensembles is proposed to identify writing features that, if changed, will most improve the text quality.
Abstract: User-generated, multi-paragraph writing is pervasive and important on many social media platforms (e.g., Amazon reviews, Airbnb host profiles). Ensuring high-quality content is important; unfortunately, content submitted by users is often not of high quality. Moreover, the characteristics that constitute high quality may even vary between domains in ways that users are unaware of. Automated writing feedback has the potential to immediately point out and suggest improvements during the writing process. Most approaches, however, focus on syntax/phrasing, which is only one characteristic of high-quality content. Existing research develops accurate quality prediction models. We propose combining these models with model explanation techniques to identify writing features that, if changed, will most improve the text quality. To this end, we develop a perturbation-based explanation method for a popular class of models called tree ensembles. Furthermore, we use a weak-supervision technique to adapt this method to generate feedback for specific text segments in addition to feedback for the entire document. Our user study finds that the perturbation-based approach, when combined with segment-specific feedback, can help improve writing quality on Amazon (review helpfulness) and Airbnb (host profile trustworthiness) by > 14% (a 3X improvement over recent automated feedback techniques).
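A minimal sketch of the perturbation idea described above: given a trained tree ensemble that predicts text quality from writing features, perturb each feature of a document and rank features by how much the predicted quality improves. The feature names, data, and perturbation grid are hypothetical placeholders, not the paper's actual setup.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
features = ["num_sentences", "avg_sentence_len", "concreteness", "first_person_ratio"]
X = rng.normal(size=(5_000, len(features)))
y = 0.6 * X[:, 2] - 0.3 * np.abs(X[:, 1]) + 0.1 * X[:, 0] + rng.normal(0, 0.1, 5_000)

model = GradientBoostingRegressor().fit(X, y)  # stands in for the quality predictor

def feedback(doc, deltas=(-1.0, -0.5, 0.5, 1.0)):
    """For one document's feature vector, find the single-feature change
    that most improves predicted quality."""
    base = model.predict(doc[None, :])[0]
    gains = {}
    for j, name in enumerate(features):
        best = 0.0
        for d in deltas:
            perturbed = doc.copy()
            perturbed[j] += d
            best = max(best, model.predict(perturbed[None, :])[0] - base)
        gains[name] = best
    return sorted(gains.items(), key=lambda kv: -kv[1])

doc = rng.normal(size=len(features))
print(feedback(doc))  # features ranked by potential quality improvement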

5 citations


Cited by
Proceedings ArticleDOI
TL;DR: This paper introduces two types of camouflages based on recent empirical studies, i.e., the feature camouflage and the relation camouflage, and proposes a new model named CAmouflage-REsistant GNN (CARE-GNN) to enhance the GNN aggregation process with three unique modules against camouflages.
Abstract: Graph Neural Networks (GNNs) have been widely applied to fraud detection problems in recent years, revealing the suspiciousness of nodes by aggregating their neighborhood information via different relations. However, few prior works have noticed the camouflage behavior of fraudsters, which could hamper the performance of GNN-based fraud detectors during the aggregation process. In this paper, we introduce two types of camouflages based on recent empirical studies, i.e., the feature camouflage and the relation camouflage. Existing GNNs have not addressed these two camouflages, which results in their poor performance in fraud detection problems. Alternatively, we propose a new model named CAmouflage-REsistant GNN (CARE-GNN), to enhance the GNN aggregation process with three unique modules against camouflages. Concretely, we first devise a label-aware similarity measure to find informative neighboring nodes. Then, we leverage reinforcement learning (RL) to find the optimal amounts of neighbors to be selected. Finally, the selected neighbors across different relations are aggregated together. Comprehensive experiments on two real-world fraud datasets demonstrate the effectiveness of the RL algorithm. The proposed CARE-GNN also outperforms state-of-the-art GNNs and GNN-based fraud detectors. We integrate all GNN-based fraud detectors as an open-source toolbox: https://github.com/safe-graph/DGFraud. The CARE-GNN code and datasets are available at https://github.com/YingtongDou/CARE-GNN.

160 citations

Proceedings ArticleDOI
19 Oct 2020
TL;DR: This paper proposes a new model named CAmouflage-REsistant GNN (CARE-GNN) to enhance the GNN aggregation process with three unique modules against camouflages.
Abstract: Graph Neural Networks (GNNs) have been widely applied to fraud detection problems in recent years, revealing the suspiciousness of nodes by aggregating their neighborhood information via different relations. However, few prior works have noticed the camouflage behavior of fraudsters, which could hamper the performance of GNN-based fraud detectors during the aggregation process. In this paper, we introduce two types of camouflages based on recent empirical studies, i.e., the feature camouflage and the relation camouflage. Existing GNNs have not addressed these two camouflages, which results in their poor performance in fraud detection problems. Alternatively, we propose a new model named CAmouflage-REsistant GNN (CARE-GNN), to enhance the GNN aggregation process with three unique modules against camouflages. Concretely, we first devise a label-aware similarity measure to find informative neighboring nodes. Then, we leverage reinforcement learning (RL) to find the optimal amounts of neighbors to be selected. Finally, the selected neighbors across different relations are aggregated together. Comprehensive experiments on two real-world fraud datasets demonstrate the effectiveness of the RL algorithm. The proposed CARE-GNN also outperforms state-of-the-art GNNs and GNN-based fraud detectors. We integrate all GNN-based fraud detectors as an opensource toolbox https://github.com/safe-graph/DGFraud. The CARE-GNN code and datasets are available at https://github.com/YingtongDou/CARE-GNN.
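A minimal sketch of the label-aware neighbor-selection step described above: score each neighbor by the distance between its predicted fraud probability and the center node's, and keep only the closest neighbors before aggregating. This toy uses logistic regression as the label-aware scorer and a random graph; CARE-GNN itself learns the scorer jointly and tunes the keep threshold with RL.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_nodes, dim = 200, 8
X = rng.normal(size=(n_nodes, dim))                           # node features
y = (X[:, 0] + rng.normal(0, 0.5, n_nodes) > 0).astype(int)   # fraud labels (toy)
neighbors = {v: rng.choice(n_nodes, size=10, replace=False) for v in range(n_nodes)}

scorer = LogisticRegression().fit(X, y)   # label-aware similarity model
p = scorer.predict_proba(X)[:, 1]         # per-node fraud scores

def select_neighbors(v, keep=5):
    """Keep the `keep` neighbors whose fraud score is closest to node v's,
    filtering out camouflaged (dissimilar) neighbors before aggregation.
    In CARE-GNN, `keep` is chosen per relation by reinforcement learning."""
    nbrs = neighbors[v]
    dist = np.abs(p[nbrs] - p[v])          # label-aware distance
    return nbrs[np.argsort(dist)[:keep]]

v = 0
kept = select_neighbors(v)
h_v = X[kept].mean(axis=0)                 # mean-aggregate the selected neighbors
print(kept, h_v[:3])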

69 citations

Posted Content
TL;DR: The authors show that instruction tuning (finetuning a language model on a collection of tasks described via natural language instruction templates) substantially improves zero-shot performance on unseen tasks, and that the instruction-tuned model even outperforms few-shot GPT-3 by a large margin on several NLP tasks.
Abstract: This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially boosts zero-shot performance on unseen tasks. We take a 137B parameter pretrained language model and instruction-tune it on over 60 NLP tasks verbalized via natural language instruction templates. We evaluate this instruction-tuned model, which we call FLAN, on unseen task types. FLAN substantially improves the performance of its unmodified counterpart and surpasses zero-shot 175B GPT-3 on 19 of 25 tasks that we evaluate. FLAN even outperforms few-shot GPT-3 by a large margin on ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA, and StoryCloze. Ablation studies reveal that number of tasks and model scale are key components to the success of instruction tuning.
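A minimal sketch of "verbalizing" a task via instruction templates, the mechanism the abstract describes: each labeled example is rendered into several natural-language prompts, and the resulting (prompt, target) pairs across many tasks form the instruction-tuning mixture. The templates and example below are illustrative; FLAN's actual templates differ.

# Hypothetical NLI templates; {premise}/{hypothesis} are filled per example.
nli_templates = [
    "Premise: {premise}\nHypothesis: {hypothesis}\nDoes the premise entail the hypothesis? OPTIONS: yes, no, maybe",
    "{premise}\nBased on the paragraph above, can we conclude that \"{hypothesis}\"? OPTIONS: yes, no, maybe",
]

example = {
    "premise": "A person on a horse jumps over a broken down airplane.",
    "hypothesis": "A person is outdoors, on a horse.",
    "label": "yes",
}

# Each rendering becomes one (instruction_text, target) training pair.
for t in nli_templates:
    prompt = t.format(premise=example["premise"], hypothesis=example["hypothesis"])
    print(prompt, "->", example["label"])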

31 citations

Posted Content
TL;DR: In this paper, the authors introduce the concept of chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step.
Abstract: Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system where users can modify these Chains, along with their intermediate results, in a modular way. In a 20-person user study, we found that Chaining not only improved the quality of task outcomes, but also significantly enhanced system transparency, controllability, and sense of collaboration. Additionally, we saw that users developed new ways of interacting with LLMs through Chains: they leveraged sub-tasks to calibrate model expectations, compared and contrasted alternative strategies by observing parallel downstream effects, and debugged unexpected model outputs by "unit-testing" sub-components of a Chain. In two case studies, we further explore how LLM Chains may be used in future applications.
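A minimal sketch of the Chaining idea: each step wraps one primitive operation as a function from the previous step's output to its own output, and the chain runs them in sequence. `call_llm`, the instructions, and the example chain are hypothetical placeholders, not the paper's system.

from typing import Callable, List

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real text-completion API call here.
    return f"<model output for: {prompt[:40]}...>"

def make_step(instruction: str) -> Callable[[str], str]:
    """Wrap one primitive operation (e.g., split points, rewrite, merge)
    as a function from the previous step's output to this step's output."""
    return lambda text: call_llm(f"{instruction}\n\nInput:\n{text}")

def run_chain(steps: List[Callable[[str], str]], user_input: str) -> str:
    text = user_input
    for step in steps:            # output of one step becomes the next step's input
        text = step(text)
    return text

chain = [
    make_step("Extract the key claims from the following review."),
    make_step("For each claim, draft a polite rebuttal."),
    make_step("Merge the rebuttals into a single coherent reply."),
]
print(run_chain(chain, "The product broke after a week and support never answered."))

Exposing each intermediate `text` to the user is what enables the debugging and "unit-testing" of sub-components that the study reports.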

21 citations