Harvard University Department of Statistics

Welcome to the Statistics Department at Harvard. We help students acquire the conceptual, computatio

If you are interested in donating to our Department, please visit https://community.alumni.harvard.edu/give/78328663.

09/05/2024

Read “HEDE: Heritability estimation in high dimensions by Ensembling Debiased Estimators” by Yanke Song, Xihong Lin, and Pragya Sur. They develop a new method for estimating heritability, or signal-to-noise ratio, in challenging high-dimensional scenarios, with applications to statistical genetics:

Estimating heritability remains a significant challenge in statistical genetics. Diverse approaches have emerged over the years that are broadly categorized as either random effects or fixed effects heritability methods. In this work, we focus on the latter. We propose HEDE, an ensemble approach to....

08/29/2024

Congrats to Kevin Luo, Yufan Li, and Pragya Sur on their paper “ROTI-GCV: Generalized Cross-Validation for right-ROTationally Invariant Data.” They introduce ROTI-GCV, a new framework for cross-validation under sample dependence and heavy-tailed covariates:

Two key tasks in high-dimensional regularized regression are tuning the regularization strength for good predictions and estimating the out-of-sample risk. It is known that the standard approach -- $k$-fold cross-validation -- is inconsistent in modern high-dimensional settings. While leave-one-out....

08/28/2024

Tune into the Stats and Stories podcast to explore how Bayesian methods are impacting the environmental landscape. The podcast features Noel Cressie, Distinguished Professor at the University of Wollongong, Australia, and Director of its Centre for Environmental Informatics:

Would you be surprised if a wombat won a statistical achievement award? Well, our guest Noel Cressie is here to talk about the WOllongong Methodology for Bayesian Assimilation of Trace-gases and how it can impact the environmental landscape.

08/27/2024

To all literature & stats fans, how similar is stats analysis to close readings? Take a look at responses in Statistical Modeling, Causal Inference, and Social Science blog by Stats PhD Alum Prof. Andrew Gelman: https://statmodeling.stat.columbia.edu/2024/08/14/close-reading-in-literary-criticism-and-statistical-analysis/.

Close reading in literary criticism and statistical analysis. Posted on August 14, 2024 9:14 AM by Andrew. I happened to come across this comment from Jrc a few years back: A lot of your posts about journalism and writing are almost like literary criticism – close reading for context, picking out pa...

08/26/2024

For the last meeting, our Data & Society Book Club enjoyed the outdoors while discussing The Harvard Data Science Review's Special Issue 5 "Future Shock: Grappling With the Generative AI Revolution." Sponsored by the Stats Department, the book club was co-led by Julie Vu and Emily Palmer and included participants from across the Harvard campus. During this last meeting, we shared our insights and reactions to problems associated with GenAI - thanks for a lively discussion!

08/23/2024

Thank you, Harvard Data Science Review, for your insightful article "Is ChatGPT More Biased Than You?" The article examines bias in LLMs and explores how to reduce it. Read more here:

AI is changing the world in ways that are difficult to forecast, but the impact will surely be enormous. Large language models (LLMs) are the most recent AI system that has captured the public eye. The rise of AI and LLMs offers efficiency and assistance, but raises questions of job loss, fairness,....

08/22/2024

Read "Predictive Inference in Multi-environment Scenarios" by John C. Duchi, Suyash Gupta, Kuanhao Jiang and Pragya Sur, which addresses the challenge of constructing valid confidence intervals in problems of prediction across multiple environments:

We address the challenge of constructing valid confidence intervals and sets in problems of prediction across multiple environments. We investigate two types of coverage suitable for these problems, extending the jackknife and split-conformal methods to show how to obtain distribution-free coverage....

08/20/2024

Check out the "Statistics Behind the Headlines" audiobook from the Stats and Stories podcast, in which John Bailer and Rosemary Pennington provide a "roadmap to statistical literacy and a data self-defense course"!

Listen to the full audiobook of The Statistics Behind the Headlines with John Bailer and Rosemary Pennington.

08/19/2024

Take a look at the Institute of Mathematical Statistics (IMS) August Bulletin, which features the 2024 Gottfried E. Noether Early Career Scholar Award presentations by Edgar Dobriban, Associate Professor of Statistics and Data Science at Wharton, and Lucas Janson, Associate Professor in the Harvard Statistics Department:

08/16/2024

Congratulations to Yanke Song, Sohom Bhattacharya, and Pragya Sur on their new paper, “Generalization error of min-norm interpolators in transfer learning”. They develop a new framework for quantifying the generalization error of min-norm interpolators under transfer learning: https://arxiv.org/abs/2406.13944.

08/05/2024

Congrats to Huang, Austern & Orbanz on "Gaussian Universality for Approximately Polynomial Functions of High-dimensional Data." They establish an invariance principle for polynomial functions of n independent high-dimensional random vectors: https://arxiv.org/abs/2403.10711

07/30/2024

Suqi Liu and Morgane Austern characterize the performance of graph neural networks for graph alignment problems in the presence of vertex feature information in a recent paper: https://arxiv.org/abs/2402.07340

07/30/2024

“Future Shock: Grappling With the Generative AI Revolution,” a special issue of the Harvard Data Science Review, explores societal consequences of rapid AI advances. Related articles will appear through 11/30 (the 2nd anniversary of ChatGPT). Read Xiao-Li Meng & David Leslie's intro:

In the jarring opening scene of the 1972 documentary Future Shock, a film directed by Alexander Grasshoff and based on the classic study by sociologist and futurist Alvin Toffler (1970), the blurred silhouettes of a man and woman walk side by side down a serene country road. As the two figures amble...

07/26/2024

In Talebi, Zheng, Kraisler, Li, & Mesbahi's recent paper, discover how a geometric perspective on policy optimization transforms feedback control systems, revealing new insights into stability, performance, and algorithmic design: https://arxiv.org/abs/2406.04243

07/26/2024

Read Daiqi Gao, Yuanjia Wang & Donglin Zeng's paper "Fusing Individualized Treatment Rules (ITR) Using Secondary Outcomes." They seek an ITR that optimizes the primary outcome and causes minimal harm to secondary outcomes:

An individualized treatment rule (ITR) is a decision rule that recommends treatments for patients based on their individual feature variables. In many practices, the ideal ITR for the primary outco...

07/25/2024

Congrats to Melody Huang on her paper "Overlap Violations in External Validity." Huang introduces a framework for considering external validity in the presence of overlap violations: https://arxiv.org/abs/2403.19504

07/25/2024

Congrats to Samy Jelassi, David Brandfonbrener, Sham Kakade, & Eran Malach on their paper "Repeat After Me: Transformers are Better than State Space Models at Copying": https://arxiv.org/abs/2402.01032

07/24/2024

Congrats to Alan Chung, Amin Saberi & Morgane Austern on "Statistical Guarantees for Link Prediction using Graph Neural Networks (GNN)." They propose a linear GNN architecture that is used to produce consistent estimators for the underlying edge probabilities in the graphon random graph model. Read more here:

This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on th...

07/24/2024

Anna Trella, Kelly Zhang, Inbal Nahum-Shani, Vivek Shetty, Iris Yan, Finale Doshi-Velez, and Susan Murphy introduce a new framework for monitoring the fidelity of online reinforcement learning algorithms in clinical trials:

Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This p...

07/23/2024

In a recent paper, PhD student Zeyang Jia and Professors Kosuke Imai & Michael Lingzhi Li show that the cram method, compared to sample-splitting, reduces the evaluation standard error by more than 40% & improves learned policy performance:

We introduce the "cram" method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning (ML) algorithm. In a single pass of batched data, the proposed method repeatedly trains an ML algorithm and tests its empirical performance. Because it utilizes t...

07/19/2024

Read Ziping Xu, Kelly Zhang, & Susan Murphy’s paper on “The Fallacy of Minimizing Local Regret in the Sequential Task Setting.” To realize the optimal cumulative regret bound across all the tasks, algorithms have to overly explore in the earlier tasks:

In the realm of Reinforcement Learning (RL), online RL is often conceptualized as an optimization problem, where an algorithm interacts with an unknown environment to minimize cumulative regret. In a stationary setting, strong theoretical guarantees, like a sublinear ($\sqrt{T}$) regret bound, can b...

07/18/2024

Check out PhD student Haoyu Ye, Peter Orbanz and Morgane Austern's paper on “Poisson Approximation for Stochastic Processes Summed Over Amenable Groups”:

We generalize the Poisson limit theorem to binary functions of random objects whose law is invariant under the action of an amenable group. Examples include stationary random fields, exchangeable sequences, and exchangeable graphs. A celebrated result of E. Lindenstrauss shows that normalized sums o...

07/18/2024

Take a look at the paper "On Naive Mean-Field Approximation for high-dimensional canonical GLMs" by Prof. Sumit Mukherjee, PhD graduate Jiaze Qiu, and Prof. Subhabrata Sen: https://arxiv.org/abs/2406.15247

07/17/2024

Qi, Zhang, Xing, Kakade & Lakkaraju study the risk of datastore leakage in Retrieval-In-Context RAG Language Models. They design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries:

Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-followin...

07/17/2024

Check out Stats Prof. Zheng Tracy Ke, Pengsheng Ji, Jiashun Jin, & Wanshan Li’s paper on “Recent Advances in Text Analysis” (in the Annual Review of Statistics and Its Application). They conduct text analysis on a stats publication dataset to analyze trends in stats research from 1975-2015:

Address

1 Oxford Street
Cambridge, MA
02138
