Hima Lakkaraju



Contact

hlakkaraju@hbs.edu
hlakkaraju@seas.harvard.edu

Morgan Hall 491
Science and Engineering Complex 6.220

@hima_lakkaraju
lvhimabindu


I am an Assistant Professor at Harvard University with appointments in the Business School and the Department of Computer Science. Prior to my stint at Harvard, I received my PhD in Computer Science from Stanford University.

My research interests lie within the broad area of trustworthy machine learning. More specifically, I focus on improving the interpretability, fairness, privacy, robustness, and reasoning capabilities of different kinds of ML models, including large language models and other pre-trained models. My research addresses the following fundamental questions pertaining to human and algorithmic decision-making:

  1. How can we build interpretable and accurate models to assist in human decision-making?
  2. How do we identify and correct underlying biases in both human decisions and model predictions?
  3. How can we ensure that models and their interpretations are robust to adversarial and privacy attacks?
  4. How do we train and evaluate models when faced with missing counterfactuals?

These questions have far-reaching implications in domains involving high-stakes decisions such as health care, policy, law, and business.

I lead the AI4LIFE research group at Harvard and I recently co-founded the Trustworthy ML Initiative (TrustML) to help lower entry barriers into trustworthy ML and bring together researchers and practitioners working in the field. My research is generously supported by NSF, Google, Amazon, JP Morgan, Adobe, Bayer, Harvard Data Science Initiative, and D^3 Institute at Harvard. My work has been featured in various major media outlets including the New York Times, TIME magazine, Fortune, Forbes, MIT Technology Review, and Harvard Business Review.

Please check out my CV for more details about me and my research.

NOTE: I am looking for motivated graduate and undergraduate students and postdocs who are broadly interested in trustworthy machine learning and large pre-trained models. If you are excited about this line of research and would like to work with me, please read this before contacting me.

  • * below indicates equal contribution

Selected Preprints


  • In-context Unlearning: Language Models as Few Shot Unlearners
    Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
    pdf

  • Certifying LLM Safety Against Adversarial Prompting
    Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Li, Soheil Feizi, Himabindu Lakkaraju
    pdf

  • Are Large Language Models Post hoc Explainers?
    Nicholas Kroeger*, Dan Ley*, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
    pdf

  • The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
    Satyapriya Krishna*, Tessa Han*, Alex Gu, Javin Pombra, Shahin Jabbari, Steven Wu, Himabindu Lakkaraju
    Press: Fortune Magazine
    pdf


Publications


  • Quantifying Uncertainty in Natural Language Explanations of Large Language Models
    Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
    Spotlight Presentation, NeurIPS Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models, 2023.
    pdf

  • Fair Machine Unlearning: Data Removal while Mitigating Disparities
    Alex Oesterling, Jiaqi Ma, Flavio Calmon, Himabindu Lakkaraju
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
    pdf

  • TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
    Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju*, Sameer Singh*
    Nature Machine Intelligence, 2023.
    Outstanding Paper Award Honorable Mention, NeurIPS Workshop on Trustworthy and Socially Responsible ML, 2022.
    pdf

  • Evaluating Explainability for Graph Neural Networks
    Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik
    Nature Scientific Data, 2023.
    pdf

  • Post Hoc Explanations of Language Models Can Improve Language Models
    Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2023.
    pdf

  • Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability
    Usha Bhalla*, Suraj Srinivas*, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2023.
    pdf

  • Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness
    Suraj Srinivas*, Sebastian Bordt*, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2023.
    Spotlight Presentation [Top 3%]
    pdf

  • M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities, and Models
    Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong
    Advances in Neural Information Processing Systems (NeurIPS), 2023.

  • When Does Uncertainty Matter?: Understanding the Impact of Predictive Uncertainty in ML Assisted Decision Making
    Sean McGrath, Parth Mehta, Alexandra Zytek, Isaac Lage, Himabindu Lakkaraju
    Transactions on Machine Learning Research (TMLR), 2023.
    pdf

  • Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten
    Satyapriya Krishna*, Jiaqi Ma*, Himabindu Lakkaraju
    International Conference on Machine Learning (ICML), 2023.
    pdf

  • On the Impact of Actionable Explanations on Social Segregation
    Ruijiang Gao, Himabindu Lakkaraju
    International Conference on Machine Learning (ICML), 2023.
    pdf

  • On Minimizing the Impact of Dataset Shifts on Actionable Explanations
    Anna Meyer*, Dan Ley*, Suraj Srinivas, Himabindu Lakkaraju
    Conference on Uncertainty in Artificial Intelligence (UAI), 2023.
    Oral Presentation [Top 5%]
    pdf

  • Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse
    Martin Pawelczyk, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, Himabindu Lakkaraju
    International Conference on Learning Representations (ICLR), 2023.
    pdf

  • On the Privacy Risks of Algorithmic Recourse
    Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
    pdf

  • Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post hoc Explanations
    Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2022.
    Best Paper Award, ICML Workshop on Interpretable Machine Learning in Healthcare, 2022.
    pdf

  • Flatten the Curve: Efficiently Training Low-Curvature Neural Networks
    Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret
    Advances in Neural Information Processing Systems (NeurIPS), 2022.
    pdf

  • OpenXAI: Towards a Transparent Evaluation of Model Explanations
    Chirag Agarwal, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2022.
    pdf

  • Data Poisoning Attacks on Off-Policy Evaluation Methods
    Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
    Conference on Uncertainty in Artificial Intelligence (UAI), 2022.
    Oral Presentation [Top 5%]
    pdf

  • Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis.
    Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
    pdf

  • Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods.
    Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
    pdf

  • Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations.
    Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen Bach, Himabindu Lakkaraju
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2022.
    pdf

  • Towards Robust Off-Policy Evaluation via Human Inputs.
    Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2022.
    pdf

  • A Human-Centric Take on Model Monitoring.
    Murtuza N Shergadwala, Himabindu Lakkaraju, Krishnaram Kenthapadi
    AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2022.
    pdf

  • Towards the Unification and Robustness of Post hoc Explanation Methods.
    Sushant Agarwal, Shahin Jabbari, Chirag Agarwal*, Sohini Upadhyay*, Steven Wu, Himabindu Lakkaraju
    Symposium on Foundations of Responsible Computing (FORC), 2022.
    pdf

  • Towards Robust and Reliable Algorithmic Recourse.
    Sohini Upadhyay*, Shalmali Joshi*, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2021.
    Best Paper Runner Up, ICML Workshop on Algorithmic Recourse, 2021.
    pdf

  • Reliable Post hoc Explanations: Modeling Uncertainty in Explainability.
    Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju
    Advances in Neural Information Processing Systems (NeurIPS), 2021.
    pdf

  • Counterfactual Explanations Can Be Manipulated
    Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh
    Advances in Neural Information Processing Systems (NeurIPS), 2021.
    pdf

  • Learning Models for Algorithmic Recourse
    Alexis Ross, Himabindu Lakkaraju, Osbert Bastani
    Advances in Neural Information Processing Systems (NeurIPS), 2021.
    pdf

  • Towards the Unification and Robustness of Perturbation and Gradient Based Explanations.
    Sushant Agarwal, Shahin Jabbari, Chirag Agarwal*, Sohini Upadhyay*, Steven Wu, Himabindu Lakkaraju
    International Conference on Machine Learning (ICML), 2021.
    Spotlight Presentation
    pdf

  • Towards a Unified Framework for Fair and Stable Graph Representation Learning.
    Chirag Agarwal, Himabindu Lakkaraju, Marinka Zitnik
    Conference on Uncertainty in Artificial Intelligence (UAI), 2021.
    pdf

  • Fair influence maximization: A welfare optimization approach.
    Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Eric Rice, Milind Tambe
    AAAI Conference on Artificial Intelligence (AAAI), 2021.
    pdf

  • Does Fair Ranking Improve Minority Outcomes? Understanding the Interplay of Human and Algorithmic Biases in Online Hiring.
    Tom Suhr, Sophie Hilgard, Himabindu Lakkaraju
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2021.
    pdf

  • Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
    Kaivalya Rawal and Himabindu Lakkaraju.
    Advances in Neural Information Processing Systems (NeurIPS), 2020.
    pdf

  • Incorporating Interpretable Output Constraints in Bayesian Neural Networks
    Wanqian Yang, Lars Lorch, Moritz Gaule, Himabindu Lakkaraju, Finale Doshi-Velez.
    Advances in Neural Information Processing Systems (NeurIPS), 2020.
    Spotlight Presentation [Top 3%]
    pdf

  • Robust and Stable Black Box Explanations.
    Himabindu Lakkaraju, Nino Arsov, Osbert Bastani.
    International Conference on Machine Learning (ICML), 2020.
    pdf

  • Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods.
    Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju.
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2020.
    Oral Presentation
    pdf
    Press: deeplearning.ai | Harvard Business Review

  • "How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations.
    Himabindu Lakkaraju, Osbert Bastani.
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2020.
    Oral Presentation
    pdf

  • Faithful and Customizable Explanations of Black Box Models.
    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec.
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2019.
    Oral Presentation
    pdf

  • Human Decisions and Machine Predictions.
    Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, Sendhil Mullainathan.
    Quarterly Journal of Economics (QJE), 2018.
    Featured in MIT Technology Review, Harvard Business Review, The New York Times,
    and as Research Spotlight on the National Bureau of Economic Research front page.
    pdf

  • The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables.
    Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, Sendhil Mullainathan.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017.
    Oral Presentation
    pdf

  • Learning Cost-Effective and Interpretable Treatment Regimes.
    Himabindu Lakkaraju, Cynthia Rudin.
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
    INFORMS Data Mining Best Paper Award.
    Invited Talk at INFORMS Annual Meeting.
    pdf

  • Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration.
    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz.
    AAAI Conference on Artificial Intelligence (AAAI), 2017.
    Featured in Bloomberg Technology.
    pdf

  • Interpretable and Explorable Approximations of Black Box Models.
    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec.
    KDD Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT ML), 2017.
    Invited Talk at INFORMS Annual Meeting.
    pdf

  • Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making.
    Himabindu Lakkaraju, Jure Leskovec.
    Advances in Neural Information Processing Systems (NIPS), 2016.
    pdf

  • Interpretable Decision Sets: A Joint Framework for Description and Prediction.
    Himabindu Lakkaraju, Stephen H. Bach, Jure Leskovec.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016.
    Invited Talk at INFORMS Annual Meeting.
    pdf

  • Mining Big Data to Extract Patterns and Predict Real-Life Outcomes.
    Michal Kosinski, Yilun Wang, Himabindu Lakkaraju, Jure Leskovec.
    Psychological Methods, 2016.
    pdf

  • Learning Cost-Effective and Interpretable Regimes for Treatment Recommendation.
    Himabindu Lakkaraju, Cynthia Rudin.
    NIPS Workshop on Interpretable Machine Learning in Complex Systems, 2016.
    pdf

  • Learning Cost-Effective and Interpretable Treatment Regimes for Judicial Bail Decisions.
    Himabindu Lakkaraju, Cynthia Rudin.
    NIPS Symposium on Machine Learning and the Law, 2016.
    pdf

  • Discovering Unknown Unknowns of Predictive Models.
    Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Eric Horvitz.
    NIPS Workshop on Reliable Machine Learning in the Wild, 2016.
    pdf

  • A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes.
    Himabindu Lakkaraju, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani.
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2015.
    Oral Presentation
    pdf

  • A Bayesian Framework for Modeling Human Evaluations.
    Himabindu Lakkaraju, Jure Leskovec, Jon Kleinberg, Sendhil Mullainathan.
    SIAM International Conference on Data Mining (SDM), 2015.
    Oral Presentation
    pdf

  • Who, When, and Why: A Machine Learning Approach to Prioritizing Students at Risk of not Graduating High School on Time.
    Everaldo Aguiar, Himabindu Lakkaraju, Nasir Bhanpuri, David Miller, Ben Yuhas, Kecia Addison, Rayid Ghani.
    Learning Analytics and Knowledge Conference (LAK), 2015.
    pdf

  • What's in a name? Understanding the Interplay Between Titles, Content, and Communities in Social Media.
    Himabindu Lakkaraju, Julian McAuley, Jure Leskovec.
    International AAAI Conference on Weblogs and Social Media (ICWSM), 2013.
    Oral Presentation
    Featured in Time, Forbes, Phys.Org, Business Insider.
    pdf

  • Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media.
    Himabindu Lakkaraju, Indrajit Bhattacharya, Chiranjib Bhattacharyya.
    IEEE International Conference on Data Mining (ICDM), 2012.
    Oral Presentation
    pdf

  • TEM: a novel perspective to modeling content on microblogs.
    Himabindu Lakkaraju, Hyung-Il Ahn.
    International World Wide Web Conference (WWW), short paper, 2012.
    pdf

  • Exploiting Coherence for the Simultaneous Discovery of Latent Facets and associated Sentiments.
    Himabindu Lakkaraju, Chiranjib Bhattacharyya, Indrajit Bhattacharya, Srujana Merugu.
    SIAM International Conference on Data Mining (SDM), 2011.
    Best Paper Award.
    pdf

  • Attention prediction on social media brand pages.
    Himabindu Lakkaraju, Jitendra Ajmera.
    ACM Conference on Information and Knowledge Management (CIKM), 2011.
    pdf

  • Smart news feeds for social networks using scalable joint latent factor models.
    Himabindu Lakkaraju, Angshu Rai, Srujana Merugu.
    International World Wide Web Conference (WWW), short paper, 2011.
    pdf


Patents


  • Extraction and Grouping of Feature Words.
    Himabindu Lakkaraju, Chiranjib Bhattacharyya, Sunil Aravindam, Kaushik Nath.
    US8484228

  • Enhancing knowledge bases using rich social media.
    Jitendra Ajmera, Shantanu Ravindra Godbole, Himabindu Lakkaraju, Bernard Andrew Roden, Ashish Verma.
    US20130224714

I am very fortunate to be working with the following core group of students, interns, postdocs, and research affiliates:

  • Suraj Srinivas (Postdoc, Harvard University)
  • Chirag Agarwal (Postdoc, Harvard University)
  • Martin Pawelczyk (Postdoc, Harvard University); Co-advised with Seth Neel
  • Aounon Kumar (Postdoc, Harvard University)
  • Tessa Han (PhD Student, Harvard University)
  • Satyapriya Krishna (PhD Student, Harvard University)
  • Usha Bhalla (PhD Student, Harvard University)
  • Dan Ley (PhD Student, Harvard University)
  • Alex Oesterling (PhD Student, Harvard University); Co-advised with Flavio Calmon
  • Hanlin Zhang (PhD Student, Harvard University); Co-advised with Sham Kakade
  • Paul Hamilton (PhD Student, Harvard University)
  • Yanchen Liu (Masters Student, Harvard University)
  • Sree Harsha Tanneru (Masters Student, Harvard University)
  • Nikhil Nayak (Masters Student, Harvard University)
  • Aaron Li (Masters Student, Harvard University)
  • Catherine Huang (Undergrad, Harvard University)
  • Charu Badrinath (Undergrad, Harvard University)
  • Eric Shen (Undergrad, Harvard University)
  • Christina Xiao (Undergrad, Harvard University)

Alumni (Past Advisees, Close Collaborators, and Visitors):

  • Jiaqi Ma (Postdoc, Harvard University --> Assistant Professor, UIUC)
  • Dylan Slack (PhD Student, UC Irvine --> Research Scientist, Scale AI)
  • Alexis Ross (Undergraduate Student, Harvard University -- Winner of Hoopes Prize for Best Undergrad Thesis --> PhD Student, MIT EECS)
  • Isha Puri (Undergraduate Student, Harvard University --> PhD Student, MIT EECS)
  • Jessica Dai (Undergraduate Student, Brown University --> PhD Student, UC Berkeley EECS)
  • Aditya Karan (Masters Student, Harvard University --> PhD Student, UIUC CS)
  • Kaivalya Rawal (Masters Student, Harvard University --> Fiddler AI)
  • Ethan Kim (Undergraduate Student, Harvard University --> Cyndx)
  • Eshika Saxena (Undergraduate Student, Harvard University)

  • Sophie Hilgard (PhD Student, Harvard University --> Research Scientist, Twitter)
  • Sushant Agarwal (Masters Student, University of Waterloo --> PhD Student, Northeastern University)

  • Harvineet Singh (PhD Student, New York University; Research Intern, Harvard University --> Postdoc UCSF/UC Berkeley)
  • Tom Suhr (MS Student, TU Berlin; Research Fellow, Harvard University --> PhD Student, Max Planck Institute)
  • Elita Lobo (PhD Student, UMass Amherst; Research Intern, Harvard University)
  • Anna Meyer (PhD Student, University of Wisconsin; Research Intern, Harvard University)
  • Ruijiang Gao (PhD Student, University of Texas at Austin; Research Intern, Harvard University)
  • Vishwali Mhasawade (PhD Student, New York University; Research Intern, Harvard University)
  • Nick Kroeger (PhD Student, University of Florida; Research Intern, Harvard University)
  • Chhavi Yadav (PhD Student, UC San Diego; Research Intern, Harvard University)
  • Davor Ljubenkov (Fulbright Scholar; Research Fellow, Harvard University)

  • Introduction to Data Science and Machine Learning
    Instructor
    Harvard University, Fall 2020 - 2023.

  • Explainable AI: From Simple Predictors to Complex Generative Models
    Instructor
    Harvard University, Fall 2019, Spring 2021, Spring 2023.

  • Introduction to Data Science
    Guest Lecture
    Stanford Law School, 2016.

  • Probability with Mathemagic
    Co-Instructor
    Stanford Splash Initiative for High School Students, 2016.

  • Mining Massive Datasets Course
    Teaching Assistant
    Stanford Computer Science, 2016.

  • Submodular Optimization
    Guest Lecture
    Mining Massive Datasets Course, Stanford, 2016.

  • Introduction to Python Programming
    Co-Instructor
    Stanford Girls Teaching Girls to Code Initiative for High School Students, 2015.

  • Mathematics and Science
    Tutor
    Dreamcatchers Non-Profit Organization, Palo Alto, 2015.

  • Social and Information Network Analysis Course
    Head Teaching Assistant
    Stanford Computer Science, 2014.

  • Machine Learning Course
    Teaching Assistant
    Indian Institute of Science, 2010.

  • English and Mathematics
    Tutor
    UNICEF's Teach India Initiative, 2008 - 2010.