Click here for my full academic CV (updated January 2023) and here for my project-based resume.
See my Google Scholar or Semantic Scholar page for the most up-to-date list of publications.
Papers
Current preprints (submitted/unpublished)
- Rastogi, C., Teh, T. H., Mishra, P., Patel, R., Ashwood, Z., Davani, A. M., Díaz, M., Paganini, M., Parrish, A., Wang, D., Prabhakaran, V., Aroyo, L., Rieser, V. (2024). Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups. Preprint link: https://arxiv.org/abs/2410.17032
- Röttger et al. (2025). MSTS: A Multimodal Safety Test Suite for Vision-Language Models. Preprint link: https://arxiv.org/abs/2501.10057
- Mishra et al. (2025). Nuanced Safety for Generative AI: How Demographics Shape Responsiveness to Severity. Preprint link: https://arxiv.org/abs/2503.05609
2025
- Ghosh et al. (2025). AILuminate: Introducing v1.0 of the AI risk and reliability benchmark from MLCommons. Preprint link: https://arxiv.org/abs/2503.05731
2024
- Pfohl, S. [+30 authors]. A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models. (2024). Nature Medicine.
- *Parrish, A., *Prabhakaran, V., Aroyo, L., Homan, C., Taylor, A., Díaz, M., Wang, D., Serapio-García, G. (2024). Diversity-aware annotation for conversational AI safety. In the Proceedings of Safety for Conversational AI Workshop. *Equal contribution.
- *Quaye, J., *Parrish, A., Inel, O., Rastogi, C., Kirk, H. R., Kahng, M., van Liemt, E., Bartolo, M., Tsang, J., White, J., Clement, N., Mosquera, R., Ciro, J., Reddi, V. J., Aroyo, L. (2024). Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation. In the Proceedings of FAccT. Preprint link: https://arxiv.org/abs/2403.12075. *Equal contribution.
- Parrish, A., Hao, S., Laszlo, S., & Aroyo, L. (2024). “Is a picture of a bird a bird”: A mixed-methods approach to understanding diverse human perspectives and ambiguity in machine vision models. In the Proceedings of NLPerspectives.
- *Homan, C. M., *Serapio-García, G., Aroyo, L., Díaz, M., Parrish, A., Prabhakaran, V., Taylor, A. S., & Wang, D. (2024). Intersectionality in AI Safety: Using Multilevel Models to Understand Diverse Perceptions of Safety in Conversational AI. In the Proceedings of NLPerspectives.
- Oala, L., [+37 authors] (2024). DMLR: Data-centric Machine Learning Research — Past, Present and Future. In the Journal of Data-centric Machine Learning Research. Preprint link: https://arxiv.org/abs/2311.13028. Featured certification.
- Prabhakaran, V., Homan, C., Aroyo, L., Parrish, A., Taylor, A., Díaz, M., Wang, D. (2024). GRASP Metrics: A Framework to Assess (Dis)agreement Among Diverse Rater Groups. To appear in Proceedings of NAACL. Preprint link: https://arxiv.org/abs/2311.05074
- *Aroyo, L., *Taylor, A. S., Díaz, M., Homan, C. M., Parrish, A., Serapio-García, G., Prabhakaran, V., & Wang, D. (2024). DICES Dataset: Diversity in Conversational AI Evaluation for Safety. In Proceedings of the NeurIPS 2023 Datasets and Benchmarks Track. Preprint link: https://arxiv.org/abs/2306.11247
- Mazumder, M., [+43 authors]. DataPerf: Benchmarks for data-centric AI development. (2024). In Proceedings of the NeurIPS 2023 Datasets and Benchmarks Track. Preprint link: https://arxiv.org/pdf/2207.10062.pdf.
- Chen, A., Phang, J., Parrish, A., Padmakumar, V., Zhao, C., Bowman, S. R., & Cho, K. (2024). Two failures of self-consistency in the multi-step reasoning of LLMs. In Transactions on Machine Learning Research.
- Four large-scale collaborations:
- Imagen Team, Google [250 authors]. Imagen 3. Preprint link: https://arxiv.org/abs/2408.07009
- Gemma Team [204 authors]. Gemma 2: Improving Open Language Models at a Practical Size. Preprint link: https://arxiv.org/pdf/2408.00118
- Gemini Team Google [+1134 authors]. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Preprint link: https://arxiv.org/abs/2403.05530
- Vidgen, B. [+96 authors]. Introducing v0.5 of the AI Safety Benchmark from MLCommons. (2024) Preprint link: https://arxiv.org/abs/2404.12241
2023
- McKenzie, I. R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., McLean, E., Kirtland, A., Ross, A., Liu, A., Gritsevskiy, A., Wurgaft, D., Kauffman, D., Recchia, G., Liu, J., Cavanagh, J., Weiss, M., Huang, S., The Floating Droid, Tseng, T., Korbak, T., Shen, X., Zhang, Y., Zhou, Z., Kim, N., Bowman, S. R., & Perez, E. (2023). Inverse Scaling: When Bigger Isn’t Better. Transactions on Machine Learning Research. Featured certification.
- Goldman, E., Bou-Dargham, S., Lai, M., Guda, A., Fallon, J., Hauptman, M., Reinoso, A., Phillips, S., Abrams, E., Parrish, A., Pylkkänen, L. (2023). MEG correlates of speech planning in simple vs. interactive picture naming in children and adults. PLoS ONE 18(10): e0292316.
- Michael, J., Holtzman, A., Parrish, A., Mueller, A., Wang, A., Chen, A., Madaan, D., Nangia, N., Pang, R. Y., Phang, J., & Bowman, S. R. (2023). What Do NLP Researchers Believe? Results of the NLP Community Metasurvey. In Proceedings of the Association for Computational Linguistics (ACL) 2023.
- Three large-scale collaborations:
- Gemini Team Google (941 authors). (2023). Gemini: A Family of Highly Capable Multimodal Models. arXiv link: https://arxiv.org/abs/2312.11805
- Google (128 authors). (2023). PaLM 2 technical report. arXiv link: https://arxiv.org/abs/2305.10403
- BIG-Bench (444 authors). (2023). Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research.
2022
- *Parrish, A., *Trivedi, H., Nangia, N., Padmakumar, V., Phang, J., Saimbhi, A. S., & Bowman, S. R. (2022). Two-Turn Debate Does Not Help Humans Answer Hard Reading Comprehension Questions. In Proceedings of the 2022 NeurIPS Machine Learning Safety Workshop. *Equal contribution. Best paper award.
- *Pang, R. Y., *Parrish, A., *Joshi, N., Nangia, N., Phang, J., Chen, A., Padmakumar, V., Ma, J., Thompson, J., He, H. & Bowman, S. R. (2022). QuALITY: Question Answering with Long Input Texts, Yes!. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) 2022. *Equal contribution.
- Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2022). BBQ: A Hand-Built Bias Benchmark for Question Answering. In Findings of the Association for Computational Linguistics (ACL) 2022.
- *Parrish, A., *Trivedi, H., *Perez, E., Chen, A., Nangia, N., Phang, J., & Bowman, S. R. (2022). Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions. In Proceedings of the Workshop on Learning from Natural Language Supervision (LNLS) 2022. *Equal contribution.
- Parrish, A. & Pylkkänen, L. (2022). Conceptual combination in the LATL with and without syntactic composition. Neurobiology of Language. https://doi.org/10.1162/nol_a_00048.
2021
- Parrish, A., Huang, W., Agha, O., Lee, S.-H., Nangia, N., Warstadt, A., Aggarwal, K., Allaway, E., Linzen, T., & Bowman, S. (2021). Does Putting a Linguist in the Loop Improve NLU Data Collection? In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021.
- *Parrish, A., *Schuster, S., *Warstadt, A., Agha, O., Lee, S.-H., Zhao, Z., Bowman, S. R., & Linzen, T. (2021). NOPE: A Corpus of Naturally-Occurring Presuppositions in English. In the Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL) 2021. *Equal contribution.
2020
- Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S. F., & Bowman, S. (2020). BLiMP: The Benchmark of Linguistic Minimal Pairs for English. Transactions of the Association for Computational Linguistics, 8. (pp. 377–392).
- Parrish, A. & Cournane, A. (2020). A within-subjects comparison of the acquisition of quantity-related inferences. In Proceedings of the Linguistic Society of America, 5(1). (pp. 558–572). https://doi.org/10.3765/plsa.v5i1.4731.
2019
- *Warstadt, A., *Cao, Y., *Grosu, I., *Peng, W., *Blix, H., *Nie, Y., *Alsop, A., *Bordia, S., *Liu, H., *Parrish, A., *Wang, S. F., *Phang, J., *Mohananey, A., *Htut, P. M., *Jeretič, P., & Bowman, S. R. (2019). Investigating BERT’s knowledge of language: Five analysis methods with NPIs. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2019. *Equal contribution.
- *Durvasula, K. & *Parrish, A. (2019). Is there phonological feature priming? Linguistics Vanguard, 5(1). DOI: 10.1515/lingvan-2018-0041. *Equal contribution.
- Parrish, A. & Feldscher, C. (2019). On the structure of splitting verbs in Yoruba. In Theory and description in African Linguistics (pp. 537–554). Berlin: Language Science Press. http://doi.org/10.5281/zenodo.3367183.
Unpublished Manuscripts
- *Wang, D., *Díaz, M., *Parrish, A., Aroyo, L., Homan, C. M., Serapio-García, G., Prabhakaran, V., Taylor, A. S. (2023). All that Agrees Is Not Gold: Evaluating Ground Truth Labels and Dialogue Content for Safety. Preprint link: https://research.google/pubs/pub52726.pdf. *Equal contribution.
- Parrish, A., Rodriguez, A., & Pylkkänen, L. (2023). Non-Local Conceptual Combination. Preprint link: https://www.biorxiv.org/content/10.1101/2022.12.11.519989v1
- *Parrish, A., *Kirk, H. R., *Quaye, J., *Rastogi, C., *Bartolo, M., *Inel, O., Ciro, J., Mosquera, R., Howard, A., Cukierski, W., Sculley, D., *Reddi, V. J., & *Aroyo, L. (2023). Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models. Preprint link: https://arxiv.org/abs/2305.14384. *Equal contribution.
- Parrish, A. (2022). The Interaction Between Conceptual Combination and Linguistic Structure. Unpublished PhD thesis, New York University.
- Parrish, A. (2017). Incremental processing effects in nominal compounds. Unpublished MA thesis, Michigan State University.