I’m a second-year Ph.D. student in the CILVR lab at New York University, advised by Mengye Ren. My research is supported by the NSERC PGS-D scholarship. I obtained my M.Sc. in Computer Science, advised by Richard Zemel, and my B.A.Sc. in Engineering Science from the University of Toronto.

Recently, my work has focused on:

  • Unsupervised visual representation learning using minimally curated data – specifically first-person, long-form and multi-object videos with no obvious subject, foreground, or background.
  • The effect of architectural elements in ViTs, such as patchification, non-spatial tokens, and positional embeddings, on learned representations.

Looking ahead, I’d also like to explore the limits of video benchmarks and what they may, or may not, be telling us about our video representations. Finally, I’m interested in the use of 3D and video understanding in reasoning with large pretrained language models.

Feel free to contact me at anw2067 [at]