Modelling coreference resolution across syntax, semantics, and discourse

Modelling coreference resolution across syntax, semantics, and discourse, Leverhulme Trust project RPG-2025-064, runs from September 2025 to August 2027. The Principal Investigator is Prof. Mary Dalrymple, and the Postdoctoral Research Fellow is Dr. Chit Fung Lam (Lawrence).

Coreference is a linguistic phenomenon in which different words refer to the same entity. For example, in the sentence "Thomas said he shaved himself", "he" and "himself" may both refer to Thomas. The referential relationships between words often vary with sentence structure, meaning, and context. Understanding coreference across languages is a key challenge for theoretical, experimental, and computational linguistics.

While coreference has been studied in many languages, little attention has been paid to Cantonese, which exhibits complex coreference behaviour. Theoretical work on Cantonese coreference is scarce, and none to date provides a comprehensive analysis with formally explicit constraints that capture the interplay across all relevant levels of grammatical architecture. In natural language processing, Cantonese is an under-resourced language in terms of both data scale and diversity. To shed further light on cross-linguistic variation in coreference phenomena, our project therefore takes Cantonese as its main object language.

We will integrate theoretical, experimental, corpus, and computational approaches into a single holistic investigation. Our theoretical analysis will be conducted in the grammatical framework of Lexical-Functional Grammar (LFG). What makes LFG especially useful for this project is its ability to handle different levels of linguistic analysis simultaneously, from sentence structure to meaning and discourse. LFG works well with Glue Semantics, which helps us understand how different pieces of meaning fit together in a sentence, and with Discourse Representation Theory, which focuses on how information is carried across sentences in a discourse. Our research will also take into account existing analyses conducted in other linguistic frameworks, in particular Minimalism, in the spirit of facilitating cross-framework dialogue and understanding.

On the computational side, we will develop grammar fragments for Cantonese using the Xerox Linguistic Environment (XLE), a tool that allows linguistic constraints to be tested computationally and computational grammar resources to be created. In the current AI-driven landscape, handcrafted computational grammars grounded in well-defined linguistic theories have often been overlooked. However, machine-learning models, including LLMs, require vast amounts of training data, which are difficult and expensive to collect and annotate. Broad-coverage handcrafted grammars could, in principle, generate high-quality, well-annotated data for training and fine-tuning these models.

More broadly, our project aims to inform the development of the linguistic theory of coreference resolution by adducing solid empirical evidence. As part of our project, we will engage with the wider computational linguistics community via the Parallel Grammar Consortium to explore the future potential of handcrafted grammars in the AI era, bridging the current gap between theoretical and computational linguistics.

Papers, publications, presentations, and course materials

2026:

Syntax, Semantics, and Processing: A Constraint-Based Theory. Chit-Fung Lam (Lawrence). Online course taught at The NYI Global Institute of Cultural, Cognitive, and Linguistic Studies (V-NYI 2026), July 2026. Materials available: https://sites.google.com/view/c-f-lam/materials

Grammar Engineering Meets LLMs: Development of Cantonese and Irish ParGram Treebanks. Chit-Fung Lam (Lawrence) & Elaine Uí Dhonnchadha. The Third Workshop on Bridges and Gaps between Formal and Computational Linguistics. Université Paris Cité, Paris, 11 July 2026. Association for Computational Linguistics (To appear in ACL Anthology, archival long paper). Preprint: https://research.manchester.ac.uk/en/publications/grammar-engineering-meets-llms-development-of-cantonese-and-irish/

Extending the ParGram Treebank: Progress and Issues in Cantonese and Irish. Chit-Fung Lam (Lawrence) & Elaine Uí Dhonnchadha. LFG'26: The 31st International Lexical-Functional Grammar Conference. University of Jaffna, Sri Lanka, 27–30 July 2026. Conference paper in progress for the LFG'26 Proceedings.

Copy control and other control properties in Mandarin. Chit-Fung Lam (Lawrence) & Mary Dalrymple. Preprint manuscript: https://ling.auf.net/lingbuzz/009650

Controlling Overt Subject in Cantonese and Mandarin: Where Theory Meets Experiment and Grammar Engineering. Chit-Fung Lam (Lawrence). Plenary talk at Manchester Forum in Linguistics (MFiL). University of Manchester, 30 April–1 May 2026. https://mfilconf.wordpress.com/plenary-speakers/

Invited talk on coreference theory in Cantonese and Mandarin in the SynSem workshop. Chit-Fung Lam (Lawrence). University of Oxford, Trinity Term, 2026.

