PCC: Accelerating Chain-of-Thought and Context-Driven Inference in LLMs

This project was undertaken as coursework for CSC2210: Visual and Mobile Computing Systems, a graduate computer science course at the University of Toronto, which I took during my undergraduate studies.

In this project we studied methods for accelerating inference in large language models (LLMs), particularly for chain-of-thought (CoT) reasoning and for prompts that include prior context. We proposed a method called PCC (Parameterizing Context Clues), which uses an auxiliary network to parameterize the additional context clues in the prompt (e.g., prior context, the chain-of-thought reasoning) as LoRA parameters. These parameters are generated dynamically from the input and attached to LoRA heads in the LLM, adjusting the model's activations and steering its reasoning process.
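To make the idea concrete, here is a minimal sketch of the mechanism, assuming a small hypernetwork that maps a pooled context embedding to low-rank LoRA factors which are attached to a frozen base projection at inference time. Names like `ContextHypernet` and `DynamicLoRALinear` are hypothetical and the details differ from the actual architecture in the report.

```python
# Minimal sketch of the PCC idea (hypothetical names, not the exact architecture
# from the report). A small hypernetwork maps an embedding of the context clues
# to low-rank LoRA factors, which are attached to a frozen base linear layer.
import torch
import torch.nn as nn


class ContextHypernet(nn.Module):
    """Generates LoRA factors (A, B) from a pooled context embedding."""

    def __init__(self, ctx_dim: int, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.rank = rank
        self.in_features = in_features
        self.out_features = out_features
        self.to_a = nn.Linear(ctx_dim, rank * in_features)
        self.to_b = nn.Linear(ctx_dim, out_features * rank)

    def forward(self, ctx_emb: torch.Tensor):
        # ctx_emb: (ctx_dim,) pooled representation of the prior context / CoT
        a = self.to_a(ctx_emb).view(self.rank, self.in_features)
        b = self.to_b(ctx_emb).view(self.out_features, self.rank)
        return a, b


class DynamicLoRALinear(nn.Module):
    """Frozen base projection plus a context-dependent low-rank update."""

    def __init__(self, base: nn.Linear, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.scale = scale
        self.lora_a = None  # set per input by the hypernetwork
        self.lora_b = None

    def set_context(self, a: torch.Tensor, b: torch.Tensor):
        self.lora_a, self.lora_b = a, b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        if self.lora_a is not None and self.lora_b is not None:
            # Low-rank update: x @ A^T @ B^T, scaled
            y = y + self.scale * (x @ self.lora_a.t()) @ self.lora_b.t()
        return y


# Usage: parameterize the context once, then decode without re-reading it.
hyper = ContextHypernet(ctx_dim=512, in_features=1024, out_features=1024)
layer = DynamicLoRALinear(nn.Linear(1024, 1024))
ctx_emb = torch.randn(512)          # stand-in for an encoded context / CoT
layer.set_context(*hyper(ctx_emb))
out = layer(torch.randn(4, 1024))   # forward pass with context baked into LoRA
```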

We submitted this work to the ICML 2025 workshop on Test-Time Adaptation, but it was rejected due to insufficient experimental results and because a similar idea (unbeknownst to us at the time) had been proposed in concurrent work that was accepted to the main conference.

The project also included another method, Meta-LoRA, which learns LoRA parameters by conditioning on metadata about the LoRA head and the LLM's architecture and layer to better inform the parameter learning process. This method was not the focus of the project but was included as a side experiment to explore whether such metadata could improve LoRA parameter learning. It appears in the project report but not in the workshop submission. A sketch of the conditioning idea follows.
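The following is a minimal, hypothetical sketch of that conditioning idea, not the method from the report: metadata about the target module (layer index, module type) is embedded and concatenated with the features that drive LoRA parameter generation.

```python
# Minimal sketch of the Meta-LoRA idea (hypothetical names; details differ
# from the report). Module metadata is embedded and fed alongside the input
# features used to generate the LoRA factors.
import torch
import torch.nn as nn


class MetaLoRAGenerator(nn.Module):
    def __init__(self, feat_dim: int, n_layers: int, n_module_types: int,
                 in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.rank = rank
        self.in_features = in_features
        self.out_features = out_features
        self.layer_emb = nn.Embedding(n_layers, 32)          # which transformer layer
        self.module_emb = nn.Embedding(n_module_types, 32)   # e.g. q/k/v/o projection
        hidden = feat_dim + 64
        self.to_a = nn.Linear(hidden, rank * in_features)
        self.to_b = nn.Linear(hidden, out_features * rank)

    def forward(self, feats: torch.Tensor, layer_idx: torch.Tensor, module_idx: torch.Tensor):
        # feats: (feat_dim,) features driving parameter generation
        meta = torch.cat([self.layer_emb(layer_idx), self.module_emb(module_idx)], dim=-1)
        h = torch.cat([feats, meta], dim=-1)
        a = self.to_a(h).view(self.rank, self.in_features)
        b = self.to_b(h).view(self.out_features, self.rank)
        return a, b


# Usage: generate LoRA factors for, say, layer 5's query projection.
gen = MetaLoRAGenerator(feat_dim=512, n_layers=32, n_module_types=4,
                        in_features=1024, out_features=1024)
a, b = gen(torch.randn(512), torch.tensor(5), torch.tensor(0))
```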

Link to the project report and Link to the workshop paper we submitted.