Connecting Reasoning, Model Editing, and LLM Knowledge: RECKONING's Contribution to AI Research

28 Oct 2025

Abstract and 1. Introduction

  2. Background

  3. Method

  4. Experiments

    4.1 Multi-hop Reasoning Performance

    4.2 Reasoning with Distractors

    4.3 Generalization to Real-World Knowledge

    4.4 Run-time Analysis

    4.5 Memorizing Knowledge

  5. Related Work

  6. Conclusion, Acknowledgements, and References

A. Dataset

B. In-context Reasoning with Distractors

C. Implementation Details

D. Adaptive Learning Rate

E. Experiments with Large Language Models

Logical Reasoning Datasets and Benchmarks As a central building block of human cognition and intelligence [27], logical reasoning has been a long-pursued topic in the field of AI [2, 9, 13, 17, 45, 47, 54, 72]. Logical reasoning can generally be categorized into a trichotomy of deductive, inductive, and abductive reasoning [25]. Multiple datasets have been published that evaluate neural models’ ability on these three types of logical reasoning [6, 17, 71]. Early logical reasoning tasks focused on hypothesis classification: given a theory consisting of multiple facts and rules, a model must determine whether a hypothesis is correct. More recently, transformer-based language models have been used directly to solve this task in synthetic [17, 65], real-world [29], and adversarial [26, 61, 67] settings. However, simply predicting whether the hypothesis is valid does not elucidate whether the model correctly reasons over the provided knowledge. To better analyze and interpret the reasoning process of language models, newer tasks focus on generating a valid proof that explains the model’s decision [20, 73]. Our proposed method, RECKONING, is optimized for the hypothesis classification task, and we evaluate it on many of these datasets [28, 73, 29].
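
To make the hypothesis classification setup concrete, the sketch below shows a schematic instance in the style of synthetic rule-reasoning benchmarks such as ProofWriter. The field names and the flattened input format are illustrative stand-ins, not the exact schema of any of the cited datasets.

```python
# Schematic hypothesis-classification instance (illustrative; not the exact
# schema of ProofWriter or the other cited benchmarks).
example = {
    "facts": ["The cat is blue.", "The cat is big."],
    "rules": ["If something is blue then it is round."],
    "hypothesis": "The cat is round.",
    "label": True,  # the hypothesis follows from the facts and rules
}

def flatten(instance):
    """Serialize the theory and hypothesis into a single model input string."""
    theory = " ".join(instance["facts"] + instance["rules"])
    return f"Theory: {theory} Hypothesis: {instance['hypothesis']}"

print(flatten(example))
# A classifier (e.g., a fine-tuned PLM) would map this string to True/False.
```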

Logical Reasoning over Natural Language Historically, automatic logical reasoners used symbolic systems and formal languages as a knowledge representation [1, 42, 48, 51, 54, 80]. However, these systems were hard to scale up due to the knowledge-acquisition bottleneck and the brittleness of formal representations [34, 83]. With recent advances in transformer-based language modeling [75] and self-supervised pre-training [21, 59, 60], a new paradigm for logical reasoning emerged in which pre-trained language models (PLMs) are used as soft reasoners over knowledge expressed in natural language. Natural language as a knowledge representation allowed PLMs to handle raw input with diverse formats [15, 31], and PLMs were accordingly applied to various deductive [17], abductive [6], and inductive [28] reasoning tasks. However, language models as soft reasoners also showed structural weaknesses: their performance dropped on complex logical operations [13, 79], and their reasoning process was not interpretable [44, 64]. Consequently, a new line of work uses neuro-symbolic methods to combine the strengths of language models and symbolic reasoning [7, 14, 35, 40, 43]. In particular, the interpretability gap motivated modular, step-wise reasoning systems that use PLMs as intermediate modules [32, 58, 66, 68, 74, 82] to generate reasoning steps (e.g., proofs). In contrast to these works, our method RECKONING dynamically encodes natural language knowledge into the model's parameters, so that the model reasons by mixing contextual knowledge with pre-encoded parametric knowledge and determines a conclusion from its updated parameters.

Model Editing While our motivations are grounded in research on machine reasoning, the techniques we use are more commonly found in the area of model editing. Model editing modifies a model's parameters to correct its errors or update its knowledge. Several works propose hypernetwork-based methods that edit knowledge in a model by predicting parameter updates conditioned on new factual statements [30] or by transforming the gradients of newly provided facts [53] into local edits. Other approaches edit model behavior more directly, for example by modifying neuron outputs [19, 84], localizing the feed-forward layers responsible for factual recall and modifying their weights [49], or performing weight updates across multiple layers to make many edits simultaneously [50]. Similarly, our method rapidly edits the model parameters to add knowledge. However, our bi-level framework optimizes these edits for the reasoning task in the outer loop, allowing the model to learn to quickly memorize knowledge that supports its reasoning.
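
As a rough illustration of how a bi-level objective can couple memorization and reasoning, the PyTorch sketch below performs MAML-style inner-loop gradient steps on a placeholder memorization loss and then backpropagates an outer reasoning loss through those steps. The toy model, loss functions, and hyperparameters are all stand-ins and not the paper's actual implementation.

```python
# Minimal bi-level (MAML-style) sketch: the inner loop "memorizes" contextual
# knowledge via gradient steps; the outer loop optimizes the reasoning loss
# through those steps. Toy model, placeholder losses, synthetic tensors.
import torch
import torch.nn as nn
from torch.func import functional_call

model = nn.Linear(16, 2)                      # stand-in for a language model
outer_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 1e-2                               # placeholder inner-loop step size

def memorization_loss(params, knowledge):
    # Inner-loop objective: encode the provided facts/rules into the parameters.
    logits = functional_call(model, params, (knowledge,))
    return logits.pow(2).mean()               # placeholder objective

def reasoning_loss(params, question, label):
    # Outer-loop objective: answer the question from the *updated* parameters.
    logits = functional_call(model, params, (question,))
    return nn.functional.cross_entropy(logits, label)

knowledge = torch.randn(4, 16)                # synthetic contextual knowledge
question = torch.randn(1, 16)                 # synthetic question encoding
label = torch.tensor([1])

params = dict(model.named_parameters())

# Inner loop: a few gradient steps that memorize the knowledge.
for _ in range(3):
    grads = torch.autograd.grad(
        memorization_loss(params, knowledge),
        list(params.values()),
        create_graph=True,                    # keep the graph for the outer update
    )
    params = {name: p - inner_lr * g
              for (name, p), g in zip(params.items(), grads)}

# Outer loop: backpropagate the reasoning loss through the inner-loop updates so
# the model learns to memorize knowledge in a way that supports reasoning.
outer_opt.zero_grad()
reasoning_loss(params, question, label).backward()
outer_opt.step()
```

Because the inner-loop updates are differentiated in the outer step, the outer gradient reaches the model's initial parameters, which is what lets the edits themselves be optimized for the downstream reasoning objective.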

Language Models as Knowledge Bases Our work learns to reason by dynamically encoding contextual knowledge in the parameters of language models before answering questions about it. Previous studies have found that LLMs can store real-world facts learned during pre-training [5, 10, 11, 49, 63]. Learning these facts during pre-training allows language models to be prompted [38, 57, 70, 85] or adapted [8, 36, 37, 62] to produce them on demand. However, the knowledge in LLMs is latent and hard to identify or control: generations are sensitive to the specific wording of the prompt, and LLMs emit knowledge encoded in their parameters only when prompted appropriately [10, 18, 23, 56]. It is also difficult to inject or update knowledge in LLMs [49], and the way LLMs memorize knowledge is not optimized for their reasoning ability. In our work, we seek a way to add knowledge to LLMs in a controllable and adaptive manner that benefits downstream reasoning applications.
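
To illustrate the prompt sensitivity discussed above, the short sketch below probes a masked language model with two paraphrases of the same factual query using the Hugging Face transformers fill-mask pipeline. The model choice and prompts are arbitrary examples for illustration, not the probes used in the cited studies.

```python
# Probing parametric knowledge with cloze-style prompts (illustrative only).
# Requires: pip install transformers torch
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Two paraphrases of the same factual query; small wording changes can shift
# the model's top predictions, illustrating the prompt sensitivity noted above.
for prompt in [
    "The capital of France is [MASK].",
    "France's capital city is called [MASK].",
]:
    top = unmasker(prompt)[0]
    print(f"{prompt!r} -> {top['token_str']} (score={top['score']:.2f})")
```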

Authors:

(1) Zeming Chen, EPFL ([email protected]);

(2) Gail Weiss, EPFL ([email protected]);

(3) Eric Mitchell, Stanford University ([email protected]);

(4) Asli Celikyilmaz, Meta AI Research ([email protected]);

(5) Antoine Bosselut, EPFL ([email protected]).


This paper is available on arxiv under CC BY 4.0 DEED license.