Omnibus

Semantic PRMs

Research Project

December 2024

Most AI models are judged solely on their final answers, especially in natural language tasks. Process Reward Models (PRMs) have proven effective in formal mathematical domains, where each step can be verified against axiomatic truths, but their application to non-empirical tasks remains largely unexplored. Using LLaMA 3.1 (8B parameters), I developed an approach that extends PRM principles from mathematical proof verification to semantic reasoning tasks, specifically The New York Times' Connections game.

The game presents players with a 4x4 grid of 16 words that must be sorted into four distinct groups of four, each sharing a common theme or relationship (e.g., "Types of Birds", "Words Ending in 'AT'", "Chess Terms"). The challenge lies in semantic ambiguity: words often appear to belong to multiple categories, requiring iterative hypothesis testing and backtracking. This makes the game an ideal testbed for studying non-deterministic reasoning processes that combine logical deduction with semantic understanding.

The experiment demonstrates how reward shaping can incentivize step-by-step reasoning in domains where "correctness" isn't binary. Unlike traditional approaches that optimize for end-state accuracy through metrics like F1-score, this system implements a novel reward function R(s, a) that decomposes the solving process into verifiable sub-steps, where s represents the current state and a the action taken. Traditional PRM applications excel in domains with formal verification methods (theorem proving, symbolic mathematics, logical inference chains) where each step can be validated against established axioms. The innovation here lies in extending PRM methodology to semantic spaces where intermediate states can't be validated through formal logic alone.
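The decomposition into verifiable sub-steps can be illustrated with a minimal sketch. This is an illustrative assumption, not the project's actual implementation: the state holds the words still on the board, the action is a proposed group of four, and partial credit is assigned by overlap with a true group rather than a single solved/unsolved signal at the end.

```python
# Hypothetical sketch of a shaped reward R(s, a) for Connections.
# All names and weights here are illustrative, not the exact system.
from dataclasses import dataclass, field


@dataclass
class PuzzleState:
    remaining: set[str]                       # words not yet grouped
    solved: list[frozenset[str]] = field(default_factory=list)


def reward(state: PuzzleState, proposed: frozenset[str],
           answer_groups: list[frozenset[str]]) -> float:
    """Assign credit to an intermediate step instead of only the end state."""
    if len(proposed) != 4 or not proposed <= state.remaining:
        return -1.0                           # malformed or reuses solved words
    # Partial credit: best overlap with any true group (0..4 words).
    best_overlap = max(len(proposed & g) for g in answer_groups)
    if best_overlap == 4:
        return 1.0                            # exact group found
    return 0.25 * best_overlap - 0.5          # shaped intermediate signal
```

Under this shaping, a guess that shares three words with a true group scores 0.25, so the model is rewarded for near-misses that a binary end-state metric would score identically to random guesses.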
Consider the contrast: mathematical PRMs involve formally verifiable steps with clear success metrics for intermediate states and deterministic validation, while the proposed semantic PRM approach employs probabilistic state evaluation, where P(category|words) ∈ [0,1], fuzzy set membership for word groupings, and Bayesian updating of confidence scores as context evolves.

Mapping the puzzle-solving process step by step is valuable because it generalizes beyond this specific application. Stepwise explainability is already known to be critical in math and logic-based domains; training an AI to reason about something as open-ended as grouping words by theme, where creativity and interpretation are crucial, shows how to handle reasoning in broader contexts such as creative brainstorming, complex decision-making in law or finance, and medical diagnosis, where systematic analysis of evidence is essential.

The core challenge was adapting PRM frameworks, typically grounded in formal logic, to handle semantic uncertainty. While mathematical PRMs can rely on proof assistants like Coq or Lean for verification, semantic reasoning requires a more nuanced approach to intermediate state validation. Extending PRMs beyond mathematical domains suggests a pathway toward more robust reasoning capabilities in language models, with potential impact on enhanced chain-of-thought generation (decomposing complex reasoning tasks into verifiable sub-steps, reducing hallucinations and improving output consistency), cross-domain reasoning transfer (suggesting potential for transfer learning between formal and informal reasoning domains), and scalable architecture improvements (opening possibilities for architectural modifications in future language models, particularly in attention mechanisms and state representation layers).
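The Bayesian updating of confidence scores can be sketched concretely. The categories, prior, and likelihood values below are invented for illustration; the point is only the mechanism, posterior ∝ prior × likelihood, renormalized as evidence accumulates.

```python
# Hedged sketch of probabilistic state evaluation: P(category | words)
# maintained as a normalized score and updated Bayes-style.
# Category names and probabilities are illustrative assumptions.
def bayesian_update(prior: dict[str, float],
                    likelihood: dict[str, float]) -> dict[str, float]:
    """posterior(c) is proportional to prior(c) * likelihood(evidence | c)."""
    unnorm = {c: prior[c] * likelihood.get(c, 1e-9) for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Uniform prior over two candidate themes for the word "BAT".
prior = {"animals": 0.5, "words ending in AT": 0.5}
# Evidence: CAT, HAT, and RAT also remain on the board, which is far
# more likely under the "-AT" theme than under "animals".
posterior = bayesian_update(prior, {"animals": 0.1, "words ending in AT": 0.9})
# posterior ≈ {"animals": 0.1, "words ending in AT": 0.9}
```

Each solver step that surfaces new contextual evidence reruns this update, so confidence in a grouping hypothesis rises or falls gradually rather than being judged once at the end.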
This experiment demonstrates how mathematical rigor typically associated with formal proof systems can be adapted for natural language tasks. By implementing PRM principles in the Connections puzzle domain, we've shown a potential pathway for improving model reasoning capabilities across both deterministic and probabilistic problem spaces. The key insight isn't just about solving puzzles—it's about developing more sophisticated reasoning architectures that can handle both formal logic and semantic uncertainty with equal rigor. As language models continue to evolve, this fusion of mathematical precision with semantic flexibility could become a crucial component in advancing their cognitive capabilities.

AI · LLM · NLP

A Hitchhiker's Guide to X

xAI Hackathon

October 2024

xAI Hackathon Winner. Built on Grok's embedding model, A Hitchhiker's Guide to X transforms Twitter/X posts into a personalized, ever-expanding knowledge graph that lets users explore ideas and insights in depth, sparking new inspirations.

AI · LLM · Knowledge Graphs

PlanForm

Venture

June 2024 - September 2024

Empowering educators with AI-driven, one-click personalization, PlanForm transforms students' learning experiences through dynamically personalized activities powered by fine-tuned LLMs.

AI · LLM · EdTech

Rabbit-Hole

HackMIT

September 2024

Rabbit-Hole is an AI-driven learning tool that simulates the immersive experience of "going down a rabbit hole," guiding users through layered topics and questions that adapt to their curiosity by continuously offering deeper insights and related content. The tool integrates with a text-to-Manim model, enabling visual learning through educational animations.

AI · LLM · EdTech

Hephaestus Robotics

MATE ROV

December 2020 - June 2023

Led a team of 12 to design and build an underwater ROV (Remotely Operated Vehicle) for marine research and conservation. The ROV features advanced capabilities including 3D scanning of coral reefs, laser-based distance measurement, and precision manipulation tools.

AI · Robotics · Computer Vision