Omnibus
Semantic PRMs
December 2024
Most AI models are judged solely on their final answers, especially in natural language tasks. While Process Reward Models (PRMs) have proven effective in formal mathematical domains, where steps can be verified against axiomatic truths, their application to non-empirical tasks remains largely unexplored. Using LLaMA 3.1 (8B parameters), I developed an approach that extends PRM principles from mathematical proof verification to semantic reasoning tasks, specifically The New York Times' Connections game. The game presents players with a 4x4 grid of 16 words that must be sorted into four distinct groups of four, each sharing a common theme or relationship (e.g., "Types of Birds", "Words Ending in 'AT'", "Chess Terms"). The challenge lies in the semantic ambiguity: words often appear to belong to multiple categories, requiring iterative hypothesis testing and backtracking. That makes the game an ideal testbed for studying non-deterministic reasoning processes that combine logical deduction with semantic understanding.

The experiment demonstrates how reward shaping can incentivize step-by-step reasoning in domains where "correctness" isn't binary. Unlike traditional approaches that optimize for end-state accuracy through metrics like F1-score or accuracy, this system implements a novel reward function R(s, a) that decomposes the solving process into verifiable sub-steps, where s represents the current state and a represents the action taken (a minimal sketch follows below).

Traditional PRM applications excel in domains with formal verification methods, such as theorem proving, symbolic mathematics, and logical inference chains, where each step can be validated against established axioms. The innovation here lies in extending PRM methodology to semantic spaces where intermediate states can't be validated through formal logic alone. Consider the contrast: mathematical PRMs rest on formally verifiable steps, with clear success criteria for intermediate states and deterministic validation, while the proposed semantic PRM employs probabilistic state evaluation where P(category | words) ∈ [0, 1], fuzzy set membership for word groupings, and Bayesian updating of confidence scores as context evolves (also sketched below).

Mapping the puzzle-solving process step by step is valuable because it generalizes beyond this specific application. Stepwise explainability is already known to be critical in math and logic-based domains; training an AI to reason about something as open-ended as grouping words by theme, where creativity and interpretation are crucial, points toward handling reasoning in broader contexts like creative brainstorming, complex decision-making in law or finance, and medical diagnosis, where systematic analysis of evidence is essential.

The core challenge was adapting PRM frameworks, typically grounded in formal logic, to handle semantic uncertainty. While mathematical PRMs can rely on proof assistants like Coq or Lean for verification, semantic reasoning requires a more nuanced approach to intermediate state validation.
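To make the step-level reward concrete, here is a minimal sketch of what R(s, a) could look like for Connections under a partial-credit scheme. The names (PuzzleState, score_step) and the exact shaping values are hypothetical illustrations, not the system's actual implementation:

```python
# Hypothetical sketch of the step-level reward R(s, a) for Connections.
# State s: words still on the board plus groups already confirmed.
# Action a: proposing one group of four words.

from dataclasses import dataclass, field

@dataclass
class PuzzleState:
    remaining: set[str]                        # words not yet grouped
    solved: list[frozenset[str]] = field(default_factory=list)

def score_step(state: PuzzleState, proposal: frozenset[str],
               answer_key: list[frozenset[str]]) -> float:
    """Partial-credit step reward: score a proposed group of four by its
    best overlap with any true category, rather than a binary pass/fail."""
    if len(proposal) != 4 or not proposal <= state.remaining:
        return -1.0                            # illegal action: penalize
    best_overlap = max(len(proposal & truth) for truth in answer_key)
    # overlap 4 -> 1.0, 3 -> 0.5, 2 -> 0.0, 1 -> -0.5 (shaped, not binary)
    return (best_overlap - 2) / 2.0
```

Shaping the reward around best overlap rather than exact matches is what gives the model a graded signal on near-miss hypotheses ("one away" in Connections terms) instead of a flat failure.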
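For the probabilistic side, the sketch below shows one way the fuzzy membership grades and Bayesian confidence updates could be realized, with embedding cosine similarity standing in for the membership function. Again, the function names and scoring choices are assumptions for illustration, not the system's actual scoring model:

```python
# Illustrative sketch of the probabilistic state evaluation: soft membership
# scores in [0, 1] per candidate category, renormalized into P(category | word)
# and updated as groups are confirmed.

import math

def fuzzy_membership(word_vec: list[float], cat_vec: list[float]) -> float:
    """Soft membership grade in [0, 1] from cosine similarity."""
    dot = sum(w * c for w, c in zip(word_vec, cat_vec))
    norm = math.hypot(*word_vec) * math.hypot(*cat_vec)
    return 0.5 * (1.0 + dot / norm) if norm else 0.0

def category_posterior(word_vec: list[float],
                       cat_vecs: list[list[float]]) -> list[float]:
    """P(category | word): membership grades renormalized to a distribution."""
    scores = [fuzzy_membership(word_vec, c) for c in cat_vecs]
    z = sum(scores)
    return [s / z for s in scores] if z else [1 / len(scores)] * len(scores)

def update_on_solved(posterior: list[float], solved_idx: int) -> list[float]:
    """Bayesian-style update once a category is confirmed: remaining words
    can no longer belong to it, so zero it out and renormalize."""
    masked = [0.0 if i == solved_idx else p for i, p in enumerate(posterior)]
    z = sum(masked)
    return [p / z for p in masked] if z else masked
```

Once a group is confirmed, update_on_solved redistributes probability mass over the remaining categories, which is the "confidence scores as context evolves" step in the description above.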
The implications of extending PRMs beyond mathematical domains suggest a pathway toward more robust reasoning capabilities in language models: enhanced chain-of-thought generation (decomposing complex reasoning tasks into verifiable sub-steps reduces hallucinations and improves output consistency), cross-domain reasoning transfer (suggesting potential for transfer learning between formal and informal reasoning domains), and scalable architecture improvements (opening possibilities for modifications in future language models, particularly in attention mechanisms and state representation layers). This experiment demonstrates how the mathematical rigor typically associated with formal proof systems can be adapted for natural language tasks. By implementing PRM principles in the Connections puzzle domain, we've shown one route to improving model reasoning across both deterministic and probabilistic problem spaces. The key insight isn't just about solving puzzles: it's about developing more sophisticated reasoning architectures that handle both formal logic and semantic uncertainty with equal rigor. As language models continue to evolve, this fusion of mathematical precision with semantic flexibility could become a crucial component in advancing their cognitive capabilities.
Rabbit-Hole
September 2024
Rabbit-Hole is an AI-driven learning tool that simulates the immersive experience of "going down a rabbit hole," guiding users through layered topics and questions and adapting to the user's curiosity by continuously surfacing deeper insights and related content. The tool integrates with a text-to-Manim model, enabling visual learning through generated educational animations.
Hephaestus Robotics
December 2020 - June 2023
Led a team of 12 to design and build an underwater ROV (Remotely Operated Vehicle) for marine research and conservation. The ROV features 3D scanning of coral reefs, laser-based distance measurement, and precision manipulation tools.