# retrain

**Upgrade Your LLM, The Simple Way.**

A Python library that uses Reinforcement Learning (RL) to train LLMs.
retrain is a Python library that uses Reinforcement Learning (RL) to help your Large Language Models (LLMs) learn new skills, master specific tasks, or follow complex instructions. Our goal is to provide a flexible framework that can work with various RL backends (like TRL, Verl, etc.) in the future.
## Why retrain?

Imagine you want to teach your LLM to write a special kind of cat poem. Here's the basic idea:
1. **You Give a Task (Prompt):** "Compose a short poem about a happy cat."
2. **The LLM Tries (Generation & Interaction):** The LLM attempts the task, possibly inside an environment. In an environment, the LLM could use tools (like a search engine, or MCP) and take several steps to gather info before writing. This whole interactive attempt is called a **rollout**.
3. **The LLM Produces an Output:** For example, its attempt at a happy cat poem.
4. **You Judge the Result (Your Rules, Your Rewards):** This is where retrain helps your LLM learn. You define what makes a good cat poem for your specific needs: Is it actually a poem? Does it mention a cat? These are like hardcoded rules.
5. **The LLM Learns:** Based on your scores, RL updates the model so its next attempts do better.
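Conceptually, the pass/fail rules and the scoring step above behave like plain functions. Here is a minimal, purely illustrative sketch; the function names (`is_actually_a_poem`, `mentions_a_cat`, `score_poem`) mirror the rules in this example and are not retrain's actual API:

```python
# Illustrative sketch only -- not retrain's real verifier/reward interface.

def is_actually_a_poem(text: str) -> bool:
    """Pass/fail verifier: treat anything with at least two lines as a poem."""
    return len(text.strip().splitlines()) >= 2

def mentions_a_cat(text: str) -> bool:
    """Pass/fail verifier: the output must actually talk about a cat."""
    return "cat" in text.lower()

def score_poem(text: str) -> float:
    """Reward: 0.0 if any verifier fails, otherwise a positive score."""
    if not (is_actually_a_poem(text) and mentions_a_cat(text)):
        return 0.0
    return 1.0

print(score_poem("A happy cat\nnaps in warm sun"))  # both verifiers pass -> 1.0
print(score_poem("dog"))                            # fails both -> 0.0
```

The split matters: verifiers gate whether an output counts at all, while rewards rank the outputs that pass.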
Typically, you'd describe your task and rules in a configuration file (often YAML):
```yaml
# Conceptual Task: Happy Cat Poem (No "s"!)

# 1. Which LLM are you teaching?
model_to_train: "your-favorite-llm"

# 2. What's the overall task?
learning_task: "Write happy cat poems, specifically avoiding the letter 's'."

# 3. How do interactions start?
example_prompts:
  - "Compose a joyful feline lyric."
  - "Draft a cheerful ode to a purring companion."

# 4. Your rules for a good poem:
verifiers: # Basic pass/fail checks
  - rule: "is_actually_a_poem"
  - rule: "mentions_a_cat"

rewards: # Scoring the good and penalizing the unwanted
  goal: "Poem is happy, creative, and about a cat."
  constraints:
    - type: "penalize_text_containing"
      text_to_avoid: "s"
      penalty_score: -100.0 # Make it really learn to avoid "s"!
```

*(Remember: This is a simplified concept...)*
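To make the `penalize_text_containing` constraint concrete, here is a small illustrative sketch of the scoring logic it describes. This mirrors the YAML fields above (`text_to_avoid`, `penalty_score`), not retrain's internal implementation:

```python
# Illustrative sketch of the "penalize_text_containing" constraint above.

def apply_letter_penalty(text: str,
                         text_to_avoid: str = "s",
                         penalty_score: float = -100.0) -> float:
    """Return the penalty if the forbidden text appears, else 0.0."""
    return penalty_score if text_to_avoid in text.lower() else 0.0

print(apply_letter_penalty("A happy cat naps"))  # contains "s" -> -100.0
print(apply_letter_penalty("A calm cat"))        # no "s" -> 0.0
```

A large negative penalty like -100.0 dominates any positive score from the goal, which is exactly how the config forces the model to avoid the letter.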
Then, a small Python script kicks off the training:
```python
import asyncio

from retrain import run

# Assume your_poem_task_setup is loaded from a YAML like the one above
# (using retrain.config_models.TrainingConfig.from_yaml(...).model_dump())

async def main(your_poem_task_setup):
    print(f"Starting to train LLM for: {your_poem_task_setup.get('learning_task')}")
    await run(config=your_poem_task_setup)
    print('Training finished! Your LLM is now a better poet (hopefully without "s"s).')

# if __name__ == "__main__":
#     conceptual_config = { ... }
#     asyncio.run(main(conceptual_config))
```
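For illustration, here is a hand-written dict mirroring the YAML above, the shape of thing the script's `your_poem_task_setup` argument could be. It is a sketch of the example config only; the real schema lives in `retrain/config_models.py`:

```python
# A plain dict mirroring the example YAML -- illustrative, not the real schema.
conceptual_config = {
    "model_to_train": "your-favorite-llm",
    "learning_task": "Write happy cat poems, specifically avoiding the letter 's'.",
    "example_prompts": [
        "Compose a joyful feline lyric.",
        "Draft a cheerful ode to a purring companion.",
    ],
    "verifiers": [
        {"rule": "is_actually_a_poem"},
        {"rule": "mentions_a_cat"},
    ],
    "rewards": {
        "goal": "Poem is happy, creative, and about a cat.",
        "constraints": [
            {
                "type": "penalize_text_containing",
                "text_to_avoid": "s",
                "penalty_score": -100.0,
            },
        ],
    },
}

# The same lookup the script above performs:
print(conceptual_config.get("learning_task"))
```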
With retrain, you can guide your LLMs toward the behaviors you care about. If you can define it, you can retrain for it!

Ready to build something real or understand the nuts and bolts?

- Check out the `examples/` directory for practical, working code that shows full configurations.
- Read the guide in `docs/guide.md`.
- Browse the configuration schema in `retrain/config_models.py`.
- Install with `uv add retrain`.
This retrain library was initially developed over a focused 5-day period, primarily during free time after work. It began as an exploration of ideas, partly inspired by other projects in the community, and evolved into this standalone package. While a lot of effort has gone into making it useful, please approach it with the understanding that it's a young project. Your patience and feedback are highly appreciated!
The development of retrain has been significantly influenced by and builds upon the great work of others in the open-source community:
- **willccbb/verifiers**: retrain initially started as an exploration (and even an early PR attempt) related to the concepts found in willccbb/verifiers, particularly around ensuring reliability in LLM interactions within environments. retrain represents my own path in exploring this approach to RL with LLMs.
- **TRL**: retrain aims to provide a higher-level framework that can leverage robust backends like TRL and extend them with additional features such as environments, rewards, and verifiers.
- **Unsloth**: retrain is designed to be compatible with Unsloth, allowing for significantly faster training and reduced memory usage. I will explore even more of their features in the future.

Thank you for trying out retrain! If you find it helpful, have ideas for improvement, or would like to contribute, your input is warmly welcomed. Community contributions are key to making this tool even better.
Contributions welcome! (TODO: Add CONTRIBUTING.md link)