1.0 Overview
aptselect is a local-first environment for prompt engineering. It is organized into four primary views:
- Home View: Your dashboard for initiating new prompts and checkpoints.
- LLM Task Explorer: The retrospective interface for auditing and grading individual, ad-hoc prompt executions across multiple models side-by-side.
- Eval Task Explorer: The analytics interface for reviewing batch dataset evaluations, tracking pass rates, and comparing aggregate model performance via leaderboards.
- Provider View: Where you connect LLM providers (Anthropic, Gemini, Mistral, OpenAI, xAI) and manage API keys.
2.0 Managing Providers (Provider View)
Before running prompts, you must configure your model providers. aptselect stores all keys locally in a SQLite database.
2.1 Adding a Provider
- Navigate to the Provider View (Network Icon).
- Select a service (e.g., OpenAI, Anthropic).
- Paste your API Key. The connection is tested immediately.
- Note: You can enable/disable specific models (e.g., turn off gpt-3.5-turbo if you only want to test gpt-4o).
3.0 The Prompt Explorer
This is where you build and test prompts. aptselect treats prompts like code, automatically saving checkpoints so you never lose an iteration.
3.1 Variables & Templating
You can inject dynamic data into your prompts using double curly braces (e.g., {{user_input}}). The sidebar inspector will automatically detect these variables so you can provide test values before running the prompt.
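To make the double-curly-brace convention concrete, here is a minimal Python sketch of how variables might be detected and filled in. It is an illustration only, not aptselect's actual parser: the regular expression, the tolerance for whitespace inside the braces, and the helper names find_variables and render are all assumptions.

```python
import re

# Assumed syntax: {{name}}, optionally padded with spaces inside the braces.
# aptselect's real parser may accept a different set of characters.
VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_variables(template: str) -> list[str]:
    """Return unique variable names in order of first appearance."""
    seen: dict[str, None] = {}
    for match in VARIABLE_PATTERN.finditer(template):
        seen.setdefault(match.group(1), None)
    return list(seen)


def render(template: str, values: dict[str, str]) -> str:
    """Substitute every {{name}} with its test value; fail if one is missing."""
    missing = [name for name in find_variables(template) if name not in values]
    if missing:
        raise KeyError("missing values for: " + ", ".join(missing))
    return VARIABLE_PATTERN.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    prompt = "Summarize the following for {{audience}}: {{user_input}}"
    print(find_variables(prompt))
    print(render(prompt, {"audience": "engineers", "user_input": "release notes"}))
```

Raising an error on a missing value, rather than leaving the placeholder in place, is a design choice of this sketch: it keeps an unfilled {{variable}} from being sent to a model unnoticed.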
3.2 Evaluations & Datasets
Instead of testing one prompt at a time, you can upload CSV datasets to run bulk evaluations. The app will automatically grade the outputs against your reference criteria and generate a leaderboard comparing model performance and latency.
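The bulk evaluation flow described above can be sketched roughly as follows. This is a hedged illustration, not aptselect's implementation: the CSV column names (input, expected), the exact-match grader, and the representation of a model as a plain Python callable are all assumptions made for the example.

```python
import csv
import io
import time
from typing import Callable


def exact_match(output: str, expected: str) -> bool:
    """Hypothetical grader: pass when the trimmed strings are identical."""
    return output.strip() == expected.strip()


def run_eval(csv_text: str, models: dict[str, Callable[[str], str]]) -> list[dict]:
    """Run every dataset row through every model and build a leaderboard."""
    # Assumed dataset layout: one "input" column and one "expected" column.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    leaderboard = []
    for name, call_model in models.items():
        passed = 0
        total_latency = 0.0
        for row in rows:
            start = time.perf_counter()
            output = call_model(row["input"])
            total_latency += time.perf_counter() - start
            passed += exact_match(output, row["expected"])
        leaderboard.append({
            "model": name,
            "pass_rate": passed / len(rows) if rows else 0.0,
            "avg_latency_s": total_latency / len(rows) if rows else 0.0,
        })
    # Rank by pass rate first, then by lower average latency.
    leaderboard.sort(key=lambda entry: (-entry["pass_rate"], entry["avg_latency_s"]))
    return leaderboard


if __name__ == "__main__":
    dataset = "input,expected\nparis,PARIS\nrome,ROME\n"
    # Stand-in "models"; a real run would call a provider API here.
    stubs = {"echo": lambda text: text, "upper": lambda text: text.upper()}
    for entry in run_eval(dataset, stubs):
        print(entry)
```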
4.0 The LLM Task Explorer
The LLM Task Explorer is your archive for ad-hoc prompt testing. It allows you to review every manual prompt you have ever run.
- Side-by-Side Comparison: Review how different models answered the exact same prompt simultaneously.
- Developer Details: Click any past response to inspect the raw JSON API response, exact token usage (Input/Output), and execution duration.
- Curation: Mark specific outputs as "Good" or "Bad", or bookmark them to build a dataset of high-quality responses.
5.0 The Eval Task Explorer
The Eval Task Explorer houses your quantitative benchmarking data.
- Leaderboards: Compare models head-to-head on pass rates, average token consumption, and response latency across hundreds of test cases.
- Failure Analysis: Drill down into specific failed test cases to identify consistent edge-case failures or formatting breaks for specific models.
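As a rough illustration of the failure analysis described above, the sketch below groups failed test cases per model and separates formatting breaks from wrong answers. The record fields (model, case_id, passed, output) and the rule that a formatting break means the output is not valid JSON are assumptions for this example, not aptselect's actual data model.

```python
import json
from collections import defaultdict


def classify_failure(output: str) -> str:
    """Hypothetical rule: unparseable JSON is a formatting break."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return "formatting"
    return "wrong_answer"


def summarize_failures(results: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Map each model to its failed case IDs, grouped by failure kind."""
    summary: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for record in results:
        if record["passed"]:
            continue
        kind = classify_failure(record["output"])
        summary[record["model"]][kind].append(record["case_id"])
    # Convert the nested defaultdicts to plain dicts for easier printing.
    return {model: dict(kinds) for model, kinds in summary.items()}


if __name__ == "__main__":
    results = [
        {"model": "model-a", "case_id": "c1", "passed": False, "output": "{not json"},
        {"model": "model-a", "case_id": "c2", "passed": False, "output": '{"answer": 3}'},
        {"model": "model-b", "case_id": "c1", "passed": True, "output": '{"answer": 4}'},
    ]
    print(summarize_failures(results))
```

A model whose failures cluster under "formatting" likely needs stricter output instructions, while "wrong_answer" clusters point at the prompt's reasoning or the reference criteria.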
6.0 Troubleshooting
6.1 "Provider Not Available"
If a model is unavailable when you try to run a task, return to the Provider View and confirm that your API key is valid and that the specific model is toggled "ON".
6.2 Database Reset
If you need to completely reset your local data, you can find your encrypted SQLite database file at the following location for your operating system:
Windows:
%APPDATA%\aptselect\aptselect.db
Mac:
~/Library/Application Support/aptselect/aptselect.db
Linux:
~/.config/aptselect/aptselect.db
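The per-platform locations listed above can also be resolved from a script. The following Python sketch does that and, as a cautious alternative to deleting the file, moves it aside under a timestamped name. It assumes the app is closed while the file is moved; whether aptselect recreates a fresh database on next launch is not confirmed here. The function names are hypothetical.

```python
import os
import sys
import time
from pathlib import Path


def database_path(platform: str = sys.platform) -> Path:
    """Return the documented aptselect database location for a platform."""
    if platform.startswith("win"):
        # Fall back to the usual roaming profile folder if APPDATA is unset.
        default_appdata = Path.home() / "AppData" / "Roaming"
        base = Path(os.environ.get("APPDATA", str(default_appdata)))
    elif platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        base = Path.home() / ".config"
    return base / "aptselect" / "aptselect.db"


def move_database_aside(db_file: Path) -> Path:
    """Rename the database to a timestamped backup instead of deleting it."""
    # Assumption: aptselect is not running while the file is renamed.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = db_file.with_name(f"{db_file.name}.{stamp}.bak")
    db_file.rename(backup)
    return backup


if __name__ == "__main__":
    path = database_path()
    print(f"Database location: {path}")
    print(f"Exists: {path.exists()}")
```

Keeping a backup means a reset can be undone by renaming the file back to aptselect.db.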