✅ Overview
Scimax VS Code includes a powerful database and search system that indexes your org, markdown, and Jupyter notebook files. The database provides:
Full-text search with FTS5 (SQLite's Full-Text Search) and BM25 ranking
Semantic search using vector embeddings for meaning-based queries
Hybrid search combining keyword and semantic approaches
Advanced search with query expansion, weighted RRF, and LLM reranking (SOTA)
Structured queries for headings, TODOs, tags, properties, and links
Agenda views for scheduled items and deadlines
Code block search filtered by programming language
The database is built on SQLite (via @libsql/client) with support for vector similarity search, making it both fast and capable of sophisticated semantic queries.
✅ What Gets Indexed
✅ File Types
The database automatically indexes three types of files:
.org files - Org-mode documents
.md files - Markdown documents
.ipynb files - Jupyter notebooks
✅ Indexed Content
For each file, the database extracts and indexes:
✅ Headings
Heading text and level (*, **, ***, etc.)
TODO states (TODO, DONE, IN-PROGRESS, etc.)
Priority markers ([#A], [#B], [#C])
Tags (both direct and inherited)
Properties (CUSTOM_ID, CATEGORY, etc.)
Scheduling information (SCHEDULED, DEADLINE, CLOSED)
Line numbers for navigation
✅ Source Blocks
Programming language
Complete code content
Header arguments (:results, :exports, etc.)
Line numbers
For notebooks: cell indices
✅ Links
Link type (file, http, https, id, etc.)
Target path or URL
Optional description text
Line number
✅ Full Text
Complete document content for full-text search
Indexed with FTS5 virtual tables
Porter stemming and Unicode normalization
✅ Text Chunks (for Semantic Search)
Document divided into ~2000 character chunks
3-line overlap between chunks for context
Vector embeddings (if embedding service configured)
Line ranges for each chunk
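The chunking described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the TextChunk shape, the chunkText name, and the exact boundary rules are assumptions; only the ~2000-character budget and the 3-line overlap come from the list above.

```typescript
// Hypothetical chunk shape; field names are illustrative, not the real schema.
interface TextChunk {
  content: string;
  lineStart: number; // 1-based, inclusive
  lineEnd: number;   // 1-based, inclusive
}

// Split text into chunks of roughly `maxChars` characters, repeating the
// last `overlapLines` lines of each chunk at the start of the next one.
function chunkText(text: string, maxChars = 2000, overlapLines = 3): TextChunk[] {
  const lines = text.split("\n");
  const chunks: TextChunk[] = [];
  let start = 0;
  while (start < lines.length) {
    let end = start;
    let size = lines[start].length;
    // Grow the chunk line by line while the next line still fits the budget.
    while (end + 1 < lines.length && size + 1 + lines[end + 1].length <= maxChars) {
      end += 1;
      size += 1 + lines[end].length;
    }
    chunks.push({
      content: lines.slice(start, end + 1).join("\n"),
      lineStart: start + 1,
      lineEnd: end + 1,
    });
    if (end + 1 >= lines.length) break;
    // Step back by the overlap, but always advance by at least one line.
    start = Math.max(end + 1 - overlapLines, start + 1);
  }
  return chunks;
}
```

Each chunk keeps its line range, so a search hit on a chunk can be mapped back to a location in the file.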
✅ File Watching
The database automatically watches for file changes:
New files are indexed when created
Modified files are re-indexed (debounced with 500ms delay)
Deleted files are removed from the index
Changes are queued and processed sequentially
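One way to picture the debounce-and-queue behavior above is a small coalescing queue. The ChangeQueue class and its notify/drain methods are hypothetical names for illustration; timestamps are passed in explicitly so the logic stays easy to test. Only the 500ms delay comes from the list above.

```typescript
// Hypothetical sketch: coalesce rapid change events per file path.
class ChangeQueue {
  private lastSeen: Record<string, number> = Object.create(null);
  private delayMs: number;

  constructor(delayMs = 500) {
    this.delayMs = delayMs;
  }

  // Record a change; repeated events for the same path reset its timer.
  notify(path: string, nowMs: number): void {
    this.lastSeen[path] = nowMs;
  }

  // Return paths that have been quiet for at least delayMs, oldest first,
  // and remove them from the queue so each is processed once.
  drain(nowMs: number): string[] {
    const ready = Object.keys(this.lastSeen)
      .filter((p) => nowMs - this.lastSeen[p] >= this.delayMs)
      .sort((x, y) => this.lastSeen[x] - this.lastSeen[y]);
    for (const p of ready) {
      delete this.lastSeen[p];
    }
    return ready;
  }
}
```

A caller would invoke drain on a timer and re-index the returned paths one at a time, which gives the sequential processing described above.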
✅ Ignore Patterns
By default, the following patterns are ignored:
**/node_modules/**
**/.git/**
**/dist/**
**/build/**
**/.ipynb_checkpoints/**
Configure additional patterns in settings: scimax.db.exclude
✅ Full-Text Search
✅ Overview
Full-text search uses SQLite's FTS5 (Full-Text Search version 5) with:
BM25 ranking - Industry-standard relevance scoring
Porter stemming - Matches word variations (e.g., "run" matches "running")
Unicode normalization - Handles accented characters correctly
Snippet generation - Shows matching context with highlighting
✅ Usage
✅ Command
Run Scimax: Search All Files (FTS5) or use command scimax.db.search
✅ Query Syntax
FTS5 supports rich query syntax:
✅ Basic Queries
machine learning # Match both words (in any order)
"machine learning" # Match exact phrase
neural OR artificial # Match either word
Boolean Operators
python AND jupyter # Must contain both
python NOT tensorflow # Contains python but not tensorflow
deep OR "machine learning" # Contains "deep" or the phrase "machine learning"
Prefix Matching
comput* # Matches: computer, computing, computation
data* # Matches: data, database, dataset
Column Queries
title: introduction # Search only in titles
content: python # Search only in content
Proximity Queries
NEAR(neural network, 5) # Words within 5 tokens of each other
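When queries are built from user input, quoting matters: in FTS5 a double quote inside a phrase is escaped by doubling it. The helpers below are a hypothetical sketch (ftsPhrase, ftsColumn, and ftsNear are not part of the extension) showing how the syntax above can be assembled safely.

```typescript
// Quote a user string as an FTS5 phrase; embedded double quotes are doubled.
function ftsPhrase(text: string): string {
  return '"' + text.replace(/"/g, '""') + '"';
}

// Restrict a phrase to one column, e.g. title : "introduction".
function ftsColumn(column: string, text: string): string {
  return column + " : " + ftsPhrase(text);
}

// Build a proximity query such as NEAR("neural" "network", 5).
function ftsNear(terms: string[], distance: number): string {
  const quoted = terms.map((t) => ftsPhrase(t));
  return "NEAR(" + quoted.join(" ") + ", " + distance + ")";
}
```

Quoting each term also keeps words such as AND, OR, and NOT from being read as operators.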
Results
Search results include:
File path and basename
Line number
Preview snippet with <mark> tags highlighting matches
BM25 relevance score
Up to 100 results (configurable)
Example Searches
# Find all references to "gradient descent"
"gradient descent"
# Find Python-related TODO items
TODO python
# Find documents about data science or machine learning
"data science" OR "machine learning"
# Find mentions of TensorFlow in documents that do not mention Keras
tensorflow NOT keras
# Find all documents with "introduction" in title
title: introduction
✅ Semantic Search
✅ Overview
Semantic search finds content by meaning rather than exact words. It uses vector embeddings to represent text in a high-dimensional space where semantically similar content is close together.
Benefits:
Find content even when using different words
Discover related concepts
More natural query language
Example: Searching for "machine learning algorithms" will also find documents about "neural networks", "deep learning models", and "classification methods" even if they don't contain those exact words.
✅ Requirements
Semantic search requires:
Vector search support in libsql - The database must support vector operations
Embedding provider configured - Ollama must be running locally
Files indexed with embeddings - Run scimax.db.reindex after configuring
Checking Availability
Run Scimax: Show Database Stats (scimax.db.stats) to see:
Whether vector search is supported
Number of embeddings stored
Any error messages
If vector search is unavailable, the stats will show:
Semantic search: Unavailable (vector search not supported)
Fallback to Full-Text Search
If semantic search is unavailable, use full-text search (scimax.db.search) instead. FTS5 is always available and very fast for keyword-based queries.
Embedding Provider
Scimax uses Ollama for embeddings:
Ollama
Pros: Free, private, local control, high quality
Cons: Requires Ollama installed and running
Models: nomic-embed-text (the default for scimax.db.ollamaModel)
Setup: Install Ollama, then run ollama pull nomic-embed-text
✅ Configuration
✅ Interactive Setup
Run Scimax: Configure Embedding Service (scimax.db.configureEmbeddings)
This wizard will:
Let you choose a provider
Select a model
Test the connection
Update your settings
Prompt you to reindex files
✅ Manual Configuration
Add to your VS Code settings (settings.json):
Ollama Configuration
{
  "scimax.db.embeddingProvider": "ollama",
  "scimax.db.ollamaUrl": "http://localhost:11434",
  "scimax.db.ollamaModel": "nomic-embed-text"
}
✅ Usage
✅ Command
Run Scimax: Semantic Search or use command scimax.db.searchSemantic
✅ Query Examples
Semantic search uses natural language:
# Conceptual searches
explain neural network architecture
how to optimize database queries
project management best practices
# Related concept discovery
clustering algorithms # Finds: k-means, hierarchical, DBSCAN
data visualization # Finds: matplotlib, charts, plots, graphs
# Question answering
what is gradient descent
how does attention mechanism work
Results
Results include:
File path and line number
Preview (first 200 characters of chunk)
Similarity score (0-100%, higher is more relevant)
Cosine distance (lower is more similar)
Up to 20 results by default
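The relationship between the two scores above can be illustrated with a short sketch. The cosine distance formula is standard; the mapping to a 0-100% similarity score is an assumption (1 minus distance, clamped), and both function names are hypothetical.

```typescript
// Cosine distance between two embedding vectors: 0 means same direction.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must have the same non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

// Assumed mapping: distance 0 -> 100%, distance 1 or more -> 0%.
function similarityPercent(distance: number): number {
  const similarity = Math.max(0, Math.min(1, 1 - distance));
  return Math.round(similarity * 100);
}
```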
Reindexing for Semantic Search
After configuring an embedding provider, you must reindex your files:
Run Scimax: Reindex Files (scimax.db.reindex)
Wait for indexing to complete
The database will generate embeddings for all text chunks
Semantic search will now be available
Hybrid Search
Overview
Hybrid search combines full-text (keyword) and semantic (vector) search using Reciprocal Rank Fusion (RRF). This approach:
Gets the best of both worlds
Balances exact matches with conceptual similarity
Provides more robust results
Usage
Command
Run Scimax: Hybrid Search or use command scimax.db.searchHybrid
How It Works
Query runs through both FTS5 and vector search
Results from each are ranked
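RRF algorithm combines rankings: each result scores the weighted sum of 1 / (k + rank) over the lists it appears in (k is the RRF constant, 60 by default)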
Final results sorted by combined score
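The fusion step can be sketched with the standard Reciprocal Rank Fusion formula, score(d) = sum over lists of weight / (k + rank). This is an illustration rather than the extension's code: the RankedList shape and the function name are assumptions, and k = 60 mirrors the scimax.search.hybrid.k setting.

```typescript
// One ranked result list (best first) and the weight given to that backend.
interface RankedList {
  ids: string[];
  weight: number;
}

// Weighted Reciprocal Rank Fusion over any number of ranked lists.
function reciprocalRankFusion(lists: RankedList[], k = 60): { id: string; score: number }[] {
  const scores: Record<string, number> = Object.create(null);
  for (const list of lists) {
    list.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] || 0) + list.weight / (k + rank);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id: id, score: scores[id] }))
    .sort((x, y) => y.score - x.score);
}
```

A document found by both backends accumulates two terms, which is why it tends to outrank documents found by only one.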
Weights
Default weights are 50/50, but can be adjusted:
// Internal API (for extension developers)
db.searchHybrid(query, {
  limit: 20,
  ftsWeight: 0.5,    // 50% weight on keywords
  vectorWeight: 0.5  // 50% weight on semantics
});
When to Use Hybrid
Default choice for general searches
When you want both exact matches and related concepts
When query has both specific terms and broad concepts
Example: "python sklearn classification accuracy" - finds both sklearn-specific code and general ML accuracy discussions
Results
Results include:
Combined ranking score
Source indicator (Keywords, AI, or both)
File location and preview
Up to 20 results
✅ Advanced Search (SOTA Pipeline)
Overview
Advanced search implements a state-of-the-art (SOTA) search pipeline inspired by modern search engines like qmd. It combines multiple techniques for maximum recall and precision:
Query Expansion - Generates alternative query formulations
Parallel Retrieval - Runs FTS5 and vector search concurrently
Weighted Reciprocal Rank Fusion (RRF) - Combines results intelligently
LLM Reranking - Uses AI to improve final ranking (optional)
Usage
Command
Run Scimax: Advanced Search (scimax.db.searchAdvanced)
When to Use
Complex research queries
When you want maximum recall
When hybrid search isn't finding relevant content
For important searches where accuracy matters more than speed
Query Expansion
Query expansion improves recall by generating alternative formulations of your query.
Pseudo-Relevance Feedback (PRF)
How it works:
Runs initial search with original query
Extracts key terms from top 5 results
Creates expanded query with additional terms
Searches again with expanded query
Example:
Original: "machine learning"
Expanded: "machine learning neural network training models"
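The PRF steps above can be sketched as a simple term-frequency pass over the top results. How the extension actually selects terms is not documented here, so the stopword list, the minimum term length, and the expandQueryWithPrf name are all assumptions.

```typescript
// Small illustrative stopword list; a real one would be longer.
const PRF_STOPWORDS = ["the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "with", "on"];

// Append the most frequent new terms from the top documents to the query.
function expandQueryWithPrf(query: string, topDocs: string[], termCount = 5): string {
  const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const queryTerms = tokenize(query);
  const counts: Record<string, number> = Object.create(null);
  for (const doc of topDocs) {
    for (const term of tokenize(doc)) {
      if (term.length < 3) continue; // skip very short tokens
      if (PRF_STOPWORDS.indexOf(term) !== -1) continue;
      if (queryTerms.indexOf(term) !== -1) continue; // already in the query
      counts[term] = (counts[term] || 0) + 1;
    }
  }
  // Most frequent first; ties broken alphabetically for stable output.
  const extra = Object.keys(counts)
    .sort((x, y) => counts[y] - counts[x] || (x < y ? -1 : 1))
    .slice(0, termCount);
  return [query].concat(extra).join(" ");
}
```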
LLM Query Expansion
How it works:
Sends query to LLM (e.g., qwen3:1.7b)
LLM generates 3 alternative phrasings
All variants searched in parallel
Original query gets 2× weight in ranking
Example:
Original: "database optimization"
Variants:
- "improve SQL query performance"
- "speed up database queries"
- "index tuning for databases"
Weighted RRF
Standard RRF assigns scores based on rank position. Advanced search enhances this with:
Position Bonuses
Top-ranked results get bonuses:
Rank 1: +15% bonus
Rank 2: +10% bonus
Rank 3: +5% bonus
Original Query Weight
The original query's results receive 2× weight compared to expanded query results.
Score Normalization
Different backends produce different score ranges:
BM25: Negative values (normalized to 0-1)
Vector: Cosine distance (converted to similarity)
All scores normalized before fusion
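A rough sketch of these enhancements follows. The bonus percentages and the 2× original-query weight come from the text above; applying them multiplicatively to the RRF term and using min-max scaling for BM25 are assumptions, as are the function names.

```typescript
// Position bonus: +15%, +10%, +5% for ranks 1 to 3, none afterwards.
function positionBonus(rank: number): number {
  if (rank === 1) return 1.15;
  if (rank === 2) return 1.1;
  if (rank === 3) return 1.05;
  return 1;
}

// Contribution of one result at `rank` to the fused score (assumed form).
function weightedRrfTerm(rank: number, isOriginalQuery: boolean, k = 60): number {
  const queryWeight = isOriginalQuery ? 2 : 1;
  return (queryWeight * positionBonus(rank)) / (k + rank);
}

// FTS5 bm25() values are negative, and more negative means a better match.
// Min-max scaling maps the best hit to 1 and the worst to 0.
function normalizeBm25(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const best = Math.min(...scores);
  const worst = Math.max(...scores);
  if (best === worst) return scores.map(() => 1);
  return scores.map((s) => (worst - s) / (worst - best));
}
```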
LLM Reranking
How It Works
Takes top 30 candidates from RRF fusion
LLM scores each document's relevance (0-10)
Scores blended with retrieval scores using position-aware weights
Position-Aware Blending
High-confidence retrieval matches are preserved:
Ranks 1-3: 75% retrieval, 25% reranker
Ranks 4-10: 60% retrieval, 40% reranker
Ranks 11+: 40% retrieval, 60% reranker
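The blending table above can be read as a rank-dependent weighted average. The sketch below assumes the retrieval score is already normalized to 0-1 and rescales the 0-10 LLM score to the same range; the function names are hypothetical.

```typescript
// Share of the final score taken from retrieval, by pre-rerank position.
function retrievalWeight(rank: number): number {
  if (rank <= 3) return 0.75;
  if (rank <= 10) return 0.6;
  return 0.4;
}

// Blend a normalized retrieval score (0-1) with an LLM relevance score (0-10).
function blendScores(rank: number, retrievalScore: number, llmScore: number): number {
  const w = retrievalWeight(rank);
  return w * retrievalScore + (1 - w) * (llmScore / 10);
}
```

Top-ranked results change little even if the reranker disagrees, while lower-ranked candidates can be promoted by a high LLM score.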
Performance Considerations
Reranking adds latency (~1-2 seconds for 30 documents). Disable for fast searches.
Capabilities Check
Run Scimax: Show Search Capabilities (scimax.db.searchCapabilities) to see:
✓ Full-Text Search (FTS5/BM25) - Available
✓ Semantic/Vector Search - Available (Ollama)
✓ Query Expansion (PRF) - Available (no LLM required)
✗ Query Expansion (LLM) - Unavailable - check Ollama
✗ LLM Reranking - Unavailable - pull qwen3:0.6b
Graceful Degradation
Advanced search works without all features:
| If Unavailable | Fallback |
|---|---|
| Vector search | FTS-only |
| LLM expansion | PRF-only |
| Reranking | Skip reranking |
| All LLM features | Equivalent to hybrid search |
Configuration
Search Mode
{
  "scimax.search.defaultMode": "hybrid", // or "fast", "semantic", "advanced"
  "scimax.search.defaultLimit": 20
}
Query Expansion
{
  "scimax.search.queryExpansion.enabled": true,
  "scimax.search.queryExpansion.method": "prf", // or "llm", "both"
  "scimax.search.queryExpansion.prfTopK": 5,
  "scimax.search.queryExpansion.prfTermCount": 5,
  "scimax.search.queryExpansion.llmModel": "qwen3:1.7b"
}
Reranking
{
  "scimax.search.reranking.enabled": false, // Enable for better accuracy
  "scimax.search.reranking.model": "qwen3:0.6b",
  "scimax.search.reranking.topK": 30,
  "scimax.search.reranking.usePositionBlending": true
}
Hybrid Weights
{
  "scimax.search.hybrid.ftsWeight": 0.5,
  "scimax.search.hybrid.vectorWeight": 0.5,
  "scimax.search.hybrid.usePositionBonus": true,
  "scimax.search.hybrid.k": 60 // RRF constant
}
Caching
{
  "scimax.search.caching.enabled": true,
  "scimax.search.caching.ttlSeconds": 900, // 15 minutes
  "scimax.search.caching.maxEntries": 500
}
Setting Up LLM Features
To enable query expansion and reranking with Ollama:
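Install Ollama: https://ollama.ai
Start Ollama: ollama serve
Pull models: ollama pull qwen3:1.7b (query expansion) and ollama pull qwen3:0.6b (reranking), the model names used in the settings above
Enable in settings: set scimax.search.queryExpansion.method to "llm" or "both", and scimax.search.reranking.enabled to true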
Performance Comparison
| Mode | Speed | Recall | Precision | When to Use |
|---|---|---|---|---|
| Fast | <50ms | Low | High | Exact matches |
| Semantic | ~200ms | High | Medium | Conceptual queries |
| Hybrid | ~300ms | High | High | General purpose |
| Advanced | 1-3s | Highest | Highest | Important searches |
Structured Queries
Beyond free-text search, the database supports structured queries for specific content types.
Heading Search
Search specifically in headings, with optional filtering:
Command
Scimax: Search Headings (scimax.db.searchHeadings)
Features
Search by heading text
Filter by TODO state
Filter by tag
Shows heading level, tags, TODO state
Displays deadlines and scheduled dates
Example Use Cases
# Find all headings about Python
python
# Find TODOs with specific tag (use Tag Search)
:work:
# Browse document structure
<empty query>
Tag Search
Search headings by org-mode tags:
Command
Scimax: Search By Tag (scimax.db.searchByTag)
Features
Lists all tags found in indexed files
Shows heading count per tag
Supports both direct and inherited tags
Tags displayed as :tagname:
Property Search
Search headings by property drawer values:
Command
Scimax: Search By Property (scimax.db.searchByProperty)
Common Properties
CUSTOM_ID # Unique identifiers for linking
ID # Auto-generated UUIDs
CATEGORY # Classification
CREATED # Creation timestamp
MODIFIED # Last modification time
Examples
# Find all entries with CATEGORY property
Property: CATEGORY
Value: <empty>
# Find entries with specific CATEGORY
Property: CATEGORY
Value: research
# Find entries with CUSTOM_ID
Property: CUSTOM_ID
Value: <empty or specific>
TODO Search
Browse and filter TODO items:
Command
Scimax: Show TODOs (scimax.db.showTodos)
Features
Lists all TODO items across workspace
Filter by state (TODO, DONE, IN-PROGRESS, etc.)
Shows priority, tags, scheduling
Other views, such as the deadline view, exclude DONE and CANCELLED items by default
Common TODO States
TODO # Not started
IN-PROGRESS # Currently working on
NEXT # Up next
WAIT # Waiting on something
DONE # Completed
CANCELLED # Abandoned
✅ Source Block Search
Search code blocks by language:
✅ Command
Scimax: Search Code Blocks (scimax.db.searchBlocks)
✅ Features
Filter by programming language
Optional text search within code
Shows first line of code as preview
Includes both org files and notebook cells
✅ Example Workflow
1. Select language (or "All languages")
2. Optionally enter search text
3. Browse matching code blocks
4. Jump to file location
✅ Hashtag Search
Find files by inline hashtags:
✅ Command
Scimax: Search Hashtags (scimax.db.searchHashtags)
✅ Features
Lists all hashtags found (#tagname)
Shows file count per hashtag
Displays files containing selected hashtag
Case-insensitive matching
✅ Hashtag Format
# In your documents
This is a #research note about #machinelearning
# Database indexes as
research
machinelearning
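Hashtag extraction along these lines can be sketched with a regular expression. The exact pattern the extension uses is not documented here, so the rule below (a # preceded by whitespace or start of text, followed by a letter) and the lowercase storage used for case-insensitive matching are assumptions.

```typescript
// Collect unique inline hashtags, lowercased, in order of first appearance.
function extractHashtags(text: string): string[] {
  // "# Heading" and "#+BEGIN_SRC" do not match: a letter must follow the #.
  const pattern = /(^|\s)#([A-Za-z][A-Za-z0-9_-]*)/g;
  const found: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(text);
  while (match !== null) {
    const tag = match[2].toLowerCase();
    if (found.indexOf(tag) === -1) {
      found.push(tag);
    }
    match = pattern.exec(text);
  }
  return found;
}
```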
✅ File Browser
Browse all indexed files:
✅ Command
Scimax: Browse Indexed Files (scimax.db.browseFiles)
✅ Features
Lists all files in database
Shows last indexed date
Sorted by most recently indexed
Displays file type (org, md, ipynb)
✅ Agenda and Time Management
✅ Agenda View
The agenda shows scheduled items and deadlines:
✅ Command
Scimax: Show Agenda (scimax.db.agenda)
✅ Time Periods
Next 2 weeks (default)
Next month
Next 3 months
All items (no time limit)
✅ Options
Include unscheduled TODOs
Filter by time range
Sorted by urgency (overdue first)
✅ Item Types
✅ Deadline
Items with DEADLINE timestamps:
* TODO Submit report
DEADLINE: <2026-01-20 Tue>
✅ Scheduled
Items with SCHEDULED timestamps:
* TODO Team meeting
SCHEDULED: <2026-01-15 Thu 14:00>
✅ Unscheduled TODOs
Items with TODO state but no scheduling
✅ Deadline View
Show only upcoming deadlines:
Command
Scimax: Show Deadlines (scimax.db.deadlines)
✅ Features
Next 2 weeks of deadlines
Overdue items highlighted
Shows days until deadline
Excludes DONE and CANCELLED
Display Format
⚠️ Overdue: Submit TPS Report (3 days ago)
🔔 Today: Code Review
🔔 Tomorrow: Documentation Update
🔔 In 5 days: Project Demo
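The labels in this display can be derived from a whole-day difference between today and the deadline. The sketch below is illustrative only: daysUntil and describeDeadline are hypothetical names, and dates are passed as YYYY-MM-DD strings and compared in UTC to avoid timezone drift.

```typescript
// Whole days from `today` to `deadline`, both given as YYYY-MM-DD.
function daysUntil(deadline: string, today: string): number {
  const toUtc = (iso: string): number => {
    const parts = iso.split("-");
    return Date.UTC(Number(parts[0]), Number(parts[1]) - 1, Number(parts[2]));
  };
  return Math.round((toUtc(deadline) - toUtc(today)) / 86400000);
}

// Format one deadline line in the style shown above.
function describeDeadline(title: string, deadline: string, today: string): string {
  const days = daysUntil(deadline, today);
  if (days < 0) {
    const overdue = -days;
    return "⚠️ Overdue: " + title + " (" + overdue + (overdue === 1 ? " day ago)" : " days ago)");
  }
  if (days === 0) return "🔔 Today: " + title;
  if (days === 1) return "🔔 Tomorrow: " + title;
  return "🔔 In " + days + " days: " + title;
}
```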
Date Formats
Scheduling in Org Files
# Simple date
SCHEDULED: <2026-01-20>
# Date with time
SCHEDULED: <2026-01-20 Tue 14:00>
# Date with time range
SCHEDULED: <2026-01-20 Tue 14:00-16:00>
# Deadline with warning period
DEADLINE: <2026-01-20 Tue -3d>
# Closed timestamp
CLOSED: [2026-01-13 Tue 10:30]
Relative Dates
+2w # 2 weeks from now
+1m # 1 month from now
+3d # 3 days from now
+1y # 1 year from now
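Resolving these offsets against a base date might look like the sketch below. The resolveRelativeDate name is hypothetical, and month and year arithmetic follows JavaScript's Date rules (for example, one month after January 31 rolls over into March), which may differ from how the extension handles such edge cases.

```typescript
// Resolve an offset such as "+2w" relative to `from` (computed in UTC).
function resolveRelativeDate(offset: string, from: Date): Date {
  const match = /^\+(\d+)([dwmy])$/.exec(offset);
  if (match === null) {
    throw new Error("unsupported offset: " + offset);
  }
  const amount = Number(match[1]);
  const result = new Date(from.getTime());
  switch (match[2]) {
    case "d":
      result.setUTCDate(result.getUTCDate() + amount);
      break;
    case "w":
      result.setUTCDate(result.getUTCDate() + 7 * amount);
      break;
    case "m":
      result.setUTCMonth(result.getUTCMonth() + amount);
      break;
    case "y":
      result.setUTCFullYear(result.getUTCFullYear() + amount);
      break;
  }
  return result;
}
```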
Search Scope
Limit searches to specific directories:
Commands
Scimax: Set Search Scope (scimax.db.setScope)
Scope Types
All Files (Default)
Searches entire indexed database
Includes all workspace folders
Includes additional configured directories
Current Directory
Limits to active file's directory
Includes subdirectories
Useful for project-focused searches
Current Scope Indicator
The current scope is shown when setting scope:
Search scope: all
Search scope: directory (my-project)
Database Management
Reindexing
Full Reindex
Command: Scimax: Reindex Files (scimax.db.reindex)
Scans all workspace folders
Checks file modification times
Only reindexes changed files
Shows progress notification
Reports statistics on completion
✅ Auto-Indexing
{
  "scimax.db.autoIndex": true
}
Warning: Disable for very large workspaces (>10,000 files) to prevent memory issues.
✅ Indexing Sources
By default, the database indexes:
Journal directory (scimax.db.includeJournal: true)
Workspace folders (scimax.db.includeWorkspace: true)
Scimax projects (scimax.db.includeProjects: true)
Add additional directories with scimax.db.include:
{
  "scimax.db.include": [
    "/home/user/research",
    "/home/user/notes",
    "~/Documents/org"
  ]
}
Optimization
✅ Command
Scimax: Optimize Database (scimax.db.optimize)
Operations
Removes entries for deleted files
Runs VACUUM to reclaim space
Rebuilds indexes for performance
Should be run periodically (monthly)
Clearing Database
Command
Scimax: Clear Database (scimax.db.clear)
Warning
This is destructive and requires confirmation:
Removes all indexed data
Clears embeddings
Resets statistics
Requires full reindex to restore
When to Clear
Database corruption
Major schema changes
Troubleshooting issues
Fresh start needed
Statistics
Command
Scimax: Show Database Stats (scimax.db.stats)
Information Displayed
Scimax DB: 127 files (98 org, 23 md, 6 ipynb),
1,234 headings, 456 code blocks, 789 links.
Semantic search: Enabled (243 chunks).
Last indexed: 2026-01-13 14:30:00
Stats Include
File count by type
Heading count
Code block count
Link count
Chunk count (for semantic search)
Embedding status
Last index timestamp
Performance Considerations
Indexing Performance
File Size
Small files (<100KB): ~10-50ms
Medium files (100KB-1MB): ~50-200ms
Large files (>1MB): ~200ms-1s
Batch Indexing
100 small files: ~2-5 seconds
1,000 small files: ~20-60 seconds
With embeddings: 2-5x slower
Optimization Tips
Use ignore patterns for large non-content directories
Disable auto-indexing for huge workspaces
Index incrementally (only changed files)
Run optimization monthly
Search Performance
Full-Text Search (FTS5)
Query time: 10-50ms (typical)
Scales well to 10,000+ files
BM25 scoring is highly optimized
Results returned in rank order
Semantic Search
Query time: 50-500ms depending on provider
Local embeddings: slower but private
Ollama: moderate speed
OpenAI: fastest but requires network
Hybrid Search
Query time: Combined FTS + vector time
Typically 100-600ms
Runs searches in parallel
RRF fusion adds ~10ms
Database Size
Typical Sizes
100 files: ~5-10 MB
1,000 files: ~50-100 MB
10,000 files: ~500 MB-1 GB
With Embeddings
+50-100% size increase for chunks and vectors
384-dim embeddings: ~1.5 KB per chunk
768-dim embeddings: ~3 KB per chunk
1536-dim embeddings: ~6 KB per chunk
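The per-chunk figures above follow from 32-bit floats taking 4 bytes per dimension. The small sketch below reproduces the arithmetic; the function names are illustrative, and the estimate covers raw vector data only, not index overhead or chunk text.

```typescript
// Raw size of one F32 embedding in kilobytes (4 bytes per dimension).
function embeddingKilobytes(dimensions: number): number {
  return (dimensions * 4) / 1024;
}

// Raw vector storage for a number of chunks, in megabytes.
function vectorStorageMegabytes(chunkCount: number, dimensions: number): number {
  return (chunkCount * dimensions * 4) / (1024 * 1024);
}
```

For example, 10,000 chunks of 768-dimensional embeddings come to roughly 29 MB of raw vector data.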
Memory Usage
Indexing
Base: ~50-100 MB
Peak during large batch: ~200-500 MB
Embedding generation: +100-300 MB
Searching
FTS5: ~10-50 MB
Vector search: ~50-200 MB (loads embeddings)
Minimal memory footprint when idle
Scaling Guidelines
Small Workspace (<100 files)
Enable auto-indexing
Use any embedding provider
Full reindex in seconds
Medium Workspace (100-1,000 files)
Enable auto-indexing
Local or Ollama embeddings recommended
Full reindex in under a minute
Large Workspace (1,000-10,000 files)
Consider disabling auto-indexing
Ollama embeddings recommended
Reindex incrementally
Very Large Workspace (>10,000 files)
Disable auto-indexing (manual reindex)
Use selective directory indexing
Consider multiple smaller databases
Ollama with a fast model recommended
Configuration Reference
Database Settings
scimax.db.includeJournal
Type: boolean
Default: true
Include journal directory in database indexing.
scimax.db.includeWorkspace
Type: boolean
Default: true
Include workspace folders in database indexing.
scimax.db.includeProjects
Type: boolean
Default: true
Include all scimax projects in database indexing.
scimax.db.include
Type: string[]
Default: []
Additional directories or files to index (supports ~ for home directory).
{
  "scimax.db.include": [
    "/home/user/notes",
    "~/Documents/research"
  ]
}
scimax.db.exclude
Type: string[]
Default: ["**/node_modules/**", "**/.git/**", "**/dist/**", "**/build/**"]
Patterns or paths to exclude from indexing (globs and absolute paths).
{
  "scimax.db.exclude": [
    "**/node_modules/**",
    "**/.git/**",
    "**/dist/**",
    "**/temp/**",
    "**/*.backup.org",
    "~/notes/scratch.org"
  ]
}
scimax.db.autoIndex
Type: boolean
Default: false
Automatically index workspace on activation. Disable for large workspaces.
{
  "scimax.db.autoIndex": true
}
Embedding Settings
scimax.db.embeddingProvider
Type: enum
Values: "none" | "ollama"
Default: "ollama"
Embedding provider for semantic search.
{
  "scimax.db.embeddingProvider": "ollama"
}
scimax.db.ollamaUrl
Type: string
Default: "http://localhost:11434"
Ollama server URL.
{
  "scimax.db.ollamaUrl": "http://localhost:11434"
}
scimax.db.ollamaModel
Type: string
Default: "nomic-embed-text"
Ollama embedding model name.
{
  "scimax.db.ollamaModel": "nomic-embed-text"
}
Command Reference
Search Commands
| Command | Description |
|---|---|
| scimax.db.search | Full-text search (FTS5) |
| scimax.db.searchSemantic | Semantic search (vector) |
| scimax.db.searchHybrid | Hybrid search (FTS + vector) |
| scimax.db.searchAdvanced | Advanced search (full pipeline) |
| scimax.db.searchCapabilities | Show search capabilities |
| scimax.db.searchHeadings | Search headings |
| scimax.db.searchByTag | Search by org tag |
| scimax.db.searchByProperty | Search by property value |
| scimax.db.searchBlocks | Search code blocks |
| scimax.db.searchHashtags | Search by hashtag |
View Commands
| Command | Description |
|---|---|
| scimax.db.showTodos | Show TODO items |
| scimax.db.agenda | Show agenda |
| scimax.db.deadlines | Show upcoming deadlines |
| scimax.db.browseFiles | Browse indexed files |
Management Commands
| Command | Description |
|---|---|
| scimax.db.reindex | Reindex all files |
| scimax.db.optimize | Optimize database |
| scimax.db.clear | Clear database |
| scimax.db.stats | Show database statistics |
| scimax.db.setScope | Set search scope |
| scimax.db.configureEmbeddings | Configure embedding service |
| scimax.db.backup | Backup database to file |
| scimax.db.restore | Restore database from file |
| scimax.db.rebuild | Rebuild database completely |
| scimax.db.verify | Verify database integrity |
✅ Database Maintenance
✅ Backup and Restore
The database can be backed up and restored to prevent data loss and enable migration between machines.
✅ Backup
Command: Scimax: Backup Database (scimax.db.backup)
Creates a portable backup file containing:
All indexed file paths (not file contents)
Project information
Database metadata
Backup is stored in JSON format for portability.
# Example backup location
~/.scimax/backup-2026-01-22.json
✅ Restore
Command: Scimax: Restore Database (scimax.db.restore)
Restores database from a backup file:
Imports project list
Queues files for reindexing
Preserves original creation timestamps
Note: Actual file content must still be reindexed after restore.
✅ Database Rebuild
Command: Scimax: Rebuild Database (scimax.db.rebuild)
Completely rebuilds the database from scratch:
Drops and recreates all tables
Re-scans all configured directories
Regenerates all indexes
Regenerates embeddings (if configured)
Use when:
Database appears corrupted
Major schema changes after update
Switching embedding providers
Performance issues after many incremental updates
Options
| Option | Description |
|---|---|
| Full rebuild | Complete reindex of all files |
| Projects only | Only rebuild project table |
✅ Database Verification
Command: Scimax: Verify Database (scimax.db.verify)
Checks database integrity and freshness:
Checks Performed
File existence - Verifies indexed files still exist on disk
Modification time - Detects files modified since indexing
Index integrity - Validates FTS5 and vector indexes
Project validity - Checks project directories exist
Result Format
Database Verification Results:
- Total files: 127
- Missing files: 2
- Stale files: 5
- Projects: 8 (7 valid, 1 missing)
- Status: NEEDS_REINDEX
Status Values
| Status | Meaning |
|---|---|
| OK | Database is current and valid |
| NEEDS_REINDEX | Some files are stale or missing |
| CORRUPTED | Index integrity check failed |
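The mapping from check results to a status value can be pictured as below. This is a guess at the decision order, not the extension's logic: the VerifyCounts shape and the function name are assumptions, and an integrity failure is assumed to take precedence over stale or missing files.

```typescript
type VerifyStatus = "OK" | "NEEDS_REINDEX" | "CORRUPTED";

// Hypothetical summary of the checks performed during verification.
interface VerifyCounts {
  missingFiles: number;
  staleFiles: number;
  integrityOk: boolean;
}

// Derive the overall status; integrity problems win over freshness problems.
function summarizeVerification(counts: VerifyCounts): VerifyStatus {
  if (!counts.integrityOk) return "CORRUPTED";
  if (counts.missingFiles > 0 || counts.staleFiles > 0) return "NEEDS_REINDEX";
  return "OK";
}
```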
✅ Project Integration
The database now stores project information, integrated with the Projectile project manager:
Benefits
Projects persist across VS Code restarts
Shared project list between Projectile and Database
Fast project switching using indexed data
Projects can be associated with indexed files
Project Commands
Projects are managed through Projectile commands (C-c p), but the database provides the persistence layer.
See Projectile for project management commands.
Troubleshooting
Semantic Search Not Working
Problem
Semantic search returns no results, shows "unavailable", or displays an error.
First: Check Vector Search Support
Run Scimax: Show Database Stats (scimax.db.stats) to see the semantic search status:
| Status Message | Meaning |
|---|---|
| Semantic search: Enabled (N chunks) | Working, N chunks with embeddings |
| Semantic search: Ready (no embeddings) | Supported, but no provider configured |
| Semantic search: Unavailable (error) | Vector search not supported by database |
Embedding Provider Issues
If vector search is supported but not working:
Check embedding provider is configured: scimax.db.embeddingProvider
Test connection: Run Scimax: Configure Embedding Service
Ensure files are reindexed after configuring embeddings
Check console for errors (Help: Toggle Developer Tools)
Local Provider Issues
First use downloads model (~30MB), wait for completion
Check extension cache directory has write permissions
Try different model if one fails
Ollama Issues
Ensure Ollama is running: ollama serve
Pull model: ollama pull nomic-embed-text
Check URL is correct in settings
Test connection: curl http://localhost:11434/api/tags (lists the installed models if the server is reachable)
Search Returns No Results
Problem
Searches return empty results despite having files.
Solutions
Run Scimax: Show Database Stats to check file count
If the file count is 0, run Scimax: Reindex Files
Check file extensions (.org, .md, .ipynb)
Verify files aren't in ignored directories
Check search scope is set to "All files"
Slow Indexing
Problem
Indexing takes very long or appears stuck.
Solutions
Check workspace size (number of files)
Add ignore patterns for large non-content directories
Disable embedding generation if not needed
Index directories incrementally
Check disk I/O and available memory
Database Corruption
Problem
Errors mentioning "database is locked" or "disk I/O error".
Solutions
Close other VS Code windows accessing same workspace
Restart VS Code
Run Scimax: Clear Database and reindex
Check disk space is available
Verify database file permissions
High Memory Usage
Problem
VS Code uses excessive memory during indexing or searching.
Solutions
Disable auto-indexing
Reduce number of indexed directories
Use more aggressive ignore patterns
Clear and reindex database
Restart VS Code between large indexing operations
Examples and Workflows
Research Paper Management
# Organize papers with properties
* TODO Read: Attention Is All You Need
:PROPERTIES:
:CUSTOM_ID: vaswani2017attention
:AUTHOR: Vaswani et al.
:YEAR: 2017
:CATEGORY: research
:END:
#transformers #attention #nlp
SCHEDULED: <2026-01-15 Thu>
# Search by property
Property: CATEGORY
Value: research
# Search by hashtag
#nlp
# Semantic search
transformer architecture papers
Project Todo Management
# Use tags for organization
* TODO Implement login feature :work:backend:
DEADLINE: <2026-01-20 Tue>
* TODO Write API documentation :work:docs:
SCHEDULED: <2026-01-18 Sun>
# Search by tag
:work: -> shows all work items
:backend: -> shows backend tasks
# View agenda
Next 2 weeks -> prioritized by deadline
# Search TODOs
Filter by: TODO (not started)
Code Snippet Library
# Store reusable code blocks
* Data Processing Utils
#+BEGIN_SRC python
def normalize_data(df):
    """Normalize numeric columns"""
    return (df - df.mean()) / df.std()
#+END_SRC
# Search code blocks
Language: python
Query: normalize
# Or semantic search
how to standardize dataframe columns
Personal Knowledge Base
# Use hybrid search for discovery
Query: "improve code performance"
Results will include:
- Exact matches: "code performance" articles
- Related concepts: optimization, profiling, caching
- Similar topics: algorithm efficiency, memory management
# Use properties for metadata
:PROPERTIES:
:CREATED: [2026-01-13 Tue 10:00]
:MODIFIED: [2026-01-13 Tue 15:30]
:CATEGORY: programming
:END:
Best Practices
Indexing Strategy
Start selective - Index specific directories first
Use ignore patterns - Exclude build artifacts and dependencies
Index incrementally - Don't reindex everything on changes
Schedule optimization - Run monthly for large databases
Monitor statistics - Check file counts and sizes regularly
Search Strategy
Start broad - Use semantic or hybrid search for exploration
Refine with keywords - Switch to FTS for specific terms
Use structured queries - Filter by tags/properties when possible
Set scope appropriately - Narrow to directories for focused work
Combine approaches - Use multiple search types for thorough research
Organization Tips
Use consistent tags - Establish tag naming conventions
Add properties - Include metadata for filtering
Set schedules - Use SCHEDULED/DEADLINE for time management
Include hashtags - Quick inline categorization
Write descriptive headings - Better search results
Performance Tips
Disable auto-indexing - For large workspaces (manual trigger)
Choose appropriate embeddings - Balance quality vs. speed
Limit result counts - Don't request thousands of results
Use search scope - Narrow searches to relevant directories
Cache frequent queries - Database has 15-minute result cache
Technical Architecture
Database Schema
Files Table
Tracks indexed files with modification tracking:
path, file_type, mtime, hash, size, indexed_at
Headings Table
Org/markdown headings with full metadata:
level, title, todo_state, priority
tags, inherited_tags, properties
scheduled, deadline, closed
line_number, begin_pos
Source Blocks Table
Code blocks with language and content:
language, content, headers
line_number, cell_index
Links Table
All link types (file, http, id, etc.):
link_type, target, description
line_number
Chunks Table
Text chunks for semantic search:
content, line_start, line_end
embedding (F32BLOB vector)
FTS Content (Virtual Table)
Full-text search index:
file_path, title, content
Porter stemming, Unicode normalization
BM25 ranking support
Indexes
Performance indexes on:
headings.file_id, headings.todo_state
headings.deadline, headings.scheduled
source_blocks.language
hashtags.tag
chunks.embedding (vector index with cosine metric)
files.file_type
Vector Search
Using libsql's vector extension:
Cosine similarity metric
F32BLOB storage format
Approximate nearest-neighbor index (DiskANN-based in libSQL)
Efficient nearest neighbor queries
Parsers
Org Mode
UnifiedParserAdapter - Full AST parser compatible with org-element:
Recursive heading parsing with inheritance
Property drawer extraction
Timestamp parsing (scheduled, deadline, closed)
Source block with headers
Link extraction
Markdown
Simplified parser:
ATX heading syntax (#)
Fenced code blocks
Inline links
Jupyter Notebooks
ipynbParser - Notebook-specific parser:
Markdown cells → headings and links
Code cells → source blocks
Cell indices tracked for navigation
Hashtags from markdown cells
See Also
Getting Started - Initial setup and first steps
Source Blocks - Code execution and literate programming
TODO Items - Task management features
Links - Link types and navigation
Configuration - Full settings reference