CiteBot — Intelligent Citation Assistant
An open-source tool that automates academic citation workflows. Give it a .tex file, and it produces a complete references.bib with relevant, well-formatted references.
How It Works
- Parse LaTeX documents (supports multi-file thesis projects)
- Extract keywords using LLM semantic analysis + NLP ensemble (KeyBERT, YAKE, spaCy)
- Search 5 academic databases in parallel (OpenAlex, Semantic Scholar, PubMed, arXiv, bioRxiv)
- Rank results using composite scoring (keyword overlap, citations, recency, abstract similarity)
- Generate BibTeX via DOI content negotiation
- Insert \cite{} commands with fuzzy title matching (optional)
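To make the ranking step more concrete, here is a minimal sketch of a composite score built from the four signals listed above. It is an illustration under stated assumptions, not CiteBot's actual implementation: the `Paper` fields, the `WEIGHTS` values, the log and linear normalizations, and the use of Jaccard token overlap as a stand-in for abstract similarity are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical weights (sum to 1.0); the real values are not stated in this README.
WEIGHTS = {"keyword": 0.4, "citations": 0.2, "recency": 0.2, "abstract": 0.2}


@dataclass(frozen=True)
class Paper:
    """Illustrative record; field names are assumptions."""
    title: str
    abstract: str
    year: int
    citation_count: int


def tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    cleaned = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in cleaned if w}


def composite_score(paper: Paper, keywords: list[str], draft_text: str,
                    current_year: int) -> float:
    """Weighted sum of four signals, each normalized to the range [0, 1]."""
    paper_tokens = tokens(paper.title + " " + paper.abstract)
    wanted = {k.lower() for k in keywords}  # single-word keywords only in this sketch
    keyword_overlap = len(wanted & paper_tokens) / len(wanted) if wanted else 0.0

    # Log-scaled citation count, capped so that ~10,000 citations maps to 1.0.
    citations = min(math.log10(1 + max(paper.citation_count, 0)) / 4, 1.0)

    # Linear decay: a paper 20 or more years old contributes 0.
    recency = max(1.0 - max(current_year - paper.year, 0) / 20, 0.0)

    # Jaccard overlap between the draft text and the abstract (a crude
    # stand-in for embedding-based similarity).
    draft_tokens, abstract_tokens = tokens(draft_text), tokens(paper.abstract)
    union = draft_tokens | abstract_tokens
    abstract_similarity = len(draft_tokens & abstract_tokens) / len(union) if union else 0.0

    return (WEIGHTS["keyword"] * keyword_overlap
            + WEIGHTS["citations"] * citations
            + WEIGHTS["recency"] * recency
            + WEIGHTS["abstract"] * abstract_similarity)


def rank(papers: list[Paper], keywords: list[str], draft_text: str,
         current_year: int) -> list[Paper]:
    """Return papers sorted from highest to lowest composite score."""
    return sorted(
        papers,
        key=lambda p: composite_score(p, keywords, draft_text, current_year),
        reverse=True,
    )
```

Calling `rank(papers, ["graph", "neural"], draft_text, 2025)` would return the candidates ordered best first. Keeping every signal in [0, 1] before weighting means no single signal, such as a very large citation count, can dominate the sum.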
Key Features
- LLM + NLP fusion: Combines semantic understanding with statistical term extraction; terms appearing in both get a 1.5x relevance boost
- Multi-file support: Recursive \input/\include resolution with per-chapter context-aware extraction
- Async search: All database queries execute in parallel via asyncio
- Immutable data pipeline: Frozen dataclasses throughout, preventing mutation bugs in async code
- Graceful degradation: Every component has a fallback (LLM→NLP, DOI→metadata), and failed sources are skipped
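Several of the features above can be sketched together in a few lines. The example below is a hedged illustration rather than CiteBot's code: `Keyword`, `fuse`, `fake_source`, and `search_all` are hypothetical names, taking the maximum of the two extractor scores as the base is an assumption, and the database clients are replaced by a stand-in coroutine that makes no network calls. Only the 1.5x boost, the frozen dataclasses, the use of asyncio, and the skipping of failed sources come from the feature list.

```python
import asyncio
from dataclasses import dataclass

BOOST = 1.5  # boost for terms found by both the LLM and the NLP ensemble


@dataclass(frozen=True)
class Keyword:
    """Immutable record: assigning to a field after creation raises an error."""
    term: str
    score: float


def fuse(llm_terms: dict[str, float], nlp_terms: dict[str, float]) -> list[Keyword]:
    """Merge two term -> score maps; terms present in both get the 1.5x boost."""
    fused = []
    for term in llm_terms.keys() | nlp_terms.keys():
        # Assumption: use the higher of the two scores as the base.
        base = max(llm_terms.get(term, 0.0), nlp_terms.get(term, 0.0))
        if term in llm_terms and term in nlp_terms:
            base *= BOOST
        fused.append(Keyword(term, base))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(fused, key=lambda k: (-k.score, k.term))


async def fake_source(name: str, fail: bool = False) -> list[str]:
    """Stand-in for a real database client (hypothetical, no network access)."""
    await asyncio.sleep(0.01)
    if fail:
        raise ConnectionError(f"{name} unavailable")
    return [f"{name}: result"]


async def search_all(sources: dict[str, bool]) -> list[str]:
    """Query every source concurrently and skip the ones that fail."""
    tasks = [fake_source(name, fail) for name, fail in sources.items()]
    # return_exceptions=True keeps one failure from cancelling the others.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results: list[str] = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # graceful degradation: drop the failed source
        results.extend(outcome)
    return results


if __name__ == "__main__":
    print(fuse({"graph": 0.8, "molecule": 0.5}, {"graph": 0.6, "kernel": 0.4}))
    print(asyncio.run(search_all({"OpenAlex": False, "PubMed": True, "arXiv": False})))
```

Because `asyncio.gather` returns results in the order the tasks were passed in, the merged list stays deterministic even though the queries run concurrently.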