AI & Tech·Jun 8, 2026

I built a semantic arXiv search engine with AI-generated TL;DRs, claim classification, and paper comparison

r/artificial · Jun 8 · 2 min read · Single source

The gist

3-point summary · 1 min

Support Development: If this project helps your work, support ongoing maintenance and new features.
ETH Donation Wallet: 0x11282eE5726B3370c8B480e321b3B2aA13686582. Scan the QR code or copy the wallet address above.
Fast semantic arXiv paper search with AI-powered summaries — no login required.

In this article

"Research papers, decoded."

Video Demo

Screenshots: Landing Page, Advanced Search Filters, Search papers similar to abstracts, Claim Assessment, Author Pages, Paper Comparison, Explore & Discover

Features

Core Search & Discovery

Hybrid Search: Combines FTS5 keyword search and Vectorize semantic search for accurate results
Advanced Filtering: Filter by author (substring match), citation count, category, and date range
Smart Caching: KV-based caching with 2h TTL for search results, 24h for embeddings
Related Papers: Pre-computed top-8 semantically similar papers via Vectorize
Topic Collections: Curated topics with category mappings (stored in topics table)
Author Pages: Author statistics, timeline visualization, and all papers
Full-Text Search: SQLite FTS5 virtual table with automatic triggers

AI-Powered Features

Pre-Generated Summaries: TL;DR, key contributions, methods, limitations, beginner/technical explanations
Entity Extraction: Keywords, entities (models/datasets/benchmarks), paper type classification
Claim Classification: AI-powered support/contradiction analysis for scientific claims
Smart Abstracts: Enhanced paper metadata with prerequisites and follow-up questions

Paper Management

Bookmarks: Client-side collections with 90-day TTL (100 bookmark soft cap)
Export Options: JSON and BibTeX export for collections
Paper Comparison: Side-by-side comparison view (up to 6 papers)
Revision History: Track paper updates and version differences
Share & Copy: Quick copy for arXiv ID and BibTeX entries

Enrichment & Metadata

Citation Tracking: Semantic Scholar integration with citation count + influential citations
Citation Snapshots: Historical citation data stored in a table
CrossRef Integration: Journal metadata, publisher, license, funders
OpenAlex Data: Concepts, affiliations, institutional data (ROR IDs)
Papers With Code: Code repositories, benchmarks, SOTA rankings (schema ready)

User Engagement

Achievements System: Gamified badges stored client-side with activity tracking
Recent Searches: Search history with suggestions
Personalized Feed: Recommendations based on bookmark history
RSS Feed: /rss.xml with 20 recent papers (1h cache)

Developer Tools

CLI Interface: arxiv-cli for AI assistants (search, trending, topics, authors)
Admin API: Vectorize bulk operations, maintenance endpoints, enrichment triggers

SEO & Discoverability

Dynamic Meta Tags: Open Graph and Twitter Card tags on all paper pages
Sitemap.xml: Auto-generated sitemap with all papers, topics, and authors
Robots.txt: Search engine crawler configuration
Structured Data: JSON-LD schema markup for papers and authors
SSR Content: Server-side rendered pages with full content for crawlers
Canonical URLs: Proper canonical tags to prevent duplicate content
AI Agent Discovery: /ai.txt and /llms.txt routes for LLM tool integration

Performance

Edge Caching: Cloudflare KV with intelligent TTL strategies
ISR Rendering: Next.js ISR with 10-minute revalidation
Zero Login: Instant access to all features
Global CDN: Cloudflare Workers edge deployment

Security
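
The feature list says Hybrid Search combines FTS5 keyword results with Vectorize semantic results, but the excerpt does not say how the two rankings are merged. One common approach is reciprocal rank fusion, sketched below purely as an illustration. The names `RankedHit` and `fuseRankings` and the constant `k = 60` are assumptions for this example and are not taken from the project.

```typescript
// Hypothetical result shape; the project's real types are not shown in the excerpt.
interface RankedHit {
  id: string; // for example an arXiv ID
}

// Reciprocal rank fusion: every list contributes 1 / (k + rank) to a document's
// score, so papers ranked well by both keyword and semantic search rise to the top.
function fuseRankings(
  keywordHits: RankedHit[],
  semanticHits: RankedHit[],
  k: number = 60, // assumed smoothing constant, a conventional default for this method
): string[] {
  const scores = new Map<string, number>();
  for (const hits of [keywordHits, semanticHits]) {
    hits.forEach((hit, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Usage with placeholder IDs: "B" is ranked 2nd and 1st, "A" is ranked 1st and 3rd.
const fused = fuseRankings(
  [{ id: "A" }, { id: "B" }, { id: "C" }],
  [{ id: "B" }, { id: "D" }, { id: "A" }],
);
console.log(fused); // ["B", "A", "D", "C"]
```

Rank-based fusion avoids having to normalize FTS5 relevance scores against vector similarity scores, which live on different scales. Whether the project uses this or a weighted score blend is not stated in the source.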

