LLM Papers: Automated Research Paper Collection

An automated system for collecting, categorising, and displaying research papers about Large Language Model applications from arXiv. Hosted on GitHub Pages with daily automated updates.

Posted Jan 16, 2025

By Sinan Koparan

Overview

LLM Papers collects research papers about Large Language Model applications from arXiv, sorts them into topic categories, and presents them on a searchable static site hosted on GitHub Pages, with updates running automatically every day.

Key Features

Automated Collection: Gathers relevant papers using 13 targeted arXiv search queries
Smart Categorisation: Automatically sorts papers into topics such as agents, tool-use, reasoning, and RAG (retrieval-augmented generation)
Search & Filter: Fuzzy search and filtering powered by Fuse.js
Daily Updates: GitHub Actions refreshes the collection automatically every day, so newly published papers are picked up without manual work
Static Site: A lightweight static site on GitHub Pages that loads quickly
Manual Curation: A blocklist for excluding unwanted papers by hand
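The categorisation and blocklist features can be illustrated with a minimal Python sketch. The topic names come from the feature list, but the keyword lists, the dict-based paper records (`id`, `title`, `abstract`), and the blocklisted arXiv ID are hypothetical stand-ins rather than the project's actual data or code. Keywords are matched as whole words (optionally plural) against the title and abstract, so a short keyword like "rag" does not fire on "storage".

```python
import re

# Hypothetical keyword lists for the topics named in the post.
TOPIC_KEYWORDS: dict[str, list[str]] = {
    "agents": ["agent", "agentic"],
    "tool-use": ["tool use", "tool-use", "function calling"],
    "reasoning": ["reasoning", "chain-of-thought", "chain of thought"],
    "rag": ["rag", "retrieval-augmented", "retrieval augmented"],
}

# One case-insensitive regex per topic. \b(...)s?\b matches whole words with an
# optional plural "s", so "rag" matches "RAG" but not "drag" or "storage".
TOPIC_PATTERNS = {
    topic: re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")s?\b",
        re.IGNORECASE,
    )
    for topic, keywords in TOPIC_KEYWORDS.items()
}

# Hypothetical blocklist of version-less arXiv IDs excluded by hand.
BLOCKLIST: set[str] = {"2501.99999"}

# arXiv IDs carry a version suffix such as "v2"; strip it before comparing.
_VERSION_SUFFIX = re.compile(r"v\d+$")


def categorise(paper: dict) -> list[str]:
    """Return every topic whose keywords appear in the title or abstract."""
    text = f"{paper['title']} {paper['abstract']}"
    return [topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(text)]


def curate(papers: list[dict]) -> list[dict]:
    """Drop blocklisted papers and return copies of the rest with categories attached."""
    return [
        {**paper, "categories": categorise(paper)}
        for paper in papers
        if _VERSION_SUFFIX.sub("", paper["id"]) not in BLOCKLIST
    ]
```

Because each blocklist entry is compared without its version suffix, a single entry excludes every version of a paper. Keyword matching is cheap and transparent, at the cost of missing papers that describe a topic without using any of the listed terms.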

Technical Implementation

The system stores papers in separate JSON files by year and maintains a lightweight index so the frontend loads quickly. Requests to the arXiv API are rate-limited, and deduplication logic ensures each paper is stored only once. Categorisation uses keyword matching against a set of predefined topic categories.
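The year-based storage with deduplication and a lightweight index might look like the following minimal Python sketch. It assumes each paper is a dict with `id`, `title`, `published` (an ISO-8601 timestamp, as arXiv returns), and optional `categories`. The file names `papers_<year>.json` and `index.json`, and the fields kept in the index, are hypothetical, since the post does not describe the project's actual schema.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# arXiv IDs carry a version suffix ("2501.00001v2"); strip it so that
# different versions of one paper count as duplicates.
_VERSION_SUFFIX = re.compile(r"v\d+$")


def dedupe(papers: list[dict]) -> list[dict]:
    """Keep the first occurrence of each paper, keyed by version-less arXiv ID."""
    seen: set[str] = set()
    unique = []
    for paper in papers:
        key = _VERSION_SUFFIX.sub("", paper["id"])
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


def group_by_year(papers: list[dict]) -> dict[str, list[dict]]:
    """Bucket papers by the year of their ISO-8601 'published' timestamp."""
    by_year: dict[str, list[dict]] = defaultdict(list)
    for paper in papers:
        by_year[paper["published"][:4]].append(paper)
    return dict(by_year)


def build_index(by_year: dict[str, list[dict]]) -> dict:
    """Small index for the frontend: years (newest first) plus minimal per-paper fields."""
    years = sorted(by_year, reverse=True)
    return {
        "years": years,
        "papers": [
            {"id": p["id"], "title": p["title"], "year": year,
             "categories": p.get("categories", [])}
            for year in years
            for p in by_year[year]
        ],
    }


def save(papers: list[dict], out_dir: Path) -> None:
    """Write one papers_<year>.json per year plus an index.json into out_dir."""
    by_year = group_by_year(dedupe(papers))
    out_dir.mkdir(parents=True, exist_ok=True)
    for year, items in by_year.items():
        (out_dir / f"papers_{year}.json").write_text(
            json.dumps(items, indent=2), encoding="utf-8")
    (out_dir / "index.json").write_text(
        json.dumps(build_index(by_year)), encoding="utf-8")
```

One plausible reason for this split is that a page can load the small index and build its search over it first, and fetch a full per-year file only when needed; whether the live site does exactly this is an assumption.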

Technical Stack

Frontend: JavaScript, HTML, CSS, Fuse.js
Automation: GitHub Actions
Hosting: GitHub Pages
Data Source: arXiv API

Why I Built This

As someone deeply interested in LLMs and their applications, I needed a way to stay current with the rapidly evolving research landscape. This tool automates the tedious process of manually searching arXiv and organises papers in a way that makes them easy to discover and explore.

This post is licensed under CC BY 4.0 by the author.