Post

LLM Papers: Automated Research Paper Collection

An automated system for collecting, categorising, and displaying research papers about Large Language Model applications from arXiv. Hosted on GitHub Pages with daily automated updates.

LLM Papers: Automated Research Paper Collection

Overview

LLM Papers is an automated system for collecting, categorising, and displaying research papers about Large Language Model applications from arXiv. Hosted on GitHub Pages with daily automated updates.

Key Features

  • Automated Collection: Uses 13 targeted arXiv search queries to gather relevant papers
  • Smart Categorisation: Auto-categorises papers into topics like agents, tool-use, reasoning, and RAG
  • Search & Filter: Fuzzy search and filtering capabilities using Fuse.js
  • Daily Updates: Automated updates via GitHub Actions ensure the latest papers are always available
  • Static Site: Lightweight static site hosted on GitHub Pages for fast performance
  • Manual Curation: Blocklist system for filtering unwanted papers

Technical Implementation

The system organises papers by year in separate JSON files and maintains a lightweight index for fast frontend loading. It implements rate-limiting for arXiv API requests and includes deduplication logic. The categorisation uses keyword-based matching across predefined topic categories.

Technical Stack

  • Frontend: JavaScript, HTML, CSS, Fuse.js
  • Automation: GitHub Actions
  • Hosting: GitHub Pages
  • Data Source: arXiv API

Why I Built This

As someone deeply interested in LLMs and their applications, I needed a way to stay current with the rapidly evolving research landscape. This tool automates the tedious process of manually searching arXiv and organises papers in a way that makes them easy to discover and explore.

This post is licensed under CC BY 4.0 by the author.