
ArchiveBox

Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Summary

ArchiveBox is an open-source, self-hosted web archiving tool. It lets users collect, save, and view websites offline, protecting content against link rot. It accepts URLs, browser history, bookmarks, and exports from services such as Pocket and Pinboard, and saves each page as HTML, PDF, media, and other durable formats.

Target Audience

The target audience includes researchers, journalists, lawyers, and archivists who need to preserve and analyze online content. It also appeals to individuals who want to safeguard their personal bookmarks, social media, and other important web pages. The project is designed for technically proficient users who are comfortable with self-hosting and command-line tools.

Key Features

  • Self-hosted and open-source, giving users control over their data.
  • Supports various input formats, including browser history, bookmarks, and RSS feeds.
  • Extracts and saves content in multiple redundant formats like HTML, PDF, and media files.
  • Provides a CLI tool, web UI, and Python API for managing archives.
  • Stores archives in standard, durable formats suited to long-term storage.

Pain Points

  • Preserving online content before it disappears or degrades.
  • Maintaining control over archived data.
  • Archiving private web content.
  • Working around sites that block archiving.
  • Managing storage requirements for large archives.

Usage Instructions

  1. Install ArchiveBox via Docker, pip, or a system package manager.
  2. Initialize a new archive directory using the archivebox init command.
  3. Add URLs to the archive using the archivebox add command, specifying input files or URLs directly.
  4. Configure ArchiveBox settings via the command line or configuration file.
  5. Run the web server to manage and view the archive through a browser.
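
The steps above can be sketched as a small Python wrapper around the command-line tool. This is only an illustration, assuming ArchiveBox is already installed and on the PATH: the helper names build_commands and run_workflow, the ./archive directory, the example URL, and the 127.0.0.1:8000 bind address are all assumptions made for this sketch, not part of ArchiveBox itself. It defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(urls, bind="127.0.0.1:8000"):
    """Return the CLI calls for steps 2, 3, and 5 as argument lists.

    The bind address is an assumed default chosen for this sketch.
    """
    commands = [["archivebox", "init"]]
    commands += [["archivebox", "add", url] for url in urls]
    commands.append(["archivebox", "server", bind])
    return commands


def run_workflow(archive_dir, urls, dry_run=True):
    """Print the commands, or run them inside archive_dir when dry_run is False."""
    commands = build_commands(urls)
    if dry_run:
        for cmd in commands:
            print(" ".join(cmd))
        return commands

    if shutil.which("archivebox") is None:
        raise RuntimeError("archivebox was not found on PATH; install it first (step 1)")

    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        # The final server command keeps running until it is stopped.
        subprocess.run(cmd, cwd=archive_dir, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical directory and URL, used only to show the call shape.
    run_workflow("./archive", ["https://example.com"], dry_run=True)
```

Keeping the command construction separate from execution makes it easy to review what would run before touching a real archive directory. Step 4 (configuration) is left out because the settings to change depend on the installation.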
