Put links from the fediverse into ArchiveBox.

Archive Mastodon

Automatically follows back Fediverse users, extracts URLs from posts, and archives them in ArchiveBox.

Tested with GoToSocial, compatible with Mastodon.

GoToSocial note: Redirect URI/Callback URL is urn:ietf:wg:oauth:2.0:oob

Prerequisites

  • Go 1.19+
  • Fediverse instance (GoToSocial/Mastodon)
  • ArchiveBox instance

Installation

./install.sh

Configuration

The application supports both JSON and YAML configuration formats. It will automatically detect and load the first available file in this order: config.yaml, config.yml, config.json.

JSON Configuration

sudo nano /opt/archive-mastodon/config.json
{
  "fediverse": {
    "instance_url": "https://your-instance.com",
    "username": "your-username",
    "password": "your-password",
    "token": "",
    "token_exp": ""
  },
  "archivebox": {
    "url": "http://localhost:8000",
    "username": "",
    "password": "",
    "tag": "fediarchive"
  },
  "settings": {
    "max_posts_per_user": 5000,
    "include_visibility": ["public", "unlisted", "private"]
  }
}

YAML Configuration

sudo nano /opt/archive-mastodon/config.yaml
fediverse:
  instance_url: "https://your-instance.com"
  username: "your-username"
  password: "your-password"
  token: ""
  token_exp: ""

archivebox:
  url: "http://localhost:8000"
  username: ""
  password: ""
  tag: "fediarchive"

settings:
  max_posts_per_user: 5000
  include_visibility:
    - "public"
    - "unlisted"
    - "private"

Configuration Options

  • fediverse: Fediverse instance configuration
    • instance_url: Your Fediverse instance URL
    • username: Your username/email
    • password: Your password
    • token: OAuth token (auto-generated)
    • token_exp: Token expiration (auto-managed)
  • archivebox: ArchiveBox configuration
    • url: ArchiveBox instance URL
    • username: ArchiveBox admin username
    • password: ArchiveBox admin password
    • tag: Tag applied to every URL added to ArchiveBox
  • settings: Application settings
    • max_posts_per_user: Maximum number of posts to fetch per user
    • include_visibility: Post visibilities to process (e.g. public, unlisted, private)
    • blacklisted_domains: List of domains to exclude from archiving (optional)
    • rss_feeds: List of RSS feeds to monitor and archive links from (optional)

RSS Feed Monitoring

The application can monitor RSS feeds from various sources (including Lemmy communities) and automatically archive the external links found in the feed items. This is useful for:

  • Monitoring Lemmy communities for new posts with links
  • Following blog RSS feeds
  • Archiving links from news sources
  • Tracking content from any RSS-enabled platform

RSS Feed Configuration

Each entry in rss_feeds can be either:

  • A direct RSS feed URL (typically ending in .xml, possibly followed by a query string such as ?sort=Active)
  • A community page URL (the application discovers the RSS feed automatically)

Example RSS Configuration

settings:
  rss_feeds:
    - "https://natur.23.nu/feeds/c/kulturlandschaft.xml?sort=Active"
    - "https://natur.23.nu/c/kulturlandschaft"
    - "https://example.com/feed.xml"

RSS Feed Discovery

When you provide a community page URL (like https://natur.23.nu/c/kulturlandschaft), the application:

  1. Fetches the page and looks for its RSS feed link
  2. Uses that link as the feed URL to monitor
  3. Processes the feed and archives the external links it contains

To avoid archiving internal links, the application filters the following out of RSS feeds:

  • Lemmy post links (e.g., https://natur.23.nu/post/12345)
  • Links to the same hostname as the RSS feed source

Only external content URLs are archived.

Lemmy Community RSS Feeds

Most Lemmy communities provide RSS feeds at URLs like:

  • https://instance.com/feeds/c/community.xml?sort=Active
  • https://instance.com/feeds/c/community.xml?sort=New
  • https://instance.com/feeds/c/community.xml?sort=Top

For each feed item, the application extracts URLs from:

  • Post links (the main URL being shared)
  • Enclosures (media attachments)
  • Content descriptions (if they contain URLs)

Usage

sudo systemctl start archive-mastodon
sudo systemctl status archive-mastodon
sudo journalctl -u archive-mastodon -f

Service management:

sudo systemctl enable archive-mastodon  # Start on boot
sudo systemctl restart archive-mastodon # Restart service
sudo journalctl -u archive-mastodon -n 50 # Last 50 log entries

API Endpoints

Fediverse (Mastodon API)

  • POST /api/v1/apps - OAuth app creation
  • GET /oauth/authorize - OAuth authorization
  • POST /oauth/token - Token exchange
  • GET /api/v1/accounts/verify_credentials - Account verification
  • GET /api/v1/accounts/{id}/followers - Get followers
  • POST /api/v1/accounts/{id}/follow - Follow user
  • GET /api/v1/timelines/home - Home timeline

ArchiveBox

  • POST /api/add - Add URL for archiving