Archive Mastodon
Automatically follows back Fediverse users, extracts URLs from posts, and archives them in ArchiveBox.
Tested with GoToSocial, compatible with Mastodon.
GoToSocial note: Redirect URI/Callback URL is urn:ietf:wg:oauth:2.0:oob
Prerequisites
- Go 1.19+
- Fediverse instance (GoToSocial/Mastodon)
- ArchiveBox instance
Installation
./install.sh
Configuration
The application supports both JSON and YAML configuration formats. It will automatically detect and load the first available file in this order: config.yaml, config.yml, config.json.
JSON Configuration
sudo nano /opt/archive-mastodon/config.json
{
"fediverse": {
"instance_url": "https://your-instance.com",
"username": "your-username",
"password": "your-password",
"token": "",
"token_exp": ""
},
"archivebox": {
"url": "http://localhost:8000",
"username": "",
"password": "",
"tag": "fediarchive"
},
"settings": {
"max_posts_per_user": 5000,
"include_visibility": ["public", "unlisted", "private"]
}
}
YAML Configuration
sudo nano /opt/archive-mastodon/config.yaml
fediverse:
instance_url: "https://your-instance.com"
username: "your-username"
password: "your-password"
token: ""
token_exp: ""
archivebox:
url: "http://localhost:8000"
username: ""
password: ""
tag: "fediarchive"
settings:
max_posts_per_user: 5000
include_visibility:
- "public"
- "unlisted"
- "private"
Configuration Options
- fediverse: Fediverse instance configuration
  - instance_url: Your Fediverse instance URL
  - username: Your username/email
  - password: Your password
  - token: OAuth token (auto-generated)
  - token_exp: Token expiration (auto-managed)
- archivebox: ArchiveBox configuration
  - url: ArchiveBox instance URL
  - username: ArchiveBox admin username
  - password: ArchiveBox admin password
  - tag: Tag to apply to archived URLs (configurable)
- settings: Application settings
  - max_posts_per_user: Maximum posts to fetch per user
  - include_visibility: Which post visibilities to process
  - blacklisted_domains: List of domains to exclude from archiving (optional)
  - rss_feeds: List of RSS feeds to monitor and archive links from (optional)
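As a rough illustration of how these options map onto Go types, the sketch below decodes the JSON form with the standard library. The type name `Config`, the helper `parseConfig`, and the method `visibilityAllowed` are hypothetical; the actual structs in main.go may differ, and YAML loading (which needs a third-party package) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the documented keys. The Go type and field names are assumptions.
type Config struct {
	Fediverse struct {
		InstanceURL string `json:"instance_url"`
		Username    string `json:"username"`
		Password    string `json:"password"`
		Token       string `json:"token"`
		TokenExp    string `json:"token_exp"`
	} `json:"fediverse"`
	ArchiveBox struct {
		URL      string `json:"url"`
		Username string `json:"username"`
		Password string `json:"password"`
		Tag      string `json:"tag"`
	} `json:"archivebox"`
	Settings struct {
		MaxPostsPerUser    int      `json:"max_posts_per_user"`
		IncludeVisibility  []string `json:"include_visibility"`
		BlacklistedDomains []string `json:"blacklisted_domains,omitempty"`
		RSSFeeds           []string `json:"rss_feeds,omitempty"`
	} `json:"settings"`
}

// parseConfig decodes a JSON document into a Config.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// visibilityAllowed reports whether v is listed in include_visibility.
func (c Config) visibilityAllowed(v string) bool {
	for _, allowed := range c.Settings.IncludeVisibility {
		if allowed == v {
			return true
		}
	}
	return false
}

// sampleConfig is a trimmed copy of the JSON example from this README.
const sampleConfig = `{
  "fediverse": {"instance_url": "https://your-instance.com", "username": "your-username"},
  "archivebox": {"url": "http://localhost:8000", "tag": "fediarchive"},
  "settings": {
    "max_posts_per_user": 5000,
    "include_visibility": ["public", "unlisted", "private"]
  }
}`

func main() {
	cfg, err := parseConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.ArchiveBox.Tag, cfg.Settings.MaxPostsPerUser, cfg.visibilityAllowed("direct"))
}
```

The optional keys (blacklisted_domains, rss_feeds) simply decode to empty slices when they are absent from the file.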
RSS Feed Monitoring
The application can monitor RSS feeds from various sources (including Lemmy communities) and automatically archive all links found in the feed items. This is useful for:
- Monitoring Lemmy communities for new posts with links
- Following blog RSS feeds
- Archiving links from news sources
- Tracking content from any RSS-enabled platform
RSS Feed Configuration
The application can monitor a list of RSS feed URLs. You can provide either:
- Direct RSS feed URLs (ending in .xml)
- Community page URLs (the application will automatically discover the RSS feed)
Example RSS Configuration
settings:
rss_feeds:
- "https://natur.23.nu/feeds/c/kulturlandschaft.xml?sort=Active"
- "https://natur.23.nu/c/kulturlandschaft"
- "https://example.com/feed.xml"
RSS Feed Discovery
When you provide a community page URL (like https://natur.23.nu/c/kulturlandschaft), the application will:
- Fetch the page and look for the RSS feed link
- Automatically discover the correct RSS feed URL
- Process the feed and archive external links
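One plausible way to implement the discovery step is to look for a `<link>` tag with type `application/rss+xml` in the page HTML and resolve its href against the page URL. The sketch below does this with regular expressions from the standard library. It assumes the community page advertises its feed that way, and the function name `discoverFeed` is hypothetical; the README does not say how the application actually locates the link, and a real HTML parser would be more robust.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

var (
	// linkTagRe matches whole <link ...> tags; hrefRe pulls the href attribute out of one.
	linkTagRe = regexp.MustCompile(`(?i)<link\b[^>]*>`)
	hrefRe    = regexp.MustCompile(`(?i)\bhref\s*=\s*["']([^"']+)["']`)
)

// discoverFeed returns the first RSS feed URL advertised in html,
// resolved against pageURL so that relative hrefs become absolute.
func discoverFeed(pageURL, html string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	for _, tag := range linkTagRe.FindAllString(html, -1) {
		if !strings.Contains(strings.ToLower(tag), "application/rss+xml") {
			continue
		}
		m := hrefRe.FindStringSubmatch(tag)
		if m == nil {
			continue
		}
		ref, err := url.Parse(m[1])
		if err != nil {
			continue
		}
		return base.ResolveReference(ref).String(), nil
	}
	return "", errors.New("no RSS feed link found")
}

// samplePage is a made-up fragment of a community page used for illustration.
const samplePage = `<html><head>
<link rel="stylesheet" href="/static/style.css">
<link rel="alternate" type="application/rss+xml" href="/feeds/c/kulturlandschaft.xml?sort=Active">
</head><body></body></html>`

func main() {
	feed, err := discoverFeed("https://natur.23.nu/c/kulturlandschaft", samplePage)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(feed)
}
```

Taking the HTML as a string keeps the sketch runnable offline; fetching the page with net/http would be a separate step.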
Internal Link Filtering
The application automatically filters internal links out of RSS feeds, so that only external content URLs are archived. It skips:
- Lemmy post links (e.g., https://natur.23.nu/post/12345)
- Links to the same hostname as the RSS feed source
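The same-hostname rule can be sketched with net/url as shown below. The function names `isExternal` and `filterExternal` are hypothetical, and the sketch only compares hostnames: it treats a Lemmy post link as internal because it lives on the feed's own host, and does not try to recognise post links on other instances.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isExternal reports whether link points to a different host than feedURL.
// Unparseable or host-less links are treated as not external and are skipped.
func isExternal(feedURL, link string) bool {
	feed, err := url.Parse(feedURL)
	if err != nil {
		return false
	}
	target, err := url.Parse(link)
	if err != nil || target.Hostname() == "" {
		return false
	}
	return !strings.EqualFold(feed.Hostname(), target.Hostname())
}

// filterExternal keeps only the links that isExternal accepts.
func filterExternal(feedURL string, links []string) []string {
	var out []string
	for _, l := range links {
		if isExternal(feedURL, l) {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	feed := "https://natur.23.nu/feeds/c/kulturlandschaft.xml?sort=Active"
	links := []string{
		"https://natur.23.nu/post/12345",
		"https://example.com/article",
	}
	fmt.Println(filterExternal(feed, links))
}
```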
Lemmy Community RSS Feeds
Most Lemmy communities provide RSS feeds at URLs like:
- https://instance.com/feeds/c/community.xml?sort=Active
- https://instance.com/feeds/c/community.xml?sort=New
- https://instance.com/feeds/c/community.xml?sort=Top
The application extracts URLs from these parts of each feed item:
- Post links (the main URL being shared)
- Enclosures (media attachments)
- Content descriptions (if they contain URLs)
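A minimal sketch of collecting URLs from those three places, using encoding/xml for an RSS 2.0 document, might look like this. The type and function names (`rssFeed`, `rssItem`, `extractURLs`) and the URL regular expression are assumptions made for illustration, not the application's actual code.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
	"strings"
)

// rssFeed and rssItem cover only the RSS 2.0 elements this sketch needs.
type rssFeed struct {
	Items []rssItem `xml:"channel>item"`
}

type rssItem struct {
	Link        string `xml:"link"`
	Description string `xml:"description"`
	Enclosures  []struct {
		URL string `xml:"url,attr"`
	} `xml:"enclosure"`
}

// urlRe is a deliberately simple pattern for http(s) URLs in description text.
var urlRe = regexp.MustCompile(`https?://[^\s"'<>]+`)

// extractURLs returns the de-duplicated URLs from item links, enclosures, and descriptions.
func extractURLs(data []byte) ([]string, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var out []string
	add := func(u string) {
		if u != "" && !seen[u] {
			seen[u] = true
			out = append(out, u)
		}
	}
	for _, item := range feed.Items {
		add(strings.TrimSpace(item.Link))
		for _, e := range item.Enclosures {
			add(e.URL)
		}
		for _, u := range urlRe.FindAllString(item.Description, -1) {
			add(u)
		}
	}
	return out, nil
}

// sampleFeed is a made-up single-item feed used for illustration.
const sampleFeed = `<rss version="2.0"><channel><title>example</title>
<item>
  <link>https://example.com/article</link>
  <enclosure url="https://example.com/image.png" type="image/png" length="0"/>
  <description>See &lt;a href="https://example.org/more"&gt;more&lt;/a&gt;</description>
</item>
</channel></rss>`

func main() {
	urls, err := extractURLs([]byte(sampleFeed))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

The result would still need to pass through the internal-link filter before anything is sent to ArchiveBox.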
Usage
sudo systemctl start archive-mastodon
sudo systemctl status archive-mastodon
sudo journalctl -u archive-mastodon -f
Service management:
sudo systemctl enable archive-mastodon # Start on boot
sudo systemctl restart archive-mastodon # Restart service
sudo journalctl -u archive-mastodon -n 50 # Last 50 log entries
API Endpoints
Fediverse (Mastodon API)
- POST /api/v1/apps - OAuth app creation
- GET /oauth/authorize - OAuth authorization
- POST /oauth/token - Token exchange
- GET /api/v1/accounts/verify_credentials - Account verification
- GET /api/v1/accounts/{id}/followers - Get followers
- POST /api/v1/accounts/{id}/follow - Follow user
- GET /api/v1/timelines/home - Home timeline
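To show how the followers and follow endpoints fit together for the follow-back step, here is a hedged sketch using net/http. The `client` type, `followBack`, and `fakeInstance` are hypothetical names; `fakeInstance` is a local test server that only imitates the two endpoints so the example runs offline. The sketch reads just the first page of followers and does not check whether an account is already followed, which the real application may well do.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// account holds the two fields of a Mastodon account object used here.
type account struct {
	ID   string `json:"id"`
	Acct string `json:"acct"`
}

// client is a hypothetical minimal API client authenticated with a bearer token.
type client struct {
	baseURL string
	token   string
	http    *http.Client
}

// do sends an authenticated request and optionally decodes the JSON response into out.
func (c *client) do(method, path string, out any) error {
	req, err := http.NewRequest(method, c.baseURL+path, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+c.token)
	resp, err := c.http.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s %s: unexpected status %s", method, path, resp.Status)
	}
	if out == nil {
		return nil
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// followBack follows each account on the first page of selfID's followers
// and returns the IDs it followed.
func (c *client) followBack(selfID string) ([]string, error) {
	var followers []account
	if err := c.do(http.MethodGet, "/api/v1/accounts/"+selfID+"/followers", &followers); err != nil {
		return nil, err
	}
	var followed []string
	for _, f := range followers {
		if err := c.do(http.MethodPost, "/api/v1/accounts/"+f.ID+"/follow", nil); err != nil {
			return followed, err
		}
		followed = append(followed, f.ID)
	}
	return followed, nil
}

// fakeInstance imitates the followers and follow endpoints for account "1".
func fakeInstance() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/accounts/1/followers", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode([]account{{ID: "2", Acct: "alice"}, {ID: "3", Acct: "bob"}})
	})
	mux.HandleFunc("/api/v1/accounts/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"following":true}`))
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeInstance()
	defer srv.Close()
	c := &client{baseURL: srv.URL, token: "example-token", http: srv.Client()}
	ids, err := c.followBack("1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("followed:", ids)
}
```

Against a real instance, `baseURL` would be the configured instance_url and `token` the OAuth token obtained through the app creation and token exchange endpoints.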
ArchiveBox
- POST /api/add - Add URL for archiving