Managing Papers With Zotero For Free Using Caddy And Tailscale
TL;DR: I’ve chosen Zotero for papers as it has very nice annotation capabilities. 99% of this post shows you how to link your own secure storage server to store the pdfs and annotations.
Motivation
I read a lot of papers, like anyone interested in the ML space would. Often, when reading papers, I’m on the go and rarely have my laptop with me. I’ll have a tablet if I’m lucky or, more likely, just my phone in a crowded subway. I need a solution that lets me keep track of what I’m reading and switch between devices easily.
I have been keeping it embarrassingly simple, using Apple Books as my viewer and annotator and iCloud as my storage. However, I now have so many papers that it’s hard to keep track of them and search through previous ones. It also doesn’t help that the number of papers published per month grows exponentially, with arXiv, for example, almost doubling from about 14000 in 2020 to 24000 in 2025.
I’m currently at 432 pdfs totalling 2.4GB, with this file size distribution (note the x-axis is logarithmic, in powers of two) [1]:
This led to the need for a better solution. After doing some reading, I found that Zotero meets all my needs. However, you still have to pay for storage. I’m not ready to do that yet, as I’m not sure how much I want to commit to Zotero, so I wanted to find an alternative. Zotero does allow an alternate storage option so long as it supports the WebDAV protocol. As a matter of fact, they list free providers here. However, the storage options are still a bit small for me, especially since I’m already at 2.4GB of mostly papers. Some are also not in the US, which makes me hesitate.
Given it’s pretty easy to switch storage servers with Zotero (so I can change my mind later), I decided instead to run this out of my home server for now. However, doing so poses great risk, and most of the work in getting this right is not the web server itself but the security around it. This blog post will show you how to accomplish this.
Step 1: Choose Your Server
Before you begin, obviously, you need a server. It can be as simple as a Raspberry Pi connected to a 64GB USB drive; it’s up to you.
Step 2: Install Tailscale On All Your Connected Devices
Get tailscale. It’s free and quite powerful. You’ll need it installed on both the device that hosts the webDAV server and the devices that connect to it.
On your tailscale network, your devices will have names like machine-name.tailnet-name.ts.net.
TIP: I recommend naming your devices something simple, like workstation or raspberrypi. Try to avoid personally identifiable information like johndoe, as this information will appear in a public ledger of https certificates (see here).
Step 3: Get an HTTPS certificate for your server
This only needs to be run on the server that will host the webdav service.
- Enable https certificates for your tailnet here.
- On the machine that will serve the files, run:
tailscale cert
to get your machine name, then
sudo tailscale cert machine-name.tailnet-name.ts.net
with the machine name and tailnet name you got from the previous command.
NOTE: This only creates the cert once, and certs are valid for 90 days at the time of this writing. When the cert expires, you can either run the tailscale cert command again or, optionally, install this extension.
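The manual renewal described in the note can be scripted. Here is a minimal sketch, assuming openssl is installed; the function names, the 14-day threshold, and the idea of keeping the files in a directory of your choosing are my own illustrative choices, not part of the Tailscale tooling:

```shell
#!/usr/bin/env bash
# Sketch: re-run `tailscale cert` only when the existing certificate is
# missing or about to expire. Function names and thresholds are illustrative.

# Succeed (exit 0) if certificate file $1 is missing or expires within $2 days.
cert_expires_soon() {
    local cert_file="$1" days="$2"
    [ -f "$cert_file" ] || return 0
    # `openssl x509 -checkend N` exits non-zero when the certificate
    # expires within the next N seconds.
    ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null 2>&1
}

# Renew the certificate for domain $1, stored under directory $2, if needed.
renew_if_needed() {
    local domain="$1" dir="$2"
    if cert_expires_soon "$dir/$domain.crt" 14; then
        tailscale cert --cert-file="$dir/$domain.crt" \
                       --key-file="$dir/$domain.key" "$domain"
    fi
}
```

Something like renew_if_needed machine-name.tailnet-name.ts.net /some/cert/dir could then be run as root from a daily cron job or systemd timer (the directory is a placeholder).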
Step 4: Lock Down Your Server
DO NOT SKIP THIS STEP
Running a server that accepts requests from anyone, even on a local network, is extremely risky. Fortunately, since we’re using our tailscale VPN, which is already doing the security heavy lifting, we don’t actually have to worry about that. We can simply add firewall rules that only accept incoming requests from the VPN itself. See this document for more details.
It recommends using UFW (Uncomplicated Firewall), and the steps are:
1. Enable the firewall
sudo ufw enable
2. Deny everything incoming
# deny all incoming by default
sudo ufw default deny incoming
sudo ufw default allow outgoing
3. List all rules (to find the tailscale one)
sudo ufw status verbose
4. Allow connections from the tailscale vpn (the interface should be tailscale0)
sudo ufw allow in on tailscale0
5. Restart everything
sudo ufw reload
sudo service ssh restart
NOTE: Please do not skip this step unless you absolutely know what you’re doing. The web is dangerous.
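To double check the result of the firewall steps, the output of ufw status verbose can be inspected with a small helper. This is a sketch under assumptions: the function name is mine, and the patterns assume the usual layout of that output (a "Status:" line, a "Default:" line, and rule rows such as "Anywhere on tailscale0 ALLOW IN Anywhere"), which may differ between ufw versions.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check `ufw status verbose` output after locking down.
# Reads the status text on stdin; the function name is illustrative.
ufw_only_allows_tailscale() {
    local status
    status="$(cat)"
    # The firewall must be active and deny incoming traffic by default.
    printf '%s\n' "$status" | grep -q '^Status: active' || return 1
    printf '%s\n' "$status" | grep -q '^Default: deny (incoming)' || return 1
    # There must be an incoming ALLOW rule on the tailscale0 interface...
    printf '%s\n' "$status" | grep -Eq 'on tailscale0[[:space:]]+ALLOW IN' || return 1
    # ...and no incoming ALLOW rule that is not bound to tailscale0.
    if printf '%s\n' "$status" | grep 'ALLOW IN' | grep -qv 'on tailscale0'; then
        return 1
    fi
    return 0
}
```

Usage would look like: sudo ufw status verbose | ufw_only_allows_tailscale && echo "locked down". A failure here only means the output did not match these patterns, so read the actual rules before drawing conclusions.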
Step 4b: Install caddy
You can find instructions here.
For example, on Ubuntu: sudo apt install caddy
Step 5: Allow caddy access to your tailscale certificate
In the configuration file for the tailscale daemon, /etc/default/tailscaled, add this line:
TS_PERMIT_CERT_UID=caddy
This will give the caddy user (which caddy should run as) permission to access the certificates from tailscale. Interestingly, caddy out of the box knows to try to do this when it sees domains that match *.ts.net, so nothing else needs to be done.
After this edit, make sure to restart the daemon. For example, if using systemd you would run:
sudo systemctl restart tailscaled
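If you script your server setup, the config edit can be made safe to re-run. A minimal sketch; ensure_line is a hypothetical helper of my own, not something tailscale provides:

```shell
#!/usr/bin/env bash
# Sketch: add a setting to a config file without duplicating it on re-runs.

# Append line $2 to file $1 unless an identical line is already present.
ensure_line() {
    local file="$1" line="$2"
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        # Start on a fresh line if the file does not end with a newline.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Run as root, ensure_line /etc/default/tailscaled 'TS_PERMIT_CERT_UID=caddy' would add the line once and do nothing on later runs; the daemon still needs the restart shown above.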
Step 6: Add webdav plugin to caddy
sudo caddy add-package github.com/mholt/caddy-webdav
(From instructions here).
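Before editing the Caddyfile, it can help to confirm the plugin actually made it into the caddy binary. A small sketch, assuming the plugin registers itself under the module ID http.handlers.webdav and that caddy list-modules prints one module ID per line; the function name is my own:

```shell
#!/usr/bin/env bash
# Sketch: confirm the WebDAV handler is part of the installed caddy binary.

# Read `caddy list-modules` output on stdin and look for the webdav handler.
has_webdav_module() {
    grep -Eq '^http\.handlers\.webdav([[:space:]]|$)'
}

# Only query caddy when it is actually installed.
if command -v caddy >/dev/null 2>&1; then
    if caddy list-modules | has_webdav_module; then
        echo "webdav module found"
    else
        echo "webdav module missing; re-run the add-package step" >&2
    fi
fi
```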
Step 7: Edit your caddy file
Caddy’s configuration file is located at /etc/caddy/Caddyfile.
Add these lines to it:
# configure webdav module
{
    order webdav before file_server
}
# add webdav.
# (Note the 443 is optional as it's the default, but I find it clearer)
machine-name.tailnet-name.ts.net:443 {
    # set up webdav for the host
    handle_path /webdav/* {
        root * /data/webdav
        webdav
        basicauth {
            user some-password-hash
        }
    }
}
Replace some-password-hash with the hash of your desired password. You can get it by running:
caddy hash-password
and pasting that output.
NOTE: The password protection is not necessary for security. It is only here to help prevent accidental requests into the webdav server which are pretty easy to make.
(Again from instructions here).
NOTE: You can remove the nested handle_path and simply have this if you prefer:
machine-name.tailnet-name.ts.net {
    root * /data/webdav
    webdav
    basicauth {
        user some-password-hash
    }
}
I’m using the handle_path directive so that the address of my webdav server will be https://machine-name.tailnet-name.ts.net/webdav and not https://machine-name.tailnet-name.ts.net. I do this because my https server fronts multiple services (webdav is one, but code-server is another).
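The Caddyfile points at /data/webdav, and uploads will only work if that directory exists and is writable by the user caddy runs as. The post does not show creating it, so the following is a sketch under that assumption; the function name and the 750 permissions are my own choices:

```shell
#!/usr/bin/env bash
# Sketch: create the WebDAV root and hand it to the user caddy runs as.

# Create directory $1 (default /data/webdav) owned by user $2 (default caddy).
prepare_webdav_root() {
    local dir="${1:-/data/webdav}" owner="${2:-caddy}"
    mkdir -p "$dir" || return 1
    # Full access for the owner, read-only for the group, none for others.
    chmod 750 "$dir" || return 1
    # chown needs root unless the owner is the current user.
    if id -u "$owner" >/dev/null 2>&1; then
        chown "$owner" "$dir"
    else
        echo "user '$owner' not found; leaving ownership unchanged" >&2
    fi
}
```

Called as root with no arguments, prepare_webdav_root uses the path and user from this post.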
Step 8: Restart Caddy
sudo systemctl restart caddy
Step 9: Test your webdav connection
Just listing the directory should be fine:
$ curl -X PROPFIND -u "user:pass" https://machine-name.tailnet-name.ts.net/webdav/
(or remove the webdav/ suffix if you didn’t use the handle_path directive).
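Listing only proves that reads work. Since Zotero also needs to write, a fuller check is to upload a small file, read it back, and delete it. A minimal sketch using standard curl options; the function names and the test file name are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: upload, download, and delete a small file over WebDAV.

# Join base URL $1 and path $2 with exactly one slash between them.
join_url() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Round-trip a temporary file through the server at base URL $1,
# authenticating with "user:password" credentials in $2.
webdav_roundtrip() {
    local base="$1" creds="$2" url tmp rc
    url="$(join_url "$base" "webdav-roundtrip-test-$$.txt")"
    tmp="$(mktemp)"
    echo "webdav round-trip test" > "$tmp"
    # PUT the file, GET it back and compare, then DELETE it.
    if curl -fsS -u "$creds" -T "$tmp" "$url" >/dev/null &&
       curl -fsS -u "$creds" "$url" | cmp -s - "$tmp" &&
       curl -fsS -u "$creds" -X DELETE "$url" >/dev/null; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

For example: webdav_roundtrip https://machine-name.tailnet-name.ts.net/webdav/ "user:pass" && echo "read/write OK". If the upload step fails, check that the webdav root is writable by the caddy user.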
Step 10: Add this to Zotero
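In Zotero’s settings, under Sync, set file syncing to WebDAV and enter your server URL (for example machine-name.tailnet-name.ts.net/webdav), along with the username from your Caddyfile and the password you hashed. Zotero stores its files in a zotero subfolder of that URL.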
This should work just fine across all your devices!
Discussion
Why Caddy?
The go-to choice for a webdav server tends to be apache2. I only chose caddy since it has a built-in integration with tailscale certificates.
Questions/Comments?
If you’ve ever written your own instructions for anyone, you’ll know very well that it’s extremely difficult to get all the details right so that they work for most people without sinking a lot of time into it.
If something doesn’t work for you or seems confusing, please add any questions or comments in the comments section below. These are rough guidelines of what worked for me, but it’s possible I missed some detail that seemed obvious to me but may not be obvious to someone with less context.
If you have a better suggestion for how to organize research papers as well, please share!
Thank you!
Future Plans
I would love to have a general queue of to-read items that includes links to the web and papers (while still keeping the advanced annotations and reference tracking that Zotero provides for papers). I loved Pocket, but unfortunately it shut down. Fortunately, a few alternatives have popped up; one that seems very promising is Readeck. It looks like someone has requested Zotero integration. Perhaps in the future this might be a nice way to integrate a generalized reading queue.
[1] Yes, there are large pdfs: for example, Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild is a 50MB pdf, and one of my favorites, Generating Physically Stable and Buildable Brick Structures from Text, is a 29MB pdf. The top 4 are actually textbooks, with the largest at 152MB.