From: Duje Mihanović
Date: Mon, 15 Jan 2024 16:03:43 +0000 (+0100)
Subject: New post: Matrix delegation and how it may...
X-Git-Url: http://git.dujemihanovic.xyz/%22http:/www.sics.se/static/%7B%7B%20%24.Site.BaseURL%20%7D%7Dposts/%7B%7B%20%24image.RelPermalink%20%7D%7D?a=commitdiff_plain;h=cb54ae7dbce8a8b0a5a81a3e08965fc7ec32a5ce;p=dujemihanovic.xyz.git

New post: Matrix delegation and how it may...
---

diff --git a/content/posts/matrix-delegation/index.md b/content/posts/matrix-delegation/index.md
new file mode 100644
index 0000000..76324e6
--- /dev/null
+++ b/content/posts/matrix-delegation/index.md
@@ -0,0 +1,76 @@
+---
+title: "Matrix delegation and how it may bite you"
+date: 2024-01-14T11:19:48+01:00
+summary: One of the ways small details can cause big issues.
+---
+For those who don't know, delegation in Matrix is used in server-to-server
+communication to figure out which server serves a given domain. As an example,
+if my own Matrix homeserver were running on `matrix.dujemihanovic.xyz` instead
+of `dujemihanovic.xyz`, I could delegate the latter to the former to save anyone
+wanting to contact me from having to type out the `matrix.`.
+
+Besides the domain name, delegation can also specify which port to use for
+server-to-server communication. The default is `8448`, and if it's blocked you
+can use delegation to move server-to-server traffic to `443`, the port
+client-to-server uses by default. However, if you can, **I'd strongly suggest
+using `8448`!** For no good reason, I had been delegating S2S to `443` almost
+the whole time I have had this server, and it seems that this caused an
+extremely weird issue with a certain room:
+
+## What happened?
+
+Message fetching kept breaking **constantly**. What I mean by that is that when
+I joined the room everything would work fine for the first few messages, but at
+some point I would start getting notifications without any new message being
+present in that room.
+I noticed that logging out and back in would fetch the missing
+messages in my client, but then the aforementioned cycle would repeat no
+matter how many times I logged out and back in *(this also happened on other
+clients besides Element desktop)*. To check whether my homeserver was the issue,
+I joined the room with my old matrix.org account and sure enough that worked
+just fine.
+
+I tried the usual things such as restarting Dendrite and the whole VPS, but to
+no avail. I was pretty convinced that the issue was not with my homeserver but
+with the main server hosting the room *(which, unsurprisingly, turned out to be
+false)* and so I gave up on that. The eye-opening moment came when I read the
+[Conduit
+documentation](https://gitlab.com/famedly/conduit/-/blob/next/DEPLOY.md) *(I had
+considered migrating to it)*, specifically this:
+
+> If Conduit runs behind Cloudflare reverse proxy, which doesn't support port
+> 8448 on free plans,
+
+This implies that routing server-to-server traffic to `443` should only be done
+if it's **absolutely impossible** to use `8448` for this, and the [Synapse
+documentation](https://matrix-org.github.io/synapse/latest/delegate.html#when-do-i-need-delegation)
+says something similar:
+
+> **However**, if your homeserver's APIs aren't accessible on port 8448 and on
+> the domain server_name points to, you will need to let other servers know how
+> to find it using delegation.
+
+## Fixing the issue
+
+Encouraged by this, I fixed up my server:
+
+* allow port `8448` in `ufw`
+* add something like this to `Caddyfile`:
+```
+# Listen on 8448 and proxy Matrix API requests to the homeserver on localhost:8008
+dujemihanovic.xyz:8448 {
+    reverse_proxy /_matrix/* localhost:8008
+}
+```
+* change `/.well-known/matrix/server` to point to `dujemihanovic.xyz:8448` *(in
+  theory, I could have gotten rid of that `return` directive altogether as
+  `8448` is the default anyway, but I still chose to specify it just to be safe)*
+* reload `caddy` and restart `dendrite` *(the latter is, again, just to be
+  safe)*
+
+Once all this was done, the room finally started acting normally.
+
+## Small sidenote
+
+I must note that delegating federation to `443` **should not cause breakage like
+this**. It still did in my case, and for that reason I wrote about it anyway.
+It's very unlikely that you will be affected by this issue, but I still believe
+it should be pointed out in case you are.
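
The delegation lookup the post relies on can be illustrated with a short Python sketch. It is only a rough model under stated assumptions, not how Dendrite or any other homeserver implements it: the helper name `parse_m_server` is made up for illustration, the JSON bodies are hand-written examples of what `/.well-known/matrix/server` might return, and the SRV lookup that real servers attempt before falling back to `8448` is skipped entirely.

```python
import json
from typing import Tuple

# Port the post names as the server-to-server default.
DEFAULT_FEDERATION_PORT = 8448


def parse_m_server(body: str) -> Tuple[str, int]:
    """Split the "m.server" value of a well-known response into (host, port).

    Hypothetical helper for illustration only. Real servers also try SRV
    records before falling back to the default port; this sketch does not.
    """
    target = json.loads(body)["m.server"]
    if target.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:8448" or "[::1]".
        host, _, rest = target.partition("]")
        host += "]"
        port = rest[1:] if rest.startswith(":") else ""
    else:
        host, _, port = target.partition(":")
    # No explicit port means the default federation port applies.
    return host, int(port) if port else DEFAULT_FEDERATION_PORT


# Delegation with an explicit port, as in the fixed setup from the post.
print(parse_m_server('{"m.server": "dujemihanovic.xyz:8448"}'))
# Delegation to 443, as in the setup that misbehaved.
print(parse_m_server('{"m.server": "dujemihanovic.xyz:443"}'))
# Delegation to a subdomain without a port falls back to 8448 here.
print(parse_m_server('{"m.server": "matrix.dujemihanovic.xyz"}'))
```

The last call mirrors the post's opening example: delegating `dujemihanovic.xyz` to `matrix.dujemihanovic.xyz` without naming a port, in which case this sketch treats the port as `8448`.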