Use "git ls-remote" to guess if "git fetch" is a no-op
Summary:
Ref T12296. Ref T12392. Currently, when we're observing a remote repository, we periodically run git fetch ....
Instead, periodically run git ls-remote (to list refs in the remote) and git for-each-ref (to list local refs) and only continue if the two lists are different.
The motivations for this are:
- In T12296, it appears that doing this is faster than doing a no-op git fetch. This effect seems to reproduce locally in a clean environment (900ms for ls-remote + 100ms for for-each-ref vs about 1.4s for fetch). I don't have any explanation for why this is, but there it is. This isn't a huge change, although the time we're saving does appear to mostly be local CPU time, which is good for us.
- Because we control all writes, we could cache git for-each-ref in the future and do fewer disk operations. This doesn't necessarily seem too valuable, though.
- This allows us to tell if a fetch will do anything or not, and make better decisions around clustering (in particular, simplify how observed repository versioning works). With git fetch, we can't easily distinguish between "fetch, but nothing changed" and "legitimate fetch".
If a repository updates very regularly we end up doing slightly more work this way (that is, if ls-remote always comes back with changes, we do a little extra work), but this is normally very rare.
This might not get non-bare repositories quite right in some cases (i.e., incorrectly detect them as changed when they are unchanged) but we haven't created non-bare repositories for many years.
Test Plan: Ran bin/repository update --trace --verbose PHABX, saw sensible construction of local and remote maps and accurate detection of whether a fetch would do anything or not.
Reviewers: chad
Reviewed By: chad
Maniphest Tasks: T12392, T12296
Differential Revision: https://secure.phabricator.com/D17497