Git-annex in 2020

I recently began searching for alternatives to sync my files between different devices. During my search I stumbled upon git-annex, which seemingly supports managing different file locations on computers, servers, pendrives etc. It sounds like a really interesting concept, but I haven't been able to find that many tutorials, let alone ones that are up to date.

So my question is: is anyone here using git-annex and what are your experiences with it? And should I even be using it in 2020?

Sorry if this question is out of scope for this subreddit; if so, can you point me to a more suitable community?

👍 20
👤 u/jabbermuggel
📅 Oct 01 2020
I don't trust any storage: how my ❤ for Git Annex solved my backup paranoia. james-read.medium.com/i-d…
👍 3
👤 u/xconspirisist
📅 Jan 19 2021
using git-annex and mergerfs together

I am looking for gotchas or best practices.

I currently use mergerfs / snapraid and I want to incorporate git-annex.

I plan on making repos on each branch (set up as remotes of each other).

Have any of you tried using these two tools together?

There is another post suggesting cache.files=partial for sqlite3; I am wondering what other issues I might run into.

Honestly, since git-annex allows for location tracking across drives (even offline), I may just stop using mergerfs altogether. I am curious whether anyone else has gone this route.

Thanks!

👍 5
👤 u/alexwagner74
📅 Dec 04 2020
git-annex encrypted on rsync.net?

Hi! I have been aware of git-annex for quite some time, but didn't really find the need to use it until now.

For the last 2 years I just used restic, backing up files from the laptop to one directory and those from the home PC to another. Now I have basically run out of space on the laptop, but I still want to have everything accessible when needed.

So git-annex is probably the perfect solution for data sync. The cloud service that seems to fit best is rsync.net. However, I don't know whether everything can be stored encrypted there. Does anyone know if it is possible?

Also, as far as my understanding goes, git-annex does not provide any sort of backup solution out of the box. I'm not really sure how to handle this part. Maybe I could use restic from within rsync.net (as they provide ssh access) backup to gdrive or something.

But I'm sure there is a better approach. Doing it the way I described means trusting rsync.net completely. If they, say, get hacked, I may lose access to both the git-annex repo and the backup (since restic would be set up there). Maybe there is a way to mount the rsync.net git-annex repo on my laptop and then use restic locally...

How do you handle this? Is there a method that works well?

👍 4
👤 u/bobicino
📅 Jan 01 2021
Git Annex on two machines with 4x5TB drives each

I have two machines with 4x5TB each. I heard that git annex is a great tool for managing my data. Currently I keep all 4 drives pooled together using MergerFS and I rsync data from my master to my slave.

There is a risk that I will not notice in time that some files are missing, so I wanted to protect myself from such a loss using git annex. The idea is that it lets me manage my files from my laptop, which has just 1TB of storage space. The problem is that git annex does not work over MergerFS.

At first I thought that was no big deal, so I created one repo on my laptop and one on every drive of each machine: that's 9 repos. But now I cannot guarantee that a numcopies setting of 2 will protect my data, because both copies can land on the same machine.

Is there a way to tell git annex that 4 repos are on one machine? If yes, please tell me how. If not, how should I handle this problem?
Maybe git annex is not a good solution for my problem? I want my files to be easy to manage but replicated to both locations. I want to add smaller nodes in the future, so my storage will not be evenly sized. The smaller nodes will be outside my LAN, most likely connected over a mobile network with capped bandwidth and joined into a virtual LAN using WireGuard.
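
The worry above (two copies that both sit on one machine) can be written down as a small check. This is only a sketch in plain Python, not git-annex's own API: the repository names, the `REPO_MACHINE` mapping and the `locations` data are all made up for illustration, and in practice the location data would have to be collected separately, for instance from what `git annex whereis` reports.

```python
# Hypothetical layout: repository name -> machine it lives on.
REPO_MACHINE = {
    "laptop": "laptop",
    "master-d1": "master", "master-d2": "master",
    "master-d3": "master", "master-d4": "master",
    "slave-d1": "slave", "slave-d2": "slave",
    "slave-d3": "slave", "slave-d4": "slave",
}


def machines_holding(repos):
    """Return the set of distinct machines behind a list of repositories."""
    return {REPO_MACHINE[repo] for repo in repos}


def under_replicated(locations, min_machines=2):
    """Return files whose copies sit on fewer than min_machines machines.

    locations maps a file name to the repositories that hold its content.
    """
    return sorted(
        name for name, repos in locations.items()
        if len(machines_holding(repos)) < min_machines
    )


if __name__ == "__main__":
    locations = {
        "a.iso": ["master-d1", "master-d2"],  # two copies, one machine
        "b.iso": ["master-d1", "slave-d3"],   # two copies, two machines
    }
    print(under_replicated(locations))
```

Counting machines instead of repositories is the whole point: a file can satisfy a copy count of 2 and still be flagged here.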

👍 3
👤 u/nikowek
📅 Sep 30 2020
👍 20
👤 u/jabbermuggel
📅 Oct 02 2020
Looking for git annex alternatives to track files on CIFS/SMB and DVDs

Hi,

I've started tracking files using git annex on my Linux laptop, several Linux PCs, and removable disks.

But I have a problem with tracking files outside of git annex's working directory, such as on DVD-R or on mounted CIFS/SMB storage, where I can't move or rename files.

I would like to track that the same file exists in at least three copies, e.g.:

/mount/shared-linux-cifs-server1/path1/FILE1.zip

/mount/windows-10-laptop/path2/FILE1.zip

/mount/DVD-26/path3/FILE1.zip

/mount/external-hdd/path4/FILE1.zip

Also, it would be good to have checksum verification.
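
To make the request concrete, here is a rough sketch of what such tracking could look like in plain Python. It is not an existing tool and not part of git-annex; the function names are made up for illustration. It only reads files, so it should also work on read-only media, and it groups paths by SHA-256 checksum so that copies under different directories are recognised as the same content.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_by_checksum(roots):
    """Map each SHA-256 digest to the list of paths that carry that content."""
    found = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                found[sha256_of(path)].append(str(path))
    return dict(found)


def too_few_copies(index, wanted=3):
    """Return the digests (with their paths) that appear in fewer than `wanted` places."""
    return {digest: paths for digest, paths in index.items() if len(paths) < wanted}
```

Calling `copies_by_checksum` with the mount points from the list above as `roots`, and then `too_few_copies` on the result, would list every file that has fewer than three copies. Re-running it later and comparing digests doubles as checksum verification.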

Which tool could you recommend to run on Linux?

Thanks

👍 11
👤 u/userAdmin100
📅 Nov 03 2020
Question to fellow data hoarders: anyone using git-annex with bup and can share setup and layout?

Hello fellow data hoarders.

I have a lot of hard drives and backups, mostly because I have been keeping my stuff for over 25 years of digital work, and I have a lot of media files (photos, texts and other stuff) that I need (and want) to keep track of.

I recently stumbled across git-annex, and i am trying to use it in combination with bup to be able to work through older harddrives (or other media) of mine and to archive it.

Anyone else here using bup and git-annex, and can give a bit of insight into setup and usage?

thanks!

👍 5
📅 Jan 16 2021
Has anybody used git-annex extensively in their setup?

https://git-annex.branchable.com/

I'm just trying to get a feel for how useful it would be. If anybody can share their experiences with it, it would be much appreciated. Thanks!

👍 2
👤 u/Makhram
📅 Oct 08 2020
Storing files with Filecoin and git-annex

I'm an avid git-annex user (a tool that extends git to conveniently store large files and track their locations) and have been eyeing Filecoin since it was announced. I have ~10TB of files in a handful of annexes which are primarily stored on a NAS, with encrypted backups on Google Drive using rclone (with unlimited free storage through my university).

Since git-annex supports IPFS as a storage backend (and I do have a few files in my annexes with copies available through IPFS, although the availability can't be trusted) I was wondering if anyone else has been thinking of using Filecoin for cheap and decentralized storage with trusted availability.

A quick search gave no results, but I read briefly about lotus and am curious how hard it would be to write a special remote wrapper script that lets git-annex use it as a backend.

So, a few questions:

  • How much interop is there between IPFS and Filecoin? If I'm hosting a file with Filecoin, can I rely on it being available through IPFS?
  • How do hosts get paid for highly requested files?
  • Is this a reasonable use-case for Filecoin at this time?

It'd be amazing if I could make this work, but I appreciate that I'm early here so not expecting a solution, just curious as to people's general thoughts on the matter.

👍 7
👤 u/ErikBjare
📅 Oct 19 2020
Storing files with Filecoin and git-annex /r/filecoin/comments/je4k…
👍 4
👤 u/ErikBjare
📅 Oct 19 2020
Git Annex Emacs integrations git-annex.branchable.com/…
👍 3
👤 u/negativeoilprice
📅 Oct 13 2020
Git Annex Assistant - Could this become the free, libre equivalent of Abstract (git/syncthing for designers)? Rough around the edges but conceptually promising! git-annex.branchable.com/…
👍 21
👤 u/kxra
📅 Aug 28 2019
are there any good reasons *not* to use git-annex?

I really do like some of the features, I tried it years ago and didn't love some of the bugs at the time, but I was considering giving it another chance... do any of you have any experience with it? (or alternatives such as perkeep, etc).

is haskell a pro or a con? seems like most other modern projects are either python or golang (or c or java /barf)... I haven't seen many haskell projects that I know of.

👍 3
👤 u/alexwagner74
📅 Mar 09 2020
Git-LFS and Git Annex limitations for ML reproducibility, and how DVC overcomes them

DVC (Data Science Version Control) trumps these alternatives in the following key ways:

  • git-lfs is bound to a single upstream, e.g. github/bitbucket/gitlab; these require special servers that are limited in terms of storage space, even if you run them on premises.

  • GitHub currently enforces a 2 GiB size limit per-object, even with LFS

  • On GitHub, beyond 1GB, you have to pay extra

  • git-annex is more flexible, but more challenging to work with and configure

And both git-lfs and git-annex suffer from using Git's smudge and clean filters to show the real file on checkout. Git itself stores only a small text pointer file, and does so efficiently. The downside, of course, is that large files are not version controlled: only the latest version of a file is kept in the repository.

Whilst DVC doesn't drop into projects as easily as the above options, it does offer improvements on the limitations of those tools. Furthermore, DVC offers key features (pipelines and reproducibility) which those alternatives do not include at all: First Impressions of Data Science Version Control (DVC) - How Does DVC Compare to Alternatives?

👍 3
📅 May 22 2019
Issues with git-annex and redshift on arch arm

Has anyone tried installing git-annex or redshift on arch arm?

I have an Asus C101P that I now use as my substitute PC while I try to repair my Dell M6700.

I use arch with lxde most of the time instead of chrome os. Quite happy about the experience so far. The headphones are not recognised, though, but it's not a deal-breaker for me.

However, I have two minor issues:

- I cannot install git-annex. Although the package is listed in community repo (see https://archlinuxarm.org/packages/armv7h/git-annex ), running pacman -S git-annex only returns 'target not found'

- redshift installed with no complaints but I cannot get it working (and yes, I have my location and light temperature parameters set in the config). The process starts but nothing happens (I have tried both redshift and redshift-gtk).

TIA

👍 3
👤 u/4olleh
📅 Feb 02 2019
Git Annex Assistant - Ugly but conceptually interesting! Could this become the free/libre equivalent of Abstract (git/syncthing for designers)? git-annex.branchable.com/…
👍 6
👤 u/kxra
📅 Aug 23 2019
Apparently git-annex supports youtube-dl, torrents, and rclone

Seems pretty cool. I haven't wrapped my head around git-annex yet, but this seems really useful for hoarders: https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/

It seems people are using it for, among other things, podcasts.

👍 3
👤 u/whoapestheapes
📅 Sep 16 2019
Help backup the Internet Archive using git-annex! archiveteam.org/index.php…
👍 83
👤 u/merry0
📅 Mar 02 2017
ML Reproducibility Challenges - How DVC Tool Helps Overcome Git-LFS and Git-Annex Limitations for Data Versioning medium.com/@christopher.s…
👍 10
📅 May 23 2019
ML Reproducibility Challenges - How DVC Tool Helps Overcome Git-LFS and Git-Annex Limitations

When ML models need to be regularly updated in production, a host of challenges emerges. Paramount among ML reproducibility concerns are the following:

  • Effectively versioning your models
  • Capturing the exact steps in your data munging and feature engineering pipelines
  • Dependency management (including of your data and infrastructure)
  • Configuration tracking

No one tool can do it all for you: organizations use a mix of Git, Makefiles, ad hoc scripts and reference files for reproducibility. The following overview explains how DVC enters this mix, offering a cleaner solution that specifically targets data science challenges: First Impressions of Data Science Version Control (DVC)

👍 6
📅 May 23 2019
git-annex alternatives?

are there any other solutions that will help you keep track of data that isn't all online at once or isn't necessarily connected (all of the time)?

I like git annex in theory but i don't love its quirks, i would love to see it reinvented in go or something like that...

does anyone use perkeep?

👍 3
👤 u/alexwagner74
📅 Nov 13 2018
Git Annex Assistant: generalized Owncloud? I can't find any reviews git-annex.branchable.com/…
👍 9
👤 u/enfascination
📅 Jan 10 2018
Any way to handle big repository with lots of files locally with git-annex?

So I'm trying to set up git-annex with lots of small files (locally, of course), mostly between 20-200MB; most of them are text files. I know that once you get to the 100k-file mark, you start to run into problems.

I found https://github.com/ArchiveTeam/IA.BAK and, from looking at their implementation, it seems they somehow managed to get around the same problem I'm having (although I didn't really find a clue as to how they split the repo into shards). Is there any command or other way to get around this?
EDIT
nvm found it :)
https://git-annex.branchable.com/tips/splitting_a_repository/
In case anyone else was wondering, since I know how it feels to struggle to find an answer.
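
For anyone who wants to see the general idea of sharding in code, here is a rough Python sketch. To be clear, this is not how IA.BAK does it and not what the linked tip describes; it is just one hypothetical way to assign files to shards so that each repository stays near a chosen file count. The function names and the 100,000 default are my own assumptions.

```python
import hashlib


def shard_for(path, shard_count):
    """Pick a stable shard index for a path from a hash of its name."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def plan_shards(paths, max_files_per_shard=100_000):
    """Group paths into enough shards to keep each one near the size limit."""
    # Ceiling division, with at least one shard even for an empty input.
    shard_count = max(1, -(-len(paths) // max_files_per_shard))
    shards = {index: [] for index in range(shard_count)}
    for path in paths:
        shards[shard_for(path, shard_count)].append(path)
    return shards
```

Hashing spreads files roughly evenly, so individual shards can end up slightly above or below the target. Each shard would then become its own repository.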

👍 4
👤 u/Nanodragon999
📅 May 10 2019
How to setup git-annex with multiple external USB drives? (Details inside)

Hi. I post this here because there is no git-annex specific subreddit.

What I currently have is a number of external (most of them USB) drives, each of them has four git-annex repositories: Music, movies, Books (as in epub/pdf) and images. I managed them by hand until now. The drives are different in size and some of them are too small to keep all data (like 120GB drive, very old and likely to fail, but 500GB of music).

What I want to have: Somehow plug all of these drives into one device (Raspberry or something like this) and let it be managed by git-annex automatically in a way that there are enough copies of each file so that a disk failure does not result in data loss. My day-to-day devices are, as of now, a workstation with 2x3 TB data disks where each repository is on both of the drives, two notebooks, one GPD Pocket, and one more notebook I'm about to buy in the next few months. On these devices I do not want to keep the data long-term (except for some of the music maybe), and some of them have very small drives (especially the GPD).

The "cluster" does not need to be online all the time and I do not care too much about speed - all important stuff will be on my day-to-day devices anyway, so I don't mind whether downloading a movie from the cluster takes one minute or fifteen!

My questions:

  1. What device can I use to plug in all these USB drives? I do not want a proper RAID because I think it is too much to set up, also I might add drives and some of the older drives might fail rather soon - not worth the hassle.
  2. How to instruct annex to keep copies on these drives and distribute files over the "cluster" (let's just call it that) automatically?
  3. How to move there from my current setup?

Any advice is welcome. I'll cross-post this to the datahoarders as well.

Thanks for your time.

Edit: Cross posting to datahoarders subreddit didn't work from my app, damn.

👍 2
👤 u/musicmatze
📅 Jan 22 2019
Bring back Git Annex support!

I used Git LFS up until the point I needed to work off of a bare repository hosted on a USB drive. It was then I realized Git LFS does not work this way. It requires a direct connection to a server. I've switched to Git Annex, which I find works awesome and is more in line with the Git philosophy, but now I lose support from GitLab. GitLab just recently stopped supporting Git Annex, which is too bad since Git LFS seems popular but somewhat inferior (albeit simpler). Any alternatives to GitLab that support Git Annex?

👍 12
👤 u/BuddhaBit
📅 Dec 07 2017
The git-annex Archivist or Nomad?

Does anyone consider themselves to be one of these?

The Archivist

> Bob has many drives to archive his data, most of them kept offline, in a safe place.
>
> With git-annex, Bob has a single directory tree that includes all his files, even if their content is being stored offline. He can reorganize his files using that tree, committing new versions to git, without worry about accidentally deleting anything.
>
> When Bob needs access to some files, git-annex can tell him which drive(s) they're on, and easily make them available. Indeed, every drive knows what is on every other drive.

The Nomad

> Alice is always on the move, often with her trusty netbook and a small handheld terabyte USB drive, or a smaller USB keydrive. She has a server out there on the net. She stores data, encrypted in the Cloud.
>
> All these things can have different files on them, but Alice no longer has to deal with the tedious process of keeping them manually in sync, or remembering where she put a file. git-annex manages all these data sources as if they were git remotes.

via https://git-annex.branchable.com/

I'd love to hear the details of actual real world workings. The use-cases I find infinitely appealing but when I've tried to implement similar ones, I end up in a mess of files lodged in a repository and/or broken syncs between repositories.

👍 2
👤 u/whenwasyesterday
📅 Dec 20 2016
Decentralized Ultra-Library with Git Annex

It seems like every week there's a new decentralized data storage and exchange system. Off the top of my head, I can name IPFS, Patchwork/Secure Scuttlebutt, and plain old Bittorrent, and I know there are more.

After a while, it seems like every decentralized data storage system gets a Git wrapper.

What if we used Git as an abstraction layer over all of these systems, with something like Git Annex? For the use case of storing all the world's content, for example, we could have one master Git repository of all files ever. We would use Git Annex to keep track of where each file can be found across a variety of distributed data storage systems (by storing its web URL, IPFS hash, Bittorrent info hash, SHA256 hash, and whatever other content locators are required for future systems). When someone wants to actually get a file, they can retrieve it from their distributed system of choice (assuming that it is legal for them to do so in their jurisdiction, and that they have obtained any necessary licenses).
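
As a rough illustration of the abstraction I have in mind (plain Python, not anything Git Annex actually provides), each catalogue entry could be a content hash plus a set of locators, one per storage system. The class name, the scheme names and the example values below are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryEntry:
    """One file in the catalogue, identified by its SHA-256 digest."""

    sha256: str
    # Scheme name -> locator; the scheme names used here are illustrative only.
    locators: dict = field(default_factory=dict)

    def add_locator(self, scheme, value):
        """Record where this content can be found in one storage system."""
        self.locators[scheme] = value

    def pick(self, preferred):
        """Return (scheme, locator) for the first preferred scheme available."""
        for scheme in preferred:
            if scheme in self.locators:
                return scheme, self.locators[scheme]
        return None


if __name__ == "__main__":
    entry = LibraryEntry(sha256="00" * 32)  # placeholder digest
    entry.add_locator("web", "https://example.org/file")  # placeholder URL
    entry.add_locator("ipfs", "<ipfs-hash>")  # placeholder locator
    print(entry.pick(["bittorrent", "ipfs", "web"]))
```

The reader supplies the preference order, so whoever retrieves a file decides which distributed system to use, and the digest lets them verify the result regardless of where it came from.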

Has anyone ever done anything like that before? A decentralized master library, abstracted across content storage and distribution systems? Does it make sense to abstract across storage systems like that?

👍 7
👤 u/interfect
📅 Jul 12 2016
How to make a Dropbox clone using git-annex. harlo.github.io/2014/04/1…
👍 15
👤 u/retardo
📅 Apr 17 2014
Handling my music collection with git-annex julien.danjou.info/blog/2…
👍 20
👤 u/mariuz
📅 Apr 03 2014
Kickstarter for Git-annex Assistant: Like DropBox, but with your own cloud kickstarter.com/projects/…
👍 18
👤 u/the-fritz
📅 May 28 2012
What's your experience with Git Annex on Windows? Does it work with GitHub?

Title is basically self explanatory, but here's some more context. I'm working on a game project, it requires a lot of large binary files. I've been pointed to a lot of different solutions.

One of these solutions was git annex, which seems promising, but I have a few concerns. Specifically, I'm concerned about the state of its Windows build. Is it safe to use? Is it stable enough to guarantee that it won't corrupt files or, at the very least, to prevent the loss of large amounts of data?

I have a GitHub repository set up with a 1 GB limit that works very well for code, but would it be possible to set up git annex so that it works with the local server when it comes to large binary files? Do you think that GitHub would need to be dropped in order to use git annex? Can I, at the very least, use GitHub's GUI with git annex once I set up the various settings?

Lastly, are there any good tutorials you can point toward for git annex set up with a centralized server? I've read a bit of the documentation but I still feel a little bit in the dark when it comes to the set up or implementation of the service. This needs to not be a pain for people working with art files.

My backup plan is to have a svn repository hosted on a local server that co-exists with the git repository, but if I could use one VCS, that would be amazing.

👍 6
👤 u/TheYokai
📅 Feb 27 2015
Sustaining git-annex development campaign.joeyh.name/
👍 37
👤 u/bloodqc
📅 Jul 15 2013
git-annex as a Tor hidden service git-annex.branchable.com/…
👍 4
👤 u/liotier
📅 Jan 02 2017
Screencast showing git-annex coding in Haskell joeyh.name/screencasts/gi…
👍 22
👤 u/rycee
📅 Jan 10 2013
git-annex. backup/sync system based on git git-annex.branchable.com/
👍 19
👤 u/albertzeyer
📅 Nov 01 2012
Some questions on using git-annex

I have been reading up on git-annex. I want to use git as a file-sync with history. There are a good number of binary files, though none are all too big (mostly PDFs).

I get that git-annex is good for binaries, but I am confused about a few things

  1. Does it store file history? If so, does that really save me any space since it still needs to keep the full history of any changes?
  2. How limited would I be if I tried to use it without root?
  3. How future-safe is it? Unlike regular git with a solid standardization, there seem to be competing technologies for files. For example, would LFS be better? What happens if annex goes away?
  4. Some machines are sometimes behind a firewall. Any potential issues with that?

And the biggest:

  • I intend to keep a full copy of all files on all machines. Do I even need to use annex? Or can I just use regular git?
👍 6
👤 u/jwink3101
📅 Jan 24 2016
git-annex: store large files "in" git without actually storing them in git git-annex.branchable.com/
👍 26
👤 u/greenrd
📅 Dec 08 2010