A list of posts related to "Git Annex"
I recently began searching for alternatives for syncing my files between different devices. During my search I stumbled upon git-annex, which apparently supports managing file locations across computers, servers, pen drives, etc. It sounds like a really interesting concept, but I haven't been able to find that many tutorials, let alone ones that are up to date.
So my question is: is anyone here using git-annex, and what are your experiences with it? And should I even be using it in 2020?
Sorry if this question is out of scope for this subreddit; if so, can you point me to a better-suited community?
I am looking for gotchas or best practices.
I currently use mergerfs / snapraid and I want to incorporate git-annex.
I plan on making repos on each branch (set up as remotes of each other); a rough sketch is below.
Have any of you tried using these two tools together?
There is another post suggesting cache.files=partial for sqlite3; I am wondering what other issues I might run into.
Honestly, since git-annex allows for location tracking across drives (even offline), I may just stop using mergerfs altogether. I am curious if anyone has gone this route as well.
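For concreteness, the shape I'm planning looks roughly like this (mount points and names are made up, and I haven't tested it yet):

    # one repo per underlying branch/drive, not on the mergerfs mount itself
    cd /mnt/disk1/annex && git init && git annex init "disk1"
    cd /mnt/disk2/annex && git init && git annex init "disk2"
    # point the repos at each other so git-annex can track and move content between them
    git remote add disk1 /mnt/disk1/annex        # run inside /mnt/disk2/annex
    cd /mnt/disk1/annex && git remote add disk2 /mnt/disk2/annex
    git annex numcopies 2       # keep the redundancy snapraid used to give me
    git annex sync --content    # copy content around until every file has two copies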
Thanks!
Hi! I have been aware of git-annex for quite some time, but didn't really find the need to use it until now.
For the last two years I just used restic, backing up files from the laptop to one directory and those from the home PC to another. Now I have basically run out of space on the laptop, but I still want to have everything accessible on demand.
So git-annex is probably the perfect solution for data sync. The cloud option that seems to fit best is rsync.net. However, I don't know if it is possible to store everything encrypted over there. Does anyone know if it is possible?
Also, as far as my understanding goes, git-annex does not provide any sort of backup solution out of the box. I'm not really sure how to handle this part. Maybe I could use restic from within rsync.net (as they provide SSH access) to back up to Google Drive or something.
But I'm sure there is a better approach. Doing it as I described means placing complete trust in rsync.net: if they, say, get hacked, I may lose access to both the git-annex repo and the backup (since restic would be set up there). Maybe there is a way to mount the rsync.net git-annex repo on my laptop and then run restic locally...
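From skimming the special remote docs, I think the encrypted setup I'd be aiming for looks something like this (untested; the hostname, remote name, and key ID are placeholders):

    # rsync special remote pointed at rsync.net, with content gpg-encrypted before upload
    git annex initremote rsyncnet type=rsync rsyncurl=user@server.rsync.net:git-annex encryption=hybrid keyid=MYKEYID
    git annex sync --content rsyncnet

With encryption=hybrid (or encryption=shared), rsync.net would only ever see encrypted objects, which is exactly the property I'm after.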
How do you handle this? Is there a method that works well in practice?
I have two machines, each with 4x5TB. I have heard that git-annex is a great tool for managing my data. Currently I keep all four drives pooled together using MergerFS, and I rsync data from my master to my slave machine.
There is a risk that I will not spot in time that some files are missing, so I wanted to protect myself from such a loss using git-annex. The idea is that it lets me manage my files from my laptop, which has just 1TB of storage space. The problem is that git-annex does not work over MergerFS.
At first I thought that was no big deal, so I created one repo on my laptop and then one on every drive of each machine - that's 9 repos. But now I cannot guarantee that a numcopies=2 setting will protect my data, because both copies can land on the same machine.
Is there a way to tell git-annex that four repos are on one machine? If so, how? If not, how should I handle this problem?
Maybe git-annex is not a good solution for my problem? I want my files to be easy to manage but replicated to both locations. I want to add smaller nodes in the future, so my storage will not be evenly sized. The smaller nodes will be outside my LAN - most likely connected over a mobile network with capped bandwidth, joined into a virtual LAN using WireGuard.
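In case it helps frame the question, what I've been considering is grouping the drive repos by machine and using preferred content rather than relying on numcopies alone (group and repo names are made up, and I haven't verified these are the right expressions):

    # tag each drive repo with the machine it lives on
    git annex group disk-a1 machineA    # ...repeat for the other drives on machine A
    git annex group disk-b1 machineB    # ...repeat for the other drives on machine B
    # each repo on machine B wants content that has no copy on machine B yet, and vice versa
    git annex wanted disk-b1 "not copies=machineB:1"
    git annex wanted disk-a1 "not copies=machineA:1"
    git annex sync --content            # moves content until the expressions are satisfied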
Hi,
I've started tracking files using git annex on my Linux laptop, several Linux PCs, and removable disks.
But I have a problem tracking files outside of git-annex's working directory, such as on DVD-Rs or on mounted CIFS/SMB storage where I can't move or rename files.
I would like to track that the same file exists in at least three copies, e.g.:
/mount/shared-linux-cifs-server1/path1/FILE1.zip
/mount/windows-10-laptop/path2/FILE1.zip
/mount/DVD-26/path3/FILE1.zip
/mount/external-hdd/path4/FILE1.zip
Also, it would be good to have checksum verification.
Which tool could you recommend to run on Linux?
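On the git-annex side, the closest thing I've found is treating each read-only location as a directory special remote that can be imported from, roughly like this (untested; the command names are from my reading of the docs):

    # register a read-only mount as a directory special remote
    git annex initremote dvd26 type=directory directory=/mount/DVD-26/path3 importtree=yes encryption=none
    git config remote.dvd26.annex-readonly true    # never try to write to it
    git annex import master --from dvd26           # record which files (and checksums) live there
    git annex fsck --from dvd26                    # verify the copies by checksum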
Thanks
Hello fellow data hoarders.
I have a lot of hard drives and backups, mostly because I have been keeping my stuff from over 25 years of digital work, and I have a lot of media files (photos, texts, and other things) that I need (and want) to keep track of.
I recently stumbled across git-annex, and I am trying to use it in combination with bup so I can work through my older hard drives (and other media) and archive them.
Is anyone else here using bup and git-annex who can give a bit of insight into setup and usage?
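In case the combination is unfamiliar: git-annex has a bup special remote type, so the wiring I'm attempting looks roughly like this (paths and names are mine; treat it as a sketch):

    # make a bup repository usable as a git-annex special remote
    git annex initremote bup-archive type=bup buprepo=/mnt/backupdisk/bup encryption=none
    git annex copy --to bup-archive old-drive-2008/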
thanks!
https://git-annex.branchable.com/
I'm just trying to get a feel for how useful it would be. If anybody can share their experiences with it, it would be much appreciated. Thanks!
I'm an avid git-annex user (a tool that extends git to conveniently store large files and track their locations) and have been eyeing Filecoin since it was announced. I have ~10TB of files in a handful of annexes which are primarily stored on a NAS, with encrypted backups on Google Drive using rclone (with unlimited free storage through my university).
Since git-annex supports IPFS as a storage backend (and I do have a few files in my annexes with copies available through IPFS, although the availability can't be trusted), I was wondering if anyone else has been thinking of using Filecoin for cheap, decentralized storage with trusted availability.
A quick search gave no results, but I read briefly about lotus and am curious how hard it would be to write a special remote wrapper script that lets git-annex use it as a backend.
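For what it's worth, the external special remote protocol itself looks approachable; a do-nothing skeleton (with the actual Filecoin/lotus calls left as placeholders, since I haven't checked what lotus provides) would be something like:

    #!/bin/sh
    # git-annex-remote-filecoin: skeleton of an external special remote (protocol v1).
    # Only the protocol plumbing is sketched; the Filecoin parts are TODOs.
    set -eu
    echo VERSION 1
    while read -r cmd rest; do
        case "$cmd" in
            INITREMOTE)   echo INITREMOTE-SUCCESS ;;
            PREPARE)      echo PREPARE-SUCCESS ;;
            TRANSFER)
                # rest is: STORE|RETRIEVE <key> <file>
                op=${rest%% *}; rest=${rest#* }
                key=${rest%% *}; file=${rest#* }
                # TODO: call a Filecoin client here to store/retrieve "$file"
                echo TRANSFER-FAILURE "$op" "$key" "not implemented yet" ;;
            CHECKPRESENT) echo CHECKPRESENT-UNKNOWN "$rest" "not implemented yet" ;;
            REMOVE)       echo REMOVE-FAILURE "$rest" "removal not supported" ;;
            *)            echo UNSUPPORTED-REQUEST ;;
        esac
    done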
So, a few questions:
It'd be amazing if I could make this work, but I appreciate that I'm early here so not expecting a solution, just curious as to people's general thoughts on the matter.
I really do like some of the features. I tried it years ago and didn't love some of the bugs at the time, but I was considering giving it another chance... Do any of you have any experience with it (or with alternatives such as Perkeep, etc.)?
Is Haskell a pro or a con? It seems like most other modern projects are either Python or Go (or C or Java, /barf)... I haven't come across many Haskell projects.
DVC (Data Science Version Control) trumps these alternatives in the following key ways:
git-lfs is bound to a single upstream, e.g. GitHub/Bitbucket/GitLab - these require special servers that are limited in terms of storage space, even if you run them on premises.
GitHub currently enforces a 2 GiB size limit per-object, even with LFS
On GitHub, beyond 1GB, you have to pay extra
git-annex is more flexible, but more challenging to work with and configure
And both git-lfs and git-annex rely on Git's smudge and clean filters to show the real file on checkout; Git itself only stores a small pointer text file, and does so efficiently. The downside, of course, is that the large files are not version controlled: only the latest version of a file is kept in the repository.
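For reference, the small text file Git actually stores under git-lfs is just a pointer like this (the hash and size are made up); git-annex stores a symlink or a similar pointer instead:

    version https://git-lfs.github.com/spec/v1
    oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
    size 12345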
Whilst DVC doesn't drop into projects as easily as the above options, it does offer improvements on the limitations of those tools. Furthermore, DVC offers key features (pipelines and reproducibility) which those alternatives do not include at all: First Impressions of Data Science Version Control (DVC) - How Does DVC Compare to Alternatives?
Has anyone tried installing git-annex or redshift on arch arm?
I have an Asus C101P that I now use as my substitute PC while I am trying to repair my Dell M6700.
I use Arch with LXDE most of the time instead of Chrome OS, and I'm quite happy with the experience so far. The headphones are not recognised, but it's not a deal-breaker for me.
However, I have two minor issues:
- I cannot install git-annex. Although the package is listed in the community repo (see https://archlinuxarm.org/packages/armv7h/git-annex ), running pacman -S git-annex only returns 'target not found' (see the sketch after this list).
- redshift installed with no complaints but I cannot get it working (and yes, I have my location and light temperature parameters set in the config). The process starts but nothing happens (I have tried both redshift and redshift-gtk).
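For the git-annex issue, the first thing I would rule out is a stale or missing package database; something like the following, assuming [community] is enabled in /etc/pacman.conf:

    pacman -Syu            # refresh the package databases and update the system
    pacman -S git-annex    # then retry the install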
TIA
Seems pretty cool. I haven't wrapped my head around git-annex yet, but this seems really useful for hoarders: https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/
It seems people are using it for, among other things, podcasts.
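The basic moves from that tip look like this (the URLs are placeholders):

    # let git-annex track a file by its URL, using the web as a special remote
    git annex addurl https://example.com/episode1.mp3 --file podcasts/episode1.mp3
    # or track a whole podcast feed
    git annex importfeed https://example.com/feed.rss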
When ML models need to be regularly updated in production, a host of challenges emerges. Paramount among ML reproducibility concerns are the following:
No one tool can do it all for you - organizations use a mix of Git, Makefiles, ad hoc scripts, and reference files for reproducibility. The following overview explains how DVC enters this mix, offering a cleaner solution that specifically targets data science challenges: First Impressions of Data Science Version Control (DVC)
Are there any other solutions that will help you keep track of data that isn't all online at once, or isn't necessarily connected all of the time?
I like git-annex in theory but I don't love its quirks; I would love to see it reinvented in Go or something like that...
Does anyone use Perkeep?
So I'm trying to set up git-annex with lots of small files (locally, of course), mostly between 20-200MB; most of them are text files. But I know that once you get to the 100k-file mark, you start to run into problems.
I found https://github.com/ArchiveTeam/IA.BAK and, from looking at their implementation, it seems they managed to somehow work around the same problem I'm having (although I didn't really find a clue on how they split the repo into shards). Is there any command/way to work around this?
EDIT
nvm found it :)
https://git-annex.branchable.com/tips/splitting_a_repository/
Leaving this here in case anyone wondered, as I know how it feels to struggle to find an answer.
Hi. I post this here because there is no git-annex specific subreddit.
What I currently have is a number of external (mostly USB) drives, each of which has four git-annex repositories: music, movies, books (as in epub/pdf), and images. I have managed them by hand until now. The drives differ in size and some of them are too small to hold all the data (for example, a 120GB drive, very old and likely to fail, versus 500GB of music).
What I want to have: somehow plug all of these drives into one device (a Raspberry Pi or something like that) and have it all managed by git-annex automatically, in a way that keeps enough copies of each file so that a disk failure does not result in data loss. Then there are my day-to-day devices (as of now a workstation with 2x3 TB data disks where each repository lives on both drives, two notebooks, one GPD Pocket, and one more notebook I'm about to buy in the next few months). On these devices I do not want to keep the data long-term (except maybe for some of the music), and some of them have very small drives (especially the GPD).
The "cluster" does not need to be online all the time and I do not care too much about speed - all the important stuff will be on my day-to-day devices anyway, and I don't mind if downloading a movie from the cluster takes one minute or fifteen!
My questions:
Any advice is welcome. I'll cross-post this to the datahoarders as well.
Thanks for your time.
Edit: Cross posting to datahoarders subreddit didn't work from my app, damn.
I've used Git LFS up until the point I needed to work off of a bare repository hosted on a USB drive. That was when I realized Git LFS does not work this way: it requires a direct connection to a server. I've switched to Git Annex, which I find works great and is more in line with Git's philosophy, but now I lose support from GitLab. GitLab recently stopped supporting Git Annex, which is too bad, since Git LFS seems popular but somewhat inferior (albeit simpler). Are there any alternatives to GitLab that support Git Annex?
Does anyone consider themselves to be one of these?
The Archivist
> Bob has many drives to archive his data, most of them kept offline, in a safe place.
> With git-annex, Bob has a single directory tree that includes all his files, even if their content is being stored offline. He can reorganize his files using that tree, committing new versions to git, without worrying about accidentally deleting anything.
> When Bob needs access to some files, git-annex can tell him which drive(s) they're on, and easily make them available. Indeed, every drive knows what is on every other drive.
The Nomad
> Alice is always on the move, often with her trusty netbook and a small handheld terabyte USB drive, or a smaller USB keydrive. She has a server out there on the net. She stores data, encrypted in the Cloud.
> All these things can have different files on them, but Alice no longer has to deal with the tedious process of keeping them manually in sync, or remembering where she put a file. git-annex manages all these data sources as if they were git remotes.
via https://git-annex.branchable.com/
I'd love to hear the details of actual real-world workings. I find the use cases infinitely appealing, but when I've tried to implement similar ones I end up with a mess of files lodged in a repository and/or broken syncs between repositories.
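To make the "real world workings" concrete, the minimal loop I keep trying to get right looks like this (paths are examples):

    git init ~/annex && cd ~/annex && git annex init "laptop"
    git clone ~/annex /mnt/usb/annex && (cd /mnt/usb/annex && git annex init "usb-drive")
    git remote add usb /mnt/usb/annex
    git annex add big-file.iso && git commit -m "add big-file.iso"
    git annex sync --content usb    # pushes the branch metadata and the file content to the drive
    git annex drop big-file.iso     # only succeeds because a copy is known to exist on usb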
It seems like every week there's a new decentralized data storage and exchange system. Off the top of my head, I can name IPFS, Patchwork/Secure Scuttlebutt, and plain old Bittorrent, and I know there are more.
After a while, it seems like every decentralized data storage system gets a Git wrapper.
What if we used Git as an abstraction layer over all of these systems, with something like Git Annex? For the use case of storing all the world's content, for example, we could have one master Git repository of all files ever. We would use Git Annex to keep track of where each file can be found across a variety of distributed data storage systems (by storing its web URL, IPFS hash, Bittorrent info hash, SHA256 hash, and whatever other content locators are required for future systems). When someone wants to actually get a file, they can retrieve it from their distributed system of choice (assuming that it is legal for them to do so in their jurisdiction, and that they have obtained any necessary licenses).
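At the single-file level, git-annex can already record several sources for one key; the sketch below only uses plain web URLs, since IPFS and Bittorrent sources go through their own special remotes (paths and URLs are placeholders):

    # register more than one known location for the same file
    git annex addurl --file library/file1.iso https://mirror-a.example.org/file1.iso
    git annex addurl --file library/file1.iso --relaxed https://mirror-b.example.org/file1.iso
    git annex whereis library/file1.iso    # lists every remote and URL believed to have the content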
Has anyone ever done anything like that before? A decentralized master library, abstracted across content storage and distribution systems? Does it make sense to abstract across storage systems like that?
The title is basically self-explanatory, but here's some more context. I'm working on a game project that requires a lot of large binary files. I've been pointed to a lot of different solutions.
One of these solutions was git-annex, which seems promising, but I have a few concerns. Specifically, I'm concerned about the state of its Windows port. Is it safe to use? Is it stable enough to guarantee that it won't corrupt files, or at the very least that it will prevent the loss of large amounts of data?
I have a GitHub setup with a 1 GB limit that works very well for code, but would it be possible to set up git-annex so that it uses the local server for the large binary files? Do you think GitHub would need to be dropped in order to use git-annex? Can I, at the very least, keep using GitHub's GUI with git-annex once the various settings are in place?
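The split I'm hoping for would look something like this, if it's workable (the server address and size threshold are made up):

    # history and code live on GitHub; annexed content lives on the studio server
    git annex initremote studio type=rsync rsyncurl=artserver.local:/srv/annex encryption=none
    git config annex.largefiles "largerthan=10MB"    # files under the threshold keep going straight into git
    git annex add assets/ && git commit -m "add assets"
    git annex copy --to studio assets/    # big file content goes to the studio server
    git push origin main                  # history (containing only pointers/symlinks) goes to GitHub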
Lastly, are there any good tutorials you can point me toward for setting up git-annex with a centralized server? I've read a bit of the documentation, but I still feel a little in the dark when it comes to the setup and implementation. This needs to not be a pain for people working with art files.
My backup plan is to have an SVN repository hosted on a local server that co-exists with the git repository, but if I could use one VCS, that would be amazing.
I have been reading up on git-annex. I want to use git as a file-sync with history. There are a good number of binary files, though none are all too big (mostly PDFs).
I get that git-annex is good for binaries, but I am confused about a few things
And the biggest: