Change happens, and in software, change is the name of the game: add this feature, fix that bug, deprecate something else. While DBAs and database developers may not usually have to worry about distributed version control of the kind Git provides, it’s a powerful tool that could really save your bacon…or at least some time and frustration.
The basics of Distributed Version Control Systems
In software development, distributed version control is, in short, a system in which the full codebase (the repository), including its full history, is mirrored on every developer’s computer.
I’m not one of the technical guys at SSG, so I’m borrowing this list of the advantages distributed version control systems (DVCS) have over centralized systems from Wikipedia:
- Users can work when not connected to a network.
- Common operations (commits, viewing history, and reverting changes) are faster because there is no need to communicate with a central server.
- Allows private work.
- Working copies function as remote backups.
- Allows various development models to be used, such as development branches or a Commander/Lieutenant model.
- Permits centralized control of the “release version.”
- Makes it much easier to fork a project that has stalled because of leadership conflicts or design disagreements (as happens with FOSS projects).
As for disadvantages:
- Initial repository checkout is slower
- Lack of locking mechanisms
- Individual storage required
- Exposure of code base
So that’s DVCS.
What’s Git?
Git is a specific DVCS, and far and away the most popular one: as of 2018, 87.2% of respondents to Stack Overflow’s annual developer survey reported using Git, with the second-place option at only 16.1%. Clearly, developers see the value, and not just because Git is free (the second-place option is also free).
The Git website explains how Git differs from other VCS software:
Git doesn’t think of or store its data this way [delta-based version control]. Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
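To make the snapshot idea a little more concrete, here is a toy model in Python. It is a simplified sketch, not Git’s real storage format: the class `ToySnapshotStore` and the file names are hypothetical. Each commit records every file’s path and a hash of its contents, but the contents themselves are stored only once per unique hash, so an unchanged file costs nothing extra beyond a reference.

```python
import hashlib


class ToySnapshotStore:
    """A toy model of snapshot-based version control (NOT Git's real format)."""

    def __init__(self):
        self.objects = {}   # content hash -> file contents, each stored once
        self.commits = []   # each commit is a full snapshot: {path: content hash}

    def commit(self, files):
        """Record a snapshot of every file; store only contents not seen before."""
        snapshot = {}
        for path, data in files.items():
            key = hashlib.sha1(data).hexdigest()
            # Unchanged content hashes to an existing key, so it is not stored again.
            self.objects.setdefault(key, data)
            snapshot[path] = key
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def checkout(self, commit_id):
        """Rebuild the full set of files as they looked at a given commit."""
        return {path: self.objects[key] for path, key in self.commits[commit_id].items()}


# Hypothetical SQL scripts: only seed.sql changes between the two commits.
store = ToySnapshotStore()
v1 = store.commit({"create_tables.sql": b"CREATE TABLE t (id INT);\n",
                   "seed.sql": b"INSERT INTO t VALUES (1);\n"})
v2 = store.commit({"create_tables.sql": b"CREATE TABLE t (id INT);\n",
                   "seed.sql": b"INSERT INTO t VALUES (1), (2);\n"})

print(len(store.objects))  # 3: the unchanged create_tables.sql is stored once
print(store.commits[v1]["create_tables.sql"] == store.commits[v2]["create_tables.sql"])  # True
```

Both commits are complete snapshots (either one can be checked out on its own), yet only three file contents are stored, which is the efficiency the quote describes.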
Also from the Git site: “it’s impossible to change the contents of any file or directory without Git knowing about it,” because everything is checksummed. For visual learners, this quick (four-minute) video on Git is a good primer on the basics.
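As a rough illustration of that checksumming, the sketch below computes the ID Git gives a file’s contents (a “blob”) in a default SHA-1 repository: Git hashes a short header of the form `blob <size>` plus a null byte, followed by the raw bytes. The helper name `git_blob_id` and the sample query are hypothetical; the point is that even a one-character edit produces a completely different ID.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a file's contents (a "blob")."""
    # Git hashes a header ("blob <size in bytes>\0") followed by the raw contents.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


original = b"SELECT * FROM dbo.Orders;\n"
edited = b"SELECT * FROM dbo.Orders ;\n"   # a one-character change

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_id(original) != git_blob_id(edited))  # True
```

Running `git hash-object` on a file containing the same bytes should report the same ID, assuming a default SHA-1 repository and no line-ending conversion.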
What does it have to do with SQL Server? Git was built with programmers in mind, but it isn’t only for them: anyone can use it to track changes to any set of files. Our friend Steve Jones at SQL Server Central has a great getting-started post for DBAs (in fact, the first in a series) in which he walks readers through tracking SQL scripts and the ins and outs of making changes. In his words,
“One of the things Git enables is the tracking of history. What happened over time, who changed the files, and when. These are all questions I’ve wanted answered with files and code at different times in my career.”
If that sounds like something you want, distributed version control with Git could be just what you need.