Removing Sensitive Information from Git
You may have a Git repository that you want to make public or share with others, but that contains sensitive information (passwords, keys, private files) you do not want to distribute.
If you are using GitHub, this post describes what I believe is the best procedure.
I will assume that your original repository is private and hosted on GitHub, and that you want to keep it private and continue working on it. In this case, the best approach may be to duplicate the repository, clean the copy, and make that copy public (or otherwise easy to share).
The Tools
You will need Git and the BFG Repo-Cleaner (the BFG) on your system.
The Procedure
The first step is to replicate your original repository.
Cloning
Some of these instructions are taken from here. That page covers several options, for example keeping the copy in sync with the original repository; the procedure I use here is the first one.
- Create a new, empty private repository that will become the “distributable” version.
- Create a bare clone of the original repository:
$ git clone --bare https://github.com/exampleuser/old-repository.git
- Mirror-push to the new repository you created:
$ cd old-repository.git
$ git push --mirror https://github.com/exampleuser/new-repository.git
- Remove the temporary local repository you created.
$ cd ..
$ rm -rf old-repository.git
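To rehearse these steps without touching GitHub, here is a sketch that runs the same commands against two local bare repositories standing in for old-repository.git and new-repository.git. The remote/ directory, the seed repository and its single commit are made up for the demo; only the last group of commands follows the list above.

```shell
#!/bin/sh
# Rehearse the bare-clone + mirror-push steps with local stand-ins.
# remote/old-repository.git and remote/new-repository.git are hypothetical
# replacements for the two GitHub repositories.
set -eu
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-ins: an "original" repository with one commit, and an empty "new" one.
mkdir remote
git init -q --bare remote/old-repository.git
git init -q --bare remote/new-repository.git
git init -q seed
cd seed
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q "$work/remote/old-repository.git" HEAD
cd ..

# The procedure itself: bare clone, mirror-push, remove the temporary copy.
git clone -q --bare "$work/remote/old-repository.git"
cd old-repository.git
git push -q --mirror "$work/remote/new-repository.git"
cd ..
rm -rf old-repository.git
```

Afterwards, both stand-in repositories should list the same branches pointing at the same commits, and the temporary old-repository.git is gone.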
The next step is to find and remove (or replace) the sensitive files and text in the new repository.
Cleansing
I use the BFG Repo-Cleaner (the BFG), which is designed as a simpler, faster alternative to git-filter-branch.
First, download the application from the BFG website.
It is distributed as a Java .jar file, so make sure that you have a Java Runtime Environment (JRE) installed.
The BFG website has usage examples, but note that they call the program simply as bfg. To actually run the application, use the following (replace bfg.jar with the name of the file you downloaded):
java -jar bfg.jar
Wherever an example says bfg, run java -jar bfg.jar instead.
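If you would rather type bfg exactly as the examples do, a small shell function can wrap the java -jar call. This is a sketch: BFG_JAR and its default path are hypothetical, so point it at wherever you saved the jar.

```shell
#!/bin/sh
# Hypothetical jar location: change it to wherever you saved the file you
# downloaded (its name usually includes a version number).
BFG_JAR="${BFG_JAR:-$HOME/bin/bfg.jar}"

# Wrapper so the commands from the BFG examples can be typed as "bfg ...".
bfg() {
    java -jar "$BFG_JAR" "$@"
}
```

With this in your shell profile (for example ~/.bashrc), a command such as bfg --delete-folders ... can be run as written.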
Now that this is out of the way, we can jump into the removal part.
This starts with a mirror clone of the repository you want to cleanse (here, the new “distributable” repository; the URL below is a placeholder):
$ git clone --mirror git://example.com/some-big-repo.git
Now you are ready to start cleaning up.
Passwords and Code
To replace text throughout the history, give the BFG a file that lists the text you want removed or changed, and pass it to the BFG with the --replace-text option.
The usual approach is a passwords.txt file with one entry per line. The following shows the different forms an entry can take (thanks to this guy, and some info from here):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
regex:\r(\n)==>$1 # Replace Windows newlines with Unix newlines
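Putting those entries into an actual file might look like the sketch below, with the explanatory comments left out. The quoted 'EOF' heredoc matters: it stops the shell from expanding $1 or interpreting the backslashes. The BFG call only runs if the mirror clone is present in the current directory; BFG_JAR is a hypothetical path to your jar.

```shell
#!/bin/sh
# Write passwords.txt without the explanatory comments. The quoted 'EOF'
# keeps $1 and the backslashes literal.
cat > passwords.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
regex:\r(\n)==>$1
EOF

# Run the BFG against the mirror clone, if it is present here.
# BFG_JAR is a hypothetical path to the jar you downloaded.
if [ -d some-big-repo.git ]; then
    java -jar "${BFG_JAR:-bfg.jar}" --replace-text passwords.txt some-big-repo.git
fi
```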
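Take note: just as with files (see the Files section below), the BFG does not touch the contents of your latest commit, which it treats as protected. So you should already have removed the offending text from the current version of your code, committed, and pushed, before running the BFG. To do the replacement in a normal working copy, you can use the following (assuming GNU grep, xargs, and sed):
$ grep -rlZ --exclude-dir=.git 'foo' . | xargs -0 -r sed -i -e 's/foo/bar/g'
This replaces ‘foo’ with ‘bar’ in every file under the current directory, skipping the .git directory.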
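Here is a rehearsal of that replacement on a throwaway repository. It uses grep -r with --exclude-dir=.git so that nothing inside Git's own directory gets edited, and xargs -r so sed is not run when nothing matches. The file config.ini and its contents are made up, and the flags assume the GNU versions of grep, xargs and sed.

```shell
#!/bin/sh
# Rehearse the find-and-replace on a throwaway repository.
# config.ini and its contents are made-up examples; assumes GNU grep/xargs/sed.
set -eu
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'db_password = foo\n' > config.ini
git add config.ini
git commit -q -m "Add config"

# Replace 'foo' with 'bar' in every file that contains it, skipping .git.
grep -rlZ --exclude-dir=.git 'foo' . | xargs -0 -r sed -i -e 's/foo/bar/g'

# Commit (and, in a real repository, push) so the latest commit is clean.
git commit -q -am "Remove hard-coded password"
```

The latest commit is now clean, but the previous commit still contains ‘foo’, which is exactly what the BFG is for.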
Files
An important note from the files section of the BFG site: first delete your unwanted files, then commit (and push) that change before you run the BFG. Everything in the latest commit is protected, so the BFG will not delete files that are still present there.
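To see why this order matters, here is a small experiment in a throwaway repository (id_rsa is only an example file name): deleting a file in a new commit removes it from the latest commit, but the earlier commit still contains it.

```shell
#!/bin/sh
# Show that "git rm" + commit only cleans the latest commit.
# id_rsa and its contents are made-up examples.
set -eu
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "not-a-real-key" > id_rsa
echo "hello" > README
git add id_rsa README
git commit -q -m "Initial commit"

# Before running the BFG: delete the file and commit (push, in a real repo).
git rm -q id_rsa
git commit -q -m "Remove private key"

# The latest commit no longer has the file...
git ls-tree -r --name-only HEAD
# ...but the history still does.
git log --oneline -- id_rsa
```

With the latest commit clean, the BFG can then remove the file from the rest of the history, for example with bfg --delete-files id_rsa some-big-repo.git.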
There are cases where you may want to delete an entire folder. This can be done as follows (found here):
$ bfg --delete-folders "{folderA,folderB,folderC}" some-big-repo.git
Once the BFG has finished, move into the repository:
$ cd some-big-repo.git
Then expire the reflog and run garbage collection, so that the old objects still holding the sensitive data are actually deleted:
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
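Here is a sketch showing why both parts of that command are needed, using a throwaway repository. A plain git commit --amend stands in for the BFG's history rewrite, and the secret is made up. Garbage collection on its own keeps the old data because the reflog still points at the old commit; only after the reflog is expired does gc delete it.

```shell
#!/bin/sh
# Why "reflog expire" has to come before "gc": a throwaway demonstration.
# "git commit --amend" stands in for the BFG's rewrite; the secret is made up.
set -eu
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "PASSWORD1" > secret.txt
git add secret.txt
git commit -q -m "Add secret"
blob=$(git rev-parse HEAD:secret.txt)

# Rewrite history so that no branch points at the secret any more.
git rm -q secret.txt
echo "hello" > README
git add README
git commit -q --amend -m "Initial commit"

# gc alone: the reflog still references the old commit, so the blob is kept.
git gc --prune=now --quiet
if git cat-file -e "$blob" 2>/dev/null; then after_gc=present; else after_gc=gone; fi

# Expire the reflog first, then collect: nothing references the blob now.
git reflog expire --expire=now --all && git gc --prune=now --aggressive --quiet
if git cat-file -e "$blob" 2>/dev/null; then after_expire=present; else after_expire=gone; fi

echo "secret blob after plain gc: $after_gc; after reflog expire + gc: $after_expire"
```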
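Once this has completed, push the changes to the “ready for public” repository with git push. Because the repository was cloned with --mirror, this updates every branch and tag on the remote to match the cleaned copy.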
This completes the process of cleansing your repository.
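Before (or after) pushing, it may be worth confirming that a string really is gone from every commit. One way, my suggestion rather than part of the BFG's instructions, is Git's "pickaxe" search, git log -S, over all refs; run it inside the mirror clone. The function name history_contains is hypothetical.

```shell
#!/bin/sh
# history_contains STRING: print every commit reachable from any ref that
# adds or removes STRING (git's pickaxe search). Returns 1 if there are none.
# Hypothetical helper; run it inside the mirror clone (bare repos work too).
history_contains() {
    matches=$(git log --all --oneline -S"$1")
    [ -n "$matches" ] && printf '%s\n' "$matches"
}
```

For example, history_contains PASSWORD1 should print nothing and return a non-zero status once the cleanup has worked. Since every history starts without the string, any commit that still contains it must also have a commit that introduced it, which is what -S finds.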