Securing Git: Remove Sensitive Information

16 Oct 2023 ⏱️ 5 min
Securing Git: Removing Sensitive Information with BFG

While working on projects, many developers at some point accidentally push sensitive information, such as credentials or API keys, to a Git repository. Everyone is taught not to push sensitive information to Git, but mistakes happen; we are only human after all 😅

The good news is that there are a couple of ways to fix this without too much hassle. In this article, we’ll walk through the steps to safely remove sensitive information from your Git history while preserving the integrity of your project. So, let’s dive right in!


Why Remove Sensitive Information from Git History?

Accidentally pushing sensitive data to a Git repository can be a significant security risk. It’s crucial to act quickly to remove this information from the repository’s history for several reasons:

  1. Security: Sensitive data, such as passwords, credentials or API keys, should never be readable from your repository’s history. Removing it reduces the risk that someone with access to the repository can misuse it.
  2. Compliance: Depending on your industry, compliance requirements may mandate the removal of sensitive data from your version control history, for example credit card numbers in fintech apps.
  3. Reputation: Security is a crucial part of development. Your reputation as a developer can be at stake if you don’t handle sensitive information properly. Demonstrating a commitment to security is essential.

Hence, if sensitive information such as API tokens or credentials is pushed, it must be revoked and removed from Git history. To be clear, just updating the code to remove the secret does not solve the problem, since the information can still be read from older commits.
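To see why a follow-up commit is not enough, here is a minimal sketch using a throwaway repository. The file name config.env, the fake key FAKE_API_KEY_12345 and the demo identity are all made up for illustration; only standard Git commands are used.

```shell
# Sketch: a secret deleted in a later commit is still reachable in history.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Throwaway identity so the commits work without any Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit 1: the (fake) secret is accidentally committed.
echo 'API_KEY=FAKE_API_KEY_12345' > config.env
git add config.env
git commit -q -m 'Add config'

# Commit 2: the secret is "removed" by editing the file.
echo 'API_KEY=' > config.env
git commit -q -am 'Remove API key'

# The checked-out files no longer contain the key (git grep exits 1 on no match).
in_worktree=$(git grep -c 'FAKE_API_KEY_12345' || true)

# The pickaxe search (-S) still lists commits that added or removed the key.
in_history=$(git log --oneline -S 'FAKE_API_KEY_12345' | wc -l | tr -d ' ')

echo "files matching in working tree: ${in_worktree:-0}"
echo "commits touching the secret: $in_history"
```

Anyone who can clone the repository can run the same search, which is why the key has to be revoked and the history rewritten.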


Step-by-Step Guide to Remove Sensitive Data from Git History:

For this article, we’ll use the BFG Repo-Cleaner tool. There are other ways to achieve the same result, such as git filter-repo.

BFG Repo-Cleaner rewrites your repository’s history. This changes the SHAs of the commits it modifies and of every commit that descends from them. Because the SHAs change, open pull requests in your repository may also be affected.
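The effect on SHAs can be reproduced by hand on a tiny repository. This is only an illustration of why a rewrite cascades to later commits, not a description of how BFG works internally; the file names and the fake secret are assumptions.

```shell
# Sketch: rewriting one commit changes its SHA and the SHA of its descendants.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Original history: the first commit holds a fake secret, the second never touches it.
echo 'password=FAKE_SECRET' > settings.txt
git add settings.txt
git commit -q -m 'Add settings'
echo 'hello' > README
git add README
git commit -q -m 'Add README'
old_parent=$(git rev-parse HEAD~1)
old_child=$(git rev-parse HEAD)

# Rewrite the first commit with the secret scrubbed.
git checkout -q --detach "$old_parent"
echo 'password=***REMOVED***' > settings.txt
git commit -q --amend -a -m 'Add settings'
new_parent=$(git rev-parse HEAD)

# Replay the second commit on top of the rewritten one.
git cherry-pick "$old_child" >/dev/null
new_child=$(git rev-parse HEAD)

echo "first commit:  $old_parent -> $new_parent"
echo "second commit: $old_child -> $new_child"
```

The second commit only adds README, yet it gets a new SHA too, because its parent and its tree are different. Anything that referred to the old SHAs, such as an open pull request, no longer lines up with the new history.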

Note: Before proceeding, ensure you have a backup of your repository in case anything goes wrong. These operations can be irreversible.

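  1. Create a Backup and a Cleanup Branch: Copy the repository directory so that you can always refer back to the original history, then create a new branch in your repository to perform the cleanup on.

    # full copy of the repository, including .git (run from the parent directory)
    cp -R <repo> <repo>.backup
    # work on a separate branch inside the repository
    cd <repo>
    git checkout -b cleanup-branch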
    
  2. Identify Sensitive Data: First, identify the files or commits that contain the sensitive information you want to remove. You can use tools like git log and git blame to trace back to the offending commits and files.

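    # create passwords.txt (never commit this file)
    # and add the secrets you want to search for, one per line
    vim passwords.txt
    
    # search the tracked files in the current checkout for those secrets
    # -n: show line numbers, -i: ignore case, -f: read patterns from a file
    git grep -ni -f passwords.txt
    

    The last command lists every line in your tracked files that matches an entry in passwords.txt.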

  3. Use BFG Repo-Cleaner: BFG Repo-Cleaner is a powerful tool for removing unwanted files and content from Git repositories. You can install it using:

    # for mac users
    brew install bfg
    

    Then, run the BFG Repo-Cleaner:

    # if you are using the passwords.txt way
    bfg --replace-text passwords.txt .git
    
    # if you already know the files to remove
    bfg --delete-files <filename>
    

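    Replace <filename> with the name of the file containing the sensitive information. The BFG output lists the changes it made, including the individual commits that were rewritten.

    If you look into the files that have changed, you’ll notice that every matching piece of sensitive information has been replaced with the ***REMOVED*** placeholder. Note that, by default, BFG protects the contents of your latest commit (HEAD) and only cleans the commits before it, so delete the secret from the current files and commit that change before running BFG.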

  4. Clean Up the Old Objects: BFG has already rewritten the commits, but the old objects that still contain the secrets stay in your local repository until the reflog is expired and garbage collection runs. Use the following command:

    git reflog expire --expire=now --all && git gc --prune=now --aggressive
    
  5. Force Push: To update your remote repository with the new, cleaned history, you’ll need to force push:

    git push origin cleanup-branch --force
    

    Be cautious when using --force as it overwrites the remote history. Make sure you have the necessary permissions to do this.

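  6. Review and Merge: After successfully removing the sensitive information, review the changes in your cleanup branch to ensure everything is in order. If all looks good, you can merge the cleanup branch into your main branch. Keep in mind that BFG rewrites every branch and tag in the repository, not only the cleanup branch, so any other rewritten branch (including main) also has to be force pushed before the old commits disappear from the remote.

  7. Delete the Cleanup Branch: Once you’ve merged the changes, you can delete the cleanup branch:

    # delete the local branch
    git branch -d cleanup-branch
    # delete the branch that was pushed in step 5
    git push origin --delete cleanup-branch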
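Steps 2 and 6 both come down to searching for the secrets. Since git grep without a revision only looks at the checked-out files, a history-wide check could be sketched as below. The function name scan_history is made up for illustration; it only wraps standard git grep and git rev-list options.

```shell
# scan_history PATTERN_FILE
# Search every commit reachable from any branch or tag for the fixed strings
# listed one per line in PATTERN_FILE (case-insensitive).
# Prints matches as <commit>:<path>:<line number>:<line>.
# Exit status: 0 if something matched, 1 if nothing matched.
# Assumes the repository has at least one commit.
scan_history() {
    patterns=$1
    # Word splitting of the commit list is intended here.
    # shellcheck disable=SC2046
    git grep -n -i -F -f "$patterns" $(git rev-list --all)
}
```

For example, running scan_history passwords.txt before and after the cleanup should go from printing matches to printing nothing. Every commit ID is passed on the command line, so on a very large repository the argument list may become too long; treat this as a starting point rather than a finished tool. Avoid blank lines in the pattern file, since an empty pattern matches every line.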
    

Removing sensitive information from your Git history is a crucial step in maintaining the security and integrity of your project. By following the steps outlined in this guide, you can effectively clean up your repository while preserving your project’s reputation and compliance with security standards.

Always exercise caution when rewriting Git history, especially when using force pushes. Make sure to communicate any changes to your team and follow best practices to prevent accidental pushes of sensitive data in the future. Happy coding!
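One way to guard against repeat accidents is a local pre-commit hook. The sketch below is an assumption-heavy example, not a standard tool: the function name install_secret_hook, the SECRET_PATTERNS variable and the default pattern file location are all hypothetical. It reuses the idea of a passwords.txt style list with one fixed string per line.

```shell
# install_secret_hook: write a pre-commit hook into the current repository.
# The hook aborts a commit when a line being added contains one of the
# strings listed in the file named by SECRET_PATTERNS.
install_secret_hook() {
    hook=$(git rev-parse --git-path hooks/pre-commit) || return 1
    mkdir -p "$(dirname "$hook")"
    cat > "$hook" <<'EOF'
#!/bin/sh
# Hypothetical default location; override with the SECRET_PATTERNS variable.
patterns=${SECRET_PATTERNS:-"$HOME/.git-secret-patterns"}

# No pattern file means nothing to check, so allow the commit.
[ -f "$patterns" ] || exit 0

# Inspect only the added lines of the staged diff, skipping "+++ " file headers.
if git diff --cached --no-color --no-ext-diff -U0 \
    | grep '^+' | grep -v '^+++ ' \
    | grep -q -i -F -f "$patterns"; then
    echo 'pre-commit: staged changes contain a listed secret, commit aborted' >&2
    exit 1
fi
exit 0
EOF
    chmod +x "$hook"
}
```

Run install_secret_hook once inside a repository. A hook like this only protects the clone it is installed in and can be skipped with git commit --no-verify, so it complements, rather than replaces, revoking leaked credentials.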


Liked the article? Consider supporting me ☕️

Paypal - Mohit Khare
Buy me a coffee