Git Bisect and History Rewriting

Jan 12, 2023

In this blog post, I cover how to use git bisect to find a bad commit, and then use the findings to remove any trace of the bad code from history.

Git Bisect and Rewrite History

Git Bisect - Find the spot where your bug was introduced

This is a simple tutorial that shows how to use bisect to find the first bad commit, the one where a bug was introduced into your code.

Find the repo

To get things kicked off, go to the GITBisectAtTheMovies repo and fork it.

Getting Started

This section helps you get set up to work through the activity.

  1. Fork the repo

    Fork the repo so you have a copy to work with locally. You'll need to fork to keep commit history.

  2. Clone your fork locally

    Clone your fork so you have a copy on your local machine.

  3. Optional: Set up calls to the IMDB API.

    If you want to make live calls to the IMDB API, then you will need to go to this link and create an account. You can get a free account that lets you do 100 calls a day. Pretty sweet!

  4. Run the application

    You'll see there is a small problem: DataTables is not working as expected!

    "DataTables is not working on the site, so no sorting and filtering is possible".

Git Bisect Tutorial

In this next section, you'll learn how to use Git Bisect to find the exact commit where the "bug" was introduced.

As a "hint", I'll tell you the problem: your developer referenced the stylesheet and JavaScript files in the wrong place. Instead of fixing the layout, for some reason your developer decided to put the files in the place where they are referenced, and then added them to the .gitignore file so that they wouldn't be committed.

To fix this problem, therefore, you just need to find the spot where the layout was changed to add the references, so you can patch that into your project.

Git Bisect can help you find the exact commit.

Bisect with Good and Bad Commits

The first thing you need to do is get git bisect started, and then you enter the known good commit and bad commit.

  1. Open a terminal to the location of your code.

    To get started with git bisect, type the command

    git bisect start

    "starting and resetting bisect"

    If at any point you need to just quit, you can type

    git bisect reset

    Make sure you started the bisect and continue to the next step.

  2. Get your commit history

    Review your commit history by running the command:

    git log --oneline

    Look over the commits and pick one you believe is good (no bug is present) and one that falls somewhere after the bug was introduced.

    The narrower the range, the better, but a wide range is fine too: bisect uses a binary search, so each commit you test roughly halves the number of commits left to check.

    "The overall history"

    For this, you could use the first and last commits. To narrow this down a bit, it's certain that an error with datatables wouldn't exist before they were integrated (the added datatables file commit). Likely the code error is after that.

    To be safe, commit 0613ee3 should definitely be good, and commit 1c3e223 should be bad.

  3. Use the good commit

    To start the first bisect search, enter the good commit

    git bisect good 0613ee3

    GIT responds with a message: status: waiting for bad commit, 1 good commit known.

    "First Good Commit is entered"

  4. Enter the bad commit

    Next, enter the bad commit 1c3e223.

    git bisect bad 1c3e223

    With both a good and a bad commit, git bisect will then check out, in a detached HEAD state, the commit that you need to test for good or bad status.

    "First check commit is in detached head state while bisecting"

    Note: The commit currently up for review is 478877f, as shown in the image. Also note the message `Bisecting: 2 revisions left ... (roughly 2 steps)`.

    Review this image with the commit history:

    "The commits are shown, and the commit chosen is in the middle of the good and bad commits"

    With the binary search in place, the middle commit is in question. We're fortunate we know where to look for the bug. Review the current state of the repo which is checked out to the current commit in a detached state, and see if the files for datatables are improperly referenced in the shared layout:

    "The file is shown with the two bad references selected"

    Since the files are referenced incorrectly, we can confirm this is a bad commit. Additionally, that means the only commits remaining for checking for the first bad commit are earlier in the commit history, so commits 3896bf7 and 6a9688d are all that is left to check. Which one bisect picks will mean one or possibly two more steps.

  5. Confirm the commit is bad

    Enter the command to confirm the commit is bad

    git bisect bad

    "The bisect bad command was run and the next commit is now checked out for confirmation"

    Here you can see the next commit checked out was 3896bf7. It's likely this one is still bad, but it may be good.

Continue this process

In a larger repo, you may have to repeat this step-by-step search a few more times.

  1. Review the repository after the next commit is checked out.

    With the next commit checked out, look for the bug once again. It's important to make sure you are seeing the file as it currently stands (so make sure you don't have it in some sort of unsaved state or you might not see the changes).

    "The markup is shown and this time there are no bad references"

    Since no bad references exist, this is a good commit!

  2. Mark the commit as good

    Use the following command to set the commit as good

    git bisect good

    "The bad commit is located"

  3. End the operation

    This is weird to me, but with the commit found, this is all bisect can do. Furthermore, bisect didn't end itself, even though there are no more commits to check.

    End the process by resetting:

    git bisect reset
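When the bug can be detected by a command, the good/bad marking shown above can be handed over to `git bisect run`. The following is only a sketch in a throwaway repo, not the tutorial repo: the file names, the `BAD_REF` marker, and the eight toy commits are all assumptions made up for illustration. The check command exits 0 for a good commit and non-zero for a bad one.

```shell
#!/bin/sh
set -e

# Throwaway repo (hypothetical): commit 4 of 8 introduces the "bug".
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

for i in 1 2 3 4 5 6 7 8; do
  echo "change $i" >> notes.txt
  if [ "$i" -eq 4 ]; then
    # BAD_REF is a made-up stand-in for the wrong datatables references.
    echo "BAD_REF" > layout.txt
  fi
  git add -A
  git commit -q -m "commit $i"
done

expected=$(git rev-parse HEAD~4)   # commit 4, where the bug was introduced
good=$(git rev-parse HEAD~7)       # commit 1, known good

# Shorthand: "git bisect start <bad> <good>" marks both ends in one command.
git bisect start HEAD "$good" >/dev/null

# The check exits 0 (good) when BAD_REF is absent and 1 (bad) when present.
git bisect run sh -c '! grep -q BAD_REF layout.txt 2>/dev/null' >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$first_bad")"
```

In this toy history the script should report "commit 4". For a bug that needs a human to look at a page, as in this tutorial, the manual good/bad flow is still the way to go.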

Now What

Great! We found the bad commit.

Note: Commit 1eb8721 was added after creation of the blog post. You can ignore it or you can keep it. It's just the change to the readme to contain the blog post.

Looking at the commit history, all of these commits have the "bad" reference in them (in order from most recent commit to the first bad commit):

  • 1eb8721 (the most recent commit)
  • 35ec697
  • ec8833d
  • 1c3e223
  • 6d949b8
  • e1e8104
  • 478877f (the first bad commit)
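A list like this can be double-checked by searching each commit's tree for the offending text. This is a rough sketch in a throwaway repo: the five toy commits, the file names, and the `BAD_REF` marker are assumptions for illustration, not the contents of the tutorial repo.

```shell
#!/bin/sh
set -e

# Throwaway repo (hypothetical): commit 3 of 5 introduces the bad reference.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "layout v1" > layout.txt
git add -A && git commit -q -m "commit 1: base layout"
echo "notes" > notes.txt
git add -A && git commit -q -m "commit 2: unrelated work"
echo "BAD_REF" >> layout.txt
git add -A && git commit -q -m "commit 3: introduces the bad reference"
echo "more" >> notes.txt
git add -A && git commit -q -m "commit 4: later work"
echo "even more" >> notes.txt
git add -A && git commit -q -m "commit 5: later work"

# Walk every commit reachable from main and grep that commit's tree.
bad_count=0
for c in $(git rev-list main); do
  if git grep -q "BAD_REF" "$c" -- layout.txt; then
    git --no-pager log -1 --oneline "$c"
    bad_count=$((bad_count + 1))
  fi
done
echo "commits carrying the bad reference: $bad_count"
```

Here commits 3, 4, and 5 are listed, most recent first, which mirrors the shape of the list above: the first bad commit and everything built on top of it.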

So how do we "fix" all of these commits without losing code? Do we even need to fix them all?

  1. We don't need to fix them all

    Look, this project is small, and there are no versions to support in the past. Clearly, fixing at the last release and just going forward is viable here.

    This may not always be the case though. What if you have to fix all the commits and keep things intact?

    What if you were looking for the first time someone committed a secret and you needed to rewrite all the commits from that point on to remove the secret so that your security is not compromised?

  2. Discuss some possible fix strategies

    A simple fix could be to just check out a branch, reset back to that commit, fix the stuff, then apply all the other stuff we've done on top of it again with some cherry-pick and merge conflict resolution. Again, that might not be easy if you have hundreds of commits and you have a bug introduced early in the commit history.

    Another fix could be to use the command git filter-branch, which lets you rewrite history for the branches you name in its rev-list options. However, check out this warning, taken directly from the documentation at https://git-scm.com/docs/git-filter-branch:

    WARNING git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended. Please use an alternative history filtering tool such as git filter-repo. If you still need to use git filter-branch, please carefully read SAFETY (and PERFORMANCE) to learn about the land mines of filter-branch, and then vigilantly avoid as many of the hazards listed there as reasonably possible.

    A plethora El Guapo? https://youtu.be/b6E682C7Jj4?t=34 Oh yes, you have a plethora!

    Ok, so let's not do that.

  3. Alternative tools for history rewrite

    • BFG: this is the most popular tool and works well. It does require JRE 8 to work, which is a bad day for most Windows users.

    Note: Mac users will likely prefer BFG

    Unfortunately, BFG and GitRewrite seem more suited for just removing a file from existence, rather than making a simple change to a file and leaving the file. There might be ways to use these tools for that, but I think the way to fix all these commits is going to be a cherry-pick strategy.

Cherry-Pick FTW

This is going to be a tear-down and rebuild type of operation.

First, we must check out the code at the bad commit.

Then we fix the code and amend the commit to remove the bad stuff.

Then we cherry-pick the remaining commits on top of the commit that was just amended.

Finally, we check out main at the last good commit, then rebase or cherry-pick that commit chain in place.

We'll finish by force-pushing the commits to fix it all on main.
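Before touching the real repo, the first three parts of that plan can be rehearsed in a throwaway repo. The sketch below is an illustration only: the branch names `backup-jic` and `fix-refs`, the file names, and the `/wrong/` and `/lib/` paths are assumptions, and the later commits deliberately touch other files so the cherry-pick applies without conflicts.

```shell
#!/bin/sh
set -e

# Throwaway repo (hypothetical): good commit, bad commit, two later commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

echo "<head></head>" > layout.txt
git add -A && git commit -q -m "good: base layout"
echo '<head><link href="/wrong/datatables.css"></head>' > layout.txt
git add -A && git commit -q -m "bad: wrong datatables reference"
bad=$(git rev-parse HEAD)
echo "one" > feature1.txt
git add -A && git commit -q -m "later work 1"
echo "two" > feature2.txt
git add -A && git commit -q -m "later work 2"

# Keep a just-in-case copy of the original history.
git branch backup-jic

# 1. Check out the bad commit on a new branch.
git checkout -q -b fix-refs "$bad"

# 2. Fix the file and amend, which replaces the bad commit with a new one.
echo '<head><link href="/lib/datatables.css"></head>' > layout.txt
git add layout.txt
git commit -q --amend -m "reference datatables css from the correct folder"

# 3. Replay everything after the bad commit on top of the amended commit.
git cherry-pick "$bad"..main >/dev/null

# Verify: no commit on the fix branch still carries the wrong path.
leftover=0
for c in $(git rev-list fix-refs); do
  if git grep -q "/wrong/" "$c" -- layout.txt; then
    leftover=$((leftover + 1))
  fi
done
echo "commits on fix-refs still carrying the bad reference: $leftover"
```

If later commits had edited the same lines as the fix, the cherry-pick would stop and ask for conflict resolution, which this toy setup avoids on purpose.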

Step 1: Get the repo to the first bad commit and fix it

First, let's create a backup branch and have the data safely stored at GitHub so we can blow all the other stuff away and not lose anything if things go horribly wrong. Then we'll get the repo to the first bad commit, and finally, we'll perform the fix and then amend the commit history to make a new commit to replace the bad commit.

  1. First, create a branch to restore just in case and use for cherry-picking

    git checkout -b existing-tree-with-bug-jic

    If you want to be ultra-safe, push it to GitHub

    git push -u origin existing-tree-with-bug-jic

    "making a safe backup branch"

  2. Checkout the bad commit to a new branch

    git checkout main
    git checkout 478877f
    git checkout -b fix-bad-datatables-refs
    git log --oneline

    "the bad commit is the most recent commit on this branch"

    Note: You could have done this with a hard reset to the commit id as well. It's the same goal and end result either way.

  3. Open the code

    Open the project and fix the bad code:

    "replace the references for css and js to the correct folder"

  4. Amend the commit

    To "change" the history from this point on, amend the current commit so that it looks like this was the way the file was created all along (I'm also changing the message to reflect the files are here):

    git add GITBisectAtTheMovies/Views/Shared/_Layout.cshtml
    git commit --amend -m "updated display for movies data and reference datatables css and js files"
    git log --oneline

    Now we have a new commit in the tree but the files are referenced correctly.

    "The commit log is shown"

    Run the project to see that it's working as expected before moving on.

    "Datatables is working as expected"

Step 2: Cherry-Pick the good commits on top of the new commit

The rest of the commits can be cherry-picked into the current branch, and any conflicts will have to be resolved as they come up. Be aware that this replaces the existing history with rewritten commits, so it will look like the code was always correct.

  1. Get the commit history to find important commits

    Assuming you didn't write down commit ids, switch to main and run the git log --oneline command to see the history

    git switch main
    git log --oneline

    Make a note of all the commits above the commit we just changed and the bad commit id, which are (in order of most recent to least recent):

    • 1eb8721
    • 35ec697
    • ec8833d
    • 1c3e223
    • 6d949b8
    • e1e8104
    • 478877f (the bad commit)

    "Getting the history for the record"

  2. Perform the cherry-pick to get the commits onto the new fix commit in the fix branch

    Switch to the fix branch and run the cherry pick command

    git switch fix-bad-datatables-refs
    git cherry-pick 478877f..1eb8721

    "Cherry-picking the remaining commits"

    Note: The bad commit is not included in the cherry-pick! A range written as 478877f..1eb8721 excludes its starting commit, so the first commit picked is e1e8104, even though the command starts with the bad commit. Also note that, once again, the images reflect the top commit being 35ec697 and not 1eb8721.

  3. Run the code to see that it is working correctly before moving on

    The code is working now, and the commit history is completely reworked so that the project looks like it was never incorrect.

    "The filter is set to nolan and 25 movies are possibly shown"

    Push all changes to the remote.

    git push -u origin fix-bad-datatables-refs
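The range behavior from the note in step 2 is easy to check before running a cherry-pick, because `git rev-list` accepts the same `A..B` syntax. A minimal sketch in a throwaway repo follows; the four toy commits and file names are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repo (hypothetical) with four commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

for i in 1 2 3 4; do
  echo "$i" > "file$i.txt"
  git add -A
  git commit -q -m "commit $i"
done

start=$(git rev-parse HEAD~2)   # commit 2 plays the role of the bad commit

# A..B means "reachable from B but not from A", so A itself is left out.
count=$(git rev-list --count "$start"..HEAD)
first_picked=$(git rev-list --reverse "$start"..HEAD | head -n 1)

echo "commits in range: $count"
echo "first commit that would be picked: $(git log -1 --format=%s "$first_picked")"

if git rev-list "$start"..HEAD | grep -q "$start"; then
  includes_start=yes
else
  includes_start=no
fi
echo "range includes its starting commit: $includes_start"
```

The range contains commits 3 and 4 only. Running `git log --oneline A..B` with the real commit ids gives the same preview in a readable form.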

Step 3: Rewrite Main

The next step is to rewrite the main branch so that it sets back to the last good commit and has a common ancestor with our changes. We could then pick the changes on to main or we could do a pull request.

A better solution (since this is main) could be to revert all the changes so that the history gets back to the current common good commit and then pick the good commits to the top or rebase them onto that commit.

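However, what if this is being done to hide a secret from history forever? Revert is bad, because the reverted commits (and the secret in them) stay in the history. Picking on top of the existing history seems bad for the same reason.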

Here is my humble suggestion. You're about to blow main into smithereens. You likely have other developers that have history on main set with the current commits. Also, if they have any feature branches, tell them to check it all in just in case, because those will have to be fixed after this is all said and done.

After getting set, take a minute and reset main back to the good commit, then have everyone on your team hard reset their main branch to that state so that there is a common starting point.

Next, create a pull request and move the changes into main via the PR.

Then have everyone update from the new main. They could then fix their feature branches by picking any new commits into a new feature branch based on the new main.
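For a teammate, moving a feature branch onto the rewritten main could look something like the sketch below. It is a rehearsal in a throwaway repo under several assumptions: the branch names `old-main`, `new-main`, `feature`, and `feature-v2` are hypothetical, `old-main` stands for a saved copy of main from before the rewrite, and the feature commit touches its own file so nothing conflicts.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

# Original history: good commit, bad commit, one later commit.
echo "<head></head>" > layout.txt
git add -A && git commit -q -m "good: base layout"
echo '<head><link href="/wrong/datatables.css"></head>' > layout.txt
git add -A && git commit -q -m "bad: wrong datatables reference"
bad=$(git rev-parse HEAD)
echo "later" > later.txt
git add -A && git commit -q -m "later work"
git branch old-main              # copy of main before the rewrite

# A teammate's feature branch, based on the old history.
git checkout -q -b feature
echo "feature" > feature.txt
git add -A && git commit -q -m "feature work"

# The rewritten history: amended fix plus the replayed later commit.
git checkout -q -b new-main "$bad"
echo '<head><link href="/lib/datatables.css"></head>' > layout.txt
git add layout.txt
git commit -q --amend -m "reference datatables css from the correct folder"
git cherry-pick "$bad"..old-main >/dev/null

# Teammate: new branch from the new main, then pick only the feature commits.
git checkout -q -b feature-v2 new-main
git cherry-pick old-main..feature >/dev/null

# Verify the rebuilt feature branch carries no trace of the wrong path.
leftover=0
for c in $(git rev-list feature-v2); do
  if git grep -q "/wrong/" "$c" -- layout.txt; then
    leftover=$((leftover + 1))
  fi
done
echo "bad commits reachable from feature-v2: $leftover"
```

The range `old-main..feature` selects just the commits the teammate added, so none of the old, bad history comes along.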

  1. Reset main to the good commit

    You've got everyone set, correct? Ok, this is where it really gets dangerous, so have everything backed up if you're worried about making a mistake (that jic branch is still there).

    git switch main
    git reset --hard 3896bf7
    git log --oneline

    The repo history on main should now be:

    "All the commits from the bad commit on are gone"

  2. Force push main to the remote

    Unfortunately, you need to destroy history at the remote to merge a pull request to it.

    NOTE: This is another time to make sure your team is on board with what you are doing. This will destroy all history and cause any branches without a common ancestor to be tricky to fix (not impossible).

    Run the command to force the update on main

    git push --force-with-lease

    Validate history is rewritten at main on remote:

    "History is rewritten".

    You could still just merge the just in case (jic) branch if you are freaked out at this point, and you haven't lost anything. Also, if this is attached to a fork, you can sync with the fork, so you should not be in an unrecoverable state at this point if things are bad.

  3. Create a pull request for the new changes

    With the changes in place and the main reset, create a pull request to update with the new line of code from the fix branch:

    "The new changes are ready, and there are no conflicts"

  4. Merge the pull request

    Merge the pull request and delete the fix branch. If you are certain everything is where you want it, you can also delete the jic branch (or you can do that later if you're still not 100% sure or want to wait for the code to be tested before deleting).

    "The fix is merged"

  5. Pull the changes to your local repo and run the code to validate everything works.

    Pull the updated main down to your local repo.

    "Get the changes locally"

    "It's working, no more bug in history"
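The whole of Step 3 can be rehearsed end to end against a local bare repository standing in for GitHub. This is a sketch under assumptions: the paths, branch names, and file contents are made up, and the pull request merge is approximated with a local fast-forward merge and push, which is not necessarily how your hosting service merges a PR.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# A local bare repo plays the role of the remote (hypothetical stand-in).
git init -q --bare "$work/origin.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$work/origin.git"

echo "<head></head>" > layout.txt
git add -A && git commit -q -m "good: base layout"
good=$(git rev-parse HEAD)
echo '<head><link href="/wrong/datatables.css"></head>' > layout.txt
git add -A && git commit -q -m "bad: wrong datatables reference"
bad=$(git rev-parse HEAD)
echo "later" > later.txt
git add -A && git commit -q -m "later work"
git push -q -u origin main

# Backup branch, then the fix branch built as in Steps 1 and 2.
git branch backup-jic
git checkout -q -b fix-refs "$bad"
echo '<head><link href="/lib/datatables.css"></head>' > layout.txt
git add layout.txt
git commit -q --amend -m "reference datatables css from the correct folder"
git cherry-pick "$bad"..main >/dev/null

# Reset main to the last good commit and overwrite the remote branch.
# --force-with-lease refuses if the remote moved since our last fetch/push.
git checkout -q main
git reset -q --hard "$good"
git push -q --force-with-lease origin main

# Stand-in for merging the pull request: fast-forward main to the fix branch.
git merge -q --ff-only fix-refs
git push -q origin main

remote_tip=$(git ls-remote origin refs/heads/main | cut -f1)
echo "remote main now points at: $remote_tip"
```

After the run, the remote main matches the fix branch, and `backup-jic` still holds the original history locally in case anything needs to be recovered.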

Conclusion

In this walkthrough, we covered Git Bisect and then showed how to utilize the findings to rewrite our history and remove a bug from existence.

Categories: GIT