In this blog post, I cover the use of git bisect and then utilize the findings to remove any trace of the bad code from history.
This is a simple tutorial to show how to use bisect to find the first bad commit where a bug was introduced into your code.
To get things kicked off, go to this repo and fork it: GITBisectAtTheMovies
This section helps you get set up to work through the activity.
Fork the repo
Fork the repo so you have a copy to work with locally. You'll need to fork to keep commit history.
Clone your fork locally
Clone your fork to your local machine.
Optional: Set up calls to the IMDB API.
If you want to make live calls to the IMDB API, then you will need to go to this link and create an account. You can get a free account that lets you do 100 calls a day. Pretty sweet!
Run the application
You'll see there is a small problem: DataTables is not working as expected!
In this next section, you'll learn how to use Git Bisect to find the exact commit where the "bug" was introduced.
.gitignore file so that they wouldn't be unnecessarily added.
To fix this problem, therefore, you just need to find the spot where the layout was changed to add the references, so you can patch that into your project.
Git Bisect can help you find the exact commit.
The first thing you need to do is get git bisect started, and then you enter the known good commit and bad commit.
Open a terminal to the location of your code.
To get started with git bisect, type the command:
git bisect start
If at any point you need to just quit, you can type
git bisect reset
Make sure you started the bisect and continue to the next step.
Get your commit history
Review your commit history by running the command:
git log --oneline
Look over the commits to try to discern where you think a good commit is (no bug is present) and a commit that is somewhere after the bug was introduced.
The narrower you can get this, the better, but bisect is going to utilize a binary search algorithm to try to find the commit that is bad.
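To get a feel for how quickly a binary search narrows things down, here is a rough sketch that estimates the number of test steps for a given number of candidate commits. The function name bisect_steps is mine, not part of git, and the count git itself reports can differ slightly because it works on the actual commit graph.

```shell
#!/bin/sh
# Rough number of bisect steps for N candidate commits: how many times N
# can be halved (rounding up) before a single candidate remains.
# bisect_steps is a hypothetical helper, not a git command.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))    # each test discards about half of the candidates
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 4       # prints 2
bisect_steps 1000    # prints 10
```

In a real repo you could feed it the size of your range, for example the number printed by git rev-list --count followed by your good..bad range.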
For this, you could use the first and last commits. To narrow this down a bit, it's certain that an error with datatables wouldn't exist before they were integrated (the added datatables file commit). Likely the code error is after that.
To be safe, commit 0613ee3 should definitely be good, and commit 1c3e223 should be bad.
Use the good commit
To start the first bisect search, enter the good commit
git bisect good 0613ee3
Git responds with a message:
status: waiting for bad commit, 1 good commit known.
Enter the bad commit
Next, enter the bad commit
git bisect bad 1c3e223
With both a good and a bad commit, git bisect will then create a detached HEAD at the commit that you need to test for good or bad status.
Note: The commit currently up for review is 478877f, as shown in the image. Also note the message `Bisecting: 2 revisions left ... (roughly 2 steps)`.
Review this image with the commit history:
With the binary search in place, the middle commit is in question. We're fortunate we know where to look for the bug. Review the current state of the repo which is checked out to the current commit in a detached state, and see if the files for datatables are improperly referenced in the shared layout:
Since the files are referenced incorrectly, we can confirm this is a bad commit. Additionally, that means the only commits remaining for checking for the first bad commit are earlier in the commit history, so commits 3896bf7 and 6a9688d are all that is left to check. Which one bisect picks will mean one or possibly two more steps.
Confirm the commit is bad
Enter the command to confirm the commit is bad
git bisect bad
Here you can see the next commit checked out was 3896bf7. It's likely this one is still bad, but it may be good.
In a larger repo, you may have to repeat this step-by-step search a few more times.
Review the repository after the next commit is checked out.
With the next commit checked out, look for the bug once again. It's important to make sure you are seeing the file as it currently stands (so make sure you don't have it in some sort of unsaved state or you might not see the changes).
Since no bad references exist, this is a good commit!
Mark the commit as good
Use the following command to set the commit as good
git bisect good
End the operation
This is weird to me, but with the commit found, this is all bisect can do. Furthermore, bisect didn't end itself, even though there are no more commits to check.
End the process by resetting:
git bisect reset
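As an aside, when the check for the bug can be scripted, the good/bad marking can be handed off to git bisect run. The sketch below is not the movies repo: it builds a throwaway repository in a temp folder, and the file names, tags, and the grep check are all made up for illustration.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

# Six commits; the "bug" (a bad reference) first appears in commit 4.
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then
        echo "bad-reference" > layout.txt
    else
        echo "good-reference" > layout.txt
    fi
    echo "change $i" >> notes.txt
    git add -A
    git commit -q -m "commit $i"
    git tag "c$i"
done

# Start the bisect with the bad commit first, then the good commit.
git bisect start c6 c1 >/dev/null

# The check exits 0 for a good commit and 1 for a bad one.
git bisect run sh -c '! grep -q bad-reference layout.txt' >/dev/null

# While the bisect is active, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format=%s "$first_bad"    # prints: commit 4
```

The manual approach in this post is still the right call when you have to eyeball the running app to decide whether a commit is good or bad.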
Great! We found the bad commit. Looking at the commit history, that means that all of these commits have the "bad" reference in them (in order from most recent commit to the first bad commit):
Note: Commit 1eb8721 was added after creation of the blog post. You can ignore it or you can keep it. It's just the change to the readme to contain the blog post.
So how do we "fix" all of these commits without losing code? Do we even need to fix them all?
We don't need to fix them all
Look, this project is small, and there are no versions to support in the past. Clearly, fixing at the last release and just going forward is viable here.
This may not always be the case though. What if you have to fix all the commits and keep things intact?
What if you were looking for the first time someone committed a secret and you needed to rewrite all the commits from that point on to remove the secret so that your security is not compromised?
Discuss some possible fix strategies
A simple fix could be to just check out a branch, reset back to that commit, fix the stuff, then apply all the other stuff we've done on top of it again with some cherry-pick and merge conflict resolution. Again, that might not be easy if you have hundreds of commits and you have a bug introduced early in the commit history.
Another fix could be to use the command git filter-branch which allows you to rewrite all your history by mentioning branches to rewrite in the rev-list history. However, check out this documentation directly from https://git-scm.com/docs/git-filter-branch
WARNING git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended. Please use an alternative history filtering tool such as git filter-repo. If you still need to use git filter-branch, please carefully read SAFETY (and PERFORMANCE) to learn about the land mines of filter-branch, and then vigilantly avoid as many of the hazards listed there as reasonably possible.
A plethora El Guapo? https://youtu.be/b6E682C7Jj4?t=34 Oh yes, you have a plethora!
Ok, so let's not do that.
Alternative tools for history rewrite
Note: Mac users will likely prefer BFG
Unfortunately, BFG and GitRewrite seem more suited for just removing a file from existence, rather than making a simple change to a file and leaving the file. There might be ways to use these tools for that, but I think the way to fix all these commits is going to be a cherry-pick strategy.
This is going to be a tear-down and rebuild type of operation.
First, we must check out the code at the bad commit
Then we fix the code and amend the commit to remove the bad stuff
Then we cherry-pick the remaining commits on top of the commit that was just amended
Finally, we check out main at the last good commit, then rebase or cherry-pick that commit chain in place
We'll finish by force-pushing the commits to fix it all on main
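Before doing this on the real repo, it can help to see the whole plan run end to end somewhere harmless. The sketch below rehearses the steps in a throwaway repository in a temp folder. The file names, commit messages, tags, and branch names are all invented for the demo, and the last step uses a local fast-forward merge where this post uses a pull request.

```shell
#!/bin/sh
set -eu
# Throwaway repository; every name below is hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# make_commit <file> <content> <message> <tag>
make_commit() {
    printf '%b\n' "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
    git tag "$4"
}

make_commit layout.txt 'header'                     'add layout'     last-good
make_commit layout.txt 'header\nref: wrong/path.js' 'add references' first-bad
make_commit movies.txt 'movie list'                 'add movies'     later-1
make_commit readme.txt 'docs'                       'add readme'     later-2

# Safety net: keep the original history reachable.
git branch backup-jic main

# 1. Check out the first bad commit on a new branch.
git checkout -q -b fix-branch first-bad

# 2. Fix the file and amend, so the commit looks like it was always right.
printf 'header\nref: lib/path.js\n' > layout.txt
git add layout.txt
git commit -q --amend -m 'add references (corrected)'

# 3. Replay everything that came after the bad commit.
git cherry-pick first-bad..main >/dev/null

# 4. Rewind main to the last good commit, then fast-forward to the fixed chain.
git checkout -q main
git reset -q --hard last-good
git merge -q --ff-only fix-branch

# Search each history for commits that added or removed the bad reference.
git log --oneline -S'wrong/path' main          # prints nothing
git log --oneline -S'wrong/path' backup-jic    # lists the original bad commit
```

The later commits in this demo touch different files than the fix, so the cherry-pick applies cleanly. In a real history, expect to resolve conflicts wherever a later commit changed the same lines.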
First, let's create a backup branch and have the data safely stored at GitHub so we can blow all the other stuff away and not lose anything if things go horribly wrong. Then we'll get the repo to the first bad commit, and finally, we'll perform the fix and then amend the commit history to make a new commit to replace the bad commit.
First, create a branch to restore just in case and use for cherry-picking
git checkout -b existing-tree-with-bug-jic
If you want to be ultra-safe, push it to GitHub
git push -u origin existing-tree-with-bug-jic
Checkout the bad commit to a new branch
git checkout main
git checkout 478877f
git checkout -b fix-bad-datatables-refs
git log --oneline
Note: You could have done this with a hard reset to the commit id as well. It's the same goal and end result either way.
Open the code
Open the project and fix the bad code:
Amend the commit
To "change" the history from this point on, amend the current commit so that it looks like this was the way the file was created all along (I'm also changing the message to reflect the files are here):
git add GITBisectAtTheMovies/Views/Shared/_Layout.cshtml
git commit --amend -m "updated display for movies data and reference datatables css and js files"
git log --oneline
Now we have a new commit in the tree but the files are referenced correctly.
Run the project to see that it's working as expected before moving on.
The rest of the commits can be cherry-picked into the current branch, and any conflicts will have to be resolved along the way. However, this will destroy all the history and rewrite it so that it will look like the code was always correct.
Get the commit history to find important commits
Assuming you didn't write down commit ids, switch to main and run the git log --oneline command to see the history
git switch main
git log --oneline
Make a note of all the commits above the commit we just changed and the bad commit id, which are (in order of most recent to least recent):
Perform the cherry-pick to get the commits onto the new fix commit in the fix branch
Switch to the fix branch and run the cherry pick command
git switch fix-bad-datatables-refs
git cherry-pick 478877f..1eb8721
Note: The bad commit is not included in the cherry-pick! A range written as A..B excludes A, so the first commit we want, e1e8104, is the first commit picked, even though the command starts with the bad commit. Also note that once again the images are reflective of the top commit being 35ec697 and not 1eb8721.
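If you want to convince yourself of how the two-dot range behaves before running a cherry-pick, you can list the range first. Here is a small sketch in a throwaway repository; the tags base, A, B, C, and D are hypothetical stand-ins for real commit ids.

```shell
#!/bin/sh
set -eu
# Throwaway repository; the tags base, A, B, C, D are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

for name in base A B C D; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "add $name"
    git tag "$name"
done

# A..D means "reachable from D but not from A", so A itself is left out.
# --reverse lists oldest first, the order in which cherry-pick applies them.
git log --reverse --format=%s A..D     # prints: add B, add C, add D (one per line)

# To include A as well, start the range at A's parent.
git log --reverse --format=%s A^..D    # prints: add A, add B, add C, add D
```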
Run the code to see that it is working correctly before moving on.
The code is working now and the commit history is completely reworked so that the project looks like it was never incorrect.
Push all changes to the remote.
git push -u origin fix-bad-datatables-refs
The next step is to rewrite the main branch so that it sets back to the last good commit and has a common ancestor with our changes. We could then pick the changes on to main or we could do a pull request.
A better solution (since this is main) could be to revert all the changes so that the history gets back to the current common good commit and then pick the good commits to the top or rebase them onto that commit.
However, what if this is being done to hide a secret from history forever? Revert is bad. Picking and picking seems bad.
Here is my humble suggestion. You're about to blow main into smithereens. You likely have other developers that have history on main set with the current commits. Also, if they have any feature branches, tell them to check it all in just in case, because those will have to be fixed after this is all said and done.
After getting set, take a minute and reset main back to the good commit, then have everyone on your team hard reset their main branch to that state so that there is a common starting point.
Next, create a pull request and move the changes into main via the PR
Then have everyone update from the new main. They could then fix their feature branches by picking any new commits into a new feature branch based on the new main.
Reset main to the good commit
You've got everyone set, correct? Ok, this is where it really gets dangerous, so have everything backed up if you're worried about making a mistake (that jic branch is still there).
git switch main
git reset --hard 3896bf7
git log --oneline
The repo history on main should now be:
Force push main to the remote
Unfortunately, you need to destroy history at the remote to merge a pull request to it.
NOTE: This is again another time to make sure your team is on board with what you are doing. This will destroy all history and cause any branches without a common ancestor to be tricky to fix (not impossible).
Run the command to force the update on main
git push --force-with-lease
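If you'd like to rehearse the rewind and force push before touching GitHub, the sketch below does the same thing against a local bare repository standing in for the remote. The temp paths and commit messages are made up for the demo. The lease part of --force-with-lease means the push only overwrites the remote branch if it still matches what your local origin/main last saw.

```shell
#!/bin/sh
set -eu
# Throwaway "remote" and local clone; all paths and names are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"

for i in 1 2 3; do
    echo "$i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
git push -q -u origin main

# Rewind main by one commit, like the hard reset to the good commit.
git reset -q --hard HEAD~1

# A plain push would be rejected as a non-fast-forward update.
# --force-with-lease overwrites the remote branch only if it still
# matches the local remote-tracking ref origin/main.
git push -q --force-with-lease origin main

remote_head=$(git --git-dir="$work/remote.git" rev-parse refs/heads/main)
local_head=$(git rev-parse HEAD)
[ "$remote_head" = "$local_head" ] && echo "remote main was rewound"
```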
Validate history is rewritten at main on remote:
You could still just merge the just in case (jic) branch if you are freaked out at this point, and you haven't lost anything. Also, if this is attached to a fork, you can sync with the fork, so you should not be in an unrecoverable state at this point if things are bad.
Create a pull request for the new changes
With the changes in place and the main reset, create a pull request to update with the new line of code from the fix branch:
Merge the pull request
Merge the pull request and delete the fix branch. If you are certain everything is where you want, you can also delete the jic branch (or you can do that later if you're still not 100% sure or want to wait for the code to be tested before deleting).
Pull the changes to your local repo and run the code to validate everything works.
Get the changes locally
In this walkthrough, we covered Git Bisect and then showed how to utilize the findings to rewrite our history and remove a bug from existence.