Monday, May 6, 2013

How to bridge git to clearcase

http://stackoverflow.com/questions/2342131/how-to-bridge-git-to-clearcase

I've recently used git svn and enjoyed it very much. Now I'm starting a new project at a different customer. At that site the SCM of choice is ClearCase. I haven't found a baked equivalent of git svn for ClearCase. Is there anybody who has tried to use git locally as a front-end to ClearCase using some tricks, configuration or scripting with any measure of success? If so, can you please explain the method used?

Answers:
Here's a method that avoids hijacks. Our team used it quite successfully for over a year, until we retired ClearCase for Subversion (per company policy, although it is a backwards step for our team: we were basically just using ClearCase as a dumb file system and working virtually natively in git, but now we're using the git-svn bridge, which isn't as nice as native git).
We used two directories, one for the ClearCase snapshot and master git repo, which we shared among the whole team and never edited files in, and one for our "working" directory.
The preparation in the ClearCase snapshot view is:
% git init
% git add **/*.cxx **/*.h **/Makefile (and so on)
% git commit -m "initial"
Then clone in your working directory:
% mkdir ~/work
% git clone /path/to/repo
Work in the working directory, on a branch:
% git checkout -b feature
% ...edit/compile...
% git add -u
% git commit
Make sure the ClearCase snapshot is up-to-date with pristine (which it always was for us, because we shared it among the team, and we all used git).
Then merge the branch onto the master by rebasing it, to avoid an automatic merge commit:
% git checkout master
% git pull
% git checkout feature
% git rebase master
% git checkout master
% git merge feature
% git branch -d feature

Prepare the ClearCase view by checking out/mkelem/rmname any changed/new/removed files, based off the output of git diff --name-status. We used a hand-rolled script to do this. Don't forget to check out any directories that have added/removed files:
% git diff --name-status origin/master
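The hand-rolled script is not shown in the answer. As a rough sketch of what such a script might do, the function below turns git diff --name-status output into a list of cleartool commands to review. The function name plan_clearcase_ops is hypothetical, it only prints commands rather than running them, and it assumes plain M/A/D status lines; renames and other statuses are flagged instead of handled.

```shell
# Hypothetical helper, not the script the answer's author used.
# Reads `git diff --name-status` lines (STATUS<TAB>PATH) on stdin and prints
# the cleartool commands one would review before running; it executes nothing.
plan_clearcase_ops() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r status path; do
    case "$status" in
      M)  # modified file: check out the element itself
          printf 'cleartool co -nc %s\n' "$path" ;;
      A)  # new file: check out the parent directory, then create the element
          printf 'cleartool co -nc %s\n' "$(dirname "$path")"
          printf 'cleartool mkelem -nc %s\n' "$path" ;;
      D)  # removed file: check out the parent directory, then remove the name
          printf 'cleartool co -nc %s\n' "$(dirname "$path")"
          printf 'cleartool rmname -nc %s\n' "$path" ;;
      *)  # renames, copies, etc. are left for manual handling in this sketch
          printf '# unhandled status %s for %s\n' "$status" "$path" ;;
    esac
  done
}
```

It could be fed with git diff --name-status origin/master | plan_clearcase_ops. A real script would also need to avoid checking out the same directory twice and to cope with paths containing spaces.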
Then push the git stuff back, and check in with ClearCase:
% git push
% cd /path/to/repo
% git reset --hard
% cleartool ci `cleartool lsco -r -short -me`
It seems like a lot of commands, but this includes setup, and your daily workflow doesn't use many commands. You can trivially build a script around the push-back-to-ClearCase step, and discover/show your team all the cool extra git stuff gradually as everyone gets used to the basic workflow.
The real beauty of this system is, after a while when everyone's competent with git, you can trivially ditch ClearCase and all the associated individual monkey work and fees. Maybe give the company's ClearCase guy a much needed holiday and some retraining with the savings. (Sadly at my company the git stuff was all skunkworks, and we've moved to Subversion - forwards from ClearCase but backwards from git!)
I strongly recommend you use the pristine script from ClearCase Globally, Git Locally, which runs in the ClearCase snapshot view and ensures it and git are in sync. We set this up as a cron job that ran twice daily, and also ran it manually whenever we were about to push back to git. Unfortunately the link to the blog post is no longer valid. However, the script is still available on GitHub.

Another advantage to this system is some team members can still keep using ClearCase if they want. It is a bit more fiddly while that's happening, as the git users will need to keep things in sync when a non-git user checks in. Eventually the hold-outs will see the advantages the git users have though, and this problem will disappear! – Matt Curtis Feb 26 '10 at 22:37
Also, our push-back-to-ClearCase script generated a comment for ClearCase from the git log of changes since origin/master, which it used with "cleartool co -c" and so on, so our cleartool commit didn't need a comment at all! – Matt Curtis Feb 26 '10 at 22:40
Interesting technique, +1. As the "ClearCase guy" in my shop, I could use the holidays;) But I am also the SVN guy. And Git Guy. And a bit Perforce and CM Synergy guy... No vacations for me, then. – VonC Feb 26 '10 at 23:52
Part of it was accidental, because (PLEASE don't ask) we all worked in a shared CC static view. Ugh! Initially, our git solution was just a way of managing our own private working directories, but as a whole it worked pretty well because the shared directory gave us a natural place for our "master" git repo, and also a single central place to make sure git was in sync with CC. – Matt Curtis Feb 27 '10 at 0:14




While it may not be without a few warts (you have been warned), I feel I should mention I have written a bridge of sorts.
Bridging between the two systems isn't easy, and I wish my solution were half as good as git-svn. A big limitation is that you're really confined to mirroring a single stream; git-cc can't clone all your ClearCase branches (as nice as that would be). However, given that most of the alternative scripts revolve around a single ClearCase view, you are no worse off (IMO).
Personally I find history quite important and what other solutions lack is their importing of history into Git. Being able to run git-blame on files and see their real authors is quite useful from time-to-time.
If nothing else git-cc can handle the aforementioned 'git log --name-status' step in Matt's solution above.
I'm certainly curious to hear what VonC and Matt (and others) think of this, as I agree that any bridge to Clearcase is fraught with difficulties and may be more trouble than it's worth.

The one process I usually follow is:
  • snapshot cd within a ClearCase view/vobs/myComponent
  • git init .
That allows me to consider a ClearCase component as a Git repo.
I can then do all the branching and "private" commits I want within that repo, making files writable as I need them (possible within a snapshot view).
Once I have a stable final commit, I update my snapshot view, which lists all the "hijacked" files: I check them out and check them back in to ClearCase.
Considering the Git limits, a repo per ClearCase (UCM) component is the right size for a Git repo.
See also What are the basic clearcase concepts every developer should know? for a comparison between ClearCase and Git.
The idea remains:
  • no git-cc
  • no need to import all the history of ClearCase (which has no notion of repository baseline, unlike the SVN revisions)
  • creation of a Git repo within a ClearCase view for intermediate commits
  • final Git commit mirrored in the ClearCase view through a checkin of all modified files.








Subversion to Git


1. Subversion to Git
http://john.albin.net/git/convert-subversion-to-git


complete guide to git-svn conversions

Our goal is to do a complete conversion of our Subversion repository and end up with a bare Git repository acceptable for sharing with others (privately or publicly). Bare repositories are ones without a local working checkout of the files available for modifications. They are the recommended format for shared repositories.

1. Retrieve a list of all Subversion committers

Subversion simply lists the username for each commit. Git’s commits have much richer data, but at its simplest, the commit author needs to have a name and email listed. By default the git-svn tool will just list the SVN username in both the author and email fields. But with a little bit of work, you can create a list of all SVN users and what their corresponding Git name and emails are. This list can be used by git-svn to transform plain svn usernames into proper Git committers.
From the root of your local Subversion checkout, run this command:
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
That will grab all the log messages, pluck out the usernames, eliminate any duplicate usernames, sort the usernames and place them into an “authors-transform.txt” file. Now edit each line in the file. For example, convert:
jwilkins = jwilkins <jwilkins>
into this:
jwilkins = John Albin Wilkins <johnalbin@example.com>
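With many committers it is easy to miss a line while editing. The following is a small sketch, not part of the original guide, that lists usernames whose line is still in the untouched "user = user <user>" form. The function name unedited_authors is hypothetical.

```shell
# Hypothetical check: reads an authors file on stdin and prints every
# username whose right-hand side is still the generated placeholder,
# i.e. "name = name <name>".
unedited_authors() {
  awk -F ' = ' '$2 == ($1 " <" $1 ">") { print $1 }'
}
```

For example, unedited_authors < authors-transform.txt should print nothing once every line has a real name and email.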

2. Clone the Subversion repository using git-svn

git svn clone [SVN repo URL] --no-metadata -A authors-transform.txt --stdlayout ~/temp
This will do the standard git-svn transformation (using the authors-transform.txt file you created in step 1) and place the git repository in the “~/temp” folder inside your home directory.

3. Convert svn:ignore properties to .gitignore

If your svn repo was using svn:ignore properties, you can easily convert this to a .gitignore file using:
cd ~/temp
git svn show-ignore > .gitignore
git add .gitignore
git commit -m 'Convert svn:ignore properties to .gitignore.'

4. Push repository to a bare git repository

First, create a bare repository and make its default branch match svn’s “trunk” branch name.
git init --bare ~/new-bare.git
cd ~/new-bare.git
git symbolic-ref HEAD refs/heads/trunk
Then push the temp repository to the new bare repository.
cd ~/temp
git remote add bare ~/new-bare.git
git config remote.bare.push 'refs/remotes/*:refs/heads/*'
git push bare
You can now safely delete the ~/temp repository.
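To make the effect of the 'refs/remotes/*:refs/heads/*' push setting more concrete, the sketch below mimics the name mapping with sed. It is only an illustration of the pattern, not how git implements refspecs, and the function name map_remote_refs is hypothetical.

```shell
# Illustration only: rewrites ref names the way the push refspec
# 'refs/remotes/*:refs/heads/*' maps them, one ref name per input line.
map_remote_refs() {
  sed 's|^refs/remotes/|refs/heads/|'
}
```

Feeding it refs/remotes/trunk and refs/remotes/tags/v1.0 yields refs/heads/trunk and refs/heads/tags/v1.0, which is why the bare repository ends up with a "trunk" branch and with Subversion tags stored as branches under "tags/".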

5. Rename “trunk” branch to “master”

Your main development branch will be named “trunk” which matches the name it was in Subversion. You’ll want to rename it to Git’s standard “master” branch using:
cd ~/new-bare.git
git branch -m trunk master

6. Clean up branches and tags

git-svn makes all of Subversion’s tags into very short branches in Git of the form “tags/name”. You’ll want to convert all those branches into actual Git tags using:
cd ~/new-bare.git
git for-each-ref --format='%(refname)' refs/heads/tags |
cut -d / -f 4 |
while read ref
do
  git tag "$ref" "refs/heads/tags/$ref";
  git branch -D "tags/$ref";
done
This step will take a bit of typing. :-) But don’t worry; your Unix shell will provide a “>” secondary prompt for the extra-long command that starts with git for-each-ref.
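One caveat with the loop above: cut -d / -f 4 keeps only the fourth slash-separated field, so a tag whose name itself contains a slash (say, refs/heads/tags/release/1.0) would be shortened to "release". A possible alternative, shown here as a sketch with the hypothetical name tag_names_from_refs, strips the fixed prefix instead.

```shell
# Sketch: reads full ref names on stdin and prints the tag name by removing
# the "refs/heads/tags/" prefix, so names containing "/" stay intact.
tag_names_from_refs() {
  sed 's|^refs/heads/tags/||'
}
```

It would replace the cut line in the pipeline, for example: git for-each-ref --format='%(refname)' refs/heads/tags | tag_names_from_refs | while read ref; do ...; done.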

7. Drink

If you’ve got just the one Subversion repo to convert…Congratulations! You’re done. Go party. Just take your “new-bare.git” folder and share it.
If, on the other hand, you’ve got a bunch of Subversion repositories to convert, you’ve got a long, long night in front of you if you want to convert them all by hand. You’re going to need a drink (or several).
Since I had 141 svn repositories that needed to be converted, I wrote a set of wrapper scripts to ease the work… which I’ll discuss in my next blog post.

Comments

Pro Tip for Windows users: Having been through this recently myself, don't bother with git-svn on Windows, instead get yourself a Linux VM and VMware Player and do your conversion on that. The scraping from Subversion ran about 10 times faster for me than running it "natively" on Windows and I had none of the quirks that I was finding with git-svn on Windows.
Hi John, I also didn't find the existing guides totally satisfactory, so I wrote my own, here: http://ao2.it/wiki/How_to_migrate_an_SVN_repository_to_Git
As you can see my use case involved gitosis for the repositories administration, and I had an unusual layout to convert too, but the base is the same after all.
The svn:ignore bits are interesting, I think I'll add that to my wiki, if you don't anticipate me :)
Regards,
Antonio
Switch and Drop Legacy? Or, you could do as I do and drop your past SVN history.
Not the best solution, of course, but you can keep SVN running somewhere if you need to go back in time. However, I picked a good point where development was at a slowdown, scrapped SVN, and set everything up in a fresh Git repository :)
Oops. I just noticed the awk command in step 1 is slightly off. If you have a space character in your SVN username (for example, "(no author)"), it will only include the part of the username before the space. This is the proper awk command:

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt
I’ve corrected the article above.
Thanks for your post, it helped a lot in understanding what is going on and what to do.
Can anyone explain what --no-metadata does exactly? I've read that it excludes some info during transfers, but did not find what exactly.
Sometimes it helps to read the documentation. It is not recommended to use --no-metadata, even for one-way imports:
This gets rid of the git-svn-id: lines at the end of every commit.
This option can only be used for one-shot imports as git svn will not be able to fetch again without metadata. Additionally, if you lose your .git/svn/**/.rev_map.* files, git svn will not be able to rebuild them.
The git svn log command will not work on repositories using this, either. Using this conflicts with the useSvmProps option for (hopefully) obvious reasons.
This option is NOT recommended as it makes it difficult to track down old references to SVN revision numbers in existing documentation, bug reports and archives. If you plan to eventually migrate from SVN to git and are certain about dropping SVN history, consider git-filter-branch(1) instead. filter-branch also allows reformating of metadata for ease-of-reading and rewriting authorship info for non-"svn.authorsFile" users."
@Balu
You need to re-read my blog post and the git-svn docs carefully, because you've somehow misread them.
The docs about --no-metadata you quoted directly say “This option can only be used for one-shot imports”. [emphasis mine] One shot imports are precisely the point of this blog post. I fully expect you to toss the svn repo in the bin after doing this conversion to git.
This option is NOT recommended as it makes it difficult to track down old references to SVN revision numbers in existing documentation, bug reports and archives.
That is actually a good point. But svn commit numbers are not something that I personally needed to preserve. For those of you who do need to retain svn commit numbers, I recommend following Balu’s advice.
hi, thanks for your guide. You may want to add, that it works from git 1.7, not from git 1.6, where, e.g., "git init" has no directory argument yet.
@Valery: Good point about Git 1.6. But I consider any version of Git below 1.7 barbaric. ;-)
Wouldn't it be better to actually tag the second to last commit in the tag: "refs/heads/tags/$ref"^ ? Otherwise, we're tagging the commit that says "Tagging for X version". The difference being that in Git the tags themselves are not commits and we'd then be able to see the tags in tools like GitX when looking at the branch history from where the tag was made.
Otherwise, we're tagging the commit that says "Tagging for X version".
Unfortunately, because of the way Subversion works, you can make changes while making a tag. So an svn tag may contain a changeset in addition to the tag name. :-p
That's why the underlying git-svn command makes a git tag where it does.
Hey John.
I converted 3 svn repos to git last night, to join the rest of the repos. I noticed that in step 3, I created the .gitignore file, however when I push to the -bare.git in step 4, this commit isn't pushed as well. You have any ideas what might be up?
(What I ended up doing, in lieu of drinking, was pulling from the bare, committing the .gitignore, pushing back to -bare.git, then push --mirror to the central server.)
@Terin
Step 3 works fine. Did you miss the command to add and commit the .gitignore file to the ~/temp repository?
I got this error when trying to create the .gitignore file.
$ git svn show-ignore > .gitignore
config --get svn-remote.svn.fetch :refs/remotes/git-svn$: command returned error: 1
When I added -i trunk, it worked fine. Maybe this can help others if they have the same problem.
Lars
Thanks Lars D! I had the same error and your addition fixed it. The command that worked for me was:
git svn show-ignore -i trunk > .gitignore
Awesome, thanks John!
Thanks so much for this very useful article. I used it to successfully convert two large repos from Subversion to Git.
I ended up leaving off the --no-metadata option, as we have references in our BTS and other systems to the Subversion revisions.
Also, I added the --shared option to init the bare repo, as this sets up the correct permissions for a repo shared within a group.
Again, thanks!
Please, note that the "while-do" cycle in step 6, as it is currently written, will only work if you're using a bash shell; you'll get a syntax error otherwise.
svn2git probably works great for most repos, but it didn't for me. When it finished I noticed I was missing several recent commits (and the changes from them!). I didn't investigate to see how deeply the problem went, I just used John's git-svn-migrate, which worked like a charm.
I see in your git-svn-migrate.sh script you have added another line that pushes the .gitignore commit. I had the same problem as Terin until I found that I had to do this after the git push bare command
git push bare master:trunk
Hey,
Could you perhaps explain why you need the bare repo step ? I found another article which did the same, and they used yours as a reference … what difference does it make, if I just add a remote to the "temporary" repo after conversion and cleanup ? Why would I need an intermediary one ?
Thanks !
Hi Greg!
Basically, the issue is that git-svn creates a lot of overhead in order to maintain the "svn-ness" of the repository. By pushing just the “refs/remotes/*:refs/heads/*” references to a bare repository, you end up purging all of the svn remnants and having a cleaner repository.
Hey thanks John.
Right, I hadn't realized you were doing a "selective" push there. I am using git-svn-abandon for cleaning up, I assume it does something similar, if not the exact same :)
Oh and one more thing; the svn:ignore > .gitignore conversion - am I missing something (again) or shouldn't you be doing this on all (or some) of your branches instead just master ?
Thanks for these great instructions, they worked for me when migrating from a beanstalk svn repo to a bitbucket git repo.
I followed your instructions and got the following error with "git push bare":
" No refs in common and none specified; doing nothing.
Perhaps you should specify a branch such as 'master'.
fatal: The remote end hung up unexpectedly
error: failed to push some refs to '/Users/bodirsky/new-bare.git' "
Any ideas?
Thank you for this article, it saved me a couple hours of my life.
Thanks John for this great tutorial. It was a great help, and I only had to make some minor changes to this workflow, e.g. a new latest-svn tag.
But also I have problems with the .gitignore file which does not go into the repo (or into the correct branch).
Here you see what I did and that everything completed without error, but in the final ls there is just no .gitignore: http://pastebin.com/VhthD4VN
This is really AWESOME! Works like a charm!
Thanks a lot for the great tutorial.
FYI, on Windows this command does not work:
git config remote.bare.push 'refs/remotes/*:refs/heads/*'
It works if you remove the single quotes:
git config remote.bare.push refs/remotes/*:refs/heads/*
Would be fantastic if there were Windows methods of getting the username mappings and cleaning up branches and tags. Otherwise, thanks for the writeup!
Hi John,
Great article, it was a massive help when converting my old svn repos into git. It ran without any modifications on Cygwin. I just thought I'd mention that I got the following error when converting one of my svn repo to git:
'fatal: refs/remotes/trunk: not a valid SHA1'.
It happened in step 2. The problem turned out to be the fact that my SVN repo wasn't standard, as the root directory wasn't trunk but had a custom name. This confused git and it didn't know where the master branch needed to be. I fixed this problem by passing in the parameter --trunk=<directory> in step 2, where <directory> is the directory that is your primary branch.
Hope that helps someone!
I did the tag conversion like so:
git branch -ar | grep '^ tags/' | sed -r 's|^\s*tags/(\S*)\s*$|git checkout tags/\1 \&\& git tag \1|' | sh
git-svn also found some branches, which I took care of here:
sed -r 's|^\s*tags/(\S*)\s*$|git checkout tags/\1 \&\& git tag \1|'
You could also add something like this to the end of your migrate script to push repos to github, changing REPO to the placeholder for the current name:
#curl -u 'USER:PASS' https://api.github.com/user/repos -d '{"name":"REPO"}'
#git remote add origin git@github.com:USER/REPO.git
#git push origin master
#git push --all
#git push --tags


----------------------------------------------------------

A complete idiot’s guide to git-svn-migrate

3 steps to batch convert Subversion to Git

If you read my previous post about converting Subversion repositories to git, you’ll know that to do a proper Subversion-to-Git transformation on a batch of repositories is going to take some time (what with all that command line typing). I had 142 legacy project Subversion repositories lying around I wanted converted to Git and, since I’m lazy, I pulled on my bash boots and wrote me a script to do the work!
With the git-svn-migrate scripts I wrote, you can batch convert all of your Subversion repositories in just 3 steps. And I’ve GPLed them and put them on GitHub if you’d like to collaborate and improve them; see the git-svn-migrate project page.
svn boxes go into the factory; git ponies come out.
git-svn-migrate: a reverse glue factory

0. Download the git-svn-migrate scripts

This isn’t really one of the 3 steps, but obviously you need the scripts. You can either download the latest official release from GitHub or you can get the most recent development release by cloning the repository:
git clone git://github.com/JohnAlbin/git-svn-migrate.git

1. Create a list of Subversion repositories to convert

Create a file called “repository-list.txt” with one Subversion URL per line:
svn+ssh://example.org/svn/awesomeProject
file:///svn/secretProject
https://example.com/svn/evilProject
With this format the name of the project is assumed to be the last part of the URL. So these repositories would be converted into awesomeProject.git, secretProject.git and evilProject.git, respectively.
If the project name of your repository is not the last part of the URL, or you wish to have more control over the final name of the Git repository, you can specify the repository list in tab-delimited format with the first field being the name to give the Git repository and the second field being the URL of the Subversion repository:
awesomeProject    svn+ssh://example.org/svn/awesomeProject/repo
evilproject     file:///svn/evilProject
notthedroidsyourlookingfor  https://example.com/svn/secretProject
With this format you can use any name for the final Git repo. In the first example above, we’re using the second-to-last part of the URL instead of the last part of the URL. In the second example, we’re just changing the name to all lowercase (recommended). And in the final example, move along. Move along.
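The two list formats can be summarised in a few lines of awk. This is only a sketch of the naming rule described above, not the parser that git-svn-migrate actually uses; the function name parse_repo_list is hypothetical, and it assumes the two-field form is separated by a real tab character.

```shell
# Sketch of the naming rule: reads a repository list on stdin and prints
# "<name>.git <url>" for each entry. With one field the name is the last
# path component of the URL; with two tab-separated fields the first field
# is the name and the second is the URL.
parse_repo_list() {
  awk -F '\t' '
    NF == 0 { next }                      # skip blank lines
    {
      name = ""; url = $1
      if (NF >= 2) { name = $1; url = $2 }
      sub(/\/+$/, "", url)                # ignore trailing slashes
      if (name == "") { n = split(url, parts, "/"); name = parts[n] }
      print name ".git", url
    }'
}
```

Running parse_repo_list < repository-list.txt before the real conversion gives a quick preview of the Git repository names that the list implies.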

2. Create a list of transformations for Subversion usernames to Git committers

Using the repository list created in step 1, run the fetch-svn-author.sh script to create a list of unique usernames for all the commits in your repositories. The output of the script should be redirected to a file.
./fetch-svn-author.sh --url-file=repository-list.txt > author-transform.txt
Edit the raw list of Subversion usernames to provide full names and emails suitable for Git committers. The output of the fetch-svn-author.sh script will be of the form:
username = username <username>
You should edit each line to be:
username = Full name <email>
For example, change:
jwilkins = jwilkins <jwilkins>
into:
jwilkins = John Albin Wilkins <john@example.org>

3. Convert the Subversion repositories into bare Git repositories

This is the easiest step. To place all of your new bare Git repositories in /var/git, simply run:
./git-svn-migrate.sh --url-file=repository-list.txt --authors-file=author-transform.txt /var/git
This may take a while. (My 142 repos took about 6 hours to convert.) But you’ll see the progress as the underlying git-svn pulls commits out of all of your Subversion repositories.
Enjoy!

Comments

I just started working with git. Though it is pretty safe against corruption and all, it is somewhat hard to follow. Git svn migration is painful when you are not thorough with git in the first place. But your three step conversion is totally idiot proof. A life saver it is. Even though it took me quite a few days to figure it out completely, in the end it was worth it. And hey, where can I find an idiot proof tutorial to git?! Mark
This graphic is fantastic!
I can't thank you enough for these scripts. After several unsuccessful tries w/ svn2git (which lost commits!), git-svn-migrate did the job of converting the Minify SVN repo nicely.
I had problems using this with an HTTPS-based SVN repo, as git-svn doesn't support a non-interactive password prompt. Problem was solved by making a small script which simply does: echo 'password', and running as follows:
GIT_ASKPASS=$PWD/askpass ./git-svn-migrate.sh --url-file=repolist.txt --authors-file=userlist.txt /var/git
Downside is it means (at least temporarily) storing your password on disk, but you can always shred the file later.
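A variation on that askpass idea, sketched here as an assumption rather than the commenter's actual script, is to have the helper print a password taken from an environment variable, so the password itself is not written into the file. The function name make_askpass and the variable name SVN_PASSWORD are hypothetical.

```shell
# Hypothetical helper: writes an executable "askpass" script into the given
# directory. The generated script prints the value of SVN_PASSWORD (an
# assumed variable name) instead of containing a literal password.
make_askpass() {
  dir="$1"
  cat > "$dir/askpass" <<'EOF'
#!/bin/sh
# Print the password supplied through the caller's environment.
printf '%s\n' "$SVN_PASSWORD"
EOF
  chmod 700 "$dir/askpass"
}
```

After make_askpass "$PWD", the migration would be started with SVN_PASSWORD exported in the environment and GIT_ASKPASS=$PWD/askpass set as in the command above. The password then lives in the shell session rather than on disk, which has its own trade-offs (shell history, process environment).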
I ran into a problem where I got the following output:
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions
It happened when git-svn-migrate.sh issued 'git svn show-ignore --id trunk >> .gitignore;' command.
I don't know svn at all, and it took me some time to figure out that my repo url was wrong... I got that url from pjproject (http://www.pjsip.org/download.htm). Removing the /trunk prefix made my day. If anyone got into similar issue, check your svn repo url.
Thanks for those scripts, very useful :-).
Does this method convert the entire history or just the latest revision?
Of course! The entire history and any branches and tags you had in Subversion. :-)
John, your scripts literally saved me whole week of svn to git migration for all my svn repositories.
Thank you a lot! No issues encountered during migration!
Small hint: in order to use "git svn" migration your svn code should at least contain 'trunk'. I had a repo with random files in its root, w/o any branches or trunk, so before migration I had to move it all under a 'trunk' subfolder.
Awesome...worked beautifully. Thanks soooo much for doing the work to put these together, and of course for making it available to everyone. Note that in your instructions on this page, in step 2 there is a small typo: the command should be "./fetch-svn-authors.sh", and not "./fetch-svn-author.sh" - author needs to be plural. Easy enough to fix, but I thought you might want to know.
Worked great! Can I make two suggestions? 1) add signal catching for ctrl-c stop execution and 2) allow for username and password to be set at script execution. Thanks for your work here! Saved me a ton of time!