---
title: "Git version control for collaboration"
format: gfm
editor: source
date: today
author: UQ Library
---
## Installation
We will use Git inside a command-line shell called Bash.
Installation instructions are available on [this page](https://github.com/uqlibrary/technology-training/blob/master/Git/installation.md).
## What is Git?
If you need to collaborate on a project, a script, some code or a document, there are a few ways to operate. Sending a file back and forth and taking turns is not efficient; a cloud-based office suite requires a connection to the Internet and doesn't usually keep a clean record of contributions.
**Version control** allows users to:
* record a clean history of changes;
* keep track of who did what;
* go back to previous versions;
* work offline; and
* resolve potential conflicts.
Programmers use version control systems to collaboratively write code all the time, but it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can be stored in a version control system.
A version control system is a tool that keeps track of changes for us, effectively creating different versions of our files. It allows us to decide which changes will be made to the next version (each record of these changes is called a **commit**), and keeps useful metadata about them. The complete history of commits for a particular project and their metadata make up a **repository**. Repositories can be kept in sync across different computers, facilitating collaboration among different people.
## Configuring Git
On a command line, Git commands are written as `git verb`, where `verb` is what we actually want to do.
Before we use Git, we need to **configure** it with some defaults, like our credentials and our favourite text editor. For example:
```shell
git config --global user.name "Vlad Dracula"
git config --global user.email "<your_email_address>"
```
This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to GitLab, GitHub, BitBucket or another Git host server in the future will include this information. The email address should match the one on your GitHub account so that your commits are linked to it.
```shell
git config --global core.editor "nano -w"
git config --list
```
You can always find help about git with the `--help` flag:
```shell
git --help
git config --help
```
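To check a single setting rather than reading the whole `git config --list` output, `git config --get` can be used. The sketch below is only an illustration: `demo_home` and `demo_git` are made-up helper names that point Git at a throwaway home directory, so your real global settings are left untouched.

```shell
# Throwaway "home" directory so the real ~/.gitconfig is not modified
# (demo_home and demo_git are illustrative names, not part of Git).
demo_home="$(mktemp -d)"
demo_git() { HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config" git "$@"; }

# Set two global defaults, as in the commands above
demo_git config --global user.name "Vlad Dracula"
demo_git config --global core.editor "nano -w"

# Read back one value at a time
configured_name="$(demo_git config --global --get user.name)"
configured_editor="$(demo_git config --global --get core.editor)"
echo "user.name   = $configured_name"
echo "core.editor = $configured_editor"
```

In your own shell you would skip the helper and run `git config --global --get user.name` directly.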
## Creating a repository
First, let's make sure we're in the right directory. We can check the directory using the `pwd` command, and then change directory using the `cd` command. On Windows we can change to our default home directory like so:
```
cd /c/Users/<yourusername>
```
We can use `ls` to get a list of everything that is in our current directory.
Now, let's create a directory for our project and move into it:
```
mkdir planets
cd planets
```
This is the same as creating a new folder.
Then we tell Git to make `planets` a repository — a place where Git can store versions of our files:
```
git init
```
Using the `ls` command won't show anything new, but adding the `-a` flag will show the hidden files and directories too:
```
ls -a
```
Git created a hidden `.git` directory to store information about the project (i.e. everything inside the directory where the repository was initiated).
Now that we've initialised the Git repository, we can start using commands to manage versions. We can now check the status of our project with:
```
git status
```
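The steps above can be replayed in one go. This is a minimal sketch that assumes a throwaway directory created with `mktemp -d`, so nothing in your home directory is touched; `git rev-parse --is-inside-work-tree` is used here as an extra check that Git recognises the directory as a repository.

```shell
# Work in a throwaway directory (mktemp -d creates a fresh, empty one)
cd "$(mktemp -d)"
mkdir planets
cd planets
git init -q          # -q keeps the output short
ls -a                # the listing now includes the hidden .git directory

# Ask Git whether the current directory belongs to a repository
inside="$(git rev-parse --is-inside-work-tree)"
echo "inside a repository: $inside"
```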
## Tracking changes
> How do we record changes and make notes about them?
You should still be in the `planets` directory, which you can check with the `pwd` command.
Let's create a new text file that contains some notes about the Red Planet’s suitability as a base. We'll use the `nano` text editor:
```
nano mars.txt
```
Type the following text into it:
```
Cold and dry, but everything is my favorite colour
```
Write out with <kbd>Ctrl</kbd>+<kbd>O</kbd> and exit nano with <kbd>Ctrl</kbd>+<kbd>X</kbd>.
We can now use `ls` to check that the file has been created.
You can also check the contents of your new file with the `cat` command:
```
cat mars.txt
```
Now, check the status of our project:
```
git status
```
Git noticed there is a new file. The "Untracked files" message means that there’s a file in the directory that Git isn’t keeping track of. We can tell Git to **track a file** using `git add`:
```
git add mars.txt
```
You may get a note saying `warning: LF will be replaced by CRLF in mars.txt.`
This highlights a difference in how Linux and Windows mark the end of a line (LF versus CRLF). The difference *can* be recorded as a change when you move between operating systems, but only for the lines you change, and there is now a lot more cross-compatibility, so you can safely ignore this warning.
Now we can use `git status` again to see what happened:
```
git status
```
Git now knows that it's supposed to keep track of `mars.txt`, but it hasn’t recorded these changes as a commit yet. To get it to do that, we need to run one more command:
```
git commit -m "Start notes on Mars as a base"
```
When we run `git commit`, Git takes everything we have told it to save by using `git add` and stores a copy permanently inside the special `.git` directory. This permanent copy is called a **commit** (or **revision**) and it is given a short identifier.
We use the `-m` flag (for "message") to record a short descriptive comment that will help us remember what was done and why.
If we run `git status` now:
```
git status
```
... we can see that the working tree is clean.
To see the recent **history**, we can use `git log`:
```
git log
```
`git log` lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit’s full identifier (which starts with the same characters as the short identifier printed by the `git commit` command earlier), the commit’s author, when it was created, and the log message Git was given when the commit was created.
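The whole cycle so far (create, add, commit, inspect) can be sketched as follows. The example assumes a throwaway directory and sets the identity for that repository only; the email address is a placeholder. `git status --short` and `git log --oneline` are compact variants of the commands used above.

```shell
# Work in a throwaway directory so nothing in your real project is touched
cd "$(mktemp -d)"
git init -q
# Identity set for this repository only; the email is a placeholder
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"

echo "Cold and dry, but everything is my favorite colour" > mars.txt
git status --short        # "?? mars.txt": the file is untracked
git add mars.txt
git status --short        # "A  mars.txt": staged, but not yet committed
git commit -q -m "Start notes on Mars as a base"
git status --short        # prints nothing: the working tree is clean
git log --oneline         # one line per commit: short identifier and message
```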
![](https://github.com/chrisburr/analysis-essentials-1/blob/master/git/fig/git-staging-area.svg)
Now, let's add a line to our text file:
```
nano mars.txt
```
After writing out and saving, let's check the status:
```
git status
```
We have changed this file, but we haven’t told Git we will want to save those changes (which we do with `git add`) nor have we saved them (which we do with `git commit`). So let’s do that now. It is good practice to always **review our changes** before saving them. We do this using `git diff`. This shows us the differences between the current state of the file and the most recently saved version:
```
git diff
```
There is quite a bit of cryptic-looking information in there: it contains the command used to compare the files, the names and identifiers of the files, and finally the actual differences. The `+` sign indicates which line was added.
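To make the `+` marker concrete, here is a small sketch in a throwaway repository. The second line of text is made up for the example, and the email address is a placeholder; the `grep` filter is just one way to pick out the added lines from the diff.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address
echo "Cold and dry, but everything is my favorite colour" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

# Add a second line (hypothetical text), then review the change before staging
echo "The two moons may be a problem for Wolfman" >> mars.txt
git diff

# Keep only the added lines: they start with a single "+"
# (the "+++" header line is excluded by the pattern)
added_lines="$(git diff | grep '^+[^+]')"
echo "$added_lines"
```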
It is now time to commit it:
```
git commit -m "<your comment>"
```
That didn't work, because we forgot to use `git add` first. Let's fix that:
```
git add mars.txt
git commit -m "<your comment>"
```
Using `git add` allows us to select which changes are going to make it into a commit, and which ones won't. It sends them to what is called the **staging area**. In a way, `git add` specifies _what_ will go in a snapshot (putting things in the staging area), and `git commit` then actually _takes_ the snapshot.
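The sketch below illustrates that selection: two files are changed, but only one is staged and committed. It assumes a throwaway repository; `notes.txt` and the text in both files are hypothetical. `git diff --staged` shows what is in the staging area, while plain `git diff` shows what is not.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address
echo "Cold and dry" > mars.txt
echo "To do: compare with other planets" > notes.txt   # hypothetical second file
git add mars.txt notes.txt
git commit -q -m "Start notes"

# Change both files, but stage only one of them
echo "The two moons may be a problem" >> mars.txt
echo "Ask about rockets" >> notes.txt
git add mars.txt

git diff --staged --name-only   # mars.txt: will go into the next commit
git diff --name-only            # notes.txt: changed, but not staged
git commit -q -m "Add concern about moons"
git status --short              # " M notes.txt": still modified, still unstaged
```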
**Challenge 1**
The staging area can hold changes from any number of files that you want to commit as a single snapshot.
1. Add some text to mars.txt noting your decision to consider Venus as a base
1. Create a new file venus.txt with your initial thoughts about Venus as a base for you and your friends
1. Add changes from both files to the staging area, and commit those changes as one single commit.
Adding and committing multiple files:
![](https://github.com/chrisburr/analysis-essentials-1/blob/master/git/fig/git-committing.svg)
## Exploring history
> How can we identify old versions of files, review changes and recover old versions?
As we saw in the previous section, we can refer to commits by their identifiers. You can refer to the _most recent commit_ on the current branch by using the identifier `HEAD`.
Let's add a line to our file:
```shell
nano mars.txt
```
We can now check the difference with `HEAD`:
```shell
git diff HEAD mars.txt
```
As long as nothing has been staged, this gives the same result as `git diff mars.txt`. What is useful is that we can **refer to previous commits**, for example the commit before `HEAD`:
```shell
git diff HEAD~1 mars.txt
```
Similarly, `git show` can help us find out what was changed in a specific commit:
```shell
git show HEAD~2 mars.txt
```
We can also use the unique 7-character identifiers that were attributed to each commit:
```shell
git diff XXXXXXX mars.txt
```
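Relative names such as `HEAD~1` and commit identifiers point at the same commits, which the following sketch demonstrates. It assumes a throwaway repository with three made-up commits, and uses `git rev-parse --short` to look up the short identifier instead of copying it from `git log`.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address

# Build a short history of three commits (the text is hypothetical)
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "First note"
echo "Two moons" >> mars.txt
git add mars.txt
git commit -q -m "Second note"
echo "No humidity" >> mars.txt
git add mars.txt
git commit -q -m "Third note"

# Look up the short identifier of the commit before HEAD
previous_id="$(git rev-parse --short HEAD~1)"
echo "HEAD~1 is $previous_id"

# Both commands compare mars.txt with the same commit
by_relative_name="$(git diff HEAD~1 mars.txt)"
by_identifier="$(git diff "$previous_id" mars.txt)"
[ "$by_relative_name" = "$by_identifier" ] && echo "same diff"
```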
> How do we **restore older versions** of our file?
Overwrite your whole text with one single new line:
```shell
nano mars.txt
git diff
```
We can put things back the way they were by using `git checkout`:
```shell
git checkout HEAD mars.txt
cat mars.txt
```
`git checkout` checks out (i.e., restores) an old version of a file. In this case, we’re telling Git that we want to recover the version of the file recorded in `HEAD`, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:
```shell
git log -3
git checkout XXXXXXX mars.txt
cat mars.txt
git status
```
Notice that **the changes are in the staging area**. Again, we can put things back the way they were by using `git checkout`:
```shell
git checkout HEAD mars.txt
cat mars.txt
```
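The restore steps above can be tried safely in a throwaway repository. This is a sketch with made-up file contents; it also captures the `git status --short` output after checking out an older version, to show that the restored content lands in the staging area.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "First note"
echo "Two moons" >> mars.txt
git add mars.txt
git commit -q -m "Second note"

# Overwrite the file by accident, then restore the last committed version
echo "Oops" > mars.txt
git checkout HEAD mars.txt
cat mars.txt                        # both lines are back

# Check out the version from one commit earlier
git checkout HEAD~1 mars.txt
status_after_old_checkout="$(git status --short)"
echo "$status_after_old_checkout"   # "M  mars.txt": the old version is staged

# Return to the latest committed version
git checkout HEAD mars.txt
```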
**Challenge 2**
Jennifer has made changes to the Python script that she has been working on for weeks, and the modifications she made this morning “broke” the script and it no longer runs. She has spent more than an hour trying to fix it, with no luck…
Luckily, she has been keeping track of her project’s versions using Git! Which commands below will let her recover the last committed version of her Python script called data_cruncher.py?
1. `git checkout HEAD`
1. `git checkout HEAD data_cruncher.py`
1. `git checkout HEAD~1 data_cruncher.py`
1. `git checkout <unique ID of last commit> data_cruncher.py`
1. Both 2 and 4
Checkout summary:
![](https://github.com/chrisburr/analysis-essentials-1/blob/master/git/fig/git-checkout.svg)
### Recap
* `git config`: configure git
* `git init`: initialise a git repository here
* `git status`: see information about current state of the repository
* `git add`: add a change from a file (or several) to the staging area
* `git commit -m "..."`: commit a change (or several) to our history
* `git log`: see history
* `git show`: show changes in one commit for one file
* `git checkout`: roll back to previous version
* `git diff`: difference between file on disk and commit in repository
## Ignoring things
> How can I tell git to ignore things?
Sometimes, we don't want git to track files like automatic backup files or intermediate files created during an analysis.
Say you create a bunch of `.dat` files like so:
```shell
touch a.dat b.dat c.dat
git status
```
If you don't want to track them, create a `.gitignore` file:
```shell
nano .gitignore
```
... and add the following line to it:
```shell
*.dat
```
That will make sure no file ending in `.dat` will be tracked by Git.
```shell
git status
git add .gitignore
git commit -m "Ignore data files"
git status
```
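To confirm that an ignore rule does what you expect, `git check-ignore` reports whether a path is ignored and by which rule. The sketch below assumes a throwaway repository; `git status --ignored` is an optional extra that lists ignored files on request.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address
touch a.dat b.dat c.dat
echo "*.dat" > .gitignore

git status --short             # "?? .gitignore": the .dat files are not listed
git check-ignore -v a.dat      # reports which .gitignore rule matches a.dat
git add .gitignore
git commit -q -m "Ignore data files"
git status --short --ignored   # "!! a.dat" and so on: ignored files, shown on request
```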
## Remotes
If you haven't already, now is the time to create a [GitHub](https://github.com) account. In our class, I'd ask you to share your username, so that we can collaborate later.
Git is the software.
GitHub is a platform to allow you to host the repository and share it with others.
There are others such as GitLab, BitBucket, Gitea, and GitBucket.
> How do I share my changes with others on the web?
Version control really becomes extra useful when we begin to **collaborate with other people**. We already have most of the machinery we need to do this; the only thing missing is to **copy changes from one repository to another**.
It is easiest to use one copy as a central hub, stored online.
Let's look at GitHub: https://github.com
Let's **share our repository with the world**. Log into GitHub and create a new repository called `planets` ("+ > New repository" in the top toolbar). Make sure you select "Public" for the visibility level.
Our local repository (on our computer) contains our recent work, but the **remote repository** on GitHub's servers doesn't.
We now need to connect the two: we do this by making the GitHub repository a **remote** for the local repository. The home page of the repository on GitHub includes the URL we need to identify it, under "…or push an existing repository from the command line". Copy it to your clipboard, and in your local repository, run the following command (note that to paste in the shell you will need to right-click):
```shell
git remote add origin https://github.com/<your_username>/planets.git
```
The name `origin` is a local nickname for your remote repository. We could use something else if we wanted to, but `origin` is by far the most common choice.
GitHub expects our main branch to have the same name as the one it uses, which is **main**. Our branch is currently called **master**; we can rename it to **main** with the next command:
```shell
git branch -M main
```
Now, we can **push** our changes from our local repository to the remote on GitHub. Try this:
```shell
git push
```
Git does not know where it should push by default. See the suggested command in the error message? We can set the default remote with a shorter version of that:
```shell
git push -u origin main
```
We only need to do that once: from now on, Git will know that the default is the `origin` remote and the `main` branch.
Git may ask for your credentials. These used to be simply your username and password, but GitHub now requires a Personal Access Token in place of the password. To create one:
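If you want to rehearse this without a GitHub account, a local "bare" repository can stand in for the remote. The sketch below rests on that assumption: `hub.git` is a made-up name for the stand-in, and on GitHub the remote URL would be the `https://` address instead of a local path.

```shell
scratch="$(mktemp -d)"
cd "$scratch"
# A local bare repository stands in for the empty repository on GitHub
git init -q --bare hub.git

git init -q planets
cd planets
git config user.name "Vlad Dracula"
git config user.email "vlad@example.com"   # placeholder address
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars as a base"

git remote add origin "$scratch/hub.git"   # on GitHub: the https URL
git branch -M main
git push -q -u origin main                 # -u records origin/main as the default
git remote -v

# Ask Git which remote branch the current branch now tracks
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")"
echo "$upstream"

# From now on a plain "git push" is enough
echo "Two moons" >> mars.txt
git add mars.txt
git commit -q -m "Add note about moons"
git push -q
```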
1. Click on your avatar in the top right of GitHub.com
1. Click settings
1. Scroll to the bottom of the left-hand menu and click Developer settings.
1. Click Personal access tokens (either type is fine)
1. Click `Generate new token` (either is fine, classic is simpler)
1. You may need to authenticate yourself using two-factor authentication (you can also choose to use your password)
1. You can select what you need to be able to edit. If you've chosen classic, you can skip this and scroll to the bottom to `Generate token`.
1. Make sure you copy this token immediately and save it somewhere so you can reuse it.
You can now see on GitHub that your changes were pushed to the remote repository.
You can edit files directly on GitHub if you want. Try editing your `README.md` by clicking on "Edit".
If you do that, you will then need to **pull** changes from the remote repository to your local one before further editing:
```shell
git log
git pull
git log
ls
```
In summary: `git push` sends committed changes to a remote repository, whereas `git pull` gets committed changes from the remote to your local repository.
## Collaborating
> How do we use version control to collaborate?
Now, let's get into pairs: one person is the "Owner", the other is the "Collaborator".
First, the Owner needs to give the collaborator editing access to the repository. Go to the Settings tab in your GitHub repository. On the left panel, you can click Collaborators and then `Add people`. Here you can enter usernames and email addresses.
The Collaborator can then accept the invitation.
Next, the Collaborator needs to download a copy of the Owner's repository to their machine, which is called "**cloning a repository**". To do that, first make sure you move out of your personal repository:
```shell
cd ..
```
Now, you can clone the Owner's repository (the URL is shown when you click the green `Code` button on the repository page), and you can give it a recognisable local name:
```shell
git clone https://github.com/<owner_username>/planets.git partner-planets
```
The Collaborator can now make changes in their clone of the Owner's repository:
```shell
cd partner-planets
nano README.md
```
Add a section for your collaborators.
```
## Contributors
Name
Name2
```
Then add, commit, and push the change to the Owner's repository on GitHub:
```shell
git add README.md
git commit -m "added collaborators"
git push
```
We didn't have to create a remote called `origin`, or set the default upstream: that was done by default by Git when cloning the repository.
You can see that the changes are now live on GitHub.
The Owner can now download the Collaborator's changes from GitHub:
```shell
git pull origin main
```
If you collaborate on a remote repository, **remember to `pull` before working**!
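The Owner and Collaborator roles can also be rehearsed on a single machine. In this sketch a local bare repository (the made-up `hub.git`) stands in for GitHub, and two directories play the two people; the names, email addresses and file contents are placeholders.

```shell
scratch="$(mktemp -d)"
cd "$scratch"
# Stand-in for the repository on GitHub, with "main" as its default branch
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/main

# Owner: create the project and publish it
git init -q planets
cd planets
git config user.name "Owner"
git config user.email "owner@example.com"
echo "# Planets" > README.md
git add README.md
git commit -q -m "Add README"
git branch -M main
git remote add origin "$scratch/hub.git"
git push -q -u origin main
cd ..

# Collaborator: clone, change, commit and push
git clone -q "$scratch/hub.git" partner-planets
cd partner-planets
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
printf '\n## Contributors\nName\nName2\n' >> README.md
git add README.md
git commit -q -m "added collaborators"
git push -q          # origin and the upstream branch were set up by the clone
cd ..

# Owner: download the Collaborator's commit
cd planets
git pull -q origin main
git log --oneline
```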
**Challenge 3**
Switch roles and repeat the process!
**Challenge 4**
Use the GitHub interface to add a comment to your partner's commit and suggest something. Check your notifications afterwards.
## Conflicts
> What do I do when changes conflict with someone else's?
As soon as people can work in parallel, they’ll likely step on each other’s toes. This will even happen with a single person: if we are working on a piece of software on both our laptop and a server in the lab, we could make different changes to each copy. Version control helps us manage these conflicts by giving us tools to resolve overlapping changes.
To see how we can **resolve conflicts**, we must first create one. The file mars.txt is currently in the same state in both copies of the `planets` repository.
The Collaborator can add a line to their partner's copy, and push to GitHub:
```shell
nano mars.txt
git add mars.txt
git commit -m "Add a line in my friend's file"
git push
```
Now let’s have the Owner make a different change to their own copy _without pulling from GitHub beforehand_:
```shell
nano mars.txt
```
The Owner can commit the change locally:
```shell
git add mars.txt
git commit -m "Add a line in my own copy"
```
But Git won't let us push to GitHub:
```shell
git push
```
Git rejects the push because it detects that the remote repository has new updates that have not been incorporated into the local branch. What we have to do is (1) **pull** the changes from GitHub, (2) **merge** them into the copy we’re currently working in, and then (3) **push** that. Let’s start by pulling:
```shell
git pull
```
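On recent versions of Git, `git pull` may first stop and ask how to reconcile divergent branches; running `git config pull.rebase false` selects the default merge strategy, after which you can run `git pull` again. Git then detects that changes made to the local copy overlap with those made to the remote repository, and therefore refuses to merge the two versions to stop us from trampling on our previous work. The conflict is marked in the affected file: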
```shell
cat mars.txt
```
Our change is preceded by `<<<<<<< HEAD`. Git has then inserted `=======` as a separator between the conflicting changes and marked the end of the content downloaded from GitHub with `>>>>>>>`. (The string of letters and digits after that marker identifies the commit we’ve just downloaded.)
It is now up to the Owner to fix this conflict:
```shell
nano mars.txt
```
They can now add and commit to their local repo, and then push the changes to GitHub:
```shell
git add mars.txt
git commit -m "Merge changes from GitHub"
git push
```
Git keeps track of merged files. The Collaborator can now pull the changes from GitHub:
```shell
git pull
git log -3
```
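The whole conflict cycle can be reproduced on one machine. As a sketch under stated assumptions: a local bare repository (the made-up `hub.git`) stands in for GitHub, the names, addresses and lines of text are placeholders, `git pull --no-rebase` asks explicitly for a merge, and keeping both lines is only one of several possible resolutions.

```shell
scratch="$(mktemp -d)"
cd "$scratch"
git init -q --bare hub.git                      # stand-in for GitHub
git --git-dir=hub.git symbolic-ref HEAD refs/heads/main

# Owner publishes a first version
git init -q owner
cd owner
git config user.name "Owner"
git config user.email "owner@example.com"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes"
git branch -M main
git remote add origin "$scratch/hub.git"
git push -q -u origin main
cd ..

# Collaborator adds a line and pushes
git clone -q "$scratch/hub.git" collaborator
cd collaborator
git config user.name "Collaborator"
git config user.email "collaborator@example.com"
echo "Line from the Collaborator" >> mars.txt
git add mars.txt
git commit -q -m "Add a line in my friend's file"
git push -q
cd ..

# Owner changes the same place without pulling first
cd owner
echo "Line from the Owner" >> mars.txt
git add mars.txt
git commit -q -m "Add a line in my own copy"
git push -q || echo "push rejected: the remote has commits we do not have"
git pull --no-rebase origin main || echo "merge stopped on a conflict"
cat mars.txt                                    # shows the conflict markers
conflict_markers="$(grep -c '^<<<<<<<' mars.txt)"

# Resolve by keeping both lines, then finish the merge and push
printf 'Cold and dry\nLine from the Owner\nLine from the Collaborator\n' > mars.txt
git add mars.txt
git commit -q -m "Merge changes from GitHub"
git push -q
git log --oneline
```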
## Hosting
GitHub? GitLab? BitBucket?
External company, purchased domain and host, or local server at the lab?
## Licence
This short course is based on the longer course _[Version Control with Git](http://swcarpentry.github.io/git-novice/)_ developed by the non-profit organisation [The Carpentries](http://carpentries.org/). The original material [is licensed](https://software-carpentry.org/license/) under a Creative Commons Attribution license ([CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)), and this modified version uses the same license. You are therefore free to:
* **Share** — copy and redistribute the material in any medium or format
* **Adapt** — remix, transform, and build upon the material
... as long as you give attribution, i.e. you give appropriate credit to the original author, and link to the license.