Computing Skills for Biologists, Chapter 3 (Basic Programming), pgs. 81-119
- Interpreted, high-level language
- Very popular among computational biologists
- One of the most "human readable" programming languages
- One of the few languages where line indentation is actually interpreted as part of the code
- Python is much less picky than bash about spacing within lines, though
- Dynamically typed (don't worry, we'll get to this later)
- Unlike many programming languages (e.g., C++ and Java), Python can be executed interactively.
- To start the Python interpreter, simply open up a Terminal window and type
python
. - Once the Python prompt (
>>>
) appears, type:a = 3
print(a)
- What do you see?
- To exit the Python interpreter, type
quit()
.
- Now, put those same commands in a text file ending with a
.py
extension (e.g.,test.py
). - You can use this line at the top of the file to indicate that this is a Python script:
#! /bin/python3
- To run this, add execute permissions to the file, then use
./test.py
- Alternatively, you can type
python3 test.py
without the header line. - What do you see?
- The
.py
extension isn't required, but is conventional for Python code.
- Jupyter notebooks are a special type of file that allows formatted notes to be mixed with Python code.
- These notebooks can be run on smic through OnDemand.
- To open a Jupyter notebook, go to the "Interactive Apps" menu at the top and select "Jupyter Notebook".
-
Numbers
- Note that, unlike bash, Python can handle both integers and floating point (decimal) numbers.
- Integers -
myInt = 10
- Floating point numbers (decimals) -
myFloat = 3.21312
- If running Python interactively, variable values can be seen just by typing the name of the variable -
myFloat
. - If running Python in a script, variable values can be printed by using the
print()
funtion -print(myFloat)
. - Changing numbers from one type to another is one form of "type casting".
-
Strings -
myStr = "This is a string."
- Strings can be of any length, but are always defined with quotes.
- Individual characters, or subsets of characters, can be accessed using square brackets - []
- String indices (and all indices in Python) start at 0
myStr = "biology"
myStr[0]
- A range of indices can be defined with a colon
myStr[2:5]
- Strings can be concatenated together with the
+
operator.myStr = "biology"
newString = myStr + "_is_super_interesting"
newString
- It is very easy in Python to use
for
loops with strings. Try this:
for letter in myStr:
print(letter)
- Booleans -
True
orFalse
- These variables can take the values
True
orFalse
only.myBool = True
- These are the data types used in things like
if...else
statements andwhile
loops. - Logical statements, like
myNum > 3
ormyStr == "test"
, return a boolean variable indicating the result of the comparison - In Python, the value of a Boolean can be reversed by preceding it with
not
.
- These variables can take the values
- Functions are the verbs of a programming language
- Functions take some variables or objects as input (arguments) and either do something with them or return a new variable/object.
- Functions are always written like this -
myFunction(object1,object2,...)
- They always have parentheses, even if they don't need any arguments to be passed to them
- For example,
myFunction()
- For example,
- The number and type of arguments needed is specific to each function
- Perhaps the simplest function to use is
print()
.print()
can take an arbitrary number of variables as arguments and will print them to the screen. - Here are a few other really useful functions and what they do:
len(string)
- Takes one argument (a string) and returns its lengthabs(number)
- Takes one argument (either an integer or a float) and returns its absolute valueround(float)
- Takes one floating point number as an argument and rounds it to the nearest integertype(variable)
- Takes one variable as an argument and returns its type
- Methods are functions (sort of like actions or verbs) that belong to a variable
- Methods are accessed and used by appending them to the name of a variable, separated by a
.
- For instance, let's say we define a string -
myStr = "biOLogy "
. Here are some really useful methods that can be used with that string:myStr.upper()
- Converts all characters in the string to uppercasemyStr.lower()
- Converts all characters in the string to lowercasemyStr.strip()
- Removes whitespace from the beginning or ending of the string- Methods can also be chained! -
myStr.strip().lower()
myStr.replace("y","ical")
- This method functions like find and replace, where the first argument is the text to find and the second argument is the text to use when replacing.myStr.split("O")
- Breaks up the string, using the character provided as an argument as the delimiter. This method then returns a list of new strings. We'll talk more about lists later.
- Just like with bash, comments are essential in Python.
- Comments are included in Python by putting a
#
at the beginning of a line. - Many text editors have a shortcut keystroke to toggle lines between being commented and uncommented.
# This is an example of a comment
- Python has a special way to format strings so that the values of variables can be included
- The operator
%
is used to indicate that the values of certain variables should be used to fill in a string - The parts of the string where the values should go is indicated with placeholders (kinda like wildcards)
- The three most basic placeholders are:
%d
- An integer%f
- A floating point number (i.e., a decimal)%s
- A string
- Try this:
faveInt = 7
faveFloat = 3.14
faveString = phyleaux
print("Favorite integer is %d, favorite float is %f, and favorite string is %s." % (faveInt,faveFloat,faveString))
- Note how the variables holding the values are provided in the same order inside parentheses after the
%
.
- Python has a special function,
input("Message:" )
, to accept input from a user. - This function reads the user input and returns it. But be sure to store it in a variable!
userStr = input("Input some string: ")
- You can use the string methods outlined above to standardize user input - like stripping out excess whitespace, changing character case, etc.
- If you need help remembering what methods are available for a given object, use
dir(<OBJECT>)
- Note that methods starting and ending with "__" are not meant for users to call. They are methods to be used by the system.
- Try creating a string variable (e.g.,
myStr = "some text"
).- Now run
dir(myStr)
- Pick a method you haven't used before and see what it does with
myStr
.
- Now run
- These are similar in form to bash, but simpler.
- The general form is:
if <CONDITION>:
<DO_THIS>
- The
if
statement has to finish with a:
. - Also, in Python, the code block inside the
if
statement must be indented. You can use either tabs or a certain number of spaces. When using spaces, you must use the same number of spaces throughout your code. The official recommendations for Python style strongly suggest using 4 spaces, rather than tabs. - Note that the syntax for conditions is pretty intuitive in Python
myStr != "other text"
if
statements can be used on their own, but no code will be executed when the condition is false.
- To include code blocks that should be executed when the condition is false, you'll need to use an if...else statement. These have the general form:
if <CONDITION>:
<DO_THING_WHEN_TRUE>
else:
<DO_THING_WHEN_FALSE>
- Sometimes, you'll want to test a series of conditions to decide what to do. In that case you could use a nested set of
if...else
statements:
nuc = "A"
if nuc == "A":
print("Nucleotide is an A.")
else:
if nuc == "C":
print("Nucleotide is a C.")
else:
if nuc == "G":
print("Nucleotide is a G.")
else:
if nuc == "T":
print("Nucleotide is a T.")
- However, you'll notice that this can get pretty cumbersome pretty quickly. Thankfully, python offers another, more compact, way to write this:
nuc = "A"
if nuc == "A":
print("Nucleotide is an A.")
elif nuc == "C":
print("Nucleotide is a C.")
elif nuc == "G":
print("Nucleotide is a G.")
elif nuc == "T"
print("Nucleotide is a T.")
- The
elif
statements stand forelse...if
and capture what we tried to do above with the nesting. - It is important to realize that as we've currently written these statements, nothing would happen if the variable
nuc
didn't have the valueA
,C
,G
, orT
. This can be a big problem when accepting input from a user or file that might provide a nonsensical value. To capture these cases, it's always best to close with a simpleelse
:
nuc = "A"
if nuc == "A":
print("Nucleotide is an A.")
elif nuc == "C":
print("Nucleotide is a C.")
elif nuc == "G":
print("Nucleotide is a G.")
elif nuc == "T":
print("Nucleotide is a T.")
else:
print("This is not a valid nucleotide!")
In-Class Practice Exercise (Characterizing Sequences)
(1) Copy the sequences available in this repository.
(2) Write a Python script with code to answer these questions.
For each question, include comments describing how you're
addressing the question and have the script print out informative
messages with answers to each question.
- How long are the sequences?
- Are the sequences the same length?
- Convert the DNA sequences to equivalent RNA sequences (T -> U).
- What are the frequencies of the four nucleotides in each sequence?
- At how many positions do the sequences differ?
- Generate the reverse complement of each sequence.