One Codex has implemented Jupyter Notebooks so that our users can run more complex analyses on their data within our platform. But we know that not all of our users are coders, and many who do code use a language other than Python. This article gives a short introduction to Python to help you get started. In a separate article, we'll also describe our public One Codex Python Client Library, to help you understand the different ways your data are stored and how to access them.
Get what you need to get started
There are many modules, packages and libraries available to make tasks easier in Python. These usually contain functions (definitions) or classes (types of objects) that we can apply to our data. We've installed quite a few commonly used tools, so all you need to do is import the ones you need into your notebook. Here are a few examples:
import numpy as np
# NumPy is useful for scientific programming. It provides efficient data structures such as arrays. We import it as "np", so that when we use it later, we just type "np." followed by the name of the function we want.
import scipy
# SciPy builds on NumPy and provides tools for things such as linear algebra, statistics and optimization.
import pandas as pd
# pandas provides data structures for table-formatted data, chiefly the DataFrame.
import altair as alt
# If you want to generate beautiful plots, Altair can be very useful!
We at One Codex have also written our own packages, which let you access your data from your notebook and provide functions that you will find quite useful. They structure your data so that you can easily call on samples of interest, or on the classifications and analyses that have been run for those samples. One example is our "SampleCollection" class, which includes many methods we commonly use to analyze samples or classification results.
from onecodex.models import SampleCollection
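The import lines above use three different styles: a plain import, an import with a short alias (like "np"), and importing a single name directly from a module (like SampleCollection). This sketch compares them using Python's built-in statistics module purely as a stand-in, so it runs without any One Codex or scientific packages installed.

```python
# Three ways to import, shown with the standard-library statistics module.
import statistics                # use the full name: statistics.mean(...)
import statistics as st          # use a short alias, just like "np" for NumPy
from statistics import mean      # pull one name in directly, like SampleCollection

values = [2, 4, 6]

full_name = statistics.mean(values)
alias = st.mean(values)
direct = mean(values)

# All three call the same function, so the results match.
print(full_name, alias, direct)
```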
If you're not sure which functions or classes are available in a module, or which names are already loaded in your notebook, you can list them or view their documentation with the following:
dir(np)
# Lists the names (functions, classes, variables and submodules) defined in a module. Replace np with any module you've imported.
dir()
# With no argument, lists the names currently defined in your notebook, including variables, functions and imported modules.
help(np)
# Displays the documentation for a module, including its functions and classes.
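Because dir() returns an ordinary list of strings, you can filter or search it. This sketch uses the built-in math module as an example; the same approach works for any module you've imported.

```python
import math

# dir() on a module returns a list of every name it defines, as strings.
all_names = dir(math)

# Names starting with an underscore are internal, so keep only the others.
public_names = [name for name in all_names if not name.startswith("_")]
print(public_names[:5])

# Check whether a particular function exists before calling it.
print("sqrt" in public_names)

# help() prints the documentation for a module or for a single function.
help(math.sqrt)
```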
Now you've imported all of the things you'll need to do your analysis. You may need fewer or more modules than these, but this is a great place to start.
Accessing Your One Codex Data
You may be used to viewing your data through a web browser, but you can also access it programmatically, via our API. To access the API through your notebook, you'll first need to import the functions we have built, and then create the connection between your environment (notebook), and our servers. These commands will take care of that:
from onecodex import Api
# This lets us use the "Api" class from the onecodex python client library, which will allow us to make requests to access our data.
# You'll need to instantiate a One Codex API object.
# If you're already logged in, you can do this by typing:
ocx = Api()
# If you aren't logged in (for example, when working outside our notebooks), you'll instead need to specify your API key.
ocx = Api(api_key="YOUR_API_KEY_HERE")
Syntax
Comments
You may have noticed that some lines in the code above begin with "#". Python ignores everything on a line after a "#" symbol (unless the "#" is inside a string). This lets us add comments to our code without affecting how it runs.
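If you want to write several lines of comments, for instance at the beginning of a script or function, you can put triple quotes before and after them, so that you don't need to repeat the "#" symbol on every line. Strictly speaking, this creates a string rather than a comment, but Python discards it when it isn't assigned to anything. When it is the first statement in a script or function, it becomes a docstring, which is what help() displays.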
"""
This is a comment.
This is a great way to describe the overall function of your script or definition.
You can include multiple lines to make the comment more readable.
"""
Indentation
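Compared to some other languages, a big benefit of Python is how readable it is! Rather than using braces to mark where a block of code begins and ends, Python uses indentation. Every line that ends in a colon, such as an "if" statement, a loop or a function definition, must be followed by an indented block. The indentation can be any number of spaces, as long as it's consistent within the block; four spaces is the standard convention.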
if 10 > 5:
    print('10 is greater than 5')
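Indentation also decides which lines belong together. In this sketch, the two indented lines under "if" run only when the test is True, while the final unindented line always runs.

```python
x = 12

if x > 5:
    # Both indented lines belong to the "if" block.
    print("x is greater than 5")
    message = "big"
else:
    message = "small"

# This line is not indented, so it runs regardless of the test above.
print("The value is", message)
```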
Data Formats
Variables
Variables store values you want to use multiple times throughout a script or notebook. A variable is created the first time you assign a value to it with "=".
x = 5 # This variable, named x, is type int
y = "One Codex" # This is type str
There are some rules for naming variables, but in general:
Don't start a variable name with a number.
Don't use spaces or dashes in a variable name; use underscores instead (for example, sample_count).
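The rules above can be seen in practice below. Note also that variable names are case-sensitive, so sample_count and Sample_count are two different variables. The invalid names are commented out because each would stop the code with a SyntaxError.

```python
sample_count = 3        # letters and underscores: valid
sample_count_2 = 4      # digits are fine, just not at the start
Sample_count = 10       # a different variable, because names are case-sensitive

# Each of these would raise a SyntaxError if uncommented:
# 2nd_sample = 5        # starts with a number
# sample-count = 5      # the dash is read as subtraction
# sample count = 5      # spaces are not allowed

print(sample_count, Sample_count)
```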
Data Types
Python has many built-in data types. This is not an exhaustive list, just a description of the ones we use most often.
For numeric data, you can use type "int" (integer), as in the example above, or type "float" for numbers with decimal places. Text is stored as type "str" (string). The "bool" (Boolean) type holds one of two values, True or False.
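You can check a value's type with the built-in type() function, and convert between types by calling the type's name like a function. The variable names here are just examples.

```python
count = 5            # int
abundance = 0.25     # float
name = "One Codex"   # str
is_public = False    # bool

# type() reports the data type of any value.
print(type(count), type(abundance), type(name), type(is_public))

# Converting between types using the type's name.
as_float = float(count)      # 5.0
as_int = int("42")           # the string "42" becomes the integer 42
as_text = str(abundance)     # the string "0.25"
```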
We also work with more complex data types, such as lists. You create a list with square brackets, separating each entry with a comma:
my_list = ['A', 'list', 'of', 'strings']
Lists are ordered, and can be changed: you can add or remove elements, or modify a particular element. You can access an element by its position (index) in the list, starting from 0.
my_list[0]
# This will return 'A'
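The changes described above (adding, removing and modifying elements) look like this, along with a few other common ways to read from a list.

```python
my_list = ['A', 'list', 'of', 'strings']

my_list.append('!')          # add an element to the end
my_list[1] = 'short list'    # modify the element at position 1
my_list.remove('of')         # remove the first element equal to 'of'

print(my_list)        # ['A', 'short list', 'strings', '!']
print(len(my_list))   # number of elements: 4
print(my_list[-1])    # negative positions count from the end: '!'
print(my_list[0:2])   # a slice of positions 0 and 1: ['A', 'short list']
```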
Lists are great for looping through each item, but sometimes you'll want to call on an item by a name, or key. Dictionaries (type "dict") allow you to save pairs of keys and values together, in a "key": "value" format.
# Create the dictionary
my_dictionary = {'Nick':'USA', 'Christopher':'Canada', 'Denise':'Ireland'}
# Print out the value associated with the key 'Nick'.
print(my_dictionary['Nick'])
# This will print 'USA'
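Dictionaries can also be changed after they're created. This sketch adds a key, replaces a value, safely looks up a key that may not exist, and loops over every key-value pair. The extra names are made up for the example.

```python
my_dictionary = {'Nick': 'USA', 'Christopher': 'Canada', 'Denise': 'Ireland'}

my_dictionary['Maria'] = 'Spain'     # add a new key-value pair
my_dictionary['Nick'] = 'Canada'     # assigning to an existing key replaces its value

# .get() returns a fallback instead of raising a KeyError when the key is missing.
print(my_dictionary.get('Sam', 'unknown'))   # unknown

# .items() gives each key and value together.
for person, country in my_dictionary.items():
    print(person, "is in", country)
```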
Operators
Unsurprisingly, you can perform arithmetic on numeric data, such as addition (+), subtraction (-), multiplication (*), division (/) and exponentiation (**).
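Here are those operators applied to two integers, plus two related ones that often come in handy: floor division (//) and modulo (%). Note that "/" always returns a float, even when both numbers are integers.

```python
a = 7
b = 2

print(a + b)    # addition: 9
print(a - b)    # subtraction: 5
print(a * b)    # multiplication: 14
print(a / b)    # division always gives a float: 3.5
print(a // b)   # floor division drops the remainder: 3
print(a % b)    # modulo gives the remainder: 1
print(a ** b)   # exponentiation, 7 squared: 49
```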
Above, we assigned the number 5 to the variable "x". We can modify x with further assignments.
x = 5 # The original assignment
x += 3 # This is the same as x = x + 3.
x -= 3 # This is the same as x = x - 3.
x **= 3 # This is the same as x = x ** 3, or x cubed.
Comparison Operators
Sometimes you'll want to check whether a variable equals some value, or how it compares to one. These are the most common comparison operators; each returns True or False.
x == 5 # Tests if x equals 5 (note the double equals sign; a single = assigns a value)
x != 5 # Tests if x does NOT equal 5
x > 5 # Tests if x is greater than 5
x < 5 # Tests if x is less than 5
x >= 5 # Tests if x is greater than or equal to 5
x <= 5 # Tests if x is less than or equal to 5
You can combine comparisons with logical operators.
x < 10 and x > 2
# If both statements are true, this returns "True"
x < 10 or x > 20
# If at least one statement is true, this returns "True"
not (x < 10 and x > 2)
# "not" reverses the result: if the expression inside the parentheses is True, this returns "False"
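To see the comparison and logical operators with concrete results, this sketch fixes x at 7. The last line also shows that Python lets you chain comparisons, which is a shorter way of writing the "and" test.

```python
x = 7

print(x == 5)                  # False
print(x != 5)                  # True
print(x >= 7)                  # True
print(x < 10 and x > 2)        # True: both comparisons are True
print(x < 5 or x > 20)         # False: neither comparison is True
print(not (x < 10 and x > 2))  # False: "not" flips True to False

# A chained comparison; equivalent to x > 2 and x < 10.
print(2 < x < 10)              # True
```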
You can also test if a variable is in an object like a list.
y = ['this', 'is', 'a', 'list']
z = 'this'
z in y # This will return "True"
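The "in" test works on more than lists. This sketch shows "not in", a substring test on a string, and a dictionary test, where "in" checks the keys rather than the values.

```python
y = ['this', 'is', 'a', 'list']

print('list' in y)          # True
print('dict' not in y)      # True: "not in" is the opposite test

# On strings, "in" looks for a substring.
print('Codex' in 'One Codex')                 # True

# On dictionaries, "in" checks the keys only.
countries = {'Nick': 'USA', 'Denise': 'Ireland'}
print('Denise' in countries)                  # True
print('Ireland' in countries)                 # False
print('Ireland' in countries.values())        # True: search the values explicitly
```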
Conditions and Loops
Using the comparison operators above, we can write tests and run different code depending on the result. We use "if", "elif" (short for "else if") and "else" to do that.
x = 10
y = 20
if x < y:
    print("x is less than y")
elif x == y:
    # Python only checks an "elif" condition if every test above it was False.
    print("x is equal to y")
else:
    # If x is not less than y, and it's also not equal to y,
    # then we do this
    print("x is greater than y")
If you want to repeat some code for as long as a condition stays True, you can use a "while" loop. Below, as long as x is less than 20, we print x and increase it by 1 on each pass through the loop. Once x reaches 20, the condition is False and the loop stops, so the last number printed is 19.
x = 10
while x < 20:
    print(x)
    x += 1
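To confirm exactly where the loop stops, this version collects each value in a list instead of printing it. Make sure something inside a while loop eventually makes the condition False; otherwise the loop runs forever.

```python
x = 10
seen = []

while x < 20:
    seen.append(x)   # record the value that would have been printed
    x += 1

print(seen)   # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(x)      # 20: the test x < 20 is now False, so the loop stopped
```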
If you want to loop through all the elements in a list, you can use "for". Here, we create a list, loop through its elements, assign each one in turn to a temporary variable (in this case, word), and print it.
my_list = ['lists', 'keep', 'me', 'organized']
for word in my_list:
    print(word)
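Two built-in helpers pair well with "for" loops: range() produces a sequence of numbers, and enumerate() gives each element's position alongside the element. The last part shows a loop building up a result, here the total number of letters in the list.

```python
# range(start, stop) produces numbers from start up to, but not including, stop.
for i in range(1, 4):
    print(i)          # prints 1, then 2, then 3

# enumerate() pairs each element with its position.
my_list = ['lists', 'keep', 'me', 'organized']
for position, word in enumerate(my_list):
    print(position, word)   # 0 lists, then 1 keep, and so on

# Accumulate a result across the loop.
total_letters = 0
for word in my_list:
    total_letters += len(word)
print(total_letters)
```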
Functions
Functions let you group a series of commands and run them all with a single call. Once you've defined a function, you can apply it repeatedly to different variables without writing the commands out each time. You define a function using the "def" keyword.
def my_function():
    print("Coding is fun!")
To run the function, you can call it like this:
my_function()
This function is pretty simple: it just prints "Coding is fun!" whenever you call it. You'll usually want a function to work on a specific input. To do that, you pass arguments to the function, so the function needs to be written to expect them, by listing parameter names inside the parentheses after the function name.
def my_math_function(x):
    y = 5
    z = x + y
    return z
Now you can pass any number (x) to this function. It will add 5 to it and return the result. You can save this result to another variable, or print it.
my_result = my_math_function(10)
print(my_result)
Many functions require a set number of arguments, often in a particular order. If you want a function to accept any number of arguments, put * right before the parameter name. You can also use keyword arguments, naming each argument as you pass it, which lets you pass the arguments in a different order from the one in which they were defined.
def my_function(city3, city2, city1):
    print("The third city is " + city3)
# Then to run it, you can switch the order:
my_function(city1="New York", city2="London", city3="Paris")
# This prints "The third city is Paris"
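The * syntax mentioned above collects however many positional arguments are passed into a single tuple. In this sketch, list_cities is a hypothetical function that prints each city it receives and returns how many there were.

```python
def list_cities(*cities):
    # The * gathers all positional arguments into a tuple named "cities".
    for city in cities:
        print("City:", city)
    return len(cities)

count = list_cities("New York", "London", "Paris", "Tokyo")
print(count)   # 4

# Calling it with no arguments also works; the tuple is simply empty.
print(list_cities())   # 0
```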
You'll often come across functions whose parameters have default values. If a parameter has a default, you don't need to pass a value for it when you call the function, but you can pass a different value, which the function will use instead of the default. Here's how you write a function with a default parameter:
def my_function(city3 = "Hong Kong"):
print("The third city is " + city3)
# You can now run this with or without passing an argument.
my_function(city3 = "Berlin") # This will use "Berlin"
my_function() # But this will go with the default - "Hong Kong"
Classes
Classes are like templates for creating objects in Python. It's unlikely you'll need to create classes within a Jupyter Notebook on our system. One Codex has used classes to structure your data. We'll give a little overview of classes here, so that you know how to use them.
Classes usually have a special method called __init__(). Python runs it automatically every time a new object is created from the class, so it's used for any required setup, such as setting the object's properties or performing other initial steps. Its first parameter, self, refers to the new object being created.
# Here, we'll create a class (template) for a company
class Company:
    def __init__(self, name, employees):
        self.name = name
        self.employees = employees
# Now we'll use that template to create a company
company1 = Company("Apple", 137000)
# Now we can call on specific properties of our object
print(company1.name)
print(company1.employees)
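Classes can also define methods: functions that belong to the class and act on a particular object through self. The "SampleCollection" class mentioned earlier works this way. This sketch extends the Company example with two hypothetical methods, hire() and describe(), that are not part of any library.

```python
class Company:
    def __init__(self, name, employees):
        self.name = name
        self.employees = employees

    # A method is a function defined inside the class; "self" is the object it's called on.
    def hire(self, new_staff):
        self.employees += new_staff
        return self.employees

    def describe(self):
        return self.name + " has " + str(self.employees) + " employees"

company1 = Company("Apple", 137000)
company1.hire(500)
print(company1.describe())   # Apple has 137500 employees
```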
Examples of One Codex Classes
One example of a class you've already seen is Api(). When we called it earlier, we created an Api object named ocx. The Api class accepts an argument called api_key, so when we created ocx, we could pass it our API key. If we didn't provide the API key explicitly, code in the class's __init__() ran automatically to work out where to find the API key for this user.
Another place where One Codex uses classes to store your data is Samples. Each sample is an object of our Samples class, with multiple properties, such as project, tags and public (whether or not the sample is public).
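To show how properties like these are read with a dot and used to pick out samples, here is a toy example. ToySample is a hypothetical, simplified class written only for illustration; it is not the One Codex Samples class, whose real attributes and query methods are described in the Python client library documentation. For simplicity, project is a plain string here.

```python
class ToySample:
    """A simplified, hypothetical stand-in for a sample object."""

    def __init__(self, filename, project, tags, public=False):
        self.filename = filename
        self.project = project
        self.tags = tags
        self.public = public

samples = [
    ToySample("gut_01.fastq", "Gut study", ["stool", "baseline"]),
    ToySample("gut_02.fastq", "Gut study", ["stool", "week4"]),
    ToySample("soil_01.fastq", "Soil survey", ["soil"], public=True),
]

# Properties are read with a dot, so filtering on them is a simple loop.
baseline_gut = []
for sample in samples:
    if sample.project == "Gut study" and "baseline" in sample.tags:
        baseline_gut.append(sample.filename)

print(baseline_gut)   # ['gut_01.fastq']
```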
Next Steps
Now that you have an introduction to Python, why not take a look at our articles on Alpha Diversity and Beta Diversity, which contain many functions to help you access and analyze your data on the One Codex platform, through Jupyter Notebooks, or even in your own Python environment!