[post for post in posts if post == this]

Comprehending List Comprehensions

List comprehensions are pretty cool. They make a lot of sense and are really well designed in Python. While working on some challenges today at Metis, I was able to shorten a lot of my code using this technique. List comprehensions are only now becoming natural for me to use because they weren't part of my normal programmatic thinking with PHP or JavaScript (though you could kind of force it in JavaScript with some mapping).

But they are completely natural in Python, and the syntax is rather beautiful (and logical).

In my code, I was taking data from a file and turning it into a Pandas DataFrame to clean up. My initial code looked something like this before I took a minute to tidy it:

import numpy as np
import pandas as pd

def clean_vote_data(file):
    data = []
    for line in open(file):
        data.append(line.rstrip('\n').split(','))
    data = pd.DataFrame(data)
    cleaned_data = data.replace(['y', 'n', '?'], [1, 0, np.nan])
    cleaned_data = cleaned_data.fillna(cleaned_data.mean())
    return cleaned_data
cleaned_vote_data = clean_vote_data("house-votes-84.data")  

But using a list comprehension, I turned the first 3 lines of the function into 1 "pythonic" line:

def clean_vote_data(file):  
    data = pd.DataFrame([line.rstrip('\n').split(',') for line in open(file)])
    cleaned_data = data.replace(['y', 'n', '?'], [1, 0, np.nan])
    cleaned_data = cleaned_data.fillna(cleaned_data.mean())
    return cleaned_data
cleaned_vote_data = clean_vote_data("house-votes-84.data")  
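One thing the one-liner leaves implicit is closing the file. Here is a rough sketch of how the same comprehension could sit inside a with block so the file is closed as soon as the rows are read. The helper name read_rows is hypothetical and is not part of my original code.

```python
def read_rows(path):
    # read_rows is a hypothetical helper name, used only for illustration.
    # The with block closes the file once the comprehension has finished.
    with open(path) as f:
        return [line.rstrip('\n').split(',') for line in f]
```

The list it returns can be handed to pd.DataFrame the same way as before.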

Explanation

Now, let me explain this a bit.

[2*x for x in [2,4,6,8]]

This would return a new list: [4,8,12,16]. The code is saying: for each value in the list, call it x, then multiply it by 2. So the part before the "for" is the action you are doing to each element.
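To make that concrete, here is a small sketch that puts the doubling example next to the plain for loop it replaces. The variable names are only for illustration.

```python
numbers = [2, 4, 6, 8]

# The long way: build the list one append at a time.
doubled_loop = []
for x in numbers:
    doubled_loop.append(2 * x)

# The comprehension: same result in a single expression.
doubled = [2 * x for x in numbers]

print(doubled_loop)  # [4, 8, 12, 16]
print(doubled)       # [4, 8, 12, 16]
```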

You can also add an if statement to it like so:

[2*x for x in [2,4,6,8] if x < 5]

This would predictably return [4,8] because it will only apply the multiplication to 2 and 4.
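Here is a short sketch of the filtered version next to its loop equivalent. It also shows a related form that is easy to mix up with it: an if/else placed before the "for" does not filter anything, it only picks which value to produce. The variable names are only for illustration.

```python
numbers = [2, 4, 6, 8]

# Filter: the if at the end decides which elements are kept at all.
small_doubled = [2 * x for x in numbers if x < 5]

# Equivalent loop: the test runs first, then the multiplication.
small_doubled_loop = []
for x in numbers:
    if x < 5:
        small_doubled_loop.append(2 * x)

# Not a filter: an if/else before the "for" keeps every element
# and only chooses which value to produce for it.
doubled_or_same = [2 * x if x < 5 else x for x in numbers]

print(small_doubled)    # [4, 8]
print(doubled_or_same)  # [4, 8, 6, 8]
```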

So in my code at the top, the list comprehension is this:

[line.rstrip('\n').split(',') for line in open(file)]

Basically, for each line in the file, we strip the trailing newline and split the line on commas, which gives a list of that line's fields. All of those lists are collected into one outer list, and then we create a Pandas DataFrame from it!
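To see what that comprehension produces without needing the data file, here is a small sketch run on two made-up lines. The sample values are invented for illustration and are not taken from house-votes-84.data.

```python
# Made-up sample lines, standing in for what iterating over the file yields.
sample_lines = ["republican,n,y,?\n", "democrat,y,n,y\n"]

# Same expression as above, with the list of strings in place of open(file).
rows = [line.rstrip('\n').split(',') for line in sample_lines]

print(rows)
# [['republican', 'n', 'y', '?'], ['democrat', 'y', 'n', 'y']]
```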

Super simple and concise.

We can also make this pretty crazy by nesting more list comprehensions in it... because what if the list that we were looping through was a list comprehension itself?

Let's look at this line again:

[2*x for x in [2,4,6,8] if x < 5]

Now, what if [2,4,6,8] was actually a list comprehension like this:

[2*y for y in [1,2,3,4]]

Then if we replaced it, it would look like this:

[2*x for x in [2*y for y in [1,2,3,4]] if x < 5]

And it would still do the exact same thing and you could theoretically just keep on nesting... and nesting... and nesting!
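As a quick check, here is a sketch that evaluates the nested form both in one go and in two steps. The inner loop variable is named y so that it is defined where it is used, and the names inner, nested, and stepwise are only for illustration.

```python
# Inner comprehension: the values the outer one loops over.
inner = [2 * y for y in [1, 2, 3, 4]]

# Outer comprehension with the inner one nested inside it.
nested = [2 * x for x in [2 * y for y in [1, 2, 3, 4]] if x < 5]

# Same thing with the inner list named first.
stepwise = [2 * x for x in inner if x < 5]

print(inner)     # [2, 4, 6, 8]
print(nested)    # [4, 8]
print(stepwise)  # [4, 8]
```

Naming the inner list first tends to be easier to read once the nesting goes more than one level deep.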

Till next time, nest on.

I <3 Python


What I Learned Today:
Running import antigravity in Python opens http://xkcd.com/353/ in your web browser.