One of the great things about Julia is the language’s extensibility. In Julia, any module can add new methods to the functions provided by the Base module. In other words, types from other modules can blend seamlessly with Base and often be treated like Base types. This means that if we learn how these methods work with Base, we will probably be able to carry a lot of that knowledge with us into other modules. Today we will demonstrate this by starting with Base and then expanding into filtering a different structure from a dependency, a DataFrame from DataFrames.
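To make the idea of adding methods to Base functions concrete, here is a minimal sketch. The Playlist type and its tracks field are hypothetical names invented for this illustration; only Base.filter itself comes from Julia.

```julia
# Hypothetical wrapper type, used only for illustration.
struct Playlist
    tracks::Vector{String}
end

# Qualifying the name as Base.filter adds a method to the existing function
# rather than defining an unrelated filter in the current module.
Base.filter(f, p::Playlist) = Playlist(filter(f, p.tracks))

p = Playlist(["intro", "interlude", "outro"])

# The familiar call shape from Base now works on our own type.
short = filter(t -> length(t) <= 5, p)
println(short.tracks)
```

Once a method like this exists, anything we know about calling filter on a Vector carries over to the new type, which is the same pattern packages such as DataFrames rely on.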
Filtering base types
There are a few different techniques that can be used to filter a simple Vector. One feature of Julia is the ability to provide conditional masks as indexes. I am not sure how long this has been included with Base, but it is certainly an awesome feature, as I love conditional masks. To create a conditional mask, we need to make one of the BitArrays we talked about earlier. In this instance, we will broadcast a comparison operator again. Here we will filter any value above 14 out of x:
x = [5, 10, 15, 20]
xmask = x .< 14
x[xmask]
2-element Vector{Int64}:
5
10
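Since a mask is just an array of Bool values, masks can also be combined before indexing. The following is a small sketch of that idea using only Base; the variable names between and large are made up for the example.

```julia
x = [5, 10, 15, 20]

# Broadcasting a comparison gives one Bool per element.
xmask = x .< 14

# Masks combine element-wise with .& (and), .| (or), and .! (not).
between = (x .> 5) .& (x .< 20)
x[between]        # [10, 15]

# Negating the original mask selects everything it rejected.
large = x[.!xmask]    # [15, 20]

# findall converts a mask into the indices where it is true.
findall(xmask)    # [1, 2]
```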
Alternatively, we could utilize the filter methods. These are filter and filter!. The two methods do exactly the same thing; the only difference is that filter! is a mutating method. This is precisely what the ! in function names is meant to represent. I find that to be a really cool convention, as it makes it easier to discern when things are being mutated and when they are not. I think that is a great thing to know, especially when it comes to Data Science. The filter method takes a Function as the first positional argument and then our Vector as the second positional argument. This might change slightly if the type is not a Vector, so keep that in mind.
filter(x::Int64 -> x < 14, x)
2-element Vector{Int64}:
5
10
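To see the difference between the mutating and non-mutating versions side by side, here is a short sketch. The names y and length_after_filter are just for illustration.

```julia
x = [5, 10, 15, 20]

# filter returns a new Vector and leaves x alone.
y = filter(n -> n < 14, x)
length_after_filter = length(x)   # still 4

# filter! removes the rejected elements from x itself.
filter!(n -> n < 14, x)
length(x)                         # now 2
```

After the filter! call, x and y hold the same values, but only the second call changed x.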
Given that we used filter instead of filter! here, we would need to set x equal to the return value to keep these changes. Another thing we can filter using this technique is dictionaries. Rather than annotating the argument with the element type of the Vector, we instead work with a Pair.
mydict = Dict(:A => [5, 10], :B => [4, 10])
filter(k::Pair{Symbol, Vector{Int64}} -> k[2][1] != 5, mydict)
Dict{Symbol, Vector{Int64}} with 1 entry:
:B => [4, 10]
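The type annotation on the Pair is optional, and indexing with k[2] can be swapped for first and last, which may read a little more clearly. This is only a sketch of an alternative spelling, not a different technique; by_value and by_key are names chosen for the example.

```julia
mydict = Dict(:A => [5, 10], :B => [4, 10])

# last(p) is the value of the Pair, first(p) is the key.
by_value = filter(p -> last(p)[1] != 5, mydict)

# Filtering on the key works the same way.
by_key = filter(p -> first(p) == :A, mydict)
```

Here by_value keeps only the :B entry and by_key keeps only the :A entry.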
Because the function is the first positional argument, this also opens up the ability to utilize the do syntax, so definitely keep this in mind.
x = [5, 10, nothing, nothing, 40]
filter!(x) do number
    !isnothing(number)
end
3-element Vector{Union{Nothing, Int64}}:
5
10
40
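The do block is only another way of writing the function argument, so the same filter can be spelled a few ways. A quick sketch, using the non-mutating filter so that x can be reused (the result names are illustrative):

```julia
x = [5, 10, nothing, nothing, 40]

# do-block form: the block becomes the first argument to filter.
with_do = filter(x) do number
    !isnothing(number)
end

# The equivalent anonymous function.
with_lambda = filter(number -> !isnothing(number), x)

# Negating a function with ! produces a new function, so this works too.
with_negation = filter(!isnothing, x)
```

All three calls return the same three values; the do form tends to pay off once the function body grows past a single line.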
Filtering dataframes
Another common type of structure that might need to be filtered is the DataFrame. This is a bit different because it comes from a dependency, the DataFrames package, rather than from Base.
using DataFrames
df = DataFrame(:X => [1, 2, 3, 4], :Y => [1, 2, 3, 4])
The filter method, when used on a DataFrame, will provide a DataFrameRow to the function. This is a cool type; we can index it pretty easily, and this makes filtering a breeze.
# Keep only the rows where X is at most 3
filter!(df) do row
    if row[:X] > 3
        return false
    end
    true
end
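The row-wise filter can also be written as a single boolean expression, and DataFrames additionally accepts a column => function pair so that the function receives just that column's value for each row. The sketch below assumes the DataFrames package is installed; by_row and by_col are names made up for the comparison.

```julia
using DataFrames

df = DataFrame(:X => [1, 2, 3, 4], :Y => [1, 2, 3, 4])

# Row form: the function receives a DataFrameRow.
by_row = filter(row -> row[:X] <= 3, df)

# Column form: the function receives the value of X for each row.
by_col = filter(:X => x -> x <= 3, df)

# Both keep the first three rows and leave df itself untouched.
by_row == by_col
```

The non-mutating filter is used here so that both forms can be run against the same unmodified df.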
That really is all there is to it, and with the preexisting knowledge from Base, it might be hard to find things that are not possible to filter with this technique!