# Manipulate data the Knet way with `KnetArray`

It's impossible to get anything done if we can't manipulate data. 
Generally, there are two important things we need to do with data: 
(i) acquire it! and (ii) process it once it's inside the computer.
There's no point in trying to acquire data if we don't even know how to store it,
so let's get our hands dirty first by playing with synthetic data.

We'll start by introducing KnetArrays, Knet's primary tool for storing and transforming data on GPUs. Although Knet can use Julia's `Array` type for standard CPU computations, GPUs have become indispensable for training large deep learning models. Even the small examples implemented here run up to 17x faster on the GPU than on the 8-core CPU architecture we use for benchmarking. However, GPU implementations have a few potential pitfalls: (i) GPU memory allocation is slow, (ii) memory transfer between GPU and RAM is slow, and (iii) reduction operations (like sum) can be very slow unless implemented properly (see [Optimizing Parallel Reduction in CUDA](http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf)).
Knet implements [KnetArray](http://denizyuret.github.io/Knet.jl/latest/reference.html#KnetArray-1) as a Julia data type that wraps GPU array pointers. KnetArray is based on the more standard [CudaArray](https://github.com/JuliaGPU/CUDArt.jl) with a few important differences: (i) Garbage collection: KnetArrays have a custom memory manager, similar to [ArrayFire](https://arrayfire.com/), which reuses pointers garbage collected by Julia to reduce the number of GPU memory allocations, (ii) Slicing: contiguous array ranges (e.g. `a[:,3:5]`) are handled as views with shared pointers instead of copies when possible, and (iii) Broadcasting: a number of custom CUDA kernels written for KnetArrays implement element-wise, broadcasting, and scalar and vector reduction operations efficiently. As a result, Knet allows users to implement their models in high-level code while remaining competitive in performance with other frameworks.


## Getting started

In this chapter, we'll get you going with the basic functionality. Don't worry if you don't understand any of the basic math, like element-wise operations or normal distributions. In the next two chapters we'll take another pass at KnetArray, teaching you both the math you'll need and how to realize it in code.

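Before we get started with Knet and KnetArrays, you should be aware of two special types of arrays: matrices and vectors. `Matrix{T}` is simply another name for a 2-dimensional `Array{T,2}`, and `Vector{T}` is another name for a 1-dimensional `Array{T,1}`. In particular, note that for any type T: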

In [6]:
a = Array{T,2} where T 
b = Matrix{T} where T
a == b

true

In [8]:
a = Array{T,1} where T 
b = Vector{T} where T
a == b

true
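
The same aliasing can be checked on concrete element types. The following is a minimal sketch using only plain CPU arrays; the variable names `v` and `m` are illustrative and not part of the original notebook.

```julia
# Vector and Matrix are aliases, not separate types.
println(Vector{Float64} === Array{Float64,1})   # true
println(Matrix{Int} === Array{Int,2})           # true

v = [1.0, 2.0, 3.0]                # a 1-dimensional array
m = [1 2; 3 4]                     # a 2-dimensional array
println(v isa Vector)              # true
println(m isa Matrix)              # true
println(ndims(v), " ", ndims(m))   # 1 2
```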

Now, to get started, let's import `Knet`. We’ll make a habit of setting a random seed with `srand` so that you always get the same results that we do. We also select the GPU we want to work on (device 0) with `Knet.gpu(0)`.

In [1]:
using Knet
srand(1);
Knet.gpu(0);

Next, let's see how to create a KnetArray on the GPU without initializing any of its values:

In [2]:
x = KnetArray{Float64}(3, 4)
display(x)

3×4 Knet.KnetArray{Float64,2}:
 1.0  1.0  1.0   2.88255 
 1.0  1.0  1.0  -0.295943
 1.0  1.0  1.0  -1.86218 

`KnetArray{T}(dims...)` simply returns an uninitialized dense array without setting the values of any of its entries. This means that the entries can hold arbitrary values, including very big ones: you see whatever happened to be in that memory before. But typically, we'll want our arrays initialized. Commonly, we want all zeros:

In [3]:
x = KnetArray(zeros(3, 5))
display(x)

3×5 Knet.KnetArray{Float64,2}:
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

Similarly, we can create an array of all ones:

In [4]:
x = KnetArray(ones(3, 5))
display(x)

3×5 Knet.KnetArray{Float64,2}:
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0

Often, we'll want to create arrays whose values are sampled randomly. This is especially common when we intend to use the array as a parameter in a neural network. In this snippet, we build a 2-element array of KnetArrays whose values are drawn from a standard normal distribution (zero mean, unit variance): the first has shape 3x4 and the second has shape 1x1 (i.e. a bias). Julia's `map` applies a function to each element of a collection and returns a new array containing the results; here it applies the `KnetArray` constructor to each CPU array:

In [5]:
x = map(KnetArray, [randn(3, 4), randn(1,1)])

2-element Array{Knet.KnetArray{Float64,2},1}:
 Knet.KnetArray{Float64,2}(Knet.KnetPtr(Ptr{Void} @0x000001020d600600, 96, 0, nothing), (3, 4))
 Knet.KnetArray{Float64,2}(Knet.KnetPtr(Ptr{Void} @0x000001020d600800, 8, 0, nothing), (1, 1)) 

In [6]:
display(x[1]), display(x[2]);

3×4 Knet.KnetArray{Float64,2}:
  0.297288  -0.0104452   2.29509   0.431422
  0.382396  -0.839027   -2.26709   0.583708
 -0.597634   0.311111    0.529966  0.963272

1×1 Knet.KnetArray{Float64,2}:
 0.458791
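
To see what `map` does without needing a GPU, here is a small sketch in which plain `Array`s stand in for KnetArrays. The names `params`, `shapes` and `halved` are illustrative assumptions, not names from Knet.

```julia
# CPU-only sketch: plain Arrays stand in for KnetArrays.
params = [randn(3, 4), randn(1, 1)]       # a 3x4 array and a 1x1 bias
shapes = map(size, params)                # apply `size` to each element
println(shapes == [(3, 4), (1, 1)])       # true

halved = map(p -> p ./ 2, params)         # returns new arrays; `params` is unchanged
println(halved[1] == params[1] ./ 2)      # true
println(length(halved))                   # 2
```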

The pointer of a KnetArray is accessible via the `.ptr` field:

In [7]:
x[1].ptr, x[2].ptr

(Knet.KnetPtr(Ptr{Void} @0x000001020d600600, 96, 0, nothing), Knet.KnetPtr(Ptr{Void} @0x000001020d600800, 8, 0, nothing))

The dimensions of each KnetArray are accessible via the `.dims` attribute.

In [8]:
x[1].dims, x[2].dims

((3, 4), (1, 1))

We can also query its `length` (the total number of elements), which is equal to the product of the components of the shape. Multiplied by the size of each stored value (8 bytes for `Float64`), this tells us how much memory the array occupies.

In [9]:
length(x[1])

12
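
The arithmetic can be spelled out on a CPU array of the same shape. This is a sketch with illustrative variable names; the 96-byte result matches the second field of the `KnetPtr` printed for the 3x4 `Float64` array above.

```julia
a = zeros(Float64, 3, 4)
println(length(a))                       # 12 elements
println(sizeof(eltype(a)))               # 8 bytes per Float64
println(length(a) * sizeof(eltype(a)))   # 96 bytes in total

b = zeros(Float32, 3, 4)                 # same shape, lower precision
println(length(b) * sizeof(eltype(b)))   # 48 bytes
```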

## Operations

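KnetArray supports a large number of standard mathematical operations, such as element-wise addition, multiplication and exponentiation. Before we get to those, the next few cells look at how ranges, reshaped arrays and lazily mapped arrays are represented, and how they are converted into KnetArrays: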

In [28]:
x = 1:10;

In [26]:
print("ptr x: ", pointer_from_objref(x), "\n")
dump(x)

ptr x: Ptr{Void} @0x00007f61175757f0
UnitRange{Int64}
  start: Int64 1
  stop: Int64 10


In [27]:
x = reshape(x, 2,5);
print("ptr x: ", pointer_from_objref(x), "\n")
dump(x)

ptr x: Ptr{Void} @0x00007f6116eaa8f0
Base.ReshapedArray{Int64,2,UnitRange{Int64},Tuple{}}
  parent: UnitRange{Int64}
    start: Int64 1
    stop: Int64 10
  dims: Tuple{Int64,Int64}
    1: Int64 2
    2: Int64 5
  mi: Tuple{} ()


In [40]:
collect(x)

10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

In [41]:
z = KnetArray(collect(x))

Knet.KnetArray{Int64,1}(Knet.KnetPtr(Ptr{Void} @0x000001020d600e00, 80, 0, nothing), (10,))

In [30]:
dump(z)

Knet.KnetArray{Int64,1}
  ptr: Knet.KnetPtr
    ptr: Ptr{Void} Ptr{Void} @0x000001020d600c00
    len: Int64 80
    dev: Int64 0
    parent: Void nothing
  dims: Tuple{Int64}
    1: Int64 10


In [42]:
dump(z)

Knet.KnetArray{Int64,1}
  ptr: Knet.KnetPtr
    ptr: Ptr{Void} Ptr{Void} @0x000001020d600e00
    len: Int64 80
    dev: Int64 0
    parent: Void nothing
  dims: Tuple{Int64}
    1: Int64 10


In [31]:
using MappedArrays

In [32]:
M = reshape([1:12;], 3, 4)

3×4 Array{Int64,2}:
 1  4  7  10
 2  5  8  11
 3  6  9  12

In [33]:
M2 = mappedarray(√, M) # lazy: square roots are computed on access, no Float64 array is stored

3×4 MappedArrays.ReadonlyMappedArray{Float64,2,Array{Int64,2},Base.#sqrt}:
 1.0      2.0      2.64575  3.16228
 1.41421  2.23607  2.82843  3.31662
 1.73205  2.44949  3.0      3.4641 

In [34]:
dump(M2)

MappedArrays.ReadonlyMappedArray{Float64,2,Array{Int64,2},Base.#sqrt}
  f: sqrt (function of type Base.#sqrt)
  data: Array{Int64}((3, 4)) [1 4 7 10; 2 5 8 11; 3 6 9 12]


In [35]:
M2gpu = KnetArray(M2)

Knet.KnetArray{Float64,2}(Knet.KnetPtr(Ptr{Void} @0x000001020d600000, 96, 0, nothing), (3, 4))

In [36]:
display(M2gpu)

3×4 Knet.KnetArray{Float64,2}:
 1.0      2.0      2.64575  3.16228
 1.41421  2.23607  2.82843  3.31662
 1.73205  2.44949  3.0      3.4641 

In [45]:
x = 1:2;
y = 1:12;
dump(x)
dump(y)

UnitRange{Int64}
  start: Int64 1
  stop: Int64 2
UnitRange{Int64}
  start: Int64 1
  stop: Int64 12


In [46]:
xx = KnetArray(x)
yy = KnetArray(y)
dump(xx)
dump(yy)

Knet.KnetArray{Int64,1}
  ptr: Knet.KnetPtr
    ptr: Ptr{Void} Ptr{Void} @0x000001020d601000
    len: Int64 16
    dev: Int64 0
    parent: Void nothing
  dims: Tuple{Int64}
    1: Int64 2
Knet.KnetArray{Int64,1}
  ptr: Knet.KnetPtr
    ptr: Ptr{Void} Ptr{Void} @0x000001020d601200
    len: Int64 96
    dev: Int64 0
    parent: Void nothing
  dims: Tuple{Int64}
    1: Int64 12


In [37]:
dump(M2gpu)

Knet.KnetArray{Float64,2}
  ptr: Knet.KnetPtr
    ptr: Ptr{Void} Ptr{Void} @0x000001020d600000
    len: Int64 96
    dev: Int64 0
    parent: Void nothing
  dims: Tuple{Int64,Int64}
    1: Int64 3
    2: Int64 4


In [43]:
square(x) = x^2

M3 = mappedarray((√, square), M)

3×4 MappedArrays.MappedArray{Float64,2,Array{Int64,2},Base.#sqrt,#square}:
 1.0      2.0      2.64575  3.16228
 1.41421  2.23607  2.82843  3.31662
 1.73205  2.44949  3.0      3.4641 

In [18]:
x = KnetArray(randn(3, 4));
y = KnetArray(randn(3, 4));

In [19]:
display(x .+ y)

3×4 Knet.KnetArray{Float64,2}:
 -2.31208     0.143274   -3.45468   2.88255 
 -0.144817   -1.75026    -1.06113  -0.295943
  0.0786531  -0.0403134  -1.56271  -1.86218 

In [20]:
display(x .* y)

3×4 Knet.KnetArray{Float64,2}:
  1.29937     -0.467873   2.90532    1.88425 
 -1.71794      0.0520587  0.280033  -1.72752 
 -0.00747342  -0.115824   0.474897   0.710209

In [21]:
display(exp.(x))

3×4 Knet.KnetArray{Float64,2}:
 0.259621  2.13697   0.134351  6.55758 
 3.45662   0.970187  0.566171  0.229782
 0.945873  0.696921  0.316755  0.585546

We can also grab a matrix's transpose to compute a proper matrix-matrix product:

In [22]:
display(x * y')

3×3 Knet.KnetArray{Float64,2}:
  5.62106   3.75899   -1.60386
 -1.82654  -3.11336    2.34333
  1.40378   0.635362   1.06181
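
The difference between the element-wise product `.*` and the matrix product `*` is easier to verify by hand on small integer matrices. This sketch uses CPU arrays with illustrative names `A` and `B`.

```julia
A = [1 2; 3 4]
B = [5 6; 7 8]

# Element-wise product: multiplies matching entries.
println(A .* B == [5 12; 21 32])    # true

# Matrix product with the transpose of B: rows of A times rows of B.
println(A * B' == [17 23; 39 53])   # true
```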

We'll explain these operations and present even more operators in the [linear algebra](P01-C03-linear-algebra.ipynb) chapter. But for now, we'll stick with the mechanics of working with Arrays/KnetArrays.

## (i) Garbage collection & in-place operations

In the previous examples, every time we ran an operation, we allocated new memory to hold its result. For example, if we write `y = y + x`, the name `y` is rebound: it no longer refers to the array it used to point to and instead points at newly allocated memory. In the following example we demonstrate this with the `.ptr` field, which shows the address of the underlying memory. After running `y = y + x`, we'll find that `y.ptr` points to a different location. That's because Knet first evaluates `y + x`, allocating new memory for the result, and then redirects `y` to point at this new location.

In [24]:
print("ptr y: ", y.ptr, "\n")
y = y + x
print("ptr y: ", y.ptr, "\n")

ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020d601e00, 96, 0, nothing)
ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020d602000, 96, 0, nothing)


This might be undesirable for two reasons. First, we don't want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we'll want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we don't update in place, the old array stays alive for as long as anything still refers to it, and we could inadvertently reference stale parameters through those other variables.
Fortunately, performing in-place operations in Knet is easy. We can assign the result of an operation to a previously allocated array with slice notation, e.g., `y[:] = <expression>`.

In [25]:
print("ptr y: ", y.ptr, "\n")
y[:] = y + x
print("ptr y: ", y.ptr, "\n")

ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020d602000, 96, 0, nothing)
ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020d602000, 96, 0, nothing)
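
The same rebinding versus in-place distinction can be reproduced on the CPU. The sketch below uses plain `Array`s and Base's `pointer` in place of the `.ptr` field (an assumption made so it runs without a GPU); `w` is an illustrative second name for the same array.

```julia
x = ones(3, 4)
y = ones(3, 4)
w = y                      # a second name for the same array

p0 = pointer(y)
y = y + x                  # allocates a new array and rebinds `y`
println(pointer(y) == p0)  # false: `y` now refers to new memory
println(w[1, 1])           # 1.0: `w` still sees the old, stale values

p1 = pointer(y)
y[:] = y + x               # writes the result into the existing array
println(pointer(y) == p1)  # true: same memory
y .+= x                    # fused in-place update, no temporary for `y + x`
println(pointer(y) == p1)  # true
println(y[1, 1])           # 4.0
```

Note that `y[:] = y + x` still allocates a temporary for `y + x` before copying it into `y`, whereas the dotted form `y .+= x` updates the elements directly.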


Knet models do not overwrite arrays that need to be preserved for gradient calculation. This leads to a lot of allocation, and regular GPU memory allocation is prohibitively slow. Fortunately, most models use identically sized arrays over and over again, so we can minimize the number of actual allocations by reusing preallocated but garbage collected pointers.

When Julia's garbage collector reclaims a KnetArray, a special finalizer keeps its pointer in a table instead of releasing the memory. If an array with the same size in bytes is later requested, the same pointer is reused. The exact algorithm for allocation is:

1. Try to find a previously allocated and garbage collected pointer in the current device. (0.5 μs)
2. If not available, try to allocate a new array using cudaMalloc. (10 μs)
3. If not successful, try running gc() and see if we get a pointer of the right size. (75 ms, but this should be amortized over all reusable pointers that become available due to the gc)
4. Finally, if all else fails, clean up all saved pointers in the current device using cudaFree and try allocation one last time. (25-70 ms, however this causes the elimination of all reusable pointers)
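
The first two steps of this scheme can be illustrated with a toy, CPU-only model. This is a hypothetical sketch, not Knet's actual implementation: `BufferPool`, `alloc!` and `release!` are invented names, byte vectors stand in for GPU pointers, and `release!` plays the role of the finalizer.

```julia
# Hypothetical, simplified model of a size-keyed buffer pool (CPU only).
struct BufferPool
    free::Dict{Int,Vector{Vector{UInt8}}}   # byte size => reusable buffers
end
BufferPool() = BufferPool(Dict{Int,Vector{Vector{UInt8}}}())

# Step 1: reuse a released buffer of the same byte size if one exists.
# Step 2: otherwise allocate a fresh one.
function alloc!(pool::BufferPool, nbytes::Int)
    bufs = get(pool.free, nbytes, nothing)
    if bufs !== nothing && !isempty(bufs)
        return pop!(bufs)
    end
    return zeros(UInt8, nbytes)
end

# Stand-in for the finalizer: keep the buffer in the table instead of freeing it.
function release!(pool::BufferPool, buf::Vector{UInt8})
    push!(get!(pool.free, length(buf), Vector{UInt8}[]), buf)
    return nothing
end

pool = BufferPool()
a = alloc!(pool, 1152)   # fresh allocation
release!(pool, a)        # pretend `a` was garbage collected
b = alloc!(pool, 1152)   # same size in bytes: the buffer is reused
c = alloc!(pool, 96)     # different size: fresh allocation
println(a === b)         # true
println(a === c)         # false
```

Keying the table by size in bytes, as the text describes, means a released 9x16 `Float64` buffer can serve any later request for 1152 bytes regardless of shape.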
   
For example, if we create new arrays x and y, Knet will first try to find a previously allocated and garbage collected pointer in the current device (in this case GPU 0). Since these arrays are larger than anything allocated so far (9 x 16 x 8 = 1152 bytes each), no pointer of the right size is available, so Knet allocates new memory using cudaMalloc. 

In [26]:
x = KnetArray(randn(9, 16));
y = KnetArray(randn(9, 16));

As expected, after `y = y + x` the pointer of `y` changes, while the pointer of `x` stays the same. 

In [27]:
print("ptr x: ", x.ptr, "\n")
print("ptr y: ", y.ptr, "\n")
y = y + x
print("ptr x: ", x.ptr, "\n")
print("ptr y: ", y.ptr, "\n")

ptr x: Knet.KnetPtr(Ptr{Void} @0x000001020dc00000, 1152, 0, nothing)
ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020dc00600, 1152, 0, nothing)
ptr x: Knet.KnetPtr(Ptr{Void} @0x000001020dc00000, 1152, 0, nothing)
ptr y: Knet.KnetPtr(Ptr{Void} @0x000001020dc00c00, 1152, 0, nothing)


We can manually run garbage collection with `gc()` to ensure the unused pointer is collected. When we then request `z`, Knet finds this pointer, sees that it has the requested size, and reuses it. Note below that this is indeed the same pointer that was previously assigned to `y`. 

In [28]:
gc()
z = KnetArray(randn(9, 16));
print("ptr z: ", z.ptr, "\n")

ptr z: Knet.KnetPtr(Ptr{Void} @0x000001020dc00600, 1152, 0, nothing)


## (ii) Slicing
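KnetArrays support slicing in all the ridiculous ways you might imagine accessing your data. Here's an example of reading the first three elements of `x` with a single range index. Julia stores arrays in column-major order, so these are the entries of the first column.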

In [30]:
x = KnetArray(randn(3, 4));
display(x[1:3])

3-element Knet.KnetArray{Float64,1}:
 -1.56058 
 -1.39425 
  0.131417
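
Because a single range index walks through the array in column-major order, it is worth contrasting it with row selection. A minimal sketch on a CPU array, reusing the 3x4 integer matrix `M` from earlier in this chapter:

```julia
M = reshape(collect(1:12), 3, 4)
# 3x4 matrix:  1  4  7  10
#              2  5  8  11
#              3  6  9  12

println(M[1:3] == [1, 2, 3])                 # true: linear indexing walks down column 1
println(M[1:3] == M[:, 1])                   # true
println(M[2:3, :] == [2 5 8 11; 3 6 9 12])   # true: rows 2 and 3
println(M[5])                                # 5: the fifth element, i.e. M[2, 2]
```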

Now let's try writing to a specific element.

In [31]:
x[1,2] = 9.0

9.0

In [32]:
display(x)

3×4 Knet.KnetArray{Float64,2}:
 -1.56058    9.0        -1.27235   -0.613233 
 -1.39425   -0.0608202   1.94879    0.0166144
  0.131417   0.746144   -0.562271   2.08423  

Multi-dimensional slicing is also supported.

In [34]:
display(x[1:2, 1:3])

2×3 Knet.KnetArray{Float64,2}:
 -1.56058   9.0        -1.27235
 -1.39425  -0.0608202   1.94879

In [35]:
x[1:2, 1:3] = 5

5

In [36]:
display(x)

3×4 Knet.KnetArray{Float64,2}:
 5.0       5.0        5.0       -0.613233 
 5.0       5.0        5.0        0.0166144
 0.131417  0.746144  -0.562271   2.08423  
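
On plain CPU `Array`s, range indexing returns a copy, and `view` is the way to get a slice that shares memory. The sketch below illustrates that difference and the broadcast assignment `.=`, which newer Julia releases require where the notebook writes `x[1:2, 1:3] = 5`. It uses only Base functions; the variable names are illustrative.

```julia
a = zeros(3, 4)

s = a[1:2, 1:3]          # on a CPU Array, indexing with ranges copies
s[1, 1] = 1.0
println(a[1, 1])         # 0.0: writing to the copy does not affect `a`

v = view(a, 1:2, 1:3)    # a view shares memory with `a`
v[1, 1] = 1.0
println(a[1, 1])         # 1.0

a[1:2, 1:3] .= 5.0       # broadcast assignment fills the 2x3 block in place
println(sum(a))          # 30.0
```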

## (iii) Broadcasting

You might wonder what happens if you add a vector `y` to a matrix `X`. These operations, where we combine a low-dimensional array `y` with a high-dimensional array `X`, invoke a functionality called [broadcasting](https://docs.julialang.org/en/stable/manual/functions/#man-vectorized-1). Here, the low-dimensional array is duplicated along any axis with dimension $1$ to match the shape of the high-dimensional array. A vector behaves like a single column, so it is repeated across the columns of the matrix. Broadcasting operators supported include: (.*), (.+), (.-), (./), (.<), (.<=), (.!=), (.==), (.>), (.>=), (.^), max, min. (Boolean operators generate outputs with the same type as their inputs; there is no support for KnetArray{Bool}.) Consider the following example.

In [46]:
x = KnetArray(ones(3,3))
y = KnetArray(0:2.);
display(x), display(y);

3×3 Knet.KnetArray{Float64,2}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

3-element Knet.KnetArray{Float64,1}:
 0.0
 1.0
 2.0

In [47]:
display(x  .+ y)

3×3 Knet.KnetArray{Float64,2}:
 1.0  1.0  1.0
 2.0  2.0  2.0
 3.0  3.0  3.0
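
The direction of the duplication depends on which axis has size 1. The following CPU-only sketch repeats the example above with plain `Array`s and adds a 1x3 row version `r` (an illustrative name) for comparison.

```julia
X = ones(3, 3)
y = collect(0:2.0)       # 3-element vector, treated as a column
r = reshape(y, 1, 3)     # the same values as a 1x3 row

println(X .+ y == [1 1 1; 2 2 2; 3 3 3])   # true: y is repeated across columns
println(X .+ r == [1 2 3; 1 2 3; 1 2 3])   # true: r is repeated down rows
println(y .+ r == [0 1 2; 1 2 3; 2 3 4])   # true: both are expanded to 3x3
```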

## Converting from KnetArray to Array
Converting KnetArray to and from Array is easy. The converted arrays do not share memory:

In [49]:
b = Array(x)

3×3 Array{Float64,2}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

In [50]:
c = KnetArray(b)
display(c)

3×3 Knet.KnetArray{Float64,2}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

## Next
[Linear algebra](../chapter01_crashcourse/linear-algebra.ipynb)

For whinges or inquiries, [open an issue on  GitHub.](https://github.com/moralesq/Knet-the-Julia-dope)