Python for Data Analysis Part-6

File Input and Output with Arrays

NumPy is able to save and load data to and from disk either in text or binary format. In this section I only discuss NumPy’s built-in binary format, since most users will prefer pandas and other tools for loading text or tabular data.

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [1]: arr = np.arange(10)
In [2]: np.save('some_array', arr)

If the file path does not already end in .npy, the extension will be appended. The array on disk can then be loaded with np.load:

In [3]: np.load('some_array.npy')
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
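The round trip above can be checked end to end with a short script. This is only a sketch: the temporary directory and the variable names saved_files and roundtrip_ok are my own additions for illustration, not part of the original example.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "some_array")  # note: no extension given
    np.save(path, arr)

    # np.save appended ".npy" to the path we passed in.
    saved_files = os.listdir(tmpdir)

    # Load the array back from the file that was actually written.
    loaded = np.load(path + ".npy")

# The loaded array should match the original in both values and dtype.
roundtrip_ok = np.array_equal(arr, loaded) and arr.dtype == loaded.dtype
print(saved_files, roundtrip_ok)
```

The .npy format stores the dtype and shape alongside the raw data, which is why the array comes back unchanged without any extra arguments to np.load.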

You can save multiple arrays in an uncompressed archive using np.savez and passing the arrays as keyword arguments:

In [4]: np.savez('array_archive.npz', a=arr, b=arr)

When loading an .npz file, you get back a dict-like object that loads the individual arrays lazily:

In [5]: arch = np.load('array_archive.npz')
In [6]: arch['b']
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
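If you do not remember which keyword names were used when the archive was written, the loaded object can tell you. The sketch below assumes a temporary directory and a second array (arr * 2) of my own choosing so the two entries are distinguishable. It uses the files attribute of the loaded archive and opens it in a with block so the underlying file is closed afterwards.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(path, a=arr, b=arr * 2)

    # The archive object supports the context-manager protocol,
    # which closes the underlying file when the block exits.
    with np.load(path) as arch:
        names = sorted(arch.files)  # keyword names used in np.savez
        b = arch["b"]               # the array is read only when indexed

print(names)
print(b)
```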

If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [7]: np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)
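To get a feel for what "compresses well" means, you can compare the file sizes produced by the two functions. The example below is a sketch with data I picked to be as compressible as possible (an array of zeros). Real data will usually compress far less dramatically.

```python
import os
import tempfile

import numpy as np

# Highly repetitive data: 100,000 float64 zeros (about 800 KB raw).
data = np.zeros(100_000)

with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "plain.npz")
    packed_path = os.path.join(tmpdir, "packed.npz")

    np.savez(plain_path, data=data)
    np.savez_compressed(packed_path, data=data)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Both archives are read back the same way with np.load.
    with np.load(packed_path) as arch:
        restored = arch["data"]

print(plain_size, packed_size)
```

Compression costs some CPU time when writing and reading, so it is a trade-off between disk space and speed.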

Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. Unlike in some languages such as MATLAB, multiplying two two-dimensional arrays with * gives an element-wise product rather than a matrix dot product. For matrix multiplication there is instead a function dot, available both as an array method and as a function in the numpy namespace:

In [8]: x = np.array([[1., 2., 3.], [4., 5., 6.]])
In [9]: y = np.array([[6., 23.], [-1, 7], [8, 9]])
In [10]: x
Output:
array([[1., 2., 3.],
       [4., 5., 6.]])
In [11]: y
Output:
array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])
In [12]: x.dot(y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x, y):

In [13]: np.dot(x, y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])

A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:

In [14]: np.dot(x, np.ones(3))
Output: array([ 6., 15.])

The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:

In [15]: x @ np.ones(3)
Output: array([ 6., 15.])
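Because * and @ are easy to mix up, it can help to see them side by side on the same pair of square arrays. The small matrices a and b below are my own illustrative values, chosen so the results are easy to verify by hand.

```python
import numpy as np

a = np.array([[1., 2.],
              [3., 4.]])
b = np.array([[5., 6.],
              [7., 8.]])

# Element-wise product: each entry is a[i, j] * b[i, j].
elementwise = a * b

# Matrix product: each entry is a row of a dotted with a column of b.
matrix = a @ b

print(elementwise)
print(matrix)

# The @ operator, the dot method, and np.dot agree for 2D arrays.
same_result = np.array_equal(matrix, a.dot(b)) and np.array_equal(matrix, np.dot(a, b))
```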

numpy.linalg has a standard set of matrix decompositions and operations such as matrix inverse and determinant. These are implemented under the hood via the same industry-standard linear algebra libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):

In [16]: from numpy.linalg import inv, qr
In [17]: X = np.random.randn(5, 5)
In [18]: mat = X.T.dot(X)
In [19]: inv(mat)
Output: array([[ 10.98129066, -15.92038594,  15.72674408,  21.1310146 ,
         -6.51087108],
       [-15.92038594,  30.06284502, -24.84925283, -37.06739528,
         11.16662613],
       [ 15.72674408, -24.84925283,  23.88020665,  33.07205801,
        -10.13130535],
       [ 21.1310146 , -37.06739528,  33.07205801,  48.09766757,
        -14.48628627],
       [ -6.51087108,  11.16662613, -10.13130535, -14.48628627,
          4.51183536]])
In [20]: mat.dot(inv(mat))
Output: array([[ 1.00000000e+00,  4.63516236e-15, -7.90247705e-16,
        -2.52654829e-14,  3.51039567e-15],
       [ 2.37423690e-15,  1.00000000e+00, -4.97047729e-16,
        -2.92387271e-15,  3.31878540e-16],
       [-2.03815975e-14, -8.25999899e-16,  1.00000000e+00,
         1.28952488e-14,  3.55561725e-16],
       [-3.09793540e-16,  1.72968381e-14, -1.04505675e-14,
         1.00000000e+00,  2.08115082e-15],
       [ 1.56525172e-15,  1.46036808e-15, -1.44537894e-14,
         1.39818361e-14,  1.00000000e+00]])
In [21]: q, r = qr(mat)
In [22]: r
Output: array([[-5.57483748, -2.84807903,  9.20903703, -4.15877657,  6.39223007],
       [ 0.        , -1.23458268,  1.32555702, -0.99111824,  2.92884267],
       [ 0.        ,  0.        , -1.02302314, -1.18714977, -6.26127069],
       [ 0.        ,  0.        ,  0.        , -1.34927574, -4.44960019],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.04472416]])

The expression X.T.dot(X) computes the matrix product of the transpose X.T with X.
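The printed numbers are hard to judge by eye, so here is a sketch of how the QR factors could be checked programmatically. It assumes a seeded generator from np.random.default_rng (my choice, so the run is repeatable, rather than the np.random.randn call used above) and uses np.allclose to compare floating-point results within a tolerance.

```python
import numpy as np
from numpy.linalg import qr

# Seeded generator so the example is repeatable (the seed is arbitrary).
rng = np.random.default_rng(12345)
X = rng.standard_normal((5, 5))
mat = X.T @ X

q, r = qr(mat)

# The factors should multiply back to the original matrix.
reconstructs = np.allclose(q @ r, mat)

# q should have orthonormal columns, so q.T @ q is the identity.
orthonormal = np.allclose(q.T @ q, np.eye(5))

# r should be upper triangular: nothing below the main diagonal.
upper_triangular = np.allclose(r, np.triu(r))

print(reconstructs, orthonormal, upper_triangular)
```

Exact equality checks with == would be too strict here, since results like the near-zero off-diagonal entries shown earlier are normal floating-point rounding error.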

Function	Description
diag	Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal
dot	Matrix multiplication
trace	Compute the sum of the diagonal elements
det	Compute the matrix determinant
eig	Compute the eigenvalues and eigenvectors of a square matrix
inv	Compute the inverse of a square matrix
pinv	Compute the Moore-Penrose pseudo-inverse of a matrix
qr	Compute the QR decomposition
svd	Compute the singular value decomposition (SVD)
solve	Solve the linear system Ax = b for x, where A is a square matrix
lstsq	Compute the least-squares solution to Ax = b

Commonly used numpy.linalg functions
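Two of the functions in the table, solve and lstsq, are not demonstrated above, so here is a minimal sketch of each. The small system and the four data points are made up for illustration. Passing rcond=None to lstsq is my choice to select the current default cutoff for small singular values.

```python
import numpy as np
from numpy.linalg import det, lstsq, solve

# Square system:  3x + y = 9  and  x + 2y = 8.
A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])

# A nonzero determinant means A is invertible and the solution is unique.
determinant = det(A)
x = solve(A, b)

# Overdetermined system: fit y = slope * t + intercept to four points
# that lie exactly on the line y = 2t + 1.
t = np.array([0., 1., 2., 3.])
y = np.array([1., 3., 5., 7.])
design = np.column_stack([t, np.ones_like(t)])  # columns: t, constant term

coef, residuals, rank, singular_values = lstsq(design, y, rcond=None)
slope, intercept = coef

print(x, slope, intercept)
```

For a square, well-conditioned system, solve(A, b) is generally preferable to computing inv(A) and multiplying, since it avoids forming the inverse explicitly.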
