Python for Data Analysis Part-6

File Input and Output with Arrays

NumPy can save data to disk and load it back in either text or binary format. In this section I only discuss NumPy's built-in binary format, since most users will prefer pandas and other tools for loading text or tabular data. The examples assume NumPy has been imported with import numpy as np.

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [1]: arr = np.arange(10)
In [2]: np.save('some_array', arr)

If the file path does not already end in .npy, the extension will be appended. The array on disk can then be loaded with np.load:

In [3]: np.load('some_array.npy')
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
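The round trip can be checked in a small standalone script. This sketch writes into a throwaway directory from Python's tempfile module (an assumption for illustration, not part of the original session), confirms that np.save appended the .npy extension, and confirms that np.load returns an equal array with the same dtype.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    # The path has no extension, so np.save appends ".npy"
    base = os.path.join(tmpdir, "some_array")
    np.save(base, arr)
    saved_files = sorted(os.listdir(tmpdir))

    # np.load reads the whole array into memory and closes the file
    loaded = np.load(base + ".npy")

print(saved_files)   # ['some_array.npy']
print(loaded)        # [0 1 2 3 4 5 6 7 8 9]
print(loaded.dtype == arr.dtype)  # True
```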

You can save multiple arrays in an uncompressed archive using np.savez, passing the arrays as keyword arguments:

In [4]: np.savez('array_archive.npz', a=arr, b=arr)

When loading an .npz file, you get back a dict-like object that loads the individual arrays lazily:

In [5]: arch = np.load('array_archive.npz')
In [6]: arch['b']
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
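The dict-like object returned for an .npz file records the keyword names in its files attribute and can be used as a context manager, which closes the archive when the block ends. This sketch writes the archive to an in-memory io.BytesIO buffer instead of a file (an assumption to keep the example free of disk side effects); np.savez and np.load both accept file-like objects.

```python
import io

import numpy as np

arr = np.arange(10)

# Write the archive into memory rather than to a file on disk
buf = io.BytesIO()
np.savez(buf, a=arr, b=arr * 2)
buf.seek(0)

with np.load(buf) as arch:
    names = arch.files   # the keyword names passed to np.savez
    b = arch["b"]        # an array is read only when it is accessed

print(names)  # ['a', 'b']
print(b)      # [ 0  2  4  6  8 10 12 14 16 18]
```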

If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [7]: np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)
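Whether compression pays off depends on the data. This sketch compares the archive sizes for a highly repetitive array of zeros (a deliberately favorable, made-up case) using in-memory buffers; random floating-point data would shrink far less, so the exact sizes printed are not meaningful beyond the comparison.

```python
import io

import numpy as np

# Repetitive data compresses well
data = np.zeros(100_000)

plain = io.BytesIO()
np.savez(plain, a=data)

packed = io.BytesIO()
np.savez_compressed(packed, a=data)

plain_size = len(plain.getvalue())
packed_size = len(packed.getvalue())
print(plain_size, packed_size)

# The compressed archive loads back to the same array
packed.seek(0)
with np.load(packed) as arch:
    restored = arch["a"]
```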

Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. Unlike in some languages such as MATLAB, multiplying two two-dimensional arrays with * in NumPy produces an element-wise product rather than a matrix product. For matrix multiplication, NumPy provides dot, available both as an array method and as a function in the numpy namespace:

In [8]: x = np.array([[1., 2., 3.], [4., 5., 6.]])
In [9]: y = np.array([[6., 23.], [-1, 7], [8, 9]])
In [10]: x
Output:
array([[ 1., 2., 3.],
       [ 4., 5., 6.]])
In [11]: y
Output:
array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])
In [12]: x.dot(y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x, y):

In [13]: np.dot(x, y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])
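To make the contrast between * and dot concrete, this sketch uses two made-up 2x2 matrices, then reuses x and y from above: because x is 2x3 and y is 3x2, their shapes cannot be broadcast together, so the element-wise product raises a ValueError while dot succeeds.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 6.], [7., 8.]])

elementwise = a * b      # multiplies matching entries
matrix_prod = a.dot(b)   # row-by-column matrix product

print(elementwise)
# [[ 5. 12.]
#  [21. 32.]]
print(matrix_prod)
# [[19. 22.]
#  [43. 50.]]

# With the non-square x (2x3) and y (3x2), * cannot broadcast
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
try:
    x * y
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("element-wise product failed:", err)
```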

A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:

In [14]: np.dot(x, np.ones(3))
Output: array([ 6., 15.])

The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:

In [15]: x @ np.ones(3)
Output: array([ 6., 15.])
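For two-dimensional inputs, np.dot, np.matmul, and the @ operator give the same result, as this short check shows. They are not interchangeable in general: np.matmul (and therefore @) treats arrays with more than two dimensions as stacks of matrices and does not accept scalars, while np.dot follows different rules for higher-dimensional inputs.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
v = np.ones(3)

via_dot = np.dot(x, v)
via_matmul = np.matmul(x, v)
via_at = x @ v

# A (2, 3) matrix times a length-3 vector gives a length-2 vector
print(via_at, via_at.shape)  # [ 6. 15.] (2,)
```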

numpy.linalg has a standard set of matrix decompositions, along with functions for computing things like the inverse and the determinant. Under the hood, these are implemented with the same industry-standard linear algebra libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):

In [16]: from numpy.linalg import inv, qr
In [17]: X = np.random.randn(5, 5)
In [18]: mat = X.T.dot(X)
In [19]: inv(mat)
Output: array([[ 10.98129066, -15.92038594,  15.72674408,  21.1310146 ,
         -6.51087108],
       [-15.92038594,  30.06284502, -24.84925283, -37.06739528,
         11.16662613],
       [ 15.72674408, -24.84925283,  23.88020665,  33.07205801,
        -10.13130535],
       [ 21.1310146 , -37.06739528,  33.07205801,  48.09766757,
        -14.48628627],
       [ -6.51087108,  11.16662613, -10.13130535, -14.48628627,
          4.51183536]])
In [20]: mat.dot(inv(mat))
Output: array([[ 1.00000000e+00,  4.63516236e-15, -7.90247705e-16,
        -2.52654829e-14,  3.51039567e-15],
       [ 2.37423690e-15,  1.00000000e+00, -4.97047729e-16,
        -2.92387271e-15,  3.31878540e-16],
       [-2.03815975e-14, -8.25999899e-16,  1.00000000e+00,
         1.28952488e-14,  3.55561725e-16],
       [-3.09793540e-16,  1.72968381e-14, -1.04505675e-14,
         1.00000000e+00,  2.08115082e-15],
       [ 1.56525172e-15,  1.46036808e-15, -1.44537894e-14,
         1.39818361e-14,  1.00000000e+00]])
In [21]: q, r = qr(mat)
In [22]: r
Output: array([[-5.57483748, -2.84807903,  9.20903703, -4.15877657,  6.39223007],
       [ 0.        , -1.23458268,  1.32555702, -0.99111824,  2.92884267],
       [ 0.        ,  0.        , -1.02302314, -1.18714977, -6.26127069],
       [ 0.        ,  0.        ,  0.        , -1.34927574, -4.44960019],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.04472416]])

The expression X.T.dot(X) computes the matrix product of the transpose X.T with X itself.
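The results above can also be checked programmatically. The tiny off-diagonal values such as 4.6e-15 in mat.dot(inv(mat)) are floating-point rounding error, so this sketch compares against the identity with np.allclose rather than ==. It swaps in the newer np.random.default_rng generator with a fixed seed (an assumption for reproducibility), so its numbers will not match the session above. It also checks that each entry of X.T @ X is a column-by-column dot product, which makes the result symmetric, and that qr returns an orthonormal q and an upper-triangular r whose product is mat.

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)   # seeded so the checks are reproducible
X = rng.standard_normal((5, 5))
mat = X.T.dot(X)

# Entry (i, j) of X.T @ X is the dot product of column i with column j,
# so swapping i and j gives the same value and mat is symmetric
entry_ok = np.isclose(mat[0, 2], X[:, 0] @ X[:, 2])
symmetric_ok = np.allclose(mat, mat.T)

# mat @ inv(mat) equals the identity only up to rounding error
inverse_ok = np.allclose(mat @ inv(mat), np.eye(5))

# q has orthonormal columns, r is upper triangular, and q @ r rebuilds mat
q, r = qr(mat)
qr_ok = (np.allclose(q.T @ q, np.eye(5))
         and np.allclose(r, np.triu(r))
         and np.allclose(q @ r, mat))

print(entry_ok, symmetric_ok, inverse_ok, qr_ok)
```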

Commonly used numpy.linalg functions

Function   Description
diag       Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal
dot        Matrix multiplication
trace      Compute the sum of the diagonal elements
det        Compute the matrix determinant
eig        Compute the eigenvalues and eigenvectors of a square matrix
inv        Compute the inverse of a square matrix
pinv       Compute the Moore-Penrose pseudo-inverse of a matrix
qr         Compute the QR decomposition
svd        Compute the singular value decomposition (SVD)
solve      Solve the linear system Ax = b for x, where A is a square matrix
lstsq      Compute the least-squares solution to Ax = b
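Several of these functions can be exercised on a small, made-up 2x2 system whose answers are easy to verify by hand: 3*2 + 1*3 = 9 and 1*2 + 2*3 = 8, so x = [2, 3] solves Ax = b. In this sketch diag and trace are called from the top-level numpy namespace, and the lstsq example fits an intercept and slope to three points that lie exactly on y = 1 + 2t (also made up for illustration).

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])

# solve: the exact solution of A @ x = b for a square, non-singular A
x = np.linalg.solve(A, b)           # [2. 3.]

# det and trace: 3*2 - 1*1 = 5 and 3 + 2 = 5
det = np.linalg.det(A)
tr = np.trace(A)

# diag works in both directions
d = np.diag(A)                      # [3. 2.]
D = np.diag(d)                      # [[3. 0.] [0. 2.]]

# eig: column j of vecs is an eigenvector for eigenvalue vals[j]
vals, vecs = np.linalg.eig(A)
eig_ok = np.allclose(A @ vecs, vecs * vals)

# svd: A is rebuilt as U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)
svd_ok = np.allclose(U @ np.diag(s) @ Vt, A)

# pinv matches inv when the matrix is invertible
pinv_ok = np.allclose(np.linalg.pinv(A), np.linalg.inv(A))

# lstsq: fit y = c0 + c1*t to an overdetermined system (3 equations, 2 unknowns)
M = np.array([[1., 0.], [1., 1.], [1., 2.]])   # columns: constant, t
y = np.array([1., 3., 5.])
coef, residuals, rank, sv = np.linalg.lstsq(M, y, rcond=None)
```

Because every point lies on the line, the least-squares coefficients come out as approximately [1, 2] with a near-zero residual.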
