Python for Data Analysis Part-6

File Input and Output with Arrays

NumPy can save and load data to and from disk in either text or binary format. In this section I discuss only NumPy’s built-in binary format, since most users will prefer pandas and other tools for loading text or tabular data. The examples assume NumPy has been imported with import numpy as np.

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [1]: arr = np.arange(10)
In [2]: np.save('some_array', arr)

If the file path does not already end in .npy, the extension will be appended. The array on disk can then be loaded with np.load:

In [3]: np.load('some_array.npy')
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
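To make the round trip concrete, here is a minimal sketch that saves an array without an extension and loads it back. The temporary directory and the variable names (tmpdir, saved_files, loaded) are my own choices for illustration, not part of the original example.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "some_array")  # note: no extension given
    np.save(path, arr)

    # np.save appended ".npy" to the name we passed in.
    saved_files = os.listdir(tmpdir)

    loaded = np.load(path + ".npy")

# The round trip preserves values, dtype, and shape.
same_values = np.array_equal(arr, loaded)
same_dtype = arr.dtype == loaded.dtype
print(saved_files, same_values, same_dtype, loaded.shape)
```

Because the .npy format stores the dtype and shape alongside the raw data, nothing has to be specified again when loading.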

You can save multiple arrays in an uncompressed archive using np.savez and passing the arrays as keyword arguments:

In [4]: np.savez('array_archive.npz', a=arr, b=arr)

When loading an .npz file, you get back a dict-like object that loads the individual arrays lazily:

In [5]: arch = np.load('array_archive.npz')
In [6]: arch['b']
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
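The dict-like behavior can be explored a little further. The sketch below assumes you want to see which arrays an archive holds before reading any of them; the temporary directory and the second array (arr * 2) are illustrative choices of mine.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(path, a=arr, b=arr * 2)

    # The object returned for an .npz file keeps the file open, so a
    # with-block closes it once the arrays we need have been read.
    with np.load(path) as arch:
        names = sorted(arch.files)  # the keyword names passed to np.savez
        b = arch["b"]               # the array is read from disk here

print(names)
print(b)
```

Only the arrays you index are read, which can matter when an archive holds many large arrays and you need just one of them.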

If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [7]: np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)
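Whether compression pays off depends on the data. As a rough sketch, the comparison below uses an array of zeros, a deliberately repetitive and hypothetical dataset chosen so that the effect is easy to see; real data may compress far less.

```python
import os
import tempfile

import numpy as np

# Hypothetical, highly repetitive data chosen so compression has an effect.
data = np.zeros(100_000)

with tempfile.TemporaryDirectory() as tmpdir:
    plain = os.path.join(tmpdir, "plain.npz")
    packed = os.path.join(tmpdir, "packed.npz")

    np.savez(plain, data=data)
    np.savez_compressed(packed, data=data)

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # Loading works the same way for both kinds of archive.
    with np.load(packed) as arch:
        restored = arch["data"]

print(plain_size, packed_size)
```

The trade-off is extra CPU time when writing and reading, so the uncompressed form may still be preferable for data that does not compress well.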

Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. Unlike in some languages, such as MATLAB, multiplying two two-dimensional arrays with * gives an element-wise product rather than a matrix dot product. For matrix multiplication there is instead a function dot, available both as an array method and as a function in the numpy namespace:

In [8]: x = np.array([[1., 2., 3.], [4., 5., 6.]])
In [9]: y = np.array([[6., 23.], [-1, 7], [8, 9]])
In [10]: x
Output:
array([[1., 2., 3.],
       [4., 5., 6.]])
In [11]: y
Output:
array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])
In [12]: x.dot(y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x, y):

In [13]: np.dot(x, y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])
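To see the difference between * and dot side by side, here is a small sketch. The 2 × 2 arrays a and b are made-up values for illustration; x and y are the arrays from the example above.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 6.], [7., 8.]])

elementwise = a * b   # multiplies matching entries: [[5, 12], [21, 32]]
matrix = a.dot(b)     # rows of a times columns of b: [[19, 22], [43, 50]]

# With the (2, 3) and (3, 2) arrays from the text, the shapes are valid for
# a matrix product but cannot be broadcast for an element-wise product.
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

try:
    x * y
    star_worked = True
except ValueError:
    star_worked = False

print(elementwise)
print(matrix)
print(x.dot(y).shape, star_worked)
```

The matrix product only requires the inner dimensions to match, so a (2, 3) array times a (3, 2) array gives a (2, 2) result.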

A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:

In [14]: np.dot(x, np.ones(3))
Output: array([ 6., 15.])

The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:

In [15]: x @ np.ones(3)
Output: array([ 6., 15.])
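As a quick check that the operator and the function agree for two-dimensional arrays, the sketch below reuses x and y from earlier. The variable names via_operator, via_dot, and left_vec are mine.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

via_operator = x @ y
via_dot = np.dot(x, y)

# A one-dimensional array can also sit on the left, as long as its length
# matches the number of rows of the two-dimensional array.
left_vec = np.ones(2) @ x   # column sums of x: array([5., 7., 9.])

print(np.array_equal(via_operator, via_dot))
print(left_vec)
```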

numpy.linalg has a standard set of matrix decompositions, along with operations such as inverse and determinant. These are implemented under the hood via the same industry-standard linear algebra libraries used in other languages like MATLAB and R, such as BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):

In [16]: from numpy.linalg import inv, qr
In [17]: X = np.random.randn(5, 5)
In [18]: mat = X.T.dot(X)
In [19]: inv(mat)
Output: array([[ 10.98129066, -15.92038594,  15.72674408,  21.1310146 ,
         -6.51087108],
       [-15.92038594,  30.06284502, -24.84925283, -37.06739528,
         11.16662613],
       [ 15.72674408, -24.84925283,  23.88020665,  33.07205801,
        -10.13130535],
       [ 21.1310146 , -37.06739528,  33.07205801,  48.09766757,
        -14.48628627],
       [ -6.51087108,  11.16662613, -10.13130535, -14.48628627,
          4.51183536]])
In [20]: mat.dot(inv(mat))
Output: array([[ 1.00000000e+00,  4.63516236e-15, -7.90247705e-16,
        -2.52654829e-14,  3.51039567e-15],
       [ 2.37423690e-15,  1.00000000e+00, -4.97047729e-16,
        -2.92387271e-15,  3.31878540e-16],
       [-2.03815975e-14, -8.25999899e-16,  1.00000000e+00,
         1.28952488e-14,  3.55561725e-16],
       [-3.09793540e-16,  1.72968381e-14, -1.04505675e-14,
         1.00000000e+00,  2.08115082e-15],
       [ 1.56525172e-15,  1.46036808e-15, -1.44537894e-14,
         1.39818361e-14,  1.00000000e+00]])
In [21]: q, r = qr(mat)
In [22]: r
Output: array([[-5.57483748, -2.84807903,  9.20903703, -4.15877657,  6.39223007],
       [ 0.        , -1.23458268,  1.32555702, -0.99111824,  2.92884267],
       [ 0.        ,  0.        , -1.02302314, -1.18714977, -6.26127069],
       [ 0.        ,  0.        ,  0.        , -1.34927574, -4.44960019],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.04472416]])

The expression X.T.dot(X) computes the matrix product of the transpose X.T with X; the result here is a symmetric 5 × 5 matrix.
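Printed matrices full of values like 4.6e-15 are hard to judge by eye, so it can help to check the results numerically. The sketch below is one way to do that, assuming a seeded random generator (np.random.default_rng(0), my choice for repeatability) and a loose tolerance of 1e-6 for the inverse check; neither comes from the example above.

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)   # fixed seed: an assumption for repeatability
X = rng.standard_normal((5, 5))
mat = X.T.dot(X)

identity = np.eye(5)

# mat times its inverse should be the identity, up to floating-point error.
inverse_ok = np.allclose(mat.dot(inv(mat)), identity, atol=1e-6)

# QR factors mat into q (orthonormal columns) and r (upper triangular).
q, r = qr(mat)
reconstructs = np.allclose(q.dot(r), mat)
orthonormal = np.allclose(q.T.dot(q), identity)
upper_triangular = np.allclose(r, np.triu(r))

# X.T.dot(X) is symmetric by construction.
symmetric = np.allclose(mat, mat.T)

print(inverse_ok, reconstructs, orthonormal, upper_triangular, symmetric)
```

np.allclose compares within a tolerance, which is the appropriate test here, since exact equality rarely holds after floating-point matrix operations.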
