Python for Data Analysis Part-6

File Input and Output with Arrays

NumPy is able to save and load data to and from disk either in text or binary format. In this section I only discuss NumPy’s built-in binary format, since most users will prefer pandas and other tools for loading text or tabular data.

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension .npy:

In [1]: import numpy as np
   ...: arr = np.arange(10)
In [2]: np.save('some_array', arr)

If the file path does not already end in .npy, the extension will be appended. The array on disk can then be loaded with np.load:

In [3]: np.load('some_array.npy')
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can save multiple arrays in an uncompressed archive by calling np.savez and passing the arrays as keyword arguments:

In [4]: np.savez('array_archive.npz', a=arr, b=arr)

When loading an .npz file, you get back a dict-like object that loads the individual arrays lazily:

In [5]: arch = np.load('array_archive.npz')
In [6]: arch['b']
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
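To make the lazy, dict-like behavior concrete, here is a minimal sketch. The temporary directory and the variable names (names, b_loaded) are illustrative assumptions, not part of the original example. It lists the stored array names through the files attribute and closes the archive with a with-block:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# Write into a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(path, a=arr, b=arr * 2)

    # np.load returns a dict-like archive object; the with-block closes the file.
    with np.load(path) as arch:
        names = sorted(arch.files)  # names of the arrays stored in the archive
        b_loaded = arch["b"]        # the array is read from disk on access

print(names)
print(b_loaded)
```

Closing the archive explicitly matters mostly in long-running programs, where an open file handle would otherwise linger.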

If your data compresses well, you may wish to use numpy.savez_compressed instead:

In [7]: np.savez_compressed('arrays_compressed.npz', a=arr, b=arr)
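As a rough illustration of when compression pays off, the sketch below saves the same array with both functions and compares file sizes. The all-zeros array and the file names are assumptions chosen because such data compresses very well; real data may shrink far less.

```python
import os
import tempfile

import numpy as np

big = np.zeros(100_000)  # highly repetitive data, picked only for illustration

with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "plain.npz")
    packed_path = os.path.join(tmpdir, "packed.npz")

    np.savez(plain_path, big=big)              # uncompressed archive
    np.savez_compressed(packed_path, big=big)  # compressed archive

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Loading works the same way for both kinds of archive.
    with np.load(packed_path) as arch:
        restored = arch["big"]

print("uncompressed bytes:", plain_size)
print("compressed bytes:  ", packed_size)
```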

Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. Unlike in some languages such as MATLAB, multiplying two two-dimensional arrays with * gives an element-wise product rather than a matrix product. For matrix multiplication, NumPy provides dot, available both as an array method and as a function in the numpy namespace:

In [8]: x = np.array([[1., 2., 3.], [4., 5., 6.]])
In [9]: y = np.array([[6., 23.], [-1, 7], [8, 9]])
In [10]: x
Output:
array([[1., 2., 3.],
       [4., 5., 6.]])
In [11]: y
Output:
array([[ 6., 23.],
       [-1.,  7.],
       [ 8.,  9.]])
In [12]: x.dot(y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])

x.dot(y) is equivalent to np.dot(x, y):

In [13]: np.dot(x, y)
Output:
array([[ 28.,  64.],
       [ 67., 181.]])
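The difference between * and dot is easiest to see on two small square arrays, where both operations are defined. This is a minimal sketch; the arrays a and b are made up for illustration.

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[5., 6.], [7., 8.]])

elementwise = a * b     # multiplies matching entries: a[i, j] * b[i, j]
matrix_prod = a.dot(b)  # rows of a combined with columns of b

print(elementwise)
# [[ 5. 12.]
#  [21. 32.]]
print(matrix_prod)
# [[19. 22.]
#  [43. 50.]]
```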

A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:

In [14]: np.dot(x, np.ones(3))
Output: array([ 6., 15.])

The @ symbol (as of Python 3.5) also works as an infix operator that performs matrix multiplication:

In [15]: x @ np.ones(3)
Output: array([ 6., 15.])
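The sketch below checks that @ and np.dot agree for the arrays used above, and shows what "suitably sized" means in practice. The variable names (same_result, vec_shape, mismatch_raised) are illustrative assumptions.

```python
import numpy as np

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1., 7.], [8., 9.]])

# For two-dimensional arrays, @ and np.dot compute the same matrix product.
same_result = np.array_equal(x @ y, np.dot(x, y))

# A (2, 3) array times a length-3 vector gives a length-2 vector.
vec_shape = (x @ np.ones(3)).shape

# The inner dimensions must match; a length-2 vector does not fit a (2, 3) array.
try:
    x @ np.ones(2)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_result, vec_shape, mismatch_raised)
# True (2,) True
```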

numpy.linalg provides a standard set of matrix decompositions, along with operations such as inverse and determinant. Under the hood these are implemented with the same industry-standard linear algebra libraries used by other languages such as MATLAB and R: BLAS, LAPACK, or possibly (depending on your NumPy build) the proprietary Intel MKL (Math Kernel Library):

In [16]: from numpy.linalg import inv, qr
In [17]: X = np.random.randn(5, 5)
In [18]: mat = X.T.dot(X)
In [19]: inv(mat)
Output:
array([[ 10.98129066, -15.92038594,  15.72674408,  21.1310146 ,
         -6.51087108],
       [-15.92038594,  30.06284502, -24.84925283, -37.06739528,
         11.16662613],
       [ 15.72674408, -24.84925283,  23.88020665,  33.07205801,
        -10.13130535],
       [ 21.1310146 , -37.06739528,  33.07205801,  48.09766757,
        -14.48628627],
       [ -6.51087108,  11.16662613, -10.13130535, -14.48628627,
          4.51183536]])
In [20]: mat.dot(inv(mat))
Output:
array([[ 1.00000000e+00,  4.63516236e-15, -7.90247705e-16,
        -2.52654829e-14,  3.51039567e-15],
       [ 2.37423690e-15,  1.00000000e+00, -4.97047729e-16,
        -2.92387271e-15,  3.31878540e-16],
       [-2.03815975e-14, -8.25999899e-16,  1.00000000e+00,
         1.28952488e-14,  3.55561725e-16],
       [-3.09793540e-16,  1.72968381e-14, -1.04505675e-14,
         1.00000000e+00,  2.08115082e-15],
       [ 1.56525172e-15,  1.46036808e-15, -1.44537894e-14,
         1.39818361e-14,  1.00000000e+00]])
In [21]: q, r = qr(mat)
In [22]: r
Output:
array([[-5.57483748, -2.84807903,  9.20903703, -4.15877657,  6.39223007],
       [ 0.        , -1.23458268,  1.32555702, -0.99111824,  2.92884267],
       [ 0.        ,  0.        , -1.02302314, -1.18714977, -6.26127069],
       [ 0.        ,  0.        ,  0.        , -1.34927574, -4.44960019],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.04472416]])

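The expression X.T.dot(X) computes the matrix product of X's transpose, X.T, with X. The result is a symmetric matrix. Because X is random, your numbers will differ from those shown, and the off-diagonal entries of mat.dot(inv(mat)) are tiny nonzero values caused by floating-point rounding rather than exact zeros.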
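Since the printed values change on every run, a numerical check is often more useful than reading the output by eye. The following is a minimal sketch that assumes a seeded generator (np.random.default_rng(0), used here only for reproducibility instead of np.random.randn) and default np.allclose tolerances:

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
X = rng.standard_normal((5, 5))
mat = X.T @ X                    # transpose of X times X, a symmetric 5x5 matrix

# mat times its inverse should equal the identity up to rounding error.
inverse_ok = np.allclose(mat @ inv(mat), np.eye(5))

# QR factors mat into q (orthonormal columns) and r (upper triangular).
q, r = qr(mat)
reconstructs = np.allclose(q @ r, mat)
q_orthogonal = np.allclose(q.T @ q, np.eye(5))
r_upper = np.allclose(r, np.triu(r))

print(inverse_ok, reconstructs, q_orthogonal, r_upper)
```

np.allclose compares within a tolerance, which is the appropriate test here because exact equality rarely holds after floating-point arithmetic.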
