## Universal Functions: Fast Element-Wise Array Functions

A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.
Many ufuncs are simple element-wise transformations, like sqrt or exp:

In [1]: arr = np.arange(10)
In [2]: arr
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [3]: np.sqrt(arr)
Output: array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
In[4]: np.exp(arr)
Output: array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
2.98095799e+03, 8.10308393e+03])



These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [5]: x = np.random.randn(8)
In [6]: y = np.random.randn(8)
In [7]: x
Output: array([-0.02290426, -0.08916755,  0.88118988, -0.52292579,  1.13871   ,
0.67017463,  0.88709011, -1.37270741])
In [8]: y
Output: array([ 0.45301008,  1.56853366,  1.10285681, -0.25368907,  0.05783312,
1.77564351, -0.88618735, -0.64202772])
In [9]: np.maximum(x, y)
Output: array([ 0.45301008,  1.56853366,  1.10285681, -0.25368907,  1.13871   ,
1.77564351,  0.88709011, -0.64202772])

numpy.maximum computed the element-wise maximum of the elements in x and y.
While not common, a ufunc can return multiple arrays. modf is one example, a vectorized version of the built-in Python divmod; it returns the fractional and integral parts of a floating-point array:

In [10]: arr = np.random.randn(7) * 5
In [11]: arr
Output: array([-3.06158748,  5.54673478, 12.32597205,  1.44032041,  7.36989174,
-1.54463045,  0.83924527])
In [12]: remainder, whole_part = np.modf(arr)
In [13]: remainder
Output: array([-0.06158748,  0.54673478,  0.32597205,  0.44032041,  0.36989174,
-0.54463045,  0.83924527])
In [14]: whole_part
Output: array([-3.,  5., 12.,  1.,  7., -1.,  0.])


Ufuncs accept an optional out argument that allows them to operate in-place on arrays:

In [15]: arr
Output: array([-3.06158748,  5.54673478, 12.32597205,  1.44032041,  7.36989174,
-1.54463045,  0.83924527])
In [16]: np.sqrt(arr)
Output: array([       nan, 2.35515069, 3.51083637, 1.2001335 , 2.71475445,
nan,         0  .91610331]
In [17]: np.sqrt(arr, arr)
Output: array([       nan, 2.35515069, 3.51083637, 1.2001335 , 2.71475445,
nan, 0.91610331])
In [18]: arr
Output: array([       nan, 2.35515069, 3.51083637, 1.2001335 , 2.71475445,
nan, 0.91610331])


## Array-Oriented Programming with Arrays

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions that might otherwise require writing loops. This practice of replacing explicit loops with array expressions is commonly referred to as vectorization. In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations.

As a simple example, suppose we wished to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two 1D arrays and produces two 2D matrices corresponding to all pairs of (x, y) in the two arrays:

In [19]: points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
In [20]: xs, ys = np.meshgrid(points, points)
In [21]: ys
Output:array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
[-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
[-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
...,
[ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
[ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
[ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

Now, evaluating the function is a matter of writing the same expression you would write with two points:

In [22]: z = np.sqrt(xs ** 2 + ys ** 2)
In [23]: z
Output: array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
7.06400028],
[7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815, 7.05692568],
[7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354, 7.04985815],
...,
[7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603, 7.04279774],
[7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354, 7.04985815],
[7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815, 7.05692568]])

I use matplotlib to create visualizations of this two dimensional array:

In [24]: import matplotlib.pyplot as plt
In [25]: plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
Output: <matplotlib.colorbar.Colorbar at 0x2aa2ce16280>

## Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y. Suppose we had a boolean array and two arrays of values:

In [26]: xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
In [27]: yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
In [28]: cond = np.array([True, False, True, True, False])

Suppose we wanted to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like:

In [29]: result = [(x if c else y)
for x, y, c in zip(xarr, yarr, cond)]
In [30]: result
Output: [1.1, 2.2, 1.3, 1.4, 2.5]

This has multiple problems. First, it will not be very fast for large arrays (because all the work is being done in interpreted Python code). Second, it will not work with multidimensional arrays. With np.where you can write this very concisely:

In [31]: result = np.where(cond, xarr, yarr)
In [32]: result
Output: array([ 1.1, 2.2, 1.3, 1.4, 2.5])

The second and third arguments to np.where don’t need to be arrays; one or both of them can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you had a matrix of randomly generated data and you wanted to replace all positive values with 2 and all negative values with –2. This is very easy to do with np.where:

In [33]: arr = np.random.randn(4, 4)
In [34]: arr
Output: array([[-0.44193858,  0.07157006,  2.2441608 , -0.37031764],
[-0.54374974, -2.62563459,  0.35334962,  1.45375499],
[-0.09094342,  0.79745038,  0.85707545, -0.6020405 ],
[-0.61344176, -0.92205991, -0.75273804, -1.69501996]])
In [35]: arr > 0
Output: array([[False,  True,  True, False],
[False, False,  True,  True],
[False,  True,  True, False],
[False, False, False, False]])
In [36]: np.where(arr > 0, 2, -2)
Output: array([[-2,  2,  2, -2],
[-2, -2,  2,  2],
[-2,  2,  2, -2],
[-2, -2, -2, -2]])

You can combine scalars and arrays when using np.where. For example, I can replace all positive values in arr with the constant 2 like so:

In [37]: np.where(arr > 0, 2, arr) # set only positive values to 2
Output: array([[-0.44193858,  2.        ,  2.        , -0.37031764],
[-0.54374974, -2.62563459,  2.        ,  2.        ],
[-0.09094342,  2.        ,  2.        , -0.6020405 ],
[-0.61344176, -0.92205991, -0.75273804, -1.69501996]])


The arrays passed to np.where can be more than just equal-sized arrays or scalars.