NumPy Functions Composed (2024)

Compare Fast Inverse Square Root Method to NumPy ufuncs, Numba JIT, and Cython — Which One Wins?


This article was originally published on medium.com.

Posted on behalf of: Bob Chesebrough, Solutions Architect, Intel Corporation

This article shows efficient variations in how we approach computing the reciprocal square root; these approaches can perform three to four orders of magnitude faster than older methods.

The approach you choose depends on your needs for accuracy, speed, reliability, and maintainability. See which one scores highest in your needs assessment!


What is reciprocal sqrt?

It is a function composed of two operations:

  1. Reciprocal (Multiplicative Inverse)
  2. Square root of a value or vector
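In plain Python, composing those two operations is one line (this tiny helper is my own illustration, not the article's benchmark code):

```python
import math

def recip_sqrt(x):
    # Reciprocal of the square root: 1 / sqrt(x)
    return 1.0 / math.sqrt(x)

print(recip_sqrt(4.0))  # 0.5
```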

Where is it used?

Physics/Engineering

  • Partial derivative of the distance formula with respect to one dimension, as follows:

∂d/∂x = x / √(x² + y² + z²), where d = √(x² + y² + z²)

Special Relativity Lorentz transformation

  • The gamma coefficient is used to compute time dilation or length contraction, among other calculations, in special relativity

γ = 1 / √(1 − v²/c²)
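The Lorentz factor is itself a reciprocal square root. A minimal sketch in plain Python (the function name lorentz_gamma is my own):

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz_gamma(v):
    # gamma = 1 / sqrt(1 - v^2 / c^2)
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# At 80% of the speed of light, gamma = 1 / sqrt(1 - 0.64) = 1/0.6
print(lorentz_gamma(0.8 * C))  # ≈ 1.6667
```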

3D Graphics or ML applications needing normalization in space

  • Vector normalization is widely used in video games and other calculations involved in 3D programming; this is why the Fast Reciprocal Sqrt was invented for Quake III
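Normalization is where the reciprocal square root does its work: multiplying each vector by 1/√(x² + y² + z²) scales it to unit length. A quick NumPy sketch (the helper name normalize is my own):

```python
import numpy as np

def normalize(vecs):
    # Scale each 3D row vector to unit length using a reciprocal sqrt.
    inv_norm = (vecs ** 2).sum(axis=1) ** -0.5
    return vecs * inv_norm[:, np.newaxis]

# The 3-4-5 triangle normalizes to the unit vector [0.6, 0.8, 0.0]
print(normalize(np.array([[3.0, 4.0, 0.0]])))
```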

This is an interesting topic, as it intersects the recent history of algorithms, hardware, and software library implementations. Early methods for computing this quickly, even as an approximation, outstripped the instruction sets of the x86 architectures of the day. Two Berkeley researchers, William Kahan and K.C. Ng, wrote an unpublished paper in May 1986 describing how to calculate the square root using bit-fiddling techniques followed by Newton iterations. This was picked up by Cleve Moler and company of MATLAB* fame; Cleve's associate Greg Walsh devised the now-famous constant and the fast inverse square root algorithm. Read the fascinating history on Wikipedia.

Search the web today and you will still find many articles advising the use of this algorithm. It is an ingenious application of the Newton-Raphson method to obtain fast approximations of 1/sqrt(x).
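For reference, the well-known Quake III trick can be sketched in NumPy: reinterpret the float32 bits as an integer, subtract half of them from the famous magic constant, reinterpret back, then refine with one Newton-Raphson step. This is a generic reconstruction of the published algorithm, not the article's exact benchmark code:

```python
import numpy as np

def fast_rsqrt(a):
    # Quake III-style fast inverse square root on a float32 array.
    x = np.asarray(a, dtype=np.float32)
    i = x.view(np.int32)                  # reinterpret float bits as ints
    i = np.int32(0x5F3759DF) - (i >> 1)   # the famous "magic constant" step
    y = i.view(np.float32)                # reinterpret back as floats
    return y * (1.5 - 0.5 * x * y * y)    # one Newton-Raphson refinement
```

With the single refinement step, the result lands within roughly 0.2% of the true value, consistent with the "about 1%" accuracy discussed below.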

Then, with the introduction of the Pentium III in 1999, Intel shipped a new Streaming SIMD Extensions (SSE) instruction that computes an approximate reciprocal square root as part of the SSE instruction set: a vectorized instruction!

What is the best algorithm now?

That all depends on perspective. The Fast Reciprocal Sqrt is a clever trick, and from what I see its accuracy is within about 1% of the actual value. SSE and Advanced Vector Extensions (AVX) provide vectorized instructions based on the IEEE floating-point standard, but these are also approximations. Their accuracy depends on the data type you choose but is typically far better than 1%. When dealing with floating-point calculations such as these, the order of operations and how you accumulate partial sums matter. So let me provide the code and a little discussion of what I found, and you be the judge!

What do I test in this notebook? (See the GitHub link at the end.)

In this notebook, we will test the following approaches:

PyTorch rsqrt:

  • Use the built-in torch function rsqrt()

NumPy_Compose RecipSqrt

  • Use NumPy np.reciprocal(np.sqrt())

NumPy_Simple

  • Use NumPy implicit vectorization: b = a**-.5

Cython_Simple

  • Use Cython variant of Simple a**-.5

Numba_Simple

  • Use Numba njit variant of Simple a**-.5

BruteForceLoopExact

  • Brute-force loop approach, with no vectorization at all

Fast Reciprocal Sqrt Newton-Raphson Simple Loop

  • Fast Reciprocal Sqrt using the Newton-Raphson and Quake III approach in a simple loop

Fast Reciprocal Sqrt Newton-Raphson Vectorized

  • Fast Reciprocal Sqrt using the Newton-Raphson and Quake III approach, vectorized with np.apply

Fast Reciprocal Sqrt Newton-Raphson Cython

  • Fast Reciprocal Sqrt using the Newton-Raphson and Quake III approach in Cython
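The NumPy-based variants from the list above can be sketched as follows. The function names, array size, and timing harness here are my own assumptions for illustration; the article's actual benchmark code is in the linked notebook:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1_000_000, dtype=np.float32) + 0.01  # avoid zeros

def recip_sqrt_composed(a):            # NumPy_Compose RecipSqrt
    return np.reciprocal(np.sqrt(a))

def recip_sqrt_simple(a):              # NumPy_Simple
    return a ** -0.5

def recip_sqrt_loop(a):                # BruteForceLoopExact
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = 1.0 / (a[i] ** 0.5)
    return out

# Time only the vectorized variants; the brute-force loop is orders
# of magnitude slower and would dominate the run.
for fn in (recip_sqrt_composed, recip_sqrt_simple):
    t0 = time.perf_counter()
    fn(a)
    print(f"{fn.__name__}: {time.perf_counter() - t0:.4f} s")
```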

Where did the results finish?

It depends on the platform.

On the new Intel® Tiber™ Developer Cloud running Ubuntu 22, I see the following (Intel® Xeon® Platinum 8480L, 224 cores, 503 GB RAM):

[Chart: Testing various algorithms and optimizations for inverse square root]

My results tend to align with the observations in Doug Woo's article, and I see orders-of-magnitude speedups using built-in functions in NumPy and PyTorch.

The code for this article and the rest of the series is located on GitHub.

Next Steps

Try out this code sample using the standard free Intel Developer Cloud account and the ready-made Jupyter Notebook.

We encourage you to also check out and incorporate Intel's other AI/ML framework optimizations and end-to-end portfolio of tools into your AI workflow, and to learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel's AI Software Portfolio to help you prepare, build, deploy, and scale your AI solutions.

Intel Developer Cloud System Configuration as tested:

  • Architecture: x86_64
  • CPU op-mode(s): 32-bit, 64-bit
  • Address sizes: 52 bits physical, 57 bits virtual
  • Byte order: Little Endian
  • CPU(s): 224 (on-line CPU(s) list: 0–223)
  • Vendor ID: GenuineIntel
  • Model name: Intel® Xeon® Platinum 8480+
  • CPU family: 6, Model: 143, Stepping: 8
  • Thread(s) per core: 2
  • Core(s) per socket: 56
  • Socket(s): 2
  • CPU max MHz: 3800.0000
  • CPU min MHz: 800.0000


