Hi everyone, this is Abdul from Pythonest.
The easiest way to get your code to run faster is to make it do less work. Assuming you have already chosen good algorithms and you have reduced the amount of data you are processing, the easiest way to execute fewer instructions is to compile your code down to machine code. Python offers a number of options for this, but before exploring these options I want to mention a few things. Keep in mind that you will need to take care to balance the requirements of code adaptability and team velocity when deciding which route to take. It's also worth noting that performing CPU and memory profiling on your code will probably start you thinking about higher-level optimizations that you might apply. In our previous videos we have talked about different techniques and algorithmic changes for optimization; these algorithmic changes could help you avoid doing unnecessary work in your code, and keeping your code expressive will help you to spot these algorithmic opportunities.
OK, the options we have are the following. The first one is Cython: this is the most commonly used tool for compiling to C, covering both numpy and normal Python code, but it requires some knowledge of C. The second one is Shed Skin, an automatic Python-to-C++ converter for non-numpy code. The third one is Pythran, a new compiler for both numpy and non-numpy code. The fourth one is Numba, a new compiler specialized for numpy code. And the fifth one is PyPy: PyPy is a stable just-in-time (JIT) compiler for non-numpy code, and it is a replacement for the normal Python executable. We can split all of these options into two sets.
The first set is tools for compiling ahead of time (AOT): Cython, Shed Skin, and Pythran fit into this category. The second set is tools for compiling just in time (JIT): Numba and PyPy fit into this category. If you are dealing with pure Python code and batteries-included libraries, without numpy, then Cython, Shed Skin, and PyPy are your main choices, but if you're working with numpy, then Cython, Numba, and Pythran are the right choices. All of these tools support Python 2.7 and, for the most part, Python 3 as well. Now let's explore
the basics of some of these tools so you can get a better idea. The very first tool we are going to discuss here is Cython. Cython is a compiler that converts type-annotated Python into a compiled extension module; the type annotations are C-like. The extension can be imported as a regular Python module using the import statement. Getting started with Cython is simple, but it does have a learning curve that must be climbed with each additional level of complexity and optimization. Cython is a fork of Pyrex, which was announced in 2002, and it expands the capabilities beyond the original aims of Pyrex. Libraries that use Cython include SciPy, scikit-learn, lxml, and ZeroMQ. Cython can be used via a setup.py script to compile a module, and it can also be used interactively in IPython via a magic command. Typically the types are annotated by the developer, although some automated annotation is possible. Let's explore the different ways to utilize Cython to speed up your code by writing some code. Here we
will write a function based on the Pythagorean theorem. If you don't know about it, this theorem states that in a right-angled triangle the sum of the squares of the legs of the triangle is equal to the square of the hypotenuse. A Pythagorean triple consists of any three positive integers a, b, and c such that a² + b² = c²; you can see the diagram here. So let's write a function that finds all the Pythagorean triples whose members are not greater than the provided limit.
OK, so I will define a function named count_pythagorean_triples, and this function takes a limit parameter. Inside the function I will initialize the result variable with zero, then I will write three nested for loops, as stated here, for a, b, and c. Inside the innermost loop I will say: if c² is greater than a² + b², we break the loop; otherwise, if c² is equal to a² + b², we increment the result. Then, outside all of these loops, we return the result.
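As a minimal sketch, the function described above might look like this (the name count_pythagorean_triples is my reading of the audio):

```python
def count_pythagorean_triples(limit):
    # count triples a <= b <= c <= limit with a^2 + b^2 = c^2
    result = 0
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            for c in range(b, limit + 1):
                if c * c > a * a + b * b:
                    break  # c is already too large; move on to the next b
                if c * c == a * a + b * b:
                    result += 1
    return result
```

For example, count_pythagorean_triples(20) finds the six triples (3,4,5), (5,12,13), (6,8,10), (8,15,17), (9,12,15), and (12,16,20).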
Then I will define a main function to utilize this function. Inside the main function I will start a timer and save the time in a start variable, then call the count_pythagorean_triples function; I save the result in a results variable, and I'm passing 1000 as the limit. Once again we will utilize the time module to calculate the duration and print the result. You can see the result here: apparently there are 881 triples, and you can see the time this function has taken. That's not too long, but long enough to be annoying; if we want to find more triples, up to a higher limit, we should find a way to make it go quicker. OK, now that we are done with our function, let's try to utilize different features from the Cython module to boost this code. The very
first option we have is boosting with pyximport. This is the easiest way to use Cython: it is a statement that compiles your Cython code on the fly and lets you enjoy the benefits of native optimization without too much trouble. You need to put the code you want Cython to optimize into its own module, write one line of setup in your main program, and then import it as usual. Let's see what it looks like. I moved the function to its own file called pythagorean_triples.pyx; the .pyx extension is important for Cython. Then, inside the main file, the lines that activate Cython are import pyximport followed by pyximport.install(). After that, it just imports the module with the count_pythagorean_triples function and later invokes it in the main function.
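The wiring described above might look like this (a sketch, assuming Cython is installed and pythagorean_triples.pyx sits next to this script; both file and function names are my reading of the audio):

```python
# main.py
import time

import pyximport
pyximport.install()  # compiles .pyx modules on the fly at import time

# pyximport lets this import find and build pythagorean_triples.pyx
from pythagorean_triples import count_pythagorean_triples

if __name__ == "__main__":
    start = time.time()
    result = count_pythagorean_triples(1000)
    print(result, time.time() - start)
```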
OK, now if we run this program, you can see that this time the pure Python version actually ran 50% longer; we got this boost by adding a single line. Not bad at all, but fortunately that's not the only way: other ways are available to optimize the code using the Cython module. While pyximport is really convenient during development, it works only on pure Python modules. Often when optimizing code you want to reference native C libraries or Cython extension modules; to support those, and also to avoid dynamically compiling on every run, you can build your own Cython extension module. You need to add a little setup.py file and remember to build it before running your program whenever you modify the Cython code. So
I will create a file setup.py, and inside this file we have to import setup from distutils.core; the second thing we need to import is cythonize from Cython.Build. Then, in the call to the setup function, I will set ext_modules equal to cythonize(...), and inside this I will provide my own extension-module file, named pythagorean_triples.pyx. Then we simply need to build our extension module by running a simple command, python setup.py build_ext, and we have to pass --inplace so it will build in the same location. As you can see from the output, Cython generated a C file called pythagorean_triples.c and compiled it to a platform-specific .so file, which is the extension module that Python can now import like any other native extension module. OK, let's drop the pyximport lines and run our program once again.
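The setup.py described above is a short config fragment; a sketch under the same assumptions (the .pyx filename is my reading of the audio):

```python
# setup.py -- builds the Cython extension module; assumes Cython is installed
from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("pythagorean_triples.pyx"))
```

Build it with `python setup.py build_ext --inplace`, then import the resulting module as usual.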
You can see the result here; it is pretty much the same as with pyximport. However, note that I'm measuring only the runtime of the Cython code; I'm not measuring how long it takes pyximport to compile the Cython code on the fly. In big programs this can be significant. That's great, we have explored two different ways to utilize the Cython module, but just hold on, let's take it to the next level. Cython is more than Python: it adds optional typing. So another option we have is adding types to your code. We are utilizing several variables here; you can see result, a, b, and c. Let me define the types for all of these variables: I will write cdef int result = 0, cdef int a = 0, cdef int b = 0, and the same for c. The rest of the code stays the same, both in the pythagorean_triples.pyx file and in the main.py file.
You can see the result here, and that's amazing: by defining a couple of integers, the program runs much faster, almost a 250× improvement. That's great.
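The typed version described above might look like this as a Cython sketch (cdef is Cython syntax, so this lives in the .pyx file; names are my reading of the audio):

```cython
# pythagorean_triples.pyx -- typed version
def count_pythagorean_triples(limit):
    cdef int result = 0
    cdef int a = 0
    cdef int b = 0
    cdef int c = 0
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            for c in range(b, limit + 1):
                if c * c > a * a + b * b:
                    break
                if c * c == a * a + b * b:
                    result += 1
    return result
```

Because the loop variables are C ints, the hot inner loop compiles to plain C arithmetic instead of Python object operations, which is where the speedup comes from.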
These are some of the basic ways to utilize Cython to speed up your code. Cython can produce two orders of magnitude of performance improvement for little effort. If you develop non-trivial software in Python, Cython is a no-brainer: it has very little overhead and you can introduce it gradually into your code base. That's it; let's explore another option, which is Numba. Numba, from Continuum Analytics, is a just-in-time compiler that specializes in numpy code, which it compiles via the LLVM compiler at runtime.
It doesn't require a precompilation pass, so when you run it against new code it compiles each annotated function for your hardware. The beauty is that you provide a decorator telling it which functions to focus on, and then you let Numba take over. It aims to run on all standard numpy code. If you use numpy arrays and have non-vectorized code that iterates over many items, then Numba should give you a quick and very painless win. One drawback when using Numba is the toolchain: it uses LLVM, and this has many dependencies. It is recommended to install Numba using the Anaconda distribution, but you can also install it in any environment with a simple pip command: pip install numba. OK,
let's try to write some code to understand Numba in a more detailed way. I'll write a function to calculate the standard deviation; the function is named std_dev, and it takes a list of values, xs. Inside this function we initialize the mean with zero and loop through xs to calculate the mean, so the mean will be equal to the sum of all of the values divided by the length of the xs list. Then I will initialize the variance as ms = 0 and loop through xs once again to calculate the variance. The variance is actually the square of the standard deviation, so if we take the square root of the variance we get the standard deviation, and finally we return it. OK, so let's try to run this function with 10 million numbers. But at this stage, have you noticed that I'm using a Jupyter Notebook inside the PyCharm IDE? I'll create a detailed video about Jupyter Notebooks in the future. OK, so let's run this function with 10 million numbers: I will use numpy to generate the list, as xs = np.random.normal(size=10_000_000). OK.
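The pure-Python function described above might look like this (variable names such as xs and ms are my reading of the audio):

```python
import math

def std_dev(xs):
    # mean: sum of all values divided by the number of values
    m = 0.0
    for x in xs:
        m += x
    m /= len(xs)
    # variance: mean of the squared deviations from the mean
    ms = 0.0
    for x in xs:
        ms += (x - m) ** 2
    ms /= len(xs)
    # standard deviation is the square root of the variance
    return math.sqrt(ms)
```

With 10 million samples from np.random.normal, this interpreted loop is exactly the kind of non-vectorized, element-by-element code that Numba targets.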
So let's call this function; you can see the results here: the function takes a couple of seconds to compute the standard deviation of the sample. Now let's import the njit decorator from numba and decorate our std_dev function to create a new function. I will say from numba import njit, and then c_std_dev = njit(std_dev). So at this stage we have created a new function by utilizing our previous function. If we call this function and pass the xs list, you can see the results here; the performance improvement might not seem striking, maybe due to some overhead related to interpreting the code in the notebook. Also, please keep in mind that the first time the function is called, Numba will need to compile it, which takes a bit of time. But we can quantify the improvement using the %timeit magic function: first for the interpreted version, the std_dev function, and then for the compiled version, the c_std_dev function. So I will use the %timeit magic function and call the std_dev function, passing the list xs; you can see the results here. Now let's call the c_std_dev function with the %timeit magic function; you can see its result here. The compiled function is approximately 100 times faster.
Obviously, we didn't have to go to such trouble just to compute the standard deviation of our array, because we have a function available in the numpy library: if we simply call a.std() it will return the standard deviation, and you can see the result. But if we measure the time taken by the numpy std function, you can see from the results that Numba is even faster than numpy in this particular case, and from this you can imagine how much Numba can improve the performance of our code. This is not a Numba-specific tutorial, as we are exploring the different options for compiling our Python code down to machine code, but I will make detailed, dedicated videos for Cython, Numba, and the other options; if you want me to make a video about any specific option, just let me know in the comments below. If you're
working on a numeric project, then each of these technologies could be useful; for a detailed comparison, take a look at this table. Numba may offer quick wins for little effort, but it too has limitations that might stop it working well on your code, and it is also a relatively young project. Cython probably offers the best results for the widest set of problems, but it does require more effort and carries an additional support tax due to its mix of Python with C annotations. PyPy is a strong option if you're not using numpy or other hard-to-port C extensions. Shed Skin may be useful if you want to compile to C++ and you are not using numpy or other external libraries. If you're deploying a production tool, then you probably want to stick with well-understood tools, and Cython should be your main choice. Overall, Pythran and Numba are young but very promising projects, whereas Cython is very mature. PyPy is regarded as being fairly mature now and should definitely be evaluated for long-running processes.
As a conclusion to all this explanation: I have introduced various strategies that allow you to specialize your code to different degrees, in order to reduce the number of instructions the CPU must execute and increase the efficiency of your programs. Sometimes this can be done algorithmically, although often it must be done manually. Furthermore, sometimes these methods must be employed simply to use a library that has already been written in another language. Regardless of the motivation, Python allows us to benefit from the speedups that other languages can offer on some problems, while still maintaining velocity and flexibility when it is needed. It is important to note that these optimizations improve the efficiency of CPU instructions only: if you have I/O-bound processes coupled to a CPU-bound problem, simply compiling your code may not provide any reasonable speedup. For those problems we must rethink our solutions and potentially use parallelism to run different tasks at the same time. I think that's enough for this video; I hope you enjoyed the content. If you liked it, please give it a thumbs up, and be sure to subscribe to my channel and hit the bell icon so you never miss any fantastic videos in the future. Thanks for watching!
