path: root/docs/reference/speed_python.rst
author Damien George <damien.p.george@gmail.com> 2017-08-30 11:04:47 +1000
committer Damien George <damien.p.george@gmail.com> 2017-08-30 11:04:47 +1000
commit c7d334e047ce5a36bd6d4979a5331719dd480a2b
tree 3dbb206e3075eb5f1778389f1399c28d3fd88d34 /docs/reference/speed_python.rst
parent 25e24b2c3c63e035b4c145f743f5cd7b02a23fc0
parent 1f78e7a43130acfa4bedf16c1007a1b0f37c75c3
Merge tag 'v1.9.2' into parse-bytecode
Double precision math library and support on pyboard, and improved ussl

This release brings general improvements and bug fixes to the core and various ports, as well as documentation additions, clean-ups and better consistency. An effort has been made to clean up the source code to make it more consistent across the core and all ports. There is a new tool "mpy_bin2res.py" to convert arbitrary (binary) files to Python resources for inclusion in source code (frozen or otherwise). The ussl module has seen improvements, including implementation of server_hostname (for axtls) and server_side mode (for mbedtls). There is now a double-precision float math library and stmhal has support to build firmware with software or hardware double-precision.

A detailed list of changes follows.

py core:
- formatfloat: fix number of digits and exponent sign when rounding
- modthread: raise RuntimeError in release() if lock is not acquired
- compile: raise SyntaxError if positional args are given after */**
- objint: support "big" byte-order in int.to_bytes()
- objint: in to_bytes(), allow length arg to be any int and check sign
- compile: fix bug with break/continue in else of optimised for-range
- compile: optimise emitter label indices to save a word of heap
- builtinimport: remove unreachable code for relative imports
- objnamedtuple: simplify and remove use of alloca building namedtuple
- mpprint: remove unreachable check for neg return of mp_format_float
- binary: add missing "break" statements
- runtime: mark m_malloc_fail() as NORETURN
- objstr: remove unnecessary "sign" variable in formatting code
- vm: make "if" control flow more obvious in YIELD_FROM opcode
- modmath: check for zero division in log with 2 args
- makeversionhdr.py: update to parse new release line in docs/conf.py
- objdict: factorise dict accessor helper to reduce code size
- change mp_uint_t to size_t in builtins code
- repl: change mp_uint_t to size_t in repl helpers
- compile: combine arith and bit-shift ops into 1 compile routine
- compile: use switch-case to match token and operator
- objgenerator: allow to hash generators and generator instances
- gc: refactor assertions in gc_free function
- vm: make n_state variable local to just set-up part of VM
- asmx64: support moving a 64-bit immediate to one of top 8 registers
- modmicropython: cast stack_limit value so it prints correctly
- builtinevex: add typechecking of globals/locals args to eval/exec
- py.mk: make berkeley-db C-defs apply only to relevant source files
- mperrno: allow mperrno.h to be correctly included before other hdrs
- mpz: make mpz_is_zero() an inline function
- implement raising a big-int to a negative power
- mkrules.mk: show frozen modules sizes together with executable size
- objtuple: allow to use inplace-multiplication operator on tuples
- objstr: raise an exception for wrong type on RHS of str binary op
- modsys: initial implementation of sys.getsizeof()
- binary.c: fix bug when packing big-endian 'Q' values
- add verbose debug compile-time flag MICROPY_DEBUG_VERBOSE
- binary: change internal bytearray typecode from 0 to 1
- objstringio: prevent offset wraparound for io.BytesIO objects
- objstringio: fix regression with handling SEEK_SET
- stream: seek: Consistently handle negative offset for SEEK_SET
- mkrules.mk: use "find -path" when searching for frozen obj files
- compile: remove unused pn_colon code when compiling func params
- objcomplex: remove unnecessary assignment of variable
- formatfloat: don't post-increment variable that won't be used again
- use "static inline" for funcs that should be inline
- asmthumb: use existing macro to properly clear the D-cache

extmod:
- modussl_axtls: update for axTLS 2.1.3
- modussl_axtls: implement server_hostname arg to wrap_socket()
- move modonewire.c from esp8266 to extmod directory
- modure: if input string is bytes, return bytes results too
- modubinascii: add check for empty buffer passed to hexlify
- modussl_axtls: allow to close ssl stream multiple times
- modussl_mbedtls: support server_side mode
- modussl_mbedtls: when reading and peer wants to close, return 0
- modframebuf: fix invalid stride for odd widths in GS4_HMSB fmt
- modussl_mbedtls: make socket.close() free all TLS resources
- modframebuf: consistently use "col" as name for colour variables
- modussl_mbedtls: implement non-blocking SSL sockets
- machine_signal: fix parsing of invert arg when Pin is first arg
- modframebuf: use correct initialization for .locals_dict
- modlwip: implement setsockopt(IP_ADD_MEMBERSHIP)
- modussl_mbedtls.c: add ussl.getpeercert() method
- modubinascii: rewrite mod_binascii_a2b_base64
- modubinascii: don't post-increment variable that won't be used
- modonewire: rename public module to mp_module_onewire
- for uos.stat interpret st_size member as an unsigned int
- use "static inline" for funcs that should be inline

lib:
- axtls: upgrade to axTLS 2.1.3 + MicroPython patchset
- libm/math: remove implementations of float conversion functions
- add libm_dbl, a double-precision math library, from musl-1.1.16

drivers:
- onewire: move onewire.py, ds18x20.py from esp8266 to drivers
- onewire: enable pull-up when init'ing the 1-wire pin

tools:
- gen-cpydiff: use case description as 3rd-level heading
- pyboard: add license header
- mpy_bin2res: tools to convert binary resources to Python module
- mpy-tool.py: don't generate const_table if it's empty
- mpy-tool.py: fix missing argument in dump() function

tests:
- net_inet/test_tls_sites.py: integration test for SSL connections
- net_inet: add tests for accept and connect in nonblocking mode
- basics: add tests for for-else statement
- net_inet: move tests which don't require full Internet to net_hosted
- connect_nonblock: refactor towards real net_hosted test
- auto detect floating point capabilities of the target
- import: add a test for the builtin __import__ function
- import: update comment now that uPy raises correct exception
- basics/namedtuple1: add test for creating with pos and kw args
- unix/extra_coverage: add test for mp_vprintf with bad fmt spec
- basics: add tests for arithmetic operators precedence
- cpydiff/modules_deque: elaborate workaround
- cpydiff/core_class_mro: move under Classes, add workaround
- cpydiff/core_arguments: move under Functions subsection
- cpydiff/core_class_supermultiple: same cause as core_class_mro
- cpydiff: improve wording, add more workarounds
- cpydiff: add case for str.ljust/rjust
- rename exec1.py to builtin_exec.py
- basics/builtin_exec: test various globals/locals args to exec()

minimal port:
- Makefile: enable gc-sections to remove unused code
- remove unused stmhal include from Makefile
- use size_t for mp_builtin_open argument

unix port:
- modtime: replace strftime() with localtime()
- mpconfigport.mk: update descriptions of readline and TLS options
- Makefile: disable assertions in the standard unix executable
- modjni: convert to mp_rom_map_elem_t
- for uos.stat interpret st_size member as an unsigned int

stmhal port:
- mpconfigport.h: remove config of PY_THREAD_GIL to use default
- make error messages more consistent across peripherals
- add initial implementation of Pin.irq() method
- add .value() method to Switch object, to mirror Pin and Signal
- move pybstdio.c to lib/utils/sys_stdio_mphal.c for common use
- add "quiet timing" enter/exit functions
- make available the _onewire module, for low-level bus control
- modules: provide sym-link to onewire.py driver
- boards/stm32f405.ld: increase FLASH_TEXT to end of 1MiB flash
- sdcard: allow a board to customise the SDIO pins
- add possibility to build with double-precision floating point
- boards: enable double-prec FP on F76x boards
- Makefile: use hardware double-prec FP for MCUs that support it
- Makefile: rename FLOAT_IMPL to MICROPY_FLOAT_IMPL to match C name
- Makefile: add CFLAGS_EXTRA to CFLAGS so cmdline can add options
- mpconfigport.h: allow MICROPY_PY_THREAD to be overridden
- boards: add configuration files for NUCLEO_F429ZI
- boards/NUCLEO_F429ZI: change USB config from HS to FS peripheral
- reduce size of ESPRUINO_PICO build so it fits in flash
- servo: make pyb.Servo(n) map to Pin('Xn') on all MCUs
- servo: don't compile servo code when it's not enabled
- use "static inline" for funcs that should be inline

cc3200 port:
- modusocket: simplify socket.makefile() function
- make non-zero socket timeout work with connect/accept/send
- modusocket: fix connect() when in non-blocking or timeout mode
- use the name MicroPython consistently in code

esp8266 port:
- Makefile: bump axTLS TLS record buffer size to 5K
- Makefile: allow FROZEN_DIR,FROZEN_MPY_DIR to be overridden
- Makefile: add LIB_SRC_C variable to qstr auto-extraction list
- make onewire module and support code usable by other ports
- modonewire: move low-level 1-wire bus code to modonewire.c
- modonewire: make timings static and remove onewire.timings func
- reinstate 1-wire scripts by sym-linking to drivers/onewire/
- move mp_hal_pin_open_drain from esp_mphal.c to machine_pin.c
- enable MICROPY_ENABLE_FINALISER
- README: make "Documentation" a top-level section
- machine_rtc: use correct arithmetic for aligning RTC mem len
- mpconfigport_512k: use terse error messages to get 512k to fit
- mpconfigport.h: make socket a weak link
- modesp: remove unused constants: STA_MODE, etc
- general: add known issue of WiFi RX buffers overflow
- use size_t for mp_builtin_open argument
- fix UART stop bit constants

zephyr port:
- Makefile: rework dependencies and "clean" target
- Makefile: revert prj.conf construction rule to the previous state
- remove long-obsolete machine_ptr_t typedef's
- Makefile: explicitly define default target as "all"
- modusocket: allow to use socketized net_context in upstream
- modusocket: socket, close: switch to native Zephyr socket calls
- modusocket: bind, connect, listen, accept: Switch to native sockets
- modusocket: send: switch to native sockets
- modusocket: recv: switch to native sockets
- modusocket: fully switch to native Zephyr sockets
- modzephyr: add current_tid() and stacks_analyze() functions
- prj_base.conf: enable CONFIG_INIT_STACKS
- modusocket: update struct sockaddr family field name
- prj_96b_carbon.conf: re-enable networking on Carbon
- modzephyr: add shell_net_iface() function

docs:
- btree: add hints about opening db file and need to flush db
- select: rename to uselect, to match the actual module name
- license: update copyright year
- esp8266/tutorial/intro: discourage use of 512kb firmwares
- esp8266/tutorial/intro: Sphinx requires blank lines around literal blocks
- conf.py: include 3 levels of ToC in latexpdf output
- gc: mark mem_alloc()/mem_free() as uPy-specific
- gc: document gc.threshold() function
- builtins: list builtin exceptions
- conf.py: set default_role = 'any'
- lcd160cr: group related constants together and use full sentences
- ref/speed_python: update and make more hardware-neutral
- library/gc: fix grammar and improve readability of gc.threshold()
- move all ports docs to the single ToC
- topindex.html: remove link to wipy.io, it's no longer available
- conf.py: add .venv dir to exclude_patterns
- move topindex.html to templates/ subdir
- differences/index_template: use consistent heading casing
- builtins: add AssertionError, SyntaxError, ZeroDivisionError
- add glossary
- conf.py: switch to "new" format of intersphinx_mapping
- conf.py: add file for global replacements definition
- library: add CPython docs xref to each pertinent module
- replace.inc: add |see_cpython|, to xref individual symbols from CPython
- conf.py: set "version" and "release" to the same value
- *_index: drop "Indices and tables" pseudo-section
- pyboard: move hardware info into General Info chapter
- uerrno: document "uerrno" module
- esp8266/general.rst: fix name of NTP module
- pyboard: move info about using Windows from topindex to general
- uzlib: update description of decompress() and mention DecompIO
- pyboard/tutorial/amp_skin: add example for playing large WAV files
- library/ubinascii: update base64 docs
- library/usocket: move socket.error to its own section
- library/usocket: describe complete information on address formats
- glossary: elaborate on possible MicroPython port differences
- glossary: fix typos in micropython-lib paragraph
- index: rewrite introduction paragraph to avoid confusion
- use the name MicroPython consistently in documentation
- consistently link to micropython-lib in glossary

all:
- make more use of mp_raise_{msg,TypeError,ValueError} helpers
- unify header guard usage
- remove trailing spaces, per coding conventions
- don't include system errno.h when it's not needed
- use the name MicroPython consistently in comments
- make use of $(TOP) variable in Makefiles, instead of ".."
- raise exceptions via mp_raise_XXX
- make static dicts use mp_rom_map_elem_t type and MP_ROM_xxx macros

README:
- mention support for bytecode and frozen bytecode
- improve description of precompiled bytecode; mention mpy-cross

CODECONVENTIONS:
- clarify MicroPython changes sign-off process
- start to describe docs conventions
- describe docs use of markup for None/True/False

travis:
- build STM32F769DISC board instead of F7DISC to test dbl-prec FP
- pin cpp-coveralls at 0.3.12
Diffstat (limited to 'docs/reference/speed_python.rst')
-rw-r--r-- docs/reference/speed_python.rst 102
1 files changed, 56 insertions, 46 deletions
diff --git a/docs/reference/speed_python.rst b/docs/reference/speed_python.rst
index 8efba4702b..279a1bbcdc 100644
--- a/docs/reference/speed_python.rst
+++ b/docs/reference/speed_python.rst
@@ -1,9 +1,11 @@
-Maximising Python Speed
-=======================
+Maximising MicroPython Speed
+============================
+
+.. contents::
This tutorial describes ways of improving the performance of MicroPython code.
Optimisations involving other languages are covered elsewhere, namely the use
-of modules written in C and the MicroPython inline ARM Thumb-2 assembler.
+of modules written in C and the MicroPython inline assembler.
The process of developing high performance code comprises the following stages
which should be performed in the order listed.
@@ -17,6 +19,7 @@ Optimisation steps:
* Improve the efficiency of the Python code.
* Use the native code emitter.
* Use the viper code emitter.
+* Use hardware-specific optimisations.
Designing for speed
-------------------
@@ -50,7 +53,7 @@ once only and not permitted to grow in size. This implies that the object persis
for the duration of its use: typically it will be instantiated in a class constructor
and used in various methods.
-This is covered in further detail :ref:`Controlling garbage collection <gc>` below.
+This is covered in further detail :ref:`Controlling garbage collection <controlling_gc>` below.
Buffers
~~~~~~~
@@ -60,8 +63,8 @@ used for communication with a device. A typical driver will create the buffer in
constructor and use it in its I/O methods which will be called repeatedly.
The MicroPython libraries typically provide support for pre-allocated buffers. For
-example, objects which support stream interface (e.g., file or UART) provide ``read()``
-method which allocate new buffer for read data, but also a ``readinto()`` method
+example, objects which support stream interface (e.g., file or UART) provide `read()`
+method which allocates new buffer for read data, but also a `readinto()` method
to read data into an existing buffer.
Floating Point
@@ -79,14 +82,14 @@ Arrays
~~~~~~
Consider the use of the various types of array classes as an alternative to lists.
-The ``array`` module supports various element types with 8-bit elements supported
-by Python's built in ``bytes`` and ``bytearray`` classes. These data structures all store
+The `array` module supports various element types with 8-bit elements supported
+by Python's built in `bytes` and `bytearray` classes. These data structures all store
elements in contiguous memory locations. Once again to avoid memory allocation in critical
code these should be pre-allocated and passed as arguments or as bound objects.
-When passing slices of objects such as ``bytearray`` instances, Python creates
+When passing slices of objects such as `bytearray` instances, Python creates
a copy which involves allocation of the size proportional to the size of slice.
-This can be alleviated using a ``memoryview`` object. ``memoryview`` itself
+This can be alleviated using a `memoryview` object. `memoryview` itself
is allocated on heap, but is a small, fixed-size object, regardless of the size
of slice it points to.
@@ -97,7 +100,7 @@ of slice it points too.
mv = memoryview(ba) # small object is allocated
func(mv[30:2000]) # a pointer to memory is passed
-A ``memoryview`` can only be applied to objects supporting the buffer protocol - this
+A `memoryview` can only be applied to objects supporting the buffer protocol - this
includes arrays but not lists. One caveat is that while a memoryview object is live,
it also keeps the original buffer object alive, so a memoryview isn't a universal
panacea. For instance, in the example above, if you are done with the 10K buffer and
@@ -105,11 +108,11 @@ just need those bytes 30:2000 from it, it may be better to make a slice, and let
the 10K buffer go (be ready for garbage collection), instead of making a
long-living memoryview and keeping 10K blocked for GC.
-Nonetheless, ``memoryview`` is indispensable for advanced preallocated buffer
-management. ``.readinto()`` method discussed above puts data at the beginning
+Nonetheless, `memoryview` is indispensable for advanced preallocated buffer
+management. `readinto()` method discussed above puts data at the beginning
of buffer and fills in entire buffer. What if you need to put data in the
middle of existing buffer? Just create a memoryview into the needed section
-of buffer and pass it to ``.readinto()``.
+of buffer and pass it to `readinto()`.
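
Reviewer note: the paragraph above on reading into the middle of a buffer can be sketched as follows. This is a minimal illustration, not code from the commit; ``io.BytesIO`` is only a stand-in (an assumption) for a real stream such as a UART or file, and the buffer size and offsets are arbitrary.

```python
import io

# Stand-in stream (assumption): any object with readinto() would do,
# for example a UART or an open file on a MicroPython board.
stream = io.BytesIO(b"HELLO")

buf = bytearray(16)        # pre-allocated once, reused for every read
mv = memoryview(buf)       # small fixed-size object referring to buf

# Slicing the memoryview does not copy the data, so readinto()
# writes directly into bytes 4..8 of the original bytearray.
n = stream.readinto(mv[4:9])

print(n, bytes(buf))
```

The slice ``mv[4:9]`` allocates only a small view object, whereas ``buf[4:9]`` would create a copy that ``readinto()`` would fill without changing ``buf`` itself.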
Identifying the slowest section of code
---------------------------------------
@@ -118,8 +121,7 @@ This is a process known as profiling and is covered in textbooks and
(for standard Python) supported by various software tools. For the type of
smaller embedded application likely to be running on MicroPython platforms
the slowest function or method can usually be established by judicious use
-of the timing ``ticks`` group of functions documented
-`here <http://docs.micropython.org/en/latest/pyboard/library/time.html>`_.
+of the timing ``ticks`` group of functions documented in `utime`.
Code execution time can be measured in ms, us, or CPU cycles.
The following enables any function or method to be timed by adding an
@@ -130,9 +132,9 @@ The following enables any function or method to be timed by adding an
def timed_function(f, *args, **kwargs):
myname = str(f).split(' ')[1]
def new_func(*args, **kwargs):
- t = time.ticks_us()
+ t = utime.ticks_us()
result = f(*args, **kwargs)
- delta = time.ticks_diff(time.ticks_us(), t)
+ delta = utime.ticks_diff(utime.ticks_us(), t)
print('Function {} Time = {:6.3f}ms'.format(myname, delta/1000))
return result
return new_func
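
Reviewer note: a usage sketch for the timing decorator shown in this hunk. It is a variant written for illustration, not the documented code: the fallback to ``time.perf_counter_ns()`` is an assumption added so the sketch also runs on desktop Python, the function name is taken from ``__name__`` where available, and ``sum_squares`` is a hypothetical workload.

```python
import time

# Prefer MicroPython's tick functions; fall back to CPython's
# perf_counter_ns (assumption, for running the sketch on a desktop).
if hasattr(time, "ticks_us"):
    ticks_us, ticks_diff = time.ticks_us, time.ticks_diff
else:
    def ticks_us():
        return time.perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start


def timed_function(f):
    name = getattr(f, "__name__", str(f))

    def new_func(*args, **kwargs):
        t = ticks_us()
        result = f(*args, **kwargs)
        delta = ticks_diff(ticks_us(), t)
        print('Function {} Time = {:6.3f}ms'.format(name, delta / 1000))
        return result
    return new_func


@timed_function
def sum_squares(n):  # hypothetical workload to be timed
    total = 0
    for i in range(n):
        total += i * i
    return total


result = sum_squares(1000)
```

Each call to the decorated function prints its elapsed time and returns the original result unchanged, so the decorator can be added and removed without touching the call sites.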
@@ -170,7 +172,7 @@ by caching the object in a local variable:
This avoids the need repeatedly to look up ``self.ba`` and ``obj_display.framebuffer``
in the body of the method ``bar()``.
-.. _gc:
+.. _controlling_gc:
Controlling garbage collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -182,7 +184,7 @@ process known as garbage collection reclaims the memory used by these redundant
objects and the allocation is then tried again - a process which can take several
milliseconds.
-There are benefits in pre-empting this by periodically issuing ``gc.collect()``.
+There may be benefits in pre-empting this by periodically issuing `gc.collect()`.
Firstly doing a collection before it is actually required is quicker - typically on the
order of 1ms if done frequently. Secondly you can determine the point in code
where this time is used rather than have a longer delay occur at random points,
@@ -190,34 +192,11 @@ possibly in a speed critical section. Finally performing collections regularly
can reduce fragmentation in the heap. Severe fragmentation can lead to
non-recoverable allocation failures.
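
Reviewer note: the idea of pre-empting collection can be sketched as below. This is a minimal illustration under assumptions: ``process_batches`` and the ``collect_every`` interval are hypothetical names chosen for the example, and a suitable interval depends on the application's allocation pattern.

```python
import gc


def process_batches(batches, collect_every=10):
    """Sum each batch, running a collection at a known point in the loop."""
    totals = []
    for i, batch in enumerate(batches):
        totals.append(sum(batch))
        # Collect at a predictable place rather than letting the
        # allocator trigger a longer collection at a random point.
        if (i + 1) % collect_every == 0:
            gc.collect()
    return totals


totals = process_batches([[1, 2], [3, 4], [5, 6]], collect_every=2)
print(totals)
```

Placing the call outside any speed-critical section keeps the collection cost where it is least harmful.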
-Accessing hardware directly
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-This comes into the category of more advanced programming and involves some knowledge
-of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
-standard approach would be to write
-
-.. code:: python
-
- mypin.value(mypin.value() ^ 1) # mypin was instantiated as an output pin
-
-This involves the overhead of two calls to the ``Pin`` instance's ``value()``
-method. This overhead can be eliminated by performing a read/write to the relevant bit
-of the chip's GPIO port output data register (odr). To facilitate this the ``stm``
-module provides a set of constants providing the addresses of the relevant registers.
-A fast toggle of pin ``P4`` (CPU pin ``A14``) - corresponding to the green LED -
-can be performed as follows:
-
-.. code:: python
-
- BIT14 = const(1 << 14)
- stm.mem16[stm.GPIOA + stm.GPIO_ODR] ^= BIT14
-
The Native code emitter
-----------------------
-This causes the MicroPython compiler to emit ARM native opcodes rather than
-bytecode. It covers the bulk of the Python language so most functions will require
+This causes the MicroPython compiler to emit native CPU opcodes rather than
+bytecode. It covers the bulk of the MicroPython functionality, so most functions will require
no adaptation (but see below). It is invoked by means of a function decorator:
.. code:: python
@@ -276,7 +255,7 @@ Viper provides pointer types to assist the optimiser. These comprise
* ``ptr32`` Points to a 32 bit machine word.
The concept of a pointer may be unfamiliar to Python programmers. It has similarities
-to a Python ``memoryview`` object in that it provides direct access to data stored in memory.
+to a Python `memoryview` object in that it provides direct access to data stored in memory.
Items are accessed using subscript notation, but slices are not supported: a pointer can return
a single item only. Its purpose is to provide fast random access to data stored in contiguous
memory locations - such as data stored in objects which support the buffer protocol, and
@@ -330,3 +309,34 @@ The following example illustrates the use of a ``ptr16`` cast to toggle pin X1 `
A detailed technical description of the three code emitters may be found
on Kickstarter here `Note 1 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/664832>`_
and here `Note 2 <https://www.kickstarter.com/projects/214379695/micro-python-python-for-microcontrollers/posts/665145>`_
+
+Accessing hardware directly
+---------------------------
+
+.. note::
+
+ Code examples in this section are given for the Pyboard. The techniques
+ described however may be applied to other MicroPython ports too.
+
+This comes into the category of more advanced programming and involves some knowledge
+of the target MCU. Consider the example of toggling an output pin on the Pyboard. The
+standard approach would be to write
+
+.. code:: python
+
+ mypin.value(mypin.value() ^ 1) # mypin was instantiated as an output pin
+
+This involves the overhead of two calls to the `Pin` instance's :meth:`~machine.Pin.value()`
+method. This overhead can be eliminated by performing a read/write to the relevant bit
+of the chip's GPIO port output data register (odr). To facilitate this the ``stm``
+module provides a set of constants providing the addresses of the relevant registers.
+A fast toggle of pin ``P4`` (CPU pin ``A14``) - corresponding to the green LED -
+can be performed as follows:
+
+.. code:: python
+
+ import machine
+ import stm
+
+ BIT14 = const(1 << 14)
+ machine.mem16[stm.GPIOA + stm.GPIO_ODR] ^= BIT14