Remarks on Adaptation and Typecasting
=====================================

.. py:currentmodule:: pgdb

Both PostgreSQL and Python have the concept of data types, but there
are of course differences between the two type systems.  Therefore PyGreSQL
needs to adapt Python objects to the representation required by PostgreSQL
when passing values as query parameters, and it needs to typecast the
representation of PostgreSQL data types returned by database queries to
Python objects.  Here are some explanations about how this works in
detail in case you want to better understand or change the default
behavior of PyGreSQL.

Supported data types
--------------------

The following automatic data type conversions are supported by PyGreSQL
out of the box.  If you need other automatic type conversions or want to
change the default conversions, you can achieve this by using the methods
explained in the next two sections.

================================== ==================
PostgreSQL                         Python
================================== ==================
char, bpchar, name, text, varchar  str
bool                               bool
bytea                              bytes
int2, int4, int8, oid, serial      int [#int8]_
int2vector                         list of int
float4, float8                     float
numeric, money                     Decimal
date                               datetime.date
time, timetz                       datetime.time
timestamp, timestamptz             datetime.datetime
interval                           datetime.timedelta
hstore                             dict
json, jsonb                        list or dict
uuid                               uuid.UUID
array                              list
record                             tuple
================================== ==================

.. note::

    Elements of arrays and records will also be converted accordingly.

    .. [#int8] int8 is converted to long in Python 2

Adaptation of parameters
------------------------

PyGreSQL knows how to adapt the common Python types to get a suitable
representation of their values for PostgreSQL when you pass parameters
to a query. For example::

    >>> con = pgdb.connect(...)
    >>> cur = con.cursor()
    >>> parameters = (144, 3.75, 'hello', None)
    >>> tuple(cur.execute('SELECT %s, %s, %s, %s', parameters).fetchone())
    (144, Decimal('3.75'), 'hello', None)

This is the result we can expect, so obviously PyGreSQL has adapted the
parameters and sent the following query to PostgreSQL:

.. code-block:: sql

    SELECT 144, 3.75, 'hello', NULL

Note the subtle but important detail that even though the SQL string passed
to :meth:`cur.execute` contains conversion specifications normally used in
Python with the ``%`` operator for formatting strings, we didn't use the ``%``
operator to format the parameters, but passed them as the second argument to
:meth:`cur.execute`.  That is, we **didn't** write the following::

    >>> tuple(cur.execute('SELECT %s, %s, %s, %s' % parameters).fetchone())

If we had done this, PostgreSQL would have complained because the parameters
were not adapted.  In particular, there would be no quotes around the value
``'hello'``, so PostgreSQL would have interpreted this as a database column,
which would have caused a :exc:`ProgrammingError`.  Also, the Python value
``None`` would have been included in the SQL command literally, instead of
being converted to the SQL keyword ``NULL``, which would have been another
reason for PostgreSQL to complain about our bad query:

.. code-block:: sql

    SELECT 144, 3.75, hello, None

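
You can reproduce this broken statement with plain Python string formatting,
without any database involved, which makes clear that nothing gets quoted or
converted:

```python
# Plain % formatting applies str() to each value verbatim,
# so strings lose their quotes and None stays a Python name:
parameters = (144, 3.75, 'hello', None)
sql = 'SELECT %s, %s, %s, %s' % parameters
print(sql)  # SELECT 144, 3.75, hello, None
```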
Even worse, building queries with the use of the ``%`` operator makes us
vulnerable to so-called "SQL injection" exploits, where an attacker inserts
malicious SQL statements into our queries that we never intended to be
executed.  We could avoid this by carefully quoting and escaping the
parameters, but this would be tedious, and if we overlook something, our
code will still be vulnerable.  So please don't do this.  This cannot be
emphasized enough, because it is such a subtle difference and using the ``%``
operator looks so natural:

.. warning::

  Remember to **never** insert parameters directly into your queries using
  the ``%`` operator.  Always pass the parameters separately.

The good thing is that by letting PyGreSQL do the work for you, you can treat
all your parameters equally and don't need to ponder where to put quotes or
how to escape strings.  You can and should also always use the general ``%s``
specification instead of e.g. ``%d`` for integers.  Actually, to avoid
mistakes and make it easier to insert parameters at more than one location,
you can and should use named specifications, like this::

    >>> params = dict(greeting='Hello', name='HAL')
    >>> sql = """SELECT %(greeting)s || ', ' || %(name)s
    ...    || '. Do you read me, ' || %(name)s || '?'"""
    >>> cur.execute(sql, params).fetchone()[0]
    'Hello, HAL. Do you read me, HAL?'

PyGreSQL not only adapts the basic types like ``int``, ``float``,
``bool`` and ``str``, but also tries to make sense of Python lists and tuples.

Lists are adapted as PostgreSQL arrays::

    >>> params = dict(array=[[1, 2], [3, 4]])
    >>> cur.execute("SELECT %(array)s", params).fetchone()[0]
    [[1, 2], [3, 4]]

Note that the query gives the value back as Python lists again.  This
is achieved by the typecasting mechanism explained in the next section.
The query that was actually executed was this:

.. code-block:: sql

    SELECT ARRAY[[1,2],[3,4]]

Again, if we had inserted the list using the ``%`` operator without adaptation,
the ``ARRAY`` keyword would have been missing in the query.

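
The shape of that adaptation can be sketched with a small recursive helper
(``sql_array`` is hypothetical; PyGreSQL's real adapter additionally handles
quoting, escaping and NULL values):

```python
def sql_array(values, _nested=False):
    """Render a (possibly nested) Python list as a PostgreSQL array literal."""
    items = ','.join(
        sql_array(v, True) if isinstance(v, list) else str(v) for v in values)
    # Only the outermost level carries the ARRAY keyword
    return '[%s]' % items if _nested else 'ARRAY[%s]' % items

print(sql_array([[1, 2], [3, 4]]))  # ARRAY[[1,2],[3,4]]
```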
Tuples are adapted as PostgreSQL composite types::

    >>> params = dict(record=('Bond', 'James'))
    >>> cur.execute("SELECT %(record)s", params).fetchone()[0]
    ('Bond', 'James')

You can also use this feature with the ``IN`` syntax of SQL::

    >>> params = dict(what='needle', where=('needle', 'haystack'))
    >>> cur.execute("SELECT %(what)s IN %(where)s", params).fetchone()[0]
    True

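
How a tuple turns into something the ``IN`` clause accepts can be sketched
likewise (``sql_record`` is a hypothetical helper that ignores the escaping
of embedded quotes which a real adapter must perform):

```python
def sql_record(values):
    """Render a Python tuple as a parenthesized SQL value list (sketch)."""
    return '(%s)' % ', '.join(
        "'%s'" % v if isinstance(v, str) else str(v) for v in values)

print(sql_record(('needle', 'haystack')))  # ('needle', 'haystack')
```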
Sometimes a Python type can be ambiguous.  For instance, you might want
to insert a Python list not into an array column, but into a JSON column.
Or you want to interpret a string as a date and insert it into a DATE column.
In this case you can give PyGreSQL a hint by using :ref:`type_constructors`::

    >>> cur.execute("CREATE TABLE json_data (data json, created date)")
    >>> params = dict(
    ...     data=pgdb.Json([1, 2, 3]), created=pgdb.Date(2016, 1, 29))
    >>> sql = "INSERT INTO json_data VALUES (%(data)s, %(created)s)"
    >>> cur.execute(sql, params)
    >>> cur.execute("SELECT * FROM json_data").fetchone()
    Row(data=[1, 2, 3], created='2016-01-29')

Let's think of another example where we create a table with a composite
type in PostgreSQL:

.. code-block:: sql

    CREATE TABLE on_hand (
        item      inventory_item,
        count     integer)

We assume the composite type ``inventory_item`` has been created like this:

.. code-block:: sql

    CREATE TYPE inventory_item AS (
        name            text,
        supplier_id     integer,
        price           numeric)

In Python we can use a named tuple as an equivalent to this PostgreSQL type::

    >>> from collections import namedtuple
    >>> inventory_item = namedtuple(
    ...     'inventory_item', ['name', 'supplier_id', 'price'])

Using the automatic adaptation of Python tuples, an item can now be
inserted into the database and then read back as follows::

    >>> cur.execute("INSERT INTO on_hand VALUES (%(item)s, %(count)s)",
    ...     dict(item=inventory_item('fuzzy dice', 42, 1.99), count=1000))
    >>> cur.execute("SELECT * FROM on_hand").fetchone()
    Row(item=inventory_item(name='fuzzy dice', supplier_id=42,
            price=Decimal('1.99')), count=1000)

However, we may not want to use named tuples, but custom Python classes
to hold our values, like this one::

    >>> class InventoryItem:
    ...
    ...     def __init__(self, name, supplier_id, price):
    ...         self.name = name
    ...         self.supplier_id = supplier_id
    ...         self.price = price
    ...
    ...     def __str__(self):
    ...         return '%s (from %s, at $%s)' % (
    ...             self.name, self.supplier_id, self.price)

But when we try to insert an instance of this class in the same way, we
will get an error::

    >>> cur.execute("INSERT INTO on_hand VALUES (%(item)s, %(count)s)",
    ...     dict(item=InventoryItem('fuzzy dice', 42, 1.99), count=1000))
    InterfaceError: Do not know how to adapt type <class 'InventoryItem'>

While PyGreSQL knows how to adapt tuples, it does not know what to make of
our custom class.  Simply converting the object to a string using the
``str`` function is not a solution, since this yields a human-readable string
that is not useful for PostgreSQL.  However, it is possible to make such
custom classes adapt themselves to PostgreSQL by adding a "magic" method
with the name ``__pg_repr__``, like this::

    >>> class InventoryItem:
    ...
    ...     ...
    ...
    ...     def __str__(self):
    ...         return '%s (from %s, at $%s)' % (
    ...             self.name, self.supplier_id, self.price)
    ...
    ...     def __pg_repr__(self):
    ...         return (self.name, self.supplier_id, self.price)

Now you can insert class instances the same way as you insert named tuples.

Note that PyGreSQL adapts the result of ``__pg_repr__`` again if it is a
tuple or a list.  Otherwise, it must be a properly escaped string.

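
For the escaped-string variant, a minimal sketch could look like this (the
``Label`` class is made up for illustration; single quotes are doubled as
required inside SQL string literals):

```python
class Label:
    """Hypothetical class that adapts itself as an escaped SQL string."""

    def __init__(self, text):
        self.text = text

    def __pg_repr__(self):
        # Double any embedded single quotes, then wrap in quotes
        return "'%s'" % self.text.replace("'", "''")

print(Label("O'Hara").__pg_repr__())  # 'O''Hara'
```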
Typecasting to Python
---------------------

As you noticed, PyGreSQL automatically converted the PostgreSQL data to
suitable Python objects when returning values via one of the "fetch" methods
of a cursor.  This is done by the use of built-in typecast functions.

If you want to use different typecast functions or add your own if no
built-in typecast function is available, then this is possible using
the :func:`set_typecast` function.  With the :func:`get_typecast` function
you can check which function is currently set, and :func:`reset_typecast`
allows you to reset the typecast function to its default.  If no typecast
function is set, then PyGreSQL will return the raw strings from the database.

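
A typecast function is simply a callable that receives the text
representation sent by PostgreSQL and returns the corresponding Python
object.  As a sketch of the shape such a function has (PyGreSQL ships its
own built-in casts), a cast for the space-separated ``int2vector`` format
could look like this:

```python
def cast_int2vector(value):
    """Cast an int2vector text representation like '1 2 3' to a list of ints."""
    return [int(v) for v in value.split()]

print(cast_int2vector('1 2 3'))  # [1, 2, 3]
```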
For instance, you will find that PyGreSQL uses the normal ``int`` function
to cast PostgreSQL ``int4`` type values to Python::

    >>> pgdb.get_typecast('int4')
    int

You can change this to return float values instead::

    >>> pgdb.set_typecast('int4', float)
    >>> con = pgdb.connect(...)
    >>> cur = con.cursor()
    >>> cur.execute('select 42::int4').fetchone()[0]
    42.0

Note that the connections cache the typecast functions, so you may need to
reopen the database connection, or reset the cache of the connection to
make this effective, using the following command::

    >>> con.type_cache.reset_typecast()

The :class:`TypeCache` of the connection can also be used to change typecast
functions locally for one database connection only.

As a more useful example, we can create a typecast function that casts
items of the composite type used as example in the previous section
to instances of the corresponding Python class::

    >>> con.type_cache.reset_typecast()
    >>> cast_tuple = con.type_cache.get_typecast('inventory_item')
    >>> cast_item = lambda value: InventoryItem(*cast_tuple(value))
    >>> con.type_cache.set_typecast('inventory_item', cast_item)
    >>> str(cur.execute("SELECT * FROM on_hand").fetchone()[0])
    'fuzzy dice (from 42, at $1.99)'

As you saw in the last section, PyGreSQL also has a typecast function
for JSON, which is the default JSON decoder from the standard library.
Let's assume we want to use a slight variation of that decoder in which
every integer in JSON is converted to a float in Python.  This can be
accomplished as follows::

    >>> from json import loads
    >>> cast_json = lambda v: loads(v, parse_int=float)
    >>> pgdb.set_typecast('json', cast_json)
    >>> cur.execute("SELECT data FROM json_data").fetchone()[0]
    [1.0, 2.0, 3.0]

Note again that you may need to run ``con.type_cache.reset_typecast()`` to
make this effective.  Also note that the two types ``json`` and ``jsonb`` have
their own typecast functions, so if you use ``jsonb`` instead of ``json``, you
need to use this type name when setting the typecast function::

    >>> pgdb.set_typecast('jsonb', cast_json)

As one last example, let us try to typecast the geometric data type ``circle``
of PostgreSQL into a `SymPy <https://www.sympy.org>`_ ``Circle`` object.  Let's
assume we have created and populated a table with two circles, like so:

.. code-block:: sql

    CREATE TABLE circle (
        name varchar(8) primary key, circle circle);
    INSERT INTO circle VALUES ('C1', '<(2, 3), 3>');
    INSERT INTO circle VALUES ('C2', '<(1, -1), 4>');

With PostgreSQL we can easily calculate that these two circles overlap::

    >>> con.cursor().execute("""SELECT c1.circle && c2.circle
    ...     FROM circle c1, circle c2
    ...     WHERE c1.name = 'C1' AND c2.name = 'C2'""").fetchone()[0]
    True

However, calculating the intersection points between the two circles using the
``#`` operator does not work (at least not as of PostgreSQL version 9.5).
So let's resort to SymPy to find out.  To ease importing circles from
PostgreSQL to SymPy, we create and register the following typecast function::

    >>> from sympy import Point, Circle
    >>>
    >>> def cast_circle(s):
    ...     p, r = s[1:-1].rsplit(',', 1)
    ...     p = p[1:-1].split(',')
    ...     return Circle(Point(float(p[0]), float(p[1])), float(r))
    ...
    >>> pgdb.set_typecast('circle', cast_circle)

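
The string handling in ``cast_circle`` can be checked without a database or
SymPy, since PostgreSQL renders a circle value as ``<(x,y),r>``:

```python
s = '<(2,3),3>'                # text representation of a circle
p, r = s[1:-1].rsplit(',', 1)  # split off the radius: '(2,3)' and '3'
p = p[1:-1].split(',')         # center coordinates: ['2', '3']
print(float(p[0]), float(p[1]), float(r))  # 2.0 3.0 3.0
```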
Now we can import the circles in the table into Python quite easily::

    >>> circle = {c.name: c.circle for c in con.cursor().execute(
    ...     "SELECT * FROM circle").fetchall()}

The result is a dictionary mapping circle names to SymPy ``Circle`` objects.
We can verify that the circles have been imported correctly::

    >>> circle
    {'C1': Circle(Point(2, 3), 3.0),
     'C2': Circle(Point(1, -1), 4.0)}

Finally we can find the exact intersection points with SymPy::

    >>> circle['C1'].intersection(circle['C2'])
    [Point(29/17 + 64564173230121*sqrt(17)/100000000000000,
        -80705216537651*sqrt(17)/500000000000000 + 31/17),
     Point(-64564173230121*sqrt(17)/100000000000000 + 29/17,
        80705216537651*sqrt(17)/500000000000000 + 31/17)]