source: trunk/docs/contents/pgdb/adaptation.rst @ 821

Last change on this file since 821 was 821, checked in by cito, 4 years ago

Support the uuid data type

This is often useful and also supported by SQLAlchemy

Remarks on Adaptation and Typecasting
=====================================

.. py:currentmodule:: pgdb

Both PostgreSQL and Python have the concept of data types, but there
are of course differences between the two type systems.  Therefore PyGreSQL
needs to adapt Python objects to the representation required by PostgreSQL
when passing values as query parameters, and it needs to typecast the
representation of PostgreSQL data types returned by database queries to
Python objects.  Here are some explanations about how this works in
detail in case you want to better understand or change the default
behavior of PyGreSQL.

Supported data types
--------------------

The following automatic data type conversions are supported by PyGreSQL
out of the box.  If you need other automatic type conversions or want to
change the default conversions, you can achieve this by using the methods
explained in the next two sections.

================================== ==================
PostgreSQL                         Python
================================== ==================
char, bpchar, name, text, varchar  str
bool                               bool
bytea                              bytes
int2, int4, int8, oid, serial      int [#int8]_
int2vector                         list of int
float4, float8                     float
numeric, money                     Decimal
date                               datetime.date
time, timetz                       datetime.time
timestamp, timestamptz             datetime.datetime
interval                           datetime.timedelta
hstore                             dict
json, jsonb                        list or dict
uuid                               uuid.UUID
array                              list
record                             tuple
================================== ==================

.. note::

    Elements of arrays and records will also be converted accordingly.

    .. [#int8] int8 is converted to long in Python 2

Adaptation of parameters
------------------------

PyGreSQL knows how to adapt the common Python types to get a suitable
representation of their values for PostgreSQL when you pass parameters
to a query.  For example::

    >>> con = pgdb.connect(...)
    >>> cur = con.cursor()
    >>> parameters = (144, 3.75, 'hello', None)
    >>> tuple(cur.execute('SELECT %s, %s, %s, %s', parameters).fetchone())
    (144, Decimal('3.75'), 'hello', None)

This is the result we can expect, so obviously PyGreSQL has adapted the
parameters and sent the following query to PostgreSQL:

.. code-block:: sql

    SELECT 144, 3.75, 'hello', NULL

Note the subtle, but important detail that even though the SQL string passed
to :meth:`cur.execute` contains conversion specifications normally used in
Python with the ``%`` operator for formatting strings, we didn't use the ``%``
operator to format the parameters, but passed them as the second argument to
:meth:`cur.execute`.  That is, we **didn't** write the following::

    >>> tuple(cur.execute('SELECT %s, %s, %s, %s' % parameters).fetchone())

If we had done this, PostgreSQL would have complained because the parameters
were not adapted.  Particularly, there would be no quotes around the value
``'hello'``, so PostgreSQL would have interpreted this as a database column,
which would have caused a :exc:`ProgrammingError`.  Also, the Python value
``None`` would have been included in the SQL command literally, instead of
being converted to the SQL keyword ``NULL``, which would have been another
reason for PostgreSQL to complain about our bad query:

.. code-block:: sql

    SELECT 144, 3.75, hello, None

Even worse, building queries with the use of the ``%`` operator makes us
vulnerable to so-called "SQL injection" exploits, where an attacker inserts
malicious SQL statements into our queries that we never intended to be
executed.  We could avoid this by carefully quoting and escaping the
parameters, but this would be tedious, and if we overlook something, our
code will still be vulnerable.  So please don't do this.  This cannot be
emphasized enough, because it is such a subtle difference and using the ``%``
operator looks so natural:

.. warning::

  Remember to **never** insert parameters directly into your queries using
  the ``%`` operator.  Always pass the parameters separately.
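
To see why this matters, here is a minimal, database-free sketch (the
``users`` table and the attacker-controlled value are hypothetical)
contrasting ``%`` formatting with separately passed parameters:

```python
# Hypothetical attacker-controlled input; no real database is involved.
user_input = "x'; DROP TABLE users; --"

# Unsafe: the value is spliced directly into the SQL text, so the
# query string now contains a second, destructive statement.
unsafe_sql = "SELECT * FROM users WHERE name = '%s'" % user_input

# Safe: the SQL text keeps only the placeholder; the value travels
# separately and would be quoted and escaped by the driver.
safe_sql = "SELECT * FROM users WHERE name = %s"
safe_params = (user_input,)

print(unsafe_sql)  # the injected DROP TABLE is now part of the query text
```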

The good thing is that by letting PyGreSQL do the work for you, you can treat
all your parameters equally and don't need to ponder where you need to put
quotes or need to escape strings.  You can and should also always use the
general ``%s`` specification instead of e.g. using ``%d`` for integers.
Actually, to avoid mistakes and make it easier to insert parameters at more
than one location, you can and should use named specifications, like this::

    >>> params = dict(greeting='Hello', name='HAL')
    >>> sql = """SELECT %(greeting)s || ', ' || %(name)s
    ...    || '. Do you read me, ' || %(name)s || '?'"""
    >>> cur.execute(sql, params).fetchone()[0]
    'Hello, HAL. Do you read me, HAL?'
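
As an aside, the ``%(name)s`` placeholder notation is the same syntax that
Python's own ``%`` operator uses with a mapping, which is why a named
parameter can be referenced more than once.  The following database-free
sketch only illustrates the notation; PyGreSQL performs proper adaptation
rather than plain string substitution:

```python
# Database-free illustration of the %(name)s placeholder notation.
# Python's ``%`` operator with a mapping uses the same syntax, so a
# named value can be substituted at several places in the template.
params = dict(greeting='Hello', name='HAL')
template = "%(greeting)s, %(name)s. Do you read me, %(name)s?"
result = template % params
print(result)  # -> Hello, HAL. Do you read me, HAL?
```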

PyGreSQL does not only adapt the basic types like ``int``, ``float``,
``bool`` and ``str``, but also tries to make sense of Python lists and tuples.

Lists are adapted as PostgreSQL arrays::

    >>> params = dict(array=[[1, 2], [3, 4]])
    >>> cur.execute("SELECT %(array)s", params).fetchone()[0]
    [[1, 2], [3, 4]]

Note that the query gives the value back as Python lists again.  This
is achieved by the typecasting mechanism explained in the next section.
The query that was actually executed was this:

.. code-block:: sql

    SELECT ARRAY[[1,2],[3,4]]

Again, if we had inserted the list using the ``%`` operator without adaptation,
the ``ARRAY`` keyword would have been missing in the query.

Tuples are adapted as PostgreSQL composite types::

    >>> params = dict(record=('Bond', 'James'))
    >>> cur.execute("SELECT %(record)s", params).fetchone()[0]
    ('Bond', 'James')

You can also use this feature with the ``IN`` syntax of SQL::

    >>> params = dict(what='needle', where=('needle', 'haystack'))
    >>> cur.execute("SELECT %(what)s IN %(where)s", params).fetchone()[0]
    True

Sometimes a Python type can be ambiguous.  For instance, you might want
to insert a Python list not into an array column, but into a JSON column.
Or you want to interpret a string as a date and insert it into a DATE column.
In this case you can give PyGreSQL a hint by using :ref:`type_constructors`::

    >>> cur.execute("CREATE TABLE json_data (data json, created date)")
    >>> params = dict(
    ...     data=pgdb.Json([1, 2, 3]), created=pgdb.Date(2016, 1, 29))
    >>> sql = "INSERT INTO json_data VALUES (%(data)s, %(created)s)"
    >>> cur.execute(sql, params)
    >>> cur.execute("SELECT * FROM json_data").fetchone()
    Row(data=[1, 2, 3], created='2016-01-29')

Let's think of another example where we create a table with a composite
type in PostgreSQL:

.. code-block:: sql

    CREATE TABLE on_hand (
        item      inventory_item,
        count     integer)

We assume the composite type ``inventory_item`` has been created like this:

.. code-block:: sql

    CREATE TYPE inventory_item AS (
        name            text,
        supplier_id     integer,
        price           numeric)

In Python we can use a named tuple as an equivalent to this PostgreSQL type::

    >>> from collections import namedtuple
    >>> inventory_item = namedtuple(
    ...     'inventory_item', ['name', 'supplier_id', 'price'])

Using the automatic adaptation of Python tuples, an item can now be
inserted into the database and then read back as follows::

    >>> cur.execute("INSERT INTO on_hand VALUES (%(item)s, %(count)s)",
    ...     dict(item=inventory_item('fuzzy dice', 42, 1.99), count=1000))
    >>> cur.execute("SELECT * FROM on_hand").fetchone()
    Row(item=inventory_item(name='fuzzy dice', supplier_id=42,
            price=Decimal('1.99')), count=1000)

However, we may not want to use named tuples, but custom Python classes
to hold our values, like this one::

    >>> class InventoryItem:
    ...
    ...     def __init__(self, name, supplier_id, price):
    ...         self.name = name
    ...         self.supplier_id = supplier_id
    ...         self.price = price
    ...
    ...     def __str__(self):
    ...         return '%s (from %s, at $%s)' % (
    ...             self.name, self.supplier_id, self.price)

But when we try to insert an instance of this class in the same way, we
will get an error::

    >>> cur.execute("INSERT INTO on_hand VALUES (%(item)s, %(count)s)",
    ...     dict(item=InventoryItem('fuzzy dice', 42, 1.99), count=1000))
    InterfaceError: Do not know how to adapt type <class 'InventoryItem'>

While PyGreSQL knows how to adapt tuples, it does not know what to make out
of our custom class.  Simply converting the object to a string using the
``str`` function is not a solution, since this yields a human readable string
that is not useful for PostgreSQL.  However, it is possible to make such
custom classes adapt themselves to PostgreSQL by adding a "magic" method
with the name ``__pg_repr__``, like this::

    >>> class InventoryItem:
    ...
    ...     ...
    ...
    ...     def __str__(self):
    ...         return '%s (from %s, at $%s)' % (
    ...             self.name, self.supplier_id, self.price)
    ...
    ...     def __pg_repr__(self):
    ...         return (self.name, self.supplier_id, self.price)

Now you can insert class instances the same way as you insert named tuples.

Note that PyGreSQL adapts the result of ``__pg_repr__`` again if it is a
tuple or a list.  Otherwise, it must be a properly escaped string.
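
The two possible return styles can be sketched with hypothetical classes
(no database is needed here; we only look at the shape of the
``__pg_repr__`` result):

```python
# Hypothetical classes showing the two return styles of __pg_repr__.

class TupleItem:
    """Returns a tuple, which PyGreSQL adapts recursively."""

    def __init__(self, name, supplier_id, price):
        self.name = name
        self.supplier_id = supplier_id
        self.price = price

    def __pg_repr__(self):
        return (self.name, self.supplier_id, self.price)


class RawPoint:
    """Returns a string, which must already be valid, escaped SQL."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __pg_repr__(self):
        return "'(%g,%g)'" % (self.x, self.y)


print(TupleItem('fuzzy dice', 42, 1.99).__pg_repr__())
print(RawPoint(2, 3).__pg_repr__())  # -> '(2,3)'
```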

Typecasting to Python
---------------------

As you noticed, PyGreSQL automatically converted the PostgreSQL data to
suitable Python objects when returning values via one of the "fetch" methods
of a cursor.  This is done by the use of built-in typecast functions.

If you want to use different typecast functions or add your own if no
built-in typecast function is available, then this is possible using
the :func:`set_typecast` function.  With the :func:`get_typecast` function
you can check which function is currently set, and :func:`reset_typecast`
allows you to reset the typecast function to its default.  If no typecast
function is set, then PyGreSQL will return the raw strings from the database.

For instance, you will find that PyGreSQL uses the normal ``int`` function
to cast PostgreSQL ``int4`` type values to Python::

    >>> pgdb.get_typecast('int4')
    int

You can change this to return float values instead::

    >>> pgdb.set_typecast('int4', float)
    >>> con = pgdb.connect(...)
    >>> cur = con.cursor()
    >>> cur.execute('select 42::int4').fetchone()[0]
    42.0

Note that the connections cache the typecast functions, so you may need to
reopen the database connection, or reset the cache of the connection to
make this effective, using the following command::

    >>> con.type_cache.reset_typecast()

The :class:`TypeCache` of the connection can also be used to change typecast
functions locally for one database connection only.

As a more useful example, we can create a typecast function that casts
items of the composite type used as example in the previous section
to instances of the corresponding Python class::

    >>> con.type_cache.reset_typecast()
    >>> cast_tuple = con.type_cache.get_typecast('inventory_item')
    >>> cast_item = lambda value: InventoryItem(*cast_tuple(value))
    >>> con.type_cache.set_typecast('inventory_item', cast_item)
    >>> str(cur.execute("SELECT * FROM on_hand").fetchone()[0])
    'fuzzy dice (from 42, at $1.99)'

As you saw in the last section, PyGreSQL also has a typecast function
for JSON, which is the default JSON decoder from the standard library.
Let's assume we want to use a slight variation of that decoder in which
every integer in JSON is converted to a float in Python.  This can be
accomplished as follows::

    >>> from json import loads
    >>> cast_json = lambda v: loads(v, parse_int=float)
    >>> pgdb.set_typecast('json', cast_json)
    >>> cur.execute("SELECT data FROM json_data").fetchone()[0]
    [1.0, 2.0, 3.0]

Note again that you may need to run ``con.type_cache.reset_typecast()`` to
make this effective.  Also note that the two types ``json`` and ``jsonb`` have
their own typecast functions, so if you use ``jsonb`` instead of ``json``, you
need to use this type name when setting the typecast function::

    >>> pgdb.set_typecast('jsonb', cast_json)

As one last example, let us try to typecast the geometric data type ``circle``
of PostgreSQL into a `SymPy <http://www.sympy.org>`_ ``Circle`` object.  Let's
assume we have created and populated a table with two circles, like so:

.. code-block:: sql

    CREATE TABLE circle (
        name varchar(8) primary key, circle circle);
    INSERT INTO circle VALUES ('C1', '<(2, 3), 3>');
    INSERT INTO circle VALUES ('C2', '<(1, -1), 4>');

With PostgreSQL we can easily calculate that these two circles overlap::

    >>> con.cursor().execute("""SELECT c1.circle && c2.circle
    ...     FROM circle c1, circle c2
    ...     WHERE c1.name = 'C1' AND c2.name = 'C2'""").fetchone()[0]
    True

However, calculating the intersection points between the two circles using the
``#`` operator does not work (at least not as of PostgreSQL version 9.5).
So let's resort to SymPy to find out.  To ease importing circles from
PostgreSQL to SymPy, we create and register the following typecast function::

    >>> from sympy import Point, Circle
    >>>
    >>> def cast_circle(s):
    ...     p, r = s[1:-1].rsplit(',', 1)
    ...     p = p[1:-1].split(',')
    ...     return Circle(Point(float(p[0]), float(p[1])), float(r))
    ...
    >>> pgdb.set_typecast('circle', cast_circle)
337
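The string handling in ``cast_circle`` can be checked without SymPy.  The
following hypothetical helper applies the same parsing steps but returns
plain floats:

```python
def parse_circle(s):
    """Parse a PostgreSQL circle literal such as '<(2, 3), 3>' into
    ((x, y), r) using plain floats (a SymPy-free variant of the
    parsing done in cast_circle above)."""
    p, r = s[1:-1].rsplit(',', 1)   # split off the radius
    x, y = p[1:-1].split(',')       # drop the parentheses around the point
    return (float(x), float(y)), float(r)

print(parse_circle('<(2, 3), 3>'))  # -> ((2.0, 3.0), 3.0)
```
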
Now we can import the circles in the table into Python quite easily::

    >>> circle = {c.name: c.circle for c in con.cursor().execute(
    ...     "SELECT * FROM circle").fetchall()}

The result is a dictionary mapping circle names to SymPy ``Circle`` objects.
We can verify that the circles have been imported correctly::

    >>> circle
    {'C1': Circle(Point(2, 3), 3.0),
     'C2': Circle(Point(1, -1), 4.0)}

Finally we can find the exact intersection points with SymPy::

    >>> circle['C1'].intersection(circle['C2'])
    [Point(29/17 + 64564173230121*sqrt(17)/100000000000000,
        -80705216537651*sqrt(17)/500000000000000 + 31/17),
     Point(-64564173230121*sqrt(17)/100000000000000 + 29/17,
        80705216537651*sqrt(17)/500000000000000 + 31/17)]