source: trunk/docs/contents/pg/adaptation.rst @ 930

Last change on this file since 930 was 930, checked in by cito, 18 months ago

Update year, version number and supported Pg versions

File size: 16.8 KB
Line 
1Remarks on Adaptation and Typecasting
2=====================================
3
4.. py:currentmodule:: pg
5
6Both PostgreSQL and Python have the concept of data types, but there
7are of course differences between the two type systems.  Therefore PyGreSQL
8needs to adapt Python objects to the representation required by PostgreSQL
9when passing values as query parameters, and it needs to typecast the
10representation of PostgreSQL data types returned by database queries to
11Python objects.  Here are some explanations about how this works in
12detail in case you want to better understand or change the default
13behavior of PyGreSQL.
14
15Supported data types
16--------------------
17
18The following automatic data type conversions are supported by PyGreSQL
19out of the box.  If you need other automatic type conversions or want to
20change the default conversions, you can achieve this by using the methods
21explained in the next two sections.
22
23================================== ==================
24PostgreSQL                          Python
25================================== ==================
26char, bpchar, name, text, varchar  str
27bool                               bool
28bytea                              bytes
29int2, int4, int8, oid, serial      int [#int8]_
30int2vector                         list of int
31float4, float8                     float
32numeric, money                     Decimal
33date                               datetime.date
34time, timetz                       datetime.time
35timestamp, timestamptz             datetime.datetime
36interval                           datetime.timedelta
37hstore                             dict
38json, jsonb                        list or dict
39uuid                               uuid.UUID
40array                              list [#array]_
41record                             tuple
42================================== ==================
43
44.. note::
45
46    Elements of arrays and records will also be converted accordingly.
47
48    .. [#int8] int8 is converted to long in Python 2
49
50    .. [#array] The first element of the array will always be the first element
51       of the Python list, no matter what the lower bound of the PostgreSQL
52       array is. The information about the start index of the array (which is
53       usually 1 in PostgreSQL, but can also be different from 1) is ignored
54       and gets lost in the conversion to the Python list. If you need that
55       information, you can request it separately with the `array_lower()`
56       function provided by PostgreSQL.
57
58
59Adaptation of parameters
60------------------------
61When you use the higher level methods of the classic :mod:`pg` module like
62:meth:`DB.insert()` or :meth:`DB.update()`, you don't need to care about
63adaptation of parameters, since all of this is happening automatically behind
64the scenes.  You only need to consider this issue when creating SQL commands
65manually and sending them to the database using the :meth:`DB.query` method.
66
67Imagine you have created a user  login form that stores the login name as
68*login* and the password as *passwd* and you now want to get the user
69data for that user.  You may be tempted to execute a query like this::
70
71    >>> db = pg.DB(...)
72    >>> sql = "SELECT * FROM user_table WHERE login = '%s' AND passwd = '%s'"
73    >>> db.query(sql % (login, passwd)).getresult()[0]
74
75This seems to work at a first glance, but you will notice an error as soon as
76you try to use a login name containing a single quote.  Even worse, this error
77can be exploited through a so called "SQL injection", where an attacker inserts
78malicious SQL statements into the query that you never intended to be executed.
79For instance, with a login name something like ``' OR ''='`` the user could
80easily log in and see the user data of another user in the database.
81
82One solution for this problem would be to clean your input from "dangerous"
83characters like the single quote, but this is tedious and it is likely that
84you overlook something or break the application e.g. for users with names
85like "D'Arcy".  A better solution is to use the escaping functions provided
86by PostgreSQL which are available as methods on the :class:`DB` object::
87
88    >>> login = "D'Arcy"
89    >>> db.escape_string(login)
90    "D''Arcy"
91
92As you see, :meth:`DB.escape_string` has doubled the single quote which is
93the right thing to do in SQL.  However, there are better ways of passing
94parameters to the query, without having to manually escape them.  If you
95pass the parameters as positional arguments to :meth:`DB.query`, then
96PyGreSQL will send them to the database separately, without the need for
97quoting them inside the SQL command, and without the problems inherent with
98that process.  In this case you must put placeholders of the form ``$1``,
99``$2`` etc. in the SQL command in place of the parameters that should go there.
100For instance::
101
102    >>> sql = "SELECT * FROM user_table WHERE login = $1 AND passwd = $2"
103    >>> db.query(sql, login, passwd).getresult()[0]
104
105That's much better.  So please always keep the following warning in mind:
106
107.. warning::
108
109  Remember to **never** insert parameters directly into your queries using
110  the ``%`` operator.  Always pass the parameters separately.
111
112If you like the ``%`` format specifications of Python better than the
113placeholders used by PostgreSQL, there is still a way to use them, via the
114:meth:`DB.query_formatted` method::
115
116    >>> sql = "SELECT * FROM user_table WHERE login = %s AND passwd = %s"
117    >>> db.query_formatted(sql, (login, passwd)).getresult()[0]
118
119Note that we need to pass the parameters not as positional arguments here,
120but as a single tuple.  Also note again that we did not use the ``%``
121operator of Python to format the SQL string, we just used the ``%s`` format
122specifications of Python and let PyGreSQL care about the formatting.
123Even better, you can also pass the parameters as a dictionary if you use
124the :meth:`DB.query_formatted` method::
125
126    >>> sql = """SELECT * FROM user_table
127    ...     WHERE login = %(login)s AND passwd = %(passwd)s"""
128    >>> parameters = dict(login=login, passwd=passwd)
129    >>> db.query_formatted(sql, parameters).getresult()[0]
130
131Here is another example::
132
133    >>> sql = "SELECT 'Hello, ' || %s || '!'"
134    >>> db.query_formatted(sql, (login,)).getresult()[0]
135
136You would think that the following even simpler example should work, too:
137
138    >>> sql = "SELECT %s"
139    >>> db.query_formatted(sql, (login,)).getresult()[0]
140    ProgrammingError: Could not determine data type of parameter $1
141
142The issue here is that :meth:`DB.query_formatted` by default still uses
143PostgreSQL parameters, transforming the Python style ``%s`` placeholder
144into a ``$1`` placeholder, and sending the login name separately from
145the query.  In the query we looked at before, the concatenation with other
146strings made it clear that it should be interpreted as a string. This simple
147query however does not give PostgreSQL a clue what data type the ``$1``
148placeholder stands for.
149
150This is different when you are embedding the login name directly into the
151query instead of passing it as parameter to PostgreSQL.  You can achieve this
152by setting the *inline* parameter of :meth:`DB.query_formatted`, like so::
153
154    >>> sql = "SELECT %s"
155    >>> db.query_formatted(sql, (login,), inline=True).getresult()[0]
156
157Another way of making this query work while still sending the parameters
158separately is to simply cast the parameter values::
159
160    >>> sql = "SELECT %s::text"
161    >>> db.query_formatted(sql, (login,), inline=False).getresult()[0]
162
163In real world examples you will rarely have to cast your parameters like that,
164since in an INSERT statement or a WHERE clause comparing the parameter to a
165table column the data type will be clear from the context.
166
167When binding the parameters to a query, PyGreSQL does not only adapt the basic
168types like ``int``, ``float``, ``bool`` and ``str``, but also tries to make
169sense of Python lists and tuples.
170
171Lists are adapted as PostgreSQL arrays::
172
173    >>> params = dict(array=[[1, 2],[3, 4]])
174    >>> db.query_formatted("SELECT %(array)s::int[]", params).getresult()[0][0]
175    [[1, 2], [3, 4]]
176
177Note that again we only need to cast the array parameter or use inline
178parameters because this simple query does not provide enough context.
179Also note that the query gives the value back as Python lists again.  This
180is achieved by the typecasting mechanism explained in the next section.
181
182Tuples are adapted as PostgreSQL composite types.  If you use inline paramters,
183they can also be used with the ``IN`` syntax.
184
185Let's think of a more real world example again where we create a table with a
186composite type in PostgreSQL:
187
188.. code-block:: sql
189
190    CREATE TABLE on_hand (
191        item      inventory_item,
192        count     integer)
193
194We assume the composite type ``inventory_item`` has been created like this:
195
196.. code-block:: sql
197
198    CREATE TYPE inventory_item AS (
199        name            text,
200        supplier_id     integer,
201        price           numeric)
202
203In Python we can use a named tuple as an equivalent to this PostgreSQL type::
204
205    >>> from collections import namedtuple
206    >>> inventory_item = namedtuple(
207    ...     'inventory_item', ['name', 'supplier_id', 'price'])
208
209Using the automatic adaptation of Python tuples, an item can now be
210inserted into the database and then read back as follows::
211
212    >>> db.query_formatted("INSERT INTO on_hand VALUES (%(item)s, %(count)s)",
213    ...     dict(item=inventory_item('fuzzy dice', 42, 1.99), count=1000))
214    >>> db.query("SELECT * FROM on_hand").getresult()[0][0]
215    Row(item=inventory_item(name='fuzzy dice', supplier_id=42,
216            price=Decimal('1.99')), count=1000)
217
218The :meth:`DB.insert` method provides a simpler way to achieve the same::
219
220    >>> row = dict(item=inventory_item('fuzzy dice', 42, 1.99), count=1000)
221    >>> db.insert('on_hand', row)
222    {'count': 1000,  'item': inventory_item(name='fuzzy dice',
223            supplier_id=42, price=Decimal('1.99'))}
224
225However, we may not want to use named tuples, but custom Python classes
226to hold our values, like this one::
227
228    >>> class InventoryItem:
229    ...
230    ...     def __init__(self, name, supplier_id, price):
231    ...         self.name = name
232    ...         self.supplier_id = supplier_id
233    ...         self.price = price
234    ...
235    ...     def __str__(self):
236    ...         return '%s (from %s, at $%s)' % (
237    ...             self.name, self.supplier_id, self.price)
238
239But when we try to insert an instance of this class in the same way, we
240will get an error.  This is because PyGreSQL tries to pass the string
241representation of the object as a parameter to PostgreSQL, but this is just a
242human readable string and not useful for PostgreSQL to build a composite type.
243However, it is possible to make such custom classes adapt themselves to
244PostgreSQL by adding a "magic" method with the name ``__pg_str__``, like so::
245
246    >>> class InventoryItem:
247    ...
248    ...     ...
249    ...
250    ...     def __str__(self):
251    ...         return '%s (from %s, at $%s)' % (
252    ...             self.name, self.supplier_id, self.price)
253    ...
254    ...     def __pg_str__(self, typ):
255    ...         return (self.name, self.supplier_id, self.price)
256
257Now you can insert class instances the same way as you insert named tuples.
258You can even make these objects adapt to different types in different ways::
259
260    >>> class InventoryItem:
261    ...
262    ...     ...
263    ...
264    ...     def __pg_str__(self, typ):
265    ...         if typ == 'text':
266    ...             return str(self)
267    ...        return (self.name, self.supplier_id, self.price)
268    ...
269    >>> db.query("ALTER TABLE on_hand ADD COLUMN remark varchar")
270    >>> item=InventoryItem('fuzzy dice', 42, 1.99)
271    >>> row = dict(item=item, remark=item, count=1000)
272    >>> db.insert('on_hand', row)
273    {'count': 1000, 'item': inventory_item(name='fuzzy dice',
274        supplier_id=42, price=Decimal('1.99')),
275        'remark': 'fuzzy dice (from 42, at $1.99)'}
276
277There is also another "magic" method ``__pg_repr__`` which does not take the
278*typ* parameter.  That method is used instead of ``__pg_str__`` when passing
279parameters inline.  You must be more careful when using ``__pg_repr__``,
280because it must return a properly escaped string that can be put literally
281inside the SQL.  The only exception is when you return a tuple or list,
282because these will be adapted and properly escaped by PyGreSQL again.
283
284Typecasting to Python
285---------------------
286
287As you noticed, PyGreSQL automatically converted the PostgreSQL data to
288suitable Python objects when returning values via the :meth:`DB.get()`,
289:meth:`Query.getresult()` and similar methods.  This is done by the use
290of built-in typecast functions.
291
292If you want to use different typecast functions or add your own if no
293built-in typecast function is available, then this is possible using
294the :func:`set_typecast` function.  With the :func:`get_typecast` function
295you can check which function is currently set.  If no typecast function
296is set, then PyGreSQL will return the raw strings from the database.
297
298For instance, you will find that PyGreSQL uses the normal ``int`` function
299to cast PostgreSQL ``int4`` type values to Python::
300
301    >>> pg.get_typecast('int4')
302    int
303
304In the classic PyGreSQL module, the typecasting for these basic types is
305always done internally by the C extension module for performance reasons.
306We can set a different typecast function for ``int4``, but it will not
307become effective, the C module continues to use its internal typecasting.
308
309However, we can add new typecast functions for the database types that are
310not supported by the C modul. Fore example, we can create a typecast function
311that casts items of the composite PostgreSQL type used as example in the
312previous section to instances of the corresponding Python class.
313
314To do this, at first we get the default typecast function that PyGreSQL has
315created for the current :class:`DB` connection.  This default function casts
316composite types to named tuples, as we have seen in the section before.
317We can grab it from the :attr:`DB.dbtypes` object as follows::
318
319    >>> cast_tuple = db.dbtypes.get_typecast('inventory_item')
320
321Now we can create a new typecast function that converts the tuple to
322an instance of our custom class::
323
324    >>> cast_item = lambda value: InventoryItem(*cast_tuple(value))
325
326Finally, we set this typecast function, either globally with
327:func:`set_typecast`, or locally for the current connection like this::
328
329    >>> db.dbtypes.set_typecast('inventory_item', cast_item)
330
331Now we can get instances of our custom class directly from the database::
332
333    >>> item = db.query("SELECT * FROM on_hand").getresult()[0][0]
334    >>> str(item)
335    'fuzzy dice (from 42, at $1.99)'
336
337Note that some of the typecast functions used by the C module are configurable
338with separate module level functions, such as :meth:`set_decimal`,
339:meth:`set_bool` or :meth:`set_jsondecode`.  You need to use these instead of
340:meth:`set_typecast` if you want to change the behavior of the C module.
341
342Also note that after changing global typecast functions with
343:meth:`set_typecast`, you may need to run ``db.dbtypes.reset_typecast()``
344to make these changes effective on connections that were already open.
345
346As one last example, let us try to typecast the geometric data type ``circle``
347of PostgreSQL into a `SymPy <http://www.sympy.org>`_ ``Circle`` object.  Let's
348assume we have created and populated a table with two circles, like so:
349
350.. code-block:: sql
351
352    CREATE TABLE circle (
353        name varchar(8) primary key, circle circle);
354    INSERT INTO circle VALUES ('C1', '<(2, 3), 3>');
355    INSERT INTO circle VALUES ('C2', '<(1, -1), 4>');
356
357With PostgreSQL we can easily calculate that these two circles overlap::
358
359    >>> q = db.query("""SELECT c1.circle && c2.circle
360    ...     FROM circle c1, circle c2
361    ...     WHERE c1.name = 'C1' AND c2.name = 'C2'""")
362    >>> q.getresult()[0][0]
363    True
364
365However, calculating the intersection points between the two circles using the
366``#`` operator does not work (at least not as of PostgreSQL version 10).
367So let's resort to SymPy to find out.  To ease importing circles from
368PostgreSQL to SymPy, we create and register the following typecast function::
369
370    >>> from sympy import Point, Circle
371    >>>
372    >>> def cast_circle(s):
373    ...     p, r = s[1:-1].split(',')
374    ...     p = p[1:-1].split(',')
375    ...     return Circle(Point(float(p[0]), float(p[1])), float(r))
376    ...
377    >>> pg.set_typecast('circle', cast_circle)
378
379Now we can import the circles in the table into Python simply using::
380
381    >>> circle = db.get_as_dict('circle', scalar=True)
382
383The result is a dictionary mapping circle names to SymPy ``Circle`` objects.
384We can verify that the circles have been imported correctly:
385
386    >>> circle['C1']
387    Circle(Point(2, 3), 3.0)
388    >>> circle['C2']
389    Circle(Point(1, -1), 4.0)
390
391Finally we can find the exact intersection points with SymPy:
392
393    >>> circle['C1'].intersection(circle['C2'])
394    [Point(29/17 + 64564173230121*sqrt(17)/100000000000000,
395        -80705216537651*sqrt(17)/500000000000000 + 31/17),
396     Point(-64564173230121*sqrt(17)/100000000000000 + 29/17,
397        80705216537651*sqrt(17)/500000000000000 + 31/17)]
Note: See TracBrowser for help on using the repository browser.