Indexes And Metadata
Description
How to program your custom fields and data queries through portal_catalog.
What does indexing mean?
Indexing is the act of making object data searchable. Plone stores the available indexes in the database.
You can create indexes through the web and inspect existing ones in portal_catalog on the Indexes tab.
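To make the idea concrete, here is a minimal pure-Python sketch of what a field index conceptually stores: a mapping from each indexed value to the ids of the objects that have it, plus a reverse mapping so an object can be unindexed again. ToyFieldIndex and Doc are hypothetical names used only for illustration; Plone's real indexes live in Products.PluginIndexes and are stored persistently in BTrees.

```python
from collections import defaultdict


class ToyFieldIndex:
    """Hypothetical, in-memory model of a field index (illustration only)."""

    def __init__(self, attr):
        self.attr = attr                   # attribute read from each object
        self._forward = defaultdict(set)   # value -> ids of objects having it
        self._reverse = {}                 # object id -> value indexed for it

    def index_object(self, obj_id, obj):
        # Re-indexing first removes the old entry, then stores the new value
        self.unindex_object(obj_id)
        value = getattr(obj, self.attr, None)
        if value is None:
            return  # nothing to index for this object
        self._forward[value].add(obj_id)
        self._reverse[obj_id] = value

    def unindex_object(self, obj_id):
        if obj_id not in self._reverse:
            return
        old = self._reverse.pop(obj_id)
        self._forward[old].discard(obj_id)
        if not self._forward[old]:
            del self._forward[old]         # drop values no object has any more

    def search(self, value):
        # A lookup is a dictionary access, not a scan over all objects
        return set(self._forward.get(value, ()))


class Doc:
    def __init__(self, review_state):
        self.review_state = review_state


index = ToyFieldIndex("review_state")
index.index_object(1, Doc("published"))
index.index_object(2, Doc("private"))
index.index_object(3, Doc("published"))
print(sorted(index.search("published")))  # [1, 3]
```

The reverse mapping is what makes reindexing cheap: the old value of an object can be removed without searching through every stored value.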
The Catalog Tool can be configured through the Management Interface or programmatically in Python, but current best practice in the CMF world is to use GenericSetup to configure it with the declarative catalog.xml file. The GenericSetup profile for Plone, for example, uses the CMFPlone/profiles/default/catalog.xml XML data file to configure the Catalog Tool when a Plone site is created. The file is fairly readable, so a quick look through it can be very informative.
When using a GenericSetup extension profile to customize the Catalog Tool in your portal, you only need to include XML for the pieces of the catalog you are changing. To add an index for the Archetypes location field, as in the example below, a policy package could include the following profiles/default/catalog.xml:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
<index name="location" meta_type="FieldIndex">
<indexed_attr value="location"/>
</index>
</object>
The GenericSetup import handler for the Catalog Tool also supports removing indexes from the catalog if present using the “remove” attribute of the <index> element. To remove the “start” and “end” indexes used for events, for example, a policy package could include the following profiles/default/catalog.xml:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
<index name="start" remove="True" />
<index name="end" remove="True" />
</object>
Warning
Care must be taken when setting up indexes with GenericSetup: if the import step for a catalog.xml is run a second time (for example, when you reinstall the product), the indexes specified will be destroyed, losing all currently indexed entries, and then re-created fresh (and empty!). To work around this behavior, you can either update the catalog afterwards or add the indexes yourself in Python code using a custom import handler.
For more information, see this setup handler in plone.app.event: https://github.com/plone/plone.app.event/blob/master/plone/app/event/setuphandlers.py or these discussions of the problem:
Viewing Indexes And Indexed Data
Indexed data
You can view indexed data through portal_catalog in the Management Interface:
Click portal_catalog in the portal root
Click Catalog tab
Click any object
Indexes And Metadata Columns
Available indexes are stored in the database, not in Python code. To see which indexes your site has:
Click portal_catalog in the portal root
Click the Indexes and Metadata tabs
Creating An Index
To perform queries on custom data, you need to add the corresponding index to portal_catalog first.
For example, if your Archetypes content type has a field:
from Products.Archetypes import atapi

schema = atapi.Schema((
    atapi.DateField(
        "revisitDate",
        schemata="revisit",
        widget=atapi.DateWidget(
            label="Revisit date",
            description="When you will be alerted that this content should be revisited (one month before this date)",
        ),
    ),
))

class MyContent(...):  # replace ... with your content base classes
    # Accessor methods such as getRevisitDate() are generated automatically
    # by Archetypes at run time, but any hand-written method works as well:
    def getMyCustomValue(self):
        pass
You can add a new index which will index the value of this field, so you can make queries based on it later.
See more information about accessor methods.
Note
If you want to create an index for a content type you do not control yourself, or if you want to apply custom logic in your indexer, please see Custom Index Methods below.
Creating An Index Through The Web
This method is suitable during development: you can create an index in your local Plone database.
Go to the Management Interface
Click portal_catalog
Click Indexes tab
In the top right corner there is a drop-down menu for adding new indexes. Choose the type of index you need to add.
Type: FieldIndex
Id: getMyCustomValue
Indexed attributes: getMyCustomValue
You can use Archetypes accessor methods directly as an indexed attribute. In this example we use getMyCustomValue, the accessor of the AT field myCustomValue.
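Accessor methods work as indexed attributes because an index looks the configured name up on the object and calls it when it is a method. The sketch below shows that lookup in simplified form; it assumes this general behavior of the Zope PluginIndexes rather than reproducing their code, and indexed_value and FakeContent are hypothetical names.

```python
_marker = object()


def indexed_value(obj, attr):
    """Hypothetical sketch: read the value an index would store for `attr`."""
    value = getattr(obj, attr, _marker)
    if value is _marker:
        return None      # the object has nothing to index under this name
    if callable(value):
        value = value()  # e.g. an Archetypes accessor such as getMyCustomValue()
    return value


class FakeContent:
    """Hypothetical stand-in for an Archetypes object with field myCustomValue."""

    def __init__(self, value):
        self._value = value

    def getMyCustomValue(self):
        return self._value


print(indexed_value(FakeContent(111), "getMyCustomValue"))  # 111
```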
The type of index you need depends on what kind of queries you need to run on the data. For example, direct value matching, ranged date queries and free text search each need a different kind of index.
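After this you can query portal_catalog:
my_brains = context.portal_catalog(getMyCustomValue=111)
for brain in my_brains:
    # Brains only carry the values of metadata columns: reading
    # brain.getMyCustomValue requires a metadata column of that name
    print(brain.getMyCustomValue)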
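Ranged queries are passed as a dictionary with query and range keys, for example catalog(revisit_date={'query': some_date, 'range': 'max'}) to find content whose revisit date is on or before some_date. The toy function below only models how such a query dictionary is interpreted for a single value; matches_range is a hypothetical helper, not part of the catalog API, and the exact-match fallback is a simplification.

```python
from datetime import date


def matches_range(value, query):
    """Hypothetical helper: does `value` satisfy a catalog-style query dict?"""
    q = query["query"]
    kind = query.get("range")
    if kind == "min":        # value must be >= q
        return value >= q
    if kind == "max":        # value must be <= q
        return value <= q
    if kind == "min:max":    # q is a (low, high) pair, both ends inclusive
        low, high = q
        return low <= value <= high
    return value == q        # no range given: plain exact match


revisit = date(2024, 5, 1)
print(matches_range(revisit, {"query": date(2024, 6, 1), "range": "max"}))  # True
print(matches_range(revisit, {"query": (date(2024, 1, 1), date(2024, 3, 31)),
                              "range": "min:max"}))                         # False
```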
Adding Index Using Add-on Product Installer
You need to have your own add-on product which registers new indexes when the add-on installer is run. This is the recommended method for repeated installations.
You can create an index in one of these ways:
Write catalog.xml by hand.
Create the index through the web and export the catalog data from a development site using the Export functionality of the portal_setup tool. The index is created through the web as above, the XML is generated for you, and you can fine-tune the resulting XML before dropping it into your add-on product.
Create the indexes in the Python code of a custom import step in your add-on.
As a prerequisite, your add-on product must have GenericSetup profile support.
This approach is repeatable: the index gets created every time the add-on product is installed. It is more cumbersome, however.
Warning
There is a known issue of indexed data getting pruned when an add-on product is reinstalled.
If you want to avoid this, you need to create new indexes in a custom setup step of the add-on installer (Python code).
The example below is not safe from this data pruning on reinstall.
This file is profiles/default/catalog.xml. It installs a new index called revisit_date of type DateIndex.
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
<index name="revisit_date" meta_type="DateIndex">
<property name="index_naive_time_as_local">True</property>
</index>
</object>
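A Python import step that avoids the pruning problem described in the warning above can add only the indexes that are missing and then fill just those from existing content. The sketch below is modelled loosely on the plone.app.event setup handler linked earlier; add_missing_indexes is a hypothetical helper that relies only on the ZCatalog methods indexes(), addIndex() and manage_reindexIndex(ids=...). Index types that need extra construction arguments, such as ZCTextIndex, are not covered.

```python
import logging

logger = logging.getLogger(__name__)


def add_missing_indexes(catalog, wanted):
    """Add only the catalog indexes that do not exist yet (hypothetical helper).

    catalog -- portal_catalog, or anything with indexes(), addIndex() and
               manage_reindexIndex(ids=...)
    wanted  -- list of (index_name, meta_type) pairs, e.g.
               [("revisit_date", "DateIndex")]
    """
    existing = catalog.indexes()
    added = []
    for name, meta_type in wanted:
        if name in existing:
            continue  # keep existing indexes and their data untouched
        catalog.addIndex(name, meta_type)
        added.append(name)
        logger.info("Added %s %s", meta_type, name)
    if added:
        # Fill only the new indexes with values from already cataloged content
        catalog.manage_reindexIndex(ids=added)
    return added
```

In an add-on you would call it from your custom import step with the site's portal_catalog; running it a second time is harmless because existing indexes are skipped.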
For more information see
Custom Index Methods
The plone.indexer package provides a way to create custom indexing functions.
Sometimes you want to index “virtual” attributes of an object computed from existing ones, or you want to customize the way certain attributes are indexed, for example, saving only the first 10 characters of a field instead of its whole content.
To do so in an elegant and flexible way, Plone 3.3 and later include the plone.indexer package, which provides a series of primitives to delegate indexing operations to adapters.
Let’s say you have a content type providing the interface IMyType. To define an indexer for your type which takes the first 10 characters from the body text, just type (assuming the attribute’s name is ‘text’):
from plone.indexer.decorator import indexer

@indexer(IMyType)
def mytype_description(object, **kw):
    return object.text[:10]
Finally, register this factory function as a named adapter using ZCML. Assuming you’ve put the code above into a file named indexers.py:
<adapter name="description" factory=".indexers.mytype_description" />
Note
You can omit the for attribute because you passed this to the @indexer decorator, and you can omit the provides attribute because the thing returned by the decorator is actually a class providing the required IIndexer interface.
To learn more about the plone.indexer package, read its doctest.
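Conceptually, when the catalog indexes an object, it first looks for an indexer registered for the object's interface under the index name and only falls back to reading the attribute of that name from the object. The pure-Python toy below models that lookup order; ToyIndexerRegistry and MyType are hypothetical, plain classes stand in for Zope interfaces, and the real plone.indexer uses the Zope component registry instead.

```python
class ToyIndexerRegistry:
    """Hypothetical model of indexer lookup: a registered indexer for
    (type, index name) wins, otherwise the object's attribute is used."""

    def __init__(self):
        self._indexers = {}  # (class, index name) -> indexer function

    def indexer(self, for_, name):
        def register(func):
            self._indexers[(for_, name)] = func
            return func
        return register

    def value_for(self, obj, name):
        # Walk the class hierarchy, like an adapter lookup on provided interfaces
        for klass in type(obj).__mro__:
            func = self._indexers.get((klass, name))
            if func is not None:
                return func(obj)
        value = getattr(obj, name, None)   # fallback: plain attribute/method
        return value() if callable(value) else value


registry = ToyIndexerRegistry()


class MyType:
    text = "A fairly long body text"
    description = "full description"


@registry.indexer(MyType, "description")
def mytype_description(obj):
    return obj.text[:10]


print(registry.value_for(MyType(), "description"))  # A fairly l
```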
For more information about how to create content types, refer to the developing add-ons section. For older Archetypes content types, see the Plone 4 documentation on Archetypes.
Important
If you want to adapt an Archetypes content type like Event or News Item, take into account that you have to feed the indexer decorator with the Zope 3 interfaces defined in the Products.ATContentTypes.interface.* files, not with the deprecated Zope 2 ones in the Products.ATContentTypes.interfaces file.
Creating A Metadata Column
The same rules and methods apply to metadata columns as to creating indexes above. The differences with metadata are:
It is not used for searching, only for displaying search results
It always stores a copy of the value as is
To create metadata columns, add the following to your catalog.xml:
<?xml version="1.0"?>
<object name="portal_catalog" meta_type="Plone Catalog Tool">
<!-- Add a new metadata column which will read from context.getSignificant() function -->
<column value="getSignificant"/>
</object>
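Because a metadata column holds a copy made at indexing time, search results keep showing the old value until the object is reindexed. The toy below models only that copy semantics; ToyMetadata and Item are hypothetical names, and this is not how ZCatalog actually stores its records.

```python
class ToyMetadata:
    """Hypothetical model of metadata columns: values are copied when an
    object is (re)indexed and returned as-is with search results."""

    def __init__(self, columns):
        self.columns = list(columns)
        self._records = {}  # object id -> {column name: copied value}

    def catalog_object(self, obj_id, obj):
        record = {}
        for column in self.columns:
            value = getattr(obj, column, None)
            # Methods such as getSignificant() are called; their result is stored
            record[column] = value() if callable(value) else value
        self._records[obj_id] = record

    def brain(self, obj_id):
        # Reading a result never touches the object itself
        return dict(self._records[obj_id])


class Item:
    def __init__(self, significant):
        self.significant = significant

    def getSignificant(self):
        return self.significant


item = Item(False)
metadata = ToyMetadata(["getSignificant"])
metadata.catalog_object("item-1", item)
item.significant = True                             # changed, not reindexed
print(metadata.brain("item-1")["getSignificant"])  # False (stale copy)
metadata.catalog_object("item-1", item)             # "reindex"
print(metadata.brain("item-1")["getSignificant"])  # True
```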
When Indexing Happens And How To Reindex Manually
Content indexing happens automatically if:
The object is modified by the user using the standard edit forms
A portal_catalog rebuild is run (from the Advanced tab)
You must call reindexObject() manually if you:
Call object field mutators directly
Change any other object data directly
The reindexObject() method takes the optional argument idxs, a list of the indexes to update.
If idxs is not given, all related indexes are updated, even those whose values did not change.
Example:
obj.setTitle('Foobar')
# update only the index associated with this change
obj.reindexObject(idxs=['Title'])
If you add a new index, you need to run Rebuild catalog to get the existing values from content objects into the new index.
Also, if you modify security-related parameters (permissions), you need to call reindexObjectSecurity().
Check the thread Best practices on reindexing the catalog for more tips on how to reduce memory consumption and speed up the process.
Warning
Unit test warning: usually Plone reindexes modified objects at the end of each request (each transaction). If you modify an object yourself, you are responsible for notifying the related catalogs about the new object data.
Index Types
The Zope 2 product PluginIndexes defines the various portal_catalog index types used by Plone:
FieldIndex stores values as is
DateIndex stores a single date (Zope 2 DateTime objects) in a searchable format and supports range queries. DateRangeIndex indexes a pair of start and end dates, so you can find objects whose range contains a given date.
KeywordIndex allows keyword-style look-ups (query term is matched against all the values of a stored list)
ZCTextIndex is used for full text indexing
ExtendedPathIndex is used for indexing content object locations.
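The difference between a FieldIndex and a KeywordIndex is easiest to see in a small model: a keyword index stores every element of a list-valued attribute separately, and a query with several keywords matches objects having any of them (or all of them with the and operator). The helpers below are hypothetical pure-Python illustrations, not the PluginIndexes implementation.

```python
from collections import defaultdict


def build_keyword_index(keywords_by_id):
    """Map each keyword to the ids of the objects listing it."""
    index = defaultdict(set)
    for obj_id, keywords in keywords_by_id.items():
        for keyword in keywords or ():
            index[keyword].add(obj_id)
    return index


def keyword_search(index, keywords, operator="or"):
    """Ids matching any keyword ("or") or all keywords ("and")."""
    matches = [index.get(keyword, set()) for keyword in keywords]
    if not matches:
        return set()
    if operator == "and":
        return set.intersection(*matches)
    return set.union(*matches)


subjects = {1: ["plone", "python"], 2: ["python"], 3: ["zope"]}
index = build_keyword_index(subjects)
print(sorted(keyword_search(index, ["plone", "zope"])))          # [1, 3]
print(sorted(keyword_search(index, ["plone", "python"], "and")))  # [1]
```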
Default Plone Indexes And Metadata Columns
Some interesting indexes:
start and end: Calendar event timestamps, used to build the calendar portlet
sortable_title: Title provided for sorting
portal_type: Content type as it appears in portal_types
Type: Translated, human-readable type of the content
path: Where the object is (getPhysicalPath accessor method).
object_provides: The interfaces and marker interfaces the object provides. A KeywordIndex of full interface names.
is_default_page: Whether the object is the default page of its folder. is_default_page is an indexer method in CMFPlone/CatalogTool.py handled by plone.indexer, so there is no object.is_default_page attribute; the method calls ptool.isDefaultPage(obj).
Some interesting columns:
getRemoteURL: Where to go when the object is clicked
getIcon: This might be confusing. Since Plone 5.0.2, getIcon is a boolean value which is set to True when the item is an image or has an image property named image (e.g. a lead image or teaser image). The value of getIcon is used for showing preview images (thumbnails) in lists, tables, content views, portlets, etc.
Content type icons (aka portal type icons, e.g. for folder, document, news item) are rendered as fontello font icons since Plone 5.0. Since Plone 5.1, MIME type icons for all file content types are read from the MIME type registry instead of a fontello font.
exclude_from_nav: If True, the object won’t appear in the sitemap or navigation tree
mime_type: Since Plone 5.1, MIME type information for content items where applicable (file, image, custom types, …), e.g. text/plain, image/jpeg, application/pdf, …
Custom Sorting By Title
sortable_title is a FieldIndex (raw value), whereas the normal Title index is a searchable text index. sortable_title is generated from Title in Products/CMFPlone/CatalogTool.py.
You can override sortable_title by providing an indexer adapter for a specific interface of your content type.
Example indexes.py:
from plone.indexer import indexer
from xxx.researcher.interfaces import IResearcher


@indexer(IResearcher)
def sortable_title(obj):
    """
    Provide custom sorting title.

    This is used by various folder functions of Plone.
    This can differ from actual Title.
    """
    # Remember to handle None value if the object has not been edited yet
    first_name = obj.getFirst_name() or ""
    last_name = obj.getLast_name() or ""
    return last_name + " " + first_name
Related configure.zcml:
<adapter factory=".indexes.sortable_title" name="sortable_title" />
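Because sortable_title is a plain FieldIndex, values sort as raw strings: "Item 10" comes before "Item 9", and upper-case letters sort before all lower-case ones. A hypothetical helper like the one below could normalise the string your indexer returns, for example return normalized_sort_key(last_name + " " + first_name). It is similar in spirit to what Plone does for its own sortable_title, but it is not Plone's implementation, and the padding width of 6 is an arbitrary assumption.

```python
import re

_DIGITS = re.compile(r"\d+")


def normalized_sort_key(text, width=6):
    """Hypothetical helper: lower-case, collapse whitespace and zero-pad
    runs of digits so that plain string sorting follows natural order."""
    text = " ".join((text or "").split()).lower()
    return _DIGITS.sub(lambda match: match.group().zfill(width), text)


titles = ["Item 10", "item 9", "Item 1"]
print(sorted(titles, key=normalized_sort_key))  # ['Item 1', 'item 9', 'Item 10']
print(normalized_sort_key("  Smith   Anna "))     # smith anna
```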
Full-text Searching
Plone provides a special index called SearchableText, which is used for the site's full-text search.
Your content types can override the SearchableText index with a custom method that populates it with the text they want to include in full-text searching.
Below is an example of SearchableText on a custom Archetypes content class.
This class has some methods which are not part of the AT schema and thus must be added to SearchableText manually:
from Products.CMFCore.utils import getToolByName


# A method of your custom Archetypes content class (Archetypes runs on
# Python 2, hence the use of the unicode type below)
def SearchableText(self):
    """
    Override searchable text logic based on the requirements.

    This method constructs a text blob which contains all full-text
    searchable text for this content item.

    This method is called by portal_catalog to populate its SearchableText index.
    """
    # Test this by enabling pdb here and running a catalog rebuild in the Management Interface
    # import pdb; pdb.set_trace()

    # Speed up string concatenation by collecting the parts in a list
    entries = []

    # Plain text fields we index from ourselves:
    # a list of accessor methods of the class
    plain_text_fields = ("Title", "Description")

    # HTML fields we index from ourselves:
    # a list of accessor methods of the class
    html_fields = ("getSummary", "getBiography")

    def read(accessor):
        """
        Call a class accessor method to get the value of a certain Archetypes field.
        """
        try:
            value = accessor()
        except Exception:
            value = ""
        if value is None:
            value = ""
        return value

    # Concatenate plain text fields as is
    for f in plain_text_fields:
        accessor = getattr(self, f)
        value = read(accessor)
        entries.append(value)

    transforms = getToolByName(self, 'portal_transforms')

    # Run HTML valued fields through text/plain conversion
    for f in html_fields:
        accessor = getattr(self, f)
        value = read(accessor)
        if value != "":
            stream = transforms.convertTo('text/plain', value, mimetype='text/html')
            # convertTo() returns None if no transform path is available
            value = stream.getData() if stream is not None else ""
        entries.append(value)

    # Plone accessor methods assume utf-8
    def convertToUTF8(text):
        if isinstance(text, unicode):
            return text.encode("utf-8")
        return text

    entries = [convertToUTF8(entry) for entry in entries]

    # Concatenate all strings to one text blob
    return " ".join(entries)
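On Python 3, or outside of Plone, the same idea of building one text blob from plain and HTML fields can be sketched with the standard library. This version swaps portal_transforms for Python's html.parser module. build_searchable_text and html_to_text are hypothetical helpers. The parser also keeps the contents of script and style elements, and it may split a word broken by inline tags, both of which a real transform would handle better.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text fragments and collapse runs of whitespace
    return " ".join(" ".join(parser.parts).split())


def build_searchable_text(plain_values, html_values):
    """Join plain values as-is and HTML values converted to text,
    skipping empty (None or "") values."""
    entries = [value for value in plain_values if value]
    entries += [html_to_text(value) for value in html_values if value]
    return " ".join(entries)


print(build_searchable_text(["My title", None, ""],
                            ["<p>Hello <b>world</b></p>", None]))
# My title Hello world
```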