tags) are then searched
 *     (<a> tag titles are deemed here to 'belong' to the target/linked page instead)
 *   - Code to construct extracts updated to use mb_* functions. Now copes with multibyte characters
 *
 * -------------------------------------------------------------------------------------------------------------------
 *
 * USAGE:
 *   [!Search!] - the default behaviour; searches pagetitle, longtitle, introtext, content
 *
 *   [!Search?
 *     &searchFields=`pagetitle, longtitle, content, Year, RepairNotes` - search these TVs
 *     &tpl=`SearchResultsTpl` - Template for each document found
 *     &searchStartTpl=`SearchInfoTpl`!] - Template to put at the start
 *
 * INPUTS:
 *   $searchParents - optional space and/or comma delimited list of document roots to search. If set but empty, don't search anything.
 *   $searchExcludeParents - optional space and/or comma delimited list of document roots to not search (overrides $searchParents). If set but empty, has no effect.
 *   $searchTerms - name of the request parameter holding the space and/or comma delimited search terms (default: searchTerms, i.e. looks in $_REQUEST['searchTerms'])
 *   $searchFields - space and/or comma delimited list of TVs to search (default: pagetitle, longtitle, introtext, content)
 *   $searchExtractChars - Maximum number of chars in extract (default: 200). If set to zero, the default value is used.
 *   $searchMaxDocs - Maximum number of docs to find (if unset or zero, all are listed). If pagination is used this refers to the total found, not just the number on this page.
 *   $searchMinLetters - Shortest length of search terms to use (default: 3). A warning is issued if shorter terms are entered.
 *   $searchOutputList - Output document list as unordered list (default: 1), i.e. wraps the results listing in ul#search-result-list
 *   $searchStartTpl - Template to output at the start (default:

[+search.finds+] document[+search.plural+] found containing [+search.terms+].

[+search.form+][+search.pagination+]). * $searchEndTpl - Template to output at the end (default: [+search.pagination+]). * $searchNoResultTpl - Output if no results found (default:

No Results Found

[+search.form+]) * $searchNoInputTpl - Output if no input given (default:

No search terms given.

* $tpl - Template for individual documents. (default:

[+search.result_link+][+pagetitle+][+/link+]

[+search.terms_found+][+search.result_extract+])
 *   $searchPlural - Plural character(s) for templating; applies to the [+search.plural+] placeholder (default: 's')
 *   $mb_charset - Character set received from MySQL; sets mb_internal_encoding and is used for entity de/encoding (default: 'UTF-8')
 *
 * INPUTS FOR PAGINATION:
 *   $searchPaginate - Set to 1 to enable pagination
 *   $searchPaginationTpl - Template to insert for [+search.pagination+] if needed.
 *     (Default:
*

Displaying [+search.page_start+] to [+search.page_end+] of [+search.finds+] result[+search.plural+].

*
* [+search.itemsperpage+] *
* *
)
 *   $searchPaginateLinkText - comma delimited link text (default: First,Previous,Next,Last)
 *   $searchPaginateLinksSeparator - Separator between page links (default: ' | ')
 *   $searchPaginateItemsPerPage - Items per page (default: 10)
 *   $searchPaginateItemsPerPageLabel - Label for select box (for above options) (default: 'Items per page')
 *   $searchPaginateItemsPerPageOptions - Options for 'items per page' select box as a space and/or comma delimited list.
 *     $searchPaginateItemsPerPage will be included by the snippet if missing from this list.
 *     (default: "5, 10, 25, 50")
 *
 * TEMPLATING:
 *   [+search.finds+] - total number of documents found (NOTE - if $searchMaxDocs is used this will show the number displayed)
 *   [+search.terms+] - search terms used
 *   [+search.plural+] - inserts $searchPlural if >1 document found
 *   [+search.pagination+] - inserts pagination template ($searchPaginationTpl) if used and required
 *
 * TEMPLATING FOR PAGINATION ($searchPaginationTpl chunk only)
 *   [+search.page_start+] - index of first item displayed
 *   [+search.page_end+] - index of last item displayed
 *   [+search.page_items+] - items displayed per page
 *   [+search.goto_first+] - link to first 'page'; if it is the current page, a dummy element is output instead of a link
 *   [+search.goto_last+] - link to last 'page'; if it is the current page, a dummy element is output instead of a link
 *   [+search.goto_prev+] - link to previous 'page'; if there is none, a dummy element is output instead of a link
 *   [+search.goto_next+] - link to next 'page'; if there is none, a dummy element is output instead of a link
 *   [+search.goto_pages+] - links to all 'pages'; the current page is output as a dummy element instead of a link
 *   [+search.itemsperpage+] - Select box to change items per page (JS only, not displayed if JS not present)
 *
 * TEMPLATING WITHIN INDIVIDUAL SEARCH RESULTS ($tpl chunk only):
 *   [+search.result_posn+] - position in search results
 *   [+search.terms_found+] - Shows terms found in individual documents if (and only if) more than one search term is entered.
 *     (Wrapped by div.search-terms-found)
 *   [+search.result_link+]....[+/link+] - Link to document
 *   TVs can be inserted using [+TV+] e.g. [+pagetitle+], [+customTV+]
 *
 * NON-TEMPLATED OUTPUT:
 *   EXTRACTS:
 *     p.search-result-extract - extract for each result
 *     div.search-terms-found - wraps [+search.terms_found+]
 *     span.highlight - used in extracts for highlighting of search terms within text
 *   LISTS:
 *     ul#search-result-list and li.search-result-extract
 *       - used if $searchOutputList is 1 (the default)
 *   JAVASCRIPT: (items per page functionality)
 *     .search-JS-only - elements to be displayed only if JS enabled (this being handled by the snippet)
 *
 * -------------------------------------------------------------------------------------------------------------------
 *
 * ISSUES: Will not find words that contain HTML tags - minor
 *         Will not find words that contain &shy; (soft hyphens) - very minor
 *         If documents are found correctly that also have characteristics of false positives,
 *         the false positive search terms are still displayed - very minor
 *
 * TODO Prioritise on headings (1.4)
 *      - cycle through search finds, checking content fields for ([^<]|\<(?!\/h))* Cannot use MySQL REGEXP due to lookahead.
 *        Do this prior to uasort and modify uasort's anonymous function. Just need to check terms from $search_terms_found.
 *      - implement NOT/- search
 *      - miss weblink content link out???
 *
 * FUTURE PORTABILITY ISSUES:
 *   Use of mysql_* functions
 *
 * -------------------------------------------------------------------------------------------------------------------
 */

// PHP 4 fix.
// Accepts the optional $offset used by the highlighting loop further down, and lowercases with
// mb_strtolower() so that multibyte strings are not corrupted.
if (!function_exists("mb_stripos")) {
  function mb_stripos($str, $needle, $offset = 0) {
    return mb_strpos(mb_strtolower($str), mb_strtolower($needle), $offset);
  }
}

/********************************/
/* Check and process parameters */
/********************************/

// Set character set for mb_* functions
$mb_charset = isset($mb_charset) ? $mb_charset : 'UTF-8';
mb_internal_encoding($mb_charset);

$searchTerms = isset($searchTerms) ?
$searchTerms : 'searchTerms';

// Convert searchTerms to actual string (an absent request parameter is treated as an empty search)
$search_string = isset($_REQUEST[$searchTerms]) ? trim($_REQUEST[$searchTerms]) : '';
if (get_magic_quotes_gpc()) $search_string = stripslashes($search_string);
$search_string = strip_tags($search_string);

// Check search terms are present
if (empty($search_string)) return isset($searchNoInputTpl) ? $modx->getChunk($searchNoInputTpl) : '

No search terms given.

'; /* !! Early Return !! */

// Convert $search_string to arrays
$search_terms_array_quoted = array();
// Note that the search is case insensitive (uses SQL 'LIKE'). The next line ensures the calls to array_unique() below filter out repeated values that differ in case only.
// mb_strtolower() is used so that multibyte characters are lowercased safely.
$search_string_copy = mb_strtolower($search_string);
// First look for double quoted terms.
while (($pos = mb_strpos($search_string_copy, '"')) !== false) {
  if ($pos2 = mb_strpos($search_string_copy, '"', $pos+1)) {
    $search_terms_array_quoted[] = mb_substr($search_string_copy, $pos+1, $pos2 - $pos - 1);
    // Cut the quoted term out: keep all $pos characters before the opening quote and everything after the closing quote
    $search_string_copy = ($pos ? mb_substr($search_string_copy, 0, $pos) : '').' '.(($pos2 < mb_strlen($search_string_copy) - 1) ? mb_substr($search_string_copy, $pos2+1) : '');
  } else {
    // incomplete? Finish looping
    break;
  }
}
// Filter out repeated values.
$search_terms_array_quoted = array_unique($search_terms_array_quoted);

// If anything is left in $search_string_copy, then secondly split string based on spaces and/or commas.
if ($search_string_copy = trim($search_string_copy)) {
  $search_terms_array_splitonspaces = array_unique(preg_split('/[\s,]+/', $search_string_copy)); // Note that repeated values are filtered out.
  $search_terms_array = array_unique(array_merge($search_terms_array_splitonspaces, $search_terms_array_quoted)); // ditto
} else {
  $search_terms_array_splitonspaces = array();
  $search_terms_array = $search_terms_array_quoted;
}

// Add an extra 'term' - that of the entire search string, unprocessed. This will ensure that exact phrase matches have a $search_finds[n]['count'] higher than all other documents.
// Note that we do not bother if the phrase is already there. This would mean that one word or one quoted phrase has been entered, but we don't check
// if (sizeof($search_terms_array) > 1) as this wouldn't cover situations such as 'around 10' where the 10 would not by default be added to $search_terms_array.
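// ILLUSTRATION ONLY. The helper below is hypothetical: it is not part of the original snippet
// and nothing here calls it. It is a minimal sketch, assuming it mirrors the term handling
// described in the comments above (quoted phrases first, then a split on spaces and/or commas,
// then the whole search string as an extra term), to show which terms a given input yields.
if (!function_exists('search_example_parse_terms')) {
  function search_example_parse_terms($input) {
    $rest = mb_strtolower($input);
    $quoted = array();
    // Pull out "double quoted" phrases, replacing each with a single space
    while (($open = mb_strpos($rest, '"')) !== false) {
      $close = mb_strpos($rest, '"', $open + 1);
      if ($close === false) break; // unbalanced quote: stop looking
      $quoted[] = mb_substr($rest, $open + 1, $close - $open - 1);
      $rest = mb_substr($rest, 0, $open).' '.mb_substr($rest, $close + 1);
    }
    // Split what is left on whitespace and/or commas, dropping empty pieces
    $words = preg_split('/[\s,]+/', trim($rest), -1, PREG_SPLIT_NO_EMPTY);
    $terms = array_values(array_unique(array_merge($words, $quoted)));
    // Extra 'term': the whole search string, unless it is already listed
    if (!in_array($input, $terms)) $terms[] = $input;
    return $terms;
  }
}
// For example, search_example_parse_terms('red "apple pie" tart') gives
//   array('red', 'tart', 'apple pie', 'red "apple pie" tart')
// so a document containing the exact phrase as typed scores one more find than the others.
// Back in the snippet proper, the next statement appends the whole search string as described in the note above.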
if (!in_array($search_string, $search_terms_array)) $search_terms_array[] = $search_string; // Terms to highlight $search_highlight_plugin_parameters = '&searched='.urlencode(implode(' ', $search_terms_array)).'&highlight=highlight'; // Convert $searchFields to array $search_fields_array = isset($searchFields) ? preg_split('/[\s,]+/', $searchFields) : array('pagetitle', 'longtitle', 'introtext', 'content'); // Split $search_fields_array into default site content fields, and custom template variables. Store strings escaped for MySQL. $search_std_tplvars = array('pagetitle', 'longtitle', 'description', 'alias', 'introtext', 'content', 'menutitle'); foreach($search_fields_array as $search_field) { if ($search_field) { if (in_array($search_field, $search_std_tplvars)) $search_std_fields[] = mysql_real_escape_string($search_field); else $search_custom_tv_fields[] = mysql_real_escape_string($search_field); } } // Set $searchExtractChars to default if not set $searchExtractChars = $searchExtractChars ? $searchExtractChars : 200; $searchExtractCharsHalved = $searchExtractChars >> 1; // Convert template chunk names to text (or use defaults) $tpl = isset($tpl) ? $modx->getChunk($tpl) : '

[+search.result_link+][+pagetitle+][+/link+]

[+search.terms_found+][+search.result_extract+]'; $searchStartTpl = isset($searchStartTpl) ? $modx->getChunk($searchStartTpl) : '

[+search.finds+] document[+search.plural+] found containing [+search.terms+].

[+search.form+][+search.pagination+]'; $searchEndTpl = isset($searchEndTpl) ? $modx->getChunk($searchEndTpl) : '[+search.pagination+]'; $searchNoResultTpl = isset($searchNoResultTpl) ? $modx->getChunk($searchNoResultTpl) : '

No Results Found

[+search.form+]'; $searchPlural = isset($searchPlural) ? $searchPlural : 's'; // Output as (unordered) list? $searchOutputList = isset($searchOutputList) ? $searchOutputList : 1; // Minimum length of search terms $searchMinLetters = isset($searchMinLetters) ? $searchMinLetters : 3; // Pagination if ($searchPaginate) { // Ensure $_GET params can override snippet parameters $searchPaginateItemsPerPage = ($_GET['search_itemsperpage'] && ctype_digit($_GET['search_itemsperpage'])) ? $_GET['search_itemsperpage'] : (isset($searchPaginateItemsPerPage) ? $searchPaginateItemsPerPage : 10); $search_paginate_start = ($_GET['search_start'] && ctype_digit($_GET['search_start'])) ? $_GET['search_start'] : 1; $searchPaginateLinksSeparator = isset($searchPaginateLinksSeparator) ? $searchPaginateLinksSeparator : ' | '; // Array of link text: 0=>First 1=>Prev 2=>Next 3=>Last $search_paginate_linktext_array = explode(',', $searchPaginateLinkText ? $searchPaginateLinkText : 'First,Previous,Next,Last'); // User selectable items per page $search_paginate_itemsPerPage_options = isset($searchPaginateItemsPerPageOptions) ? preg_split('/[\s,]+/', $searchPaginateItemsPerPageOptions) : array (5, 10, 25, 50); if (!in_array($searchPaginateItemsPerPage, $search_paginate_itemsPerPage_options)) { // Ensure current itemsperpage value is in option list $search_paginate_itemsPerPage_options[] = $searchPaginateItemsPerPage; sort($search_paginate_itemsPerPage_options); } $searchPaginateItemsPerPageLabel = isset($searchPaginateItemsPerPageLabel) ? $searchPaginateItemsPerPageLabel : 'Items per page'; // Pagination template $searchPaginationTpl = isset($searchPaginationTpl) ? $modx->getChunk($searchPaginationTpl) : '

Displaying [+search.page_start+] to [+search.page_end+] of [+search.finds+] result[+search.plural+].

[+search.itemsperpage+]
'; // Start of query string to pass on with pagination. Note quotes are encoded now to avoid issues of quotes within (X)HTML attributes. $search_pagination_query = $searchTerms.'='.str_replace('"', '%22', $search_string); } // Check for $searchParents/$searchExcludeParents - if set we need to restrict the domain of the search by filtering the MySQL results function getChildrenInDomain($id) { global $modx; $result = array($id); $child_docs = $modx->getActiveChildren($id, 'id', 'ASC', 'id'); if ($child_docs && sizeof($child_docs)) { foreach($child_docs as $child_doc) $result = array_merge($result, getChildrenInDomain($child_doc['id'])); } return $result; } if (strlen($searchParents)) { $search_parents_array = preg_split('/[\s,]+/', $searchParents); $search_domain = array(); foreach($search_parents_array as $search_parent) $search_domain = array_merge($search_domain, getChildrenInDomain($search_parent)); } elseif ($searchParents === '') $search_domain = array(); if (strlen($searchExcludeParents)) { if (!isset($search_domain)) $search_domain = getChildrenInDomain(0); $search_exclude_parents_array = preg_split('/[\s,]+/', $searchExcludeParents); foreach($search_exclude_parents_array as $search_exclude_parent) $search_domain = array_diff($search_domain, getChildrenInDomain($search_exclude_parent)); } /**********/ /* Search */ /**********/ $modx_site_content = $modx->getFullTableName('site_content'); $modx_site_tmplvars = $modx->getFullTableName('site_tmplvars'); $modx_site_tmplvar_contentvalues = $modx->getFullTableName('site_tmplvar_contentvalues'); $search_output = ''; $search_output_warnings = array(); // $search_finds[$docid] will be incremented when a search term is found. // $search_terms_found[$docid] is either unset or will contain an array of search terms. 
$search_finds = array();
$search_terms_found = array();

// Loop through search terms
foreach($search_terms_array as $search_term_key=>$search_term) {
  if (mb_strlen($search_term, $mb_charset) < $searchMinLetters) {
    // Eliminate search terms that are too short
    // Also gets rid of empty items in $search_terms_array that may have occurred due to extraneous spaces or commas when the search string was split
    $search_output_warnings[] = '

Ignoring search terms with fewer than '.$searchMinLetters.' letters.

'; // Ensure that [+search.terms+] placeholder does not show null or irrelevant strings e.g. "term,,,,term2" unset($search_terms_array[$search_term_key]); } else { // Note that magic-quotes-derived slashes are stripped above // Consider all combinations of entities without duplicating search values $search_terms_escaped = array(); $search_terms_escaped[] = mysql_real_escape_string($search_term); $search_terms_escaped[] = mysql_real_escape_string(htmlentities($search_term, ENT_NOQUOTES, $mb_charset)); $search_terms_escaped[] = mysql_real_escape_string(htmlentities($search_term, ENT_COMPAT, $mb_charset)); $search_terms_escaped[] = mysql_real_escape_string(htmlentities($search_term, ENT_QUOTES, $mb_charset)); $search_terms_escaped = array_unique($search_terms_escaped); // Reset query parts from previous iterations $search_query1 = $search_query2 = ''; // Standard TVs as stored in modx_site_content (note that $search_std_fields has already been escaped) if (is_array($search_std_fields)) { foreach($search_std_fields as $search_std_field) $search_query1 .= " OR {$search_std_field} LIKE '%".implode("%' OR {$search_std_field} LIKE '%", $search_terms_escaped).'%\''; } // Custom TV fields (note that $search_custom_tv_fields has already been escaped) if (is_array($search_custom_tv_fields)) { foreach($search_custom_tv_fields as $search_custom_tv_field) $search_query2 .= " OR ({$modx_site_tmplvars}.name = '{$search_custom_tv_field}' AND ({$modx_site_tmplvar_contentvalues}.value LIKE '%".implode("%' OR {$modx_site_tmplvar_contentvalues}.value LIKE '%", $search_terms_escaped).'%\'))'; } if ($search_query1 && $search_query2) // Both standard and custom TV's present { $search_result = mysql_query(" SELECT id, editedon FROM {$modx_site_content} WHERE published = 1 AND searchable = 1 AND deleted=0 AND (".substr($search_query1, 4).") UNION DISTINCT SELECT {$modx_site_content}.id, {$modx_site_content}.editedon FROM {$modx_site_content}, {$modx_site_tmplvar_contentvalues}, 
{$modx_site_tmplvars} WHERE {$modx_site_content}.id = {$modx_site_tmplvar_contentvalues}.contentid AND {$modx_site_tmplvars}.id = {$modx_site_tmplvar_contentvalues}.tmplvarid AND {$modx_site_content}.published = 1 AND {$modx_site_content}.searchable = 1 AND {$modx_site_content}.deleted=0 AND (".substr($search_query2, 4).')'); } elseif ($search_query1) // Standard TVs only { $search_result = mysql_query(" SELECT id, editedon FROM {$modx_site_content} WHERE published = 1 AND searchable = 1 AND deleted=0 AND (".substr($search_query1, 4).')'); } elseif ($search_query2) // Custom TVs only { $search_result = mysql_query(" SELECT {$modx_site_content}.id, {$modx_site_content}.editedon FROM {$modx_site_content}, {$modx_site_tmplvar_contentvalues}, {$modx_site_tmplvars} WHERE {$modx_site_content}.id = {$modx_site_tmplvar_contentvalues}.contentid AND {$modx_site_tmplvars}.id = {$modx_site_tmplvar_contentvalues}.tmplvarid AND {$modx_site_content}.published = 1 AND {$modx_site_content}.searchable = 1 AND {$modx_site_content}.deleted=0 AND (".substr($search_query2, 4).')'); } if ($search_result) { while($search_row = mysql_fetch_row($search_result)) { // increase counts of docids found $search_finds[$search_row[0]]['count']++; // store last edit date for ordering $search_finds[$search_row[0]]['date'] = $search_row[1]; // record search terms found and in which documents $search_terms_found[$search_row[0]][] = $search_term; } } } } if (sizeof($search_finds)) { // Put documents with the most finds (by distinct keywords) first, then by date edited (most recent first) uasort($search_finds, create_function('$a, $b', 'if ($a[\'count\'] > $b[\'count\']) return -1; elseif ($a[\'count\'] < $b[\'count\']) return 1; elseif ($a[\'date\'] > $b[\'date\']) return -1; elseif ($a[\'date\'] < $b[\'date\']) return 1; else return 0;')); /**********/ /* Output */ /**********/ $search_posn = 0; // Current position in ALL results during construction of output. 
// At the end is the total number of items found
$search_page_posn = 0; // Current position within page during construction of output.
                       // At the end is the number of items on this page.

foreach(array_keys($search_finds) as $search_find_docid) {
  if ($searchMaxDocs && $search_posn >= $searchMaxDocs) break; // !! Early exit from loop !!
  if (isset($search_domain) && !in_array($search_find_docid, $search_domain)) continue; // !! Early jump to next iteration !!

  $search_template = $tpl; // Reset from previous iteration

  // Get template vars for the document.
  $search_tv_outputs = $modx->getTemplateVarOutput('*', $search_find_docid, 1);
  if (is_array($search_tv_outputs)) // Filters out documents that current user cannot access
  {
    // Reset per-document state so that nothing carries over from the previous document
    $search_tv_output_lengths = array();
    $search_extract_posn = false;

    // Parse template vars, and get their lengths (used when constructing extract)
    foreach($search_tv_outputs as $search_tv_name=>$search_tv_output) {
      // Parse
      $search_template = str_replace('[+'.$search_tv_name.'+]', $search_tv_output, $search_template);
      // Store lengths of outputs for use by extract code
      $search_tv_output_lengths[$search_tv_name] = strlen($search_tv_output);
    }

    // Parse and construct extract
    // This also acts as a final filter to eliminate documents that have been erroneously found
    // due to matches within the code of HTML entities e.g. 'eta' within &theta;. See notes below re: false positive

    // Go through TVs in descending order of output length so as to try and form an extract using the longest TV
    arsort($search_tv_output_lengths);

    // First try to construct extract based on main body of template variables.
    // This excludes everything within tags e.g. 
title attributes foreach(array_keys($search_tv_output_lengths) as $search_tv_name) { // Only try to construct extracts from fields that were searched if (in_array($search_tv_name, $search_fields_array)) { // Decode entities to avoid erroneous matchings within the entity codes // Strip tags for uniform extract styling $search_extract = html_entity_decode(strip_tags($search_tv_outputs[$search_tv_name]), ENT_QUOTES, $mb_charset); if ($search_extract) // Avoid mb_stripos error { foreach ($search_terms_array as $search_term) { // At some point in these two nested loops this will succeed unless we have a // false positive OR the search terms have been found within title attributes. // This means that on success we leave these loops with $search_extract_posn as an integer, or false on fail. if (($search_extract_posn = mb_stripos($search_extract, $search_term)) !== false) { // Found - make extract if (($search_extract_length = mb_strlen($search_extract)) > $searchExtractChars) { // Extract is smaller than its source TV if ($search_extract_length - $search_extract_posn > $searchExtractCharsHalved) { if ($search_extract_posn > $searchExtractCharsHalved) $search_extract = '...'.mb_substr($search_extract, $search_extract_posn - $searchExtractCharsHalved, $searchExtractChars).'...'; else $search_extract = mb_substr($search_extract, 0, $searchExtractChars).'...'; } else $search_extract = '...'.mb_substr($search_extract, -$searchExtractChars); } break 2; } } } } } // If the above fails, then attempt to construct an extract using the title attributes. // Do not search tags - their titles are related to the linked page, not the source document. if ($search_extract_posn === false) { foreach(array_keys($search_tv_output_lengths) as $search_tv_name) { // Only try to construct extracts from fields that were searched if (in_array($search_tv_name, $search_fields_array)) { // Decode entities to avoid erroneous matchings within the entity codes. 
// Find title attributes - attribute strings go into $title_attributes[2] $title_attributes = array(); preg_match_all('/<\s*(a[a-z]|[^a])[^>]*title="([^"]*)"[^>]*>/', html_entity_decode($search_tv_outputs[$search_tv_name], ENT_QUOTES, $mb_charset), $title_attributes); if (is_array($title_attributes[2])) { foreach ($search_terms_array as $search_term) { foreach($title_attributes[2] as $title_attribute) { if ($title_attribute && ($search_extract_posn = mb_stripos($title_attribute, $search_term)) !== false) { $search_extract = $title_attribute; break 3; } } } } } } } if ($search_extract_posn !== false) // i.e. if not a false positive { ++$search_posn; // This records the first item as 1, not 0, and hence can be later used as a // test for finds vs no finds, as well as showing the total number of finds. if (!$searchPaginate || ($search_posn >= $search_paginate_start && $search_posn < $search_paginate_start + $searchPaginateItemsPerPage)) { ++$search_page_posn; // Highlight extract // Ensure all terms highlighted, not just the one that identified the extract // Need to search for htmlentities in the extract as we have just converted it. foreach ($search_terms_array as $search_term) { $search_highlight_posn = 0; $search_term_len = mb_strlen($search_term); while (($search_highlight_posn = mb_stripos($search_extract, $search_term, $search_highlight_posn)) !== false) { // Insert temporary placeholders until HTML entities re-encoded $search_extract = mb_substr($search_extract, 0, $search_highlight_posn). '[+search_highlight+]'.mb_substr($search_extract, $search_highlight_posn, $search_term_len).'[+/search_highlight+]'. 
              mb_substr($search_extract, $search_highlight_posn + $search_term_len);
            // Advance offset to after this highlight (including temporary placeholders) for next iteration
            $search_highlight_posn += $search_term_len + 41; // strlen('[+search_highlight+]') + strlen('[+/search_highlight+]') = 20 + 21 = 41
          }
        }

        // Re-encode (note this always encodes quotes regardless of the initial state in the TV)
        $search_extract = htmlentities($search_extract, ENT_QUOTES, $mb_charset);

        // Replace temporary script-inserted placeholders after running htmlentities
        // (span.highlight is the class documented in the header comment for highlighted terms)
        $search_extract = str_replace('[+search_highlight+]', '<span class="highlight">', $search_extract);
        $search_extract = str_replace('[+/search_highlight+]', '</span>', $search_extract);

        // When finished with the individual extract, replace the placeholder in the template with this extract
        $search_template = str_replace('[+search.result_extract+]', '

'.$search_extract.'

', $search_template); // Parse position $search_template = str_replace('[+search.result_posn+]', $search_posn, $search_template); // Parse search terms found in this document $search_template = str_replace('[+search.terms_found+]', sizeof($search_terms_array) > 1 ? '
Contains '.mb_strtolower(implode(', ', $search_terms_found[$search_find_docid])).'.
' : '', $search_template); // Parse links $search_template = str_replace('[+search.result_link+]', '
', $search_template); $search_template = str_replace('[+/link+]', '', $search_template); // Add this item to the output if ($searchOutputList) $search_output .= '
  • '.$search_template.'
  • '; else $search_output .= $search_template; } } } } /*******************************************/ /* Check for success and make final output */ /*******************************************/ if ($search_posn) // Valid finds and no false positives { // Wrap in