Content Component
Content Component is an ACI server. For details of changes that affect all ACI servers, see ACI Server Framework.
23.4.0
New in this Release
-
Content now allows you to collect basic statistics about the distribution and data content of your document fields, such as:
-
The total number of individual occurrences of each field, and the number of distinct documents each appears in.
-
How many distinct values are observed for each field.
-
Occurrence counts for values that might be parsed as numeric, date, or geographic values.
-
Distribution (minimum, maximum, and mean) of numeric and date values, and value lengths, for each field.
You can configure statistics collection by setting the new CollectFieldStatistics configuration parameter in the [Server] section. For an existing index, you can also use the new RegenerateFieldStatsIndex configuration parameter to generate field statistics at startup.
When you enable field statistics:
-
You can retrieve statistics for each field by using the GetTagNames action with the new FieldStats parameter set to True.
-
The Suggest action uses structural information from the source documents as well as the unstructured information to find relevant documents.
-
The TermGetBest action can return information about the occurrences of structured field and value pairs, when you set the new FieldStats parameter to True.
-
The GetQueryTagValues action can return total occurrences for non-parametric fields (that is, when you set both AllowNonParametricFields and DocumentCount to True).
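As a rough illustration of the requests described above, the following Python sketch builds the ACI request URLs. The host, port, and the PRICE field name are assumptions for the example; only the action and parameter names come from these notes. The helper `aci_url` is hypothetical and is not part of any IDOL client library.

```python
from urllib.parse import urlencode

# Assumption: Content's ACI port is reachable here; change this for your deployment.
BASE_URL = "http://localhost:9100"


def aci_url(action, **params):
    """Build an ACI request URL of the form http://host:port/action=Name&Param=Value."""
    query = urlencode({"action": action, **params})
    return f"{BASE_URL}/{query}"


# Per-field statistics (requires field statistics collection to be enabled).
field_stats_url = aci_url("GetTagNames", FieldStats="True")

# Total occurrences for a non-parametric field; PRICE is a hypothetical field name.
tag_values_url = aci_url(
    "GetQueryTagValues",
    FieldName="PRICE",
    AllowNonParametricFields="True",
    DocumentCount="True",
)

print(field_stats_url)
print(tag_values_url)
```

You can send either URL with any HTTP client. The response format is not shown here because these notes do not describe it.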
-
You can now index vector values into Content, and use these for queries. A vector in a document is a comma-separated list of floating-point values. You can generate vectors by using many different models. Content can then use these vectors to find documents that are similar to a vector value that you use in the new VECTOR operator in a Query action, or to perform Suggest queries.
To configure Content to process and use vector values, you must use the new VectorType field property for the field that contains the vector values. You can update an existing index to use these vector values by setting the RegenerateVectorIndex configuration parameter, or by using the DREREGENERATE index action.
You can configure the method to use to determine how close vectors are to one another by setting the DistanceMetric parameter in the [VectorIndex] configuration section. You can also change the directory that Content uses to store the vector index files by setting the VectorPath parameter in the [Paths] section.
For more information, refer to the IDOL Content Component Help.
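To make the vector format concrete, here is a minimal Python sketch that writes a vector as a comma-separated list of floating-point values and compares two vectors. Cosine similarity is used purely as one common example of a closeness measure. These notes do not list the values that DistanceMetric accepts, so this is not a statement about what Content computes. All function names are hypothetical.

```python
import math


def to_vector_field(values):
    """Format numbers as a comma-separated list of floating-point values."""
    return ",".join(repr(float(v)) for v in values)


def parse_vector_field(text):
    """Parse a comma-separated vector string back into a list of floats."""
    return [float(part) for part in text.split(",")]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


field_value = to_vector_field([0.25, -1, 3.5])
print(field_value)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```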
-
The spellcheck phase of a query now respects timeouts.
-
Indexing performance has been improved when sending documents to Content in small batches.
-
The mapped security library has been updated. The security type AUTONOMY_SECURITY_V4_TRIM_CONTEXT_EXT_MAPPED (for Content Manager) now supports exclusions.
-
Performance has been improved for cases where several index actions are issued sequentially, with a pause to wait for each to complete before sending the next (for example, from a script or application that polls for a finished status between running each action).
Resolved Issues
-
When requesting value details from a numeric field (with the GetQueryTagValues action and ValueDetails set to True), results were sometimes missing from multi-section documents.
-
When Content was archiving index actions, and the index log stream was configured to report messages at Full log level, sending an index action with the NoArchive parameter set to True could cause an unexpected interruption of service.
-
Geospatial queries could time out when the XMLFullStructure configuration parameter was set to True and there were a large number of geospatial fields (more than approximately 10000).
23.3.0
New in this Release
-
The handling of reasons has been improved to merge overlapping reasons. For example, the query text "James Watt" DNEAR Jr previously gave the reasons James Watt and Watt Jr. It now returns the single reason James Watt Jr.
-
The efficiency of suggesting spelling corrections has been improved. This change gives particular improvements when UnstemmedMinDocOccs is configured to a value less than the current SpellCheckCorrectMinDocOccs setting.
-
Several updates and improvements have been made to the BIAS FieldText operators:
-
The new BIASRANGE operator has been added. This operator allows you to bias the score of results that fall within a particular date range. It also allows you to reduce the score bias for values within a specified range outside this optimum range. For example:
BIASRANGE{21/08/2011,25/08/2011,172800,86400,10}:DATE
This example boosts the score by 10% for documents with a DATE field value in the range 21/08/2011 to 25/08/2011 (inclusive). It gives a smaller boost (on a linear scale) for documents within 172800s (two days) before 21/08/2011, and within 86400s (one day) after 25/08/2011.
-
The new BIASNRANGE operator has been added. This operator allows you to bias the score of results that contain a value within a specified range in a specified field, and to reduce the score bias linearly for values within a specified range outside this optimum range. For example:
FieldText=BIASNRANGE{100,150,20,40,10}:*/PRICE
A document whose PRICE field value is between 100 and 150 has its weight increased by 10%. This boost decreases linearly to 0% at 80 and lower, and 190 and higher.
-
The BIASVAL operator now supports an empty value for its first argument. For example, BIASVAL{,10}:COLOUR applies a score boost to any result document that does not have a COLOUR field, or has a COLOUR field with an empty value.
NOTE: BIASVAL still requires two arguments, so BIASVAL{10}:COLOUR is not valid syntax.
-
You can now use all BIAS field specifiers in FieldTextField fields for use with AgentBoolean queries (that is, BIAS, BIASDATE, BIASDISTCARTESIAN, BIASDISTSPHERICAL, BIASVAL, BIASRANGE, and BIASNRANGE are now supported for AgentBoolean queries).
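The linear fall-off described for BIASRANGE and BIASNRANGE can be sketched in Python as follows. This is a reading of the worked examples in these notes (full boost inside the range, dropping linearly to zero across the ramp on each side), not Content's actual scoring code. The function name `range_bias` is hypothetical.

```python
def range_bias(value, low, high, ramp_below, ramp_above, max_boost):
    """Percentage boost for a value, given BIASNRANGE-style arguments.

    Inside [low, high] the full boost applies. Outside it, the boost falls
    linearly to 0 over ramp_below (under low) or ramp_above (over high).
    """
    if low <= value <= high:
        return float(max_boost)
    if value < low:
        distance, ramp = low - value, ramp_below
    else:
        distance, ramp = value - high, ramp_above
    if ramp <= 0 or distance >= ramp:
        return 0.0
    return max_boost * (1 - distance / ramp)


# Arguments taken from the BIASNRANGE{100,150,20,40,10} example above.
for price in (125, 90, 170, 80, 190):
    print(price, range_bias(price, 100, 150, 20, 40, 10))
```

The same function applies to the BIASRANGE example if you express the dates and ramps in seconds: a ramp of 172800 before the range and 86400 after it.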
-
You can now use an open-ended range in the NRANGE field operator by setting one of the values to a period (.). For example, NRANGE{.,5}:NUM means that the NUM field must contain a value of 5 or less.
-
The GetQueryTagValues value response when DocumentCount is set to True now includes the total number of occurrences for each value in the server.
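The open-ended NRANGE behavior can be pictured with a short Python sketch. It assumes that both bounds are inclusive, which matches the "5 or less" example above for the upper bound but is an assumption for the lower bound. The helper names are hypothetical and the code only mirrors the matching rule; it does not call Content.

```python
def parse_bound(text):
    """Return None for an open bound (a period), otherwise the numeric bound."""
    text = text.strip()
    return None if text == "." else float(text)


def in_nrange(value, low, high):
    """True if value satisfies NRANGE{low,high}, where '.' leaves a side open."""
    lo, hi = parse_bound(low), parse_bound(high)
    return (lo is None or value >= lo) and (hi is None or value <= hi)


# NRANGE{.,5}:NUM matches any NUM value of 5 or less.
print(in_nrange(5, ".", "5"))
print(in_nrange(6, ".", "5"))
```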
Resolved Issues
-
When used in conjunction with the WHEN operator in XML full-structure mode, the TERM and TERMEXACT FieldText specifiers failed to return some documents that should have matched.
-
The indexer thread could be blocked for an extended time when attempting to delete a file, if the target had been removed in the meantime by an external process.
-
When rebuilding the unstemmed index with RegenerateUnstemmedIndex, numeric/alphanumeric terms were sometimes excluded, regardless of the configured IndexNumbers value.
-
The Content component NiFi processor, ContentServiceImpl, was unable to obtain a license correctly.
23.2.0
New in this Release
-
Loading has been optimized for ACLType fields that have also been configured as MemcachedType (see NodetableCacheFields).
NOTE: This change is only relevant to security models where the DLL load is required for evaluation.
-
The QueryCacheMaxMemKB configuration parameter has been added to the [Security] section. Set this parameter to a value in KB to enable a per-query cache that speeds up security checks for cases where there are many non-unique ACLs in the system (for example, where security is inherited from a top-level folder). If the same ACL has already been evaluated during the query, Content does not need to call the security DLL again. You can set QueryCacheMaxMemKB to -1 for an unlimited cache size, or 0 to disable the cache.
NOTE: This change is only relevant to security models where the DLL load is required for evaluation (for example, there is no need to use this parameter with NT_V4 security).
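The idea behind a per-query ACL cache can be sketched in Python as below. This is a conceptual illustration only: the class, the stand-in check function, and the ACL strings are all hypothetical, and the sketch does not model the memory limit that QueryCacheMaxMemKB imposes.

```python
class QueryAclCache:
    """Remember the result of each distinct ACL check for one query."""

    def __init__(self, check_acl):
        self._check_acl = check_acl  # stand-in for the expensive security call
        self._results = {}
        self.calls = 0  # how many times the expensive check actually ran

    def is_permitted(self, acl):
        if acl not in self._results:
            self.calls += 1
            self._results[acl] = self._check_acl(acl)
        return self._results[acl]


# Hypothetical data: three documents inherit the same ACL from a folder.
documents = [
    ("doc1", "alice,bob"),
    ("doc2", "alice,bob"),
    ("doc3", "carol"),
    ("doc4", "alice,bob"),
]

cache = QueryAclCache(lambda acl: "alice" in acl.split(","))
visible = [ref for ref, acl in documents if cache.is_permitted(acl)]

print(visible)
print(cache.calls)
```

Four documents are checked, but the expensive check runs only once per distinct ACL, which is where the saving comes from when many documents share inherited security.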
Resolved Issues
-
In some cases, Content failed to return hits for terms that existed only in the index cache and not in any indexed documents when SearchUncommittedDocuments was set to True.
-
Content could spuriously log an error "Dynterm list is NULL for term". This error tended to happen for terms with a large number (millions) of occurrences, in servers where documents were regularly deleted and the index compacted.
-
When the Active Directory contained a group name that ends with a space character, the Content security index could become invalid after the component was restarted.
-
When the saved best terms cache file was not valid, the Content application could shut down during a DRECOMPACT operation. Content now automatically rebuilds the cache if it cannot load the saved file.