Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to index and search across multiple structured and unstructured repositories.
Alfresco Content Services provides open, flexible, highly scalable enterprise content management (ECM) capabilities with the added benefits of a content services platform, making content accessible wherever and however you work through easy integrations with the business applications you use every day. Many organizations use the Alfresco content management platform to store their content. One of the key requirements for enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the stored documents.
We’re excited to announce that you can now use the new Amazon Kendra Alfresco connector to search documents stored in your Alfresco repositories and sites. In this post, we show how to use the new connector to retrieve documents stored in Alfresco for indexing purposes and securely use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information from unstructured documents with natural language narrative content, for which keyword search is not very effective.
What’s new in the Amazon Kendra Alfresco connector
The Amazon Kendra Alfresco connector offers support for the following:
Basic and OAuth2 authentication mechanisms for the Alfresco On-Premises (On-Prem) platform
Basic and OAuth2 authentication mechanisms for the Alfresco PaaS platform
Aspect-based crawling of Alfresco repository documents
Solution overview
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repositories and sites. The solution in this post demonstrates the following:
Retrieval of documents and comments from Alfresco private sites and public sites
Retrieval of documents and comments from Alfresco repositories using Amazon Kendra-specific aspects
Authentication against the Alfresco On-Prem and PaaS platforms using the Basic and OAuth2 mechanisms (Basic for the On-Prem sources and the PaaS private site, OAuth2 for the PaaS public site)
The Amazon Kendra search capability with access control across sites and repositories
If you are going to use only one of the platforms, you can still follow this post to build the example solution; just ignore the steps corresponding to the platform that you are not using.
The following is a summary of the steps to build the example solution:
Upload documents to the three Alfresco sites and the repository folder. Make sure that the uploaded documents are unique across sites and repository folders.
For the two private sites and repository, use document-level Alfresco permission management to set access permissions. For the public site, you don’t need to set up permissions at the document level. Note that permissions information is retrieved by the Amazon Kendra Alfresco connector and used for access control by the Amazon Kendra search function.
For the two private sites and repository, create a new Amazon Kendra index (you use the same index for both the private sites and the repository). For the public site, create a new Amazon Kendra index.
For the On-Prem private site, create an Amazon Kendra Alfresco data source using Basic authentication, within the Amazon Kendra index for private sites.
For the On-Prem repository documents with Amazon Kendra-specific aspects, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
For the PaaS private site, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
For the PaaS public site, create a data source using OAuth2 authentication, within the Amazon Kendra index for public sites.
Perform a sync for each data source.
Run a test query in the Amazon Kendra index meant for private sites and the repository using access control.
Run a test query in the Amazon Kendra index meant for public sites without access control.
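To keep the moving parts straight, the four data sources and two indexes from the summary above can be written down as plain data. The following is only an illustrative sketch: the DataSourcePlan class and its field names are our own invention, and the index and data source names are the ones used later in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourcePlan:
    """One row of the example solution (hypothetical helper, not an AWS type)."""
    name: str      # data source name used in Steps 4 to 7
    index: str     # Amazon Kendra index that holds the data source
    platform: str  # "On-Prem" or "PaaS"
    auth: str      # "Basic" or "OAuth2"
    scope: str     # what the connector is asked to crawl


PLAN = [
    DataSourcePlan("Alfresco-OnPrem-Private", "Alfresco-Private", "On-Prem", "Basic",
                   "site MyAlfrescoSite"),
    DataSourcePlan("Alfresco-OnPrem-Aspects", "Alfresco-Private", "On-Prem", "Basic",
                   "aspect awskendra:indexControl"),
    DataSourcePlan("Alfresco-Cloud-Private", "Alfresco-Private", "PaaS", "Basic",
                   "site MyAlfrescoCloudSite2"),
    DataSourcePlan("Alfresco-Cloud-Public", "Alfresco-Public", "PaaS", "OAuth2",
                   "site MyAlfrescoCloudPublicSite"),
]


def sources_by_index(plan):
    """Group data source names by the index they belong to."""
    grouped = {}
    for entry in plan:
        grouped.setdefault(entry.index, []).append(entry.name)
    return grouped


if __name__ == "__main__":
    for index, names in sources_by_index(PLAN).items():
        print(index, "->", ", ".join(names))
```

Grouping by index makes it easy to see that three data sources share the access-controlled index while the public site has an index of its own.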
Prerequisites
You need an AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies. You need to have a basic knowledge of AWS and how to navigate the AWS Management Console.
For the Alfresco On-Prem platform, complete the following steps:
Create a private site or use an existing site.
Create a repository folder or use an existing repository folder.
Get the repository URL.
Get Basic authentication credentials (user ID and password).
Make sure that authentication users are part of the ALFRESCO_ADMINISTRATORS group.
Get the public X509 certificate in .pem format and save it locally.
For the Alfresco PaaS platform, complete the following steps:
Create a private site or use an existing site.
Create a public site or use an existing site.
Get the repository URL.
Get Basic authentication credentials (user ID and password).
Get OAuth2 credentials (client ID, client secret, and token URL).
Confirm that authentication users are part of the ALFRESCO_ADMINISTRATORS group.
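If you don’t already have the On-Prem server’s public certificate on hand, one possible way to obtain it in .pem format is with the Python standard library. This is a minimal sketch, assuming the repository is served over HTTPS and that the certificate the server presents is the one you want to give to the connector; the function names are our own, and the certificate is fetched without being verified, so check it before you rely on it.

```python
import ssl
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(repository_url: str) -> Tuple[str, int]:
    """Extract the host and TLS port from an https:// repository URL."""
    parts = urlsplit(repository_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("expected an https:// repository URL")
    return parts.hostname, parts.port or 443


def save_server_certificate(repository_url: str, pem_path: str) -> None:
    """Fetch the certificate presented by the server and save it as PEM text."""
    host, port = host_and_port(repository_url)
    # Returns the server certificate as a PEM-encoded string (no validation).
    pem_text = ssl.get_server_certificate((host, port))
    with open(pem_path, "w", encoding="ascii") as handle:
        handle.write(pem_text)
```

For example, save_server_certificate("https://alfresco.example.com/alfresco", "alfresco.pem") would write the file locally, where alfresco.example.com is a placeholder for your own server.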
Step 1: Upload example documents
Each uploaded document must have 5 MB or less in text. For more information, see Amazon Kendra Service Quotas. You can upload example documents or use existing documents within each site.
As shown in the following screenshot, we have uploaded four documents to the Alfresco On-Prem private site.
We’ve uploaded three documents to the Alfresco PaaS private site.
We’ve uploaded five documents to the Alfresco PaaS public site.
We’ve uploaded two documents to the Alfresco On-Prem repository.
Assign the aspect awskendra:indexControl to one or more documents in the repository folder.
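The aspect can be assigned in the Alfresco UI. If you prefer to script it, the Alfresco public REST API (version 1) lets you update a node’s aspectNames with a PUT request on the node. The following sketch only builds that request and doesn’t send it. It assumes that the awskendra:indexControl aspect is already defined in your repository’s content model and that you have first read the node’s current aspects, because the update replaces the whole list. The helper names, node ID, and Authorization header value are placeholders of our own.

```python
import json
from urllib.request import Request

ASPECT = "awskendra:indexControl"
# Base path of the Alfresco public REST API, version 1.
API_BASE = "/alfresco/api/-default-/public/alfresco/versions/1"


def merged_aspects(current_aspects, aspect=ASPECT):
    """Return the current aspect list with the Kendra aspect added exactly once."""
    merged = list(current_aspects)
    if aspect not in merged:
        merged.append(aspect)
    return merged


def build_update_request(base_url, node_id, current_aspects, auth_header):
    """Build (but do not send) the PUT request that updates a node's aspects."""
    body = json.dumps({"aspectNames": merged_aspects(current_aspects)})
    return Request(
        url=f"{base_url.rstrip('/')}{API_BASE}/nodes/{node_id}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )
```

Passing the returned object to urllib.request.urlopen would send the update. Merging with the existing aspects first avoids accidentally stripping aspects that the document already carries.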
Step 2: Configure Alfresco permissions
Use the Alfresco Permissions Management feature to give access rights to example users for viewing uploaded documents. It is assumed that you have some example Alfresco user names, with email addresses, that can be used for setting permissions at the document level in private sites. These users are not used for crawling the sites.
In the following example for the On-Prem private site, we have provided users My Dev User1 and My Dev User2 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.
In the following example for the PaaS private site, we have provided user Kendra User3 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.
For the Alfresco repository documents, we have provided user My Dev User1 with consumer access to the example document.
The following table lists the site or repository names, document names, and permissions.
| Platform | Site or Repository Name | Document Name | User IDs |
|---|---|---|---|
| On-Prem | MyAlfrescoSite | ChannelMarketingBudget.xlsx | My Supervisor User3 |
| On-Prem | MyAlfrescoSite | wellarchitected-sustainability-pillar.pdf | My Dev User1, My Dev User2 |
| On-Prem | MyAlfrescoSite | WorkDocs.docx | My Dev User1, My Dev User2, My Supervisor User3 |
| On-Prem | MyAlfrescoSite | WorldPopulation.csv | My Dev User1, My Dev User2, My Supervisor User3 |
| PaaS | MyAlfrescoCloudSite2 | DDoS_White_Paper.pdf | Kendra User3 |
| PaaS | MyAlfrescoCloudSite2 | wellarchitected-framework.pdf | Kendra User3 |
| PaaS | MyAlfrescoCloudSite2 | ML_Training.pptx | Kendra User1 |
| PaaS | MyAlfrescoCloudPublicSite | batch_user.pdf | Everyone |
| PaaS | MyAlfrescoCloudPublicSite | Amazon Simple Storage Service – User Guide.pdf | Everyone |
| PaaS | MyAlfrescoCloudPublicSite | AWS Batch – User Guide.pdf | Everyone |
| PaaS | MyAlfrescoCloudPublicSite | Amazon Detective.docx | Everyone |
| PaaS | MyAlfrescoCloudPublicSite | Pricing.xlsx | Everyone |
| On-Prem | Repo: MyAlfrescoRepoFolder1 | Polly-dg.pdf (aspect awskendra:indexControl) | My Dev User1 |
| On-Prem | Repo: MyAlfrescoRepoFolder1 | Transcribe-api.pdf (aspect awskendra:indexControl) | My Dev User1 |
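The permissions listed above determine what each user should later see in the access-controlled index. As a rough mental model only (this is our own toy lookup, not how Amazon Kendra evaluates access internally), the private-site and repository rows can be expressed as a mapping from document to permitted users:

```python
# Document-level permissions for the private sites and the repository folder,
# copied from the table above. The public site is left out because it is
# indexed without access control.
PRIVATE_ACL = {
    "ChannelMarketingBudget.xlsx": {"My Supervisor User3"},
    "wellarchitected-sustainability-pillar.pdf": {"My Dev User1", "My Dev User2"},
    "WorkDocs.docx": {"My Dev User1", "My Dev User2", "My Supervisor User3"},
    "WorldPopulation.csv": {"My Dev User1", "My Dev User2", "My Supervisor User3"},
    "DDoS_White_Paper.pdf": {"Kendra User3"},
    "wellarchitected-framework.pdf": {"Kendra User3"},
    "ML_Training.pptx": {"Kendra User1"},
    "Polly-dg.pdf": {"My Dev User1"},
    "Transcribe-api.pdf": {"My Dev User1"},
}


def visible_documents(user, acl=PRIVATE_ACL):
    """Return the set of documents the given user is allowed to see."""
    return {document for document, users in acl.items() if user in users}
```

For example, visible_documents("My Dev User1") contains wellarchitected-sustainability-pillar.pdf and the two repository documents, but not ChannelMarketingBudget.xlsx. A lookup like this is a convenient checklist when you verify search results in Step 9.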
Step 3: Set up Amazon Kendra indexes
You can create a new Amazon Kendra index or use an existing index for indexing documents hosted in Alfresco private sites. To create a new index, complete the following steps:
On the Amazon Kendra console, create an index called Alfresco-Private.
Create a new IAM role, then choose Next.
For Access Control, choose Yes.
For Token Type, choose JSON.
Keep the user name and group as default.
Choose None for user group expansion because we’re assuming no integration with AWS IAM Identity Center (successor to AWS Single Sign-On).
Choose Next.
Choose Developer Edition for this example solution.
Choose Create to create a new index.
The following screenshot shows the Alfresco-Private index after it has been created.
You can verify the access control configuration on the User access control tab.
Repeat these steps to create a second index called Alfresco-Public.
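The console steps above can also be approximated with the AWS SDK for Python (Boto3). The following is a sketch under stated assumptions rather than a record of what the console does: it assumes the IAM role already exists (the console creates one for you), and it assumes the default token attribute names are username and groups, so check the User access control tab for the actual values. The helper function names are our own.

```python
def build_create_index_request(name, role_arn,
                               user_field="username", group_field="groups"):
    """Build the arguments for the Amazon Kendra CreateIndex API.

    user_field and group_field are assumed defaults for the JSON token type.
    """
    return {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
        # Access control based on a user token supplied at query time.
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    "UserNameAttributeField": user_field,
                    "GroupAttributeField": group_field,
                }
            }
        ],
    }


def create_index(request):
    """Send the request with Boto3 and return the new index ID."""
    import boto3  # imported lazily so the builder works without the SDK installed

    return boto3.client("kendra").create_index(**request)["Id"]
```

Keeping the request builder separate from the API call makes it easy to review the settings before anything is created in your account.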
Step 4: Create a data source for the On-Prem private site
To create a data source for the On-Prem private site, complete the following steps:
On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-OnPrem-Private.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
To connect to the Alfresco On-Prem site, the connector needs access to the public certificate corresponding to the On-Prem server. This was one of the prerequisites.
Use a different browser tab to upload the .pem file to an Amazon Simple Storage Service (Amazon S3) bucket in your account.
You use this S3 bucket name in the next steps.
Return to the data source creation page.
For Source, select Alfresco server.
For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose Create and add new secret.
A pop-up window opens to create an AWS Secrets Manager secret.
Enter a name for your secret, user name, and password, then choose Save.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler on.
For IAM role, choose Create a new IAM role.
Choose Next.
You can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we sync to the On-Prem private site.
For Content to sync, select Single Alfresco site sync and choose MyAlfrescoSite.
Select Include comments to retrieve comments in addition to documents.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 5: Create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects
Similarly to the previous steps, create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects:
On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-OnPrem-Aspects.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco server.
For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose the secret you created earlier.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.
For this scope, the connector retrieves only those On-Prem server repository documents that have been assigned an aspect called awskendra:indexControl.
For Content to sync, select Alfresco aspects sync.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 6: Create a data source for the PaaS private site
Follow similar steps as in the previous sections to create a data source for the PaaS private site:
On the Amazon Kendra console, navigate to the Alfresco-Private index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-Cloud-Private.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco cloud.
For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose Create and add new secret.
Enter a name for your secret, user name, and password, then choose Save.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.
We can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we configure the data source to sync from the PaaS private site MyAlfrescoCloudSite2.
For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudSite2.
Select Include comments.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 7: Create a data source for the PaaS public site
We follow similar steps as before to create a data source for the PaaS public site:
On the Amazon Kendra console, navigate to the Alfresco-Public index.
Choose Data sources in the navigation pane.
Choose Add data source.
Choose Add connector for the Alfresco connector.
For Data source name, enter Alfresco-Cloud-Public.
Optionally, add a description.
Keep the remaining settings as default and choose Next.
For Source, select Alfresco cloud.
For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
For Alfresco user application URL, enter the same value as the repository URL.
For Authentication, select OAuth2.0 authentication.
For AWS Secrets Manager secret, choose Create and add new secret.
Enter a name for your secret, client ID, client secret, and token URL, then choose Save.
For Virtual Private Cloud (VPC), choose No VPC.
Turn the identity crawler off.
For IAM role, choose Create a new IAM role.
Choose Next.
We configure this data source to sync to the PaaS public site MyAlfrescoCloudPublicSite.
For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudPublicSite.
Optionally, select Include comments.
For Sync mode, select Full sync.
For Frequency, choose Run on demand (or a different frequency option as needed).
Choose Next.
Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
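The data sources in Steps 4 through 7 read their credentials from AWS Secrets Manager secrets, which the console creates for you. If you would rather prepare the secrets ahead of time, the following sketch shows one way to do it with Boto3. The JSON key names (username and password for Basic authentication; clientId, clientSecret, and tokenUrl for OAuth2) are our assumption about what the connector expects, so confirm them against the Amazon Kendra Developer Guide before relying on them. The function names are our own.

```python
import json


def basic_auth_secret(username, password):
    """Secret payload for Basic authentication (key names are assumed)."""
    return json.dumps({"username": username, "password": password})


def oauth2_secret(client_id, client_secret, token_url):
    """Secret payload for OAuth2 authentication (key names are assumed)."""
    return json.dumps(
        {"clientId": client_id, "clientSecret": client_secret, "tokenUrl": token_url}
    )


def store_secret(name, secret_string):
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily so the payload builders work without the SDK

    response = boto3.client("secretsmanager").create_secret(
        Name=name, SecretString=secret_string
    )
    return response["ARN"]
```

A secret created this way can then be selected in the data source wizard in place of Create and add new secret.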
Step 8: Perform a sync for each data source
Navigate to each of the data sources and choose Sync now. Complete only one synchronization at a time.
Wait for synchronization to complete for all data sources. When synchronization is complete for a data source, you see the status as shown in the following screenshot.
You can also view Amazon CloudWatch logs for a specific sync under Sync run history.
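Running one synchronization at a time can also be scripted. The following sketch uses the Amazon Kendra StartDataSourceSyncJob and ListDataSourceSyncJobs APIs through a Boto3 client that you pass in, for example boto3.client("kendra"). The function name, the polling interval, and the choice of which statuses count as finished are our own, and for brevity the sketch only looks at the first page of the sync history.

```python
import time

# Statuses after which a sync job is no longer running.
FINISHED = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def sync_one_at_a_time(kendra, index_id, data_source_ids,
                       poll_seconds=60, sleep=time.sleep):
    """Start each sync only after the previous one has finished.

    Returns a dict mapping data source ID to its final sync status.
    """
    results = {}
    for data_source_id in data_source_ids:
        execution_id = kendra.start_data_source_sync_job(
            Id=data_source_id, IndexId=index_id
        )["ExecutionId"]
        while True:
            history = kendra.list_data_source_sync_jobs(
                Id=data_source_id, IndexId=index_id
            )["History"]
            # Treat a job that is not listed yet as still syncing.
            status = next(
                (job["Status"] for job in history
                 if job["ExecutionId"] == execution_id),
                "SYNCING",
            )
            if status in FINISHED:
                break
            sleep(poll_seconds)
        results[data_source_id] = status
    return results
```

Passing the client in as a parameter keeps the loop easy to try out against a stub before you point it at a real index.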
Step 9: Run a test query in the private index using access control
Now it’s time to test the solution. We first run a query in the private index using access control:
On the Amazon Kendra console, navigate to the Alfresco-Private index and choose Search indexed content.
Enter a query in the search field.
As shown in the following screenshot, Amazon Kendra didn’t return any results.
Choose Apply token.
Enter the email address corresponding to the My Dev User1 user and choose Apply.
Note that Amazon Kendra access control works based on the email address associated with an Alfresco user name.
Run the search again.
The search results in a document list (containing wellarchitected-sustainability-pillar.pdf in the following example) based on the access control setup.
If you run the same query again and provide an email address that doesn’t have access to either of these documents, you should not see these documents in the results list.
Enter another query to search in the documents based on the aspect awskendra:indexControl.
Choose Apply token, enter the email address corresponding to the My Dev User1 user, and choose Apply.
Rerun the query.
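The same test can be approximated outside the console with the Amazon Kendra Query API. The following is a hedged sketch: it assumes the index uses the JSON token type and that the token’s user name field is called username, which you should confirm on the User access control tab. The function names and the example email address are placeholders of our own.

```python
import json


def build_query_request(index_id, query_text, user_email=None,
                        user_field="username"):
    """Build arguments for the Amazon Kendra Query API.

    When user_email is given, it is wrapped in a JSON token so that results
    are filtered by access control. user_field is an assumed attribute name.
    """
    request = {"IndexId": index_id, "QueryText": query_text}
    if user_email is not None:
        request["UserContext"] = {"Token": json.dumps({user_field: user_email})}
    return request


def search(request):
    """Run the query with Boto3 and return the titles of the result items."""
    import boto3  # imported lazily so the builder works without the SDK installed

    response = boto3.client("kendra").query(**request)
    return [
        item.get("DocumentTitle", {}).get("Text")
        for item in response["ResultItems"]
    ]
```

Calling build_query_request with user_email left as None corresponds to searching without applying a token, while passing, for example, user_email="dev.user1@example.com" corresponds to choosing Apply token in the console.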
Step 10: Run a test query in the public index without access control
Similarly, we can test our solution by running queries in the public index without access control:
On the Amazon Kendra console, navigate to the Alfresco-Public index and choose Search indexed content.
Run a search query.
Because this example Alfresco public site has not been set up with any access control, we don’t use an access token.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. Delete newly added Alfresco data sources within the indexes. If you created new Amazon Kendra indexes while testing this solution, delete them as well.
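Cleanup can be done on the console or scripted. The following sketch uses the Amazon Kendra DeleteDataSource and DeleteIndex APIs through a Boto3 client that you pass in. The function name and the shape of its input are our own. Note that data source deletion is asynchronous, so a real script may need to wait for it to finish before deleting the index.

```python
def clean_up(kendra, data_sources_by_index, delete_indexes=False):
    """Delete the listed data sources and, optionally, their indexes.

    data_sources_by_index maps an index ID to the data source IDs to remove.
    Returns the list of delete operations that were requested, in order.
    """
    requested = []
    for index_id, data_source_ids in data_sources_by_index.items():
        for data_source_id in data_source_ids:
            kendra.delete_data_source(Id=data_source_id, IndexId=index_id)
            requested.append(("data_source", data_source_id))
        if delete_indexes:
            kendra.delete_index(Id=index_id)
            requested.append(("index", index_id))
    return requested
```

Leaving delete_indexes set to False is the safer default when you added the data sources to an index that existed before you started.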
Conclusion
With the new Alfresco connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Alfresco, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
About the Authors
Arun Anand is a Senior Solutions Architect at Amazon Web Services based in the Houston area. He has 25+ years of experience in designing and developing enterprise applications. He works with partners in the Energy & Utilities segment providing architectural and best practice recommendations for new and existing solutions.
Rajnish Shaw is a Senior Solutions Architect at Amazon Web Services, with a background as a Product Developer and Architect. Rajnish is passionate about helping customers build applications on the cloud. Outside of work Rajnish enjoys spending time with family and friends, and traveling.
Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.