In a previous article I explained how to cluster two Magnolia public instances.
Today I will explain another scenario. At the end, we will have 4 public instances where one single workspace (forum) is clustered across all of them, while all the other workspaces are clustered only between 2 public instances (public 1 with public 2, and public 3 with public 4).
Not clear? Then have a look at the diagram below.
[Diagram: the clustered architecture described above]
What you need for this tutorial
- Good Magnolia knowledge
- A Magnolia bundle (I use 4.5.10 EE, but it works with any recent version of Magnolia)
- MySQL installed
- And please read the previous article about clustering
Step by Step Clustering
Preparation of the environment
Unzip Magnolia bundle and create 3 other public instances.
Adapt the configuration of all public instances and make sure that they will be installed as public instances.
Create 3 databases:
- mgnl_p1_p2
- mgnl_p3_p4
- mgnl_allpublic_forum
And finally, create 3 folders somewhere on your file system:
- shared_data1
- shared_data2
- shared_forum
Update configuration of the public instances
First, in magnolia.properties (of all public instances), update the following line:
magnolia.repositories.jackrabbit.config=WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-search.xml
In the same file, add a new line:
magnolia.repositories.jackrabbit.cluster-forum.config=WEB-INF/config/repo-conf/jackrabbit-bundle-mysql-search-forum.xml
Before going further with these 2 files, we have to create a new repository for the clustered forum workspace.
Open the file WEB-INF/config/default/repository.xml and add at the end, after the magnolia default repository, these lines:
<Repository name="magnoliacluster-forum" provider="info.magnolia.jackrabbit.ProviderImpl" loadOnStartup="true">
  <param name="configFile" value="${magnolia.repositories.jackrabbit.cluster-forum.config}" />
  <param name="repositoryHome" value="${magnolia.repositories.home}/magnoliacluster-forum" />
  <param name="contextFactoryClass" value="org.apache.jackrabbit.core.jndi.provider.DummyInitialContextFactory" />
  <param name="providerURL" value="localhost" />
  <param name="bindName" value="cluster-forum-${magnolia.webapp}" />
  <workspace name="forum" />
</Repository>
And in the RepositoryMapping node, add a new line for the forum:
<RepositoryMapping>
  <Map name="website" repositoryName="magnolia" workspaceName="website" />
  <Map name="config" repositoryName="magnolia" workspaceName="config" />
  <Map name="users" repositoryName="magnolia" workspaceName="users" />
  <Map name="userroles" repositoryName="magnolia" workspaceName="userroles" />
  <Map name="usergroups" repositoryName="magnolia" workspaceName="usergroups" />
  <Map name="mgnlSystem" repositoryName="magnolia" workspaceName="mgnlSystem" /> <!-- System internal data -->
  <Map name="mgnlVersion" repositoryName="magnolia" workspaceName="mgnlVersion" /> <!-- magnolia version workspace -->
  <Map name="forum" repositoryName="magnoliacluster-forum" workspaceName="forum" />
</RepositoryMapping>
Now, let’s change the 2 files we referenced in magnolia.properties.
First, jackrabbit-bundle-mysql-search.xml. This file will hold the configuration for the clustering of all the workspaces except the forum workspace.
It is similar to what we did in the previous tutorial. Working from top to bottom, first add a Cluster node.
It is important that the cluster id is unique per instance.
As public 1 and public 2 persist their data in the same database, they share the same url configuration:
<param name="url" value="jdbc:mysql://localhost:3306/mgnl_p1_p2" />
For public 3 and public 4 it must be:
<param name="url" value="jdbc:mysql://localhost:3306/mgnl_p3_p4" />
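For reference, a Cluster node for public 1 could look like the sketch below. It assumes Jackrabbit's DatabaseJournal and the same MySQL user and empty password used in the rest of this tutorial; the id value (public1), the syncDelay and the journal_ table prefix are illustrative choices, not values taken from a real setup.

```xml
<!-- Sketch only: the id must be different on every instance (public1, public2, ...) -->
<Cluster id="public1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <!-- Local file where this instance records the last journal revision it has seen -->
    <param name="revision" value="${rep.home}/revision.log" />
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <!-- Public 1 and public 2 share this database; public 3 and public 4 use mgnl_p3_p4 -->
    <param name="url" value="jdbc:mysql://localhost:3306/mgnl_p1_p2" />
    <param name="user" value="root" />
    <param name="password" value="" />
    <param name="databaseType" value="mysql" />
    <!-- Assumed prefix for the journal tables -->
    <param name="schemaObjectPrefix" value="journal_" />
  </Journal>
</Cluster>
```

Public 2 would keep the same url but use its own id, while public 3 and public 4 would each use their own id together with the mgnl_p3_p4 url.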
The DataSource follows the same logic.
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <param name="url" value="jdbc:mysql://localhost:3306/mgnl_p1_p2" />
    <param name="user" value="root" />
    <param name="password" value="" />
    <param name="databaseType" value="mysql"/>
    <param name="validationQuery" value="select 1"/>
  </DataSource>
</DataSources>
This is for public 1 and public 2. For public 3 and public 4, only the url changes: it must point to mgnl_p3_p4.
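Spelled out, the DataSources node for public 3 and public 4 would presumably be identical apart from the database name in the url, assuming the same MySQL user and empty password as above:

```xml
<DataSources>
  <DataSource name="magnolia">
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <!-- Only this value differs from the public 1 / public 2 configuration -->
    <param name="url" value="jdbc:mysql://localhost:3306/mgnl_p3_p4" />
    <param name="user" value="root" />
    <param name="password" value="" />
    <param name="databaseType" value="mysql"/>
    <param name="validationQuery" value="select 1"/>
  </DataSource>
</DataSources>
```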
And finally, Filesystem and Datastore must use the shared folder.
Below the configuration for public 1 and public 2:
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="/Volumes/Macintosh/DEMO/clustered-environment_part2/shared_data1/repository" />
</FileSystem>
Use the other shared folder for public 3 and public 4:
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="/Volumes/Macintosh/DEMO/clustered-environment_part2/shared_data2/repository" />
</FileSystem>
Follow the same logic for the configuration of the Datastore node.
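As a sketch, the DataStore node for public 1 and public 2 could look like this. It assumes Jackrabbit's FileDataStore; the /path/to prefix, the datastore subfolder and the minRecordLength value are placeholders to adapt to your own environment.

```xml
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <!-- Placeholder path: a folder inside the shared folder of public 1 and public 2 -->
  <param name="path" value="/path/to/shared_data1/repository/datastore" />
  <!-- Binaries smaller than this size (in bytes) are not written to the data store -->
  <param name="minRecordLength" value="1024" />
</DataStore>
```

For public 3 and public 4, the path would point into shared_data2 instead.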
Now, let’s create the file jackrabbit-bundle-mysql-search-forum.xml.
If you are still following, this is the configuration file of the new repository we defined before, “magnoliacluster-forum”.
Start by making a copy of the file jackrabbit-bundle-mysql-search.xml.
What you have to update is then pretty simple:
The database URL must point to localhost:3306/mgnl_allpublic_forum. Change it in both the Cluster node and the DataSource node.
<param name="url" value="jdbc:mysql://localhost:3306/mgnl_allpublic_forum" />
The shared folder is now the shared_forum folder created before. It is used in both FileSystem and DataStore.
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
  <param name="path" value="/Volumes/Macintosh/DEMO/clustered-environment_part2/shared_forum/repository" />
</FileSystem>
Create the file jackrabbit-bundle-mysql-search-forum.xml for each of the 4 public instances.
Everything is configured the same in all four; only the cluster id must be unique per instance.
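As an illustration, the Cluster node of the forum repository on public 1 could look like the sketch below. It assumes Jackrabbit's DatabaseJournal; the id naming scheme (forum-public1 to forum-public4), the syncDelay and the journal_ prefix are hypothetical choices.

```xml
<!-- Sketch only: use forum-public2, forum-public3 and forum-public4 on the other instances -->
<Cluster id="forum-public1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log" />
    <param name="driver" value="com.mysql.jdbc.Driver" />
    <!-- All 4 public instances point to the same forum database -->
    <param name="url" value="jdbc:mysql://localhost:3306/mgnl_allpublic_forum" />
    <param name="user" value="root" />
    <param name="password" value="" />
    <param name="databaseType" value="mysql" />
    <param name="schemaObjectPrefix" value="journal_" />
  </Journal>
</Cluster>
```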
Voilà, you are done. Your clustered environment is set up.
Start and configure the subscribers
First, start only the author instance, public 1 and public 3.
You will run into an issue during the installation of public 3: data already exists in the forum workspace and public 3 tries to overwrite it.
The workaround is to delete all the tables in mgnl_allpublic_forum before starting public 3.
Public 3 will then re-install all the configuration.
When you start to cluster your architecture you must be really careful about concurrent write behavior.
Once public 1 and public 3 are correctly installed, you can start the two other public instances.
Configure only 2 subscribers, one for public 1 and one for public 3.
Test your configuration by creating a new page and publishing it. The page must be available on all public instances.
To check that the forum workspace is correctly clustered, go to the page /demo-project/about/history.html (from a public instance) and add a comment. It must be available on all the other instances.
Further Improvements
A few comments about possible improvements (whispered by my dear colleague Jan).
Cluster forum with Author instance
In the current example, forum posts are not available from the author instance.
To make the moderation of the forum easier, we could also cluster its workspace with the author instance. In this case a moderator won’t have to log into a public instance to do his job.
Scheduled tasks
This is not an improvement but rather a warning.
It’s about concurrent write behavior. If you run any scheduled tasks on such a cluster, you need to set a cluster id for the job to make sure it is executed on only one cluster node. Otherwise, if such a job writes to the repositories, it might deadlock them.
Persistent Storage
By following the same configuration steps used for the clustering, you can persist your workspaces on different kinds of persistent storage.
We could imagine that workspaces that have a lot of read / write access will be stored on a specific database while workspaces with only read access will persist on a file system.
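A rough sketch of what this could mean in the Jackrabbit configuration files: each repository defined in repository.xml has its own configuration file, so one file can declare a database persistence manager and another a file-system one. The class names below are Jackrabbit 2.x classes as I understand them, and the pm_ prefix is an assumption; check both against your Jackrabbit version. A file-system persistence manager writes to the local disk of each instance, so it is only a candidate for repositories that are not clustered.

```xml
<!-- Variant A: workspaces with heavy read/write access, stored in MySQL via the "magnolia" DataSource -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
  <param name="dataSourceName" value="magnolia" />
  <!-- Assumed table prefix; ${wsp.name} is replaced by the workspace name -->
  <param name="schemaObjectPrefix" value="pm_${wsp.name}_" />
</PersistenceManager>

<!-- Variant B: mostly-read workspaces of a non-clustered repository, stored on the local file system -->
<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />
```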
Conclusion
In the end, it’s easy to set up a Magnolia cluster!