Development of a prototype server infrastructure for an ERP system with 1,500 users
The initial task was to implement a working prototype of the IT infrastructure that would make it possible to calculate the requirements for the full-scale system. The prototype had to take the following factors into account:
- Fault tolerance;
- Disaster tolerance;
- Satisfactory performance;
- Separation of the development, test and production environments, with traffic filtering between them;
- Sufficient resources for the next 2 years.
The architecture and test rig were implemented as follows:
- The system was divided into two cores (primary and backup) to meet the disaster tolerance requirements. Each core must be located in a data center of the corresponding tier (see Table 1). The two data centers must be at least 50 km apart and connected by a direct optical fiber link with a bandwidth of at least 2 Gb/s.
Table 1. Data center requirements
| Data center requirement | Tier 2 backup data center | Tier 3 primary data center |
|---|---|---|
| Maintenance without interrupting operation | No | Yes |
| Annual downtime (hours) | 28.8 | 1.6 |
| Probability of shutdown within 5 years (%) | 37.17 | 25.91 |
| Redundant communication channels and client IT equipment | No | Yes |
| Server rooms separated from other rooms by 1-hour fire-rated walls | No | Yes |
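The annual downtime figures in Table 1 can be converted into availability percentages, which makes the two tiers easier to compare. The sketch assumes an 8,760-hour year. For the combined figure it also assumes that outages at the two sites are statistically independent, which real sites only approximate, since the shared fiber link is a common failure point. Failover time is ignored.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 h; leap years ignored


def availability(annual_downtime_hours: float) -> float:
    """Share of the year a site is operational, given its annual downtime."""
    return 1.0 - annual_downtime_hours / HOURS_PER_YEAR


def both_sites_down(downtime_a: float, downtime_b: float) -> float:
    """Probability that both sites are down at the same moment.

    ASSUMPTION: outages at the two sites are statistically independent.
    """
    return (1.0 - availability(downtime_a)) * (1.0 - availability(downtime_b))


if __name__ == "__main__":
    backup, primary = 28.8, 1.6  # annual downtime in hours, from Table 1
    print(f"Tier 2 backup availability:  {availability(backup):.3%}")
    print(f"Tier 3 primary availability: {availability(primary):.3%}")
    p = both_sites_down(backup, primary)
    print(f"Both sites down: {p:.2e} ({p * HOURS_PER_YEAR * 60:.2f} min per year)")
```

With the Table 1 figures, this gives roughly 99.67% availability for the Tier 2 site and 99.98% for the Tier 3 site.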
- An IPsec tunnel with AES-256 encryption was built between the primary and backup data centers.
- The server infrastructure was replicated from the primary to the backup site at regular intervals, without taking servers offline, using Hyper-V Replica.
- In each data center, routing, network access control and access to the ERP service were provided by two Cisco hardware routers combined into an active-passive cluster. Their minimum requirements are listed in Table 2.
Table 2. Router requirements
| Parameter | Requirement |
|---|---|
| Number of devices | At least 2 |
| Integrated 10GE interfaces | At least 2 |
| Integrated 1GE interfaces | At least 24 |
| Throughput | At least 40 Gb/s |
| Performance | At least 55 million packets per second |
| Protocol support | eBGP, iBGP, OSPF |
| Redundancy | Support for automatic service failover in a primary/backup model |
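The active-passive router cluster can be illustrated with a generic election rule. The healthy router with the higher priority holds the active gateway role, and the standby takes over when the active router fails. This is a simplified sketch, not the behavior of any specific Cisco protocol. Router names and priority values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    name: str
    priority: int  # hypothetical value; higher priority wins the active role
    healthy: bool = True


class ActivePassivePair:
    """Generic active-passive election (illustration only, not HSRP/VRRP).

    The healthy router with the highest priority is active and owns the
    gateway role; the other router stays on standby.
    """

    def __init__(self, primary: Router, backup: Router) -> None:
        self.routers: List[Router] = [primary, backup]

    def active(self) -> Optional[Router]:
        candidates = [r for r in self.routers if r.healthy]
        if not candidates:
            return None  # both routers down: the data center loses connectivity
        return max(candidates, key=lambda r: r.priority)


if __name__ == "__main__":
    pair = ActivePassivePair(Router("rtr-a", 110), Router("rtr-b", 100))
    print(pair.active().name)  # rtr-a
    pair.routers[0].healthy = False
    print(pair.active().name)  # rtr-b
```

Once the higher-priority router recovers, the sketch always hands the active role back to it (preemption). First-hop redundancy protocols such as HSRP and VRRP make this behavior configurable.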
- Each data center's server infrastructure was divided into three logical areas: the development, test and production environments (see Diagram 1).
Diagram 1. General architecture
- Each environment was placed in a separate VLAN (virtual local area network).
- The physical core of these environments comprised:
  - A Hitachi data storage system;
  - Two Twin-class hardware server platforms hosting the virtual machines;
  - Two physical ERP and DBMS servers;
  - A 16 Gbit/s Fibre Channel SAN.
- The development environment comprised a terminal access server and a combined ERP and DBMS server.
- The test environment replicated the architecture of the production environment in full, but with fewer resources.
- The production environment comprised (see Diagram 2):
  - A cluster of multiple ERP servers;
  - A cluster of multiple web servers;
  - Primary and backup MS SQL database management servers.
Diagram 2. Production environment architecture
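The requirement to filter traffic between the development, test and production environments is usually met with a default-deny policy between the environment VLANs. The sketch below shows that idea. The allow-list entry is purely hypothetical, because the source does not say which flows were permitted. Port 1433 is simply SQL Server's default port.

```python
from typing import FrozenSet, Tuple

ENVIRONMENTS = ("dev", "test", "prod")

# HYPOTHETICAL allow-list of (source env, destination env, destination TCP port).
# The source only states that inter-environment traffic was filtered.
ALLOWED_FLOWS: FrozenSet[Tuple[str, str, int]] = frozenset({
    ("dev", "test", 1433),  # assumed example: dev tools reaching the test SQL Server
})


def is_allowed(src_env: str, dst_env: str, dst_port: int) -> bool:
    """Default-deny check for traffic routed between environment VLANs."""
    if src_env not in ENVIRONMENTS or dst_env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {src_env!r} -> {dst_env!r}")
    if src_env == dst_env:
        return True  # same VLAN: switched locally, never reaches the inter-VLAN filter
    return (src_env, dst_env, dst_port) in ALLOWED_FLOWS


if __name__ == "__main__":
    print(is_allowed("dev", "prod", 1433))  # False: not on the allow-list
    print(is_allowed("dev", "test", 1433))  # True: hypothetical allowed flow
```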
- The ERP cluster operated in active-active mode: ERP client requests were distributed evenly across all ERP servers.
- The SQL Server database servers at the primary and backup sites were combined into a Microsoft SQL Server Always On cluster with continuous online database replication.
- The ERP server cluster connected to the DBMS over TCP/IP through the availability group listener provided by Microsoft SQL Server. This setup automatically redirects queries to the backup DBMS if the primary DBMS fails.
- The web service cluster combined the IIS instances in the primary and backup data centers. Requests reached the web service over TCP/IP through a Microsoft web clustering service. This setup automatically redirects requests to the backup web service if the main server fails.
- The fault-tolerance schemes are described in more detail in Table 3:
Table 3. Fault-tolerance schemes
| Area | Clustering technology |
|---|---|
| Router fault tolerance | Fault tolerance is achieved with a pair of hardware gateways, one primary and one backup. |
| ERP server fault tolerance | Fault tolerance is achieved by combining all ERP servers into a single cluster with automatic failover and preservation of current sessions. |
| SQL fault tolerance | Implemented with Microsoft SQL Server Always On technology: the primary and backup DBMS instances run simultaneously on different physical hosts, and data is synchronized in real time. If the primary SQL server fails, queries are automatically redirected to the backup server through the availability group listener. |
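In the active-active ERP cluster, client requests are distributed evenly across all servers. The document does not name the balancing algorithm. The sketch below therefore assumes simple round-robin dispatch that skips servers marked as failed. Server names are hypothetical, and the session preservation mentioned in Table 3 is not modeled.

```python
from itertools import cycle
from typing import Dict, List


class ActiveActiveCluster:
    """Round-robin request dispatch over ERP servers (assumed algorithm).

    Every healthy server receives requests in turn; servers marked as down
    are skipped. Session preservation is not modeled.
    """

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("cluster needs at least one server")
        self.servers = list(servers)
        self.up: Dict[str, bool] = {s: True for s in self.servers}
        self._ring = cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.up[server] = False

    def mark_up(self, server: str) -> None:
        self.up[server] = True

    def next_server(self) -> str:
        # Inspect each server at most once per request.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if self.up[candidate]:
                return candidate
        raise RuntimeError("no ERP server available")


if __name__ == "__main__":
    cluster = ActiveActiveCluster(["erp-1", "erp-2", "erp-3"])
    print([cluster.next_server() for _ in range(3)])  # ['erp-1', 'erp-2', 'erp-3']
    cluster.mark_down("erp-2")
    print([cluster.next_server() for _ in range(4)])  # ['erp-1', 'erp-3', 'erp-1', 'erp-3']
```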
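Automatic redirection through the availability group listener can be modeled as a name that always resolves to whichever replica currently holds the primary role. The toy model below assumes two replicas with automatic failover. Replica names and the "OFFLINE" state label are hypothetical. The Windows Server Failover Clustering machinery and the commit modes that a real Always On deployment depends on are deliberately left out.

```python
from typing import Dict


class AvailabilityGroupModel:
    """Toy model of an availability group with automatic failover.

    The listener always resolves to the replica that currently holds the
    primary role, so clients keep using the same name after a failover.
    """

    def __init__(self, primary: str, secondary: str) -> None:
        self.roles: Dict[str, str] = {primary: "PRIMARY", secondary: "SECONDARY"}

    def listener_target(self) -> str:
        for replica, role in self.roles.items():
            if role == "PRIMARY":
                return replica
        raise RuntimeError("availability group has no primary replica")

    def fail(self, replica: str) -> None:
        """Take a replica offline; promote a secondary if the primary failed."""
        was_primary = self.roles[replica] == "PRIMARY"
        self.roles[replica] = "OFFLINE"  # model-specific label, not a SQL Server state name
        if was_primary:
            for other, role in self.roles.items():
                if role == "SECONDARY":
                    self.roles[other] = "PRIMARY"
                    return

    def rejoin(self, replica: str) -> None:
        """Bring a recovered replica back as a secondary (no automatic failback)."""
        if self.roles[replica] == "OFFLINE":
            self.roles[replica] = "SECONDARY"


if __name__ == "__main__":
    ag = AvailabilityGroupModel("sql-dc1", "sql-dc2")
    print(ag.listener_target())  # sql-dc1
    ag.fail("sql-dc1")
    print(ag.listener_target())  # sql-dc2
```

In a real deployment, the ERP servers would connect to the listener's DNS name rather than to an individual SQL Server instance. A listener stretched across two data centers would likely span subnets. For that case, Microsoft's client drivers offer a MultiSubnetFailover connection option that speeds up reconnection.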
The second stage involved preparing load tests together with the developers, running iterative tests and calculating the capacity required for 1,500 ERP users:
- A matrix of performance test scripts was created based on the existing process registers and the test cases for the ERP database layout.
- Quantitative and target performance indicators were averaged from data on real ERP customer databases and from expert estimates. The script took three inputs: the number of input documents of each type per day, the number of users and the test duration. From these, the system calculated how many documents and reports should be created and processed within the specified period, and at what interval.
- Tests were run with several user groups. Each group consisted of 4 users, each with their own role and a list of sequential operations to perform:
  - the number of user groups was calculated as N = K / 4, where K is the number of users (since each group has 4 users);
  - the number of operation chains per group was based on one group completing 24 chains in an 8-hour working day, i.e. 24 / 8 = 3 chains per hour at 20-minute intervals;
  - the size of the array of items from which an item is randomly selected for each goods receipt document was calculated as P = (K / 4) * 10 * 93%, where K is the number of users.
- At each stage of the Apdex testing, the following system load indicators were determined per user:
  - Disk accesses per second;
  - RAM consumption;
  - CPU load.
- The equipment requirements for 150 ERP users were calculated from these indicators.
- The same tests were then run for 150 users to check these calculations: the theoretical estimates were compared with actual measurements, and an error coefficient was derived.
- The Apdex test iterations were then repeated for 100, 200 and 500 users. For each iteration, the theoretical estimates were compared with actual measurements and an error coefficient was derived.
- Finally, the requirements for 1,500 ERP users were calculated by direct extrapolation of the measurements obtained at smaller scales.
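The load-profile arithmetic from the test setup can be collected into one function. It covers groups of 4 users, 24 operation chains per group per 8-hour day, and the item pool size P. This sketch follows the formulas given in the text. Two details are assumptions: a partial group is rounded up, and a daily document volume is spread evenly over an 8-hour day. The function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

USERS_PER_GROUP = 4            # each test group has 4 users
CHAINS_PER_GROUP_PER_DAY = 24  # operation chains one group completes per working day
WORKING_HOURS_PER_DAY = 8


@dataclass
class LoadProfile:
    groups: int
    chains_per_group_per_hour: float
    chain_interval_min: float
    total_chains_per_hour: float
    item_pool_size: float


def load_profile(users: int) -> LoadProfile:
    """Derive the test load profile for K users using the formulas in the text."""
    groups = math.ceil(users / USERS_PER_GROUP)  # N = K / 4; ASSUMPTION: partial group rounded up
    per_hour = CHAINS_PER_GROUP_PER_DAY / WORKING_HOURS_PER_DAY  # 24 / 8 = 3 chains per hour
    interval_min = 60 / per_hour  # 60 / 3 = 20 minutes between chains
    pool = (users / USERS_PER_GROUP) * 10 * 0.93  # P = (K / 4) * 10 * 93%
    return LoadProfile(groups, per_hour, interval_min, groups * per_hour, pool)


def document_schedule(docs_per_day: int, test_hours: float) -> Tuple[int, float]:
    """ASSUMPTION: a daily document volume is spread evenly over the working day.

    Returns (documents to create during the test, seconds between documents).
    """
    count = round(docs_per_day * test_hours / WORKING_HOURS_PER_DAY)
    interval_s = test_hours * 3600 / count if count else math.inf
    return count, interval_s


if __name__ == "__main__":
    print(load_profile(1500))
    print(document_schedule(docs_per_day=120, test_hours=2))  # hypothetical inputs
```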
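The tests take their name from the Apdex (Application Performance Index) score, which rates measured response times against a target time T. Samples at or below T count as satisfied. Samples above T and up to 4T count as tolerating, at half weight. Slower samples count as frustrated. Below is a minimal implementation of that standard formula. The source does not state which target times were used, so the threshold and sample times in the usage are hypothetical.

```python
from typing import Iterable


def apdex(response_times_s: Iterable[float], target_s: float) -> float:
    """Apdex score: (satisfied + tolerating / 2) / total samples.

    satisfied:   t <= T
    tolerating:  T < t <= 4T
    frustrated:  t > 4T (contributes nothing)
    """
    times = list(response_times_s)
    if not times:
        raise ValueError("no response-time samples")
    satisfied = sum(1 for t in times if t <= target_s)
    tolerating = sum(1 for t in times if target_s < t <= 4 * target_s)
    return (satisfied + tolerating / 2) / len(times)


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one key ERP operation, with T = 1 s.
    print(apdex([0.5, 1.0, 2.0, 5.0], target_s=1.0))  # 0.625
```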
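The verification loop can be sketched as follows: predict linearly from per-user indicators, compare with measurements, derive an error coefficient, then extrapolate to 1,500 users. The source does not define the error coefficient. This sketch assumes it is the ratio of measured to predicted load, and it applies the largest observed coefficient to the extrapolated figure as a conservative margin. The inputs would come from the test runs; the numbers in the usage are hypothetical.

```python
from typing import Dict


def predict(per_user: float, users: int) -> float:
    """Direct (linear) extrapolation: the requirement scales with the user count."""
    return per_user * users


def error_coefficient(actual: float, predicted: float) -> float:
    """ASSUMPTION: error coefficient = measured load / predicted load."""
    return actual / predicted


def extrapolate(per_user: float, measured: Dict[int, float], target_users: int) -> float:
    """Linear estimate for target_users, scaled by the largest observed error coefficient.

    `measured` maps a tested user count (e.g. 100, 150, 200, 500) to the load
    actually measured for one indicator (disk accesses/s, RAM or CPU).
    Using the maximum coefficient is an assumed, conservative choice.
    """
    if not measured:
        raise ValueError("need at least one measured iteration")
    coefficients = [error_coefficient(actual, predict(per_user, users))
                    for users, actual in measured.items()]
    return predict(per_user, target_users) * max(coefficients)


if __name__ == "__main__":
    # Hypothetical per-user indicator and measurements, in arbitrary units.
    runs = {100: 210.0, 200: 420.0, 500: 1100.0}
    print(extrapolate(per_user=2.0, measured=runs, target_users=1500))
```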