Our infrastructure comprises high-availability clusters of different machines, running varied operating systems and applications, spread across multiple continents. Our datacenter is distributed across the globe: it is in fact a network of redundant server infrastructure that hosts all our Products and Services.
We manage a large number of servers, each running an equally large number of services. Manually checking every service on even one server 24 x 7 is nearly impossible. Service providers without a good monitoring system pass on more downtime to their customers and run a high risk of damage from service disruptions. An undetected minor issue can rapidly turn into a major one, increasing the damage caused. An effective monitoring system is therefore crucial for maintaining maximum availability of our servers.
Palcom’s monitoring systems and tools give our system administrators an all-encompassing view of the health of our globally distributed infrastructure. We monitor a large number of parameters covering the hardware and network status of our servers and of the individual services that reside on them.
Services monitored include:
Network Connectivity
Server Disk Space
Server CPU Usage
Server Memory Usage
Web Services - HTTP, HTTPS, FTP, SSH, etc.
Email Services - SMTP, POP & IMAP
Database Services - MySQL, MSSQL
Reactive human response to reboot and restart failed services
DNS Services
All Log Files... and a lot more
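To illustrate what checking the network services in this list can look like, here is a minimal sketch that tests whether each service accepts TCP connections on its standard port. It is not a description of our actual tooling: the names `DEFAULT_PORTS`, `is_port_open` and `check_services` are hypothetical, and a real monitor would also verify that each service responds correctly, not only that its port is open.

```python
import socket

# Standard TCP ports for the services listed above. The mapping itself
# is an illustrative assumption, not the configuration of a real monitor.
DEFAULT_PORTS = {
    "HTTP": 80,
    "HTTPS": 443,
    "FTP": 21,
    "SSH": 22,
    "SMTP": 25,
    "POP": 110,
    "IMAP": 143,
    "MySQL": 3306,
    "MSSQL": 1433,
    "DNS": 53,
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return a dict mapping each service name to True (reachable) or False."""
    return {name: is_port_open(host, port, timeout) for name, port in ports.items()}
```

Calling `check_services("server.example.com")` would return one True or False value per service, which a dashboard could then display.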
If any server, or any service on a server, fails, or if any resource utilization exceeds its specified limit, an automatic alert pops up immediately on the monitoring screens. This prompts our system administrators to fix the problem right away and update customers at the same time.
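The limit check described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Alert` class, the `find_alerts` function and the percentage values in `LIMITS` are all hypothetical and do not reflect the limits configured in our systems.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    server: str
    metric: str
    value: float
    limit: float


# Hypothetical utilization limits, in percent, chosen only for illustration.
LIMITS = {"cpu": 90.0, "memory": 85.0, "disk": 80.0}


def find_alerts(server, readings, limits=LIMITS):
    """Return an Alert for every reading that exceeds its configured limit.

    Metrics without a configured limit are ignored.
    """
    alerts = []
    for metric, value in readings.items():
        limit = limits.get(metric)
        if limit is not None and value > limit:
            alerts.append(Alert(server, metric, value, limit))
    return alerts
```

For example, `find_alerts("web-01", {"cpu": 95.0, "memory": 40.0})` yields a single alert for CPU usage, since only that reading is above its limit.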
The automatic notification system also handles escalation of issues. When an issue is not resolved, or its resolution takes longer than usual, alerts are sent to higher-level system administrators and subsequently to Management for appropriate action on the ground.
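One way to picture this escalation is as a list of time-based steps. The sketch below is illustrative only: the function `recipients_for`, the role names and the delays in `ESCALATION_STEPS` are assumptions made for the example, not our actual escalation policy.

```python
from datetime import datetime, timedelta

# Hypothetical escalation steps: (time since the issue opened, who is alerted).
ESCALATION_STEPS = [
    (timedelta(minutes=0), "on-duty system administrator"),
    (timedelta(minutes=30), "senior system administrator"),
    (timedelta(hours=2), "management"),
]


def recipients_for(opened_at, now, resolved=False, steps=ESCALATION_STEPS):
    """Return everyone who should have been alerted about an issue by `now`.

    A resolved issue alerts no one; an open issue alerts every level
    whose delay has already elapsed.
    """
    if resolved:
        return []
    age = now - opened_at
    return [who for delay, who in steps if age >= delay]
```

With these example delays, an issue that is still open after 45 minutes has reached both the on-duty and the senior system administrators, and after two hours it reaches management as well.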
Our robust monitoring system allows us to detect and resolve any issue promptly without causing any interruption in service availability for our customers.