Five reasons engineers use the cloud for simulations
The challenge is that as model complexity increases, so do the computing demands of meshing, assembly, and solving. The system requirements can quickly scale beyond those of most desktop configurations – even those purpose-built for running simulations. The added complexity can slow simulations to a crawl, wasting your organization’s most valuable resource – time.
The cloud has become a powerful tool in an engineer’s arsenal. The ability to spin up extra computing capacity means you can deploy a powerhouse on demand to plow through even the most complex simulations. The ability to create configurations on the fly means engineers can respond to clients with urgent delivery timelines.
In this article, we’re going to explore why you should investigate cloud solutions for your organization, and we’ll list the five reasons engineers are using cloud computing. We’ll also take time to look at how you can secure that use.
We’re going to explore single-machine high-performance computing in the cloud. A future article will explore high-performance computing realized through cluster computing.
#1 – On-demand studies
On-demand studies mean you spin up cloud computing capacity precisely when you need it. The concept is simple: your organization creates a pre-built cloud machine, pre-equipped with the simulation software and the plugins required to run studies. After building it, the organization leaves the cloud machine turned off. In this hibernated state, the only cost is disk storage, typically billed at pennies per GB (gigabyte).
Anytime an engineer requires a study, they spin up the cloud machine, upload their study materials, and initiate the study. After the solve completes, they transfer the results back to the organization and spin down the cloud machine. The organization is charged for the time the machine was running at the pre-approved hourly rate.
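To make this concrete, here’s a minimal sketch of the spin-up/run/spin-down loop, assuming an AWS EC2 machine managed through the boto3 SDK. The instance ID and the upload/solve/download helpers are hypothetical placeholders for your own tooling.

```python
# Minimal on-demand study loop, assuming AWS EC2 + boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical pre-built simulation machine

def run_study(upload_inputs, solve, download_results):
    # Wake the hibernated machine only for the duration of the study.
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
    try:
        upload_inputs()      # copy study materials up to the machine
        solve()              # launch the simulation and wait for the solution
        download_results()   # transfer results back to the organization
    finally:
        # Stop (not terminate): afterwards only disk storage is billed.
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```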
The beauty of this approach is how simply costs can be tracked back to the client’s job. The time billed is directly tied to the client’s deliverable, which satisfies the audit trail for engagements that include client-auditable activity. This is particularly important when serving government clients.
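One way to make that audit trail automatic – assuming AWS, where cost-allocation tags flow into billing reports – is to tag the machine with a job code before each run. The tag key and job code here are hypothetical.

```python
# Tag the study machine so its runtime charges can be filtered by job
# in the provider's billing reports ("ClientJob" is a hypothetical key).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.create_tags(
    Resources=["i-0123456789abcdef0"],
    Tags=[{"Key": "ClientJob", "Value": "ACME-2024-017"}],  # hypothetical job code
)
```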
#2 – Scope equipment before purchase
Building your own high-performance computing infrastructure is a complex proposition. The lengthy cost-recovery period tends to make systems staff over-estimate requirements to avoid coming up short in the final implementation, and over-estimation means over-spending. Even worse, if staff fail to make the right choices, they could over-specify some components while under-specifying others, leading to materially poor performance.
Simulation software vendors are remarkably unwilling to make specific hardware recommendations. The reason is understandable – your organization performs simulations that may be unique to your business. Vendors will usually provide only general advice, or advice that requires a computing expert’s skills to interpret and turn into a recommendation.
The cloud is a great laboratory for experimenting with different variables without committing to acquisition costs. Engineers can upload historical simulations and re-run each study multiple times, varying the configuration on every pass. This narrows down the hardware configuration that best matches the types of studies your organization performs, committing to acquisition only once the answer is clear.
It should be stressed that this requires a disciplined approach to experimentation rather than reliance on assumptions. Engineers are likely to be surprised by what they learn from this activity. For example, increasing the number of cores past a certain point is likely to yield no benefit while needlessly increasing costs. Fundamentally, simulations are driven by solving systems of linear equations, which benefits from the parallelism offered by multiple cores, but there are important exceptions such as time-driven models or models that require continuation methods. Experimentation with real-world studies is your organization’s best guide.
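As one illustration of that discipline, the sketch below re-runs a single historical study across several core counts and records wall time. The "solver" command line is a placeholder for your simulation package’s batch interface, not a real CLI.

```python
# Benchmark one historical study across core counts; the "solver"
# command is a placeholder for your package's batch interface.
import subprocess
import time

def run_solver(study, cores):
    subprocess.run(["solver", study, f"--cores={cores}"], check=True)

def benchmark(study, core_counts):
    timings = {}
    for cores in core_counts:
        start = time.perf_counter()
        run_solver(study, cores)
        timings[cores] = time.perf_counter() - start
    return timings

# e.g. benchmark("historical_study_042.inp", [4, 8, 16, 32, 64])
# Past some core count, wall time typically stops improving while the
# hourly rate keeps climbing.
```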
#3 – Simultaneous studies
Regardless of how capable your organization’s infrastructure is, there will invariably come a time when you have a simulation backlog. Anytime demand for computing resources exceeds what is available, the organization is confronted with scheduling studies. That means deciding how to respond to urgent or time-sensitive client studies in the presence of a backlog.
The cloud is a good solution for addressing exactly this type of simultaneous need. During periods of normal operations, the organization uses its internal infrastructure, which is already factored into operating costs. Then, in the event of a time-sensitive need, it spins up temporary cloud capacity to deal with the priority task.
A related benefit is deploying to the cloud those studies whose system requirements significantly exceed the typical study. For example, say the organization typically performs analyses with 20,000 degrees of freedom (DOF), which are comfortably served by in-house infrastructure. A client study with double or triple the DOF could easily exceed the RAM of the engineers’ systems. The resulting spillover into virtual memory could translate into runtimes 200 times longer than normal and create a backlog.
Rather than running the study on the in-house infrastructure, it would be better to use the pre-created cloud environment. Again, the organization is only charged for the duration of the study and receives an invoice that can be tracked directly to the client’s requested task.
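A simple routing rule captures the idea: if a study’s estimated working set won’t fit in local RAM with some headroom, send it to the cloud machine instead of letting it swap. The memory estimate and the 20% headroom are assumptions you would tune to your own solver.

```python
# Route oversized studies to the cloud before they spill into virtual
# memory; estimated_memory_gb comes from your solver's sizing guidance.
import psutil  # third-party: pip install psutil

def route_study(estimated_memory_gb, headroom=0.20):
    local_ram_gb = psutil.virtual_memory().total / 1e9
    if estimated_memory_gb > (1 - headroom) * local_ram_gb:
        return "cloud"      # avoid the virtual-memory penalty described above
    return "in-house"
```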
#4 – Disaster recovery
High-performance infrastructure is typically the most expensive computing equipment to acquire, which usually means limited spares or standby equipment. Even with redundancy, component failures can still lead to catastrophic downtime. Anytime this infrastructure is down for maintenance or repair is time it isn’t performing studies. Having an alternate environment to use during periods of unavailability can be crucial to delivering on time and on budget.
A pre-configured cloud environment, ready to take over during maintenance or repair windows, is a natural fit. The ability to keep churning through studies presents a professional, business-as-usual face to clients, which instills confidence in your organization’s ability to deliver. It can even eliminate some of those “risk of association” issues you respond to during RFPs.
#5 – Zero-disruption upgrades
Most software manufacturers don’t like to admit it – but software upgrades are costly and risky. Anytime you upgrade a system, you risk introducing breaking changes into how it operates. Given the unique nature of high-performance infrastructure, a breaking change there could mean unplanned downtime spanning days – or even weeks, if your breakage is the first instance the software manufacturer has witnessed.
Using the cloud for upgrades provides a seamless test environment for your operation. The idea is simple: install the new version of the simulation software in your cloud environment and run through a sample set of studies. Only after you’ve successfully processed the sample set do you perform the upgrade on your local environment.
This approach also gives your organization a form of regression testing for performance degradation. While not as severe as a complete disruption, a new version that runs materially slower than the previous generation is a real risk – and your organization can postpone the upgrade until the manufacturer has had the opportunity to investigate the issue. This can save untold headaches in pursuing your primary mission while keeping up with the latest technology, two goals that are often at odds.
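Here’s a sketch of what that gating might look like: re-run the sample set on the new version in the cloud, block the local upgrade on a result mismatch, and postpone it on a material slowdown. The run_study callable and the 10% threshold are assumptions, not vendor guidance.

```python
# Gate a local upgrade on a cloud test pass: correctness first,
# then a performance-regression check against baseline runtimes.
import time

def validate_upgrade(sample_set, baseline_times, run_study, tolerance=0.10):
    for study, old_time in zip(sample_set, baseline_times):
        start = time.perf_counter()
        ok = run_study(study)  # True if results match the known-good baseline
        elapsed = time.perf_counter() - start
        if not ok:
            return f"BLOCK upgrade: {study} produced different results"
        if elapsed > old_time * (1 + tolerance):
            return f"POSTPONE upgrade: {study} ran {elapsed / old_time:.1f}x slower"
    return "upgrade cleared for the local environment"
```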
Security of the cloud
In this article, we’ve covered five good reasons engineers use the cloud for their simulation needs. But if senior management can’t be convinced that the cloud poses no risk of intellectual-property loss or cyber “leakage”, the cloud remains out of reach. The most common reason we hear for slow cloud adoption is concern over securing the transfer of intellectual property between the organization and the cloud.
We’re going to explore three simple steps to securing engineering simulations in the cloud. The first is to secure the access. Access refers to the path over which files, screen output, and keystrokes flow between your organization’s network and the cloud datacenter. The simplest technique to secure this access is to encrypt all traffic over a VPN (virtual private network). Further, this encryption should be separate and distinct from the facilities provided by the cloud provider.
In practice, this means deploying your own organization’s VPN infrastructure rather than relying on VPN technology or HTTPS encryption supplied by the cloud provider. The reason is simple – to ensure that all communication passing between your network and the cloud is concealed from unauthorized viewing. By not using the provider’s facilities, you also prevent the cloud provider (even for benign reasons) from introspecting the traffic. The result is two secure endpoints that can be relied upon to protect all communications.
The second step is to encrypt all on-disk storage of files, models, and solutions. Again, this on-disk encryption should be entirely separate from that provided by the cloud provider. The idea is that all material at rest in the cloud environment is completely secured from unauthorized access. In that case, even with direct access to your computing environment, your cloud provider’s personnel would have no access to the material stored within their own storage environment.
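A minimal sketch of that separation, using the third-party "cryptography" package: files are encrypted with a key that lives only on your network, so the provider’s storage only ever holds opaque blobs. The filenames are illustrative.

```python
# Client-side encryption: the key never leaves your organization.
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store on YOUR network, never in the cloud
cipher = Fernet(key)

with open("model.inp", "rb") as f:          # illustrative filename
    ciphertext = cipher.encrypt(f.read())

with open("model.inp.enc", "wb") as f:
    f.write(ciphertext)  # only this encrypted blob is uploaded/stored
```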
The third and final step is to solicit a guarantee of privacy from your cloud provider. This guarantee should stipulate the provider’s access policies and commit them to limiting access to your organization’s information resources. This step is probably the most important – these guarantees should form the basis of any agreement with your provider before you consider their environment.
If a provider can’t offer such guarantees, or simply refers you to their SLA and mentions service credits for downtime, you should search for a better hosting partner. Without confidentiality guarantees backed by material contract provisions, most organizations will refuse to consider cloud hosting for anything beyond minor business functions.