For one of the projects I'm currently involved in, we want to have a better view on sustainability within IT and see what we (IT) can contribute in light of the sustainability strategy of the company. For IT infrastructure, one would think that selecting more power-efficient infrastructure is the way to go, as well as selecting products whose manufacturing process takes special attention to sustainability.
There are other areas to consider as well, though. Reusability of IT infrastructure and optimal resource consumption are at least two other attention points that deserve plenty of attention. But let's start at the manufacturing process...
Certifications for products and companies
Eco certifications are a good start in the selection process. By selecting products with the right certification, companies can initiate their sustainable IT strategy with a good start. Such certifications look at the product and manufacturing, and see if they use proper materials, create products that can have extended lifetimes in the circular (reuse) economy, ensure the manufacturing processes use renewable energy and do not have harmful emissions, safeguard clean water, etc.
In the preliminary phase I am right now, I do not know yet which certifications make most sense to pursue and request. Sustainability is becoming big business, so plenty of certifications exist as well. From a cursory search, I'd reckon that the following certifications are worth more time:
- EcoVadis provides business sustainability ratings that not only cover the ecological aspect, but also social and ethical performance.
- ISO 14001 covers environmental management, looking at organizations' processes and systematic improvements contributing to sustainability.
- Carbon Neutral focus on transparency in measurements and disclosure of emissions, and how the company is progressing in their strategy to reduce the impact on the environment.
- TCO Certified attempts to address all stages of a manufacturing process, from material selection over social responsibility and hazardous substances up to electronic waste and circular economy.
- Energy Star focuses on energy efficiency, and tries to use standardized methods for scoring appliances (including computers and servers).
A second obvious part is on power efficiency. Especially in data center environments, which is the area that I'm interested in, power efficiency also influences the data center's capability of providing sufficient cooling to the servers and appliances. Roughly speaking, a 500 Watt server generates twice as much heat as a 250 Watt server. Now, that's oversimplifying, but for calculating heat dissipation in a data center, the maximum power of infrastructure is generally used for the calculations.
Now, we could start looking for servers with lower power consumption. But a 250 Watt server is most likely going to be less powerful (computing-wise) than a 500 Watt server. Hence, power efficiency should be considered in line with the purpose of the server, and thus also the workloads that it would have to process.
We can use benchmarks, like SPEC's CPU 2017 or SPEC's Cloud IaaS 2018 benchmarks, to compare the performance of systems. Knowing the server's performance for given workloads and the power consumption, allows architects to optimize the infrastructure.
Heat management (and more) in the data center
A large consumer of power in a data center environment are the environmental controls, with the cooling systems taking a big chunk out of the total power consumption. Optimizing the heat management in the data center has a significant impact on the power consumption. Such optimizations are not solely about reducing the electricity bill, but also about reusing the latent heat for other purposes. For instance, data center heat can be used to heat up nearby buildings.
A working group of the European Commission, the European Energy Efficiency Platform (E3P), publishes an annual set of best practices in the EU Code of Conduct on Data Center Energy Efficiency which covers areas such as airflow design patterns, operating temperature and humidity ranges, power management features in servers and appliances, infrastructure design aspects (like virtualization and appropriate, but no over-engineered redundancy), etc.
This practice goes much beyond the heat management alone (and is worth a complete read), covering the complete data center offering. Combining these practices with other areas of data center design (such as redundancy levels, covered by data center tiering) allows for companies that are looking at new data centers to overhaul their infrastructure and be much better prepared for sustainable IT.
Another part that often comes up in sustainability measures is how reusable the infrastructure components are after their "first life". Infrastructure systems, which frequently renew after 4 to 5 years of activity, can be resold rather than destroyed. The same can be said for individual components.
Companies that deal with sensitive data regularly employ "Do Not Return" clauses in the purchases of storage devices. Disks are not returned if they are faulty, or just swapped for higher density disks. Instead, they are routinely destroyed to make sure no data leakage occurs.
Instead of destroying otherwise perfect disks (or disks that still have reusable components) companies could either opt for degaussing (which still renders the disk unusable, but has better recyclability than destroyed disks) or data wiping (generally through certified methods that guarantee the data cannot be retrieved).
Systems are often working perfectly beyond their 4 to 5 year lifespans. Still, these systems are process-wise automatically renewed to get more efficient and powerful systems in place. But that might not always be necessary - beyond even the circular ecosystem remarks above (where such systems could be resold), these systems can even get extended lifecycle within the company.
If there is no need for a more powerful system, and the efficiency of the system is still high (or the efficiency can be improved through minor updates), companies can seek out ways to prolong the use of the systems. In previous projects, I advised that big data nodes can perfectly remain inside the cluster after their regular lifetime, as the platform software (Hadoop) can easily cope with failures if those would occur.
Systems can also be used to host non-production environments or support lab environments. Or they can be refurbished to ensure maximal efficiency while still being used in production. Microsoft for instance has a program called Microsoft Circular Centers which aims at a zero-waste sustainability within the data center, through reuse, repurpose and recycling.
Right-sizing the infrastructure
Right-sizing is to select and design infrastructure to deal with the workload, but not more. Having a set of systems at full capacity is better than having twice as many systems at half capacity, as this leads to power inefficiencies.
To accomplish right-sizing isn't as easy as selecting the right server for a particular workload. Workload is distributed, and systems are virtualized. Virtualization allows for much better right-sizing as you can distribute workload more optimally.
Companies with large amounts of systems can more efficiently distribute workload across their systems, making it easier to have a good consumption pattern. Smaller companies will notice that they need to design for the burst and maximum usage, whereas the average usage is far, far below that threshold.
Using cloud resources can help to deal with bursts and higher demand, while still having resources on-premise to deal with the regular workload. Such hybrid designs, however, can be complex, so make sure to address this with the right profiles (yes, I'm making a stand for architects here ;-)
Standardizing your infrastructure also makes this easier to accomplish. If the vast majority of servers are of the same architecture, and you standardize on as few operating systems, programming languages and what not, you can more easily distribute workload than when these systems have different architectures and purposes.
Automated workload and power management
Large environments will regularly have servers and infrastructure that is not continuously used at near full capacity. Workloads are frequently following a certain curve, such as higher demand during the day and lower at night. Larger platforms use this curve to schedule appropriate workload (like running heavy batch workload at night while keeping the systems available for operational workload during the day) so that the resources are more optimally used.
By addressing workload management and aligning power management, companies can improve their power usage by reducing active systems when there are less resource needs. This can be done gradually, such as putting CPUs in lower power modes (CPU power takes roughly 30% of a system's total power usage), but can expand to complete hosts being put in idle state.
We can even make designs where servers are shut down when unused. While this is frequently frowned upon, citing possible impact on hardware failures as well as reduced reactivity to sudden workload demand, proper shutdown techniques do offer significant power savings (as per a research article titled Quantifying the Impact of Shutdown Techniques for Energy-Efficient Data Centers).
Sustainability within IT focuses on several improvements and requirements. Certification helps in finding and addressing these, but this is not critical in any company's strategy. Companies can address sustainability easily without certification, but with proper attention and design.
The IT world is littered with frameworks, best practices, reference architectures and more. In an ever-lasting attempt to standardize IT, we often get lost in too many standards or specifications. For consultants, this is a gold-mine, as they jump in to support companies - for a fee, naturally - in adopting one or more of these frameworks or specifications.
While having references and specifications isn't a bad thing, there are always pros and cons.
At work, as with many other companies, we're actively investing in new platforms, including container platforms and public cloud. We use Kubernetes based container platforms both on-premise and in the cloud, but are also very adamant that the container platforms should only be used for application workload that is correctly designed for cloud-native deployments: we do not want to see vendors packaging full operating systems in a container and then shouting they are now container-ready.
One of the main IT processes that a company should strive to have in place is a decent IT asset management system. It facilitates knowing what assets you own, where they are, who the owner is, and provides a foundation for numerous other IT processes.
However, when asking "what is an IT asset", it gets kind off fuzzy...
This time a much shorter post, as I've been asked to share this information recently and found that it, by itself, is already useful enough to publish. It is a conceptual data model for IT services.
In a perfect world, using infrastructure or technology services would be seamless, without impact, without risks. It would auto-update, tailor to the user needs, detect when new features are necessary, adapt, etc. But while this is undoubtedly what vendors are saying their product delivers, the truth is way, waaaay different.
Managing infrastructure services implies that the company or organization needs to organize itself to deal with all aspects of supporting a service. What are these aspects? Well, let's go through those that are top-of-mind for me...
No, not Diphtheria, Tetanus, and Pertussis (vaccine), but Development, Test, Acceptance, and Production (DTAP): different environments that, together with a well-working release management process, provide a way to get higher quality and reduced risks in production. DTAP is an important cornerstone for a larger infrastructure architecture as it provides environments that are tailored to the needs of many stakeholders.
Nowadays it is impossible to ignore, or even prevent open source from being active within the enterprise world. Even if a company only wants to use commercially backed solutions, many - if not most - of these are built with, and are using open source software.
However, open source is more than just a code sourcing possibility. By having a good statement within the company on how it wants to deal with open source, what it wants to support, etc. engineers and developers can have a better understanding of what they can do to support their business further.
In many cases, companies will draft up an open source policy, and in this post I want to share some practices I've learned on how to draft such a policy.
I am not an advocate for hybrid cloud architectures. Or at least, not the definition for hybrid cloud that assumes one (cloud or on premise) environment is just an extension of another (cloud or on premise) environment. While such architectures seem to be simple and fruitful - you can easily add some capacity in the other environment to handle burst load - they are a complex beast to tame.
Transparent encryption is relatively easy to implement, but without understanding what it actually means or why you are implementing it, you will probably make the assumption that this will prevent the data from being accessed by unauthorized users. Nothing can be further from the truth.