A network-aware distributed storage cache for data intensive environments
Citations
Data management and transfer in high-performance computational grid environments
Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing
End-to-end quality of service for high-end applications
Method and system for providing dynamic hosted service management across disparate accounts/sites
System for balance distribution of requests across multiple servers using dynamic metrics
References
Congestion avoidance and control
The network weather service: a distributed resource performance forecasting service for metacomputing
A resource management architecture for metacomputing systems
The SDSC storage resource broker
Frequently Asked Questions (16)
Q2. What future work have the authors mentioned in the paper "A network-aware distributed storage cache for data intensive environments"?
The authors plan to test the DPSS in a larger testbed, with an OC-12 wide area link and more clients. They also plan to experiment with other network monitoring methods, such as passive methods, for collecting network throughput information. The ability to predict future performance would be extremely valuable for this system, and the authors plan to try to incorporate NWS into their JAMM system. This will allow them to evaluate the minimum cost master under heavier load, where they expect load balancing to have a greater impact.
Q3. What is the role of a high-speed cache?
This cache can be used as a large buffer, absorbing data from a high-rate data source and then forwarding it to a slower tertiary storage system.
Q4. Why does the DPSS server scale linearly with the number of disks?
Because of the threaded nature of the DPSS server, a server scales linearly with the number of disks, up to the network limit of the host (possibly limited by the network card or the CPU).
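The scaling relation described here can be sketched as a simple model: aggregate throughput grows with the number of disks until the host's network interface (or CPU) becomes the bottleneck. The per-disk and network rates below are illustrative assumptions, not figures from the paper.

```python
# Illustrative model of DPSS server scaling: throughput grows linearly
# with the number of disks up to the host's network limit.
# All rates are assumed example values in MBytes/sec.

def server_throughput(n_disks, per_disk_mbs=2.5, network_limit_mbs=12.0):
    """Aggregate throughput of one threaded server with n_disks disks."""
    return min(n_disks * per_disk_mbs, network_limit_mbs)

for n in (1, 2, 4, 8):
    print(n, server_throughput(n))  # linear until the network limit caps it
```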
Q5. How many Mbps of data are delivered by the minimum cost master?
In the testbed configuration, the minimum cost master sustained a peak throughput of 128 Mbps to three clients using a fully replicated data set; without replication, the peak throughput was only 59 Mbps.
Q6. What is the importance of a buffer space?
For TCP to perform well over high-speed networks, it is critical that there be enough buffer space for the congestion control algorithms to work correctly [12].
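The sizing rule behind this answer is the bandwidth-delay product: the socket buffer should hold at least one round-trip's worth of data. A minimal sketch using Python's socket module follows; the link bandwidth and RTT are assumed example values.

```python
import socket

def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return int(bandwidth_bps / 8 * rtt_s)

# Assumed WAN figures: a 100 Mb/s path with a 50 ms round-trip time.
buf = bdp_bytes(100_000_000, 0.050)  # 625000 bytes

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request send/receive buffers of at least one BDP so the congestion
# window can grow large enough to keep the pipe full (the OS may clamp
# or double these values).
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf)
s.close()
```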
Q7. How many disks can be used in a DPSS?
A four-server DPSS with a capacity of one Terabyte (costing about $80K in mid-1999) can thus produce throughputs of over 50 MBytes/sec by providing parallel access to 20-30 disks.
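The arithmetic behind this figure can be checked directly; the per-server and per-disk rates below are derived from the answer's numbers, not stated separately in the paper.

```python
# Quick arithmetic check on the quoted DPSS figures.
total_mbs = 50                    # aggregate throughput, MBytes/sec
servers = 4                       # DPSS servers
disks_low, disks_high = 20, 30    # parallel disks in the system

per_server = total_mbs / servers  # each server must sustain 12.5 MB/s
# Implied per-disk rate range, ~1.7 to 2.5 MB/s (plausible for 1999 disks):
per_disk = (total_mbs / disks_high, total_mbs / disks_low)
print(per_server, per_disk)
```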
Q8. How many Mbps is the maximum bandwidth between Server A and Client B?
The sustained bandwidth from Server A to Client A is 112 Mbps, but only 80 Mbps to Client B. A network configuration problem limited the bandwidth between Server C and Client B to 11 Mbps, although Server C achieved 107 Mbps to Client A, and Client B to Server A achieved 56 Mbps.
Q9. What is the main reason why the authors are using this type of data cache?
The authors believe that this type of network-aware data cache will be an important architectural component to building effective data intensive computational grids.
Q10. What is the minimum cost flow approach to load balancing?
The minimum cost flow approach to load balancing increases the total throughput of the system by adapting to varying client demand.
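The idea can be sketched as a standard minimum cost flow problem: a source feeds each client its demand, client-to-server edges carry costs standing in for measured path quality, and server-to-sink edges bound server capacity. The graph layout, all capacities and costs, and the successive-shortest-path solver below are an illustrative reconstruction, not the authors' implementation.

```python
# Minimal successive-shortest-path min-cost flow, sketching how client
# block requests could be assigned to replica servers at minimum cost.

def min_cost_flow(n, edge_list, s, t, maxf):
    """Send up to maxf units from s to t at minimum total cost."""
    graph = [[] for _ in range(n)]  # residual graph: [to, cap, cost, rev]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for u, v, cap, cost in edge_list:
        add_edge(u, v, cap, cost)

    flow = total_cost = 0
    while flow < maxf:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        INF = float("inf")
        dist = [INF] * n
        prev = [None] * n  # (node, edge index) back-pointers
        dist[s] = 0
        for _ in range(n - 1):
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
        if dist[t] == INF:
            break  # demand cannot be fully satisfied
        # Find the bottleneck capacity along the path, then push flow.
        push, v = maxf - flow, t
        while v != s:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = t
        while v != s:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[t]
    return flow, total_cost

# Tiny example: node 0 = source, 1-2 = clients, 3-4 = servers, 5 = sink.
# Client 1 requests 2 blocks, client 2 requests 1; costs model path quality.
edges = [
    (0, 1, 2, 0), (0, 2, 1, 0),    # source -> clients (demand)
    (1, 3, 10, 1), (1, 4, 10, 3),  # client 1 -> servers
    (2, 3, 10, 2), (2, 4, 10, 1),  # client 2 -> servers
    (3, 5, 2, 0), (4, 5, 2, 0),    # servers -> sink (server capacity)
]
flow, cost = min_cost_flow(6, edges, 0, 5, 3)
print(flow, cost)  # all 3 blocks assigned at total cost 3
```

Each client is routed to its cheapest server that still has capacity, which is what lets the system adapt as demand shifts.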
Q11. What is the total latency from a client's request to its receipt of the first tile?
The total latency from a client's request to its receipt of the first tile from a server is affected by three different network paths: the paths from client to master, master to server, and server to client.
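Since the three paths are traversed in sequence, the first-tile latency is simply their sum; the per-path values below are assumed example figures, not measurements from the paper.

```python
# First-tile latency model: the request crosses three network paths in
# sequence before the first tile arrives. Latencies are assumed (ms).
def first_tile_latency(client_to_master, master_to_server, server_to_client):
    return client_to_master + master_to_server + server_to_client

print(first_tile_latency(20, 5, 25))  # 50 ms total
```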
Q12. What other methods do you plan to use for network monitoring?
The authors also plan to experiment with other network monitoring methods, such as passive methods, for collecting network throughput information.
Q13. What is the importance of a large buffer size for the WAN environment?
It is also apparent that while setting the buffer size big enough is particularly important for the WAN case, it is also important not to set it too big for the LAN environment.
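The trade-off becomes concrete with the bandwidth-delay product: the right buffer size differs by an order of magnitude between the two environments. The link speeds and round-trip times below are assumed example values.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: buffer needed to keep the path full."""
    return int(bandwidth_bps / 8 * rtt_s)

# Assumed paths: a fast LAN with a short RTT vs. a WAN with a long RTT.
lan = bdp_bytes(1_000_000_000, 0.001)  # 1 Gb/s, 1 ms RTT  -> 125000 B
wan = bdp_bytes(100_000_000, 0.100)    # 100 Mb/s, 100 ms RTT -> 1250000 B
# A WAN-sized buffer on the LAN path ties up ~10x more memory per
# connection than needed without adding any throughput.
print(lan, wan)
```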
Q14. What fraction of the requested blocks is any one server likely to store?
If the authors assume blocks are uniformly distributed to servers, then it is unlikely that any one server will store much more than m/n of the blocks requested (where m is the number of requested blocks and n the number of servers).
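A quick simulation illustrates the claim: under uniform placement, no server's share strays far from 1/n. The values of m and n are arbitrary assumed sizes.

```python
import random

random.seed(1)               # fixed seed for a reproducible run
m, n = 10_000, 4             # m requested blocks, n servers (assumed sizes)
counts = [0] * n
for _ in range(m):
    counts[random.randrange(n)] += 1  # place each block uniformly at random

shares = [c / m for c in counts]
print(max(shares))  # stays close to 1/n = 0.25
```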
Q15. How many times is the data replicated?
Since the authors expect this type of storage cache to be mainly used with very large data sets, in practice the data will likely only be replicated at most twice.
Q16. Why might the bandwidth availability appear to be stable when it is actually bursty?
The measured bandwidth availability might appear to be stable based on measurements every 10 minutes, but might actually be very bursty; this burstiness might only be noticed if measurements are made every few seconds.
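This sampling effect is easy to demonstrate on a synthetic trace: coarse 10-minute averages of an on/off traffic pattern look perfectly flat, while the per-second view swings between zero and full rate. The trace below is entirely made up for illustration.

```python
# Synthetic bursty bandwidth trace (Mb/s), one sample per second:
# alternating 5-second bursts at 100 Mb/s and 5-second idle gaps.
trace = [100.0 if (t // 5) % 2 == 0 else 0.0 for t in range(3600)]

# 10-minute averages -- what a coarse monitor measuring every 600 s sees:
coarse = [sum(trace[i:i + 600]) / 600 for i in range(0, 3600, 600)]
print(coarse)                  # every window averages 50.0: looks stable
print(min(trace), max(trace))  # per-second view: swings from 0 to 100
```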