
Answers from top 10 papers

Papers (10): Insights
Experimental evaluation has shown that, by using the proposed techniques, ETL flows can be optimized more effectively and with much less effort than with existing methods.
Proceedings ArticleDOI
Bruno Oliveira, Orlando Belo 
28 Oct 2013
14 Citations
Additionally, we believe a simpler and more reliable approach to ETL conceptual modelling is needed, one that addresses the lack of maturity of this important part of ETL development.
With the results generated through such evaluations, companies will be able to make informed decisions and choose the best ETL tool for their purposes.
Our experimental findings indicate that ETL Committee significantly outperforms single ETL models, achieving state-of-the-art competitive results.
Proceedings ArticleDOI
28 Oct 2011
22 Citations
An experimental evaluation shows that by using a little more CPU time, the (wall-clock) time to run an ETL program can be greatly reduced.
Proceedings ArticleDOI
Huamin Wang, Zhiwei Ye 
22 May 2010
18 Citations
Practice has shown that the model and framework proposed in this paper can improve ETL efficiency to a large extent.
Book ChapterDOI
Bruno Oliveira, Orlando Belo 
04 Dec 2012
37 Citations
We think this is a practical approach to significantly reducing the inadequacy of an ETL system in its real-world scenario.
Proceedings ArticleDOI
Sonal Sharma, Rajni Jain 
08 Feb 2014
9 Citations
This study should therefore be substantially useful for understanding the various approaches to ETL modeling in data warehousing.
Practical and experimental results show that the proposed approach has better efficiency and applicability than other existing ETL methods.
Proceedings ArticleDOI
Xiufeng Liu, Nadeem Iftikhar 
13 Apr 2015
15 Citations
The experimental results show that the proposed framework can run 4.7 times faster than ordinary ETL dataflows (i.e., without the proposed partitioning and optimization methods) and is comparable to similar ETL tools.
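The performance-oriented insights above share one underlying idea: split an ETL dataflow into independent partitions and process them in parallel, spending a little more CPU to cut wall-clock time. The following is a minimal sketch of that idea in Python, not the frameworks evaluated in these papers; all names and data are hypothetical.

```python
# Minimal sketch (not the papers' frameworks): partition an ETL dataflow so
# that independent partitions are transformed in parallel, trading a little
# extra CPU for lower wall-clock time.
from multiprocessing import Pool

def extract():
    # Stand-in for reading rows from a source system.
    return [{"id": i, "amount": i * 1.5} for i in range(100_000)]

def transform(partition):
    # CPU-bound cleaning/derivation applied to one partition independently.
    return [{**row, "amount_eur": round(row["amount"] * 0.92, 2)} for row in partition]

def load(rows):
    # Stand-in for bulk-loading into the warehouse.
    print(f"loaded {len(rows)} rows")

def split(rows, n):
    # Round-robin split into n independent partitions.
    return [rows[i::n] for i in range(n)]

if __name__ == "__main__":
    rows = extract()
    with Pool(processes=4) as pool:
        parts = pool.map(transform, split(rows, 4))   # parallel transform step
    load([row for part in parts for row in part])
```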

See what other people are reading

How can the operating system facilitate data analytics and reporting for monitoring sales and trends in a fries business?
4 answers
The operating system can facilitate data analytics and reporting for monitoring sales and trends in a fries business by utilizing IT tools for data analysis. These tools can process data, create data segments, and derive system health metrics from application data. By incorporating machine learning, artificial intelligence, and natural language processing methods, dynamic analyses can be performed without programming skills, enhancing the ability to detect trends and predict sales. This proactive approach enables the early detection of fads, crazes, and sudden sales trends, allowing for better prediction of overall sales and timely adjustments in the volume of items to meet demand, ultimately benefiting both suppliers and retailers.
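As a concrete illustration of the trend monitoring described above, the following is a minimal sketch that flags days where sales jump well above a recent moving average; the data, threshold, and column names are hypothetical.

```python
# Minimal sketch of spike/trend detection on daily sales data:
# flag days selling well above the recent moving average.
import pandas as pd

sales = pd.DataFrame(
    {"date": pd.date_range("2024-01-01", periods=14, freq="D"),
     "units_sold": [120, 118, 125, 130, 127, 210, 220,
                    135, 128, 132, 129, 131, 240, 255]}
).set_index("date")

# 7-day moving average as the baseline expectation.
sales["baseline"] = sales["units_sold"].rolling(7, min_periods=3).mean()

# Mark days selling 1.5x above baseline as potential fads or sudden trends.
sales["spike"] = sales["units_sold"] > 1.5 * sales["baseline"]
print(sales[sales["spike"]])
```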
Is MySQL efficient in database?
5 answers
MySQL is widely used in various fields and has attracted attention for its performance optimization. It is favored for being open-source, lightweight, and supporting high concurrency. However, with the rise of unstructured big data, NoSQL databases like Cassandra are becoming more efficient at handling heavy loads and complex queries, surpassing both MySQL and other NoSQL options such as HBase and MongoDB. To address evolving database application requirements and the challenges posed by massive scale and high concurrency, a distributed database based on a MySQL transformation has been proposed, aiming to enhance data backup, high availability, and performance. While MySQL remains efficient for many applications, the choice between MySQL and NoSQL databases depends on the specific needs of the organization and the nature of the data being handled.
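One routine MySQL optimization implied above is indexing the columns used in frequent filters and verifying the plan with EXPLAIN. The sketch below illustrates this with the mysql-connector-python client; the table, credentials, and schema are hypothetical.

```python
# Minimal sketch (hypothetical table and credentials) of one common MySQL
# optimization: adding an index and checking the query plan with EXPLAIN.
# Requires the mysql-connector-python package.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="shop"
)
cur = conn.cursor()

# Without an index, this filter forces a full table scan on large tables.
cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
print(cur.fetchall())

# Adding an index lets MySQL use a ref lookup instead of a scan.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
print(cur.fetchall())

cur.close()
conn.close()
```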
What is the role of systematic layout planning in optimizing warehouse operations and efficiency?
5 answers
Systematic layout planning plays a crucial role in optimizing warehouse operations and efficiency by addressing various issues such as unreasonable layout, inefficient material flows, and low operational effectiveness. By utilizing methods like the Analytic Hierarchy Process for picking efficiency analysis, systematic layout planning can improve the flow of production processes, reduce distances between departments, and enhance overall operational efficiency. Additionally, it aids in designing flexible, sustainable, and cost-effective supply chains by considering investment and operational costs, thus contributing to efficient and responsive supply chain management. Moreover, systematic layout planning helps in evaluating current warehouse designs, proposing alternatives for performance improvement, and calculating optimal dimensions for warehouses, ultimately enhancing productivity without excessive costs.
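The Analytic Hierarchy Process mentioned above comes down to deriving criterion weights from a pairwise comparison matrix, typically via its principal eigenvector. Below is a minimal sketch with hypothetical comparison values for three layout criteria.

```python
# Minimal AHP sketch: derive criterion weights (e.g., travel distance,
# picking time, space utilisation) from a pairwise comparison matrix via
# the principal eigenvector. The comparison values are hypothetical.
import numpy as np

# Saaty-style pairwise comparisons among three layout criteria.
A = np.array([
    [1.0, 3.0, 5.0],   # distance vs. (distance, time, space)
    [1/3, 1.0, 2.0],   # time
    [1/5, 1/2, 1.0],   # space
])

eigvals, eigvecs = np.linalg.eig(A)
principal = eigvecs[:, np.argmax(eigvals.real)].real
weights = principal / principal.sum()

print({"distance": weights[0], "time": weights[1], "space": weights[2]})
```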
What are the long-term psychological consequences of a low humane orientation on an individual's acculturation process?
5 answers
A low humane orientation can have significant psychological consequences on an individual's acculturation process. Research indicates that low acculturation is associated with negative mental health outcomes, such as increased psychological distress. While some studies suggest that low acculturation may lead to better outcomes, findings show that individuals with low acculturation levels are more likely to experience post-disaster panic attacks, higher anxiety, and poorer mental health status. Moreover, the negative impact of low acculturation on mental health can be mitigated by positive perceptions of ethnic community social cohesion but intensified by negative interactions within the ethnic community. Therefore, a low humane orientation within the acculturation process can contribute to heightened psychological distress and poorer mental well-being outcomes.
What is the best note taking app for ipad?
4 answers
Based on the research data provided, the best note-taking app for iPad depends on the user's specific needs. Various apps offer different features and functionalities. For instance, some apps focus on simulating real handwriting with smooth strokes and responsive interfaces, while others prioritize document reading, annotation, and integration with cloud services. Additionally, there are apps tailored for classroom observations, offering features like visual depictions of classrooms, automatic time-stamps, and interaction tracking. Ultimately, the choice between apps like GoodReader, PDF Reader Pro, or Lesson Note depends on the user's requirements, such as handwriting simulation, document management, annotation tools, or classroom observation functionalities. Users should select the app that aligns best with their specific note-taking needs.
What are some popular final year projects ideas for Bachelor of Science in Computer Science students?
5 answers
Final year projects for Bachelor of Science in Computer Science students can include contributing to open source software projects, implementing a web-based final year project system with enhanced security features, or developing novel game ideas for local communities that emphasize soft skills and teamwork. These projects offer practical experience in software engineering principles, enhance technical and communication skills, and provide opportunities to work on real-world applications. Additionally, projects like these can help students build a portfolio of contributions, gain confidence, and prepare for industry. By focusing on areas like open source software, security implementation, and community-oriented game development, students can explore diverse aspects of computer science and make meaningful contributions to the field.
What are the key performance indicators (KPIs) that can be used to measure the effectiveness of a datalake?
4 answers
Key Performance Indicators (KPIs) are essential metrics for assessing the effectiveness of various organizational processes, including the management and operation of a datalake. To measure the effectiveness of a datalake, KPIs should encompass a range of dimensions such as performance, quality, and user satisfaction, tailored to the specific objectives and operational aspects of the datalake.

Firstly, considering the economic perspective and environmental performance as highlighted by Kantor et al., the internal rate of return and CO2 emissions could be adapted to evaluate the cost-efficiency and environmental impact of running the datalake, respectively. These indicators are crucial for ensuring the datalake's operations align with broader organizational goals of profitability and sustainability.

From a project management perspective, especially relevant to construction but applicable to datalake management, cost, time/schedule, quality, client satisfaction, and effective communication are identified as paramount KPIs. These could be translated into datalake terms as cost of storage and operations, data processing and retrieval times, data quality and integrity, user satisfaction levels, and the effectiveness of communication between the datalake and its users or integrated systems.

Furthermore, the application of KPIs in evaluating ERP processes within companies, as discussed by Selmeci et al., suggests the importance of measuring the effectiveness of ERP integrations and the performance of data processing within the datalake. This could include metrics on the integration success rate, query response times, and the accuracy of data analytics outputs.

Additionally, the development and management of universities' strategic plans through KPIs, as proposed by Abdullah et al., can inspire the monitoring of educational and research outputs generated from datalake insights, emphasizing the datalake's role in supporting academic excellence and innovation.

Lastly, the concept of Key Incorporated Results (KIRs), combining various performance measurement methods, offers a comprehensive approach to datalake KPIs by integrating process auditing, strategic alignments, and agile concepts to ensure the datalake's operations are both efficient and aligned with organizational strategies.

In summary, effective KPIs for measuring the effectiveness of a datalake should include cost-efficiency, environmental impact, operational performance (such as processing times and data quality), user satisfaction, integration success with other systems, and the contribution to strategic objectives like research and innovation.
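To make the operational KPIs above concrete, the following is a minimal sketch that turns a hypothetical query log and ingestion counters into latency, quality, reliability, and cost indicators; all record structures and figures are invented for illustration.

```python
# Minimal sketch of turning raw datalake logs into a few of the KPIs
# discussed above; the record structure and figures are hypothetical.
import statistics

query_log = [  # per-query wall-clock seconds and success flag
    {"seconds": 1.2, "ok": True}, {"seconds": 0.8, "ok": True},
    {"seconds": 5.4, "ok": False}, {"seconds": 2.1, "ok": True},
]
ingested_rows, rejected_rows = 98_500, 1_500          # from the ingestion pipeline
monthly_storage_cost, monthly_budget = 4_200.0, 5_000.0

latencies = sorted(q["seconds"] for q in query_log)
kpis = {
    # Operational performance: typical and tail query latency.
    "median_query_s": statistics.median(latencies),
    "p95_query_s": latencies[round(0.95 * (len(latencies) - 1))],  # nearest rank
    # Data quality: share of rows that passed validation on ingest.
    "ingest_quality": ingested_rows / (ingested_rows + rejected_rows),
    # Reliability: share of queries completing without error.
    "query_success_rate": sum(q["ok"] for q in query_log) / len(query_log),
    # Cost efficiency: storage spend against budget.
    "cost_vs_budget": monthly_storage_cost / monthly_budget,
}
print(kpis)
```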
What role do data lake solutions like Microsoft Fabric and Databricks play in big data analytics?
5 answers
Data lake solutions, exemplified by Microsoft Fabric and Databricks, are pivotal in the realm of big data analytics, serving as foundational components that enable the efficient handling, storage, and analysis of vast datasets.

Databricks, as highlighted, operates on Azure and other multi-cloud environments, offering a unified data and analytics platform that supports a wide range of functionalities including data warehousing, advanced analytics, real-time streaming analytics, and more. Its integration with Delta Lakes and optimization capabilities positions it as a leading platform for Lakehouse capabilities, outperforming competitors in query speed and cost efficiency through its compute engines, Apache Spark, and Photon.

Similarly, Microsoft Azure has significantly contributed to the big data landscape by providing a comprehensive platform for data workloads, emphasizing its strength in PaaS offerings and focusing on making big data tools and technologies both rich and accessible. Azure's support for data lakes, particularly through Azure Data Lake Store (ADLS), showcases its capability to handle big data analytics by offering scalable, secure, and agile solutions that accommodate structured, unstructured, and semi-structured data in their native formats.

Data lakes themselves are crucial for big data analytics due to their flexibility in storing raw data formats, facilitating a broad spectrum of analytics tasks by providing access to extensive computation and storage resources. This flexibility is essential for supporting the diverse needs of data analysis, from the ingestion stage to visualization and analytics, thereby democratizing data access and enabling real-time decision analytics.

The role of data lakes in big data analytics is further underscored by their ability to manage large datasets, uncover hidden patterns, and derive valuable insights, which are fundamental for driving digital transformation and addressing big data challenges across various domains.

In essence, data lake solutions like Microsoft Fabric and Databricks play a critical role in big data analytics by offering robust, scalable, and flexible platforms that support the comprehensive management and analysis of big data, thereby enabling organizations to harness the full potential of their data assets.
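A minimal sketch of the Lakehouse pattern described above: raw events are landed in a Delta Lake table and queried directly with Spark SQL. It assumes pyspark plus the delta-spark package (with the Delta jars on the Spark classpath); paths and column names are hypothetical.

```python
# Minimal Lakehouse sketch: land raw events in a Delta Lake table and
# query them with Spark, without copying data into a separate warehouse.
# Requires pyspark and the delta-spark package on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Raw, schema-on-read style data landing in the lake (hypothetical events).
events = spark.createDataFrame(
    [("2024-05-01", "click", 3), ("2024-05-01", "purchase", 1)],
    ["event_date", "event_type", "count"],
)
events.write.format("delta").mode("overwrite").save("/tmp/lake/events")

# Analytics directly over the lake table.
(spark.read.format("delta").load("/tmp/lake/events")
    .groupBy("event_type").sum("count").show())
```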
Why is mental health important global data?
5 answers
Mental health data holds global importance due to the significant burden of mental health challenges worldwide. Sharing and reusing this data is crucial for advancing scientific knowledge and addressing mental illness effectively. The FAIR Guiding Principles emphasize the importance of making data Findable, Accessible, Interoperable, and Reusable, enhancing transparency and efficiency in research. Collaborative efforts across disciplines and nations are key in improving the accessibility and usability of mental health data, facilitating research progress and promoting equity in healthcare. By adhering to these principles and promoting data sharing, the global mental health community can work towards better understanding, treating, and preventing mental health issues on a broader scale.
How business choose datalake?
5 answers
Businesses choose Data Lakehouses as a modern solution for their data needs due to the scalability, diverse data types support, and advanced analytics capabilities they offer. Data Lakehouses, which combine the features of Data Warehouses and Data Lakes, provide direct access to open data formats, excellent support for machine learning, and efficient data processing. To effectively utilize Data Lakehouses, businesses employ techniques such as graph-based searching, node ranking algorithms, and collaborative filtering to enhance data relevancy, extract candidate results, and enrich data sub-graphs for better decision-making. The need for scalable storage platforms that can handle diverse data types, ingest data at high speeds, and accommodate big data requirements drives businesses towards adopting Data Lakes as a simple yet powerful option for managing and analyzing vast amounts of data effectively. Additionally, businesses leverage Data Lakes on the cloud to access new analytical techniques, predictive analytics, data visualization, and artificial intelligence, enabling them to build data science models using open-source languages and recommend the best algorithms for various use cases. The concept of Data Lakes as storage spaces for both structured and unstructured data allows businesses to dynamically incorporate the data they require, enhancing adaptability and robust data analytics for improved decision-making and business intelligence processes.
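The node-ranking idea mentioned above can be illustrated with a PageRank-style score over a graph of dataset relationships in the lake; the graph, damping factor, and dataset names below are hypothetical.

```python
# Minimal sketch of node ranking for data discovery: score datasets in a
# lake with a PageRank-style iteration over "derived-from" links.
graph = {  # dataset -> datasets it feeds into (hypothetical lineage graph)
    "raw_orders": ["orders_clean"],
    "raw_customers": ["customers_clean"],
    "orders_clean": ["sales_mart"],
    "customers_clean": ["sales_mart"],
    "sales_mart": [],
}

def rank(graph, damping=0.85, iterations=50):
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, outlinks in graph.items():
            if not outlinks:                      # dangling node: spread evenly
                for n in nodes:
                    new[n] += damping * score[node] / len(nodes)
            else:
                for target in outlinks:
                    new[target] += damping * score[node] / len(outlinks)
        score = new
    return score

print(sorted(rank(graph).items(), key=lambda kv: -kv[1]))
```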
Why is subject access to information so bad?
5 answers
Subject access to information is often problematic due to various reasons outlined in the provided contexts. The challenges stem from poorly organized collections lacking quality-controlled subject access systems, the inadequacy of technologies to handle the vast amount of online information on the World Wide Web, difficulties in subject searches leading to either no retrievals or an overwhelming number of records, and the issue of providing subject access to works of art by individuals lacking expertise in the field. Additionally, information access technologies sometimes prioritize efficiency over expressiveness, resulting in the retrieval of irrelevant documents in response to user queries. These factors collectively contribute to the subpar state of subject access to information.