Tag Archives: database

9 top threats to cloud computing security

Cloud computing has grabbed the spotlight at this year’s RSA Conference 2013 in San Francisco, with vendors aplenty hawking products and services that equip IT with controls to bring order to cloud chaos. But the first step is for organizations to identify precisely where the greatest cloud-related threats lie.

To that end, the CSA (Cloud Security Alliance) has identified "The Notorious Nine," the top nine cloud computing threats for 2013. The report reflects the current consensus among industry experts surveyed by CSA, focusing on threats specifically related to the shared, on-demand nature of cloud computing.

First on the list is data breaches. To illustrate the potential magnitude of this threat, CSA pointed to a research paper from last November describing how a virtual machine could use side-channel timing information to extract private cryptographic keys in use by other VMs on the same server. A malicious hacker wouldn’t necessarily need to go to such lengths to pull off that sort of feat, though. If a multitenant cloud service database isn’t designed properly, a single flaw in one client’s application could allow an attacker to get at not just that client’s data, but every other client’s data as well.

The challenge in addressing the twin threats of data loss and data leakage is that "the measures you put in place to mitigate one can exacerbate the other," according to the report. You could encrypt your data to reduce the impact of a breach, but if you lose your encryption key, you’ll lose your data. Conversely, if you opt to keep offline backups of your data to reduce data loss, you increase your exposure to data breaches.

The second-greatest threat in a cloud computing environment, according to CSA, is data loss: the prospect of seeing your valuable data disappear into the ether without a trace. A malicious hacker might delete a target’s data out of spite — but then, you could lose your data to a careless cloud service provider or a disaster, such as a fire, flood, or earthquake. Compounding the challenge, encrypting your data to ward off theft can backfire if you lose your encryption key.

Data loss isn’t only problematic in terms of impacting relationships with customers, the report notes. You could also get into hot water with the feds if you’re legally required to store particular data to remain in compliance with certain laws, such as HIPAA.

The third-greatest cloud computing security risk is account or service traffic hijacking. Cloud computing adds a new threat to this landscape, according to CSA. If an attacker gains access to your credentials, he or she can eavesdrop on your activities and transactions, manipulate data, return falsified information, and redirect your clients to illegitimate sites. "Your account or services instances may become a new base for the attacker. From here, they may leverage the power of your reputation to launch subsequent attacks," according to the report. As an example, CSA pointed to an XSS attack on Amazon in 2010 that let attackers hijack credentials to the site.

Source: http://www.infoworld.com/t/cloud-security/9-top-threats-cloud-computing-security-213428

Data Integration is Key Tech Need in 2013

In the past few years, many marketers have tested multiple kinds of campaign management tools, and that has created a multitude of unwieldy data silos.

“Marketers are struggling with integration issues,” says Michael Della Penna, senior vice president, emerging channels, Responsys. “They’re looking for a solution that can collect critical social data and make it actionable.”

This means that integrated solutions will be a key area for tech spending, says Della Penna, who notes that 2011 was very much a testing phase for social media.

“It wasn’t unusual to talk to a brand that has three campaign management tools in place, testing which is the best tool for them,” he says, noting that many tools initially just focused on one specific area, like email or social listening. “But by the end of 2012, many of these tools started morphing and increasing their offerings to increase revenue by account.”

Now, brands are realizing that they don’t need three of the same thing, and will look to consolidate into the one that best meets their particular needs.

Where else will marketers focus their tech budget dollars this year?

Orchestration will be key in 2013, says Della Penna. “All of the different channels [available] have created issues—customers are seeing different voices in different channels, and brands need to be creating messages in a more coordinated way, timed to where the consumer is in the buying process.”

Tied into this is optimization and responsive design, considering how customers experience things in different channels and making sure emails are rendered properly for viewing on a multitude of devices, he adds.

Optimizing systems to deliver localized targeting will also be a key area, as marketers try to take advantage of locally relevant social data. “A lot of social data is unstructured, so the challenge is making this data useful in campaign development.”

Marketing automation has provided amazing results for many firms, and there is a trend to extend that beyond email into other channels, such as display, where what ads are pushed to website visitors can be automated based not only on behavior but whether the prospect has already converted.

“We can pull those who have converted out of market so clients are not wasting money trying to contact them,” he says, noting that display has been making a comeback. “There’s a huge interest in display retargeting, building strategies that are different between known and unknown users for contextually relevant offers.”

On the mobile front, there is a renewed interest in technology to enable SMS. “It’s the workhorse of mobile, and brands are now coordinating it with other channels for things like notifications about product availability or confirming purchases,” says Della Penna. “There’s particular interest in tools to push relevant offers, such as being able to leverage [the iOS application] Passbook to push out a coupon.”

Is getting C-level buy-in for marketing tech expenditures becoming easier? Della Penna thinks so. “The CMO and CTO relationship is changing. There is rarely a situation where we don’t have IT involved at some point in the buying cycle, and all disciplines are working more closely together.”

The way B2B and B2C firms are looking at marketing tech isn’t all that different, he adds. “The scale just varies. In B2B there may be more of a focus on live events and face-to-face but it’s all about focusing on knowing the customer better and then reaching them at the right touch points.”

Source: http://chiefmarketer.com/database-marketing/data-integration-key-tech-need-2013

Use Memcached for Java enterprise performance, Part 2: Database-driven web apps

However you slice it, traditional caching requires performance trade-offs that some enterprise applications cannot afford. Find out for yourself why Memcached is a go-to solution for Java developers whose applications need serious scale. After first setting up spymemcached as your open source Java client for Memcached, you’ll use it in two powerful application scenarios: first configuring Memcached as a second-level cache for Hibernate (via hibernate-memcached), and then using it to cache the HTML generated for each web page.

We concluded the first half of this tutorial with a look at using Telnet and the Memcached protocol to store and retrieve cache entries in a Memcached server. Accessing Memcached via Telnet is especially useful for debugging, but if you want to use Memcached in a Java enterprise application you’ll need to use a Memcached Java client.

We’ll use spymemcached, a very popular Memcached Java client, for the introductory purposes of this tutorial. Listing 1 shows spymemcached’s main class, MemcachedClient.

Listing 1. MemcachedClient
import java.util.concurrent.Future;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

// The wrapper class name is illustrative; the original listing shows only main().
public class MemcachedClientDemo {
    public static void main(String[] args) throws Exception{
        if(args.length < 2){
            System.out.println("Please specify command line options");
            return;
        }
        // The first argument is the command (get, set, add, replace, delete); the second is the key.
        String commandName = args[0];
        // Connect to a Memcached server running locally on the default port.
        MemcachedClient memcachedClient = new MemcachedClient(AddrUtil.getAddresses("127.0.0.1:11211"));
        if(commandName.equals("get")){
            String keyName= args[1];
            System.out.println("Key Name " +keyName);
            System.out.println("Value of key " +memcachedClient.get(keyName));
        }else if(commandName.equals("set")){
            // set, add and replace also expect the value as a third argument.
            String keyName =args[1];
            String value=args[2];
            System.out.println("Key Name " +keyName + " value=" + value);
            // An expiration of 0 means the entry never expires.
            Future<Boolean> result= memcachedClient.set(keyName, 0, value);
            System.out.println("Result of set " + result.get());
        }else if(commandName.equals("add")){
            String keyName =args[1];
            String value=args[2];
            System.out.println("Key Name " +keyName + " value=" + value);
            Future<Boolean> result= memcachedClient.add(keyName, 0, value);
            System.out.println("Result of add " + result.get());
        }else if(commandName.equals("replace")){
            String keyName =args[1];
            String value=args[2];
            System.out.println("Key Name " +keyName + " value=" + value);
            Future<Boolean> result= memcachedClient.replace(keyName, 0, value);
            System.out.println("Result of replace " + result.get());
        }else if(commandName.equals("delete")){
            String keyName =args[1];
            System.out.println("Key Name " +keyName );
            Future<Boolean> result= memcachedClient.delete(keyName);
            System.out.println("Result of delete " + result.get());
        }else{
            System.out.println("Command not found");
        }
        // Stop the client's background I/O thread and close the connection.
        memcachedClient.shutdown();
    }
}

In Listing 1, we first create a MemcachedClient object with hostname:port as its argument. Once we have the object we can start calling its methods to set and get cache entries. Note that MemcachedClient has equivalent methods for every method supported by the Memcached protocol:

  • MemcachedClient.set() is used to store a Java object in the Memcached server, for example an instance of a Contact object under the key contactId-1. The second parameter of this set method is the expiration time in seconds; Listing 1 passes 0, which means the entry never expires.
  • MemcachedClient.get() is used to get the value of the key from a Memcached server. In Listing 1 the value of the key is an object of Contact.java, so the client will first retrieve the value and then deserialize it and return the object. The get() method returns null if the value is not found or is expired.
  • MemcachedClient.add() is used to add an object to the cache only if it does not already exist. In Listing 1 the contactId-1 key exists in the cache already, so it won’t be added.
  • MemcachedClient.replace() replaces an object with the value for the given key (if there is already such a value). In Listing 1, contactId-1 is already in the cache, so its value would be replaced.
  • MemcachedClient.delete() deletes a given key from the cache. In Listing 1, the call is used to delete the contactId-1 key.
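
The difference between set, add, and replace is easiest to see side by side. The following is a minimal sketch that imitates those semantics with an in-memory map; InMemoryCache is a hypothetical stand-in written for illustration, not part of spymemcached, and it ignores expiration entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a Memcached server (illustration only).
final class InMemoryCache {
    private final ConcurrentMap<String, Object> entries = new ConcurrentHashMap<>();

    // set: store unconditionally, overwriting any existing value.
    boolean set(String key, Object value) {
        entries.put(key, value);
        return true;
    }

    // add: store only if the key is not present yet.
    boolean add(String key, Object value) {
        return entries.putIfAbsent(key, value) == null;
    }

    // replace: store only if the key is already present.
    boolean replace(String key, Object value) {
        return entries.replace(key, value) != null;
    }

    // delete: remove the key and report whether anything was removed.
    boolean delete(String key) {
        return entries.remove(key) != null;
    }

    // get: return the stored value, or null when the key is missing.
    Object get(String key) {
        return entries.get(key);
    }
}
```

With this stand-in, a second add() for contactId-1 returns false, while replace() succeeds only as long as the key has not been deleted.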
MemcachedClient.set()

You can use MemcachedClient.set() to store a simple string or a complex object. When you store a complex object, MemcachedClient will first serialize the object and then store it. As a result, every object that you store in Memcached must be serializable, and the key must be a string. In Listing 1, the set() method returns a Future<Boolean>. The set call executes asynchronously, so control moves to the next line without waiting for a response from the Memcached server. If you need to know the result of the set operation, call result.get() on the returned Future, which blocks until the server responds.
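
To make the serialization requirement concrete, here is a minimal sketch of the round trip a value goes through on its way into and out of a cache, assuming standard Java serialization is used. CachedContact and SerializationRoundTrip are hypothetical names invented for this example; the fields are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical value class; it must implement Serializable to be stored.
final class CachedContact implements Serializable {
    private static final long serialVersionUID = 1L;
    final int contactId;
    final String firstName;

    CachedContact(int contactId, String firstName) {
        this.contactId = contactId;
        this.firstName = firstName;
    }
}

final class SerializationRoundTrip {
    // Turn the object into the bytes that would be sent to the cache server.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes, as a get() would after fetching them.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If CachedContact did not implement Serializable, writeObject() would fail with a NotSerializableException, which is why every cached class needs the marker interface.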

Spying under the hood: spymemcached

Before we move on to the web application exercise, let’s look under the hood to see how spymemcached works. Figure 1 is a sequence diagram showing what happens in spymemcached when a client issues a get().

Figure 1. get() under the hood

Spymemcached is an asynchronous, single-threaded Memcached client. When you call any caching-related method on spymemcached’s MemcachedClient, it will be handled asynchronously. The client call method handles writing the details of the operation that should be performed into a queue and returning the control back to the client making the call. The actual interaction with the Memcached server, meanwhile, is handled by a separate thread that runs in the background.
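
The queue-and-worker arrangement described above can be sketched with the standard library alone. QueueingClient below is a hypothetical miniature written for illustration; it mimics only the shape of the design (enqueue on the caller thread, complete a future on a background daemon thread) and uses an in-memory map in place of a real server.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical miniature client; not spymemcached's actual implementation.
final class QueueingClient {
    // One pending get: the key to look up and the future to complete.
    private static final class GetOperation {
        final String key;
        final CompletableFuture<Object> result = new CompletableFuture<>();

        GetOperation(String key) {
            this.key = key;
        }
    }

    private final BlockingQueue<GetOperation> queue = new LinkedBlockingQueue<>();
    private final Map<String, Object> fakeServer = new ConcurrentHashMap<>();

    QueueingClient() {
        Thread worker = new Thread(this::processQueue, "io-worker");
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.start();
    }

    // Seeds the fake server so that there is something to read back.
    void put(String key, Object value) {
        fakeServer.put(key, value);
    }

    // Caller thread: record the operation and return immediately.
    CompletableFuture<Object> asyncGet(String key) {
        GetOperation operation = new GetOperation(key);
        queue.add(operation);
        return operation.result;
    }

    // Worker thread: take operations one at a time and complete their futures.
    private void processQueue() {
        try {
            while (true) {
                GetOperation operation = queue.take();
                operation.result.complete(fakeServer.get(operation.key));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller only blocks when it asks the returned future for its value, for example with join(); until then it is free to do other work.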

Notice that the sequence diagram in Figure 1 has two different threads. The first thread shows the method sequence for what happens when a client makes a get() call. The second thread displays a method sequence for the daemon thread that communicates with the Memcached server. Both threads are worth a closer look.

Sequence of events in a client thread

When you call MemcachedClient’s get() method, it takes the arguments and forwards control to its asyncGet() method. The asyncGet() method then forwards control to either AsciiOperationFactory or BinaryOperationFactory, depending on the Memcached protocol your client uses to communicate with the server. AsciiOperationFactory is the default.

The AsciiOperationFactory constructs an object of the command-specific operation object. In this case, since the client issued a get command, it creates an object of GetOperationImpl and returns it. The MemcachedClient.asyncGet() method then takes care of attaching a callback function to the operation. This function will be called when MemcachedClient gets data back from the server and returns a java.util.concurrent.Future object. The client uses the java.util.concurrent.Future to retrieve the data returned from server.

Once MemcachedClient has the object of GetOperationImpl, it first tries to validate the key by ensuring that the length of the key is less than 250 characters and does not contain any special characters. Once the key is validated, the next task is to figure out which server the request should go to. For that MemcachedClient passes control to an instance of the NodeLocator class with the key. The NodeLocator class calls the HashAlgorithm.hash(key) method to get the hashCode for the key. By default, NodeLocator will call the hashCode() method on the key, which is a String object. Once it has the hashCode it will divide that by the number of servers; for example, if the hashCode were 10 and the number of servers three, then the remainder would be one. So the cache entry would be located in Server 1. The MemcachedNode object representing Server 1 would be selected. The MemcachedConnection object would then add the get operation to the queue of Server 1 and return control to the client code.
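
The key check and the modulo-based server choice can be sketched in a few lines. NodeSelector is a hypothetical helper, and the exact validation rules (a 250-character limit, no whitespace or control characters) are assumptions made for this sketch rather than the client's precise checks. Math.floorMod is used because String.hashCode() can be negative.

```java
// Hypothetical helper illustrating key validation and modulo node selection.
final class NodeSelector {
    // Assumed limit for this sketch, following the 250-character figure in the text.
    static final int MAX_KEY_LENGTH = 250;

    // Reject keys that are empty, too long, or contain whitespace or control characters.
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty() || key.length() > MAX_KEY_LENGTH) {
            return false;
        }
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }

    // Map a hash to a server index; floorMod keeps the result non-negative.
    static int selectServer(int hash, int serverCount) {
        return Math.floorMod(hash, serverCount);
    }

    // Convenience overload that hashes the key with String.hashCode().
    static int selectServer(String key, int serverCount) {
        return selectServer(key.hashCode(), serverCount);
    }
}
```

For the example in the text, selectServer(10, 3) yields 1. One consequence of plain modulo selection is that adding or removing a server changes the index for most keys, so most cached entries would be looked up on a different server afterwards.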

Sequence of events in daemon threads

If you take a look at the MemcachedClient source code, you will notice that it extends the java.lang.Thread class. When you create a new instance of the MemcachedClient it kicks off a new thread by calling a start() method on the current object, at which point the JVM will call a MemcachedClient run() method from the newly created thread.

Inside the run() method, MemcachedClient checks the value of the running flag. If it is true, MemcachedClient calls the handleIO() method of the MemcachedConnection object. handleIO() looks at the current job queue to get a list of pending tasks and tries to optimize them. For example, if more than one get() request is pending then this method will combine them into one call. The handleIO() method uses the java.nio methods to communicate with the server. When you call the MemcachedClient shutdown() method, it changes the value of the running flag to false, which results in stopping the run() method and the background daemon thread. It also closes the connection with the Memcached server.
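
The idea of combining pending get() requests can be illustrated with a small sketch. GetBatcher is a hypothetical class written for this example; it folds queued keys into a single request line in the style of the Memcached text protocol, which accepts several keys after one get command. It is not how spymemcached's optimizer is actually coded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical batching helper; illustrates the optimization only.
final class GetBatcher {
    private final Queue<String> pendingKeys = new ConcurrentLinkedQueue<>();

    // Called by client threads: remember that this key was requested.
    void enqueueGet(String key) {
        pendingKeys.add(key);
    }

    // Called by the I/O thread: drain everything queued so far and fold it
    // into a single multi-key request line. Returns null when nothing is pending.
    String nextRequest() {
        List<String> keys = new ArrayList<>();
        String key;
        while ((key = pendingKeys.poll()) != null) {
            if (!keys.contains(key)) { // ask for each key only once
                keys.add(key);
            }
        }
        return keys.isEmpty() ? null : "get " + String.join(" ", keys);
    }
}
```

Three queued requests for the keys a, b, and a again would therefore go out as the single line "get a b", saving two round trips.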

Using Memcached in an enterprise web application

This section introduces you to integrating Memcached into an enterprise architecture. Using the Contact web application introduced in Listing 1, we’ll add, remove, update, and view records in a CONTACT table. Next I’ll show you how to use Memcached to alleviate database load by configuring it as second-level cache for Hibernate. Finally, I’ll explain how to use Memcached to store custom Java objects and cache the HTML generated for each web page.

Start by downloading ManageContact-NoCaching.zip. The ManageContact application contains a ContactServlet that looks at incoming requests and decides what database interaction is required to execute each one. It then forwards control to ContactDAO in order to execute the required database interaction. ContactDAO uses Hibernate to execute select, insert, and update functions on the Contact table. When control is returned to ContactServlet, it forwards control to the appropriate JSP in order to generate markup on the web page. The ManageContact application has a Maven script that takes care of downloading all of the necessary dependencies; you can execute a mvn install command to download these dependencies and then run the application in an embedded Glassfish server. Once the server is started you should be able to access the application at http://localhost:8080/ManageContact/contact.

Enter hibernate-memcached

Most web applications spend a good chunk of their time interacting with databases. Caching data can help you speed up that interaction, as well as reduce load on your database. Hibernate provides a nice interface for caching that allows you to use your own caching framework. You instruct Hibernate on which caching framework to use by specifying the name of a class that implements org.hibernate.cache.CacheProvider with the property hibernate.cache.provider_class. From there, you have two options: you can either create your own class that implements the org.hibernate.cache.CacheProvider interface and stores the cache entries in the Memcached server, or you can use the hibernate-memcached framework, which is an open source framework based on the spymemcached client. hibernate-memcached supports entity and query caching.

Configure hibernate-memcached as a caching framework

By default caching is disabled in hibernate-memcached, so our first step is to enable Hibernate’s second-level cache. We’ll also need to configure Memcached as a caching implementation for Hibernate. We can handle both of these requirements in the Hibernate configuration file (hibernate.cfg.xml), which is used to configure application-level settings for Hibernate:

Listing 2. Hibernate config
<property name="cache.provider_class">com.googlecode.hibernate.memcached.MemcachedCacheProvider</property>
<property name="hibernate.memcached.servers">localhost:11211</property>
<property name="hibernate.memcached.cacheTimeSeconds">300</property>
<property name="hibernate.memcached.connectionFactory">BinaryConnectionFactory</property>

Now let’s take a closer look at each of the properties set. You select the cache implementation by setting cache.provider_class to com.googlecode.hibernate.memcached.MemcachedCacheProvider. The properties that start with "hibernate.memcached" are specific to the Memcached provider:

  • cache.provider_class: The value of this property defines which class should be used as a cache implementation. In this case we set it to com.googlecode.hibernate.memcached.MemcachedCacheProvider, which is provided by the hibernate-memcached framework and uses Memcached as a caching framework.
  • hibernate.memcached.servers: The properties starting with hibernate.memcached are used by the hibernate-memcached framework. The value of hibernate.memcached.servers should be a space-delimited list of Memcached instances in host:port format. This Memcached server is running on my localhost at port 11211, so I set the value to localhost:11211, which is the default value.
  • hibernate.memcached.cacheTimeSeconds: The value of this property defines the default number of seconds that each item should be cached. I want the CONTACT record to be cached for 300 seconds.
  • hibernate.memcached.connectionFactory: This is the "simple" name of the spymemcached ConnectionFactory class. It must be one of DefaultConnectionFactory, KetamaConnectionFactory, or BinaryConnectionFactory. The BinaryConnectionFactory performs much better by using a binary protocol.
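
To show what a space-delimited list in host:port format amounts to, here is a minimal parsing sketch. ServerListParser is a hypothetical class written for illustration; it is not the parser that hibernate-memcached or spymemcached actually uses, and it does not handle IPv6 literals.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a value such as "localhost:11211 cache2:11212".
final class ServerListParser {
    static List<InetSocketAddress> parse(String servers) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : servers.trim().split("\\s+")) {
            if (entry.isEmpty()) {
                continue; // blank input yields an empty list
            }
            int colon = entry.lastIndexOf(':');
            if (colon < 1 || colon == entry.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + entry);
            }
            String host = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing.
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addresses;
    }
}
```

Listing several servers this way is what lets the client spread keys across more than one Memcached instance.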
More configuration properties

See "Adding hibernate-memcached to your application" on the hibernate-memcached wiki for a list of all the configuration properties supported by hibernate-memcached.

Develop a caching strategy

Once you have enabled caching your next step is to choose which data you want to cache and what caching strategy you will use. In my case, I want to cache the data from my CONTACT table. Since that will be regularly updated, I set my cache strategy to read-write by adding a cache element in the Contact.hbm.xml, like this:

Listing 3. Configure the caching strategy
<hibernate-mapping package="com.javaworld.Memcached">
    <class name="Contact" table="CONTACT"  >
    <cache usage="read-write"/>
        <id name="contactId" column="CONTACTID">
            <generator class="increment"/>
        </id>
        <property name="firstName" column="FIRSTNAME"/>
        <property name="lastName" column="LASTNAME"/>
        <property name="email" column="EMAIL"/>
    </class>
</hibernate-mapping>

After you’ve updated your own Contact.hbm.xml file (which you can download with the article source), start the Memcached server by executing a memcached -vv command. This will start Memcached in verbose mode so that it prints every client interaction on the console. Next, execute mvn clean install to start the application. Go to http://localhost:8080/ManageContact/contact and add a couple of records. When you click on a record to go to its details, you should notice that no SQL query is executed; instead those records are coming back from the cache. On the Memcached server console, you should see an interaction similar to what is shown in Figure 2.

Figure 2. A view from the Memcached console

Using Memcached for server responses

So far you’ve seen how to reduce the load on your database by using Memcached as a second-level cache in Hibernate. Not all application scenarios are quite so simple, however. For instance, how should you handle web pages that process and display data from a web service? You’ll find your application using CPU-intensive logic to build markup for every response, which probably would be better off cached. If you cache the generated markup then the next request for that markup will return from the cache instead of going to a servlet.

The first step is to build a simple Servlet filter to intercept the request. Start by copying the CachingResponseWrapper.java and CachingResponseWriter.java into a com.javaworld.Memcached.filter package. Together these two classes will collect the responses generated by ContactServlet into a String.

Next, create a CacheFilter.java in the com.javaworld.Memcached.filter package and change its doFilter() method so that it looks like this:

Listing 4. CacheFilter.java
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        log.debug("Inside CachingFilter.doFilter() " );
        try {
            HttpServletRequest httpServletRequest = (HttpServletRequest)request;
            HttpServletResponse httpServletResponse = (HttpServletResponse)response;
            ObjectPool<MemcachedClient> memcachedClientPool = MemcachedHelper.getMemcachedConnectionPool();
            MemcachedClient memcachedClient = memcachedClientPool.borrowObject();
            try {
                // Build the cache key from the request URL: context path, servlet path and query string.
                StringBuilder cacheKeyBuilder = new StringBuilder();
                cacheKeyBuilder.append(httpServletRequest.getContextPath());
                cacheKeyBuilder.append(httpServletRequest.getServletPath());
                if(httpServletRequest.getQueryString() != null){
                    cacheKeyBuilder.append("?");
                    cacheKeyBuilder.append(httpServletRequest.getQueryString());
                }

                String cacheKey = httpServletResponse.encodeURL(cacheKeyBuilder.toString());
                System.out.println ("Get Path Info  " + cacheKey);
                String cachedResponse =(String) memcachedClient.get(cacheKey);

                if( cachedResponse == null){
                    // Cache miss: let the servlet render, capture the markup, then store it.
                    System.out.println("Response is not cached forwarding control to servlet");
                    CachingResponseWrapper cachingResponseWrapper =new CachingResponseWrapper((HttpServletResponse)response);
                    chain.doFilter(request, cachingResponseWrapper);
                    CachingResponseWriter collectResponseWriter = (CachingResponseWriter)cachingResponseWrapper.getWriter();
                    String collectedResponseStr = collectResponseWriter.getCollectedResponse();
                    System.out.println( "Set value in the Memcached for key " + cacheKey);
                    // An expiration of 0 means the cached markup never expires.
                    System.out.println("Result of set" + memcachedClient.set(cacheKey, 0, collectedResponseStr).get());
                }else{
                    // Cache hit: write the stored markup without invoking the servlet.
                    System.out.println("Returning cached response ");
                    response.setContentType("text/html");
                    response.getWriter().println(cachedResponse);
                }
            } finally {
                // Return the client to the pool even if handling the request fails.
                memcachedClientPool.returnObject(memcachedClient);
            }
        } catch (NoSuchElementException e) {
            e.printStackTrace();
        } catch (IllegalStateException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Note that in the doFilter() method you’re first building the URL that the client used to access the ContactServlet; this URL is used as the key for caching. Once the URL is built, you execute a get(cacheKey) call on the Memcached client to check whether the response for the given URL is already cached. If it is, then return it; if it isn’t, then wrap the response object in a CachingResponseWrapper and pass control to the servlet that will generate the necessary markup. After control returns from the servlet, call the collectResponseWriter.getCollectedResponse() method. This method returns the response generated by the servlet as a String. Take that response and save it in your cache with its URL as the key. The next time this URL is requested, the response will come from your cache and not from the servlet.
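
Stripped of the servlet plumbing, the filter follows a check-then-render-then-store pattern. The sketch below shows that pattern with an in-memory map standing in for Memcached; PageCache and its method names are hypothetical and exist only for this illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical page cache; a ConcurrentHashMap stands in for the Memcached server.
final class PageCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Build the cache key the same way the filter does: path plus optional query string.
    static String buildKey(String contextPath, String servletPath, String queryString) {
        StringBuilder key = new StringBuilder(contextPath).append(servletPath);
        if (queryString != null) {
            key.append('?').append(queryString);
        }
        return key.toString();
    }

    // Return cached markup when present; otherwise render once and remember the result.
    String render(String key, Supplier<String> renderer) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String markup = renderer.get();
        cache.put(key, markup);
        return markup;
    }
}
```

Because the entries in this sketch never expire, a page keeps serving the first markup rendered for its URL. The same applies to a Memcached entry stored with an expiration of 0, so a page whose underlying data changes would need either an expiration time or an explicit delete of its key.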

Source: http://www.javaworld.com/javaworld/jw-05-2012/120515-memcached-for-java-enterprise-performance-2.html?page=1

Database Testing – Practical Tips and Insight on How to Test Database

A database is an unavoidable part of almost every software application these days. Whether the application is web or desktop, client-server or peer-to-peer, enterprise or individual business, a database is working at the back end. Similarly, whether the domain is healthcare or finance, leasing or retail, a mailing application or spaceship control, a database is always in action behind the scenes.

Moreover, as the complexity of an application increases, so does the need for a stronger and more secure database. In the same way, applications with a high frequency of transactions (e.g. banking or finance applications) call for a fully featured DB tool.

Currently, several database tools are available in the market, e.g. MS Access 2010, MS SQL Server 2008 R2, Oracle 10g, Oracle Financial, MySQL, PostgreSQL, DB2, etc. These vary in cost, robustness, features and security, and each has its own benefits and drawbacks. One thing is certain: a business application must be built using one of these or another DB tool.

Before I dig into the topic, let me set the stage. While the application is running, the end user mainly uses the ‘CRUD’ operations provided by the DB tool.

C: Create – When the user saves a new transaction, the ‘Create’ operation is performed.
R: Retrieve – When the user searches for or views a saved transaction, the ‘Retrieve’ operation is performed.
U: Update – When the user edits or modifies an existing record, the ‘Update’ operation is performed.
D: Delete – When the user removes a record from the system, the ‘Delete’ operation is performed.

It does not matter which DB is used or how the operation is performed. The end user does not care whether a join or sub-query, trigger or stored procedure, query or function was used to do what he or she wanted. The interesting thing is that every DB operation a user performs from the UI of any application is one of the four above, abbreviated as CRUD.

Database Testing

A database tester should focus on the following DB testing activities:

What to test in database testing:
1) Ensure data mapping:

Make sure that the mapping between the different forms or screens of the AUT (application under test) and the relations of its DB is not only accurate but also follows the design documents. For all CRUD operations, verify that the respective tables and records are updated when the user clicks ‘Save’, ‘Update’, ‘Search’ or ‘Delete’ in the GUI of the application.

2) Ensure ACID Properties of Transactions:

The ACID properties of DB transactions are atomicity, consistency, isolation and durability. All four must be properly tested during the DB testing activity. This area demands even more rigorous and thorough testing when the database is distributed.

3) Ensure Data Integrity:

Consider that different modules (i.e. screens or forms) of the application use the same data in different ways and perform all the CRUD operations on it. In that case, make sure that the latest state of the data is reflected everywhere. The system must show the most recent values or status of such shared data on all forms and screens. This is called data integrity.

4) Ensure Accuracy of implemented Business Rules:

Today, databases are not meant only to store records. In fact, DBs have evolved into extremely powerful tools that give developers ample support for implementing business logic at the DB level. Some simple examples of these features are referential integrity, relational constraints, triggers and stored procedures. Using these and many other features offered by DBs, developers implement business logic at the DB level. The tester must ensure that the implemented business logic is correct and works accurately.

The points above describe the four most important ‘what tos’ of database testing. Now I will shed some light on the ‘how tos’ of DB testing. But first, it is worth stating one important point explicitly: DB testing is a business-critical task, and it should never be assigned to a new or inexperienced tester without proper training.

How To Test Database:
1. Create your own Queries

In order to test the DB properly and accurately, a tester first needs very good knowledge of SQL, especially DML (Data Manipulation Language) statements. Second, the tester needs a good understanding of the internal DB structure of the AUT. If these two prerequisites are met, the tester is ready to test the DB with complete confidence: he or she performs any CRUD operation from the UI of the application and verifies the result using a SQL query.

This is the best and most robust way of DB testing, especially for applications with a small to medium level of complexity. Without the two prerequisites, however, the tester cannot adopt this approach.

Moreover, if the application is very complex, it may be hard or impossible for the tester to write all of the needed SQL queries alone; for some complex queries, the tester may get help from the developer. I always recommend this method because it not only gives testers confidence in the testing they have performed but also improves their SQL skills.
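
As an illustration of verifying a ‘Create’ operation with your own query, the sketch below counts matching rows before and after saving a record from the UI. Everything specific in it is an assumption made for the example: the CUSTOMER table, the EMAIL column and the CrudVerifier class are hypothetical, and the Connection is expected to come from whatever JDBC driver the AUT’s database uses.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical verification helper; table and column names are illustrative.
final class CrudVerifier {
    static final String COUNT_BY_EMAIL = "SELECT COUNT(*) FROM CUSTOMER WHERE EMAIL = ?";

    // Count the rows matching the e-mail address entered in the UI.
    static int countByEmail(Connection connection, String email) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(COUNT_BY_EMAIL)) {
            statement.setString(1, email);
            try (ResultSet rows = statement.executeQuery()) {
                rows.next(); // COUNT(*) always returns exactly one row
                return rows.getInt(1);
            }
        }
    }

    // A 'Save' in the UI should add exactly one matching row.
    static boolean createdExactlyOnce(int countBefore, int countAfter) {
        return countAfter - countBefore == 1;
    }
}
```

A tester would call countByEmail() before clicking ‘Save’, call it again afterwards, and pass both numbers to createdExactlyOnce(). A result of false points to a missing insert or a duplicated one.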

2. Observe data table by table

If the tester is not good at SQL, he or she may verify the result of a CRUD operation, performed through the GUI of the application, by viewing the tables (relations) of the DB. This approach can be tedious and cumbersome, especially when the DB and its tables hold a large amount of data.

Similarly, this way of DB testing may be extremely difficult if the data to be verified is spread across multiple tables. It also requires at least good knowledge of the table structure of the AUT.

3. Get query from developer

This is the simplest way for the tester to test the DB. Perform any CRUD operation from the GUI and verify its effects by executing the corresponding SQL query obtained from the developer. It requires neither good knowledge of SQL nor good knowledge of the application’s DB structure.

So this method seems an easy and good choice for testing the DB. But its drawback can be disastrous. What if the query given by the developer is semantically wrong or does not fulfill the user’s requirement correctly? In the best case, the client will report the issue and demand a fix. In the worst case, the client may refuse to accept the application.

Conclusion:

The database is the core and a critical part of almost every software application. So DB testing of an application demands keen attention, good SQL skills, proper knowledge of the DB structure of the AUT and proper training.

To get a test report you can trust from this activity, the task should be assigned to a resource with all four qualities stated above. Otherwise, surprises at shipment time, bugs found by the client, improper or unintended application behavior, or even wrong outputs from business-critical tasks are likely to be observed. Get this task done by the most suitable resources and give it the attention it deserves.

Source:http://www.softwaretestinghelp.com/database-testing-%E2%80%93-practical-tips-and-insight-on-how-to-test-database/


11 Important Database designing rules

Introduction

Before you start reading this article, let me confirm that I am not a guru in database design. The 11 points listed below are things I have learnt from projects, my own experience and my own reading. I personally think they have helped me a lot when it comes to DB design. Any criticism is welcome.

The reason I am writing a full-blown article is that when developers sit down to design a database, they tend to follow the three normal forms like a silver bullet. They tend to think normalization is the only way of designing. Due to this mindset, they sometimes hit roadblocks as the project moves ahead.

If you are new to normalization, see "3 normal forms in action", which explains all three normal forms step by step.

That said, normalization rules are important guidelines, but treating them as set in stone is asking for trouble. Below are my own 11 rules, which I keep at the top of my head while doing DB design.

Rule 1:- What is the nature of the application (OLTP or OLAP)?

When you start your database design, the first thing to analyze is the nature of the application you are designing for: is it transactional or analytical? You will find many developers applying normalization rules by default without thinking about the nature of the application, and then later running into performance and customization issues. As said, there are two kinds of applications, transaction-based and analytical; let’s understand what these types are.

Transactional:- In this kind of application, your end user is more interested in CRUD, i.e. creating, reading, updating and deleting records. The official name for this kind of database is OLTP (online transaction processing).

Analytical:- In this kind of application, your end user is more interested in analysis, reporting, forecasting, etc. These databases see fewer inserts and updates. The main intention here is to fetch and analyze data as fast as possible. The official name for this kind of database is OLAP (online analytical processing).

a2.jpg

In other words, if you think inserts, updates and deletes are more prominent, go for a normalized table design; otherwise, create a flat, denormalized database structure.

The simple diagram below shows the names and addresses held in a normalized table design on the left-hand side, and the flat table structure we get by applying a denormalized design.

a3.jpg

Rule 2:- Break your data in to logical pieces, make life simpler

This rule is actually the first rule of first normal form. One sign that this rule is being violated is that your queries use too many string-parsing functions such as substring, charindex, etc.; if so, this rule probably needs to be applied.

For instance, look at the table below, which has student names. If you ever want to query for students whose name contains "Koirala" and not "Harisingh", you can imagine what kind of query you will end up with.

So the better approach is to break this field into further logical pieces so that we can write clean and optimal queries.

a4.jpg
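
To make the difference concrete, here is a minimal sketch using Python's sqlite3 module. The table names, column names and sample students are assumptions for illustration, not the contents of the figure. The first query has to parse the combined name with string functions; the second simply filters on a column.

```python
import sqlite3


def build_students():
    """Same three (made-up) students stored two ways: one name field vs. split fields."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student_flat (full_name TEXT);
        INSERT INTO student_flat VALUES
            ('Shiv Koirala'), ('Koirala Harisingh'), ('Raju Sharma');

        CREATE TABLE student_split (first_name TEXT, last_name TEXT);
        INSERT INTO student_split VALUES
            ('Shiv', 'Koirala'), ('Koirala', 'Harisingh'), ('Raju', 'Sharma');
    """)
    return conn


def last_name_from_flat(conn, last_name):
    """Needs string parsing: take everything after the first space."""
    rows = conn.execute(
        "SELECT full_name FROM student_flat "
        "WHERE substr(full_name, instr(full_name, ' ') + 1) = ?",
        (last_name,),
    )
    return [r[0] for r in rows]


def last_name_from_split(conn, last_name):
    """Clean query: the logical piece is already its own column."""
    rows = conn.execute(
        "SELECT first_name || ' ' || last_name FROM student_split WHERE last_name = ?",
        (last_name,),
    )
    return [r[0] for r in rows]
```

Both functions return the same student for "Koirala", but only the second query stays readable and can use an ordinary index on last_name.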

Rule 3:- Do not get overdosed with rule 2

Developers are cute creatures. If you tell them this is the way, they keep doing it; in fact they overdo it, leading to unwanted consequences. This also applies to rule 2, which we just discussed. When you think about decomposing, pause and ask yourself whether it is needed. As said, the decomposition should be logical.

For instance, look at the phone number field; it is rare that you will operate on the ISD code of a phone number separately (unless your application demands it). So it would be a wise decision to just leave it as is, since splitting it can lead to more complications.

a5.jpg

Rule 4:- Treat duplicate non-uniform data as your biggest enemy

Focus on duplicate data and refactor it. My personal worry about duplicate data is not the hard disk space it takes, but the confusion it creates.

For instance, in the diagram below you can see that "5th Standard" and "Fifth standard" mean the same thing. You could say the data got into your system through bad data entry or poor validation. But if you ever want to derive a report, it will show them as different entities, which is very confusing from the end user’s point of view.

a6.jpg

One solution is to move the data into a separate master table altogether and refer to it via foreign keys. In the figure below you can see how we have created a new master table called "Standards" and linked it using a simple foreign key.

a7.jpg
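
A minimal sketch of the master-table idea, using Python's sqlite3 module, is shown below. The column names and sample rows are assumptions; only the "Standards" master table and the foreign key come from the text above.

```python
import sqlite3


def build_school():
    """Students refer to a 'standards' master table instead of free-text values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the reference
    conn.executescript("""
        CREATE TABLE standards (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE      -- one spelling per standard
        );
        CREATE TABLE students (
            roll_no INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            standard_id INTEGER NOT NULL REFERENCES standards(id)
        );
        INSERT INTO standards VALUES (5, '5th Standard'), (6, '6th Standard');
        INSERT INTO students VALUES (1, 'Asha', 5), (2, 'Bala', 5), (3, 'Chen', 6);
    """)
    return conn


def students_per_standard(conn):
    """Report query: each standard appears once, whatever the data-entry habits."""
    return conn.execute("""
        SELECT s.name, COUNT(*)
        FROM students st JOIN standards s ON s.id = st.standard_id
        GROUP BY s.name ORDER BY s.name
    """).fetchall()
```

Because the spelling lives in one row of the master table, the report can no longer split the same standard into "5th Standard" and "Fifth standard".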

Rule 5:- Watch for data separated by separators.

The second rule of first normal form says to avoid repeating groups. One example of a repeating group is shown in the diagram below. If you look at the syllabus field closely, we have too much data stuffed into one field. These kinds of fields are termed "repeating groups". If we have to manipulate this data, the query will be complex, and I also doubt the performance of such queries.

a8.jpg

Columns that have data stuffed together with separators need special attention. A better approach is to move that field to a different table and link it with keys for better management.

aa9.jpg

So now let’s apply the second rule of first normal form, "Avoid repeating groups". You can see in the figure above that I have created a separate syllabus table and then made a many-to-many relationship with the subject table.

With this approach, the syllabus field in the main table is no longer a repeating group and no longer holds data separators.
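
The move from a separator-packed column to a many-to-many design can be sketched as below, using Python's sqlite3 module. The semicolon separator, the table names and the sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical legacy rows: one field stuffed with ';'-separated subjects.
LEGACY_ROWS = [
    ("Grade 5 syllabus", "Maths;Science;English"),
    ("Grade 6 syllabus", "Maths;History"),
]


def load_normalized(rows):
    """Split each packed field into rows of a syllabus/subject link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE syllabus (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE subject  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE syllabus_subject (
            syllabus_id INTEGER REFERENCES syllabus(id),
            subject_id  INTEGER REFERENCES subject(id),
            PRIMARY KEY (syllabus_id, subject_id)
        );
    """)
    for syllabus_name, packed in rows:
        conn.execute("INSERT INTO syllabus (name) VALUES (?)", (syllabus_name,))
        for subject_name in packed.split(";"):
            # A subject shared by several syllabi is stored only once.
            conn.execute("INSERT OR IGNORE INTO subject (name) VALUES (?)", (subject_name,))
            conn.execute(
                "INSERT INTO syllabus_subject "
                "SELECT sy.id, su.id FROM syllabus sy, subject su "
                "WHERE sy.name = ? AND su.name = ?",
                (syllabus_name, subject_name),
            )
    return conn


def syllabi_containing(conn, subject_name):
    """A plain join replaces string searching inside a packed column."""
    rows = conn.execute("""
        SELECT sy.name FROM syllabus sy
        JOIN syllabus_subject ss ON ss.syllabus_id = sy.id
        JOIN subject su ON su.id = ss.subject_id
        WHERE su.name = ? ORDER BY sy.name
    """, (subject_name,))
    return [r[0] for r in rows]
```

With the link table in place, questions such as "which syllabi include Maths?" become simple joins instead of string parsing.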

Rule 6:- Watch for partial dependencies.

aa10.jpg

Watch for fields which depend only partially on the primary key. For instance, in the table above we can see that the primary key is created on roll number and standard. Now look at the syllabus field closely. The syllabus field is associated with a standard and not with a student directly (roll number).

The syllabus belongs to the standard in which the student is studying. So if tomorrow we want to update the syllabus, we have to update it for each student, which is painstaking and not logical. It makes more sense to move these fields out and associate them with the standards table.

You can see how we have moved the syllabus field and attached it to the standards table.

This rule is nothing but second normal form: "All non-key columns should depend on the full primary key and not on part of it".
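
The benefit can be sketched in a few lines with Python's sqlite3 module. The table layout and sample rows below are assumptions; the point is only that once syllabus hangs off the standard, a syllabus change touches one row regardless of how many students there are.

```python
import sqlite3


def build_second_nf():
    """Syllabus lives with the standard, not with each (roll number, standard) row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE standards (standard TEXT PRIMARY KEY, syllabus TEXT);
        CREATE TABLE students (
            roll_no INTEGER,
            standard TEXT REFERENCES standards(standard),
            name TEXT,
            PRIMARY KEY (roll_no, standard)
        );
        INSERT INTO standards VALUES ('5th', 'Syllabus A'), ('6th', 'Syllabus B');
        INSERT INTO students VALUES (1, '5th', 'Asha'), (2, '5th', 'Bala'), (3, '6th', 'Chen');
    """)
    return conn


def update_syllabus(conn, standard, new_syllabus):
    """Returns how many rows had to change (one per standard, not one per student)."""
    cur = conn.execute(
        "UPDATE standards SET syllabus = ? WHERE standard = ?", (new_syllabus, standard)
    )
    return cur.rowcount


def syllabus_for_student(conn, roll_no):
    """Each student still sees the syllabus through a join on the standard."""
    return conn.execute("""
        SELECT sd.syllabus FROM students st
        JOIN standards sd ON sd.standard = st.standard
        WHERE st.roll_no = ?
    """, (roll_no,)).fetchone()[0]
```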

Rule 7:- Choose derived columns carefully

a11.jpg

If you are working on OLTP applications, getting rid of derived columns is a good idea unless there is some pressing performance reason to keep them. In the case of OLAP, where we do a lot of summations and calculations, these kinds of fields are necessary to gain performance.

In the figure above you can see how the average field depends on the marks and subject fields. This is also a form of redundancy. So for fields that are derived from other fields, give some thought to whether they are really necessary.

This rule is also termed third normal form: "No columns should depend on other non-primary key columns". My personal thought is not to apply this rule blindly; look at the situation. Redundant data is not always bad. If the redundant data is calculated data, look at the situation and then decide whether you want to implement third normal form.
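
For the OLTP case, one way to avoid storing the average is to compute it on demand, for example through a view. The sketch below uses Python's sqlite3 module; the marks table and its columns are assumptions, not the exact layout of the figure.

```python
import sqlite3


def build_marks():
    """Store only the base facts; derive the average in a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE marks (
            roll_no INTEGER,
            subject TEXT,
            marks INTEGER,
            PRIMARY KEY (roll_no, subject)
        );
        -- The derived value is computed when read, so it can never go stale.
        CREATE VIEW student_average AS
            SELECT roll_no, AVG(marks) AS average FROM marks GROUP BY roll_no;
        INSERT INTO marks VALUES (1, 'Maths', 80), (1, 'Science', 90);
    """)
    return conn


def average_for(conn, roll_no):
    return conn.execute(
        "SELECT average FROM student_average WHERE roll_no = ?", (roll_no,)
    ).fetchone()[0]
```

If a mark is later corrected, the view reflects the new average immediately. A stored average column would need a second update, which is the redundancy this rule warns about; in an OLAP setting that extra column may still be worth its cost.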

Rule 8:- Do not be hard on avoiding redundancy, if performance is the key

a12.jpg

Do not make it a strict rule that you will always avoid redundancy. If there is a pressing need for performance, think about denormalization. With normalization you need to join many tables; with denormalization the number of joins goes down, which increases performance.

Rule 9:- Multidimensional data is a different beast altogether

OLAP projects mostly deal with multidimensional data. For instance, in the figure below, you would like to get sales per country, customer and date. In simple words, you are looking at sales figures that sit at the intersection of three dimensions.

a13.jpg

For these situations, a dimension and fact design is a better approach. In simple words, you create a central sales fact table which holds the sales amount field and connects to all the dimension tables using foreign key relationships.

a14.jpg

a15.jpg
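
A dimension and fact design can be sketched as below with Python's sqlite3 module. The table names, columns and sample figures are assumptions made up for this example.

```python
import sqlite3


def build_star_schema():
    """One central fact table with foreign keys to three dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_country  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date     (id INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE fact_sales (
            country_id  INTEGER REFERENCES dim_country(id),
            customer_id INTEGER REFERENCES dim_customer(id),
            date_id     INTEGER REFERENCES dim_date(id),
            amount      INTEGER
        );
        INSERT INTO dim_country  VALUES (1, 'India'), (2, 'USA');
        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO dim_date     VALUES (1, '2012-01-01'), (2, '2012-01-02');
        INSERT INTO fact_sales VALUES (1, 1, 1, 100), (1, 2, 1, 50), (2, 1, 2, 70);
    """)
    return conn


def sales_by_country(conn):
    """Slice the facts along one dimension."""
    return conn.execute("""
        SELECT c.name, SUM(f.amount)
        FROM fact_sales f JOIN dim_country c ON c.id = f.country_id
        GROUP BY c.name ORDER BY c.name
    """).fetchall()


def sales_by_country_and_day(conn):
    """Slice along two dimensions by joining one more dimension table."""
    return conn.execute("""
        SELECT c.name, d.day, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_country c ON c.id = f.country_id
        JOIN dim_date d ON d.id = f.date_id
        GROUP BY c.name, d.day ORDER BY c.name, d.day
    """).fetchall()
```

Every report is the same pattern: join the fact table to the dimensions of interest and aggregate the amount.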

Rule 10:- Centralize name value table design

Many times I have come across name-value tables. A name-value table holds a key and some data associated with that key. For instance, in the figure below you can see a currency table and a country table. If you look at the data closely, each table really holds only a key and a value.

a16.jpg

For tables like these, creating one central table and differentiating the data with a type field makes more sense.
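
A minimal sketch of such a central table, using Python's sqlite3 module, follows. The table name lookup, its column names and the sample codes are assumptions for illustration.

```python
import sqlite3


def build_lookup():
    """One central key/value table; the 'type' column says which list a row belongs to."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lookup (
            type  TEXT NOT NULL,
            code  TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (type, code)
        );
        INSERT INTO lookup VALUES
            ('currency', 'INR', 'Rupee'),
            ('currency', 'USD', 'Dollar'),
            ('country',  'IN',  'India'),
            ('country',  'US',  'United States');
    """)
    return conn


def values_of(conn, lookup_type):
    """Read one logical list (e.g. all currencies) out of the shared table."""
    return conn.execute(
        "SELECT code, value FROM lookup WHERE type = ? ORDER BY code", (lookup_type,)
    ).fetchall()
```

Adding a new kind of name-value list then needs new rows with a new type value rather than a new table.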

Rule 11:- For unlimited hierarchical data self-reference PK and FK

Many times we come across data with an unlimited parent-child hierarchy. For instance, consider a multi-level marketing scenario where one salesperson can have multiple salespeople below them. For such scenarios, a primary key with a self-referencing foreign key lets you model the hierarchy.

a17.jpg
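
The self-referencing design can be sketched with Python's sqlite3 module as below. The table name, columns and sample people are assumptions. The recursive query is one common way to walk such a hierarchy and is not taken from the original article.

```python
import sqlite3


def build_hierarchy():
    """parent_id points back at the same table's primary key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_person (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER REFERENCES sales_person(id)   -- NULL for the top level
        );
        INSERT INTO sales_person VALUES
            (1, 'Asha', NULL), (2, 'Bala', 1), (3, 'Chen', 1),
            (4, 'Dev', 2), (5, 'Eli', 4);
    """)
    return conn


def downline(conn, root_id):
    """Everyone below root_id, at any depth, using a recursive common table expression."""
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM sales_person WHERE id = ?
            UNION ALL
            SELECT sp.id, sp.name, tree.depth + 1
            FROM sales_person sp JOIN tree ON sp.parent_id = tree.id
        )
        SELECT name, depth FROM tree WHERE depth > 0 ORDER BY depth, name
    """, (root_id,)).fetchall()
```

The same two columns handle a hierarchy of any depth; no schema change is needed when a new level appears.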

This article is not meant to say that you should not follow the normal forms; it says do not follow them blindly. Look at the nature of your project and the type of data you are dealing with.

a18.jpg

source:

http://www.c-sharpcorner.com/UploadFile/shivprasadk/11-important-database-designing-rules/#Rule6:-Watchforpartialdependencies.


Introducing LocalDB, an improved SQL Express

Introduction

It gives me great pleasure to introduce a new version of SQL Express called SQL Express LocalDB.

LocalDB is created specifically for developers. It is very easy to install and requires no management, yet it offers the same T-SQL language, programming surface and client-side providers as the regular SQL Server Express. In effect the developers that target SQL Server no longer have to install and manage a full instance of SQL Server Express on their laptops and other development machines. Moreover, if the simplicity (and limitations) of LocalDB fit the needs of the target application environment, developers can continue using it in production, as LocalDB makes a pretty good embedded database too.

Background

Before focusing on technical description of LocalDB, I’d like to provide some background on the direction we took building it.

Today SQL Server Express serves two distinct needs. On one hand it is a free edition of SQL Server. The installation, management and programming of SQL Express in this role is expected to be 100% compatible with other editions. It can be used for learning, training and running relatively small production databases (with less than 10GB of data). Upgrading from SQL Express to paid SQL Server editions is a matter of typing in a license key, and no installation is required.

But SQL Express is also the SQL Server edition for developers writing applications that target SQL Server. In this role the programming of SQL Express is still expected to be 100% compatible with other SQL Server editions, but SQL Express is supposed to be small, simple, low-footprint, require no configuration or administration, run as a non-admin user, etc.

Our approach so far was to try to make SQL Express perform well in both roles. But as the SQL Server product matured, and in effect added more complexity, it became harder and harder for SQL Express to be both compatible with other SQL Server editions and small and simple. The challenge is most visible in the installation and configuration of SQL Express. In SQL Server "Denali" we decided to change the approach and introduce a dedicated version of SQL Express for developers, LocalDB, that delivers the simplicity and yet is compatible with other editions of SQL Server at the API level.

Also, by making LocalDB a better SQL Express for developers, we hope to be able to improve the regular SQL Express to be a better free SQL Server. We’d be very happy to hear your feedback in this area, especially if you’re using SQL Express as a database server and find any issues caused by the new features that were introduced to fit the needs of developers and desktop environment.

Source: http://blogs.msdn.com/b/sqlexpress/archive/2011/07/12/introducing-localdb-a-better-sql-express.aspx


Chinese Programmer Won Third Place in Facebook Hacker Cup


On 18 March 2012, the Facebook Hacker Cup was held at Facebook’s offices in Menlo Park, USA. It began at 10 a.m. and lasted for three hours. The competitors were top programmers from around the world, each of whom was given three unrelated technical problems to finish within the three hours. The competition committee determined the results according to speed and accuracy.

Russian programmer Roman Andreev solved one technical problem in 1 hour and 4 minutes and became the champion of this Facebook Hacker Cup. The silver medal went to American programmer Tomek Czajka, who solved one technical problem in 1 hour and 5 minutes, just one minute behind Roman Andreev. Lou Tiancheng from China took third place, spending 1 hour and 44 minutes to solve one technical problem.

It is a pity that none of them solved all three problems within the given time. Facebook plans to hold the Hacker Cup every spring.


Unit Testing is a Means to an End

Most professional software developers these days understand the importance and value of writing and using unit tests. A nice summary of some of the oft-touted and oft-realized benefits of unit testing can be found in the StackOverflow.com thread Is Unit Testing worth the effort? [my only very minor criticism is the mixing of more specialized Test-Driven Development (TDD) with the more general unit test concept]. As with most good things, however, even unit testing enthusiasm can go too far. The benefits of unit testing can lead to overly enthusiastic unit testing developers forgetting that unit tests are not the end themselves, but rather are a means to the real end.

The "end" that most software developers are striving for is delivery of software solutions that make their users’ lives easier and more productive. Unit tests can be extremely valuable in obtaining this end and certainly add to software quality, but the overly zealous unit tester must beware of allowing the unit tests themselves to displace this end goal. It’s all too easy to allow oneself to get so bound up in writing exhaustive and "perfect" unit tests that one puts the true end goal at risk. In the remainder of this post, I look at ways in which developers can allow unit testing to move from helping achieve the desired end to unintentionally displacing the real end and putting it at risk.

Overreaching Unit Testing

Steven Sanderson has written "the benefit of unit testing is correlated with the non-obviousness of the code under test." I largely agree with this sentiment as a general guideline. I see little value in unit testing trivial "get" and "set" methods. Some methods are more readily evaluated via code review than via unit test.

The concept of code coverage can be a useful one as long as it’s not taken too far. Code coverage appears to provide high return for the effort for a while, but there comes a point of diminishing returns when gaining additional code coverage comes at much greater cost and may not be worth that cost. It’s also important to recognize that even the often highly expensive 100% code coverage typically means only all lines of code were executed and does not check all possible paths through the code.
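
The gap between line coverage and path coverage is easy to show with a small Python sketch. The function below and its discount values are hypothetical, invented only for this illustration.

```python
def final_price(price, is_member, has_coupon):
    """Apply percentage discounts; the two discounts were never meant to stack."""
    discount = 0
    if is_member:
        discount += 60   # members get 60% off
    if has_coupon:
        discount += 50   # coupon holders get 50% off
    return price * (100 - discount) / 100


def run_line_covering_tests():
    """Two tests that together execute every line of final_price."""
    assert final_price(100, True, False) == 40.0
    assert final_price(100, False, True) == 50.0
    return True
```

These two tests give 100% line coverage, yet the path where both conditions are true is never exercised: final_price(100, True, True) returns a negative price. Full line coverage says every statement ran at least once, not that every combination of branches was checked.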

All Code’s Unit Testability is Not Equal

The post Selective Unit Testing – Costs and Benefits clearly articulates the differences in difficulty (cost) and advantages (benefits) of unit testing different types of code. In cases where the benefits of a unit test are high and the cost is low, the value of unit testing is obvious. At the opposite extreme, there are types of code that receive little benefit from unit testing.

Source: http://www.javaworld.com/community/?q=node/8354


Hypertable Beats HBase Thoroughly in Performance Test

About Hypertable


The Hypertable system includes three components: Hyperspace, Master and Range Server. Hyperspace is a lock service, akin to Google’s Chubby, mainly used for synchronization, for detecting node failures and for storing the top-level location information. The Master handles task allocation, future load balancing, post-disaster reconstruction (automatically recovering services after a Range Server fails) and other functions. The Range Servers are the actual workers of Hypertable, primarily responsible for serving the data in a range. Each Range Server is also responsible for its own recovery, i.e. replaying its local log to restore the state it had before it failed. Additionally, it communicates with the Hypertable client and other components.

Introduction

Both Hypertable and HBase are scalable open source database products, and both designs are based on Google’s Bigtable. The main difference is that Hypertable is written in C++, while HBase is written in Java. The test environment consists of 16 servers connected through a Gigabit network.

Test Environment:

OS: CentOS 6.1

CPU: 2X AMD C32 Six Core Model 4170 HE 2.1 GHz

RAM: 24GB 1333MHz DDR3

Disk: 4X 2TB SATA Western Digital RE4-GP WD2002FYPS

The NameNode for both Hypertable and HBase runs on test machine No.1, while the DataNodes run on test machines No.4 to No.5. The RangeServers and RegionServers run on the same set of machines and are configured to use all memory resources. Three ZooKeeper and Hyperspace replicas run on test machines No.1 and No.3. In this test, the tables are configured to use Snappy compression and Bloom filters on the row key.

Random Write Test

In the random write test, both Hypertable and HBase were tested by writing 5TB of data in four separate runs, using value sizes of 10000, 1000, 100 and 10 bytes, respectively. In every run, the key was fixed at 20 bytes and formatted as a zero-padded random integer.
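
The key format described here, a random integer zero-padded to a fixed 20 bytes, can be sketched in Python as follows. The range of the random integers is an assumption, since the text does not state it, and this is not the benchmark's actual generator.

```python
import random

KEY_WIDTH = 20  # key size in bytes, as stated for the test


def make_key(rng):
    """Format a random integer as a zero-padded, fixed-width ASCII key."""
    number = rng.randrange(10 ** KEY_WIDTH)  # assumed range: any value that fits in 20 digits
    return f"{number:0{KEY_WIDTH}d}"
```

Zero padding keeps every key the same length, so the amount of data written per key depends only on the value size being tested.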

The following chart shows the test results:

clip_image002[6]

The detailed performance test results:

clip_image004

HBase threw an exception in the 41 billion and 167 billion key tests due to a concurrent mode failure in the HBase RegionServers. No matter how it is configured, this failure occurs whenever the RegionServer produces garbage faster than the Java garbage collector can reclaim it. A different garbage collection scheme could solve the problem, but at a heavy cost in run-time performance.

Matthew Hertz and Emery D. Berger published "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management" at the OOPSLA conference in 2005, which lends solid support to this point.

Random Read Test

This test measures query throughput using a set of random read requests. Each system runs two tests, one with a Zipfian distribution and another with a uniform distribution. The inserted key/value pairs have a fixed size: keys are fixed at 20 bytes and values at 1KB. The keys are ASCII integers. Each query returns one key/value pair. Each of these tests is run twice on each system, once with 5TB of data loaded and once with 0.5TB, so that the experiment can measure how performance changes as the workload moves from memory to disk. 4,901,960,784 keys are loaded in the 5TB test and 490,196,078 keys in the 0.5TB test. The test clients run 128 processes each (512 processes in total) and keep at most 512 queries outstanding at any time throughout the test. Each test issues 100 million queries.
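
The key counts quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes decimal units (1KB = 1000 bytes, 1TB = 10^12 bytes) and that only key and value bytes are counted; both are assumptions, chosen because they reproduce the published figures.

```python
KEY_BYTES = 20
VALUE_BYTES = 1000          # assumes 1KB = 1000 bytes
TB = 10 ** 12               # assumes decimal terabytes


def key_count(total_bytes, value_bytes=VALUE_BYTES):
    """Number of whole key/value pairs that fit in total_bytes."""
    return total_bytes // (KEY_BYTES + value_bytes)
```

With these assumptions, key_count(5 * TB) gives 4,901,960,784 and key_count(TB // 2) gives 490,196,078, matching the figures in the text. The same formula with 100-byte and 10-byte values gives roughly 41.7 billion and 166.7 billion keys, which is consistent with the key counts quoted for the random write test.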

Zipfian Distribution Environment Test

The Hypertable query cache was configured to 2GB, and HBase used the default block cache and memstore settings, which keep HBase performing well. See the following figure:

clip_image006

The detailed performance test results:

clip_image008

The main reason for the difference is that Hypertable provides a query cache. HBase could implement a query cache as well, but such a subsystem generates a lot of garbage. Although it would improve the performance of HBase, it would also bring disadvantages, especially under mixed workloads with very large-scale writes and large cells.

Uniform Distribution Test Environment

See the following figure:

clip_image010

The detailed performance test results:

clip_image012

The performance of HBase is close to that of Hypertable in the uniform distribution test, which is probably due to a disk I/O bottleneck. Some garbage is also produced during the test.

Conclusion

Over the past five years, the Hypertable community has been working to perfect the product. They aim to build Hypertable into a high-performance, highly scalable database solution for the big data field.


Application Testing – Into the Basics of Software Testing!

Application testing is an activity that every software tester performs daily in his or her career. These two words are extremely broad in practice. However, only the core and most important areas will be discussed here. The purpose of this article is to touch on all the primary areas so that readers get a basic briefing in a single place.

Categories of Applications

Whether it is small calculator software with only the basic arithmetic operations or an online enterprise solution, there are two categories of applications.
a. Desktop
b. Web

For desktop applications, testing should take into account the UI, business logic, database, reports, roles and rights, integrity, usability and data flow. For web applications, along with all these major areas, testers should give sufficient importance to the performance, load and security of the application. So the AUT is either desktop software or a website.

Application Testing Tools

To the best of my knowledge, there are at least 50 testing tools available on the market today. These include both paid and open source tools. Some tools are purpose-specific, e.g. UI testing, functional testing, DB testing, load testing, performance testing, security testing and link validation testing. Other tools are more powerful and can test several major aspects of an application. The general notion of ‘Application Testing’ is functional testing, so our focus will be on functional testing tools.

Here is a list of some of the most important and fundamental features provided by almost all ‘Functional Testing’ tools.

a. Record and Play
b. Parametrize the Values
c. Script Editor
d. Run (the test or script, with debug and update modes)
e. Report of Run session
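
Two of these features, parametrized values and a report of the run session, can be illustrated with a very small Python sketch. This does not reproduce any particular tool; the calculator_add function and the case data are hypothetical.

```python
def calculator_add(a, b):
    """Hypothetical function under test."""
    return a + b


# Parametrized values: each tuple is (input a, input b, expected result).
CASES = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]


def run_parametrized(func, cases):
    """Run the same test logic over every data row and collect a run report."""
    report = []
    for *args, expected in cases:
        actual = func(*args)
        report.append({
            "args": tuple(args),
            "expected": expected,
            "actual": actual,
            "passed": actual == expected,
        })
    return report
```

The same recorded script is replayed once per data row, and the returned list plays the role of the session report, showing which inputs passed and which failed.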

Source: http://www.softwaretestinghelp.com/application-testing-%E2%80%93-into-the-basics-of-software-testing/
