Vengatc technology logs

SQL performance debugging – query execution plans in Oracle

Posted in Uncategorized by vengatc on March 26, 2013

EXPLAIN PLAN SET STATEMENT_ID = 'bad3' FOR
SELECT * FROM v_carrier_test WHERE c_id = 2;

-- Read the plan back from PLAN_TABLE as an indented tree with estimated row counts
SELECT cardinality "Rows",
       lpad(' ', level - 1) || operation || ' ' ||
       options || ' ' || object_name "Plan"
  FROM plan_table
CONNECT BY PRIOR id = parent_id
       AND PRIOR statement_id = statement_id
 START WITH id = 0
       AND statement_id = 'bad3'
 ORDER BY id;

What should Google do next with Google Voice to be a game changer?

Posted in Uncategorized by vengatc on January 10, 2012

Will update my ideas here… wait for it.

OPOA and RIA – What does it mean?

Posted in Architecture by vengatc on June 14, 2011

Recently there was a situation at my workplace where we needed to standardize on patterns for UI. OPOA and RIA were adopted as guidelines after my recommendation, but adopting these patterns literally does the development a disservice unless the team fully understands what these terminologies convey…

The following was my explanation; I am documenting it here as it may be of use to other people who hit the same road…

[Explanation]

What we are actually trying to do with terminologies like RIA and OPOA is to get away from the traditional perspective of web development. We are using these terminologies as guidelines to help us validate that we are doing the right thing.

The crux is that web development strategies and technologies have evolved, and hence the way we think about building interfaces should also change.

The early terminologies the industry used were thick and thin client. Desktop was considered thick because of its richness and the amount of business logic it carried. Thin clients were the ones with page-by-page flow, where we sacrificed the richness a desktop application would offer for the sake of distributability. Then the market wanted distributability along with the richness a desktop client could offer, and technologies like Flex/Flash and Silverlight were born.

 

But Flex and Flash were proprietary technologies, and their users were at the mercy of the platform implementers (Safari or iOS, for example). People wanted to stick with standards that would work across all hardware and operating systems. Hence effort was spent on achieving richness without sacrificing distributability, and frameworks like YUI (Yahoo User Interface) were a result. YUI was built on top of standardized JavaScript and HTML, which work on all browsers. ExtJS is a matured descendant of YUI; it helps us build desktop-like richness on the web without sacrificing distributability, and it stands on standards (HTML and JS).

 

ExtJS was built to help people develop RIAs (Rich Internet Applications, a.k.a. desktop-like applications in the browser) as opposed to traditional Web 1.0 pages. ExtJS becomes a terrible, counter-productive tool if we look at it from the perspective of Web 1.0 page flow…

 

Hence Rich Internet Application and One Page One Application are umbrella terms that push UI designers/engineers to think of web development in terms of desktop applications: have multiple functions and actions in a single page, and bring back the richness that desktop applications offered.

 

> It means we should have only one form (meaning one browser window) for the whole application. Venky, am I right?

So in OPOA, what we call an application is subjective. You may choose to call the entire IRIS4 one application, someone else can choose to call CnC alone an application, and someone else can choose to call a group of commands one application.

The more granular we go, the more we tend toward Web 1.0; at the other extreme we may lose the multitasking ability. So we need to strike a balance between the two, and where it falls is totally subjective.

So if we are able to do multiple things on a single page without a refresh, and the look and feel is close to a desktop application, we are abiding by RIA and OPOA.

 

If we need a guideline/rule to check our actions, we can ask this question every time we have a doubt: “Will a desktop application do it this way?”

So in this case of opening multiple windows: in that particular scenario, would a desktop application open a new window to show this functionality? If yes, then I think we are good.

 [/Explanation]

Fun with NP-Complete problems.

Posted in Performance by vengatc on May 22, 2011

Recently I had a discussion with my old college friends on Facebook; we were posting technical questions that would tickle our brains and help us re-live our college days…

The following was my question… I am posting it here as it might help readers of my blog understand one beautiful characteristic of NP-Complete problems.

Q: There is a set of problems such that if we find a solution to one of them, then we have solutions to all of them. What is the technical name for this problem class?

There were multiple replies to it; one interesting one was “design patterns”… Then the discussion moved on via a clue (TSP), and we arrived at NP-Complete. The discussion then went further into exponential Big-O time, etc. But I later sensed that the wow factor the question had was lost in our discussions… I did some study to bring back the wow factor the question had. The following was my reply…

[Reply]

The real WOW factor I had when I read these things in my college days was, “WHAT!!! If I solve (in computer terms, produce an efficient algorithm for) one of the problems in the problem class, I can solve all of them? How nerdy must the discoverer of such things be???” [I doubt I have this level of curiosity now :)]

These are the defining characteristics of NP-Complete problems:
1) There is no known algorithm that completes in polynomial time (in computational terms, no “good” solution, i.e. nothing on the order of N or log N).

2) If we can find a solution (a reasonable algorithm) for one of them, we can solve all of them. (The real YAHOO!!! is here!!!)

It means one NP-Complete problem can be transformed into another NP-Complete problem in polynomial time. Isn’t that great? That is what the theory claims.

It means if I solve one of the NP-Complete problems, I can solve all of them from the NP-Complete list… http://en.wikipedia.org/wiki/List_of_NP-complete_problems . In other words, if I solve the NP-Complete version of the Traveling Salesperson Problem, I can transform every other NP-Complete problem into the traveling salesperson problem and solve it in polynomial time.
i.e. suppose TSP becomes solvable in polynomial time. Then I have a solution (an efficient algorithm) for the NP-Complete Hamiltonian Cycle problem, which no one else has… because Hamiltonian Cycle solution = Transform2TSP(Hamiltonian Cycle) (polynomial time) + Solve TSP. Hence the NP-Complete Hamiltonian Cycle problem is solved in polynomial time… A toy sketch of this transformation follows.
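
To make the transformation concrete, here is a toy Java sketch of this classic construction (my own illustration, not anyone's library code): give every edge of the graph cost 1 and every non-edge cost 2. The graph then has a Hamiltonian cycle exactly when the optimal TSP tour over its n vertices costs n.

import java.util.Arrays;

// Classic Hamiltonian Cycle -> TSP reduction: edges cost 1, non-edges cost 2.
// A Hamiltonian cycle exists iff the cheapest TSP tour costs exactly n.
public class HamiltonToTsp {

    // adj[i][j] == true means the graph has an edge between vertices i and j
    static int[][] transformToTsp(boolean[][] adj) {
        int n = adj.length;
        int[][] cost = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    cost[i][j] = adj[i][j] ? 1 : 2; // the polynomial-time transformation
                }
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // The 4-cycle 0-1-2-3-0 has a Hamiltonian cycle.
        boolean[][] adj = {
            {false, true,  false, true },
            {true,  false, true,  false},
            {false, true,  false, true },
            {true,  false, true,  false},
        };
        int[][] tsp = transformToTsp(adj);
        System.out.println(Arrays.deepToString(tsp));
        // A TSP solver finding a tour of cost exactly 4 (= n) here certifies the cycle.
    }
}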

To state it practically… if the subset problem that Vels mentioned is NP-Complete, then it can be converted to the corresponding TSP version in polynomial time, and vice versa. (Yahoo!!!) And if I have a solution (again, for computer programmers a solution means a good algorithm) to the subset problem, I can solve TSP…

To make it more fun… see this picture 

My bet on Technology’s future

Posted in Uncategorized by vengatc on April 12, 2010

Telecom:
Voice plans will be optional; data plans will be what people pay for. ROI on circuit-switched telecom networks will go negative, while packet networks will see a sharp rise. All voice will be VoIP.
New telecom service providers will compete with traditional UMTS or CDMA service providers using cheap WiMAX plans. Mobile phones will run on WiMAX and will be purely data.

Operating systems and gadgets:

Android will be as ubiquitous on devices as Windows is on desktops. Apple’s iPhone OS will hold less than half of Android’s market share. Android will run on TV sets, home appliances, etc.; even cars will run Android. The user experience will be gadget-independent, meaning your mobile, TV, and car will all show chat notifications, event notifications, incoming calls, etc. Users will be identified not by number but by profile ID.
A user can travel anywhere in the world; after checking into a hotel, all he has to do is log in with his profile ID, and the TV, appliances, car, etc. will behave as personally as at his home.

Entertainment:
On-demand will be more common than today. Users will be able to save their preferred show schedules to their profiles and retrieve them on any TV after login.
Mobile will have content similar to TV. Cable will be a distant memory; TV will also run over IP. There will be internet channel providers, much like what Comcast does today over cable and circuit.

Service oriented architecture (SOA) Governance

Posted in Architecture by vengatc on August 22, 2009

SOA Governance

Service-oriented architecture (SOA), the latest advancement in software architecture and technology, has enabled architects to embrace the best of distributed computing and open service architecture. It allows disparate teams to operate independently and publish their software artifacts for others to reuse.

The ultimate business objective is achieved by the orchestration of independent artifacts talking to each other to achieve the overall system’s objective.

Governance

Downside of SOA

Though this orchestration between systems increases reuse and modularization, just like any other technology it has brought its own disadvantages with it.

This distributed nature allows the software to scale to an extent that it becomes unmanageable. Moreover, when software artifacts are highly distributed, the interconnections these artifacts have with each other become complex. As a system grows in the SOA paradigm, the coupling between independent systems also grows. The result is that systems which were originally intended to be independent become dependent on each other, a newly introduced problem.

The provider & consumer issue – The provider system, being an open architecture, allows unidentified consumers to consume its services. Even within the same organization, the provider cannot modify or discontinue a service until it is sure there are no consumers of the service it intends to change. An architecture that intends to advocate independence is actually hypocritical if it does not also have a way to govern the interconnections.

This brings in the need for a new area called SOA governance: a way to maintain a repository of the sharable assets that providers expose, together with the consumers of those assets. Governance should also give providers the ability to maintain consumption contracts with their consumers.

Governance is not achievable unless the needed process is introduced into the regular development lifecycle, and it should be supported with the tools needed to make it happen in the organization.

This is an exciting challenge that I am designing and driving to completion for my organization. Once it is fully embraced by the organization and its people, governance becomes an intangible asset, enabling the organization to fully realize the power of service-oriented architecture in software design.

Framework for Business Intelligence over RRD files

Posted in Architecture, framework, java, myideas by vengatc on October 31, 2008

None of Pentaho, Kettle, or Talend supports RRD as a datasource for Business Intelligence (BI). I have designed an Enterprise Information Integration framework/layer over multiple RRD datasources. This layer allows users of EII-RRD (the solution) to aggregate data across multiple RRD files and run BI functions (average/max/min) on them.


Problem statement:


I will attempt to give a brief description of the problem and the constraints to be considered before deciding on a solution for it.

A Round Robin Database (RRD) is a file-based database used to store time/value pairs. It is widely used in network management solutions that need to record values over time, e.g. network latency every 5 minutes. The database lets you store averages for various time intervals, so that when you update it every 5 minutes it automatically updates the hourly or daily averages. Pretty useful in the performance management portion of NMS solutions.
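
As a toy Java sketch of the round-robin idea (my own illustration, not the real RRDtool file format): a fixed number of 5-minute slots is reused forever, so storage never grows with time, and a consolidated hourly average can be computed over the stored samples.

import java.util.Arrays;

// Toy round-robin archive: twelve 5-minute slots (one hour), reused circularly.
class ToyRoundRobinArchive {
    private final double[] fiveMinSlots = new double[12]; // one hour of 5-min samples
    private int next = 0;   // slot to overwrite on the next update
    private int count = 0;  // how many slots hold real data so far

    void update(double value) {
        fiveMinSlots[next] = value;
        next = (next + 1) % fiveMinSlots.length; // wrap around: oldest is overwritten
        if (count < fiveMinSlots.length) count++;
    }

    // Consolidated value, akin to RRD's automatically maintained hourly average.
    double hourlyAverage() {
        return Arrays.stream(fiveMinSlots, 0, count).average().orElse(Double.NaN);
    }
}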

Now what is the problem? When you talk of Business Intelligence, it is a matter of aggregating data across multiple sources and trying to correlate it to obtain some kind of information useful for decision making or analysis.

So here comes the problem statement: you have RRD files recording a protocol’s response time for each IP of a particular network, and you have multiple networks like that. The BI task here is: given a time frame, grab the average response time of a particular protocol across all machines, in all networks.


Constraints to consider before designing.


The problem is daunting and computationally intensive when you consider the time and space complexity. The solution’s main focus is to address memory complexity; the second is time. Memory complexity is a must-solve, and time complexity should be reduced to the point where horizontal scalability can kick in when resources are limited.


Crux of the solution:


Memory – I used the virtual memory concept in the design here. Consider a user query asking for the average graph from 1970 to 2008 at every 5-minute interval: imagine the memory that would be allocated, i.e. one slot for each of the roughly four million 5-minute intervals between 1970 and 2008.


In my design, the processing unit reads the time/value pairs from the RRD files and hands them over to a Virtual Memory layer. This layer promises the processing unit that it has the memory to store all the data (similar to the way the VM in an OS does), but it allocates memory only if data is actually available for that time interval. Regardless of the user’s request, the data will be crowded around the 2008 time frame, so the effective memory use will be very low. This virtual-memory-like design (promise you have more, but do the work for less) is something new in my design catalogue. It brought the memory usage down from GBs to 1 or 2 MB.
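
A minimal Java sketch of that “promise more, allocate less” layer (hypothetical names, not the actual EII-RRD code): the store accepts any slot in the requested range, but backs it with a sparse map, so memory is spent only on slots that actually receive data.

import java.util.HashMap;
import java.util.Map;

// Advertises room for every 5-minute slot in [startEpoch, endEpoch],
// but only allocates backing storage for slots that are written to.
class SparseTimeSeries {
    private static final long STEP_SECONDS = 300; // 5-minute resolution
    private final long startEpoch;
    private final long endEpoch;
    private final Map<Long, Double> slots = new HashMap<>(); // lazily populated

    SparseTimeSeries(long startEpoch, long endEpoch) {
        this.startEpoch = startEpoch;
        this.endEpoch = endEpoch;
    }

    void put(long epochSeconds, double value) {
        if (epochSeconds < startEpoch || epochSeconds > endEpoch) return;
        slots.put(epochSeconds / STEP_SECONDS, value); // allocate only on write
    }

    Double get(long epochSeconds) {
        return slots.get(epochSeconds / STEP_SECONDS); // null = no data, no memory spent
    }
}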


This virtual-memory-style design really helped solve the memory problem; I loved it and will be using it in future designs.


The time complexity and other aspects of the problem are nothing interesting, as I solved them with my usual design experience; no new learnings there.


What’s in it for you?

Blog readers/architects: whenever you have a design that allocates a huge chunk of memory in proportion to the input (or to some unbounded system parameter), and you end up using only a portion of it to actually solve the problem because of the distribution characteristics of the input data, consider this virtual memory concept in your bouquet of design principles. It might help.


For people who are tired of searching for open source or proprietary solutions doing BI on RRD: if you wish to get more insight into my solution or want to discuss any aspect of it, contact vengateswaran.c@gmail.com. Only technical questions are encouraged.

Java Memory and Garbage Collection [GC] – Internals

Posted in Architecture, java, JVM, Performance by vengatc on October 14, 2008

Java 5 gives architects the means to scale an application memory-wise based on the characteristics of the application’s memory usage pattern.

Java Garbage collector basics

The default GC of Java is a serial collector, i.e. when Java decides to do a GC, your application threads are suspended until the GC thread finishes.

Implications 

On a single-processor machine this type of GC is fine, but on a multiprocessor machine it is a killer. Imagine you have JBoss or IBM WebSphere running a banking project; for sure there would be a high hardware investment with multiple processors (not less than a 12-processor machine). With this dedicated setup and serial collection, your application that ran on 12 processors stops, and only one processor is used for the GC activity. Your application is at a halt. So the throughput of the application is directly impacted by your GC, and it worsens as the processor count increases.

So it is a must to customize GC collection. But remember: until you understand the intricacies of the Java heap and GC, don’t meddle with the GC settings; leave them at the default, because a non-expert is more likely to hurt throughput than to improve it.

See the throughput distribution in the graph below.

Java garbage collection design

What would you do if you were given the chance to design the GC? If you have a serial algorithm that sweeps all the objects in memory and then deallocates the unreferenced ones, the Big-O of the algorithm you design is directly proportional to the number of objects in memory. So the time complexity of your algorithm worsens for larger systems.

How does Sun Microsystems get around this time complexity issue?

As far as memory consumption is concerned, research has identified that young objects have the highest probability of dying first. That means a recently created object is more likely to die before an object that has survived for a while. Current GC algorithms efficiently use this principle of memory usage to produce better Big-O numbers.

The entire Java heap is segregated into multiple segments to take advantage of this young-die-first fact.

The figure shows how the heap is segregated: the entire heap is separated into Young, Tenured, and Perm spaces.

The GC algorithm is split into minor and major runs.

A minor run does GC only in the Young space; a major run does GC on both the Young and Tenured spaces. The major run is thus the longest a GC could take, and we don’t want it to run often. To avoid major runs, the Java GC uses the young-die-first fact and collects the Young space first. If an object survives enough runs, it is moved to Tenured. When Tenured fills up, a major run is triggered. This means the major run is mostly avoided.

What this implies for architects?

By intelligently manipulating the Young and Tenured sizes, we can impact various characteristics of the application:

1. Frequency of GC runs.

2. Time taken for the GC to complete its run.

3. Throughput of the application.

I’m not going to explain why each is impacted; readers are expected to understand the relationship at this point in the tutorial.

Java 5 provides you the ability to manipulate the relative sizes of the memory segments.
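
For illustration, generation sizing on a Java 5 HotSpot VM looks like this (these are standard HotSpot flags of that era; the values are made up, not a recommendation):

# Fixed 512 MB heap; tenured-to-young ratio of 3; eden-to-survivor ratio of 6
java -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=6 -jar myapp.jar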

What next?

Yes, I agree the throughput problem of the serial collector is still open. Java 5 lets us tackle this by providing two alternative GC algorithms to the traditional serial collector:

1. Throughput collector

2.  Concurrent Low Pause Collector

I will attempt to give a short description of the above collectors.

1. Throughput collector

        The throughput collector is a generational collector similar to the serial collector but with multiple threads used to do the minor collection. The major collections are essentially the same as with the serial collector. By default on a host with N CPUs, the throughput collector uses N garbage collector threads in the minor collection. The number of garbage collector threads can be controlled with a command line option (see below). On a host with 1 CPU the throughput collector will likely not perform as well as the serial collector because of the additional overhead for the parallel execution (e.g., synchronization costs). On a host with 2 CPUs the throughput collector generally performs as well as the serial garbage collector and a reduction in the minor garbage collector pause times can be expected on hosts with more than 2 CPUs.

2. Concurrent Low Pause Collector

The concurrent low pause collector is a generational collector similar to the serial collector. The tenured generation is collected concurrently with this collector. This means the pause in the application is close to nil.
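
Selecting one of these collectors on a Java 5 HotSpot VM is a single flag (the flag names are from the tuning guide referenced below; the thread count is an illustrative value):

# Throughput (parallel) collector with an explicit GC thread count
java -XX:+UseParallelGC -XX:ParallelGCThreads=4 -jar myapp.jar

# Concurrent low pause collector
java -XX:+UseConcMarkSweepGC -jar myapp.jar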

 

 

Aspiring memory manipulators :)

To start with, just observe the memory consumption of your software system:

java -verbose:gc -jar xyz.jar

[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

Each line reads as occupancy before the collection -> occupancy after it (total heap size), followed by the pause time; “Full GC” marks a major run.

Conclusion

Gather enough understanding of the GC behaviour of your system on its hardware. Just remember: a setup that is best on a single processor will be a pain on a multiprocessor, and a setup that is good on a multiprocessor will be a killer on a single processor. Efficiency also differs with the application’s characteristics. So leave the settings at their defaults until you are comfortable with the details.

So GC tuning by architects is an always-on task throughout the lifecycle of the project. And it requires practice.

Reference: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

As always, only technical queries regarding GC tuning are accepted at vengateswaran.c@gmail.com.

Framework – Enterprise Information Integration (EII) that I developed

Posted in Uncategorized by vengatc on October 10, 2008

Again, a challenge in my current job: here I have a sandwiched role of architect and senior engineer. I have no scope for giving excuses; I have to crack all the problems that come in the way of achieving what we want.

Requirement

Here we are building a datacenter-based project where all the customers’ internal network information will reside in our company’s datacenter. Each customer will have his own database instance in the datacenter. Yeah, I hear what you say… having one database instance per customer is a killer… but we had to choose this to cut down a lot of development effort. So money vs. design heuristics??? Money wins.

Problem.

Everything goes well with separate database instances until you need to aggregate information across multiple databases. We wanted to do BI on the available data. But considering the innumerable databases we have in the datacenter, we needed a way to run a query across databases – I mean a means to run the same query across all of them. I evaluated ETL tools like Kettle, Talend, etc. I also tried to hack MySQL to add a data aggregation layer the easy way… but nothing helped; none of the technologies I explored was a crisp, clean fit for my requirement.

In short: I need a tool that takes my query, aggregates data from multiple databases (where the number of datasources keeps increasing at runtime), and still presents a unified, facade-like simple interface.

Solution

Again, as usual… I never give up an opportunity to write an innovative framework that solves a business problem. I chose to write the EII (real-time data aggregation framework) from scratch.
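
To give a flavour of the facade idea (a minimal Java sketch with hypothetical names, not the actual framework code): fire the same SQL against every customer database and merge the rows, so callers see one simple interface.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Unified facade: the same query is fanned out to every registered datasource.
class MultiDbQueryFacade {
    private final List<String> jdbcUrls; // can grow at runtime as customers are added

    MultiDbQueryFacade(List<String> jdbcUrls) {
        this.jdbcUrls = jdbcUrls;
    }

    List<Object> queryAll(String sql, String user, String password) throws Exception {
        List<Object> merged = new ArrayList<>();
        for (String url : jdbcUrls) {
            try (Connection con = DriverManager.getConnection(url, user, password);
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    merged.add(rs.getObject(1)); // first column only, for brevity
                }
            }
        }
        return merged; // BI aggregation (avg/max/min) can then run over this list
    }
}

A real implementation would push the avg/max/min down into each per-database query and combine the partial results, rather than hauling raw rows around.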

(More info on this work will be updated… got an important piece of work to attend to…)

Framework – To do realtime data replication for postgres across firewall

Posted in Architecture, database, java, myideas, replication by vengatc on October 6, 2008

I just wanted to write about a recent framework I developed for database replication with Postgres. For readers, it may give you an idea to think in this direction if you come across the same problem.

Requirement

We had a requirement to replicate data in real time from Postgres databases on multiple machines residing inside a firewall to a cloud server on the internet. We evaluated various technologies and tools available on the market; every solution we came across required us to open up a port in the firewall, and most were not real time. Most of the tools we saw were ETL-style tools, where you take the data in a batch and replicate it; moreover, they do not work across a firewall. I was architecting this product and had to come up with a solution no matter what. I opted to write my own framework.

I’m a strong believer in building the solution in my mind/on paper before writing the code. So I had to develop a replication system that would run on various machines and replicate data to the central server.

I’m not going to describe the thought process behind each design decision I took, but I will describe the end result.

Step 1. I cracked open the JDBC library of Postgres: I took the source code from the Postgres open source repository and read through the code flow of the JDBC driver.

The static statement vs. prepared statement issue.

A Java program uses the JDBC library to construct either a static SQL statement or a prepared statement. When it is a static SQL query, you have the query in hand. But when it is a prepared statement, it is actually inside the JDBC driver code that the final query is assembled before being sent on to Postgres.

I figured out the place where the complete query leaves the JDBC driver for the database. There I added a queue to sniff all the queries that leave the system.

For technical queries regarding sniffing the query from the driver, write to vengateswaran.c@gmail.com.

Step 2. Now that I have a queue of sniffed queries, I have to ship them to the server on the other side of the firewall. Web services come to the rescue here: I published a web service at the server to accept a query plus a client identifier, create the connection, and issue the query against the server database.

Step 3. I wrote an engine that takes the queued queries at the client end and ships them to the server across the firewall through the web service, and the server end of the web service fires the same query on the server database. A minimal sketch of this client side is below.
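
Here is that client-side engine as a minimal Java sketch (hypothetical names; the real integration point is inside the patched Postgres JDBC driver, and the endpoint wraps the web-service call):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sniffed SQL statements are queued by the driver hook; a worker thread
// ships each one across the firewall, where it is replayed on the server DB.
class QueryShipper implements Runnable {
    private final BlockingQueue<String> sniffedQueries = new LinkedBlockingQueue<>();
    private final ReplicationEndpoint endpoint; // wraps the web-service call
    private final String clientId;

    QueryShipper(ReplicationEndpoint endpoint, String clientId) {
        this.endpoint = endpoint;
        this.clientId = clientId;
    }

    // Called from the patched driver at the point where the final SQL
    // leaves for the native protocol layer.
    void onQuery(String sql) {
        sniffedQueries.offer(sql);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                String sql = sniffedQueries.take();  // blocks until work arrives
                endpoint.replicate(clientId, sql);   // outbound call across the firewall
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    interface ReplicationEndpoint {
        void replicate(String clientId, String sql);
    }
}

Because the client initiates the outbound web-service call, no inbound port has to be opened in the firewall.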

Multiple client (master) Postgres databases were able to replicate real-time data to a single database cluster on the server.

Very high level design.

After rigorous design, implementation, and performance testing, the framework I designed and implemented efficiently replicates databases from multiple machines into the cloud server across the firewall. It really scales up well… which makes me happy.

Feel free to contact me [vengateswaran.c@gmail.com] if you need more insight into the technical aspects of the framework. Only technical queries invited.
