Vengatc technology logs

Service oriented architecture (SOA) Governance

Posted in Architecture by vengatc on August 22, 2009

SOA Governance

Service oriented architecture (SOA), the latest advancement in software architecture and technology, enabled architects to embrace the best of distributed computing and open service architecture. It allowed disparate teams to operate independently and publish their software artifacts for others to reuse.

The ultimate business objective is achieved by the orchestration of independent artifacts talking to each other to achieve the overall system's objective.

Governance

Disadvantages of SOA Governance

Though this orchestration between systems increases reuse and modularisation, just like any other technology it has brought its own disadvantages with it.

This distributed nature allows the software to scale to an extent that it becomes unmanageable. Moreover, when the software artifacts are highly distributed, the interconnections these artifacts have with each other become complex. As the system grows in the SOA paradigm, the coupling between independent systems also grows. The result is that systems which were originally intended to be independent become dependent on each other because of a new problem that is introduced.

The provider and consumer issue: because the provider system is an open architecture, it allows unidentified consumers to consume its services. Even within the same organization, the provider cannot modify or discontinue a service until it is sure that there are no consumers of the service it intends to modify. An architecture which intends to advocate independence is actually hypocritical if it does not also have a way to govern the interconnections.

This brings in the need for a new area called SOA Governance: a way to maintain a repository of the sharable assets that providers expose and of the consumers of those assets. Governance should also give the provider the ability to maintain a contract for consumption with the consumers.

Governance is not achievable unless the needed process is introduced into the regular development process. It should also be supported with the tools needed to make it happen in the organization.

This is an exciting challenge that I am designing and driving for my organization. Once it is fully embraced by the organization and its people, governance becomes an intangible asset, enabling the organization to fully realize the power of service oriented architecture in software design.
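As a rough illustration of the repository idea, here is a minimal Java sketch, assuming a simple in-memory registry. The class and method names (ServiceRegistry, registerConsumer, canRetire) are hypothetical and do not come from any governance product; a real repository would also persist the contracts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory governance registry: tracks which consumers
// depend on which published service.
final class ServiceRegistry {
    private final Map<String, Set<String>> consumersByService = new HashMap<>();

    // A provider publishes a service so that it becomes visible to others.
    void publish(String service) {
        if (!consumersByService.containsKey(service)) {
            consumersByService.put(service, new HashSet<String>());
        }
    }

    // A consumer must register before using a service; unknown services are rejected.
    void registerConsumer(String service, String consumer) {
        Set<String> consumers = consumersByService.get(service);
        if (consumers == null) {
            throw new IllegalArgumentException("Unknown service: " + service);
        }
        consumers.add(consumer);
    }

    void unregisterConsumer(String service, String consumer) {
        Set<String> consumers = consumersByService.get(service);
        if (consumers != null) {
            consumers.remove(consumer);
        }
    }

    Set<String> consumersOf(String service) {
        Set<String> consumers = consumersByService.get(service);
        if (consumers == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(consumers);
    }

    // The provider can modify or discontinue a service only when nobody depends on it.
    boolean canRetire(String service) {
        return consumersOf(service).isEmpty();
    }
}
```

With something like this in place, the provider no longer has to guess who consumes a service: it asks the registry before changing anything.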

Framework for Business Intelligence over RRD files

Posted in Architecture, framework, java, myideas by vengatc on October 31, 2008

None of Pentaho, Kettle or Talend supports RRD as a data source for Business Intelligence (BI). I have designed an Enterprise Information Integration (EII) framework/layer over multiple RRD data sources. This layer allows users of EII-RRD (the solution) to aggregate data across multiple RRD files and run BI functions (average/max/min) on them.


Problem statement:


I will attempt to give a brief description of the problem and the constraints to be considered before deciding on a solution for it.

Round Robin Database (RRD) is a file-based database used to store time and value pairs. It is widely used in network management solutions (NMS), which need to record values over time, e.g. network latency every 5 min. The database allows you to store averages for various time intervals, so that when you update it every 5 min it automatically updates the hourly or daily averages. This is pretty useful in the performance management portion of NMS solutions.

Now what is the problem? When you talk of Business Intelligence, it is a matter of aggregating data across multiple sources and correlating it to obtain information that is useful for decision making or analysis.

So here comes the problem statement: you will have RRD files pertaining to a protocol’s response time for an IP of a particular network, and you will have multiple networks like that. The BI task here is: given a time frame, grab the average response time of a particular protocol across all machines in all networks.


Constraints to consider before designing.


The problem becomes daunting and computationally intensive when you consider the time and space complexity. The solution’s main focus is memory complexity; the second is time. Memory complexity is a must-solve, and time complexity should be reduced to the point where horizontal scalability can kick in when resources are limited.


Crux of the solution:


Memory: I used the virtual memory concept in the design here. Consider a user query for the average graph from 1970 to 2008 at every 5 min interval, and imagine the memory that would be allocated, i.e. one slot for every 5 min interval between 1970 and 2008 (roughly 4 million slots per series).


In my design the processing unit reads the time/value pairs from the RRD files and hands them over to a Virtual Memory layer. This Virtual Memory layer promises the processing unit that it has the memory to store all the data (similar to the way virtual memory in an OS does), but it allocates memory only if data is available for that time interval. Irrespective of the user’s request, the data is sure to be crowded around the 2008 time frame, so the effective memory use is very low. This virtual-memory-like design (promise you have more, but do the work for less) is something new in my design catalogue. It brought memory usage down from GBs to 1 or 2 MB.


This virtual memory style of design really helped solve the memory problem; I loved it and will be using it in my future designs.


Time complexity and the other aspects of the problem are nothing interesting, as I solved them with my usual design experience; no new learnings there.
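To make the idea concrete, here is a minimal Java sketch of such a layer, assuming a sparse map keyed by interval number. It is not the actual framework code; the class name SparseTimeSeries and its methods are made up for illustration, and timestamps are assumed to be non-negative epoch seconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the "virtual memory" idea: the store accepts any timestamp in the
// requested range, but allocates a slot only when data actually arrives for it.
final class SparseTimeSeries {
    private final long stepSeconds;
    // interval number -> {sum, count}; only intervals that received data exist here
    private final Map<Long, double[]> slots = new TreeMap<>();

    SparseTimeSeries(long stepSeconds) {
        this.stepSeconds = stepSeconds;
    }

    // Called for every time/value pair read from any RRD file.
    void add(long epochSeconds, double value) {
        long slot = epochSeconds / stepSeconds;
        double[] acc = slots.get(slot);
        if (acc == null) {
            acc = new double[2];
            slots.put(slot, acc);
        }
        acc[0] += value;
        acc[1] += 1;
    }

    // Average over all values stored for the interval containing the timestamp,
    // or NaN when nothing was stored for it.
    double average(long epochSeconds) {
        double[] acc = slots.get(epochSeconds / stepSeconds);
        return acc == null ? Double.NaN : acc[0] / acc[1];
    }

    // Memory actually in use, measured in slots.
    int allocatedSlots() {
        return slots.size();
    }
}
```

A query for 1970 to 2008 still works against this store, but only the intervals that actually hold data cost memory.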


What's in it for you?

Blog readers/architects: whenever you have a design which allocates a huge chunk of memory in proportion to the input (or to some system parameter which has no bounds), and you end up using only a portion of it to actually solve the problem because of the distribution characteristics of the input data, consider this virtual memory concept for your bouquet of design principles. It might help.


For people who are tired of looking for open source or proprietary solutions that do BI on RRD: if you wish to get more insight into my solution or want to discuss any aspect of it, contact vengateswaran.c@gmail.com. Only technical questions encouraged.

Java Memory and Garbage Collection [GC] – Internals

Posted in Architecture, JVM, Performance, java by vengatc on October 14, 2008

Java 5 gives architects the ability to scale applications memory-wise, based on the characteristics of the application’s memory usage pattern.

Java Garbage collector basics

The default GC of Java is a serial collector, i.e. when Java decides to do GC, your application threads are suspended until the GC thread finishes.

Implications

On a single-processor machine this type of GC is good, but on a multi-processor machine it is a killer. Imagine you have your JBoss or IBM WS running a banking project; for sure there would be a high hardware investment with multiple processors (not less than a 12-processor machine). With this dedicated setup and serial collection, your application that ran on 12 processors stops, and only one processor is used for the GC activity. Your application is at a halt. So the throughput of the application is directly impacted by your GC, and it worsens as the number of processors increases.

So it is a must to customize the GC. But remember: until you understand the intricacies of the Java heap and GC, don't meddle with the GC settings. Leave them at the defaults, because a non-expert is more likely to spoil the throughput than to improve it.

See the throughput distribution in the graph below.

Java garbage collection design

What would you do if you were given a chance to design the GC? If you have a serial algorithm that sweeps through all the objects in memory and then deallocates the unreferenced objects, then the running time (Big O) of the algorithm you design is directly proportional to the number of objects in memory. So the time complexity of the algorithm you design worsens for larger systems.

How does Sun Microsystems get around this time complexity issue?

As far as memory consumption is concerned, research has identified that young objects have the highest probability of dying first. That means a recently created object is more likely to die than an object that has survived for a while. Current GC algorithms use this principle of memory usage to produce better Big O numbers.

The entire Java heap is segregated into multiple segments to take advantage of this young-die-first fact.

The figure shows how the heap is segregated: the entire heap is separated into Young, Tenured and Perm spaces.

The GC algorithm is split into minor and major runs.

A minor run does GC only in the Young space, and a major run does GC on both the Young and Tenured spaces. That is, a major run is the maximum time a GC could take, and we don't want it to run that often. To avoid doing major runs, the Java GC uses the young-die-first fact and runs GC on the Young space. If an object survives enough minor runs, it is moved to Tenured. When Tenured fills up, a major run is triggered. This means major runs are mostly avoided.
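One way to see minor and major runs in a live JVM is through the standard java.lang.management API, which typically exposes one bean per collector (usually one for the Young space and one for the Tenured space). The sketch below is only an illustration; the class name GcRunReport is made up, and the collector names and counts it prints depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcRunReport {
    // Builds one line per collector exposed by the running JVM.
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": runs=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create short-lived garbage so that the Young space collector has work to do.
        for (int i = 0; i < 200000; i++) {
            byte[] shortLived = new byte[1024];
        }
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

On a typical run the Young space collector reports many more runs than the Tenured one, which is the behaviour described above.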

What this implies for architects?

By intelligently manipulating the Young and Tenured sizes we can impact various characteristics of the application:

1. Frequency of GC runs.

2. Time taken for the GC to complete its run.

3. Throughput of the application.

I'm not going to explain why they are impacted; readers are expected to understand the relation at this point of the tutorial.

Java 5 provides the ability to manipulate the relative sizes of the memory segments (for example with the -XX:NewRatio, -XX:NewSize, -XX:MaxNewSize and -XX:SurvivorRatio options).

What next?

Yes, I agree the throughput problem of the serial collector is still open. Java 5 allows us to tackle this by providing 2 alternative GC algorithms to the traditional serial collector:

1. Throughput collector

2.  Concurrent Low Pause Collector

I will attempt to give a short description of the above collectors.

1. Throughput collector

The throughput collector is a generational collector similar to the serial collector, but with multiple threads used to do the minor collection. The major collections are essentially the same as with the serial collector. By default on a host with N CPUs, the throughput collector uses N garbage collector threads in the minor collection. The number of garbage collector threads can be controlled with a command line option (-XX:ParallelGCThreads=<desired number>). On a host with 1 CPU the throughput collector will likely not perform as well as the serial collector because of the additional overhead for the parallel execution (e.g., synchronization costs). On a host with 2 CPUs the throughput collector generally performs as well as the serial garbage collector, and a reduction in the minor garbage collector pause times can be expected on hosts with more than 2 CPUs.

2. Concurrent Low Pause Collector

The concurrent low pause collector is a generational collector similar to the serial collector. With this collector the tenured generation is collected concurrently with the application. This means the application is paused only for short periods during a collection, so the pauses are far shorter than with the serial collector.

 

 

Aspiring memory manipulators :)

To start with, just observe the memory consumption of your software system.

java -verbose:gc -jar xyz.jar

[GC 325407K->83000K(776768K), 0.2300771 secs] 
[GC 325816K->83372K(776768K), 0.2454258 secs] 
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
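Each line reads as: heap used before the collection -> heap used after it (total heap size), followed by the pause time. To collect these numbers over a long run, a small parser helps. The Java sketch below is only an illustration: GcLogLine is a made-up name, and the pattern covers only the line format shown above, which varies between JVM versions and options.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogLine {
    // Matches lines such as "[GC 325407K->83000K(776768K), 0.2300771 secs]".
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    final boolean full;     // true for a major ("Full GC") run
    final long beforeKb;    // heap in use before the run
    final long afterKb;     // heap in use after the run
    final long heapKb;      // total heap size
    final double seconds;   // pause time

    private GcLogLine(boolean full, long beforeKb, long afterKb, long heapKb, double seconds) {
        this.full = full;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
        this.seconds = seconds;
    }

    // Returns null when the line is not in the expected format.
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        return new GcLogLine(
                m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    long reclaimedKb() {
        return beforeKb - afterKb;
    }
}
```

For the first line above, the minor run reclaimed 325407K - 83000K = 242407K in about 0.23 seconds, while the full run at the end took about 1.85 seconds.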

Conclusion

Gather enough understanding of the GC behaviour of your system against the hardware. Just remember that a configuration that is best on a single processor will be a pain on a multiprocessor, and one that is good on a multiprocessor will be a killer on a single processor. The efficiency also differs with the application's characteristics. So leave it at the default until you are comfortable with the details.

So GC tuning by architects is an ongoing task throughout the lifecycle of the project, and it requires practice.

Reference: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

As always, only technical queries regarding GC tuning are accepted at vengateswaran.c@gmail.com

Framework – Enterprise Information Integration (EII) that I developed

Posted in Uncategorized by vengatc on October 10, 2008

Again a challenge in my current job: here I have a sandwiched role of an architect and a senior engineer. I have no scope for giving excuses; I have to crack all the problems that come in the way of achieving what we want.

Requirement

Here we are building a datacenter-based project where all of the customers' internal network information will reside in our company datacenter. So each customer will have his own database instance in the datacenter. Yeah, I hear what you say… having so many database instances is a killer… but we had to choose this to cut down a lot of development effort. So money vs. design heuristics? Money wins.

Problem.

Everything goes well with separate database instances as long as there is no need to aggregate information across multiple databases. We wanted to do BI on the available data. But considering the innumerable databases we have in the datacenter, we need a way to run a query across databases; I mean a means to run the same query across all of them. I evaluated ETL tools like Kettle, Talend etc. I also tried to hack MySQL to get a data aggregation layer the easiest way. But nothing helped; none of the technologies I explored was a crisp and clear fit for my requirement.

In short, I need a tool that takes my query, aggregates data from multiple databases (the number of data sources will keep increasing at runtime), and still has a unified, facade-like simple interface.
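The shape of such a facade can be sketched in a few lines of Java. This is not the framework itself, only a minimal illustration under stated assumptions: FederatedQuery is a made-up name, and each data source is modelled as a plain function from a query string to rows, where a real implementation would wrap a JDBC connection per customer database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

// Hypothetical facade: fans one query out to every registered source
// and merges the rows into a single result.
final class FederatedQuery<R> {
    // Thread-safe list, so sources can be added while queries are running.
    private final List<Function<String, List<R>>> sources = new CopyOnWriteArrayList<>();

    // New data sources can be registered at runtime.
    void addSource(Function<String, List<R>> source) {
        sources.add(source);
    }

    // Runs the same query against every source, in registration order.
    List<R> run(String query) {
        List<R> merged = new ArrayList<>();
        for (Function<String, List<R>> source : sources) {
            merged.addAll(source.apply(query));
        }
        return merged;
    }
}
```

The caller sees one run method regardless of how many databases sit behind it; aggregation functions (average, max, min) would then be applied on the merged rows.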

Solution

Again, as usual… I never give up any opportunity to write innovative frameworks that solve business problems. I chose to write the EII (real-time data aggregation framework) from scratch.

(More info on this work… will be updated… got some important work to attend to…)

Framework – To do realtime data replication for postgres across firewall

Posted in Architecture, database, java, myideas, replication by vengatc on October 6, 2008

Just wanted to write about a recent framework I developed for database replication with Postgres. For readers, it should give you an idea to think in this direction if you come across this problem.

Requirement

We had a requirement to replicate data in real time from Postgres databases on multiple machines which reside inside a firewall to a cloud server on the internet. We evaluated various technologies and tools available on the market; every solution we came across required us to open up a port in the firewall, and most of them are not real time. Most of the tools that we saw on the market were ETL-style tools where you take the data in a batch and replicate it; moreover, they will not work across a firewall. I was architecting this product, and I had to come up with a solution no matter what. I opted to write my own framework.

I'm a strong believer in building the solution in mind/on paper before writing the code. So I had to develop a replication system that would run on various machines and replicate data to the central server.

I'm not going to mention the thought process that I put into each design decision I have taken; I'm going to describe the end result.

Step 1. I cracked open the JDBC library of Postgres: I took the source code from the Postgres open source repository and read the code flow of the JDBC driver.

Static statement vs. prepared statement issue.

A Java program uses the JDBC library to construct a static SQL statement or a prepared statement. When it is a static SQL query, you have the query in hand. But when it is a prepared statement, it is actually inside the JDBC driver code that the actual query is prepared before it is sent on to Postgres.

I figured out the place where the entire query leaves the JDBC driver for the database. There I added a queue to sniff all the queries that leave the system.

For technical queries regarding sniffing the query from the driver, write to vengateswaran.c@gmail.com

Step 2. Now that I have a queue of the sniffed queries, I have to ship them across to the server, which is on the other side of the firewall. So a web service comes to the rescue here. I published a web service at the server to accept the query and the client identifier, create the connection, and issue the query to the database.

Step 3. I have written an engine that takes the queued queries at the client end and ships them to the server across the firewall through the web service. The server end of the web service then fires the same query on the server-side database.
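The client side of steps 1 to 3 can be sketched in Java as a capture queue plus a shipping loop. This is a simplified illustration, not the framework's code: QueryReplicator and its Transport interface are hypothetical stand-ins for the hook inside the driver and for the web service client, and a single shipping thread is assumed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class QueryReplicator {
    // Hypothetical stand-in for the web service call made from inside the firewall.
    interface Transport {
        void send(String clientId, String sql);
    }

    private final BlockingQueue<String> captured = new LinkedBlockingQueue<>();
    private final String clientId;
    private final Transport transport;

    QueryReplicator(String clientId, Transport transport) {
        this.clientId = clientId;
        this.transport = transport;
    }

    // Step 1: called at the point where the fully built SQL leaves the driver.
    void capture(String sql) {
        captured.add(sql);
    }

    // Step 3: ships the queued queries in order and returns how many were sent.
    // Assumes one shipping thread. A query is removed only after send returns,
    // so if send throws, that query stays queued for a later retry.
    int shipPending() {
        int shipped = 0;
        String sql;
        while ((sql = captured.peek()) != null) {
            transport.send(clientId, sql);
            captured.remove();
            shipped++;
        }
        return shipped;
    }
}
```

Because the client always initiates the call, no inbound port has to be opened in the firewall, and preserving the queue order keeps the server replaying statements in the order the client issued them.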

Multiple client (Master) postgres databases were able to replicate real time data to a single database cluster on the server.

Very high level design.

After rigorous design, implementation and performance testing, the framework that I designed and implemented efficiently replicates databases from multiple machines into the cloud server across the firewall. It really scales up well… to make me happy.

Feel free to contact me [vengateswaran.c@gmail.com] if you need more insight into the technical aspects of the framework. Only technical queries invited.


JVM instrumentation – Performance tuning

Posted in Architecture, JVM, Performance, java by vengatc on October 3, 2008

Performance monitoring

Sun has done excellent work in integrating remote management into the JVM. They have built an SNMP-agent-like capability into the JVM. The MBeans are analogous to SNMP OIDs, and the MBean server is analogous to the SNMP MIB.

This beautiful capability allows you to connect to the JVM and monitor the crimes we have committed in the code with regard to runtime memory, CPU and thread utilization. I became a lover of this feature, and I had always wanted to build this capability into the applications we build. This feature is real bliss for architects.

Head on Jump

To get first-hand experience with JVM instrumentation, just follow these simple steps…

1. Have Java 5.0 installed on your system and create a Java program that runs until you forcefully stop it. If you have a framework with thread pools, resource management etc., it is a good example.

2. Run your Java program with this additional parameter.

java -Dcom.sun.management.jmxremote -jar xyz.jar

This command publishes the MBean server in the JVM as an RMI resource for JConsole to connect to.

3. Start JConsole and connect to the program in the connection dialog box.

JConsole will allow you to monitor the memory and threads used, identify deadlocks, etc.

See the performance scaling for an Enterprise Information Integration framework that I recently wrote. This is a heavy ETL and EII kind of tool that aggregates data from multiple databases. You would definitely need some performance numbers for its production run. The JConsole tool helps you prove the robustness of your framework by showing the memory and thread occupancy and how controlled they are amidst heavy load on the framework.
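The same platform MBeans that JConsole displays can also be read from inside the program through the standard java.lang.management API. The Java sketch below is a minimal illustration (JvmSnapshot is a made-up name) that reports heap usage, live thread count and monitor deadlocks for the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

final class JvmSnapshot {
    // Reads the platform MBeans of the current JVM and formats a one-line summary.
    static String take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked on object monitors.
        long[] deadlocked = threads.findMonitorDeadlockedThreads();
        return "heapUsedKb=" + (heap.getUsed() / 1024)
                + " heapCommittedKb=" + (heap.getCommitted() / 1024)
                + " liveThreads=" + threads.getThreadCount()
                + " deadlockedThreads=" + (deadlocked == null ? 0 : deadlocked.length);
    }

    public static void main(String[] args) {
        System.out.println(take());
    }
}
```

Logging such a snapshot periodically gives a rough history of memory and thread occupancy even when no JConsole session is attached.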


Flex and Cairngorm Architecture

Posted in Architecture, Flex, java by vengatc on October 3, 2008

Developing RIA with Flex fun..

What and why is RIA?

Like the fashion industry, technology also swings back and forth. Initially, when client-server architecture was introduced, the industry started moving in the direction of developing desktop-based rich applications which talk to the server. Later the architecture matured and the industry wanted many clients to access the application, so it ridiculously called the desktop application a thick client and started moving in the direction of thin clients like HTML and other dynamic web technologies. Now the industry has again started missing the richness of the traditional desktop-based thick client, and it has also got bored with the HTML-based request/response model. This gave birth to RIA… Flash and Silverlight.

Their aim is to develop a thin-thick client that can run in the browser and still hide all the boring request/response stuff that web clients were experiencing.

Flex was the forerunner in this market.

Flex development nightmares?

Developing a sample Flex program with Flex Builder is an easy-going task. But the real nightmare kicks in when the application grows big, when you want to build a real production system out of it. The front-end MXML becomes really messy. You end up with a single monolithic MXML file in which you keep writing ActionScript and MXML. You can separate the ActionScript into separate files and split the MXML into multiple files, but you will hit a point where you cry for a framework.

Cairngorm would be your rescue.

What is Cairngorm?

Cairngorm tries to bring traditional software engineering best practices into Flex application development.

It is a way to fit the following kinds of patterns into your Flex development:

1. Business Delegate

2. Service Locator

3. Command

4. Model Locator

5. Front Controller

The time to adopt the Cairngorm framework is a bit more, but once it is done and we start fitting in the pieces, it really pays off.
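For readers who know these patterns from the Java side, here is how the pieces relate to each other, sketched in Java rather than ActionScript. This is not Cairngorm's API; all class names are made up for illustration, and the Service Locator is left out for brevity (the delegate receives its service directly).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CairngormSketch {
    // Model Locator: the single place where views find application state.
    static final class ModelLocator {
        static final ModelLocator INSTANCE = new ModelLocator();
        String greeting = "";
        private ModelLocator() { }
    }

    // Business Delegate: hides how the remote service is reached
    // (here the service is just a function).
    static final class GreetingDelegate {
        private final Function<String, String> service;
        GreetingDelegate(Function<String, String> service) {
            this.service = service;
        }
        String greet(String name) {
            return service.apply(name);
        }
    }

    // Command: one unit of work triggered by a user gesture.
    interface Command {
        void execute(String payload);
    }

    // Front Controller: maps event names to commands.
    static final class FrontController {
        private final Map<String, Command> commands = new HashMap<>();

        void addCommand(String event, Command command) {
            commands.put(event, command);
        }

        void dispatch(String event, String payload) {
            Command command = commands.get(event);
            if (command == null) {
                throw new IllegalArgumentException("No command for event: " + event);
            }
            command.execute(payload);
        }
    }
}
```

The flow is: a view dispatches an event, the front controller picks the command, the command calls the delegate, and the result lands in the model locator, from which the views read.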

The following picture gives the architecture of a Cairngorm-based Flex application.

For more details on Cairngorm, follow some of my favourite links below.

http://www.cairngormdocs.org/cairngormDiagram/index.html

http://www.onjava.com/pub/a/onjava/2003/02/26/flash_remoting.html?CMP=AFC-ak_article&ATT=Flash+Remoting+for+J2EE+Developers

http://sujitreddyg.wordpress.com/2008/01/14/invoking-java-methods-from-adobe-flex/

http://livedocs.adobe.com/blazeds/1/blazeds_devguide/blazeds_devguide.pdf

http://sujitreddyg.wordpress.com/2008/05/16/session-data-management-in-flex-remoting/

http://www.brucephillips.name/blog/index.cfm/2008/6/23/Using-BlazeDS-to-Send-UserDefined-Data-Types-Data-Tranfer-Objects-from-Java-to-Flex

http://renaun.com/blog/2006/07/04/55/

http://www.adobe.com/devnet/flex/articles/cairngorm_pt4_05.html

http://examples.adobe.com/flex3/componentexplorer/explorer.html

Experience with chrome

Posted in reviews by vengatc on September 30, 2008

Like most people, I'm also an ardent fan of Google and its tools. When Chrome was released I tried it on the first day.

My first impression.

1. Great screen space.

2. Super look and feel.

3. Infrequently used icons are eliminated, so they don't take space away from the web page.

I would rate it top in ergonomics compared to other browsers.

Performance.

1. It sucks in CPU usage. I'm disappointed to see the browser take the entire CPU resource. How could Google program the browser this way? Couldn't it manage the CPU, the most important resource, before releasing? We know that most of the time Firefox hogs the CPU, and so does IE. When Google released Chrome, it should have taken care of the CPU overload issue in its programming.

I'm back to Firefox, and I wish Chrome resolves its CPU hog issue in the next release so that I can adopt it again on my PC.

I can resolve the Chrome hog issue for you, Google… pay me for that :)

Agile development process with SCRUM technique

Posted in process by vengatc on September 30, 2008

Agile development process

Principles

1. Individuals and interactions over processes and tools.

2. Working software over comprehensive documentation.

3. Customer collaboration over contract negotiation.

4. Responding to change over following a plan.

Process

Agile is a customized RUP (Inception, Elaboration, Construction, Transition) methodology, with the above principles as the guidelines. Scrum is a practical implementation technique for the Agile methodology.

Scrum involves.

1. The project is split into multiple sprints [time frame directed].

2. The project has backlog items.

3. Each sprint has chosen high-priority backlog items to deliver.

4. Customers are allowed to update the backlog items.

5. The customer contract is a delivery-based pay model, rather than delivering the product as a whole.

6. A sprint planning meeting with managers and stakeholders (chickens) and developers (pigs) decides the items in the backlog that are going to be developed in the sprint.

7. After the sprint planning meeting, work units are formed by the developers based on the backlog items.

8. There is a daily scrum meeting where each developer says what he has done.

9. New work units may be generated; a new unit may be committed or left uncommitted.

10. Each work unit is a 2-hour work item.

 

Burn down chart


 

 

11. The output of each sprint is handed over for testing (test driven development).

 

 

 

 

Reference

http://www.itwales.com/998612.htm

http://www.objectmentor.com/resources/articles/agileProcess.pdf

ApacheDS and Windows

Posted in LDAP, windows by vengatc on August 22, 2008

1. Download ApacheDS from http://directory.apache.org/apacheds/1.0/

2. Creating a partition in ApacheDS as shown in http://directory.apache.org/apacheds/1.0/14-basic-configuration-tasks.html#1.4.Basicconfigurationtasks-Addingyourownpartitionresp.suffix doesn't work well.

3. So instead, edit the example entry into the entry you want.

4. Use JXplorer to create organizations.