Difference between revisions of "Ignacio Valdes Implementation Log/Episode8"

From VistApedia
Line 4: Line 4:
 
Ignacio Valdes  Date: Thu, 14 Aug 2008 17:49:15 -0500
 
Hello all, we had a power outage that took the server completely down. Here is a terminal session dump of what we had to do to get Taskman restarted. The only thing that differs from the dump below is that one enters 't' to get Taskman up and running. The session shows several useful commands, such as how to find out where GT.M is installed, and it also shows that I was logged in under the right id but working in the wrong directory: /home/ivaldes instead of /home/vista/EHR.
 
  
<pre>
login as: ivaldes
ivaldes@IP address password:
Line 164: Line 159:
 
GTM>h
[vista@vista logs]$
</pre>
  
 
K.S. Bhaskar Date: Thu, 14 Aug 2008 20:10:11 -0400
 
Line 169: Line 165:
 
Ignacio --
 
You really ought to consider journaling.  See how it's set up on the latest Toasters, for example, and see how simple it is.  The Toaster has a small shell script that automatically recovers the database from the journal file on boot up and even starts up Taskman.  Of course, if you like to practice typing... 8-)
 
  
 
Regards
 
Line 180: Line 172:
 
I, Valdes  Date: Fri, 15 Aug 2008 06:15:26 -0700 (PDT)
 
Many years as a software engineer before medical school ruined the joy of typing as well as video games for me... Can you please post the script to this thread? -- IV
 
 
 
 
K.S. Bhaskar  Date: Fri, 15 Aug 2008 09:50:11 -0400
 
Line 188: Line 178:
 
Ignacio --
 
You can adapt the following to your needs.  You will need to turn on before-image journaling.
 
  
The script /etc/init.d/wvehrvoe10 is automatically executed by the system when it is booted or shut down:
----
<pre>
#! /bin/bash
### BEGIN INIT INFO
Line 251: Line 239:
:
</pre>
----
  
 
It calls the script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstart to
 
Line 258: Line 246:
 
starts Taskman, and removes journal files that are more than three days
old (this is for a demo; adjust to your needs):
----
  <pre>
#!/bin/bash
Line 269: Line 257:
find g -iname 'mumps.mjl_*' -mtime +3 -exec rm -v {} \;
  </pre>
----
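
When adapting a find expression like the one above, it helps to quote the -iname pattern so that the shell does not expand it against the current directory before find sees it. The following is only a sketch for trying the expression out on a scratch directory; it assumes GNU find and GNU touch (for the -d option), and the function name prune_old_journals and the file names are made up for illustration.

```shell
#!/bin/bash
# Remove journal files older than a given number of days under a directory.
# Usage: prune_old_journals DIR DAYS
prune_old_journals() {
    local dir="$1" days="$2"
    # The quoted pattern reaches find unchanged; -mtime +N matches files
    # last modified more than N whole days ago.
    find "$dir" -iname 'mumps.mjl_*' -mtime +"$days" -exec rm -v {} \;
}

# Demonstration on a scratch directory (hypothetical file names).
demo_dir="$(mktemp -d)"
touch -d '5 days ago' "$demo_dir/mumps.mjl_2008225"   # old journal: removed
touch "$demo_dir/mumps.mjl_2008229"                   # fresh journal: kept
touch -d '5 days ago' "$demo_dir/mumps.dat"           # not a journal: kept
prune_old_journals "$demo_dir" 3
```

Running the function with a larger DAYS value first is a cheap way to check what would be deleted before tightening the retention period.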
  
 
The script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstop stops Taskman and
attempts a clean shut down (not always possible):
----
  <pre>
#!/bin/bash
Line 287: Line 275:
2>/dev/null
  </pre>
----
  
 
I use a small script /opt/wvehrvoe10/gtm_V5.3-001_i686/env to set
environment variables:
----
  <pre>
# env - file to be sourced to create VistA environment
Line 323: Line 311:
export gtmroutines="$routines $gtm_dist"
  </pre>
----
  
The net of this is that when the Toaster boots, the database is recovered, and Taskman started.  It doesn't matter whether the system was shut down cleanly or whether it crashed.  I suggest that production VistA environments, especially in non-ASP environments, be set up along the lines of the Toaster.
 
  
 
Regards
 
Line 337: Line 321:
 
Nancy Anthracite  Date: Fri, 15 Aug 2008 10:30:47 -0400
 
Note that using the script to start and stop VistA itself is not recommended.

The menu system should be used for starting the system, and if you insist on using a script, Expect would be preferable as it would use the menu system. The current AND correct routine, the one run by the option that is used for Taskman in the menu system, is RESTART^ZTMB.

By using the menu system, you know as best as is possible that patches and checks and balances will be taken into account.

There is a similar startup routine circulating, for use with Cache, that directly calls routines for starting VistA.

Doing things the "easy way" looks great when you want to do a demo, but for production systems, think seriously about using the menu system.  You can consolidate several items in the menu system into one menu if that would make it easier for you, but please don't circumvent the checks and balances.
 
 
--  
 
Nancy Anthracite
Line 362: Line 337:
 
Nancy --
 
Whether for production or for demo purposes, the reason to script Taskman startup is to facilitate the packaging of VistA as an appliance.  Are you saying that the wvehrstart script should use RESTART^ZTMB instead of START^ZTMB?
 
  
 
Regards
 
Line 381: Line 353:
 
Bhaskar,
 
I was looking through this script.  It looks to me like you are preloading responses for the mumps routine.  I was trying to figure out how to do this a year ago and never got a good answer.

So what are you doing here?  It looks like you are redirecting standard input.  What does that EOF do?
 
  
 
Thanks
 
Kevin

<pre>
#!/bin/bash
cd `dirname $0`
Line 402: Line 372:
ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null
</pre>
 
 
 
K.S. Bhaskar  Date: Fri, 15 Aug 2008 23:42:49 -0400
 
Line 409: Line 379:
 
command such as:
 
<pre>
grvb -mbg kvtz <<GLZNOP
oinad
mnjbz
GLZNOP
</pre>
  
it means run the command grvb -mbg kvtz, and as its STDIN (standard input) feed the lines oinad and mnjbz.  The GLZNOP on the command line tells it the marker to look for, and the GLZNOP on a line by itself is a marker that says no more input is available for the command.  EOF is just slightly more readable to programmers than GLZNOP, but the shell doesn't care - it just matches the word after the << and the word on a line by itself.
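
The same mechanism can be tried with any command that reads standard input. The following is a minimal sketch that uses cat and wc in place of the mumps command from the script; the marker word END_OF_INPUT and the function names are arbitrary choices made for illustration.

```shell
#!/bin/bash
# Feed two lines to a command's standard input with a here-document.
# The word after << (END_OF_INPUT here) is an arbitrary marker; the same
# word on a line by itself ends the input.
feed_two_lines() {
    cat <<END_OF_INPUT
first line
second line
END_OF_INPUT
}

# Count the lines that the command received on standard input.
count_fed_lines() {
    feed_two_lines | wc -l | tr -d ' '
}

feed_two_lines
```

Replacing cat with an interactive program is what lets a script "preload" the answers that program would otherwise wait for at the keyboard.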
 
  
 
Regards
 
Line 436: Line 402:
 
Hello,
 
While using GT.M journaling is a good idea, that doesn't necessarily mean that you can always recover your VistA database. This is because GT.M journals at the GT.M level, which is sets and kills. VistA operates at the Fileman and business logic level, where one Fileman command is made up of multiple sets and kills. Unfortunately, neither VistA nor Fileman has journaling at its own level.
 
  
So let's say that you have a task in Taskman that is executing a Fileman command, which in turn is made up of 10 GT.M sets. Your server dies in the middle of that command, at GT.M set 5. GT.M journaling will allow you to recover to GT.M set 5, but your Fileman call never finished, and you cannot automatically roll back past GT.M set 1 because Fileman has no journal record of its own marking set 1. You can manually roll back GT.M past set 1, but that means that YOU the programmer have to know what was being executed, and know which GT.M set you have to roll back to.
 
  
Now imagine if you have multiple tasks running concurrently when your server goes down. GT.M will recover happy as a clam, but you will have multiple Fileman calls in various states of completion. What if rolling back past one Fileman call puts another Fileman call in an invalid state? To my knowledge, you cannot roll forward or back through a GT.M journal file based on process id (please correct me if I am wrong here). So all your sets and kills across all your processes are interspersed with each other in the GT.M log.
 
  
So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure; I have only moved forward, fixing errors as I find them.
 
  
Apologies if you already knew this, but I'm not sure how many people have thought of the ramifications caused by VistA not having a journaling system of its own.
 
  
 
Branden Tanga
 
P.S. I know that GT.M has the capabilities for an application to leverage its journal file, in essence bringing the journal file to the level of your business logic. Unfortunately, VistA does not take advantage of anything like that, and the VistA or Fileman routines would have to be edited.
 
  
 
K.S. Bhaskar  Date: Mon, 18 Aug 2008 09:47:22 -0400
 
Branden, this is not a GT.M issue, but rather, as you note, a VistA/Fileman design issue, in that while the database engine can provide recovery of database state, without the use of transaction processing features by the application code, you are not guaranteed that the database state is Consistent (referring to the ACID transaction properties of Atomicity, Consistency, Isolation and Durability).

I don't know what a transaction might be in the health care arena, but consider transferring $100 from your checking account to your savings account that is implemented by subtracting $100 from your checking account balance and adding $100 to your savings account balance.  In the event of a system crash, either both the subtraction and addition operations should be reflected in the state of the database, or neither should be reflected.  It is not acceptable for one to be reflected and the other not to be reflected.  The MUMPS language provides TStart and TCommit commands that you can bracket your code with and which provide Atomicity.  Thus, if the application logic is correct (in our example, the transfer is implemented as a subtraction from one account and an addition of the same amount to the other account), we have Consistency.
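
The all-or-nothing behaviour described for the transfer can be illustrated outside MUMPS. The sketch below is only an analogy in shell and is not how TStart and TCommit work internally: it stages both balance updates in a temporary file and publishes them with a single rename, so an interruption before the rename leaves the original balances untouched. The file format, the function name transfer, and the amounts are assumptions made for illustration.

```shell
#!/bin/bash
# All-or-nothing transfer between two balances kept in one file.
# File format (hypothetical): two lines, "checking N" and "savings N".
# Usage: transfer FILE AMOUNT
transfer() {
    local file="$1" amount="$2" checking savings tmp
    checking="$(awk '$1 == "checking" {print $2}' "$file")"
    savings="$(awk '$1 == "savings" {print $2}' "$file")"

    # Refuse the transfer, changing nothing, if funds are insufficient.
    if [ "$checking" -lt "$amount" ]; then
        return 1
    fi

    # Stage both updates in a temporary file in the same directory.
    tmp="$(mktemp "$file.XXXXXX")" || return 1
    printf 'checking %d\nsavings %d\n' \
        "$((checking - amount))" "$((savings + amount))" > "$tmp"

    # Publish both updates in one step; an interruption before this line
    # leaves the original file untouched.
    mv "$tmp" "$file"
}

accounts="$(mktemp)"
printf 'checking 500\nsavings 1000\n' > "$accounts"
transfer "$accounts" 100
cat "$accounts"
```

The point of the analogy is that a reader of the file never sees the subtraction without the addition, which is the property the paragraph above calls Atomicity.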
 
  
As you note, VistA/Fileman does not use MUMPS transaction processing commands, and therefore, when a database state is recovered from a crash, it can, and likely will, be Inconsistent.  Since VistA has been designed this way, and has operated for years, my guess is that either (a) from an application point of view, transaction Consistency is not important - for example, if a system crashes during registration, perhaps an incomplete registration means that the patient has to be re-registered, and the consequence is simply an unused serial number or (b) there is application logic to search for and correct Inconsistencies.
 
  
It would be good to hear from some application experts on this topic. Thank you very much.
 
  
 
Regards
 
Line 522: Line 430:
 
fred trotter Date: Mon, 18 Aug 2008 10:12:00 -0500
 
Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?
 
  
The problem with Branden's story is that his workaround for a non-ACID crash was to leverage extensive knowledge of how VistA works to figure out where it was broken. Essentially these kinds of efforts prevent the "kernelization" of VistA. Important details of how VistA/MUMPS works are required in order to fix this type of problem. Issues like these ensure that VistA usage grows only as fast as VistA "kernel" expertise, and that grows slowly indeed.
 
  
If the VistA project cannot find a way past these kinds of issues it will be eclipsed by other FOSS projects, either by VistA-based efforts like WebVistA (knowing that it is difficult to tell what that looks like) or by other efforts like OpenMRS, Tolven and ClearHealth proper.
 
  
It seems clear that Bhaskar has done his part. He has exposed an API from GTM to handle this issue.
 
  
 
What now?
 
Line 551: Line 447:
 
Fred --
 
You are thinking like a programmer and not like a business person. Remember that things like ACID properties (and more esoteric things like two phase commit) are technologies intended to assist in business continuity in the face of unplanned events.  As a geek at heart, I keep reminding myself that technology is only a means to an end, and not an end unto itself.  VistA (at least DHCP) existed well before ACID properties and seems to run well.  So, I think the questions to ask (before imposing a requirement of ACIDity) are:
 
  
Do the business processes of health care require ACID transaction properties or are the business processes inherently robust in the face of non-Atomicity and non-Consistency?  [Isolation and Durability are not at issue here.]  If this is the case, is a requirement of ACIDity like requiring brake fluid for restaurants?
 
  
If the answer is that the business processes of health care (at least as addressed by VistA) are not inherently robust in the face of non-Atomicity and non-Consistency, then what mechanisms currently exist in VistA that provide these requirements?
 
  
Until we look at the above questions first, looking at ACIDity is like putting the cart before the horse.  Branden was not the first to experience a VistA system crash.  Let's find out what others have done before him after recovering from a crash.
 
  
 
Regards
 
Line 583: Line 462:
  
 
Fred Trotter asks:
 
> Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?
 
  
No, it is not a true statement, because other VistA code changes the database without going thru FileMan calls.
 
  
 
Fred comments:
 
> It seems clear that Bhaskar has done his part. He has exposed an API from GTM to handle this issue.
 
  
What Bhaskar exposed was transaction-processing syntax that has been in the MUMPS Standard for a long time, but which the VA chose not to use.  GTM of course is to be commended for implementing the MUMPS Standard!  ;-)
 
  
 
Fred asks:
 
> What now?
 
  
Well, if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands, maybe the VA would be willing to change their (SAC) standard, and test and distribute hundreds of transaction-processing changes to their code.  But I doubt it, when they don't even take bug-fixes and functionality enhancements from the outside.
 
 
 
 
Woodhouse Gregory  Date: Mon, 18 Aug 2008 09:47:36 -0700
 
Production VistA systems normally use journalling. Other measures include the use of RAID and UPS devices. For historical reasons (lack of uniform support across MUMPS implementations) VistA systems have not used transactions. This is no longer the case, but there is plenty of legacy code out there that does not use transactions. Instead, it was/is necessary to restore journaled globals explicitly.

In response to Fred's question: Fileman does not provide ACID support directly: this needs to be handled by the underlying MUMPS system. The role of Fileman is to provide a higher level abstraction than MUMPS globals, and to provide various tools (import/export, reporting, query and update, etc.). Screenman and the Classic APIs also provide (character based) UI support.
 
 
Metaphors be with you.
 
 
http://www.gwoodhouse.com
 
http://GregWoodhouse.ImageKind.com
 
 
 
 
fred trotter  Date: Mon, 18 Aug 2008 11:54:55 -0500
 
Line 640: Line 491:
 
K.S. Bhaskar wrote:
 
> Fred -- You are thinking like a programmer and not like a business person.
 
  
 
No, exactly the opposite.

> As a geek at heart, I keep reminding myself that technology is only a means to an end, and not an end unto itself.  VistA (at least DHCP) existed well before ACID properties and seems to run well.
 
  
 
Under the care and feeding of highly trained experts who do nothing else.
 
Line 655: Line 501:
 
My point is not at all that we need ACID, my point is this:
 
If system crashes require in-depth knowledge of MUMPS/FileMan/VistA to fix, then users cannot treat VistA as a "kernel". By "kernel" I mean a reliable platform whose internal workings can safely be ignored if certain requirements are respected (i.e. the right hardware, MUMPS implementation, etc.)

It would be entirely fine for me to have the VistA community say "Back up VistA every hour. If the system crashes, reinstall the most recent good backup, and send an alert that 1 hour's worth of data has been potentially lost."

That's not great... ACID would be better, but that is what you had to do with MySQL for a long time, and it is an acceptable work-around.

An unacceptable answer is "Use your extensive understanding of VistA internal state to correct the values of Globals that were in use at the time of the crash."

That answer implies that you must be a MUMPS expert to support VistA, which is intractable. I am not a C expert, but I use the C-based Linux kernel all the time.

I am talking about a business problem in the context of one technical solution, but my concern is about the business problem.
 
  
 
--  
 
--  
Line 689: Line 520:
 
On Aug 18, 2008, at 9:44 AM, George Timson wrote:

> Fred Trotter asks: Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?

> No, it is not a true statement, because other VistA code changes the database without going thru FileMan calls.

This is a perennial problem with VistA code. I've long argued that developers should resist the urge to manipulate Fileman globals directly, but even if everyone stopped today, there would still be plenty of code that bypasses Fileman. Another, perhaps more insidious, problem is that developers and systems personnel often manipulate globals to correct errors ("crashes").

>> ...

> Well, if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands, maybe the VA would be willing to change their (SAC) standard, and test and distribute hundreds of transaction-processing changes to their code.  But I doubt it, when they don't even take bug-fixes and functionality enhancements from the outside.

The SAC has been revised to allow the use of TS and TC, but that doesn't address the legacy code problem (the issue you address above).
 
 
 
 
Steven McPhelan  Date: Mon, 18 Aug 2008 13:35:06 -0400

George stated "...if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands.."  I understand that George was making a different point.  I do not think that one man-year is even close to sufficient time to rewrite all the existing VA code to be TP compliant.  To make the changes, QA them, and release them would be a very large task indeed. And, as George implied, it does no good to undertake such a task without also putting in place the structure to mandate and enforce that all new code from that point forward uses only TP procedures.

All of this is predicated upon the assumption that load testing of code rewritten to be TP compliant shows no decrease in the number of transactions filed per time period, without a hardware upgrade to handle TP vs non-TP processing.  I won't get into the practical issues of how the existing code would handle TP rollbacks because the filing failed.  For good or bad, many VistA programs file data and proceed on with no checks to see if the filing of the data was indeed successful.

-- 
Steve
"Rest satisfied with doing well, and leave others to talk of you as they please." - Pythagoras
 
 
 
 
fred trotter  Date: Mon, 18 Aug 2008 12:45:35 -0500

> So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure, I have only moved forward fixing errors as I find them.

Ok,
I will make my question more specific. Is this paragraph illustrative of how to handle a crash moving forward? If this is how crashes are handled, then this is a problem. If there is another procedure that can be followed, then it is important enough to have a description on the WorldVistA wiki. Or to have a link from the wiki to an already published solution. To help, I have created the page:

*http://vistapedia.net/index.php?title=Restoring_a_VistA_installation

HTH,
 
Going on to discuss the pure technical issue:

Is there no way to do this on a meta level? What about executing TS and TC commands before and after every routine, so that at a minimum you know roughly in which routine the failure took place?

Perhaps you could have some "named idle journal", so that you could automatically roll back to a time when at the least nothing was happening on the system.

Any time I suggest something like this I usually get back that something like this already happens, or Bhaskar tells me that GTM already does something like this. I know I am way way over my head with regards to how MUMPS works....

-- 
 
On Aug 18, 2008, at 8:58 AM, K.S. Bhaskar wrote:

> Do the business processes of health care require ACID transaction properties or are the business processes inherently robust in the face of non-Atomicity and non-Consistency?  [Isolation and Durability are not at issue here.]  If this is the case, is a requirement of ACIDity like requiring brake fluid for restaurants?

> If the answer is that the business processes of health care (at least as addressed by VistA) are not inherently robust in the face of non-Atomicity and non-Consistency, then what mechanisms currently exist in VistA that provide these requirements?

This is interesting. It seems uncontroversial that database integrity is a requirement for health information systems (for example, we wouldn't want a penicillin allergy to be "lost"). In the ACID model, I would be hard pressed to say which of the four properties (atomicity, consistency, isolation and durability) can be dispensed with. But it is less obvious that the ACID approach is the only route to database integrity.  The latest ACM Queue takes this on with a little column whimsically entitled "BASE: an alternative to ACID"

*http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=540&page=1

Results like the CAP theorem have interested me for some time, given that I am interested in (developing) alternatives to heavy-handed approaches to database consistency like message ordering (frequently employed in HL7).

Anyway, the CAP theorem is just another version of a well-known dilemma in database programming: in choosing between the 2-phase and 3-phase commit, you are forced to choose between an algorithm that can fail, even when updating the database is safe, and one that can block indefinitely.

"It is never too late to become reasonable and wise; but if the insight comes too late, there is always more difficulty in starting the change." -- Immanuel Kant
 
 
 
 
 
 
Woodhouse Gregory  Date: Mon, 18 Aug 2008 11:23:46 -0700

Free associating a bit, I can't help but think of a famous result in (mathematical) model theory called Löb's Theorem. It states that a system cannot assert its own soundness without being inconsistent.

fred trotter wrote:

> Going on to discuss the pure technical issue: Is there no way to do this on a meta level? What about executing TS and TC commands before and after every routine. So that at a minimum you know roughly in which routine the failure took place.

> Perhaps you could have some "named idle journal". So that you could automatically roll back to a time when at the least nothing was happening on the system.

> Any time I suggest something like this I usually get back that something like this already happens, or Bhaskar tells me that GTM already does something like this. I know I am way way over my head with regards to how MUMPS works....

This is a good question. It shouldn't be difficult to write a meta-interpreter of the type you describe, though I'm unsure what the performance implications would be.

Basically, you're running into the legacy code problem. Modern MUMPS implementations do support ACID transactions, but this facility was not available when the bulk of VistA was developed. This has led to a controversy between people arguing that it is not feasible to build transaction support into VistA, and people (like me) who argue that it is essential to do so. Unfortunately, this generally mutates into a highly emotional debate over the use of MUMPS, which is not the point at all.
 
  
 
  
 
  
 
fred trotter  Date: Mon, 18 Aug 2008 14:33:08 -0500

I agree that ACID vs no ACID is probably a waste of time. Any practical suggestions for workarounds for VistA rebuilding?

-- Fred Trotter
 
Woodhouse Gregory Date: Mon, 18 Aug 2008 12:39:32 -0700

>> So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure, I have only moved forward fixing errors as I find them.

> Ok, I will make my question more specific. Is this paragraph illustrative of how to handle a crash moving forward? If this is how crashes are handled, then this is a problem. If there is another procedure that can be followed, then it is important enough to have a description on the WorldVistA wiki. Or to have a link from the wiki to an already published solution. To help, I have created the page:

* http://vistapedia.net/index.php?title=Restoring_a_VistA_installation

It's close - far too close for my comfort. Production systems should always be journaled, but I suspect many people here who may be developers, or who may be just "kicking the tires", may not enable journaling.

"Think globally, act locally."
--René Dubos
 
 
 
 
Chris Richardson  Date: Mon, 18 Aug 2008 14:23:35 -0700

Well, guys, there is nothing left to do but contact your congressmen about this and start a grass-roots effort to get this funding.  It would be embarrassing if a foreign government paid for our software to be properly updated.
 
 
 
 
 
 
Branden Tanga  Date: Thu, 21 Aug 2008 20:27:34 -0700 (PDT)

Sorry to bring up a seemingly dead topic, but I haven't kept up with this thread over the past few days.

I don't see code that directly edits globals as the major issue. The main problem, as I see it, is that having transactional processing built into Fileman is not good enough to be able to safely roll back and forward through a VistA log. In the same way that a single Fileman call is made up of multiple MUMPS sets and kills, a single VistA transaction can be made up of multiple Fileman calls. So you would need code in VistA itself that defines what a "transaction" is. Likely, this definition would be different for each module in VistA. There is no way for a pure programmer like me to denote VistA transactions; you would need domain experts for each module to mark which action or group of actions is a transaction.
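The point about transaction boundaries can be illustrated with a minimal sketch. It is written in Python rather than M, and everything in it (`GlobalStore`, `file_allergy`, `file_order_check`, `record_allergy`) is a hypothetical stand-in, not FileMan or GT.M code. The `transaction()` helper only plays the role that a TS ... TC pair would play; the thing to notice is that the boundary is drawn in the module code, around both filing calls, because only the module knows they belong together.

```python
import copy
from contextlib import contextmanager


class GlobalStore:
    """Toy stand-in for a database of globals (hypothetical; not GT.M or FileMan)."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        # Plays the role of a TS ... TC pair: updates inside commit together or not at all.
        snapshot = copy.deepcopy(self.data)
        try:
            yield self
        except Exception:
            self.data = snapshot  # undo every set made inside the block
            raise


def file_allergy(store, patient, allergy):
    # One "filing call": several sets that only make sense together.
    store.data[("ALLERGY", patient)] = allergy
    store.data[("ALLERGY_INDEX", allergy, patient)] = ""


def file_order_check(store, patient, crash=False):
    if crash:
        raise RuntimeError("simulated crash between filing calls")
    store.data[("ORDER_CHECK", patient)] = "flagged"


def record_allergy(store, patient, allergy, crash=False):
    # Only the module knows that these two filing calls form one unit of work.
    with store.transaction():
        file_allergy(store, patient, allergy)
        file_order_check(store, patient, crash=crash)


store = GlobalStore()
try:
    record_allergy(store, 42, "PENICILLIN", crash=True)
except RuntimeError:
    pass
print(store.data)  # {} : the half-finished update was rolled back
```

If the transaction were placed inside each filing call instead, the first call would commit and the crash would still leave the allergy filed without its order check, which is the situation described above.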
 
  
I totally agree, my solution is not optimal. When I had a server failure, I was faced with 2 options:

# Figure out where in the GT.M journal to roll back to
# Figure out how to fix the globals manually, and move on.

Because of the risk that rolling back through the journal may cause other Fileman calls to be incomplete, and the ridiculous amount of time it would take to figure out which exact GT.M set or kill I needed to roll back to, I chose #2. I talked to my end users to figure out what they were doing, edited the necessary globals, and if their actions were "finished", then I considered the database recovery as complete as possible. In short, I had to choose the lesser of 2 evils, which was to fix the globals manually and move on.
 
 
 
 
Skip Ormsby  Date: Fri, 22 Aug 2008 07:01:30 -0400

If my creaky old brain remembers correctly, one of the reasons for non-transaction processing is because of code like this (before the unsubscripted kills prevention was implemented and the New command, although there are still plenty of times the Kill is used)
* S DIC=4,DIC(0)="AEMQZ" D ^DIC
* ;Now being a good developer since I am making a Classic call I need to do local variable clean up
* K ^DIC ; Ahh oops

Generally the favorites were ^DD, ^DIC, and ^DPT in no particular order.  Solution - read the journal until you find the unsubscripted Kill and clip it out.  It may take X amount of time before you actually notice that something has disappeared, so you would have journal activity that needs to be applied post the unsubscripted Kill.
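The "clip it out" recovery can be sketched as a replay loop. This is a Python illustration under stated assumptions: the journal is modelled as a list of `(op, global, subscripts, value)` tuples, which is an invented format and not the GT.M or MSM journal layout, and `replay` is a hypothetical helper. It applies every record to a copy of the backup except unsubscripted kills of the protected globals, so the activity journaled after the bad Kill is still applied.

```python
def replay(backup, journal, protected=("^DD", "^DIC", "^DPT")):
    """Apply journal records to a copy of the backup.

    Unsubscripted KILLs of protected globals are skipped ("clipped out").
    Returns the recovered database and the positions of the skipped records.
    """
    db = {name: dict(nodes) for name, nodes in backup.items()}
    skipped = []
    for seq, (op, name, subs, value) in enumerate(journal):
        if op == "KILL" and subs == () and name in protected:
            skipped.append(seq)  # the offending record; do not apply it
            continue
        if op == "SET":
            db.setdefault(name, {})[subs] = value
        elif op == "KILL":
            if subs == ():
                db.pop(name, None)  # unsubscripted kill of an unprotected global
            else:
                nodes = db.get(name, {})
                # a subscripted kill removes the node and all of its descendants
                for key in [k for k in nodes if k[:len(subs)] == subs]:
                    del nodes[key]
    return db, skipped


backup = {"^DIC": {(4, 0): "INSTITUTION"}}
journal = [
    ("SET", "^DPT", (1, 0), "DOE,JOHN"),
    ("KILL", "^DIC", (), None),          # the "Ahh oops" record
    ("SET", "^DPT", (2, 0), "ROE,JANE"),  # later activity that must still be applied
]
db, skipped = replay(backup, journal)
print(skipped)            # [1]
print(sorted(db["^DPT"]))  # [(1, 0), (2, 0)]
```

A real recovery would use the MUMPS implementation's own journal tools; the sketch only shows why the records after the bad Kill cannot simply be discarded.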
 
  
 
-skip
 
Steven McPhelan  Date: Fri, 22 Aug 2008 08:29:59 -0400

That is how I have always handled this problem in the past, so very long ago, since I have not had to do this in years.  That is the purpose of the journal, which is to bring a backup copy up to date with all the transactions since that backup by dejournaling.  If there was something I knew I did not want to happen (Skip's K ^DIC example), we would edit the journal file to remove the offending code and then proceed with the normal dejournaling procedures.  Of course if you are not journaling then you do not have this option.

I have not looked in years, are the journal files still just text files or have they been "updated and improved"?
 
  
 
K.S. Bhaskar  Date: Fri, 22 Aug 2008 10:08:17 -0400

Branden --

There was some off-list discussion of this topic.  To summarize, when the VA runs VistA, they rely on the MUMPS implementation to restore the database to the state it was in just before the crash (of course, they use computer hardware and operating systems that don't make a habit of crashing).  A combination of their business processes and VistA application logic is such that after a crash, they don't usually need to go in and make changes - in other words, their business processes are in a good state when the MUMPS system recovers the database and VistA is restarted.

I speculate that when units of work need to be done, they are put in queues (in the database) for Taskman background processes to handle, and the design of Taskman is such that when the database is recovered, it picks up unfinished work from the queues.  But this is just a guess.
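The queue idea guessed at above can be sketched in a few lines. This is a Python illustration of the general pattern only, not Taskman's actual design: the JSON file stands in for "queues in the database", and `load`, `save`, `enqueue` and `run_pending` are hypothetical names. Each task's status is stored durably, so a restarted worker can find whatever was not finished before the crash.

```python
import json
import os
import tempfile


def load(path):
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save(path, tasks):
    # Write to a temporary file, then rename, so a crash cannot leave a half-written queue.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(tasks, f)
    os.replace(tmp, path)


def enqueue(path, task_id, payload):
    tasks = load(path)
    tasks[task_id] = {"payload": payload, "status": "queued"}
    save(path, tasks)


def run_pending(path, handler):
    """Run every task not yet marked done; return the ids completed in this pass."""
    tasks = load(path)
    completed = []
    for task_id, task in tasks.items():
        if task["status"] != "done":
            handler(task["payload"])
            task["status"] = "done"
            save(path, tasks)  # record completion before moving to the next task
            completed.append(task_id)
    return completed


queue_file = os.path.join(tempfile.mkdtemp(), "queue.json")
enqueue(queue_file, "1", "print labels")
enqueue(queue_file, "2", "send report")
print(run_pending(queue_file, lambda payload: None))  # ['1', '2']
print(run_pending(queue_file, lambda payload: None))  # []
```

In this sketch a task interrupted halfway is simply run again from the start on the next pass, so the handler has to tolerate being repeated; the sketch does not try to resume a job in the middle.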
 
  
 
Regards
 
Steven McPhelan  Date: Fri, 22 Aug 2008 16:11:28 -0400

If the taskman globals are journaled, they will be recovered and Taskman will start where he left off.  However, any existing jobs running at the time of the crash will not be restarted.
 
 
 
 
Skip Ormsby  Date: Fri, 22 Aug 2008 20:15:58 -0400

As long as the subject is about power outages, at the hospital I was at we had no-break power for 8 PDP 11-44s that would run the critters for 15-20 minutes, which was long enough for either the main generator to kick in or for one of us to gracefully shut the systems down.  When we went to the MSM/486 configuration, we made sure that all of the parallel bars were zip-tied and the plugs were zip-tied so they didn't accidentally come out of the plug.  And of course the no-break power would last a very long time, but we never pushed it past 1/2 hour.  The biggest problem was more in line with a disk controller going nuts, or a NIC that would go berserk, or bad memory, which in turn would put crud into the data base.  For us it was a case of fix and forget it, because there were other fish to fry.  Never did have a real power outage to the computer circuit, even when the room was flooded from a broken pipe in the ceiling.  Lost the lights, etc., but the computers kept right on humming until we shut them down in a speedy, graceful shutdown.

-skip

Revision as of 22:13, 7 May 2009

The Intracare Implementation Log (Back to Episode 7) (Back to Log Homepage) (On to Episode 9)

Power outage restart

Ignacio Valdes Date: Thu, 14 Aug 2008 17:49:15 -0500

Hello all, We had a power outage with the server going totally down. Here is a terminal session dump of what we had to do to get the taskman re-started. The only thing that isn't as it is below is that one enters 't' and gets taskman up and running. This shows several useful commands such as how to find out where gtm exists as well as showing how I was in the right id but in the wrong user space /home/ivaldes instead of being in /home/vista/EHR

login as: ivaldes
ivaldes@IP address password:
Last login: Thu Aug 14 16:59:22 2008 from south124.ich.local
[ivaldes@vista ~]$ su
Password:
[root@vista ivaldes]# su vista
[vista@vista ivaldes]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
GTM>W $ZS
150374954,XUP+4^XUP,%GTM-E-REQRUNDOWN, Error accessing database /home/vista/EHR/
g/mumps.dat.  Must be rundown on cluster node vista.ich.local.
GTM>h
[vista@vista ivaldes]$ echo $GTM_DIST

[vista@vista ivaldes]$ alias GTM
alias GTM='/usr/local/gtm/mumps -direct'
[vista@vista ivaldes]$ alias gtm
alias gtm='/usr/local/gtm/mumps -direct'
[vista@vista ivaldes]$ /usr/local/gtm/mupip rundown
[vista@vista ivaldes]$ /usr/local/gtm/mupip rundown -r "*"
%GTM-I-MUFILRNDWNSUC, File /home/vista/EHR/g/mumps.dat successfully rundown
[vista@vista ivaldes]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
This is a TEST account.

Terminal Type set to: C-VT320

Select OPTION NAME: EVE
     1   EVE       Systems Manager Menu
     2   EVENT CAPTURE (ECS) EXTRACT AU  ECX ECS SOURCE AUDIT     Event Capture
(ECS) Extract Audit
     3   EVENT CAPTURE DATA ENTRY  ECENTER     Event Capture Data Entry
     4   EVENT CAPTURE EXTRACT  ECXEC     Event Capture Extract
     5   EVENT CAPTURE MANAGEMENT MENU  ECMGR     Event Capture Management Menu
Press <RETURN> to see more, '^' to exit this list, OR
CHOOSE 1-5: 1  EVE     Systems Manager Menu

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Systems Manager Menu Option: taskman Management

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Taskman Management Option: taskman Management Utilities

Select Taskman Management Utilities Option: r
    1    Remove Taskman from WAIT State
    2    Restart Task Manager
CHOOSE 1-2: 2  Restart Task Manager
ARE YOU SURE YOU WANT TO RESTART TASKMAN? NO//YES  (YES)
Restarting...%GTM-E-JOBFAIL, JOB command failure
%GTM-I-TEXT, Error redirecting stdout (creat) to _ZTM0.mjo
%SYSTEM-E-ENO13, Permission denied

%GTM-E-JOBFAIL, JOB command failure
%GTM-I-TEXT, Failed to set STDIN/OUT/ERR for the job

GTM>h
[vista@vista ivaldes]$ whoami
vista
[vista@vista ivaldes]$ pwd
/home/ivaldes
[vista@vista ivaldes]$ cd /home/vista
[vista@vista ~]$ cd log
bash: cd: log: No such file or directory
[vista@vista ~]$ echo $gtmgbldir
/home/vista/EHR/g/mumps.gld
[vista@vista ~]$ cd EHR
[vista@vista EHR]$ ls
env2  g  logs  o  r  WVEHR-gui  WVEHR-gui.log  WVEHR-VOE1.0-GTM-Routines.tgz
[vista@vista EHR]$ cd logs
[vista@vista logs]$ ls
XWBTCPL.mje  XWBTCPL.mjo
[vista@vista logs]$ gtm

GTM>S DUZ=9

GTM>D ^XUP

Setting up programmer environment
This is a TEST account.

Terminal Type set to: C-VT320

Select OPTION NAME: EVE
     1   EVE       Systems Manager Menu
     2   EVENT CAPTURE (ECS) EXTRACT AU  ECX ECS SOURCE AUDIT     Event Capture
(ECS) Extract Audit
     3   EVENT CAPTURE DATA ENTRY  ECENTER     Event Capture Data Entry
     4   EVENT CAPTURE EXTRACT  ECXEC     Event Capture Extract
     5   EVENT CAPTURE MANAGEMENT MENU  ECMGR     Event Capture Management Menu
Press <RETURN> to see more, '^' to exit this list, OR
CHOOSE 1-5: 1  EVE     Systems Manager Menu

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Systems Manager Menu Option: taskman Management

WARNING -- TASK MANAGER DOESN'T SEEM TO BE RUNNING!!!!

Select Taskman Management Option: taskman Management Utilities

Select Taskman Management Utilities Option: r
    1    Remove Taskman from WAIT State
    2    Restart Task Manager
CHOOSE 1-2: 2  Restart Task Manager
ARE YOU SURE YOU WANT TO RESTART TASKMAN? NO//YES  (YES)
Restarting...TaskMan restarted!

Select Taskman Management Utilities Option: mtm  Monitor Taskman

Checking Taskman.   Current $H=61222,63435  (Aug 14, 2008@17:37:15)
                      RUN NODE=61222,63423  (Aug 14, 2008@17:37:03)
Taskman is current..
Checking the Status List:
  Node      weight  status      time       $J
 EHR:vista          RUN      T@17:37:03   3863      Main Loop

Checking the Schedule List:
     Taskman has 1 task scheduled.
     It is not overdue.

Checking the IO Lists:
     There are no tasks waiting for devices.

Checking the Job List:
     There are no tasks waiting for partitions.
     For EHR:CACHEWEB there is 1 tasks.  Out Of Service

Checking the Task List:
     There are no tasks currently running.
     On node EHR:vista there is  1 free Sub-Manager(s). Status: Run

Enter monitor action: UPDATE// ^

Select Taskman Management Utilities Option: halt

Do you really want to halt? YES//

Logged out at Aug 14, 2008 5:37 pm
GTM>h
[vista@vista logs]$
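
The second failure in the session above (Permission denied while creating _ZTM0.mjo) happened because GT.M was started from /home/ivaldes, a directory the vista user could not write to. A minimal sketch of a pre-flight check that would catch this before entering GT.M is shown below. The helper name check_workdir is hypothetical, and the demonstration uses a throwaway directory rather than a real VistA tree.

```shell
#!/bin/bash
# check_workdir (hypothetical helper): succeed only if the given directory
# exists and is writable by the current user, so that job output files
# such as *.mjo and *.mje can be created there.
check_workdir() {
    local dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir is writable by $(id -un)"
        return 0
    fi
    echo "refusing to start: $dir is not writable by $(id -un)" >&2
    return 1
}

# Demonstration with a scratch directory instead of /home/vista/EHR/logs.
scratch=$(mktemp -d)
check_workdir "$scratch" && good=yes || good=no
check_workdir "$scratch/does-not-exist" 2>/dev/null && missing=yes || missing=no
rm -rf "$scratch"
```

On a real system one might call something like this against the logs directory before running the gtm alias, and change directory first if it fails.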

K.S. Bhaskar Date: Thu, 14 Aug 2008 20:10:11 -0400

Ignacio --

You really ought to consider journaling. See how it's set up on the latest Toasters, for example, and see how simple it is. The Toaster has a small shell script that automatically recovers the database from the journal file on boot up and even starts up Taskman. Of course, if you like to practice typing... 8-)

Regards -- Bhaskar

I, Valdes Date: Fri, 15 Aug 2008 06:15:26 -0700 (PDT)

Many years as a software engineer before medical school ruined the joy of typing as well as video games for me... Can you please post the script to this thread? -- IV

K.S. Bhaskar Date: Fri, 15 Aug 2008 09:50:11 -0400

Ignacio --

You can adapt the following to your needs. You will need to turn on before-image journaling.

The script /etc/init.d/wvehrvoe10 is automatically executed by the system when it is booted or shut down:


#! /bin/bash
### BEGIN INIT INFO
# Provides:          wvehrvoe10
# Required-Start:    $local_fs
# Required-Stop:     $local_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: PIP V0.1
# Description:       Starts and Stops WorldVistA EHR VOE/ 1.0
### END INIT INFO

# Author: K.S. Bhaskar <bhas...@worldvista.org>

# Do NOT "set -e"

NAME=wvehrvoe10
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="WorldVistA EHR VOE/ 1.0"
SCRIPTNAME=/etc/init.d/$NAME

#
# Function that starts WorldVistA EHR VOE/ 1.0
#
do_start()
{
        su -c /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstart wvehr

}

#
# Function that stops WorldVistA EHR VOE/ 1.0
#
do_stop()
{
        su -c /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstop wvehr

}

case "$1" in
  start)
        do_start
        ;;
  stop)
        do_stop
        ;;
  restart|force-reload)
        do_stop
        do_start
        ;;
  *)
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 3
        ;;
esac

:

It calls the script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstart, which recovers the database (effectively a no-op if it was shut down cleanly), starts Taskman, and removes journal files that are more than three days old (this is for a demo; adjust to your needs):


#!/bin/bash
cd `dirname $0`
rm -f tmp/*.mj[oe]
source ./env
$gtm_dist/mupip journal -recover -backward g/mumps.mjl \
 && $gtm_dist/mupip set -journal="enable,on,before" -file g/mumps.dat \
 && ./run START^ZTMB
find g -iname 'mumps.mjl_*' -mtime +3 -exec rm -v {} \;
 

The script /opt/wvehrvoe10/gtm_V5.3-001_i686/wvehrstop stops Taskman and attempts a clean shut down (not always possible):


#!/bin/bash
cd `dirname $0`
source ./env
./run STOP^ZTMKU <<EOF
y
y
h
EOF
sleep 5
ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null
 

I use a small script /opt/wvehrvoe10/gtm_V5.3-001_i686/env to set environment variables:


# env - file to be sourced to create VistA environment
#
# This temporary version of the commands to set up the VistA
# environment assumes that the parent and child use the same
# version of GT.M.

export gtmver=`basename $PWD`
if [[ -d ../parent ]] ; then
  pushd ../parent/$gtmver 1>/dev/null
  source ./env
  popd 1>/dev/null
fi

tmp=`dirname $PWD`
tmp0="$PWD/o($PWD/p $PWD/r $tmp/p $tmp/r)"

# If there is an existing $routines, this environment comes before it
if [[ -n $routines ]] ; then
  export routines="$tmp0 $routines"
else
  export routines="$tmp0"
fi

# If a mumps.dat exists (vs. mumps.dat.gz) then this a usable environment
if [[ -f $PWD/g/mumps.dat ]] ; then export vista_home=$tmp ; fi

source gtm/gtmprofile
export gtmgbldir=$PWD/g/mumps.gld
export gtmroutines="$routines $gtm_dist"
 

The net of this is that when the Toaster boots, the database is recovered, and Taskman started. It doesn't matter whether the system was shut down cleanly or whether it crashed. I suggest that production VistA environments, especially in non-ASP environments, be set up along the lines of the Toaster.

Regards -- Bhaskar


Nancy Anthracite Date: Fri, 15 Aug 2008 10:30:47 -0400

Note that using the script to start and stop VistA itself is not recommended.

The menu system should be used for starting the system, and if you insist on using a script, Expect would be preferable as it would use the menu system. Currently, the correct routine, the one that runs with the option used for Taskman in the menu system, is RESTART^ZTMB.

By using the menu system, you know as best as is possible that patches and checks and balances will be taken into account.

There is a similar startup routine circulating, for use with Cache, that directly calls routines for starting VistA.

Doing things the "easy way" looks great when you want to do a demo, but for production systems, think seriously about using the menu system. You can consolidate several items in the menu system into one menu if that would make it easier for you, but please don't circumvent the checks and balances. -- Nancy Anthracite

K.S. Bhaskar Date: Fri, 15 Aug 2008 10:35:02 -0400

Nancy --

Whether for production or for demo purposes, the reason to script Taskman startup is to facilitate the packaging of VistA as an appliance. Are you saying that the wvehrstart script should use RESTART^ZTMB instead of START^ZTMB?

Regards -- Bhaskar

Nancy Anthracite Date: Fri, 15 Aug 2008 10:57:24 -0400

RESTART instead of START, yes. -- Nancy Anthracite


kdtop Date: Fri, 15 Aug 2008 15:51:03 -0700 (PDT)

Bhaskar,

I was looking through this script. It looks to me like you are preloading responses for the mumps routine. I was trying to figure out how to do this a year ago and never got a good answer.

So what are you doing here? It looks like you are redirecting standard input. What does that EOF do?

Thanks Kevin

#!/bin/bash
cd `dirname $0`
source ./env
./run STOP^ZTMKU <<EOF
y
y
h
EOF
sleep 5
ps -ef | grep mumps | grep -v grep | awk '{print $2}' | xargs kill 2>/dev/null

K.S. Bhaskar Date: Fri, 15 Aug 2008 23:42:49 -0400

The bash construct (which works on many shells) is, when there is a command such as:

	
grvb -mbg kvtz <<GLZNOP
oinad
mnjbz
GLZNOP

it means run the command grvb -mbg kvtz, and as its STDIN (standard input) feed the lines oinad and mnjbz. The GLZNOP on the command line tells it the marker to look for, and the GLZNOP on a line by itself is a marker that says no more input is available for the command. EOF is just slightly more readable to programmers than GLZNOP, but the shell doesn't care - it just matches the word after the << and the word on a line by itself.

Regards -- Bhaskar
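
The here-document construct described above can be tried without GT.M. The sketch below uses a hypothetical shell function, ask_twice, as a stand-in for an interactive program that reads two answers from standard input; the marker word END is arbitrary, just as EOF and GLZNOP are.

```shell
#!/bin/bash
# ask_twice is a hypothetical stand-in for an interactive program:
# it reads two answers from standard input and echoes them back.
ask_twice() {
    local first second
    read -r first
    read -r second
    echo "first=$first second=$second"
}

# The here-document supplies the answers, one per line,
# up to the line containing only the END marker.
result=$(ask_twice <<END
y
h
END
)
echo "$result"
```

The stop script works the same way: the lines y, y and h are what a person would otherwise type at the prompts.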

kdtop Date: Sat, 16 Aug 2008 06:54:00 -0700 (PDT)

VERY helpful! Thanks. This opens all kinds of possibilities....

Thanks again, Kevin

Branden Tanga Date: Sun, 17 Aug 2008 04:50:52 -0700 (PDT)

Hello,

While using GT.M journaling is a good idea, that doesn't necessarily mean that you can always recover your VistA database. This is due to the fact that GT.M journals on the GT.M level, which is sets and kills. VistA operates at the Fileman and business logic level, where one Fileman command is made up of multiple sets and kills. Unfortunately, neither VistA nor Fileman has journaling at its own level.

So let's say that you have a task in taskman that is executing a Fileman command, which in turn is made up of 10 GT.M sets. Your server dies in the middle of that command, at GT.M set 5. GT.M journaling will allow you to recover to GT.M set 5, but your Fileman call never finished, and you cannot automatically roll back past GT.M set 1 because Fileman has no journal record of its own, marking set 1. You can manually roll back GT.M past set 1, but that means that YOU the programmer have to know what was being executed, and know which GT.M set you have to roll back to.

Now imagine if you have multiple tasks running concurrently when your server goes down. GT.M will recover happy as a clam, but you will have multiple Fileman calls in various states of completion. What if rolling back past one Fileman call puts another Fileman call in an invalid state? To my knowledge, you cannot roll forward or back through a GT.M journal file based on process id (please correct me if I am wrong here). So all your sets and kills across all your processes are interspersed with each other in the GT.M log.

So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure, I have only moved forward fixing errors as I find them.

Apologies if you already knew this, but I'm not sure how many people have thought of the ramifications caused by VistA not having a journaling system of its own.

Branden Tanga

P.S. I know that GT.M has the capabilities for an application to leverage its journal file, in essence bringing the journal file to the level of your business logic. Unfortunately, VistA does not take advantage of anything like that, and the VistA or Fileman routines would have to be edited.

K.S. Bhaskar Date: Mon, 18 Aug 2008 09:47:22 -0400

Branden, this is not a GT.M issue, but rather, as you note, a VistA/Fileman design issue, in that while the database engine can provide recovery of database state, without the use of transaction processing features by the application code, you are not guaranteed that the database state is Consistent (referring to the ACID transaction properties of Atomicity, Consistency, Isolation and Durability). I don't know what a transaction might be in the health care arena, but consider transferring $100 from your checking account to your savings account that is implemented by subtracting $100 from your checking account balance and adding $100 to your savings account balance. In the event of a system crash, either both the subtraction and addition operations should be reflected in the state of the database, or neither should be reflected. It is not acceptable for one to be reflected and the other not to be reflected. The MUMPS language provides TStart and TCommit commands that you can bracket your code with and which provides Atomicity. Thus, if the application logic is correct (in our example, the transfer is implemented as a subtraction from one account and an addition of the same amount to the other account), we have Consistency.

As you note, VistA/Fileman does not use MUMPS transaction processing commands, and therefore, when a database state is recovered from a crash, it can, and likely will, be Inconsistent. Since VistA has been designed this way, and has operated for years, my guess is that either (a) from an application point of view, transaction Consistency is not important - for example, if a system crashes during registration, perhaps an incomplete registration means that the patient has to be re-registered, and the consequence is simply an unused serial number or (b) there is application logic to search for and correct Inconsistencies.

It would be good to hear from some application experts on this topic. Thank you very much.

Regards -- Bhaskar
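
The "both or neither" requirement in the checking-to-savings example above can be illustrated with a shell analogy. This is not MUMPS transaction processing and not how GT.M implements TStart/TCommit; it swaps in a different technique, write-to-a-temporary-file-then-rename, purely to show what Atomicity means. The ledger file, its two-line format and the transfer function are all made up for the illustration.

```shell
#!/bin/bash
# Two balances live in one scratch file (hypothetical format: "name amount").
ledger=$(mktemp)
printf 'checking 500\nsavings 200\n' > "$ledger"

transfer() {
    local amount=$1 tmp
    tmp=$(mktemp "$ledger.XXXXXX") || return 1
    # Both changes are computed in one pass into the temporary file;
    # the live ledger is untouched until the rename at the end.
    awk -v amt="$amount" '
        $1 == "checking" { $2 -= amt }
        $1 == "savings"  { $2 += amt }
        { print }
    ' "$ledger" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Renaming within one directory replaces the old state in a single step,
    # so a reader sees either the old balances or the new ones.
    mv "$tmp" "$ledger"
}

transfer 100
final=$(cat "$ledger")
echo "$final"
rm -f "$ledger"
```

If the process dies before the mv, the ledger still holds the old balances; it never holds the subtraction without the addition.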


fred trotter Date: Mon, 18 Aug 2008 10:12:00 -0500

Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?

The problem with Branden's story is that his workaround for a non-ACID crash was to leverage extensive knowledge of how VistA works to figure out where it was broken. Essentially these kinds of efforts prevent the "kernelization" of VistA. Important details of how VistA/MUMPS works are required in order to fix this type of problem. Issues like these ensure that VistA usage grows only as fast as VistA "kernel" expertise, and that grows slowly indeed.

If the VistA project cannot find a way past these kinds of issues it will be eclipsed by other FOSS projects. Either by VistA-based efforts like WebVistA (knowing that it is difficult to tell what that looks like) or by other efforts like OpenMRS, Tolven and ClearHealth proper.

It seems clear that Bhaskar has done his part. He has exposed an API from GTM to handle this issue.

What now?

-- Fred Trotter

K.S. Bhaskar Date: Mon, 18 Aug 2008 11:58:25 -0400

Fred --

You are thinking like a programmer and not like a business person. Remember that things like ACID properties (and more esoteric things like two phase commit) are technologies intended to assist in business continuity in the face of unplanned events. As a geek at heart, I keep reminding myself that technology is only a means to an end, and not an end unto itself. VistA (at least DHCP) existed well before ACID properties and seems to run well. So, I think the questions to ask (before imposing a requirement of ACIDity) are:

Do the business processes of health care require ACID transaction properties or are the business processes inherently robust in the face of non-Atomicity and non-Consistency? [Isolation and Durability are not at issue here.] If this is the case, is a requirement of ACIDity like requiring brake fluid for restaurants?

If the answer is that the business processes of health care (at least as addressed by VistA) are not inherently robust in the face of non-Atomicity and non-Consistency, then what mechanisms currently exist in VistA that provide these requirements?

Until we look at the above questions first, looking at ACIDity is like putting the cart before the horse. Branden was not the first to experience a VistA system crash. Let's find out what others have done before him after recovering from a crash.

Regards -- Bhaskar

George Timson Date: Mon, 18 Aug 2008 09:44:51 -0700 (PDT)


Fred Trotter asks: > Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?


No, it is not a true statement, because other VistA code changes the database without going thru FileMan calls.

Fred comments:

> It seems clear that Bhaskar has done his part. He has exposed an API from GTM to handle this issue.

What Bhaskar exposed was transaction-processing syntax that has been in the MUMPS Standard for a long time, but which the VA chose not to use. GTM of course is to be commended for implementing the MUMPS Standard!  ;-)

Fred asks:

> What now?

Well, if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands, maybe the VA would be willing to change their (SAC) standard, and test and distribute hundreds of transaction-processing changes to their code. But I doubt it, when they don't even take bug-fixes and functionality enhancements from the outside.

Woodhouse Gregory Date: Mon, 18 Aug 2008 09:47:36 -0700

Production VistA systems normally use journalling. Other measures include the use of RAID and UPS devices. For historical reasons (lack of uniform support across MUMPS implementations) VistA systems have not used transactions. This is no longer the case, but there is plenty of legacy code out there that does not use transactions. Instead, it was/is necessary to restore journaled globals explicitly.

In response to Fred's question: Fileman does not provide ACID support directly: this needs to be handled by the underlying MUMPS system. The role of Fileman is to provide a higher level abstraction than MUMPS globals, and to provide various tools (import/export, reporting, query and update, etc.) Screenman and the Classic APIs also provide (character based) UI support.

Metaphors be with you.

fred trotter Date: Mon, 18 Aug 2008 11:54:55 -0500

K.S. Bhaskar wrote:

> Fred -- You are thinking like a programmer and not like a business person.

No, exactly the opposite.

> As a geek at heart, I keep reminding myself that technology is only a means to an end, and not an end unto itself. VistA (at least DHCP) existed well before ACID properties and seems to run well.

Under the care and feeding of highly trained experts who do nothing else.

My point is not at all that we need ACID, my point is this:

If system crashes require in-depth knowledge of MUMPS/FileMan/VistA to fix, then users cannot treat VistA as a "kernel". By "kernel" I mean a reliable platform whose internal workings can safely be ignored if certain requirements are respected (i.e. the right hardware, MUMPS implementation, etc etc.)

It would be entirely fine for me to have the VistA community say "Back up VistA every hour. If the system crashes, reinstall the most recent good backup, and send an alert that 1 hour's worth of data has been potentially lost"

That's not great... ACID would be better but that is what you had to do with MySQL for a long time and is an acceptable work-around.

Unacceptable answer is "Use your extensive understanding of VistA internal state to correct the values of Globals that were in use at the time of the crash"

That answer implies that you must be a MUMPS expert to support VistA, which is intractable. I am not a C expert but I use the C-based Linux kernel all the time.

I am talking about a business problem in the context of one technical solution, but my concern is about the business problem.

-- Fred Trotter

Woodhouse Gregory Date: Mon, 18 Aug 2008 10:11:17 -0700

On Aug 18, 2008, at 9:44 AM, George Timson wrote:

> Fred Trotter asks: Is it a true statement that ACID compliance for VistA could be implemented entirely in FileMan? Or would it require more fundamental changes in other places?

> No, it is not a true statement, because other VistA code changes the database without going thru FileMan calls.

This is a perennial problem with VistA code. I've long argued that developers should resist the urge to manipulate Fileman globals directly, but even if everyone stopped today, there would still be plenty of code that bypasses Fileman. Another, perhaps more insidious, problem is that developers and systems personnel often manipulate globals to correct errors ("crashes").


> Well, if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands, maybe the VA would be willing to change their (SAC) standard, and test and distribute hundreds of transaction-processing changes to their code. But I doubt it, when they don't even take bug-fixes and functionality enhancements from the outside.

The SAC has been revised to allow the use of TS and TC, but that doesn't address the legacy code problem (the issue you address above).

Steven McPhelan Date: Mon, 18 Aug 2008 13:35:06 -0400

George stated "...if someone wants to fund a man-year of retrofitting all VA code with the TS and TC commands.." I understand that George was making a different point. I do not think that one man year is even close to sufficient time to rewrite all the existing VA code to be TP compliant. To make the changes, QA it, and release it would be a very large task indeed. Then it does no good as George implied to undertake such a task and to not put in place the structure to mandate and enforce that all new code from that point forward would only use TP procedures.

All of this is predicated upon the assumption that load testing of such rewritten code to be TP compliant shows that there is no decrease in the number of the transactions filed per time period without the requirement to upgrade the hardware to handle TP vs non-TP processing. I won't get into the practical issues of how the existing code would handle TP rollbacks because the filing failed. For good or bad, many VistA programs file data and proceed on with no checks to see if the filing of the data was indeed successful.

-- Steve "Rest satisfied with doing well, and leave others to talk of you as they please." - Pythagoras

fred trotter Date: Mon, 18 Aug 2008 12:45:35 -0500

> So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure, I have only moved forward fixing errors as I find them.

Ok, I will make my question more specific. Is this paragraph illustrative of how to handle a crash moving forward? If this is how crashes are handled, then this is a problem. If there is another procedure that can be followed, then it is important enough to have a description on the WorldVistA wiki. Or to have a link from the wiki to an already published solution. To help, I have created the page:

HTH, -FT

fred trotter Date: Mon, 18 Aug 2008 13:07:19 -0500

Going on to discuss the pure technical issue:

Is there no way to do this on a meta level? What about executing TS and TC commands before and after every routine. So that at a minimum you know roughly in which routine the failure took place.

Perhaps you could have some "named idle journal". So that you could automatically roll back to a time when at the least nothing was happening on the system.

Any time I suggest something like this I usually get back that something like this already happens, or Bhaskar tells me that GTM already does something like this. I know I am way way over my head with regards to how MUMPS works....

-- Fred Trotter

Woodhouse Gregory Date: Mon, 18 Aug 2008 11:15:40 -0700

On Aug 18, 2008, at 8:58 AM, K.S. Bhaskar wrote:

> Do the business processes of health care require ACID transaction properties or are the business processes inherently robust in the face of non-Atomicity and non-Consistency? [Isolation and Durability are not at issue here.] If this is the case, is a requirement of ACIDity like requiring brake fluid for restaurants?

> If the answer is that the business processes of health care (at least as addressed by VistA) are not inherently robust in the face of non-Atomicity and non-Consistency, then what mechanisms currently exist in VistA that provide these requirements?

This is interesting. It seems uncontroversial that database integrity is a requirement for health information systems (for example, we wouldn't want a penicillin allergy to be "lost"). In the ACID model, I would be hard pressed to say which of the four properties (atomicity, consistency, isolation and durability) can be dispensed with. But what is less obvious is that the ACID approach is the only route to database integrity. The latest ACM Queue takes this on with a little column whimsically entitled "BASE: an alternative to ACID".

Results like the CAP theorem have interested me for some time, given that I am interested in (developing) alternatives to heavy-handed approaches to database consistency like message ordering (frequently employed in HL7).

Anyway, the CAP theorem is just another version of a well-known dilemma in database programming: in choosing between the 2-phase and 3-phase commit, you are forced to choose between an algorithm that can fail, even when updating the database is safe, and one that can block indefinitely.

"It is never too late to become reasonable and wise; but if the insight comes too late, there is always more difficulty in starting the change." -- Immanuel Kant


Woodhouse Gregory Date: Mon, 18 Aug 2008 11:23:46 -0700

Free associating a bit, I can't help but think of a famous result in (mathematical) model theory called Löb's Theorem. It states that a system cannot assert its own soundness without being inconsistent.

fred trotter wrote:

> Going on to discuss the pure technical issue: Is there no way to do this on a meta level? What about executing TS and TC commands before and after every routine. So that at a minimum you know roughly in which routine the failure took place.

> Perhaps you could have some "named idle journal". So that you could automatically roll back to a time when at the least nothing was happening on the system.

> Any time I suggest something like this I usually get back that something like this already happens, or Bhaskar tells me that GTM already does something like this. I know I am way way over my head with regards to how MUMPS works....

This is a good question. It shouldn't be difficult to write a meta-interpreter of the type you describe, though I'm unsure what the performance implications would be.

Basically, you're running into the legacy code problem. Modern MUMPS implementations do support ACID transactions, but this facility was not available when the bulk of VistA was developed. This has led to a controversy between people arguing that it is not feasible to build transaction support into VistA, and people (like me) that argue that it is essential to do so. Unfortunately, this generally mutates into a highly emotional debate over the use of MUMPS, which is not the point at all.


fred trotter Date: Mon, 18 Aug 2008 14:33:08 -0500

I agree that ACID vs no ACID is probably a waste of time. Any practical suggestions for workarounds for VistA rebuilding?

-- Fred Trotter

Woodhouse Gregory Date: Mon, 18 Aug 2008 12:39:32 -0700

>> So what do you do? When I have lost a server and ended up with the results of an incomplete Fileman call, I had to find the incomplete globals and edit them appropriately. Luckily, for my close calls the end user was available to tell me what they were doing. That made it much easier to find what globals were affected. Thus I have never rolled back through a GT.M journal as a result of server failure, I have only moved forward fixing errors as I find them.

> Ok, I will make my question more specific. Is this paragraph illustrative of how to handle a crash moving forward? If this is how crashes are handled, then this is a problem. If there is another procedure that can be followed, then it is important enough to have a description on the WorldVistA wiki. Or to have a link from the wiki to an already published solution. To help, I have created the page:

It's close - far too close for my comfort. Production systems should always be journaled, but I suspect many people here who may be developers, or who may be just "kicking the tires", may not enable journaling.

"Think globally, act locally." --René Dubos

Chris Richardson Date: Mon, 18 Aug 2008 14:23:35 -0700

Well, guys, there is nothing left to do but contact your congressmen about this and start a grass-roots effort to get this funding. It would be embarrassing if a foreign government might pay for our software to be properly updated.


Branden Tanga Date: Thu, 21 Aug 2008 20:27:34 -0700 (PDT)

Sorry to bring up a seemingly dead topic, but I haven't kept up with this thread over the past few days.

I don't see code that directly edits globals as the major issue. The main problem as I see it, is that having transactional processing built into Fileman is not good enough to be able to safely roll back and forward through a VistA log. In the same way that a single Fileman call is made up of multiple Mumps sets and kills, a single VistA transaction can be made up of multiple Fileman calls. So you would need code in VistA itself that defines what a "transaction" is. Likely, this definition would be different for each module in VistA. There is no way for a pure programmer like me to denote VistA transactions, you would need domain experts for each module to mark which action or group of actions are a transaction.

I totally agree, my solution is not optimal. When I had a server failure, I was faced with 2 options:

  1. Figure out where in the GT.M journal to roll back to
  2. Figure out how to fix the globals manually, and move on.

Because of the risk that rolling back through the journal may cause other Fileman calls to be incomplete, and the ridiculous amount of time it would take to figure out which exact GT.M set or kill I needed to roll back to, I chose #2. I talked to my end users to figure out what they were doing, edited the necessary globals, and if their actions were "finished", then I considered the database recovery as complete as possible. In short, I had to choose the lesser of 2 evils, which was to fix the globals manually and move on.

Skip Ormsby Date: Fri, 22 Aug 2008 07:01:30 -0400

If my creaky old brain remembers correctly, one of the reasons for non-transaction processing is code like this (from before the prevention of unsubscripted Kills and the New command were implemented, although there are still plenty of times the Kill is used):

  • S DIC=4,DIC(0)="AEMQZ" D ^DIC
  •  ;Now being a good developer since I am making a Classic call I need to do local variable clean up
  • K ^DIC ; Ahh oops

Generally the favorites were ^DD, ^DIC, and ^DPT in no particular order. Solution - read the journal until you find the unsubscripted Kill and clip it out. It may take X amount of time before you actually notice that something has disappeared, so you would have journal activity that needs to be applied post the unsubscripted Kill.
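As a rough illustration of "find the unsubscripted Kill and clip it out", here is a small Python sketch. It assumes a made-up, simplified journal of text lines of the form `SET <ref>=<value>` and `KILL <ref>`; this is not the real journal extract format of any MUMPS implementation, and the function names are hypothetical.

```python
# Hypothetical sketch over a simplified, invented journal format.

def is_unsubscripted_kill(record):
    """True for records such as 'KILL ^DIC' that name a whole global."""
    op, _, ref = record.partition(" ")
    return op == "KILL" and "(" not in ref


def clip_unsubscripted_kills(records):
    """Split the journal into records to replay and records clipped out."""
    kept, clipped = [], []
    for record in records:
        (clipped if is_unsubscripted_kill(record) else kept).append(record)
    return kept, clipped


def replay(records, database=None):
    """Apply journal records, in order, to a copy of the restored backup."""
    db = dict(database or {})
    for record in records:
        op, _, rest = record.partition(" ")
        if op == "SET":
            ref, _, value = rest.partition("=")
            db[ref] = value
        elif op == "KILL":
            # A kill removes the node and everything beneath it.
            prefix = rest[:-1] + "," if "(" in rest else rest + "("
            for key in [k for k in db if k == rest or k.startswith(prefix)]:
                del db[key]
    return db


journal = [
    "SET ^DIC(4,0)=INSTITUTION",
    "SET ^DPT(1,0)=DOE,JOHN",
    "KILL ^DIC",                # the "Ahh oops" line
    "SET ^DPT(2,0)=ROE,JANE",   # activity after the Kill still has to be applied
]

kept, clipped = clip_unsubscripted_kills(journal)
recovered = replay(kept)
print(clipped)             # ['KILL ^DIC']
print(sorted(recovered))   # ['^DIC(4,0)', '^DPT(1,0)', '^DPT(2,0)']
```

A real recovery would review each clipped record by hand rather than dropping every unsubscripted Kill automatically, since some of them may have been intentional.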

-skip "we have met the enemy and he is us." - Pogo

Steven McPhelan Date: Fri, 22 Aug 2008 08:29:59 -0400

That is how I always handled this problem in the past, though that was long ago; I have not had to do this in years. That is the purpose of the journal: to bring a backup copy up to date with all the transactions since that backup by dejournaling. If there was something I knew I did not want to happen (Skip's K ^DIC example), we would edit the journal file to remove the offending code and then proceed with the normal dejournaling procedures. Of course, if you are not journaling, then you do not have this option.

I have not looked in years: are the journal files still just text files, or have they been "updated and improved"?

K.S. Bhaskar Date: Fri, 22 Aug 2008 10:08:17 -0400

Branden --

There was some off-list discussion of this topic. To summarize, when the VA runs VistA, they rely on the MUMPS implementation to restore the database to the state it was in just before the crash (of course, they use computer hardware and operating systems that don't make a habit of crashing). A combination of their business processes and VistA application logic is such that after a crash, they don't usually need to go in and make changes - in other words, their business processes are in a good state when the MUMPS system recovers the database and VistA is restarted.

I speculate that when units of work need to be done, they are put in queues (in the database) for Taskman background processes to handle, and the design of Taskman is such that when the database is recovered, it picks up unfinished work from the queues. But this is just a guess.

Regards -- Bhaskar


Steven McPhelan Date: Fri, 22 Aug 2008 16:11:28 -0400

If the taskman globals are journaled, they will be recovered and Taskman will start where he left off. However, any existing jobs running at the time of the crash will not be restarted.
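The behavior described here can be pictured with a small Python sketch. It is only an illustration of the idea: the task table layout, the status names, and `restart_after_crash` are assumptions made up for this example and do not reflect Taskman's actual globals or logic.

```python
# Hypothetical sketch: the task table and status values are invented.

def restart_after_crash(task_global):
    """Decide what a restarted scheduler does with a recovered task table.

    Queued tasks are picked up again. Tasks that were running when the
    system went down are not restarted; they are only flagged so that a
    person can decide whether to requeue them.
    """
    to_run, interrupted = [], []
    for task_id in sorted(task_global):
        status = task_global[task_id]["status"]
        if status == "queued":
            to_run.append(task_id)
        elif status == "running":
            task_global[task_id]["status"] = "interrupted"
            interrupted.append(task_id)
    return to_run, interrupted


# State of the task table as recovered from the journal.
recovered_tasks = {
    101: {"name": "nightly report", "status": "done"},
    102: {"name": "print labels", "status": "running"},
    103: {"name": "send bulletin", "status": "queued"},
    104: {"name": "purge old tasks", "status": "queued"},
}

to_run, interrupted = restart_after_crash(recovered_tasks)
print(to_run)       # [103, 104]
print(interrupted)  # [102]
```

If the task table were not journaled, there would be nothing to hand to a function like this after the crash, which is the case the first sentence above rules out.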

Skip Ormsby Date: Fri, 22 Aug 2008 20:15:58 -0400

As long as the subject is about power outages, at the hospital I was at we had no-break power for 8 PDP 11-44s that would run the critters for 15-20 minutes, which was long enough for either the main generator to kick in or for one of us to gracefully shut the systems down. When we went to the MSM/486 configuration, we made sure that all of the parallel bars were zip-tied and the plugs were zip-tied so they didn't accidentally come out of the plug. And of course the no-break power would last a very long time, but we never pushed it past 1/2 hour. The biggest problem was more in line with a disk controller going nuts, or a NIC that would go berserk, or bad memory, which in turn would put crud into the database. For us it was a case of fix it and forget it, because there were other fish to fry. Never did have a real power outage to the computer circuit, even when the room was flooded from a broken pipe in the ceiling. Lost the lights, etc., but the computers kept right on humming until we shut them down in a speedy, graceful shutdown.

-skip "we have met the enemy and he is us." - Pogo
