-1- Minutes of the 1st meeting - Montpellier, September 5.
-2- First status report dated 20 September.
-3- Minutes of the 2nd meeting - Montpellier, December 6.
The above documents are appended.
-1- CNUSC will send an organization chart to be included in the final report.
-2- CNUSC will define an emergency procedure available from any node 24 hours a day, 7 days a week.
-3- CNUSC will define a new procedure for sending messages to LINKFAIL.
-4- Eric Thomas will change the 'REPLY' option of LINKFAIL to 'REPLY to SENDER'.
-5- CNUSC and CUNY will coordinate their technique for restarting the line. The objective is to have less than 5% downtime (monthly average). If a regular restart is chosen, the interval between restarts should be 5 minutes or less.
-6- It is recommended that CNUSC investigate the possibility of having a dedicated computer as the EARN international node.
-7- CNUSC is requested to produce statistics on line downtime (including CPU downtime).
-8- CNUSC is asked to look into the possibility of having trained operators 24 hours a day, 7 days a week.
-9- CNUSC and CERN are requested to investigate the problem of lines remaining idle even when queues are not empty, and to propose a solution.
-10- It is recommended that CERN use RSCS V2 to communicate on lines with RSCS V2 at the other end.
-11- CNUSC is requested to suppress priority aging on files queued on international lines and, more generally, to implement all EARN directives.
-1- Done. See previous minutes/status report.
-2- Done. See previous minutes/status report.
-3- Done since January 15, 1989.
-4- Done. See previous minutes/status report.
-5- Done. See previous minutes/status report.
    Notes:
    A) CNUSC restarts the line every 3 minutes and CUNY every 5 minutes.
    B) Due to satellite problems, the line between FRMOP22 and CUNYVM goes down at least once every day (but not for long).
-6- No immediate action. See first status report. Checkpoint next April.
-7- LMON, Eric Thomas's procedure, cannot easily be adapted to JES2. Other approaches (such as NETMASTER) are being investigated; no implementation date is available yet. Checkpoint next April.
-8- Not possible. See first status report for more details.
-9- The problem has disappeared since the 64 Kb/s line came into use on February 2nd.
-10- Done. See previous minutes/status report.
-11- Done. See previous minutes/status report, and the report on implementation of directives/recommendations.
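The 5% objective in action -5- translates into a concrete monthly budget. The following Python sketch is illustrative only: it assumes a 30-day month and outages logged as durations in minutes, and the function names are hypothetical, not part of any EARN tool. It compares a month of daily satellite drops, each assumed to be cleared within one 5-minute restart interval, with a single weekend outage lasting from Saturday 13:00 to Monday 6:00, the window when FRMOP22 has no operator.

```python
# Sketch only: assumes a 30-day month and outages recorded in minutes.
MONTH_MINUTES = 30 * 24 * 60  # 43200 minutes
OBJECTIVE_PERCENT = 5         # "less than 5% downtime (monthly average)"


def downtime_budget(period=MONTH_MINUTES):
    """Largest downtime, in minutes, that stays within the objective."""
    return period * OBJECTIVE_PERCENT // 100


def downtime_fraction(outages, period=MONTH_MINUTES):
    """Fraction of the period during which the line was down."""
    return sum(outages) / period


def meets_objective(outages, period=MONTH_MINUTES):
    """True if total downtime is strictly below the 5% objective."""
    return downtime_fraction(outages, period) < OBJECTIVE_PERCENT / 100


# Hypothetical month: one drop per day, each cleared within a 5-minute restart.
daily_drops = [5] * 30
# Hypothetical weekend outage: Saturday 13:00 to Monday 6:00 = 41 hours.
weekend_outage = [41 * 60]

print(downtime_budget())                           # 2160 (36 hours)
print(f"{downtime_fraction(daily_drops):.2%}")     # 0.35%
print(meets_objective(daily_drops))                # True
print(f"{downtime_fraction(weekend_outage):.2%}")  # 5.69%
print(meets_objective(weekend_outage))             # False
```

In this model, frequent short drops cleared by regular restarts stay well inside the budget, while one unattended weekend outage exceeds it on its own.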
Participants:
J.L. Ambrosino       CNUSC - Montpellier
Michel Auffret       CNUSC - Montpellier
Alain Auroux         EARN - Paris
Jose Maria Blasco    GMD - Bonn
Jean-Loic Delhaye    CNUSC - Montpellier
Olivier Martin       CERN - Geneve
Eric Thomas          CERN - Geneve
As Montpellier is the most important EARN international node, with nearly 50% of the international lines, including both 64 Kb/s lines, the availability of this node should be very high, particularly on the most heavily used lines (to the USA, CERN and Germany).
In the past few months, there have been complaints about the availability of this node, and more precisely about the availability of the lines to CUNY and to CERN.
The objective of this meeting was to assess the validity of these complaints, review the actions already taken by Montpellier, and recommend further actions if needed.
-1- Difficulty contacting people in Montpellier with technical networking responsibilities for EARN. J.M. Blasco distributed an organization chart of GMD Bonn showing the responsibilities of the various people involved in EARN.
ACTION: J.L. Delhaye will provide the same information for Montpellier.
-2- When a problem in Montpellier is detected from another node, it is difficult to get an answer or explanation from Montpellier when Dominique Dumas is absent (weekends, nights...).
ACTION: Montpellier will define an emergency procedure available from any node 24 hours a day. This procedure will not rely on the telephone, as many people have difficulty calling foreign telephone numbers.
-3- Messages posted on LINKFAIL by Montpellier are not explanatory enough and are sent using a standard form.
ACTION: Montpellier will define a new procedure for sending messages to LINKFAIL:
  - Reasons for failures will be explained.
  - Messages will be signed, and it will be possible to reply to the sender (for example, to ask for additional information).
  - Scheduled downtime (maintenance) will be announced only once.
  - Link outages shorter than one hour will not be reported.
  - Unscheduled link outages longer than one hour will be explained afterwards (at least on the major links and during prime shift).
ACTION: Eric Thomas will change the 'REPLY' option of LINKFAIL to 'REPLY to SENDER'.
-4- On the MOP-CUNY line, the two ends use different techniques to restart the line (MOP restarts it every 10 minutes, CUNY only when a cut is detected). This leads to unacceptable downtime.
ACTION: CNUSC and CUNY will coordinate their technique for restarting the line. The objective is to have less than 5% downtime (monthly average). If a regular restart is chosen, the interval between restarts should be 5 minutes or less.
-5- The MOP-CERN line was not reliable enough during the past months. A new communication controller was installed at CERN on 17 August. No specific action is needed.
-6- Based on the experience of other international nodes, a small dedicated computer is generally more reliable than a large general-purpose computer. As FRMOP22 is the EARN international node with the most traffic and the largest number of lines, it would be better to have all EARN international lines in Montpellier on a small dedicated computer.
RECOMMENDATION: It is recommended that CNUSC investigate the possibility of having a dedicated computer as the EARN international node.
ACTION: CNUSC is requested to produce statistics on line downtime (including CPU downtime).
-7- FRMOP22 operators are present only from 6:00 to 20:00 on weekdays and from 6:00 to 13:00 on Saturdays. There is no operator on Sundays.
ACTION: CNUSC is asked to look into the possibility of having trained operators 24 hours a day, 7 days a week.
-8- There is a protocol problem on the CNUSC-CERN line: after a line failure and a restart from CERN, JES sometimes fails to re-establish the connection. The line is then up but in a wait state at both ends, and no files are sent, even when the queues are not empty. Automatic restart procedures are useless in this situation; the operators at both ends must talk to each other. If this happens when no operator is present at CNUSC, the line stays down until the next day (or until the next Monday during a weekend).
ACTION:
  - See point -7- above.
  - CNUSC and CERN are requested to investigate this problem and to find a solution.
  - It is recommended that CERN use RSCS V2 to communicate on lines with RSCS V2 at the other end.
-9- CNUSC uses priority aging on files queued on the EARN lines: this may cause a huge file to be sent before smaller ones, which conflicts with EARN directive number 1, adopted by the BoD in April 1988.
ACTION: CNUSC is requested to suppress priority aging on files queued on international lines and, more generally, to implement all EARN directives.
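The conflict in point -9- comes from how aging reorders a queue. The following Python sketch contrasts two orderings of the same queue. The aging rule (one priority point per 10 minutes of waiting) is a hypothetical stand-in, not the actual JES2 parameter, and smallest-first is assumed to be the ordering directive number 1 favours, since the minutes only state that sending a huge file before smaller ones conflicts with it.

```python
from dataclasses import dataclass


@dataclass
class QueuedFile:
    name: str
    size_kb: int
    waited_min: int  # minutes already spent in the queue


def aged_priority(f, base=0, step_min=10):
    """Hypothetical aging rule: one extra point per `step_min` minutes waited."""
    return base + f.waited_min // step_min


def order_by_aging(queue):
    """Highest aged priority is sent first, whatever the file size."""
    return sorted(queue, key=aged_priority, reverse=True)


def order_small_first(queue):
    """Smallest file is sent first (assumed intent of directive number 1)."""
    return sorted(queue, key=lambda f: f.size_kb)


queue = [
    QueuedFile("BIGDUMP", 20_000, 180),
    QueuedFile("NOTE", 2, 5),
    QueuedFile("MAIL", 8, 15),
]
print([f.name for f in order_by_aging(queue)])     # ['BIGDUMP', 'MAIL', 'NOTE']
print([f.name for f in order_small_first(queue)])  # ['NOTE', 'MAIL', 'BIGDUMP']
```

Once the large file has waited long enough, aging pushes it ahead of everything else, so the short files queued behind it wait for the whole transfer.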
-1- CNUSC will send an organization chart to be included in the final report. Due date: 12 Sept.
-2- CNUSC will define an emergency procedure available from any node 24 hours/day, 7 days/week. Due date for procedure definition: 12 Sept.
-3- CNUSC will define a new procedure for sending messages to LINKFAIL. Due date for procedure definition: 12 Sept.
-4- Eric Thomas will change the 'REPLY' option of LINKFAIL to 'REPLY to SENDER'.
-5- CNUSC and CUNY will coordinate their technique for restarting the line. The objective is to have less than 5% downtime (monthly average). If a regular restart is chosen, the interval between restarts should be 5 minutes or less.
-6- It is recommended that CNUSC investigate the possibility of having a dedicated computer as the EARN international node.
-7- CNUSC is requested to produce statistics on line downtime (including CPU downtime).
-8- CNUSC is asked to look into the possibility of having trained operators 24 hours a day, 7 days a week.
-9- CNUSC and CERN are requested to investigate the problem of lines remaining idle even when queues are not empty, and to propose a solution.
-10- It is recommended that CERN use RSCS V2 to communicate on lines with RSCS V2 at the other end.
-11- CNUSC is requested to suppress priority aging on files queued on international lines and, more generally, to implement all EARN directives.
-1- CNUSC will send an organization chart to be included in the final report. Due date: 12 Sept.
Done. Enclosed are the organization charts from GMD - Bonn, Montpellier and CERN.
GMD - Bonn:
Name                Function                              EARN address       Phone
Jose Maria Blasco   All DEARN                             JMBLASCO@DEARN     +49 228/81996-51
Manfred Bogen       Concerns VM, RSCS, LISTSERV, EARNCC   MABOGEN@DEARN      +49 228/81996-50
Jochen Hirsch       Lines, NCP system generation          GRZ028@DBNGMD21    +49 228/81996-44
Peter Sylvester     EARN under MVS, NJE, JES2             GRZ027@DBNGMD21    +49 228/81996-45
Peter Wunderling    SNI, SNA, XI, X.25, AGFNet            GRZ017@DBNGMD21    +49 228/81996-46
Klaus Birkenbihl    Director of Computing Center          GRZ003@DBNGMD21    +49 228/81996-41
Montpellier:
Name                  Function                          EARN address      Phone
Dominique Dumas       EARN Country Coordinator (NCC)    BRUCH@FRMOP11     +33 67 14 14 14
Jean-Paul Sauter      Telecom. lines and NCP, SNA, XI   SAUTER@FRMOP11    +33 67 14 14 14
Jean Oudeville        etc.                              JEAN@FRMOP11      +33 67 14 14 14
Jean-Louis Ambrosino  Operations manager                AMBROSI@FRMOP11   +33 76 14 14 14
Jean-Loic Delhaye     Technical director                DELHAYE@FRMOP11   +33 76 14 14 14
Jean-Claude Ippolito  Director of CNUSC                 IPPOLIJ@FRMOP11   +33 67 14 14 14
CERN:
Name                     Function                          EARN address    Phone
Olivier Martin           All CEARN and CERN                MARTIN@CEARN    +41 22 83-4894
                         with respect to the EARN service
                         at CERN, in general.
Central CERN Operations  Everything (e.g. line problems)   CONSOLE@CERNVM  +41 22 83-5011
                         should normally go through them.
Eric Thomas              On a voluntary basis, only!       ERIC@LEPICS     +41 22 83-4992
                         (LISTSERV, etc.)                  ERIC@CEARN
Dave Underhill           CERN Operations (IBM)             DJUCT@CERNVM    +41 22 83-4920
Mick Draper              Network Operations                MICK@CERNVM     +41 22 83-3348
-2- CNUSC will define an emergency procedure available from any node 24 hours/day, 7 days/week. Due date for procedure definition: 12 Sept.
PUPITRE@FRMOP11 will be the emergency contact, starting 1st of October. The owner of this network address will be responsible for any further action needed.
-3- CNUSC will define a new procedure for sending messages to LINKFAIL. Due date for procedure definition: 12 Sept.
Starting 1st of November, the above-mentioned network address (PUPITRE@FRMOP11) will be the LINKFAIL interface. Operators are currently being trained to send messages to LINKFAIL efficiently and to use the information it provides.
-4- Eric Thomas will change the 'REPLY' option of LINKFAIL to 'REPLY to SENDER'.
Done. All LINKFAIL peers are now Reply-To=Sender.
-5- CNUSC and CUNY will coordinate their technique for restarting the line. The objective is to have less than 5% downtime (monthly average). If a regular restart is chosen, the interval between restarts should be 5 minutes or less.
Starting 1st of October, Montpellier will restart all EARN international lines every 5 minutes. Montpellier has asked CUNY to react to line failures as quickly as possible.
-6- It is recommended that CNUSC investigate the possibility of having a dedicated computer as the EARN international node.
After a six-month period, Montpellier will review the results of the actions taken, which should improve the availability of the Montpellier international lines. If the improvement is not good enough (downtime higher than 5% on major lines), Montpellier will study the possibility of having a dedicated computer as the EARN international node. It should be noted that the implementation of the OSI migration plan will have an impact on the infrastructure.
-7- CNUSC is requested to produce statistics on line downtime (including CPU downtime).
This will be done starting 1st of October.
-8- CNUSC is asked to look into the possibility of having trained operators 24 hours a day, 7 days a week.
For budget reasons, this is not possible.
However, night operators will be trained to take simple actions and to call the trained operator on duty at any time if his presence is needed.
-9- CNUSC and CERN are requested to investigate the problem of lines remaining idle even when queues are not empty, and to propose a solution.
Since the meeting, the cause of the problem seems to have been identified (monitoring of NACKs), and remedies will be applied before the 1st of October.
-10- It is recommended that CERN use RSCS V2 to communicate on lines with RSCS V2 at the other end.
RSCS V2 is already installed at CEARN and will be operational soon.
-11- CNUSC is requested to suppress priority aging on files queued on international lines and, more generally, to implement all EARN directives.
This will be done no later than 1st of October.
Participants:
J.L. Ambrosino
M. Auffret
A. Auroux
D. Dumas
J. Oudeville
J.P. Sauter
References: Minutes of the 1st meeting and first status report dated 20 September.
-1- CNUSC will send an organization chart to be included in the final report.
-2- CNUSC will define an emergency procedure available from any node 24 hours a day, 7 days a week.
-3- CNUSC will define a new procedure for sending messages to LINKFAIL.
-4- Eric Thomas will change the 'REPLY' option of LINKFAIL to 'REPLY to SENDER'.
-5- CNUSC and CUNY will coordinate their technique for restarting the line. The objective is to have less than 5% downtime (monthly average). If a regular restart is chosen, the interval between restarts should be 5 minutes or less.
-6- It is recommended that CNUSC investigate the possibility of having a dedicated computer as the EARN international node.
-7- CNUSC is requested to produce statistics on line downtime (including CPU downtime).
-8- CNUSC is asked to look into the possibility of having trained operators 24 hours a day, 7 days a week.
-9- CNUSC and CERN are requested to investigate the problem of lines remaining idle even when queues are not empty, and to propose a solution.
-10- It is recommended that CERN use RSCS V2 to communicate on lines with RSCS V2 at the other end.
-11- CNUSC is requested to suppress priority aging on files queued on international lines and, more generally, to implement all EARN directives.
The resources needed to implement some of the recommended actions were initially underestimated, and the corresponding implementation dates have slipped.
CNUSC now commits to meeting the new dates below.
-1- Done. The chart was included in the first status report (20 September).
-2- Done. PUPITRE@FRMOP11 has been the emergency contact since November 1st. However, it is not used very much. A mail is sent to LINKFAIL on December 15 to inform everybody.
-3- For the moment, messages to LINKFAIL are sent from NETOPER at FRMOP11 through an automatic procedure initiated by the operator. No mail should be sent to this ID. A new procedure for sending messages from PUPITRE at FRMOP11 will be defined and operational on January 1st, 1989, and operators will be trained to use it.
-4- Done. See the 1st status report.
-5- Done. Since the 1st of November, CNUSC restarts the line with CUNY every 3 minutes. Since the 1st of December, CUNY restarts the line every 5 minutes. CUNY's one-month lag was due to its lack of response to CNUSC's initial requests.
    Note: Due to satellite problems, the line between FRMOP22 and CUNYVM goes down at least once every day (but not for long).
-6- No immediate action. See first status report. Checkpoint next April.
-7- LMON, Eric Thomas's procedure, will be installed on FRMOP11 next January and tested in February, and the results will be published every month starting in March.
-8- Not possible. See first status report for more details.
-9- The cause of the problem on the 9600 b/s line between CEARN and FRMOP22 has not been identified by either CEARN or CNUSC, but the problem should be solved by the SNI 64 Kb/s line.
-10- Done. See the 1st status report.
-11- CNUSC suppressed priority aging on October 1st. Directive number 3 cannot be applied by JES2 sites: JES2 is capable of multi-streaming, but it is impossible to dedicate a stream to large files. Holding large files during the day is not technically possible today.
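Directive number 3 is not quoted in these minutes; from point -11-, it appears to call for a stream dedicated to large files, or for holding large files during the day. The following Python sketch models that policy only to make it concrete: the 1000 KB threshold and the 08:00-18:00 prime shift are hypothetical values, and nothing here describes JES2, which the minutes say cannot do this.

```python
LARGE_KB = 1000             # hypothetical size threshold for a "large" file
PRIME_SHIFT = range(8, 18)  # hypothetical prime-shift hours, 08:00-17:59


def is_large(size_kb):
    return size_kb >= LARGE_KB


def assign_stream(size_kb):
    """Stream 1 carries ordinary files; stream 2 is reserved for large ones."""
    return 2 if is_large(size_kb) else 1


def may_send(size_kb, hour):
    """Large files are held during prime shift; all other files may go."""
    return not (is_large(size_kb) and hour in PRIME_SHIFT)


def schedule(files, hour):
    """Map each stream to the files it may send at `hour`, smallest first."""
    streams = {1: [], 2: []}
    for name, size_kb in sorted(files, key=lambda f: f[1]):
        if may_send(size_kb, hour):
            streams[assign_stream(size_kb)].append(name)
    return streams


files = [("BIGDUMP", 20_000), ("NOTE", 2), ("MAIL", 8)]
print(schedule(files, hour=10))  # {1: ['NOTE', 'MAIL'], 2: []}
print(schedule(files, hour=22))  # {1: ['NOTE', 'MAIL'], 2: ['BIGDUMP']}
```

Holding alone keeps a huge file off the line during the day; the dedicated stream additionally prevents it from blocking small files once it is released at night.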