Showing papers in "International Journal of Human-computer Studies / International Journal of Man-machine Studies in 1970"


Journal Article
TL;DR: The logarithmic characteristics of the acoustic signal in five bands are extracted as features, and the measure of similarity between the words of the standard and control sequences is calculated by maximizing a definite functional using dynamic programming.
Abstract: Experiments on the automatic recognition of 203 Russian words are described. The experimental vocabulary includes terms of the language ALGOL-60, together with others. The logarithmic characteristics of the acoustic signal in five bands are extracted as features. The measure of similarity between the words of the standard and control sequences is calculated by maximizing a definite functional using dynamic programming. The average reliability of recognition for one speaker, obtained in experiments using 5000 words, is 0·95. The computational time for recognition is 2-4 sec.

214 citations
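
The dynamic-programming similarity measure mentioned in the entry above can be illustrated with a minimal sketch. The abstract does not state the functional that was maximized, so this example assumes a simple one: the sum of negative Euclidean distances between aligned frames, maximized over monotonic alignments. The names `log_band_features`, `frame_similarity`, `dp_similarity` and `recognize` are hypothetical and are not taken from the paper.

```python
import math

def log_band_features(band_energies):
    """Take the logarithm of each band energy (assumed feature form)."""
    return [[math.log(e + 1e-9) for e in frame] for frame in band_energies]

def frame_similarity(a, b):
    """Negative Euclidean distance: larger means more similar (assumption)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dp_similarity(ref, test):
    """Maximum total frame similarity over monotonic alignments."""
    n, m = len(ref), len(test)
    neg_inf = float("-inf")
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best predecessor: diagonal step, or stretch either sequence.
            best_prev = max(score[i - 1][j - 1], score[i - 1][j], score[i][j - 1])
            score[i][j] = best_prev + frame_similarity(ref[i - 1], test[j - 1])
    return score[n][m]

def recognize(templates, test):
    """Return the label of the stored template most similar to the test word."""
    return max(templates, key=lambda word: dp_similarity(templates[word], test))
```

In this sketch a spoken word is a list of frames and each frame is a list of five log band energies; `recognize` compares the unknown word against every stored standard and returns the best-scoring label.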



Journal Article
TL;DR: A new definition of “recognition” is discussed: this is based on the generation of a stable oscillation in the feedback network, and some tentative comparisons between digital learning nets and the human brain are drawn.
Abstract: The paper initially discusses some of the differences between conventional computer learning algorithms and digital learning nets. Particular attention is given to the division of all sensory information into pattern information and identity information and the relationships of these quantities during a learning operation. Generalization in learning nets is discussed where digital nets are compared with threshold elements. The enhancement of the net's behaviour by means of feedback is the salient topic in the second half of the paper. A new definition of “recognition” is discussed: this is based on the generation of a stable oscillation in the feedback network. Experiments on small learning nets are described to illustrate that nets with feedback can recognize and recall sequences. The final section of the paper draws some tentative comparisons between digital learning nets and the human brain.

8 citations


Journal Article
TL;DR: The program automatically generates its own problems in simple arithmetic; evaluates responses to questions by students; provides hints when possible; and uses teaching strategy options and the student's past history in order to decide the type of question to ask him next.
Abstract: This paper describes a digital computer program (COACH) written in Extended ALGOL that operates on the Burroughs B5500 Computer. The program automatically: (1) generates its own problems in simple arithmetic; (2) evaluates responses to questions by students; (3) provides hints when possible; and (4) uses teaching strategy options and the student's past history in order to decide the type of question to ask him next. The specific problem difficulty metrics, problem generation methods, and teaching strategies used in COACH are explained in detail. The program operates interactively and teletypes are used for on-line communication between the student and the computer.

8 citations


Journal Article
TL;DR: The use of a computer-standard environment system as a possible solution to the shortage of therapists/researchers in the area of psychopathology is examined in this paper, and the use of such a system could be a solution to many other problems associated with both research and treatment.
Abstract: The shortage of therapists/researchers in the area of psychopathology is examined, and the use of a computer-standard environment system as a possible solution to this shortage and to many other problems associated with both research and treatment is considered. Current applications of the computer and other automatic devices in psychotherapy are reviewed, including work with autistic children, phobics and juvenile delinquents. A unified assessment-treatment approach is discussed within the context of a computer-standard environment system, considering limitations imposed by the current technology.

7 citations


Journal Article
TL;DR: NASA has undertaken a comprehensive program of research on the major aspects of computer handling of speech, including studies of what minimal requirements speech input/output systems must satisfy and how useful certain limited speech recognizers and synthesizers may be.
Abstract: Speech communication with computers is a vital new subject of research which should markedly contribute to versatile man-machine interaction. A comprehensive long-range program must be undertaken if the day is to come when the computer will satisfactorily duplicate human speech communication processes and engage in purposeful, meaningful dialogue with man. NASA has undertaken a comprehensive program of research on the major aspects of computer handling of speech. Included are studies of what minimal requirements speech input/output systems must satisfy and how useful certain limited speech recognizers and synthesizers may be. Studies of automatic speech recognition sponsored by NASA include the LISPER limited speech recognition system, fundamental time-based techniques for speech parameter extraction, experimental comparison of speech parameter extractors, orthogonalized damped sinusoidal analysis of speech, speech perception models, and recognition algorithms. Continuous speech processing studies are being directed towards basic prosodic/prosodemic studies and methods for the segmentation of speech. Speech synthesis studies are being delayed until significant success in recognition is attained. Successful systems for man-machine speech dialogue must rely on extensive continued studies of requisite linguistic models and human communication processes. The broad-based approaches taken in the NASA program should lead to truly versatile speech communication with computers.

7 citations


Journal Article
Robert J. Baron
TL;DR: A classification of various clinical syndromes is suggested which, although neither complete nor unique, forms a reasonable basis from which to develop a better understanding of the mechanisms which underlie the syndromes.
Abstract: A model for the elementary visual processing networks of the human brain is presented. Two classes of networks are considered: information processing networks, and control networks. The information processing networks select and transform regions of the visual field, and deliver the resultant information patterns for storage, recognition and recall by the permanent memory store. Each of these functions is performed continuously in time. The control networks regulate the flow of information through and determine the specific transformation to be performed by the information processing networks. All decisions regarding information processing are made by the control networks. Specific emphasis is placed on the logic and control algorithms which govern the behavior of the system. Prominent features of the model are: (1) the selection control networks and the memory store are able to directly interact during a visual search; (2) the selection networks are able to generate perspective transformations of the visual field depending on how previous images were stored; (3) control information is stored by the memory system which describes exactly how the current visual images are abstracted from the visual field; and (4) higher order networks are able to access specific stored sensory and control information by making inputs to the permanent memory store. Each of these features is possible because storage, recognition and recall are independent and externally regulated functions of the memory store. The excellent capacity which the model has for storage and recognition is a direct consequence of its structure and organization. Various clinical syndromes of the visual system are considered. It is shown that similar syndromes result from damage to the proposed model in a very natural way. 
Based on the model, a classification of various clinical syndromes is suggested which, although neither complete nor unique, forms a reasonable basis from which to develop a better understanding of the mechanisms which underlie the syndromes. Computer simulations are described which show the behavior of the proposed model. Two scan strategies, one for general search and one for exact identification of complex visual patterns, are simulated. The regions selected during a visual scan depend on the nature of the scan algorithm. Using a fixed algorithm, the scanned regions vary quite notably when, for example, noise is added to the input image. The fidelity of recall depends on how the visual field is scanned: the closer a particular region is attended, the more accurate is the recalled image. The system simulated here is able to identify complex visual images even in the presence of noise.

7 citations


Journal Article
TL;DR: The results of a demonstration of a simple segmentation algorithm show that speech segmentation as defined is possible by non-human means.
Abstract: A brief argument is presented for the need for automatic speech segmentation both to facilitate automatic speech recognition and for its theoretical linguistic importance. The problem of speech segmentation in the acoustic domain using a digital computer is examined in detail, that is, of determining an acoustic partition in time which has linguistic relevance. This problem is viewed, in more general terms, as that of detecting transitions, in a globally non-stationary process, from one local stationary state to another. Non-stationary analyses are approximated by considering short fixed length time series sections as seen through a window which moves by a fixed increment. Various non-stationary signal representations are explored in order to establish a feature space suitable for applications to segmentation. Spectral representations are generated only as a reference space for comparison of an automatic segmentation procedure with the linguistically determined segmentation of any given speech sample. Temporal representations of the zero crossings of speech signals are explored in detail. In particular the central sample moments of the reciprocal zero crossings as a function of time are used as input to a simple segmentation algorithm. The results of a demonstration of this algorithm show that speech segmentation as defined is possible by non-human means.

6 citations
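
The zero-crossing approach described in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it computes the mean and the second central moment of the reciprocal zero-crossing intervals in a fixed-length window that moves by a fixed increment, and it flags a boundary wherever the mean changes by more than a threshold between consecutive windows. The threshold rule and the names `zero_crossings`, `window_moments` and `find_boundaries` are hypothetical.

```python
import math

def zero_crossings(signal):
    """Indices i where the sign changes between sample i-1 and sample i."""
    return [i for i in range(1, len(signal))
            if (signal[i - 1] < 0) != (signal[i] < 0)]

def window_moments(signal, width, step):
    """Per window: (start, mean, variance) of reciprocal crossing intervals."""
    moments = []
    for start in range(0, len(signal) - width + 1, step):
        zc = zero_crossings(signal[start:start + width])
        recips = [1.0 / (b - a) for a, b in zip(zc, zc[1:])]
        if recips:
            mean = sum(recips) / len(recips)
            var = sum((r - mean) ** 2 for r in recips) / len(recips)
        else:
            # Fewer than two crossings: no interval to measure.
            mean, var = 0.0, 0.0
        moments.append((start, mean, var))
    return moments

def find_boundaries(moments, threshold):
    """Window starts where the mean jumps by more than the threshold (assumed rule)."""
    return [cur[0] for prev, cur in zip(moments, moments[1:])
            if abs(cur[1] - prev[1]) > threshold]
```

For a test signal made of a low-frequency sinusoid followed by a high-frequency one, the mean reciprocal interval rises sharply at the junction, which is the kind of transition between local stationary states that the abstract describes.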


Journal Article
David J. Hall
TL;DR: This paper reviews 17 projects briefly, and attempts to draw some conclusions about the basic techniques in this field of man-machine-graphics research.
Abstract: Stanford Research Institute has been actively engaged in man-machine-graphics research for several years. During that time, government and commercial projects have been carried out that utilize new techniques, resulting in the nucleus of a new body of knowledge. This paper reviews 17 projects briefly, and attempts to draw some conclusions about the basic techniques in this field. Software and hardware aspects are considered, and some indications are given about trends that might be expected in the future. Some applications discussed are mass data reduction, text manipulation, production scheduling, multivariate data analysis and graphic data presentation, speech synthesis, robot navigation strategy, meta-compiler systems and high-level-language design for interactive displays.

5 citations


Journal Article
TL;DR: The system is quasi-adaptive in the sense that the characteristic parameters of a given word uttered by a certain speaker can be measured and displayed on a set of Nixie tubes, and it is very easy (operating on the keyboard) to fit the recognizer to the speaker according to the displayed data.
Abstract: A relatively simple real-time recognizer of spoken words is described. The main characteristics of this system are the following: (1) the vocabulary of accepted words is settled with simple operations on the panel of the machine; (2) the system is quasi-adaptive in the sense that the characteristic parameters of a given word uttered by a certain speaker can be measured and displayed on a set of Nixie tubes, and it is very easy (operating on the keyboard) to fit the recognizer to the speaker according to the displayed data; (3) at present, a maximum of 15 words can be classified, but, owing to the modularity of the system, other units can be added in order to enlarge the accepted vocabulary. The coder consists of a bank of active filters and a set of circuits translating spectral information into a set of binary digits. The processor is composed (a) of a set of combinational units which evaluate the Hamming distance of the input patterns from the characteristic sequences of phonemes or other “tracts” of words; (b) of a set of sequential units which complete the classification of the uttered word by analysing the time evolutions of the combinational network outputs. The panel controls which make it possible to fix the accepted vocabulary and to adapt the recognizer to the speaker operate on the connections between combinational and sequential units and on the operation parameters of all the units. The machine, used for recognizing the ten digits from 0 to 9 and five other words spoken in Italian, reaches an efficiency larger than 99% on condition of its being previously adapted to the speaker. Programmed according to the average characteristics of male speakers and for classifying words spoken by voices not used in the preceding stage of learning, it reaches average efficiencies of about 90% (from 85 to 99% for a given speaker).

5 citations
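
The two-stage processing described in the entry above (Hamming distance to reference patterns, followed by analysis of how the outputs evolve in time) can be sketched in software. The original machine was built from hardware units, so this is only an analogy under assumptions: each frame is a tuple of binary digits, the nearest reference pattern labels the frame, and consecutive repeated labels are merged into a sequence. The names `hamming_distance`, `nearest_pattern` and `label_sequence` are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))

def nearest_pattern(code, references):
    """Label of the reference pattern closest to the input in Hamming distance."""
    return min(references,
               key=lambda label: hamming_distance(code, references[label]))

def label_sequence(frames, references):
    """Label each frame, then merge consecutive repeats into one symbol."""
    labels = []
    for frame in frames:
        label = nearest_pattern(frame, references)
        if not labels or labels[-1] != label:
            labels.append(label)
    return labels
```

A word could then be classified by comparing the resulting label sequence with the sequence stored for each vocabulary entry; that final matching step is left out here because the abstract does not specify it.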


Journal Article
TL;DR: The strategies involved in what has become known as “rule synthesis” and the effect of these on the use to which speech synthesis might be put are focused on.
Abstract: The present paper is divided into three parts: (a) the synthesizer, (b) control of the synthesizer and (c) use of synthesizers. There is no attempt to give a detailed account of the history of speech synthesis (for this, see: Flanagan, 1965 and Mattingly, 1968), nor any account of the details of computer programming for the control of synthesizers: what this paper is mainly concerned with are the strategies involved in what has become known as “rule synthesis” and the effect of these on the use to which speech synthesis might be put.

Journal Article
TL;DR: Methods of normalizing and adapting speech data, in the amplitude, frequency and time domains, are described and discussed and results obtained for the recognition of the digits by a wide range of speakers are given.
Abstract: Methods of normalizing and adapting speech data, in the amplitude, frequency and time domains, are described and discussed. Means of automatic gain control and automatic spectrum normalization have been implemented. The problem of adjusting phoneme boundaries is considered, together with the need for normalization in the time domain. Results obtained for the recognition of the digits by a wide range of speakers are given.

Journal Article
TL;DR: A revised model, expressed in information theory terms, is put forward for one of the processes represented in the proposed flow diagram: the Ф-process, and a number of experimental results indicate that statistical performance measures derived from this model behave according to simple laws.
Abstract: Some previous theories of the causes of non-randomness in “random generation” behavior are reviewed. On the assumption that such behavior is regulated towards the goal of producing an unpredictable series of events, a flow diagram for the human operator in this type of task is proposed. It is argued that a study of the various kinds of non-randomness, and of the various constraints which are present in the statistical structure of “at random” behavior, should throw light on the functional organization of the information processing system responsible for generating the behavior. A revised model, expressed in information theory terms, is put forward for one of the processes represented in the proposed flow diagram: the Ф-process. A number of experimental results are briefly reported which indicate that statistical performance measures derived from this model behave according to simple laws. Results are also presented to illustrate what appears to be a general tendency, in normal subjects, to behave in such a way that the amount of information feedback taken into account in choosing each action is maximized. The “feedback-maximizing” effect seems to occur in tasks which contain an element of exploratory choice and in which the individual might be expected to adopt a policy of quasi-random, unstereotyped searching for information about some aspect of the situation. It may correspond to a state of mind in which the individual is “paying attention” to one or more aspects of his own choice behavior, or of the interaction between himself and his environment. The relevance of this line of research to clinical problems is indicated, with special reference to schizophrenic thought disorder. Apart from the possible merits and limitations of the Ф-process model, the experimental results may have value as empirical observations.

Journal Article
TL;DR: A system implemented at C.S.I.R.O. which assists the human operator to make measurements on the shape of rosette leaves and shows the power of such a system as an aid to classifying pictorial objects.
Abstract: In attempting to classify the various types of the plant species Chondrilla juncea certain measurements on the shape of rosette leaves from the plant have been considered important. This paper discusses a system implemented at C.S.I.R.O. which assists the human operator to make these measurements. In the first stage of processing, the leaves are scanned; the second stage involves an interactive program allowing the operator at a graphic display to control execution of the various routines for extracting the measurements. Details of the program are given. The emphasis in the paper is directed towards showing the power of such a system as an aid to classifying pictorial objects.

Journal Article
TL;DR: The performance of operators, especially in terms of transmission rate and the effects of various types of errors, in using a man-machine speech communication link is reported.
Abstract: The performance of operators, especially in terms of transmission rate and the effects of various types of errors, in using a man-machine speech communication link is reported. An abbreviated description is also given of the word recognizer and of the results of tests with 30 reference and 12 unknown talkers, and using a limited vocabulary consisting mainly of digits.

Journal Article
TL;DR: The overall intelligibility of the resynthesized speech was not significantly different from that produced by more conventional methods, however, the pattern of errors was slightly different.
Abstract: Several methods of estimating speech synthesizer parameters (formant frequencies and amplitudes) have been evaluated. The technique employed was to analyse the speech waveform by means of a filter bank, and to store the resulting spectra in a computer. The parameters were estimated by program. They were used to control a parallel formant terminal analogue speech synthesizer. The intelligibility of the resulting sounds was measured by audition. It was found that tracking the parameters outwards from the centre of syllables led to slightly more intelligible synthetic speech than tracking the parameters continuously forwards or backwards in time. When a method of parameter estimation was employed which was designed to give priority to reliable extraction of the second formant frequency, the overall intelligibility of the resynthesized speech was not significantly different from that produced by more conventional methods. The pattern of errors, however, was slightly different.

Journal Article
TL;DR: The philosophy behind using spoken Morse code for man-machine interaction is described and the merits of this method of communication, as opposed to more conventional speech recognition techniques, are discussed.
Abstract: A voice-operated typewriter for the severely disabled is described. The operator speaks a form of Morse code into the machine and the sequence of sounds is classified in terms of the durations of the sound pulses and the silent intervals. The philosophy behind using spoken Morse code for man-machine interaction is described and the merits of this method of communication, as opposed to more conventional speech recognition techniques, are discussed.
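
The duration-based classification described in the entry above can be sketched as a small decoder. The abstract says only that "a form of Morse code" is spoken, so this sketch assumes the International Morse alphabet for letters, and the duration thresholds (`dash_min`, `letter_gap_min`, `word_gap_min`, in seconds) are hypothetical values chosen for illustration.

```python
# International Morse code for the letters A-Z (assumed alphabet).
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}

def decode(events, dash_min=0.3, letter_gap_min=0.4, word_gap_min=1.0):
    """Decode (kind, duration) events, where kind is "sound" or "silence"."""
    text = []
    symbol = ""

    def flush():
        # Emit the letter collected so far; "?" marks an unknown pattern.
        nonlocal symbol
        if symbol:
            text.append(MORSE.get(symbol, "?"))
            symbol = ""

    for kind, duration in events:
        if kind == "sound":
            # Long pulses are dashes, short pulses are dots.
            symbol += "-" if duration >= dash_min else "."
        elif duration >= word_gap_min:
            flush()
            text.append(" ")
        elif duration >= letter_gap_min:
            flush()
        # Shorter silences separate pulses within one letter.
    flush()
    return "".join(text)
```

Because only the durations of sounds and silences matter, a decoder of this kind does not need to identify what was said, which is one plausible reading of the contrast the abstract draws with conventional speech recognition.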