Modeling and Calibration


Introduction to PCM Telemetering Systems, 2nd edition

    Objectives

This chapter examines topics related to modeling data for calibration and analysis purposes. At the end of this discussion, the student will be able to:

1. Understand the need for and general strategy of instrument calibration

    2. Perform a least squares fit of a polynomial to a data set

    3. Justify the order of the model chosen to fit the data

    4. Estimate the statistical confidence of a data point

We will use standard statistical tools such as those found in Microsoft Excel and other packages. The student is encouraged to use these packages whenever possible rather than making the computations by hand.

    Basics

Before beginning the mathematical techniques associated with calibration and data modeling, we first look at the individual issues related to each. The two uses have different approaches and needs even if the mathematical tools are basically the same.

    Calibration

Calibration has several official definitions. Calibration is (1) the adjustment of a device so that the output is within a specific range for a particular value of the input; (2) a comparison of the indication of an instrument under test, or registration of the meter under test, with an appropriate standard; or (3) the process of determining the numerical relationship, within an overall stated uncertainty, between the observed output of a measurement system and the value, based on standard sources, of the physical quantity being measured.4

From this, we see that two approaches can be followed: certifying that the measurement is within some preapproved standard range for the measurement, and determining the amount of deviation from the desired standard and expressing that quantity numerically. In the first case, the instrument can be adjusted to correct improper readings. In the second case, adjustment may be possible, but it may also be adequate to leave the instrument alone and apply the numerical corrections to the result to achieve the correct result. Both techniques require that known standard inputs are applied to the instrument and the resulting output observed. In this chapter, we will concentrate on the analysis-based method.

© 2002 by Taylor & Francis Group, LLC

Downloaded by [Universidad Industrial De Santander] at 07:20 27 November 2013

There are two general strategies for developing the calibration equation: (1) analytically model the transfer functions for each stage in the data collection process ($T_S$, $T_C$, and $T_M$ in Figure 3.1) and develop a process transfer function, $T_{process}$, such as:

$$T_{process} = T_S \, T_C \, T_M \qquad (3.1)$$

and fit any free parameters such as gain, sensitivity, and bandwidth to the data set; or (2) develop an overall equation describing the mapping using polynomial equations of the form:

$$T_{process} = \sum_i a_i V^i \qquad (3.2)$$

and fit the coefficients without trying to model the underlying characteristics of these devices.

The first strategy results in a predefined equation with a certain, and hopefully small, number of unknown parameters that must be empirically determined. They may have to be determined individually for each device to account for device-to-device variations. The second strategy results in

FIGURE 3.1 Measurement devices included in the calibration process. (Block diagram: an applied stimulus passes through the sensor, $T_S$, the signal conditioner, $T_C$, and the measurement device, $T_M$, where the voltage or current is converted to a number; the calibration region spans the entire chain.)


a polynomial or power law model that the engineer develops to produce a reasonable mapping to the data. The mapping may not have physical significance as in the first strategy. For expediency, most engineers prefer the second technique. We do not try to fully understand and account for all the potential effects at the individual component level; rather, we try to model the overall trend, thereby treating the measurement devices as a system.
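The coefficient fit in the second strategy can be sketched as an ordinary least squares polynomial fit. The calibration points below are invented purely for illustration (roughly a square-law device with a little noise):

```python
import numpy as np

# Invented calibration points: measured output voltage V vs. known stimulus
V = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
stimulus = np.array([0.1, 1.2, 4.1, 9.2, 16.3, 25.2])

# Fit T_process = a0 + a1*V + a2*V**2, a truncated form of Equation (3.2);
# polyfit returns the coefficients highest degree first
a2, a1, a0 = np.polyfit(V, stimulus, deg=2)
print(round(a2, 2), round(a1, 2), round(a0, 2))
```

No attempt is made to attach physical meaning to the coefficients; only the overall trend is modeled, exactly as the text describes.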

    Sensor Example

Let us begin the calibration discussion by considering a capacitive rain gauge (CRG) as a typical sensor that needs calibration. The CRG works by using a captured volume of water to make a capacitor. The output voltage across the CRG is proportional to the stored water volume. As rain falls during a storm, the gauge fills and the output voltage gives an integrated rainfall measurement. An example of a reading from a CRG during a simulated rainstorm is shown in Figure 3.2. The actual volume, the measured volume, and the output voltage level are given. We can see a slight difference between the actual volume and the measured volume of water in the gauge.

We have found from experience that this difference is also a function of external air temperature. To understand how the CRG is behaving, we periodically add known quantities of water to the gauge and measure the output. One such calibration measurement set is given in Table 3.1 and the input-output relationship is illustrated in Figure 3.3. The relationship is basically linear and two related questions arise: is this truly linear and, if so, what are the slope and intercept values? We will use mathematical and statistical

FIGURE 3.2 Simulated output of a capacitive rain gauge during a rainstorm. (True gauge volume and measured output plotted against time from 0 to 600 seconds.)


techniques to answer these questions so that we can apply the results to make accurate measurements.

    Calibration Range

The sensor has a natural range of valid input and output levels, as illustrated in Figure 3.4. First, it has a minimum input signal level necessary to excite the sensor. It also has a maximum input signal level above which the sensor fails. Between them is a normal sensor response region where the user will desire to have measurements made. This is the region that will have the calibration function defined to match the sensor. On either side of that boundary, signal conditions may exceed the sensor's specifications.

    TABLE 3.1

    Rain Gauge Data

Exact input (mm)   Measured output (mm)
 0                  0.07
 5                  4.37
10                 10.31
15                 16.38
20                 22.19
25                 27.70
30                 33.07
35                 38.27
40                 43.56
45                 48.76
50                 54.04

FIGURE 3.3 Actual versus measured output values for the CRG. (Output volume, 0 to 60 mm, plotted against input volume, 0 to 50 mm.)


These specifications define the usable signal level and environmental variables. For example, a sensor may have temperature, humidity, and vibration specifications that cannot be exceeded for the sensor to operate properly. If the sensor exceeds these specifications, varying results will occur. A device that is slightly out of specification will usually give inaccurate readings, but the device is not permanently damaged. It will continue to operate when it is returned to normal specifications. However, it may need to be recalibrated to achieve accurate readings again. If the device is subject to extreme conditions, the sensor may be permanently damaged and will have to be replaced.

    Measurement Calibration Process

Calibration is the process whereby specific inputs are applied to a measurement system and the corresponding outputs are measured. The input is the known quantity, or independent variable, while the measurement output is the unknown quantity, or dependent variable. The input values must relate to a known and reliable standard. This standard may come from a local laboratory and provide a reasonably close approximation to the exact definition for the quantity. Alternatively, the laboratory standard may be calibrated against another standard, which is calibrated against a primary standard as defined by a national standards laboratory. In all cases, we wish to "truth" the measurement.

Calibration should be done regularly. Most instruments have a calibration interval, which is defined as the maximum length of time between

FIGURE 3.4 Sensor calibration and sensitivity regions. (Sensor output versus input: the calibration curve is valid over the calibration interval for normal sensor conditions; just outside that interval the calibration is invalid but the sensor is unchanged; further out the calibration is invalid and the sensor characteristics may change, up to total sensor failure.)


calibration services during which each standard and test and measuring equipment is [sic] expected to remain within specific performance levels under normal conditions of handling and use.4 Calibrations are also applied if the measurement system is subject to large stresses. In some applications, it is standard procedure to calibrate the measurements at the start of each experiment or major use to prevent subsequent problems. This can be especially important for uses involving human health and safety. Calibration is not a random process. A preferred method, such as the following, is usually adopted:

1. Start the standard input at the middle of the input-output range.

2. Record input and output values at the desired step interval until the maximum input level is reached.

3. Start to decrease the input level and record input and output values at the desired step interval until the minimum input level is reached.

4. Start to increase the input level and record input and output values at the desired step interval until the midpoint input level is reached again.

In all cases, the maximum and minimum input levels are not to be passed.5 This will result in a calibration curve in which hysteresis error and other sensitivity changes can be determined.
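The scan order of steps 1 through 4 can be sketched as a simple schedule generator; the 0 to 50 range and step of 5 are arbitrary example values:

```python
def calibration_schedule(lo, hi, step):
    """Input levels for the mid -> max -> min -> mid scan of steps 1-4."""
    mid = (lo + hi) / 2
    levels, x = [], mid
    while x <= hi:            # step 2: middle up to the maximum
        levels.append(x)
        x += step
    x = hi - step
    while x >= lo:            # step 3: down to the minimum
        levels.append(x)
        x -= step
    x = lo + step
    while x <= mid:           # step 4: back up to the midpoint
        levels.append(x)
        x += step
    return levels

# Example: a 0-50 range scanned in steps of 5 (arbitrary illustrative values)
sched = calibration_schedule(0, 50, 5)
print(sched)
```

The schedule starts and ends at the midpoint and never passes the maximum or minimum levels, as the text requires.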

    Calibration Variables

The purpose of calibration and modeling is to relate the measured quantity to the measurand via some form of mathematical relationship. Mathematical mapping will relate the measurand to a physical output variable such as voltage or current. As in the CRG example, this mapping is also a function of temperature and, to a certain extent, age. In the most general form, this mapping will look like:

$$\text{measurand} = f(\text{output variable};\ \text{environmental variables}) \qquad (3.3)$$

The exact form of the functional mapping is up to the analyst to develop. The mapping can be linear or nonlinear, depending on the type of system used. We will examine a technique to perform the analysis regardless of the variables chosen. For example, let the output variable be voltage, V, and the environmental variables be the temperature, T, and the humidity, H. Suppose we know from physical analysis that a second-order relationship exists between the measurand and the system output voltage. The mapping function could then be of the form:

$$\text{measurand} = a_0 + a_1 V + a_2 V^2 + a_3 T + a_4 H \qquad (3.4)$$


This will allow us to fit the expected functional variation plus the environmental influence. Naturally, to fully calibrate the instrumentation over this parameter space, a great number of measurements over all combinations of the variables must be made. This effort is justified if the instrumentation must make very precise measurements or must be used over a wide range of conditions. If the instrumentation is to be operated in well-controlled conditions, environmental effects may be ignored.
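A fit of Equation (3.4) can be sketched as ordinary least squares over a design matrix. The sweep ranges and "true" coefficients below are synthetic, invented only to exercise the method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration sweep over voltage V, temperature T, humidity H
n = 200
V = rng.uniform(0, 5, n)
T = rng.uniform(10, 40, n)
H = rng.uniform(20, 80, n)

# Assumed "true" instrument behavior used only to generate test data
true_a = np.array([0.5, 2.0, 0.3, -0.02, 0.01])   # a0..a4
measurand = (true_a[0] + true_a[1] * V + true_a[2] * V**2
             + true_a[3] * T + true_a[4] * H + rng.normal(0, 0.05, n))

# Design matrix for Equation (3.4): columns 1, V, V^2, T, H
A = np.column_stack([np.ones(n), V, V**2, T, H])
a_hat, *_ = np.linalg.lstsq(A, measurand, rcond=None)
print(np.round(a_hat, 3))
```

With measurements spread over all combinations of the variables, the recovered coefficients land close to the generating values; with a narrow sweep the environmental terms would be poorly determined, which is the point of the paragraph above.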

    Difference between Calibration and Usage

During calibration, the measurand is replaced by a known standard and it is treated as the independent variable. The measurement output is the unknown and is treated as the dependent variable. In actual usage, the roles are reversed. In usage mode, the measurement process output is the known quantity and the measurand is the unknown. At this point the calibration process is reversed to deduce the correction required to make a correct measurement. Depending upon how the modeling is performed, this difference implies that the modeling equation may need to be inverted for use with the measurement data.

Data Modeling

In many aspects, data modeling and developing calibration mapping are similar processes. We will now discuss the differences and extensions of the basic concept.

    Difference between Calibration and Data Modeling

In the calibration process, the mathematical equations used to model the process work on the specifics of the input-output process of the measurement system. The next step is to perform mathematical modeling of the entire data set. There may be an underlying physical theory that unites the data. The least squares techniques discussed later in this chapter can also be applied to the entire data set to fit the data to the model. The goal is the same: develop a numerical technique that can be used to represent the entirety of the data set and allow the user to work with the equation and not with individual data points. Generally, the calibrated measurand values are used in the modeling process instead of raw uncalibrated measurements. While the two steps could, in principle, be combined, usually they are not.

    Keeping them separate allows easier manipulation of different model classes.

    Modeling as Filtering

The least squares technique discussed later in this chapter can also be applied in the filtering and signal processing methods to be developed in Chapter 5. In a sense, the least squares technique can act as a filter to smooth the


data through noise that may be present in the measurement process. This is important because real-world measurements will always include a certain amount of uncertainty and noise. These techniques therefore do more than simply model the underlying data process.

    Error Types

Various types of errors can affect the accuracy and precision of the measurement process. This section covers the types of errors that can be found in measurements. The next section explores statistical ways of describing random errors. Other types of errors can be quantified and compensated for by proper calibration.

    Systematic Errors

If there is bias in the measurement equipment or observer making the measurement, an offset can develop between the true value and the measured value. This is illustrated in Figure 3.5. The line on the right side of the figure represents the true value that the measurement process is attempting to discover. The systematic error acts to displace the measurements away from the true value to the new value. One can think of the systematic error as a DC voltage applied to the measurement process. In this case, the systematic error displaces the measurements toward the value in the middle of the figure. By proper calibration, this displacement can be discovered and the appropriate compensation applied.

FIGURE 3.5 Relationship between random and systematic errors with respect to true value. (The distribution of actual measurements is centered on the mean value of the measurements; the systematic error is the offset between that mean and the true value, and the measurement uncertainty is the spread of the distribution.)


    Random Errors

The second effect shown in Figure 3.5 is caused by random errors in the measurement process. Random errors can come from a number of sources, such as electrical noise, and they tend to obscure the central value even when a large number of measurements are made. The best we can do is estimate the underlying value. From a voltage point of view, random errors do not have DC values as systematic errors do. Random errors provide a displacement to the measurement. However, a random error will not be apparent based on only one value. Many samples of a measurement are needed to see the extent of the random error. The mathematical fitting procedure tends to smooth the measurements and produce estimates of the underlying values.

    Interference

Sometimes a signal that is not directly part of the measurement process will seep into the electronics and cause interference. This stray signal can bias the result, but it is difficult to remove via calibration alone because the interfering signal is not part of the system. Additionally, it does not always behave like a random error because the underlying interference may have a deterministic structure. The best solution for interference is to shield the measurement process from it so that the interference does not corrupt the measurement.

    Hysteresis Error

Starting from the middle of the measurand range, scanning from low to high, and then returning across the measurand range may be the best way to perform calibration to determine whether a hysteresis error occurred in the measurement system, as illustrated in Figure 3.6. Because the measurement system may react differently as the measurand increases from a low to a high value than it does as the measurand changes in the opposite sense, we define hysteresis error as "the maximum separation due to hysteresis between up-scale-going and down-scale-going indications of the measured value after transients have decayed."4 (See also Fraden2 and Lipták5.) The hysteresis error, $e_H$, is defined in terms of the maximum voltage separation, $H_H$, and the scale factor, SF, using:

$$e_H = \frac{H_H}{SF} \qquad (3.5)$$

Dead Band Error

In certain regions, the measurement system is insensitive to changes in the measurand. These regions of insensitivity produce a dead band error,


defined as "the range through which an analog quantity can vary without initiating response."4 (See also Fraden2 and Lipták5.) This concept is illustrated in Figure 3.7. A small flat spot in the middle of the graph represents the dead band that can cause measurement inaccuracy.

FIGURE 3.6 Hysteresis error in the measurement process. (Output measurement versus measurand: the up-scale-going and down-scale-going measurement curves separate, and the scale factor relates that separation to the hysteresis error.)

FIGURE 3.7 Dead band error in the measurement process. (Output measurement versus measurand: the actual response has a flat dead band region where the desired response continues to change.)


    Statistical Concepts

To properly characterize the effects of noise and better quantify the results of data modeling, we must use certain basic statistical measures that deal with both single measurements and whole sets of measurements. The Gaussian statistical assumption will be used because it serves as the normal assumption and describes most of the effects that we are concerned with.

    Basic Measurement Model

The normal assumption made in measurement is that the actual measurement is the linear sum of a true value plus noise (superposition model). If we make a number of measurements of the same point, $X_0$, the resulting vector of measurements, $\vec{X}$, can be written as:

$$\vec{X} = X_0 + \vec{N} \qquad (3.6)$$

A noise vector, $\vec{N}$, associated with the measurement process modifies the measurement of the true value, $X_0$. In principle, if we knew the value of the noise vector elements, we could invert the equation and uniquely recover the true value. In practice, we can estimate the noise and then arrive at our best estimate for the true value. This process is illustrated in Figure 3.8, in which a series of measurements is made at each point along the x-axis. One of these points is enlarged to show the distribution of the measurements contained in that point. In the next section, we will describe the probability concepts used to describe the noise process.

Probability Concepts

The noise process is described by probability measures. The important functions that describe the probability that a noise will take on certain values are the probability density and the probability distribution. We will also concentrate on Gaussian probability functions since they are the most widely used. This will be a brief discussion. For more information, consult standard references.1,3,7,8

Relative Frequency

One of the most intuitive methods for representing the concept of the probability that an event will occur is using the relative frequency probability definition. If A is an event (for example, winning at 21, losing a baseball game, or having a meteor hit your car), the probability, p(A), out of a total sample of N events is related to the number of times the event A occurs, n(A), by the equation:


$$p(A) = \lim_{N \to \infty} \frac{n(A)}{N} \qquad (3.7)$$

We can see that we really need a large sample for the concept of probability to be meaningful. A single measurement is not enough. We will need to make reasonable approximations since we do not have the time to make an infinite number of measurements.
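The limit in Equation (3.7) can be illustrated by simulation; here the event A is a fair die showing a six (an invented example), and the relative frequency approaches 1/6 as N grows:

```python
import random

random.seed(42)

# Estimate p(A) = n(A)/N for the event A = "a fair die shows a six"
N = 100_000
n_A = sum(1 for _ in range(N) if random.randint(1, 6) == 6)
p_A = n_A / N
print(p_A)   # close to 1/6 ~ 0.1667 for large N
```

A single roll tells us almost nothing about p(A); only the accumulated relative frequency is meaningful, which is the point of the paragraph above.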

    Probability Density

Generally, we wish to know the probability of events at many locations. This gives rise to the concept of a probability density function (PDF). This function can be continuous or discrete. Noise tends to be a continuous quantity, so we will concentrate on the continuous form.

Let us start by extending the relative frequency concept and make a histogram of the relative frequency of a number of measurements. For example, suppose we design an experiment where the true measurement value is known to be 12.5 in magnitude and each measurement has a little noise added. Let us make 100 measurements and plot the relative frequency of each value obtained, as shown in Figure 3.9. Since we know the "correct" value should be 12.5, the plot gives a relative frequency of the noise values. The histogram intervals are chosen by computing the bin width, $\Delta$, knowing the total number of points, N, and the range of the points, by using the relationship:6

FIGURE 3.8 Series of measurements with one measurement selected to show the distribution of measurements comprising one of the data points. (Measurements are plotted against the independent variable; the enlarged point shows the spread of repeated measurements about its average.)


$$\Delta = \frac{x_{\max} - x_{\min}}{N} \qquad (3.8)$$

If we extend this notion to a case where an infinite number of noise measurements are made and then plot the histogram, we will arrive at the probability density function, p(x), for the noise. This function describes how the random variable that we are measuring is spread. The stronger the function, the higher the probability that the quantity we are measuring will be found in that region. The PDF for a random variable, x, will have an average value or mean value, $\mu$, given by:

$$\mu = \int_{-\infty}^{\infty} x \, p(x) \, dx \qquad (3.9)$$

This is also sometimes called the first moment of the variable x. The PDF can also be used to compute the variance of the random variable, $\sigma^2$, by using the equation:

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx \qquad (3.10)$$

The PDF has two properties that distinguish it from an arbitrary function. The PDF is nonnegative; that is, a probability is always a positive

FIGURE 3.9 Relative frequency of measurements with noise added. (Histogram of the 100 measurements: values from 11 to 14.5 on the horizontal axis, counts from 0 to 14 on the vertical axis.)


number. The integral of the PDF is unity. We can use the PDF to compute probabilities of events once we know the functional form. How do we determine the correct PDF for a set of measured data since we do not have an infinite number of measurements? Some commercial software packages allow the user to enter the measured data, and they rank order the best estimates for the mathematical PDF to represent the data.
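Equations (3.9) and (3.10) can be checked numerically for any assumed density. Here an exponential PDF, p(x) = 2e^(-2x) for x >= 0, is chosen purely as an example; its mean and variance are known analytically to be 1/2 and 1/4:

```python
import numpy as np

# Exponential PDF p(x) = 2 exp(-2x) on x >= 0, evaluated by midpoint integration
dx = 1e-4
x = np.arange(dx / 2, 40.0, dx)       # cell midpoints; the tail beyond 40 is negligible
p = 2.0 * np.exp(-2.0 * x)

mu = np.sum(x * p) * dx                # Equation (3.9): mean, analytically 1/2
var = np.sum((x - mu) ** 2 * p) * dx   # Equation (3.10): variance, analytically 1/4
print(mu, var)
```

The numerical moments agree with the analytic values to the accuracy of the integration grid.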

Once we have the PDF, we can compute the probability that the continuous variable x can be found between the limits a and b by using the relationship:

$$p(a < x \le b) = \int_a^b p(x) \, dx \qquad (3.11)$$

As the limits a and b approach each other, the probability of finding x between them becomes zero; that is, the probability that a continuous variable will be found at exactly one point is zero. Also, as the limits a and b approach infinity, the probability approaches unity; that is, if one covers the number line, the probability is 1 that x will be found somewhere.

Can the function h(x) = 1/2 [u(x) - u(x - 2)] be a valid PDF for a random variable x? The function u(z) is the unit step function. The area of h(x) computes to 1 and it is nonnegative, so it has the correct mathematical properties. We could interpret it as a uniform probability that x is between 0 and 2. Using this PDF, what is the probability that 0.125 < x <= 1? Applying Equation (3.11), we obtain p(0.125 < x <= 1) = 0.438.
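A numerical check of this worked example, sketched with midpoint-rule integration:

```python
import numpy as np

# h(x) = 1/2 [u(x) - u(x - 2)]: uniform density of height 1/2 on [0, 2)
def h(x):
    return np.where((x >= 0) & (x < 2), 0.5, 0.0)

# Equation (3.11): p(0.125 < x <= 1) as a midpoint-rule integral
dx = 1e-5
x = np.arange(0.125 + dx / 2, 1.0, dx)
prob = np.sum(h(x)) * dx
print(round(prob, 4))   # 0.4375, which rounds to the text's 0.438
```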

    Cumulative Distribution Function

Another way to represent the probability information is with the cumulative distribution function (CDF). It does not contain any information different from the PDF; it merely represents the same information in a different way. The PDF and CDF are related by the definition of the CDF. The CDF, F(a), for a random variable with a probability density p(x) is:

$$F(a) = p(x \le a) = \int_{-\infty}^{a} p(x) \, dx \qquad (3.12)$$

That is, the CDF measures the probability that x is less than or equal to the point a. We can extend this to computing the probability that x is between two limits, a and b, as:

$$p(a < x \le b) = F(b) - F(a) \qquad (3.13)$$

We can see the relationship between the PDF and the CDF for the above example in Figure 3.10. For any CDF, the minimum value is 0 and the maximum value is 1. The CDF is also a nondecreasing function. Using the previous example, the CDF values for the two end points are F(0.125) = 0.063 and F(1) = 0.5. The probability that x is between a and b computes to 0.438 as it did above.
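For the uniform example, the CDF works out to F(a) = a/2 on [0, 2], so the quoted end-point values can be checked directly:

```python
# CDF of the uniform example: F(a) = a/2 on [0, 2], clamped outside that range
def F(a):
    return min(max(a / 2.0, 0.0), 1.0)

print(F(0.125))            # 0.0625, quoted as 0.063 in the text
print(F(1.0))              # 0.5
print(F(1.0) - F(0.125))   # 0.4375, via Equation (3.13)
```

The difference of CDF values reproduces the integral result from Equation (3.11), as Equation (3.13) promises.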

    Gaussian PDF and Noise Model

The Gaussian PDF is commonly used to model noise processes. Many noise processes are actually accumulations of multitudes of individual

FIGURE 3.10 PDF and CDF for a random variable.


interactions. Under the central limit theorem, these accumulated interactions give the resulting noise a Gaussian PDF. The Gaussian PDF is given by the equation:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (3.14)$$

We can use Equations (3.11), (3.12), and (3.13) to make computations. However, the integrals are generally not computable in closed form. Some well-known related functions can make the computations for Gaussian PDFs easier.

The erf Function

Because the Gaussian PDF is difficult to compute, there are tables of related functions. The first is the error function, which is usually just called the erf function. There are generally two similar definitions of the erf integral.1,8 We will use the following integral equation, which is more commonly used today:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2} \, du \qquad (3.15)$$

This is the integral of a Gaussian PDF from 0 to the value x. It is not the same Gaussian as in Equation (3.14) but can be obtained by a transformation of variables. This is shown schematically in Figure 3.11. The error function has the following properties:

$$\operatorname{erf}(-x) = -\operatorname{erf}(x)$$
$$\operatorname{erf}(\infty) = 1$$

The erf function is commonly available in computing packages such as Mathcad.* It is also tabulated in Table A.1 in the appendices.

The error function has a related function known as the complementary error function, or erfc. This function is computed from the erf using:

$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) \qquad (3.16)$$

For a Gaussian variable x between the limits a and b, we can compute the probability that x is between the limits by using the erf and erfc functions as follows:

* Mathcad is a copyright of MathSoft Engineering & Education, Inc., 2001.


$$p(a < x \le b) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{b-\mu}{\sqrt{2}\,\sigma}\right) - \operatorname{erf}\!\left(\frac{a-\mu}{\sqrt{2}\,\sigma}\right)\right] = \frac{1}{2}\left[\operatorname{erfc}\!\left(\frac{a-\mu}{\sqrt{2}\,\sigma}\right) - \operatorname{erfc}\!\left(\frac{b-\mu}{\sqrt{2}\,\sigma}\right)\right] \qquad (3.17)$$

For example, let $\mu$ = 2.5 and $\sigma^2$ = 0.5. What is the probability that the measurement x is between 1.5 and 3.5? Using the erf function:

$$p(1.5 < x \le 3.5) = \frac{1}{2}\operatorname{erf}\!\left(\frac{3.5 - 2.5}{\sqrt{2}\sqrt{0.5}}\right) - \frac{1}{2}\operatorname{erf}\!\left(\frac{1.5 - 2.5}{\sqrt{2}\sqrt{0.5}}\right) = 0.8427 \qquad (3.18)$$

What is the probability that the measurement is greater than 10? Using the erf function, we compute:

$$p(x > 10) = 1 - p(x \le 10) = \frac{1}{2} - \frac{1}{2}\operatorname{erf}\!\left(\frac{10 - 2.5}{\sqrt{2}\sqrt{0.5}}\right) \approx 0 \qquad (3.19)$$

The answer is not exactly zero but the computation is close enough.
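These two results can be reproduced with the erf function available in standard math libraries (Python's `math.erf` here):

```python
import math

mu, sigma2 = 2.5, 0.5
sigma = math.sqrt(sigma2)

def prob_between(a, b):
    """Equation (3.17): p(a < x <= b) for a Gaussian via the erf function."""
    scale = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((b - mu) / scale) - math.erf((a - mu) / scale))

print(round(prob_between(1.5, 3.5), 4))   # 0.8427, as in Equation (3.18)
print(prob_between(10.0, float("inf")))   # prints 0.0 -- effectively zero, as in (3.19)
```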

    FIGURE 3.11Gaussian PDF with erfand Qfunctions indicated.

    p a x b erfb

    erfa

    erfcb

    erfca

    < ( )=

    =

    1

    2 2

    1

    2 2

    1

    2 2

    1

    2 2

    p x erf erf( . . ). .

    * .

    . .

    * .

    .

    1 5 3 51

    2

    3 5 2 5

    2 0 5

    1

    2

    1 5 2 5

    2 0 5

    0 8427

    < =

    =

    p x p x erf>( )= < =

    10 101

    2

    1

    2

    10 2 5

    2 0 5

    0

    ( ).

    * .
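The example probabilities of Equations (3.18) and (3.19) are easy to check numerically. A minimal sketch, using the Python standard library's `math.erf` rather than the Mathcad or Excel packages the chapter assumes:

```python
import math

def prob_between(a, b, mu, var):
    """P(a < x <= b) for a Gaussian, per Equation (3.17)."""
    s = math.sqrt(2.0 * var)
    return 0.5 * (math.erf((b - mu) / s) - math.erf((a - mu) / s))

mu, var = 2.5, 0.5

# Equation (3.18): probability that x lies between 1.5 and 3.5
p1 = prob_between(1.5, 3.5, mu, var)

# Equation (3.19): probability that x exceeds 10
p2 = 0.5 - 0.5 * math.erf((10.0 - mu) / math.sqrt(2.0 * var))

print(round(p1, 4))   # 0.8427, matching Equation (3.18)
```

As expected, p2 is indistinguishable from zero in double precision, which is the "close enough" result of Equation (3.19).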


    Modeling and Calibration 65

    The QFunction

    Another function related to the erf(x) and erfc(x) is the Qfunction which isdefined by the integral equation:

    (3.20)

    If the error function is the integral of the Gaussian PDF from 0 to the pointx, then the Qfunction is related to the integral of the PDF from xto infinityas shown in Figure 3.11.The Qfunction is related to the erfand erfcby the

    following equations:

    (3.21)

    (3.22)

    Similar to Equation (3.17), we can compute probabilities with the Qfunction

    (3.23)

    The Q function is also tabulated in Table A.1 in the appendices. UsingEquations (3.21) and (3.22), we can also compute the Qfunction using theerfand erfcfunctions available in analysis packages such as Mathcad.
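The Q function is not in the Python standard library, but Equation (3.21) builds it directly from erfc. A sketch, reusing the earlier example values (μ = 2.5, σ² = 0.5):

```python
import math

def Q(x):
    """Q function via Equation (3.21): Q(x) = (1/2) erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Equation (3.23) applied to the earlier example:
mu, sigma = 2.5, math.sqrt(0.5)
p = Q((1.5 - mu) / sigma) - Q((3.5 - mu) / sigma)
print(round(p, 4))   # 0.8427, the same answer Equation (3.18) gave
```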

Electronic Noise

Electronic noise is a typical application. Normally, engineers assume that the noise can be modeled as a Gaussian process. The system has a natural bandwidth, B, measured in Hertz. Normally, the noise is a zero-mean process; that is, the noise does not have a DC offset. We parameterize the noise by an equivalent system temperature, Tsys. This is not a physical temperature; it is a convenient measure of the total noise produced. For example, a device that produces as much noise as a black body source at 290 K is said to have a noise temperature of that same 290 K, regardless of the physical temperature of the device. The system temperature is used to compute the noise spectral density, N0, which describes the noise process in the frequency domain. The spectral density in watts/Hertz is computed from the system temperature and Boltzmann's constant, k, using:

$$N_0 = k\,T_{sys} \qquad (3.24)$$


Boltzmann's constant is −228.6 dBW/K-Hz. From this, we can compute the variance of the Gaussian process using the spectral density and the bandwidth:

$$\sigma^2 = N_0 B \qquad (3.25)$$

The relations used when we examined system noise related to data transmission will be used again.
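A quick sketch of Equations (3.24) and (3.25); the 290 K temperature and 1 MHz bandwidth are illustrative values, not taken from the text:

```python
import math

k = 1.380649e-23   # Boltzmann's constant in J/K, i.e., W/(Hz*K)

T_sys = 290.0      # assumed system noise temperature, K
B = 1.0e6          # assumed system bandwidth, Hz

N0 = k * T_sys     # Equation (3.24): noise spectral density, W/Hz
var = N0 * B       # Equation (3.25): noise variance (power), W

k_dB = 10.0 * math.log10(k)   # Boltzmann's constant in dB, about -228.6
print(N0, var, round(k_dB, 1))
```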

Mean, Variance, and Standard Deviation Estimates

In Equations (3.9) and (3.10), we saw the definitions of the mean, μ, and the variance, σ². Related to the variance is the standard deviation, σ. These definitions are based on having the full probability density function and, essentially, an infinite number of measurements. Of course, this is impossible to attain in practice. This section covers practical estimates for these parameters. In the next section, we will see how well we know these results.

Parameter Estimation

In real systems, we need to estimate the mean and the variance based upon a finite number of measurements. We may not even know the underlying PDF. A typical estimate for the mean, ⟨x⟩ or x̄, for a set of N measurements, {xi}, assuming that each measurement is equally probable, is to use the customary equation for computing an average:

$$\langle x \rangle = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (3.26)$$

In a similar manner, we can compute the estimated variance, ⟨σ²⟩, of the data set by using:

$$\langle \sigma^2 \rangle = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2 \qquad (3.27)$$

We can use the data in Table 3.2 to compute the estimated mean and estimated variance. Applying Equations (3.26) and (3.27), respectively, we arrive at an estimated mean of 12.4024 and an estimated variance of 0.3999. The question is, how good are the estimates? The error and uncertainty in the mean and confidence intervals are used to answer this question.
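The estimates of Equations (3.26) and (3.27) are easy to reproduce; a sketch using the Table 3.2 values:

```python
# The 50 measurements of Table 3.2, read row by row
data = [
    12.7112, 12.2444, 12.9988, 13.0394, 11.6551,
    11.8167, 12.4733, 12.6544, 13.2575, 12.4193,
    12.9993, 12.5057, 12.2144, 13.0580, 12.0337,
    13.5600, 12.8323, 11.9057, 12.9222, 11.8778,
    12.2843, 11.2862, 12.6215, 13.5281, 11.5461,
    12.2451, 13.0092, 12.8680, 11.3763, 12.0059,
    11.9353, 12.6319, 12.2218, 12.8240, 11.7719,
    13.3503, 12.7698, 12.6444, 12.9008, 11.5540,
    11.2474, 11.5000, 13.2573, 13.2929, 12.2232,
    12.2079, 11.1374, 11.9733, 12.1102, 12.6152,
]

N = len(data)
mean = sum(data) / N                                  # Equation (3.26)
var = sum((x - mean) ** 2 for x in data) / (N - 1)    # Equation (3.27)
print(round(mean, 4), round(var, 4))   # 12.4024 and about 0.3999
```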

    Error in the Mean

The previous example involved a finite number of samples and a single estimate for the mean and variance. Suppose we ran the experiment again.


It is not hard to believe that we would obtain slightly different measures for the mean and variance. If we ran the experiment multiple times, we would build up a distribution of mean values, as illustrated in Figure 3.12. They cluster around the mean of the means, which is a better estimate of the true value than any individual sample mean would be. The variance in the distribution of the mean values is called the error in the mean. It is computed in terms of the estimated variance, ⟨σ²⟩, and the number of measurements, N, in the data set using:3

$$\sigma_{\mu}^2 = \frac{\langle \sigma^2 \rangle}{N} \qquad (3.28)$$

The error in the mean for the data set in Table 3.2 is 0.0080. As we can see from Equation (3.28), the number of data points in the measurement set influences the error in the mean. The greater the number of points in the data set, the smaller the error in the mean will be.

TABLE 3.2
Sample Data for Computing Mean and Variance

12.7112  12.2444  12.9988  13.0394  11.6551
11.8167  12.4733  12.6544  13.2575  12.4193
12.9993  12.5057  12.2144  13.0580  12.0337
13.5600  12.8323  11.9057  12.9222  11.8778
12.2843  11.2862  12.6215  13.5281  11.5461
12.2451  13.0092  12.8680  11.3763  12.0059
11.9353  12.6319  12.2218  12.8240  11.7719
13.3503  12.7698  12.6444  12.9008  11.5540
11.2474  11.5000  13.2573  13.2929  12.2232
12.2079  11.1374  11.9733  12.1102  12.6152

FIGURE 3.12 Distribution of mean values about the mean of the means.


Uncertainty in the Mean

Just as the standard deviation is the square root of the variance, the uncertainty in the mean is the square root of the error in the mean, or:3

$$\sigma_{\mu} = \sqrt{\frac{\langle \sigma^2 \rangle}{N}} \qquad (3.29)$$

For the example data set in Table 3.2, the uncertainty in the mean is 0.0894.

Confidence Intervals

While the error and the uncertainty in the mean are easy to compute, they do not really tell the user how close they might be to the true value, except in the most general way. A more intuitive measure is the confidence interval for the estimate, as illustrated in Figure 3.13. The true value for the mean, μ, and the estimated mean, x̄, are shown, and their difference is an estimation error. Based on the N samples and the uncertainty in the mean, we can define a confidence interval around the estimated mean.3 This confidence interval is intended to represent the region in which we expect to find the true mean value if we can make an infinite number of measurements.

The width of the confidence interval is a function of the uncertainty in the mean and the level of confidence we want in the result. If we wish to be very confident of the result, then the interval must be broader than it would be if we were willing to accept a lower level of confidence. If we define α as the percent level of uncertainty in the estimate, then we will have a (100 − α) percent confidence in the estimate. For example, a 5% uncertainty corresponds to a 95% confidence level. Typical values that indicate we are confident of the result are 95% or 99%. If x̄ is our estimate for the mean of the measurements and σ_μ is the uncertainty in the mean, we estimate that the true value, μ, will be found with (100 − α) percent confidence in a region bounded by

$$\bar{x} - z_{\alpha/2}\,\sigma_{\mu} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\mu} \qquad (3.30)$$

where z_{α/2} is a parameter based on the confidence level chosen. Basically, z_{α/2} is a measure of the number of standard deviations about the mean that we will

FIGURE 3.13 Confidence interval relative to true mean and the estimate of the mean.


search to find the true value. The use of z_{α/2} is based on the assumption that more than 30 measurements are in the data set and that we have a Gaussian process. Values for z_{α/2} are given in Table A.2 in the appendices. In the table, P is the Gaussian probability, or the confidence level desired. The z_{α/2} is read from the column with the desired confidence level. In an example from Table 3.2, the 95% confidence interval is ±0.1753 around the estimated mean 12.4024. That is, with 95% confidence, we believe the true value lies between 12.227 and 12.578.

Suppose only a smaller number of measurements are available, or we do not know the variance in the data, or we are not sure Gaussian statistics still hold. Do we have a way to compute the confidence interval? If the number of measurements is below 30 or the Gaussian assumption does not hold, a t distribution can be used. In that case, the values from the t distribution shown in Table A.3 are used. The value for z_{α/2} is replaced with t_{α/2}, which is a function both of the confidence level and the number of degrees of freedom, ν, in the measurement set. The number of degrees of freedom is given by ν = N − 1. The confidence interval bound equation is then:

$$\bar{x} - t_{\alpha/2}\,\sigma_{\mu} \le \mu \le \bar{x} + t_{\alpha/2}\,\sigma_{\mu} \qquad (3.31)$$
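Putting Equations (3.28) through (3.30) together for the Table 3.2 example, with z_{α/2} = 1.960 for 95% confidence (from Table A.2):

```python
import math

mean, var, N = 12.4024, 0.3999, 50   # estimates from the Table 3.2 example

err_mean = var / N                   # Equation (3.28): error in the mean
sigma_mu = math.sqrt(err_mean)       # Equation (3.29): uncertainty in the mean

z = 1.960                            # z_{alpha/2} for 95% confidence
half = z * sigma_mu                  # half-width of the Equation (3.30) interval
lo, hi = mean - half, mean + half

print(round(err_mean, 4), round(sigma_mu, 4))   # 0.008 0.0894
print(round(lo, 3), round(hi, 3))               # 12.227 12.578
```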

    Number of Measurements Required

The number of measurements, N, is a critical parameter, and the confidence of the results directly scales with N. The main question is, how many measurements are sufficient? Certainly, we need more than one. Are 10, 100, or 1000 measurements sufficient, and how do we know? One approach is to use the confidence interval. If we know the confidence interval we want and the approximate variance in the data, we can solve for the number of measurements needed.
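Solving Equation (3.30) for N gives the required sample size, N = (z_{α/2}·σ/E)², where E is the desired interval half-width. A sketch, with illustrative target half-widths:

```python
import math

def measurements_needed(var, half_width, z=1.960):
    """Invert Equation (3.30): N = (z * sigma / E)^2, rounded up."""
    return math.ceil((z * math.sqrt(var) / half_width) ** 2)

# With the Table 3.2 variance, reproducing the +/-0.1753 interval takes N = 50;
# tightening the 95% interval to +/-0.1 takes roughly three times as many points.
print(measurements_needed(0.3999, 0.1753), measurements_needed(0.3999, 0.1))
```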

    Least Squares Fitting

The least squares method is the usual procedure for fitting a model to a data set, especially when measurement noise makes the exact nature of the equation a bit uncertain or when we do not have an a priori model for the system. This method can be used to model both the calibration of the instrumentation system and the underlying process producing the data. This section concentrates on fitting polynomials to the data. Once this procedure is understood, other types of functions (trigonometric, orthogonal polynomials, etc.) can be used and the fitting procedure determined. Since the procedure can be used both in calibration and data analysis, we will not make a distinction between them during the development of the method. As a practical matter, the necessary equations are developed here, but they are standard functions in analysis packages such as Excel and similar programs.


    Least Squares Definition

The least squares process provides the best estimate for the parameters to specify a model. The method does not tell the user which model to use. A number of methods can determine which of several models is relatively better, but no method can indicate which model is the correct one in an absolute sense. The least squares method is based on minimizing the mean square error between the data and the chosen model.8,9 The computations involve a set of data points consisting of an independent variable {xi} and a dependent variable {yi}. The model produces a set of estimates of the dependent variable, {ŷi}, based on the independent variable. The mean square error, mse, between the data and its estimate is defined as:

$$mse = \frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \qquad (3.32)$$

The mse is a function of the quality of the data and the model selected. If an inappropriate model is chosen, the mse can be large, even if the data are relatively noise free. To fit the model, a sufficient number of points are needed. This number must be greater than the number of parameters determined in the fitting procedure. We can also weight the fit in Equation (3.32) by dividing by the variance in each point. Unless the variances of the points are significantly different, this step will not greatly affect the final results, so we will not use it.

Linear Least Squares Mean Square Error Base

To see how least squares equations are developed, let us first consider a linear fit to data. In this case, the model becomes:

$$\hat{y}_i = a_0 + a_1 x_i \qquad (3.33)$$

We next apply the model to the mse Equation (3.32). This gives a model-specific mse equation of:

$$mse = \frac{1}{N}\sum_{i=1}^{N} \left(a_0 + a_1 x_i - y_i\right)^2 \qquad (3.34)$$

To minimize the mse, we take the partial derivative of the mse in Equation (3.34) with respect to each of the fit parameters, a0 and a1, giving two equations of the form:


$$\frac{\partial}{\partial a_j} \sum_{i=1}^{N} \left(a_0 + a_1 x_i - y_i\right)^2 = 0 \qquad (3.35)$$

We will then solve the system of equations for the fit parameters. The system to be solved is:

$$\sum_{i=1}^{N} \left(a_0 + a_1 x_i - y_i\right) = 0, \qquad \sum_{i=1}^{N} \left(a_0 + a_1 x_i - y_i\right) x_i = 0 \qquad (3.36)$$

We can reorganize the equations in Equation (3.36) as follows:

$$N a_0 + a_1 \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i, \qquad a_0 \sum_{i=1}^{N} x_i + a_1 \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i \qquad (3.37)$$

It is often easier to manipulate equations if they are written in matrix form, as follows:

$$\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} \qquad (3.38)$$

For example, we will next apply this to the data in Table 3.1. Table 3.3 shows the sums needed for using Equation (3.38). Solving the matrix equation for the coefficients gives a0 = −0.1991 and a1 = 1.094. The sample data points and the fit to the data are shown in Figure 3.14.
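The 2×2 system of Equation (3.38) can be solved directly; a sketch in Python (rather than Excel) built from the same sums that appear in Table 3.3:

```python
# Table 3.1 data (x = calibration setting in mm, y = measured output)
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [0.07, 4.37, 10.31, 16.38, 22.19, 27.7, 33.07, 38.27, 43.56, 48.76, 54.04]

N = len(xs)
Sx = sum(xs)                              # 275
Sy = sum(ys)                              # 298.72
Sxx = sum(x * x for x in xs)              # 9625
Sxy = sum(x * y for x, y in zip(xs, ys))  # 10477.1

# Solve the 2x2 system of Equation (3.38) by Cramer's rule
det = N * Sxx - Sx * Sx
a1 = (N * Sxy - Sx * Sy) / det
a0 = (Sxx * Sy - Sx * Sxy) / det
print(round(a0, 4), round(a1, 3))   # -0.1991 1.094
```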

Linear Least Squares Statistical Base

The matrix formulation of the least squares fit is not the only way to structure the solution. The summations listed below can be used to determine coefficients for the linear fit.3 They can also help assess the quality of the fit, so the summations are used for more than solving for coefficients. The required summations are:


1. Average value of the independent variables:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (3.39)$$

2. Average value of the dependent variables:

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad (3.40)$$

TABLE 3.3
Example Least Squares Computation

x       y       x²      xy
0       0.07    0       0
5       4.37    25      21.85
10      10.31   100     103.1
15      16.38   225     245.7
20      22.19   400     443.8
25      27.7    625     692.5
30      33.07   900     992.1
35      38.27   1225    1339.45
40      43.56   1600    1742.4
45      48.76   2025    2194.2
50      54.04   2500    2702
Totals  275     298.72  9625    10477.1

FIGURE 3.14 Sample data from Table 3.1 and least squares fit to the data (calibration setting in mm on the horizontal axis).


3. Number of degrees of freedom for N data points:

$$\nu = N - 2 \qquad (3.41)$$

4. Spread of the xi around their mean:

$$SS_{XX} = \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2 \qquad (3.42)$$

5. Spread of the yi around their mean:

$$SS_{YY} = \sum_{i=1}^{N} \left(y_i - \bar{y}\right)^2 \qquad (3.43)$$

6. Cross-product of the {xi} with the {yi}:

$$SS_{XY} = \sum_{i=1}^{N} \left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) \qquad (3.44)$$

With these summations, the linear fit coefficients are:

$$a_1 = \frac{SS_{XY}}{SS_{XX}}, \qquad a_0 = \bar{y} - a_1 \bar{x} \qquad (3.45)$$

Table 3.4 shows the application of these summations to the sample data set from Table 3.1. Using Equation (3.45), we obtain the coefficients a0 = −0.1991 and a1 = 1.094, which are the same values as those found by the matrix method.

Quality of the Fit

The fit for the coefficients is only one factor. We need to answer two questions: are the coefficients well determined, and is this model correct? This section will examine both issues. One of the first tests to try in determining whether a fit is reasonable is to plot the residuals across the data set, as is done in Figure 3.15. The residuals, εi, are the differences between the measured dependent variable and the model output for each point:

$$\varepsilon_i = y_i - \hat{y}_i \qquad (3.46)$$
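The summations (3.39) through (3.45) reproduce both the Table 3.4 entries and the earlier matrix-method coefficients; a sketch:

```python
xs = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [0.07, 4.37, 10.31, 16.38, 22.19, 27.7, 33.07, 38.27, 43.56, 48.76, 54.04]

N = len(xs)
xbar = sum(xs) / N                                   # Equation (3.39)
ybar = sum(ys) / N                                   # Equation (3.40)
SSXX = sum((x - xbar) ** 2 for x in xs)              # Equation (3.42)
SSXY = sum((x - xbar) * (y - ybar)                   # Equation (3.44)
           for x, y in zip(xs, ys))

a1 = SSXY / SSXX                                     # Equation (3.45)
a0 = ybar - a1 * xbar
print(SSXX, round(SSXY, 1), round(a0, 4), round(a1, 3))
```

The printed sums match Table 3.4 (SSXX = 2750, SSXY = 3009.1) and the coefficients match the matrix method.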


Ideally, the plot of the residuals should look like a plot of random points. There should be about as many points with positive residuals as there are with negative residuals. Also, there should not be any obvious structure in the residuals, as illustrated in Figure 3.16, which would indicate that the model might not be appropriate. Figure 3.15 shows a few more positive residuals than negative residuals. With only 11 points, it is difficult to definitively determine whether a systematic problem exists.

TABLE 3.4
Summations for Least Squares Fit

x     y       (x − x̄)²   (x − x̄)(y − ȳ)
0     0.07    625        677.1591
5     4.37    400        455.7273
10    10.31   225        252.6955
15    16.38   100        107.7636
20    22.19   25         24.8318
25    27.7    0          0.0000
30    33.07   25         29.5682
35    38.27   100        111.1364
40    43.56   225        246.0545
45    48.76   400        432.0727
50    54.04   625        672.0909
Mean  25      27.156
SSXX = 2750
SSXY = 3009.1

FIGURE 3.15 Residual plot showing error between data and model as a function of data point number.


In addition to residual plots, numeric indicators can be used as well to indicate the quality of the modeling process. The correlation coefficient is used to determine the quality of the fit, and the f statistic is used to indicate the appropriateness of the model.

Correlation Coefficients

To determine the quality of the fit, we can use the analysis of variance (ANOVA) technique to provide indicators of how well the coefficients are determined. This analysis is found in many standard statistical computer software packages and in spreadsheet programs such as Excel. To start the analysis, we define several more summations that will be needed in addition to the earlier equations. The summations this time require the model for the dependent variable to produce the estimated points, ŷi, given by Equation (3.33). We can then compute the regression variability:

$$SSR = \sum_{i=1}^{N} \left(\hat{y}_i - \bar{y}\right)^2 \qquad (3.47)$$

The error in the fit is given by the difference between the measured dependent variable and the model output. With this, we compute the error variability:

$$SSE = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2 \qquad (3.48)$$

Equation (3.43) can be interpreted as the total variability in the measurement set. The quantities SSYY, SSR, and SSE are related by SSYY = SSR + SSE. The ratio SSE/SSYY can be interpreted as the proportion of the fit error to the total spread in the data. Table 3.5 gives the computations using Equations (3.33), (3.47), and (3.48) for the data set in Table 3.1.

FIGURE 3.16 Nonrandom error residuals: an incorrect assumed model (left) and heteroscedastic error (right).


TABLE 3.5
Data for Determining Quality of Fit

ŷ         SSR        SSE      SSYY
−0.1991   748.3209   0.0724   733.6711
5.2720    478.9254   0.8136   519.2184
10.7431   269.3955   0.1876   283.8000
16.2142   119.7313   0.0275   116.1300
21.6853   29.9328    0.2547   24.6648
27.1564   0.0000     0.2955   0.2955
32.6275   29.9328    0.1958   34.9711
38.0985   119.7313   0.0294   123.5129
43.5696   269.3955   0.0001   269.0793
49.0407   478.9254   0.0788   466.7171
54.5118   748.3209   0.2226   722.7299
Totals    3292.6119  2.1781   3294.7901

The squared correlation, R², measures the proportion of the total variability in the dependent variable that is accounted for by the fit. R² is always less than 1, but the closer the value is to 1, the better the linear fit. The correlation is computed using:

$$R^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} \qquad (3.49)$$

A related parameter is the correlation coefficient, r, that measures the tightness of the fit and whether it is a positive correlation or an anticorrelation. The correlation coefficient is computed using:

$$r = \operatorname{sgn}(a_1)\sqrt{R^2} \qquad (3.50)$$

The sgn() is the function that takes the sign of the argument. Using the data in Table 3.5, R² = 0.9993 and r = 0.9997. This implies a nearly perfect fit of the model to the data. Once the coefficients of the model have been determined, the confidence intervals on each coefficient can also be determined. To do this, we first determine the standard error in the overall model using:

$$s = \sqrt{\frac{SS_{YY} - a_1 SS_{XY}}{N - 2}} \qquad (3.51)$$

and the standard error in each coefficient from:


$$SE(a_1) = \frac{s}{\sqrt{SS_{XX}}}, \qquad SE(a_0) = s\sqrt{\frac{1}{N} + \frac{\bar{x}^2}{SS_{XX}}} \qquad (3.52)$$

To compute the 95% confidence interval, we use a t distribution with ν = N − 2 degrees of freedom. The two intervals are given by:

$$i_1 = a_1 \pm t_{0.025}\,SE(a_1), \qquad i_0 = a_0 \pm t_{0.025}\,SE(a_0) \qquad (3.53)$$

In the example we have been examining, s = 0.4919, SE(a1) = 0.009381, and SE(a0) = 0.2775. For a 95% confidence interval, we use t0.025 with ν = 9, or 2.262, from Table A.3. The confidence intervals are i1 = 1.094 ± 0.021 and i0 = −0.199 ± 0.628.
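All of the quality numbers quoted above follow from the summations; a sketch that recomputes them from the raw Table 3.1 data:

```python
import math

xs = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
ys = [0.07, 4.37, 10.31, 16.38, 22.19, 27.7, 33.07, 38.27, 43.56, 48.76, 54.04]

N = len(xs)
xbar = sum(xs) / N
ybar = sum(ys) / N
SSXX = sum((x - xbar) ** 2 for x in xs)
SSXY = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
SSYY = sum((y - ybar) ** 2 for y in ys)             # Equation (3.43)

a1 = SSXY / SSXX                                    # Equation (3.45)
a0 = ybar - a1 * xbar
yhat = [a0 + a1 * x for x in xs]

SSE = sum((y - yh) ** 2 for y, yh in zip(ys, yhat)) # Equation (3.48)
R2 = 1.0 - SSE / SSYY                               # Equation (3.49)
r = math.copysign(math.sqrt(R2), a1)                # Equation (3.50)
s = math.sqrt((SSYY - a1 * SSXY) / (N - 2))         # Equation (3.51)
SE_a1 = s / math.sqrt(SSXX)                         # Equation (3.52)
SE_a0 = s * math.sqrt(1.0 / N + xbar ** 2 / SSXX)

t = 2.262                                           # t_{0.025} with nu = 9
print(round(R2, 4), round(r, 4), round(s, 4),
      round(t * SE_a1, 3), round(t * SE_a0, 3))
```

The printed values match the text: R² = 0.9993, r = 0.9997, s = 0.4919, and confidence half-widths 0.021 and 0.628.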

f Statistic

Another parameter that is frequently computed with the model fitting procedure is the f statistic. This computation will not tell the user whether a model is correct in an absolute sense. It can be used to indicate whether a model is adequate to explain the data, and it can be used to determine whether one model fits the data better, worse, or about the same as another model.

To perform the computation, we need the results of Equations (3.47) and (3.48). We also define k as the number of parameters in the fitting procedure. For a linear fit, k = 2. The f statistic is computed from those results and the number of measurements, N, using:

$$f = \frac{SSR/k}{SSE/(N - k + 1)} \qquad (3.54)$$

In our example, f = 7558. Is this a good result? To make a decision, we need an F-distribution table such as Tables A.4A through A.4C in the appendices. These tables list the critical values f_α(m, n), where α gives the desired uncertainty level for the result. The decision rule is that if:

$$f > f_{\alpha}(k, N - k + 1) \qquad (3.55)$$

the model adequately describes the data at the (1 − α) confidence level. In our example, if we wanted to have 95% confidence that the model adequately fits


the data, then α = 0.05. In the example, k = 2 and N − k + 1 = 10. The critical value table lists f(2,10) = 4.1, so this model does a very good job of explaining the data at a confidence level even better than 99%.
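A sketch of the f computation of Equation (3.54), using the Table 3.5 totals and the f(2,10) = 4.1 critical value quoted above:

```python
SSR, SSE = 3292.6119, 2.1781   # totals from Table 3.5
N, k = 11, 2                   # 11 points, 2 fit parameters

f = (SSR / k) / (SSE / (N - k + 1))   # Equation (3.54)
print(round(f))                # 7558

f_crit = 4.1                   # f_0.05(2, 10) from Table A.4
print(f > f_crit)              # True: the model adequately explains the data
```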

    Nonlinear Fits

So far in this chapter, we have examined linear fits. In this section, we will extend the fitting procedure to nonlinear models. First, we will look at parametric models and then nonlinear polynomial models. The f statistic from the previous section can be used to determine the quality of the model in both cases.

Parametric Models

The least squares method can be applied to more than linear equations. Suppose we had a model for a data set in the form:

$$y = a_0 + a_1 x + a_2 T + a_3 H + a_4 P \qquad (3.56)$$

The x is the position variable, while T is for temperature, H is for humidity, and P is for pressure. The mean square error equation is then:

$$mse = \frac{1}{N}\sum_{i=1}^{N} \left(a_0 + a_1 x_i + a_2 T_i + a_3 H_i + a_4 P_i - y_i\right)^2 \qquad (3.57)$$

The partial derivative of Equation (3.57) is then taken with respect to each variable, as was done earlier. This leads to a system of equations that can be solved for the coefficients aj, as was done in the linear equation case but this time with more coefficients.

Power Series Models

We can apply the derivation used for minimizing the mean square error to higher order polynomial models as well. The general form for the model for each output value, ŷi, at each input point, xi, will be:

$$\hat{y}_i = \sum_{j=0}^{M} a_j x_i^j \qquad (3.58)$$

As with the linear model, we take the partial derivative of the mse with respect to each of the fit parameters, aj. This will yield a system of linear equations that can be solved for the coefficients. The general form for the matrix that must be solved is:


$$\begin{bmatrix} N & \sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \cdots \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \cdots \\ \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \\ \sum x_i^3 y_i \\ \vdots \end{bmatrix} \qquad (3.59)$$

For example, let us fit the data in Table 3.6. Figure 3.17 shows both the data and the fit. The data appear to follow a second-order equation. The necessary set of equations would appear in matrix form:

$$\begin{bmatrix} N & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{bmatrix} \qquad (3.60)$$

The coefficients can be found by solving the matrix equation:

$$\begin{bmatrix} 11 & 55 & 385 \\ 55 & 385 & 3025 \\ 385 & 3025 & 25333 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} -50.2327 \\ -587.9025 \\ -5584.0176 \end{bmatrix} \qquad (3.61)$$

TABLE 3.6
Data for Polynomial Fit

x    y
0    3.033
1    3.6805
2    5.5885
3    4.6192
4    3.6122
5    0.8997
6    −2.9238
7    −7.3846
8    −13.3494
9    −20.5442
10   −27.4637


This yields the following model for the data set, which is also used to generate the line in Figure 3.17:

$$\hat{y} = 2.7243 + 2.2824x - 0.5344x^2 \qquad (3.62)$$

Using this model and Equations (3.47), (3.43), and (3.49), the value for SSR is 1275.8 and SSYY is 1277.4, while R² is 0.999. Using Equation (3.48), we find the SSE is 1.540, and using Equation (3.54), we find that f is 2486. This value greatly exceeds the values in the table, so this is a good model to fit the data.

    Cautions with Least Squares

Like any other mathematical technique, the least squares method cannot be applied blindly with the expectation that good results follow. This section discusses basic cautions to take with this technique.

Model Selection

The least squares method and its associated statistical analysis will not tell the user whether one model is correct to the exclusion of all others. The best that the user can hope for is a statistical indicator that shows, at some level of statistical confidence, whether a model is consistent with the data. If the user picks several incorrect models to investigate, it is possible that none will really fit the data. The method and the statistics will not fix this problem.

FIGURE 3.17 Data (markers) and second-order least squares fit (line) to the data, plotted as measurement versus position.


The range of the model also needs to be correct. The model is not valid outside the data range. Therefore, greatly extending the range of the model could lead to a divergence between the model and where the underlying process is headed. This concept is illustrated in Figure 3.18.

Outlying Points

Suppose that the data set has a single point that is corrupted by a large amount of noise. The effect of the bad point will differ depending on where the point is located in the data set. Figure 3.19 illustrates the effect of a bad point at the upper edge and at the middle of the data set. The bad point at the upper edge pulls the fit toward it. However, when the bad point is located in the middle, the line is pulled a much smaller distance. This indicates that least squares is sensitive to outlying

FIGURE 3.18 Attempting to extend the least squares fit beyond the data range: outside the fitting region, the fit curve diverges from the "true" curve.

FIGURE 3.19 Effects of outlying points on the least squares fit: (a) outlying point at the edge of the region; (b) outlying point at the middle of the region.



data points, and that outliers at the edges of the data set can have the greatest effect on the fit.
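This sensitivity can be demonstrated numerically. The sketch below (data and outlier size invented for illustration) fits a line to clean data, then adds the same-size outlier at the edge and at the middle, and measures how far the fitted line is pulled at the outlier's location:

```python
def fit_line(xs, ys):
    """Closed-form least squares line; returns (intercept, slope)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - a1 * sx) / n, a1

xs = list(range(11))                    # positions 0..10
ys = [1.0 + 0.5 * x for x in xs]        # clean linear data (assumed)

edge = ys[:]; edge[-1] += 5.0           # corrupt the last point (edge)
mid = ys[:];  mid[5] += 5.0             # corrupt the center point (middle)

pred = lambda c, x: c[0] + c[1] * x
c0, ce, cm = fit_line(xs, ys), fit_line(xs, edge), fit_line(xs, mid)

pull_edge = pred(ce, 10) - pred(c0, 10)  # line displacement at the edge outlier
pull_mid = pred(cm, 5) - pred(c0, 5)     # line displacement at the middle outlier
print(pull_edge, pull_mid)
```

The same-size outlier moves the line several times farther when it sits at the edge of the range; at the middle it shifts the intercept but, by symmetry, leaves the slope unchanged.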

    Overfitting the Model

Mathematics will allow fitting of N data points with up to an (N − 1)-degree polynomial. However, if the user does not know of an underlying physical model, he or she may be tempted to fit the data with a higher-order model than necessary. We can get a sense of this by looking at the SSE computation. Typically, the SSE will decrease with increasing model order until it reaches the point where it begins to flatten out, as illustrated in Figure 3.20. Using the f statistic and comparing the SSEs between models will tell the user when increasing the order of the fit does not substantially improve the quality of the fit. Once the statistics tell the user that the model accounts for the data, it is probably best to stop increasing the order of the model unless there is an underlying physical reason for including more terms.
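The flattening behavior can be reproduced with a short program. The following sketch (data invented for illustration; the polynomial fit is built from the normal equations, one of several equivalent ways to compute a least squares fit) tabulates the SSE for fits of order 0 through 3 to data from an underlying quadratic process:

```python
def polyfit(xs, ys, order):
    """Least squares polynomial fit via normal equations + Gaussian elimination."""
    n = order + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                        # forward elimination w/ pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):                # back substitution
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j]
                                for j in range(i + 1, n))) / A[i][i]
    return coeffs

def sse(xs, ys, coeffs):
    return sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))

# Hypothetical measurements: an underlying quadratic process plus small noise.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0.1, 2.4, 8.3, 17.4, 29.1, 44.8, 63.0, 84.6, 109.9, 137.7]

errors = [sse(xs, ys, polyfit(xs, ys, k)) for k in range(4)]
print(errors)   # drops sharply until order 2, then flattens
```

The order-3 SSE is barely smaller than the order-2 SSE, which is the signature the f statistic formalizes: the extra term is not earning its keep.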

References

1. Couch, L.W., Digital and Analog Communication Systems, 6th ed., Prentice-Hall, Upper Saddle River, NJ, 2001.
2. Fraden, J., Handbook of Modern Sensors: Physics, Designs, and Applications, American Institute of Physics, New York, 1997.

FIGURE 3.20
Change in SSE with fit order. (SSE, vertical axis, vs. order of fit, 0 through 4, horizontal axis.)



3. Gonick, L. and Smith, W., The Cartoon Guide to Statistics, Harper Perennial, New York, 1993.
4. The IEEE Standard Dictionary of Electrical and Electronics Terms, 6th ed., Institute of Electrical and Electronics Engineers, Piscataway, NJ, 1997.
5. Instrument Society of America, Instrument terminology and performance, in Process Measurement and Analysis, Lipták, B.G., Ed., Chilton, Radnor, PA, 1995.
6. Klaassen, K.B., Electronic Measurement and Instrumentation, Cambridge University Press, New York, 1996.
7. Lathi, B.P., Modern Digital and Analog Communication Systems, 3rd ed., Oxford University Press, New York, 1998.
8. Papoulis, A., Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965.
9. Rabinovich, S., Measurement Errors: Theory and Practice, American Institute of Physics, New York, 1995.

    Problems

1. For each of the following error sources, comment on whether they would primarily affect precision, accuracy, or both about equally: systematic error, random error, hysteresis error, and interference.

2. Make a histogram of data with noise by using a standard analysis package such as Excel, Mathcad, or MATLAB. Select a true value such as 10 and add random noise to it. Do this 100 times and plot the results in a histogram. If Gaussian noise was added, does the result look like a Gaussian? Experiment with different noise types that are available with your computer package.
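If a spreadsheet is not at hand, Python's standard library can serve as the analysis package for this problem. A minimal sketch (bin width, seed, and text-based plot are arbitrary choices):

```python
import random
from collections import Counter

random.seed(1)                                  # repeatable runs
true_value = 10.0
samples = [true_value + random.gauss(0, 1) for _ in range(100)]

bins = Counter(round(s * 2) / 2 for s in samples)   # 0.5-wide bins
for center in sorted(bins):
    print(f"{center:5.1f} | {'*' * bins[center]}")
```

Swapping `random.gauss` for `random.uniform(-2, 2)` or `random.expovariate(1)` shows how the histogram shape follows the noise type.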

3. Given a set of measurements corrupted with Gaussian noise, where the measurements have a mean value of 10 and a variance of 2, estimate the following probabilities:

    A. The measurement is between 9 and 11

    B. The measurement is between 8 and 12

    C. The measurement is between 5 and 15

    D. The measurement is greater than 16 or less than 4.
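These probabilities can be checked numerically. A sketch in Python, expressing the Gaussian CDF with `math.erf` and using the problem's mean of 10 and variance of 2:

```python
from math import erf, sqrt

mu, var = 10.0, 2.0
sigma = sqrt(var)

def p_between(a, b):
    """P(a < X < b) for X ~ N(mu, sigma**2)."""
    cdf = lambda x: 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))
    return cdf(b) - cdf(a)

print(p_between(9, 11))       # part A
print(p_between(8, 12))       # part B
print(p_between(5, 15))       # part C
print(1 - p_between(4, 16))   # part D
```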

4. Consider measuring a constant 5-V power supply with a noisy voltmeter. The meter adds zero-mean Gaussian noise with a variance of 0.1 volt² to each measurement. What are the upper and the lower measurement limits such that the probability of finding the measurement outside of these limits is no more than 0.01?
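One way to attack this in Python (a sketch using `statistics.NormalDist` from the standard library, Python 3.8+; it assumes the limits are placed symmetrically so that each tail carries probability 0.005):

```python
from math import sqrt
from statistics import NormalDist

meter = NormalDist(mu=5.0, sigma=sqrt(0.1))   # 5-V supply, variance 0.1 V^2
lower = meter.inv_cdf(0.005)                  # 0.5% in the lower tail
upper = meter.inv_cdf(0.995)                  # 0.5% in the upper tail
print(lower, upper)                           # roughly 5 +/- 2.576*sqrt(0.1)
```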



5. For the following set of measurements (tabulated after Problem 8), estimate the mean, the variance, the estimated error in the mean, and the estimated uncertainty in the mean. Also find the 95% and 99% confidence intervals on the mean.

6. If a system has a noise temperature of 290 K and a bandwidth of 100 kHz, determine the noise spectral density and the variance of the process.

7. Determine the necessary equations to perform a least squares fit to the model:

y(t) = a·cos(2πf0t) + b·sin(2πf0t)

Assume that the frequency f0 is known and the constants a and b are to be found by the fit. How would you modify the procedure if the frequency were also unknown?
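Because the model is linear in a and b, setting the partial derivatives of the summed squared error to zero yields a 2×2 linear system in a and b. A sketch (the sample times, amplitudes, and frequency below are invented to check the fit against known values):

```python
from math import cos, sin, pi

def fit_sinusoid(ts, ys, f0):
    """Least squares a, b for y(t) = a*cos(2*pi*f0*t) + b*sin(2*pi*f0*t)."""
    c = [cos(2 * pi * f0 * t) for t in ts]
    s = [sin(2 * pi * f0 * t) for t in ts]
    scc = sum(x * x for x in c)               # sum of cos^2 terms
    sss = sum(x * x for x in s)               # sum of sin^2 terms
    scs = sum(x * y for x, y in zip(c, s))    # cross term
    syc = sum(y * x for y, x in zip(ys, c))
    sys_ = sum(y * x for y, x in zip(ys, s))
    det = scc * sss - scs * scs
    a = (syc * sss - sys_ * scs) / det        # Cramer's rule on the 2x2 system
    b = (sys_ * scc - syc * scs) / det
    return a, b

# Self-check with invented noiseless data: a = 2, b = -1, f0 = 1 Hz.
ts = [i / 20 for i in range(20)]
ys = [2 * cos(2 * pi * t) - sin(2 * pi * t) for t in ts]
print(fit_sinusoid(ts, ys, 1.0))   # close to (2.0, -1.0)
```

If f0 were also unknown, the problem becomes nonlinear in f0; one common approach is to repeat this linear fit over a grid of trial frequencies and keep the frequency that gives the smallest SSE.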

8. Determine a model to fit the graph given in Figure P3.1. Assume that the noise in the measurements is effectively zero at each point.

Data for Problem 5:

7.894826  5.883171  7.659018  6.045662
6.240702  8.185223  6.77516   7.191845
7.293361  5.947835  8.317885  7.295075
7.283844  7.902557  7.343442  6.675203
5.455724  5.811363  7.297515  7.52899

FIGURE P3.1
Data plot for Problem 8. (Horizontal axis 0 to 2; vertical axis 0 to 4.)



9. In an experimental data set with zero-mean Gaussian noise added to each measurement, we have two strange points that appear to lie far from the rest of the data. The noise has a variance of 0.1. For the first of the two points, the distance between the mean of the data and the point is 0.3, while for the second point, the distance is 0.6. Would you discard either, both, or neither of these points, and why did you make that determination?

10. In the text, we developed a case to adequately explain the data in Table 3.1 as a linear fit. Suppose we wished to use a third-order fit to the data just to see what happens. Compute the necessary coefficients a0, a1, a2, and a3. Compute the f value as well. Does this model adequately explain the data? How does it compare with the linear fit?

11. A second-order fit was presented for the data in Table 3.6. Try the fit with a linear polynomial. Is the linear fit a better fit than the second-order fit? Explain your reasoning.

12. Try fitting the data in Table 3.6 as a third-order polynomial. Is the third-order fit a better fit than the second-order fit? Explain your reasoning.

13. A measurement set includes the following data. The data were collected at 10 settings, and 10 points were collected at each setting. Each measurement has noise added, but the noise variance can be assumed to be the same at each of the settings. With this data set:

A. Find the mean, variance, error in the mean, uncertainty in the mean, and 95% confidence interval for the mean value at each of the ten settings.

B. Using the mean values, perform first-, second-, third-, and fourth-order least squares fits to the data. Determine which fit is preferred using the f statistic.

C. Plot your choice for the best fit to the data. Plot the residuals between the mean values and the model you have chosen.

Use a computer-based analysis package to make the computations easier.

A. Setting X1 = 0.5; voltage values: 0.626, 0.083, 0.139, 0.506, -0.580, 0.071, 0.033, 0.124, 0.257, 0.309

B. Setting X2 = 1.025; voltage values: 0.831, 0.168, 0.007, 0.249, 0.929, 0.822, 0.588, 0.611, 0.444, 0.173

C. Setting X3 = 1.875; voltage values: 0.911, 0.971, 0.820, 1.417, 0.413, 1.332, 1.045, 1.253, -0.037, 0.309

D. Setting X4 = 3.425; voltage values: 2.003, 1.823, 1.272, 1.431, 2.152, 1.751, 1.259, 1.186, 1.487, 1.492



E. Setting X5 = 6.050; voltage values: 2.221, 2.048, 1.330, 1.422, 2.338, 2.110, 2.508, 1.595, 1.843, 2.163

F. Setting X6 = 10.125; voltage values: 2.577, 2.591, 2.885, 3.078, 2.993, 1.990, 2.151, 3.550, 2.320, 2.042

G. Setting X7 = 16.025; voltage values: 2.867, 3.368, 2.911, 3.421, 3.958, 3.204, 3.478, 2.892, 3.063, 2.559

H. Setting X8 = 24.125; voltage values: 4.264, 3.892, 3.525, 3.316, 3.539, 2.869, 3.257, 3.183, 3.643, 3.270

I. Setting X9 = 34.800; voltage values: 3.655, 3.287, 4.684, 4.686, 4.749, 4.724, 3.703, 4.253, 4.016, 3.170

J. Setting X10 = 48.425; voltage values: 4.779, 4.189, 4.110, 3.834, 5.024, 4.723, 4.444, 3.997, 4.475, 4.133
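As a starting point for part A, the sketch below applies Python's `statistics` module to the first setting only. (The 1.96 multiplier is the large-sample 95% value, used here as a simplifying assumption; a Student's t value of about 2.262 for 9 degrees of freedom is more appropriate for n = 10.)

```python
from math import sqrt
from statistics import mean, stdev

# Voltage values for setting X1 = 0.5, copied from part A above.
values = [0.626, 0.083, 0.139, 0.506, -0.580,
          0.071, 0.033, 0.124, 0.257, 0.309]

m = mean(values)
s2 = stdev(values) ** 2                   # sample variance
sem = stdev(values) / sqrt(len(values))   # error in the mean
ci95 = 1.96 * sem                         # large-sample 95% interval (assumption)
print(f"mean = {m:.4f}, variance = {s2:.4f}, 95% CI = +/-{ci95:.4f}")
```

Repeating this over all ten settings produces the mean values needed for the least squares fits in part B.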
